Benchmarking / Monitoring
GitHub - langfuse/langfuse: 🪢 Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. (YC W23)
- event is the basic building block. An event tracks a discrete, point-in-time occurrence in a trace.
- span represents the duration of a unit of work in a trace.
- generation logs calls to AI models, incl. prompts, token usage, and costs.
- agent decides on the application flow and can, for example, use tools with the guidance of an LLM.
- tool represents a tool call, for example to a weather API.
- chain is a link between different application steps, like passing context from a retriever to an LLM call.
- retriever represents data retrieval steps, such as a call to a vector store or a database.
- evaluator represents functions that assess the relevance, correctness, or helpfulness of an LLM's outputs.
- embedding represents calls to an embedding model.
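The observation types above nest into a tree rooted in a trace: an agent span can contain tool, retriever, and generation observations, while events mark points in time. A minimal sketch of that hierarchy (an illustrative data model under assumed names, not the actual Langfuse SDK):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Observation:
    """One node in a trace tree; type is e.g. 'event', 'span', 'generation'."""
    name: str
    type: str
    start: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    end: Optional[datetime] = None       # events are point-in-time, so no end
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def child(self, name: str, type: str, **metadata) -> "Observation":
        # Nest a new observation under this one and return it.
        obs = Observation(name=name, type=type, metadata=metadata)
        self.children.append(obs)
        return obs

# A trace rooted in a span: an agent uses a tool and a retriever,
# then calls a model (a generation), with one discrete event logged.
trace = Observation(name="answer-question", type="span")
agent = trace.child("weather-agent", "agent")
agent.child("weather-api", "tool", city="Berlin")
agent.child("vector-store-lookup", "retriever", top_k=3)
agent.child("model-call", "generation",
            model="gpt-4o", prompt_tokens=812, completion_tokens=64)
trace.child("cache-miss", "event")

def count(obs: Observation, t: str) -> int:
    # Count observations of a given type in the subtree.
    return (obs.type == t) + sum(count(c, t) for c in obs.children)

print(count(trace, "generation"))  # 1
print(count(trace, "tool"))        # 1
```

Aggregating over such a tree is how per-trace metrics like total token usage or tool-call counts are derived.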