
Experiments Data Model

This page describes the data model for experiment-related objects in Langfuse. For an overview of how these objects work together, see the Concepts page. For score and score config objects, see the Scores data model.


Objects

Datasets

A dataset is a collection of DatasetItems, each holding an input and, optionally, an expected output, that can be run through your application during dataset runs.

Dataset object

| Attribute | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the dataset |
| name | string | Yes | Name of the dataset |
| description | string | No | Description of the dataset |
| metadata | object | No | Additional metadata for the dataset |
| remoteExperimentUrl | string | No | Webhook endpoint for triggering experiments |
| remoteExperimentPayload | object | No | Payload for triggering experiments |

DatasetItem object

| Attribute | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the dataset item. Items are upserted on their id; the id must be unique at the project level and cannot be reused across datasets |
| datasetId | string | Yes | ID of the dataset this item belongs to |
| input | object | No | Input data for the dataset item |
| expectedOutput | object | No | Expected output data for the dataset item |
| metadata | object | No | Additional metadata for the dataset item |
| sourceTraceId | string | No | ID of the source trace to link this dataset item to |
| sourceObservationId | string | No | ID of the source observation to link this dataset item to |
| status | DatasetStatus | No | Status of the dataset item. Defaults to ACTIVE for newly created items. Possible values: ACTIVE, ARCHIVED |
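The attributes above can be sketched as a plain payload in Python. This is an illustration of the schema only; `make_dataset_item` is a hypothetical helper, not part of the Langfuse SDK.

```python
# Hypothetical helper that builds a dict mirroring the DatasetItem attributes.
def make_dataset_item(item_id, dataset_id, input=None, expected_output=None,
                      metadata=None, status="ACTIVE"):
    if status not in ("ACTIVE", "ARCHIVED"):
        raise ValueError("status must be ACTIVE or ARCHIVED")
    return {
        "id": item_id,                      # upsert key, unique per project
        "datasetId": dataset_id,            # dataset this item belongs to
        "input": input,                     # optional input data
        "expectedOutput": expected_output,  # optional expected output
        "metadata": metadata,
        "status": status,                   # defaults to ACTIVE
    }

item = make_dataset_item(
    "item-1", "ds-qa",
    input={"question": "What is Langfuse?"},
    expected_output={"answer": "An LLM engineering platform"},
)
```

Because items are upserted on their id, calling the create endpoint again with the same id updates the existing item rather than creating a duplicate.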

DatasetRun (Experiment Run)

Dataset runs are used to run a dataset through your LLM application and optionally apply evaluation methods to the results. They are often referred to as experiment runs.


DatasetRun object

| Attribute | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the dataset run |
| name | string | Yes | Name of the dataset run |
| description | string | No | Description of the dataset run |
| metadata | object | No | Additional metadata for the dataset run |
| datasetId | string | Yes | ID of the dataset this run belongs to |

DatasetRunItem object

| Attribute | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the dataset run item |
| datasetRunId | string | Yes | ID of the dataset run this item belongs to |
| datasetItemId | string | Yes | ID of the dataset item to link to this run |
| traceId | string | Yes | ID of the trace to link to this run |
| observationId | string | No | ID of the observation to link to this run |

In most cases, we recommend that DatasetRunItems reference trace IDs directly; the observation ID reference exists for backwards compatibility with older SDK versions.
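A DatasetRunItem ties one DatasetItem to the trace produced by one application invocation. The sketch below is a plain-Python illustration of that shape (the helper is hypothetical, not an SDK function):

```python
import uuid

# Hypothetical helper: one DatasetRunItem per (run, item, trace) combination.
def make_run_item(run_id, item_id, trace_id, observation_id=None):
    run_item = {
        "id": str(uuid.uuid4()),
        "datasetRunId": run_id,
        "datasetItemId": item_id,
        "traceId": trace_id,           # recommended: reference the trace directly
    }
    if observation_id is not None:     # only needed for legacy SDK versions
        run_item["observationId"] = observation_id
    return run_item

run_item = make_run_item("run-1", "item-1", "trace-1")
```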

End to End Data Relations

An experiment combines several Langfuse objects:

  • DatasetRuns (or experiment runs) are created by looping through all or selected DatasetItems of a Dataset with your LLM application.
  • For each DatasetItem passed into the LLM application as input, a DatasetRunItem and a Trace are created.
  • Optionally, Scores can be added to the Traces to evaluate the output of the LLM application during the DatasetRun.
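The loop above can be sketched in plain Python. This is a simplified in-memory simulation of the object relations, not SDK code; the trace IDs are placeholders for the traces your instrumented application would create.

```python
# Hedged end-to-end sketch: loop over dataset items, invoke the app once per
# item (which would create one trace each), record run items, and add scores.
def run_experiment(dataset_items, app, evaluator, run_name):
    run = {"name": run_name, "run_items": [], "scores": []}
    for i, item in enumerate(dataset_items):
        output = app(item["input"])               # one trace per invocation
        trace_id = f"trace-{i}"                   # placeholder trace id
        run["run_items"].append(
            {"datasetItemId": item["id"], "traceId": trace_id}
        )
        # Optionally score the trace against the expected output.
        run["scores"].append({
            "traceId": trace_id,
            "name": "exact_match",
            "value": evaluator(output, item.get("expectedOutput")),
        })
    return run

items = [{"id": "item-1", "input": "2+2", "expectedOutput": "4"}]
result = run_experiment(
    items,
    app=lambda q: "4",                            # stand-in for your LLM app
    evaluator=lambda out, exp: float(out == exp),
    run_name="demo-run",
)
```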

See the Concepts page for more information on how these objects work together conceptually. See the observability core concepts page for more details on traces and observations. See the Scores data model for more details on score and score config objects.

Function Definitions

When running experiments via the SDK, you define task and evaluator functions. These are user-defined functions that the experiment runner calls for each dataset item. For more information on how experiments work conceptually, see the Concepts page.

Task

A task is a function that takes a dataset item and returns an output during an experiment run.
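A minimal task might look like the sketch below. The keyword-only `item` parameter and the trailing `**kwargs` are assumptions about the runner's calling convention; check the SDK reference for the exact signature.

```python
# Sketch of a task function: receives a dataset item, returns the app output.
def my_task(*, item, **kwargs):
    question = item["input"]          # dataset item input
    # Here you would call your LLM application; a stub stands in for it.
    return f"Echo: {question}"

output = my_task(item={"input": "What is a dataset run?"})
```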

See the SDK references for function signatures and parameters.

Evaluator

An evaluator is a function that scores the output of a task for a single dataset item. Evaluators receive the input, output, expected output, and metadata, and return an Evaluation object that becomes a Score in Langfuse.
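An exact-match evaluator could be sketched as follows. The returned dict mirrors the Evaluation fields described above (name, value, comment); the exact parameter names and return type are assumptions, so consult the SDK reference for the real signature.

```python
# Sketch of an item-level evaluator: compares output to expected output and
# returns an evaluation that would become a numeric Score in Langfuse.
def exact_match(*, input, output, expected_output, metadata=None, **kwargs):
    matched = output == expected_output
    return {
        "name": "exact_match",
        "value": 1.0 if matched else 0.0,
        "comment": "match" if matched else "mismatch",
    }

evaluation = exact_match(input="2+2", output="4", expected_output="4")
```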

See the SDK references for function signatures and parameters.

Run Evaluator

A run evaluator is a function that assesses the full experiment results and computes aggregate metrics. When run on Langfuse datasets, the resulting scores are attached to the dataset run.
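A run evaluator aggregating item-level results might look like this sketch. The `item_results` shape (a list of per-item results, each with an `evaluations` list) is an assumption for illustration, not the SDK's actual data structure.

```python
# Sketch of a run-level evaluator: averages all numeric item-level scores
# into one aggregate score attached to the dataset run.
def average_score(*, item_results, **kwargs):
    values = [
        ev["value"]
        for result in item_results
        for ev in result["evaluations"]
        if isinstance(ev["value"], (int, float))
    ]
    return {
        "name": "avg_exact_match",
        "value": sum(values) / len(values) if values else 0.0,
    }

aggregate = average_score(item_results=[
    {"evaluations": [{"name": "exact_match", "value": 1.0}]},
    {"evaluations": [{"name": "exact_match", "value": 0.0}]},
])
```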

See the SDK references for function signatures and parameters.

For detailed usage examples of tasks and evaluators, see Experiments via SDK.

Local Datasets

Currently, when experiments are run via the SDK on local datasets, only traces are created in Langfuse; no dataset runs are generated. Each task execution creates an individual trace for observability and debugging.

Improvements are on our roadmap to bring run overviews, comparison views, and more to experiments on local datasets, matching the functionality available for Langfuse datasets.

