Building reliable LLM applications means knowing whether a new prompt, model, or flow change actually makes things better.
Experiments in Traceloop give teams a structured workflow for testing and comparing results across different prompts, models, and evaluator checks, all against real datasets.

What You Can Do with Experiments

Run Multiple Evaluators

Execute multiple evaluation checks against your dataset in a single experiment run
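
Conceptually, an experiment run applies every evaluator check to every row of the dataset. The sketch below illustrates that loop in plain Python; the dataset rows, evaluator names, and result shape are illustrative assumptions, not the Traceloop SDK API.

```python
# Conceptual sketch of running multiple evaluator checks over a dataset.
# Plain Python for illustration; these names are not the Traceloop SDK API.
from typing import Callable

# A toy dataset of input/output pairs, standing in for a real Traceloop dataset.
dataset = [
    {"input": "What is the capital of France?", "output": "Paris."},
    {"input": "Summarize our refund policy.", "output": "Refunds within 30 days."},
]

# Each evaluator check maps a dataset row to a pass/fail result with reasoning.
def non_empty_answer(row: dict) -> dict:
    ok = bool(row["output"].strip())
    return {"passed": ok, "reason": "output is non-empty" if ok else "empty output"}

def short_answer(row: dict) -> dict:
    words = len(row["output"].split())
    return {"passed": words <= 50, "reason": f"{words} words"}

evaluators: dict[str, Callable[[dict], dict]] = {
    "non_empty_answer": non_empty_answer,
    "short_answer": short_answer,
}

# Run every evaluator against every row, as an experiment run would.
for row in dataset:
    for name, check in evaluators.items():
        result = check(row)
        print(f"{name}: passed={result['passed']} ({result['reason']})")
```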

View Complete Results

See every experiment run's outputs in a single table view, with status indicators and the detailed evaluator reasoning behind each result

Compare Experiment Run Results

Run the same experiment across different dataset versions to see how each version affects your workflow's results

Custom Task Pipelines

Add a tailored task to the experiment that produces the evaluator input, such as LLM calls or semantic search steps
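
For illustration, a custom task might look like the sketch below: a function that takes a dataset row, calls an LLM, and returns the output that the evaluators then score. The function name, model choice, and return shape are assumptions made for this example, not the Traceloop task interface.

```python
# Conceptual sketch of a custom task pipeline that produces evaluator input.
# The function name, model, and return shape are illustrative assumptions,
# not the Traceloop experiment API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_ticket(row: dict) -> dict:
    """Task step: call an LLM to turn a dataset row into evaluator input."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer the customer question briefly."},
            {"role": "user", "content": row["input"]},
        ],
    )
    answer = completion.choices[0].message.content
    # The returned dict is what the evaluator checks would receive as input.
    return {"input": row["input"], "output": answer}

if __name__ == "__main__":
    print(answer_ticket({"input": "How do I reset my password?"}))
```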