SDK Initialization
First, initialize the Traceloop SDK. Before you start, make sure you’ve created an API key and set it as the TRACELOOP_API_KEY environment variable. Check out the SDK’s getting started guide for more information.
Basic Experiment Structure
An experiment consists of:
- A dataset to test against
- A task function that defines what your AI system should do
- Evaluators to measure performance
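To make the three parts concrete, here is a purely local model of how they fit together. The task runs on every dataset row, and each evaluator scores the task's output. This is a sketch for intuition only, not the Traceloop SDK: on the platform, evaluators are referenced by slug and run there. The names run_locally, echo_task, and the toy "length" evaluator are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

Row = dict[str, Any]
Task = Callable[[Optional[Row]], Awaitable[Row]]
Evaluator = Callable[[Row], float]


async def run_locally(dataset: list[Row], task: Task, evaluators: dict[str, Evaluator]) -> list[Row]:
    """Run the task on each row (sequentially), then score its output with every evaluator."""
    results = []
    for row in dataset:
        output = await task(row)
        scores = {name: evaluate(output) for name, evaluate in evaluators.items()}
        results.append({"output": output, "scores": scores})
    return results


# Hypothetical task: echoes the row's question back as the completion.
async def echo_task(row: Optional[Row]) -> Row:
    return {"completion": (row or {}).get("question", "")}


dataset = [{"question": "What is 2 + 2?"}, {"question": "Name a prime."}]
# Toy evaluator: scores a completion by its character length.
evaluators = {"length": lambda out: float(len(out["completion"]))}
results = asyncio.run(run_locally(dataset, echo_task, evaluators))
```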
Task Functions
Create a task function that defines how your AI system processes each dataset row. The task is one of the experiment’s parameters, and the experiment runs it on each dataset row. The task function has the following signature:
- Input: An optional dictionary containing the dataset row data
- Output: A dictionary with your task results
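A minimal task function might look like the following. It assumes the dataset rows have a question column and that your evaluators expect question and completion variables; call_model is a hypothetical placeholder for your own model call. The function is declared async because the run method describes its task parameter as an async function.

```python
import asyncio
from typing import Any, Optional


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for your LLM call; replace it with your own client."""
    await asyncio.sleep(0)  # simulate an async I/O call
    return f"Echo: {prompt}"


async def answer_question(row: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Task function: receives one dataset row and returns a dict of results."""
    row = row or {}  # the input is optional, so fall back to an empty row
    question = row.get("question", "")
    completion = await call_model(question)
    return {
        "question": question,      # assumed evaluator input variable
        "completion": completion,  # assumed evaluator input variable
    }
```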
Ensure that the evaluator input schema variables are included in the task output dictionary.
You can add extra attributes to the task output even if they are not evaluator input parameters; these are also logged to the platform.
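Before running an experiment, it can help to check a sample task output against the input variables your evaluators expect. The helper below is hypothetical and not part of the SDK; you supply the list of evaluator input names yourself, for example from each evaluator's input schema. Extra keys in the output are deliberately ignored by the check.

```python
from typing import Any, Iterable


def missing_evaluator_inputs(task_output: dict[str, Any], evaluator_inputs: Iterable[str]) -> list[str]:
    """Return the evaluator input variables that the task output does not provide.

    Extra keys in task_output are not reported; per the docs, the platform
    still logs them alongside the results.
    """
    return [name for name in evaluator_inputs if name not in task_output]


output = {"question": "What is 2 + 2?", "completion": "4", "model_name": "hypothetical-model"}
print(missing_evaluator_inputs(output, ["question", "completion"]))   # []
print(missing_evaluator_inputs(output, ["completion", "reference"]))  # ['reference']
```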
Running Experiments
Use the experiment.run() method to execute your experiment: select a dataset as the source data, choose the evaluators to run, and assign a slug so the experiment is easy to rerun later.
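The code below sketches how experiment.run() might be invoked. It assumes that Traceloop.init() returns a client whose experiment.run() is awaitable; check the SDK reference to confirm. The dataset, evaluator, and experiment slugs and the answer_question task are hypothetical. Passing the client in as an argument keeps run_qa_experiment testable with a stand-in object.

```python
import asyncio
from typing import Any, Optional


async def answer_question(row: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Placeholder task: echoes the question back as the completion."""
    row = row or {}
    question = row.get("question", "")
    return {"question": question, "completion": question}


async def run_qa_experiment(client: Any) -> Any:
    """Start an experiment through client.experiment.run().

    Assumptions: `client` is the object returned by Traceloop.init(), and its
    experiment.run() is awaitable. All slugs below are hypothetical.
    """
    return await client.experiment.run(
        dataset_slug="qa-dataset",        # hypothetical dataset slug
        dataset_version="v1",             # must refer to a published version
        task=answer_question,
        evaluators=["answer-relevancy"],  # hypothetical evaluator slug
        experiment_slug="qa-baseline",    # keep this slug to rerun the experiment later
        stop_on_error=False,              # documented default
        wait_for_results=True,            # documented default
    )


def main() -> None:
    """Run against the real platform (needs traceloop-sdk and TRACELOOP_API_KEY)."""
    from traceloop.sdk import Traceloop  # lazy import: this module loads without the SDK

    client = Traceloop.init()  # assumed to return a client exposing .experiment
    asyncio.run(run_qa_experiment(client))
```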
experiment.run()
Parameters
- dataset_slug (str): Identifier for your dataset
- dataset_version (str): Version of the dataset to use; an experiment can only run on a published version
- task (function): Async function that processes each dataset row
- evaluators (list): List of evaluator slugs used to measure performance
- experiment_slug (str): Unique identifier for this experiment
- stop_on_error (boolean): Whether to stop on the first error (default: False)
- wait_for_results (boolean): Whether to wait for async tasks to complete; when not waiting, the results can be found in the UI (default: True)
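If you build run arguments in several places, a small config object can keep them aligned with the parameter list above. The ExperimentRunConfig class below is a hypothetical convenience, not part of the SDK. It fills in the documented defaults for stop_on_error and wait_for_results and rejects a non-async task or non-string evaluator slugs before any call is made.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any, Awaitable, Callable


@dataclass
class ExperimentRunConfig:
    """Hypothetical helper mirroring the documented experiment.run() parameters."""

    dataset_slug: str
    dataset_version: str  # must be a published dataset version
    task: Callable[..., Awaitable[dict[str, Any]]]
    evaluators: list[str]
    experiment_slug: str
    stop_on_error: bool = False    # documented default
    wait_for_results: bool = True  # documented default

    def __post_init__(self) -> None:
        # task is documented as an async function, so reject plain functions early.
        if not inspect.iscoroutinefunction(self.task):
            raise TypeError("task must be an async function (defined with 'async def')")
        # evaluators is documented as a list of evaluator slugs.
        if not isinstance(self.evaluators, list) or not all(isinstance(s, str) for s in self.evaluators):
            raise TypeError("evaluators must be a list of evaluator slug strings")

    def as_kwargs(self) -> dict[str, Any]:
        """Return the fields as keyword arguments, without copying the task."""
        return {f.name: getattr(self, f.name) for f in fields(self)}
```

You would then pass config.as_kwargs() as keyword arguments to experiment.run() on your client.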