A custom evaluator requires three main components: a prompt, an input schema, and an output schema.
Prompt
The prompt instructs the model to act as an LLM-as-a-judge, evaluating your data against the criteria you define. Use Jinja2 templating syntax ({{var_name}}) to reference variables from your input schema.
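For example, a judge prompt for rating answer relevance might look like the sketch below. The variable names {{question}} and {{answer}} are illustrative; they come from whatever input schema you define.

```json
[
  {
    "role": "system",
    "content": "You are an impartial judge. Rate how well the answer addresses the question on a scale of 1 to 5, then briefly explain your reasoning."
  },
  {
    "role": "user",
    "content": "Question: {{question}}\n\nAnswer: {{answer}}"
  }
]
```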
Input Schema
Defines the variables available to your prompt template. Each variable declared here can be referenced in your prompt using {{var_name}}.
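A minimal sketch of an input schema matching the prompt above, assuming schemas are expressed as JSON Schema (the exact dialect may differ):

```json
{
  "type": "object",
  "properties": {
    "question": {
      "type": "string",
      "description": "The question posed to the model under evaluation"
    },
    "answer": {
      "type": "string",
      "description": "The model's answer to be judged"
    }
  },
  "required": ["question", "answer"]
}
```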
Output Schema
Defines the structure of the evaluation result. The model returns its assessment as structured output matching this schema.
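Continuing the example, an output schema asking the judge for a numeric score and a rationale might look like this (again assuming JSON Schema; the field names score and reasoning are illustrative):

```json
{
  "type": "object",
  "properties": {
    "score": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5,
      "description": "Relevance rating from 1 (poor) to 5 (excellent)"
    },
    "reasoning": {
      "type": "string",
      "description": "Brief justification for the score"
    }
  },
  "required": ["score", "reasoning"]
}
```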
Additionally, you’ll specify an LLM provider, model, and any provider-specific settings.
Authorization
Send the word "Bearer", followed by a space and your JWT token, in the Authorization header.
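For example (the token value is a placeholder):

```
Authorization: Bearer <your-jwt-token>
```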
Body
Custom evaluator configuration, with the following fields:
- Display name of the evaluator
- LLM provider (e.g., openai, anthropic)
- Model to use for evaluation
- Prompt messages for the LLM judge
- Schema defining evaluator inputs
- Schema defining evaluator outputs
- URL-safe identifier (auto-generated if not provided)
- Description of what the evaluator does
- Temperature setting for the LLM
- Top P (nucleus sampling) setting
- Maximum tokens in the response
- Frequency penalty to reduce repetition
- Presence penalty for topic diversity
- Stop sequences
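Putting it together, below is a sketch of a full configuration. The field names shown (name, provider, model, messages, input_schema, output_schema, slug, description, temperature, max_tokens) are assumptions mapping onto the fields listed above, and the model name is only an example; check the API reference for the exact names your deployment expects.

```json
{
  "name": "Answer Relevance Judge",
  "provider": "openai",
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are an impartial judge. Rate how well the answer addresses the question on a scale of 1 to 5, then briefly explain your reasoning."
    },
    {
      "role": "user",
      "content": "Question: {{question}}\n\nAnswer: {{answer}}"
    }
  ],
  "input_schema": {
    "type": "object",
    "properties": {
      "question": { "type": "string" },
      "answer": { "type": "string" }
    },
    "required": ["question", "answer"]
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "score": { "type": "integer", "minimum": 1, "maximum": 5 },
      "reasoning": { "type": "string" }
    },
    "required": ["score", "reasoning"]
  },
  "slug": "answer-relevance-judge",
  "description": "Rates how well an answer addresses its question on a 1-5 scale",
  "temperature": 0.0,
  "max_tokens": 512
}
```

A temperature of 0.0 is a common choice for judges, since evaluation benefits from deterministic, repeatable scoring rather than creative variation.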