A custom evaluator requires three main components: a prompt, an input schema, and an output schema.
Prompt
The prompt instructs the model to act as an LLM-as-a-judge, evaluating your data against the criteria you define. Use Jinja2 templating syntax ({{var_name}}) to reference variables from your input schema.
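For example, a judge prompt for rating answer relevance might look like the sketch below. The variable names {{question}} and {{answer}} are illustrative; they come from whatever input schema you define.

```json
[
  {
    "role": "system",
    "content": "You are an impartial judge. Rate how well the answer addresses the question on a scale of 1 to 5, then briefly explain your reasoning."
  },
  {
    "role": "user",
    "content": "Question: {{question}}\n\nAnswer: {{answer}}"
  }
]
```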
Input Schema
Defines the variables available to your prompt template. Each variable declared here can be referenced in your prompt using {{var_name}}.
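A minimal sketch of an input schema matching the prompt above, assuming schemas are expressed as JSON Schema (the exact dialect may differ):

```json
{
  "type": "object",
  "properties": {
    "question": {
      "type": "string",
      "description": "The question posed to the model under evaluation"
    },
    "answer": {
      "type": "string",
      "description": "The model's answer to be judged"
    }
  },
  "required": ["question", "answer"]
}
```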
Output Schema
Defines the structure of the evaluation result. The model returns its assessment as structured output matching this schema.
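Continuing the example, an output schema asking the judge for a numeric score and a rationale might look like this (again assuming JSON Schema; the field names score and reasoning are illustrative):

```json
{
  "type": "object",
  "properties": {
    "score": {
      "type": "integer",
      "minimum": 1,
      "maximum": 5,
      "description": "Relevance rating from 1 (poor) to 5 (excellent)"
    },
    "reasoning": {
      "type": "string",
      "description": "Brief justification for the score"
    }
  },
  "required": ["score", "reasoning"]
}
```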
Additionally, you’ll specify an LLM provider, model, and any provider-specific settings.
Authorization
Send the word "Bearer", followed by a space and your JWT token, in the Authorization header.
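For example (the token value is a placeholder):

```
Authorization: Bearer <your-jwt-token>
```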
Body
Custom evaluator configuration, with the following fields:
- Display name of the evaluator
- LLM provider (e.g., openai, anthropic)
- Model to use for evaluation
- Prompt messages for the LLM judge
- Schema defining evaluator inputs
- Schema defining evaluator outputs
- URL-safe identifier (auto-generated if not provided)
- Description of what the evaluator does
- Temperature setting for the LLM
- Top P (nucleus sampling) setting
- Maximum tokens in the response
- Frequency penalty to reduce repetition
- Presence penalty for topic diversity
- Stop sequences
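Putting it together, below is a sketch of a full configuration. The field names shown (name, provider, model, messages, input_schema, output_schema, slug, description, temperature, max_tokens) are assumptions mapping onto the fields listed above, and the model name is only an example; check the API reference for the exact names your deployment expects.

```json
{
  "name": "Answer Relevance Judge",
  "provider": "openai",
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are an impartial judge. Rate how well the answer addresses the question on a scale of 1 to 5, then briefly explain your reasoning."
    },
    {
      "role": "user",
      "content": "Question: {{question}}\n\nAnswer: {{answer}}"
    }
  ],
  "input_schema": {
    "type": "object",
    "properties": {
      "question": { "type": "string" },
      "answer": { "type": "string" }
    },
    "required": ["question", "answer"]
  },
  "output_schema": {
    "type": "object",
    "properties": {
      "score": { "type": "integer", "minimum": 1, "maximum": 5 },
      "reasoning": { "type": "string" }
    },
    "required": ["score", "reasoning"]
  },
  "slug": "answer-relevance-judge",
  "description": "Rates how well an answer addresses its question on a 1-5 scale",
  "temperature": 0.0,
  "max_tokens": 512
}
```

A temperature of 0.0 is a common choice for judges, since evaluation benefits from deterministic, repeatable scoring rather than creative variation.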