
Evaluator Types
Style
Character Count
Track response length in characters to ensure outputs meet specific length requirements.
Character Count Ratio
Measure the ratio of output characters to input characters to assess response proportionality and expansion.
Word Count
Track the total number of words in outputs to ensure an appropriate level of response detail.
Word Count Ratio
Measure the ratio of output words to input words to compare input/output verbosity and expansion patterns (see the sketch below).
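All four style evaluators reduce to simple counting and division. A minimal sketch, assuming plain strings for the input and output; the function and field names here are illustrative, not the evaluator API:

```python
def style_metrics(input_text: str, output_text: str) -> dict:
    """Illustrative length metrics: character/word counts and output-to-input ratios."""
    in_chars, out_chars = len(input_text), len(output_text)
    in_words, out_words = len(input_text.split()), len(output_text.split())
    return {
        "character_count": out_chars,
        "character_count_ratio": out_chars / in_chars if in_chars else None,
        "word_count": out_words,
        "word_count_ratio": out_words / in_words if in_words else None,
    }

print(style_metrics("Summarize the quarterly report.",
                    "The report covers Q3 revenue growth, churn, and hiring."))
```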
Quality & Correctness
Answer Relevancy
Verify responses address the query to ensure AI outputs stay on topic and remain relevant.
Faithfulness
Detect hallucinations and verify facts to maintain accuracy and truthfulness in AI responses.
Answer Correctness
Evaluate factual accuracy by comparing answers against ground truth.
Answer Completeness
Measure how completely responses use the available context to ensure all relevant information is addressed.
Topic Adherence
Validate topic adherence to ensure responses stay focused on the specified subject matter.
Semantic Similarity
Validate semantic similarity between expected and actual responses to measure content alignment.
Instruction Adherence
Measure how well the LLM response follows given instructions to ensure compliance with specified requirements.
Measure Perplexity
Measure text perplexity from logprobs to assess the predictability and coherence of generated text (a sketch follows at the end of this group).
Uncertainty Detector
Generate responses and measure model uncertainty from logprobs to identify when the model is less confident in its outputs.
Conversation Quality
Evaluate conversation quality based on tone, clarity, flow, responsiveness, and transparency.
Context Relevance
Validate that the retrieved context is pertinent to the query.
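For the logprob- and embedding-based checks in this group, the underlying math is standard: perplexity is the exponential of the negative mean token log-probability, and semantic similarity is typically a cosine similarity between embedding vectors. A minimal sketch, assuming you already have token logprobs and embeddings from your model provider (the function names are illustrative):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the negative mean token log-probability; lower means more predictable text."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, used for semantic similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Token logprobs as returned by most providers are natural-log probabilities.
print(perplexity([-0.2, -0.5, -1.3, -0.1]))
print(cosine_similarity([0.1, 0.8, 0.3], [0.2, 0.7, 0.4]))
```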
Security & Compliance
PII Detection
Identify personal information exposure to protect user privacy and ensure data security compliance.
Profanity Detection
Flag inappropriate language use to maintain content quality standards and professional communication.
Sexism Detection
Detect sexist and discriminatory content.
Prompt Injection
Detect prompt injection attacks in user inputs.
Toxicity Detector
Detect toxic content including personal attacks, mockery, hate, and threats.
Secrets Detection
Monitor for credential and key leaks to prevent accidental exposure of sensitive information (see the sketch below).
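Checks like PII and secrets detection are often prototyped with pattern matching before moving to model-based detectors. A minimal sketch, assuming email/phone patterns stand in for PII and an AWS-style key plus a generic API-key pattern stand in for secrets; the patterns are illustrative, not the evaluators' actual rules:

```python
import re

# Illustrative patterns only; production detectors combine many more rules
# and typically add model-based classification.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(r"""(?i)api[_-]?key\s*[:=]\s*['"]?[\w-]{16,}"""),
}

def scan(text: str, patterns: dict[str, re.Pattern]) -> dict[str, list[str]]:
    """Return every pattern name that matched, with the matched substrings."""
    hits = {name: p.findall(text) for name, p in patterns.items()}
    return {name: found for name, found in hits.items() if found}

print(scan("Contact me at jane@example.com, api_key = 'sk_live_abcdefgh12345678'",
           {**PII_PATTERNS, **SECRET_PATTERNS}))
```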
Formatting
SQL Validation
Validate SQL queries to ensure proper syntax and structure in database-related AI outputs.
JSON Validation
Validate JSON responses to ensure proper formatting and structure in API-related outputs (a sketch follows below).
Regex Validation
Validate regex patterns to ensure correct regular expression syntax and functionality.
Placeholder Regex
Validate placeholder regex patterns to ensure proper template and variable replacement structures.
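The formatting checks largely reduce to "does this parse?". A minimal sketch for the JSON and regex cases using only the standard library (SQL validation would typically delegate to a dedicated SQL parser and is omitted here); the function names are illustrative:

```python
import json
import re

def is_valid_json(text: str) -> bool:
    """True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def is_valid_regex(pattern: str) -> bool:
    """True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(is_valid_json('{"status": "ok"}'))      # True
print(is_valid_json("{'status': 'ok'}"))      # False: single quotes are not valid JSON
print(is_valid_regex(r"\d{4}-\d{2}-\d{2}"))   # True
print(is_valid_regex("("))                    # False: unbalanced group
```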
Agents
Agent Goal Accuracy
Validate agent goal accuracy to ensure AI systems achieve their intended objectives effectively.
Agent Tool Error Detector
Detect errors or failures during tool execution to monitor agent tool performance.
Agent Flow Quality
Validate agent trajectories against user-defined natural language tests to assess agent decision-making paths.
Agent Efficiency
Evaluate agent efficiency by flagging redundant calls and suboptimal paths to optimize agent performance (see the sketch below).
Agent Goal Completeness
Measure whether the agent successfully accomplished all user goals to verify comprehensive goal achievement.
Intent Change
Detect whether the user’s primary intent or workflow changed significantly during a conversation.
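Several of the agent evaluators operate over a recorded trace of tool calls. A minimal sketch of the error and redundancy checks, assuming a trace is a list of dicts with hypothetical "tool", "args", and "error" fields; the real trace schema will differ:

```python
def analyze_trace(trace: list[dict]) -> dict:
    """Flag tool-call errors and exact-duplicate calls in an agent trace."""
    errors = [step for step in trace if step.get("error")]
    seen, redundant = set(), []
    for step in trace:
        key = (step["tool"], tuple(sorted(step.get("args", {}).items())))
        if key in seen:
            redundant.append(step)
        seen.add(key)
    return {
        "tool_error_count": len(errors),
        "redundant_call_count": len(redundant),
        "efficient": not errors and not redundant,
    }

trace = [
    {"tool": "search", "args": {"q": "refund policy"}},
    {"tool": "search", "args": {"q": "refund policy"}},              # duplicate call
    {"tool": "lookup_order", "args": {"id": "42"}, "error": "timeout"},
]
print(analyze_trace(trace))
```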


