Evaluation Overview
Aggregate metrics across all models and evaluation runs
Semantic Similarity
90.5%
+1.8% vs last week
Hallucination
4.9%
-0.9% vs last week
Tone Alignment
86.9%
+2.1% vs last week
Business Accuracy
89.5%
+1.4% vs last week
Safety Compliance
94.1%
+0.7% vs last week
Performance Trend
Recent Eval Runs
Customer Support Q&A Eval
completed | 2/11/2026 | 500 tests | 91.2% pass
Medical FAQ Hallucination Check
completed | 2/10/2026 | 200 tests | 96.5% pass
Legal Compliance Audit
completed | 2/10/2026 | 350 tests | 88.9% pass
Sales Tone Calibration
running | 2/9/2026 | 150 tests
Product Description Accuracy
failed | 2/9/2026 | 100 tests