API Agent Evaluation Dashboard
Compare LLM performance across evaluation datasets
Select Reports Folder
Run: --
Select Evaluation Run
Date | Models | Datasets | Test Cases
File Viewer
Leaderboard
Test Results
Dataset: All
LLM Model: All
Eval Method: All
Status: All
Score: All | Pass (≥ 0.8) | Partial (0.5 – 0.8) | Fail (< 0.5)
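The Score filter bands above can be sketched as a small classifier. This is only an illustration, not the dashboard's actual code: the function name score_band is hypothetical, and it assumes a score of exactly 0.8 counts as Pass and exactly 0.5 counts as Partial, which is how the labels "≥ 0.8" and "< 0.5" read.

```python
def score_band(score: float) -> str:
    """Map an evaluation score to a Score filter label.

    Assumed boundaries (inferred from the filter labels, not confirmed):
    Pass is score >= 0.8, Partial is 0.5 <= score < 0.8, Fail is score < 0.5.
    """
    if score >= 0.8:
        return "Pass"
    if score >= 0.5:
        return "Partial"
    return "Fail"


if __name__ == "__main__":
    # Print the band for a few sample scores, including both boundaries.
    for sample in (0.95, 0.8, 0.65, 0.5, 0.2):
        print(sample, score_band(sample))
```

Checking the upper threshold first means each score falls into exactly one band, so the Partial range never overlaps with Pass at 0.8.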
Test ID | Test Case Name | Dataset | LLM Model | Eval Method | Status | Score | Criteria Met