AgentLens

Agent Debugger | Prompt Debugger | Compliance

Evaluation-first monitoring — worst responses surface first

Total Calls (24 h)
Avg Quality (24 h)
Total Cost (24 h)
Bad Response Rate

Quality Trend (last 24 h)

Bad Response Categories

Cost ↔ Quality by Model

Root-Cause: Worst Prompts

Worst Responses

Quality | Hall. | Flags | Model | Output preview | Why