Macfoo
AI Agent Benchmark Dashboard
Testing the Finn AI Assistant
Provider Performance Trends
🎯 Category Scores
Detailed breakdown by test category
Detailed Analytics
⚡ Response Time Analysis
Latency metrics and optimization insights
📊 Success by Question Type
Performance breakdown by query category
🤖 Agent Comparison
Side-by-side comparison of the Finn browser agent and a direct LLM across the full benchmark suite.
📈 Progress Tracking
Monitor Finn's improvement over time with detailed metrics, trends, and category-specific insights.
🎯 Deep Analysis
Dive into individual test results, grading rubrics, and performance patterns to optimize agent behavior.