Macfoo

AI Agent Benchmark Dashboard

Testing the Finn AI Assistant

Provider Performance Trends

🎯

Category Scores

Detailed breakdown by test category

⚡

Response Time Analysis

Latency metrics and optimization insights

📊

Success by Question Type

Performance breakdown by query category

Side-by-side comparison of the Finn browser agent vs. direct LLM performance across the benchmark suite.

Monitor Finn's improvement over time with detailed metrics, trends, and category-specific insights.

Dive into individual test results, grading rubrics, and performance patterns to optimize agent behavior.