MacFoo

AI Agent Benchmark Dashboard

Testing the Finn AI Assistant

Provider Performance Trends

🎯 Category Scores
Detailed breakdown by test category

Detailed Analytics

Response Time Analysis
Latency metrics and optimization insights
📊 Success by Question Type
Performance breakdown by query category

🤖 Agent Comparison

Side-by-side comparison of the Finn browser agent and direct LLM performance across comprehensive benchmarks.

📈 Progress Tracking

Monitor Finn's improvement over time with detailed metrics, trends, and category-specific insights.

🎯 Deep Analysis

Dive into individual test results, grading rubrics, and performance patterns to optimize agent behavior.