Benchmark Hub

Featured Benchmarks

  • VibeCode Arena
    BenchFlow
    12
  • Pokemon Gym
    BenchFlow
    56
  • JFK Arena
    BenchFlow
    6
  • PaperBench
    OpenAI
    1
  • WebArena
    Carnegie Mellon University
    0
  • SWE-Bench
    Princeton NLP
    0
  • RareBench
    chenxz1111
    0
  • Bird-SQL
    AlibabaResearch
    0
  • MedQA-CS
    Bio-NLP
    0
  • WebCanvas
    iMeanAI
    0
  • MMLU-Pro
    TIGER-AI-Lab
    0

All Benchmarks

  • agent
  • code
  • commonsense
  • embedding
  • general
  • knowledge
  • language
  • long-context
  • multimodal
  • performance
  • reasoning
  • retrieval
  • safety
  • tool-calling
  • vision

16 benchmarks
  • kirk111/Webcanvas
    agent
    Updated 7 months ago
    0
  • lilaoba/Upload
    agent
    Updated 8 months ago
    0
  • Benchflow/medqa-cs
    agent
    Updated 9 months ago
    0
  • shireenchand/MLE-Bench
    agent
    Updated 9 months ago
    0
  • Benchflow/Bird
    agent
    Updated 9 months ago
    0
  • Benchflow/MMLU-PRO
    agent
    Updated 9 months ago
    0
  • Benchflow/Swebench
    agent
    Updated 9 months ago
    0
  • Benchflow/Rarebench
    agent
    Updated 9 months ago
    0
  • Benchflow/Webcanvas
    agent
    Updated 9 months ago
    0
  • Benchflow2BF-Webarena
    agent
    Updated 9 months ago
    0
  • Benchflow/BF-Webarena2
    agent
    Updated 9 months ago
    0
  • lilaoba/BF-Webarena
    agent
    Updated 9 months ago
    0
  • Benchflow/webarena
    agent
    Updated 9 months ago
    0
  • BenchFlow/MMLU-Pro
    agent
    Updated 10 months ago
    0
  • xdotli/swebench
    code
    Updated a year ago
    0
  • xdotli/webarena
    agent
    Updated a year ago
    0