Arena (arena.ai, also accessible at lmarena.ai) is an AI model evaluation platform that lets users compare AI models side by side through blind, anonymous testing. Users submit a prompt, two unknown models respond, and the user votes for the better answer — after which the models' identities are revealed. These votes feed into public leaderboards used by researchers, developers, and enterprises to understand which AI models perform best on real-world tasks. On January 28, 2026, the platform officially rebranded from LMArena to Arena.
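To make the mechanism concrete, here is a minimal, hypothetical sketch of one blind battle. Everything in it (`models`, `judge`, the function name) is an illustrative stand-in, not Arena's actual code or API:

```python
import random

def blind_battle(models: dict, prompt: str, judge) -> tuple[str, str]:
    """Run one anonymous pairwise comparison and return (winner, loser).

    models maps a model name to a callable returning its response;
    judge(prompt, resp_a, resp_b) returns "A" or "B". Every name here
    is an illustrative stand-in, not Arena's actual interface.
    """
    name_a, name_b = random.sample(sorted(models), 2)  # two models drawn at random, identities hidden
    resp_a, resp_b = models[name_a](prompt), models[name_b](prompt)
    vote = judge(prompt, resp_a, resp_b)               # the voter sees only "A" and "B"
    # Identities are revealed only after the vote is cast.
    return (name_a, name_b) if vote == "A" else (name_b, name_a)
```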
What Is Arena?
Arena originated as Chatbot Arena, an open research project launched in 2023 by UC Berkeley researchers. It became the most widely cited source of human-preference AI model rankings. The platform moved to its own domain lmarena.ai in September 2024, incorporated as an independent company in April 2025, and completed a $150 million Series A funding round in January 2026 at a $1.7 billion valuation. The commercial product, AI Evaluations, allows enterprises and AI labs to run structured evaluations through Arena's community. The platform is accessible via lmarena.ai and arena.ai.
Who Makes Arena?
Arena was founded by Anastasios Angelopoulos (CEO) and Wei-Lin Chiang (CTO), both UC Berkeley researchers, along with professor Ion Stoica — co-founder of Databricks. Investors include Andreessen Horowitz, Felicis Ventures, UC Investments, Lightspeed Venture Partners, and Kleiner Perkins. As of early 2026, Arena has over 5 million monthly active users across 150 countries and processes more than 60 million conversations per month.
Key Features
- Blind battle mode — Submit a prompt and compare two anonymous AI models simultaneously. Vote for the better response, then see which models you compared. Eliminates brand bias in AI evaluation
- Public leaderboards — Continuously updated Elo-based rankings across text, coding, vision, image generation, and other capability categories (see the rating sketch after this list). Covers models from OpenAI, Google DeepMind, Anthropic, Meta, and others
- Direct model testing — Choose specific models to test yourself, outside of the anonymous battle format
- Multi-modal arenas — Separate evaluation tracks for text, web development, vision, text-to-image, and video (video support added January 2026)
- AI Evaluations (commercial) — Enterprise service that uses Arena's community to run structured, large-scale evaluations for model labs and businesses
- Pre-release model testing — AI labs use Arena to test unreleased models anonymously before public launch. Notable examples include GPT-5 ("summit") and Gemini 2.5 Flash Image ("Nano Banana")
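How do individual votes become leaderboard positions? Arena's research lineage (the Chatbot Arena papers) describes both online Elo updates and Bradley–Terry model fitting, and the production pipeline has evolved over time. The snippet below is only a minimal sketch of the classic Elo update applied to a single vote; the function name and the K-factor of 32 are illustrative choices, not Arena's actual parameters:

```python
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Classic Elo update after a single head-to-head vote.

    score_a is 1.0 if model A won the vote, 0.0 if it lost, 0.5 for a tie.
    The K-factor of 32 is an illustrative choice, not Arena's setting.
    """
    # Expected win probability of A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: a 1250-rated model beats a 1300-rated one and gains about 18 points.
new_a, new_b = elo_update(1250.0, 1300.0, score_a=1.0)
```

Because one-at-a-time Elo updates are sensitive to vote order, the Chatbot Arena papers ultimately favor fitting a Bradley–Terry model over the full vote history, which also yields confidence intervals around each model's rating.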
Pricing
Source: lmarena.ai and arena.ai, verified March 2026.
- Free — Full access to all public arenas, leaderboards, and direct model testing. No account required for basic use
- AI Evaluations (enterprise) — Commercial service for structured model evaluations. Custom pricing based on scope. Contact Arena directly via arena.ai for enterprise access
Arena vs Competitors
Arena is the only major public platform that produces human-preference AI rankings through crowdsourced blind testing at scale. Alternatives like MMLU, HumanEval, and HELM are automated benchmarks that test specific capabilities but do not capture real-world user preference. Scale AI and Surge AI offer private human evaluation services but at much higher cost and without public leaderboards. For developers and teams that want a free, fast signal of which AI model performs best on real prompts, Arena has no direct equivalent.
Pros & Cons
Pros:
- Free access to the most widely cited AI model rankings
- Blind testing methodology removes brand bias from comparisons
- Covers text, coding, vision, image, and video evaluation categories
- Used by major AI labs for pre-release model testing and feedback
- 5M+ monthly users generate vote volumes large enough for statistically robust model comparisons
Cons:
- Crowdsourced voting can be gamed — model providers have historically submitted tuned variants specifically for Arena performance
- Rankings reflect average user preference, which may not match specialized use cases (legal, medical, coding)
- Does not evaluate cost, latency, API reliability, or safety — only perceived quality
- No enterprise SLA or guaranteed model availability on the free platform
Who Should Use Arena?
Arena is useful for developers evaluating which AI model to use for a new project, researchers tracking the state of the art in language model capabilities, and product teams that want a fast, unbiased starting point before running internal evaluations. It is also valuable for anyone who wants to stay current on the AI model landscape without manually testing every new release. For production decisions, Arena rankings should be treated as a starting filter, not a final answer — always validate on your own data and use cases.
Bottom Line
Arena is the best free resource for understanding how AI models compare on real-world human preference. The blind battle format and continuously updated public leaderboards make it the most practical and unbiased starting point for AI model selection. Use it to narrow your shortlist, then run internal tests on your specific use case before committing.