Anthropic Sweeps Top Three on LinkedIn’s New AI Leaderboard as Platform Opens to All U.S. Members

LinkedIn Labs is opening its AI model comparison platform, Crosscheck, to all U.S. members this week after a rollout that had been limited to Premium subscribers. Three AI models from Anthropic hold the top three positions on the platform’s live leaderboard, according to data captured from Crosscheck on May 21, 2026. The company plans to extend access to its more than 1.3 billion members worldwide in the coming months.

Claude Opus 4.6 leads the overall rankings with a score of 1,176.6, followed by Claude Sonnet 4.6 at 1,151.1 and Claude Opus 4.7 at 1,149.8. The leaderboard, which covers 23 AI models evaluated across all job titles and industries, shows the full top ten as follows:

AI Models on the Rise

LinkedIn Crosscheck Leaderboard · May 21, 2026

Rank	AI Model	Company	Score
1	Claude Opus 4.6	Anthropic	1,176.6
2	Claude Sonnet 4.6	Anthropic	1,151.1
3	Claude Opus 4.7	Anthropic	1,149.8
4	GLM 5	Z.ai	1,093.7
5	Gemini 3.5 Flash	Google	1,090.3
6	GPT-5.5	OpenAI	1,056.0
7	GLM 5.1	Z.ai	1,051.1
8	Nova 2 Lite	Amazon	1,049.1
9	GPT-5.4	OpenAI	1,034.7
10	Grok 4.3	xAI	1,032.5

Note: Scores reflect LinkedIn Crosscheck leaderboard data cited in the article. Rankings may shift as new model comparisons are recorded.

Scores shift continuously as new evaluations are recorded. At the time the data was captured, several models in the top ten had moved by between one and 20 points. Rankings reflect the platform’s state as of May 21, 2026.

The platform is designed to help workers find which AI model performs best for their specific job, not just which one scores well on general tests. A software engineer, a marketing manager, and a compliance officer may get very different results from the same AI model — and Crosscheck is built around that premise.

Users submit a work prompt, receive responses from two AI models at once, and pick the better one. LinkedIn calls each comparison a “battle.” The results feed into a leaderboard that currently filters by job role and industry. Filters for seniority, location, language, and prompt complexity are planned but not yet available, according to LinkedIn.

The platform is built on the Bradley-Terry statistical model, a method used in AI research to turn head-to-head comparisons into ranked scores. LinkedIn extended that base with three additions to handle the specific demands of professional evaluation.
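
For readers unfamiliar with the method, the sketch below fits a plain Bradley-Terry model to a list of battles using the standard minorization-maximization update. It is a textbook illustration only: the function names and the (winner, loser) input format are assumptions made here, and it includes none of LinkedIn's three additions.

```python
from collections import defaultdict


def fit_bradley_terry(battles, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Each pass applies the classic update
        p_i <- W_i / sum_j( n_ij / (p_i + p_j) )
    where W_i is model i's win count and n_ij is the number of
    battles between models i and j.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    models = set()
    for winner, loser in battles:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        # Rescale so the average strength stays at 1.0; only ratios matter.
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Modeled probability that model a beats model b."""
    return strength[a] / (strength[a] + strength[b])
```

With three wins for one model and one for the other, `win_probability` returns 0.75, matching the observed record. The value of the model shows up with more than two entrants, where it can rank models that never faced each other through their shared opponents.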

The first addresses model updates. Because AI models are frequently revised, LinkedIn applies time-decay weighting to keep rankings current. A comparison from 90 days ago carries half the weight of one completed today. One from six months ago counts for a quarter. Older data stays in the system but loses influence gradually, rather than being removed and breaking the statistical chains that connect models that have never competed directly.
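
The two figures quoted above are consistent with an exponential decay that has a 90-day half-life, if six months is read as roughly 180 days. The sketch below assumes that form; the article does not state the exact formula LinkedIn uses, and the function names and record layout are hypothetical.

```python
from datetime import date

HALF_LIFE_DAYS = 90.0  # from the article: a 90-day-old comparison counts half


def decay_weight(age_days, half_life_days=HALF_LIFE_DAYS):
    """Weight of a comparison that is age_days old, relative to one made today."""
    return 0.5 ** (age_days / half_life_days)


def weighted_wins(battles, today):
    """Sum decayed win credit per model from (winner, loser, battle_date) records."""
    totals = {}
    for winner, _loser, battle_date in battles:
        age = (today - battle_date).days
        totals[winner] = totals.get(winner, 0.0) + decay_weight(age)
    return totals
```

Under this assumption `decay_weight(90)` is 0.5 and `decay_weight(180)` is 0.25. The weight never reaches zero, which matches the description of older data losing influence without being removed.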

The second addresses thin data. When a professional segment has few recorded comparisons — say, legal officers in the technology industry — a model with a perfect early record could appear dominant based on a handful of results. LinkedIn applies a regularization adjustment that starts from a conservative assumption: treat any model as roughly average until the data says otherwise. A model needs a sustained record across many comparisons before it earns a top ranking in a niche segment.
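
The article does not say which regularizer LinkedIn uses. One common way to get the behavior described is to add a fixed number of imaginary comparisons, split evenly between wins and losses, to every model's record. The sketch below shows that idea on a simple win rate; the function name and the `prior_strength` value of 20 are assumptions chosen for illustration.

```python
def shrunk_win_rate(wins, battles, prior_strength=20.0):
    """Win rate pulled toward 0.5 by prior_strength imaginary battles.

    The imaginary battles are split evenly between wins and losses, so a
    model with no data sits at exactly 0.5 ("roughly average").
    """
    return (wins + 0.5 * prior_strength) / (battles + prior_strength)
```

With these settings a perfect 3-0 record yields about 0.565, while a perfect 300-0 record yields about 0.969. The early streak barely moves the estimate, and only a sustained record pulls it far from average.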

The third addresses how rankings are reported. Instead of precise numeric positions, Crosscheck groups models into tiers. A new tier begins only when the gap between two models clears a 95% confidence threshold. When data is limited and scores are close, models share a tier rather than being separated by a ranking the evidence cannot support.
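
A rough sketch of the tiering idea follows. It makes several assumptions the article does not confirm: each model has a score and a standard error, estimates are treated as independent, only adjacent models in the ranking are compared, and the 95% threshold is a two-sided normal test. The model names and numbers in the usage note are placeholders, not Crosscheck data.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def assign_tiers(estimates):
    """Group models into tiers from (name, score, standard_error) tuples.

    Models are sorted by score. A new tier starts only when the gap to the
    previous model exceeds Z_95 times the standard error of the difference.
    """
    ranked = sorted(estimates, key=lambda e: e[1], reverse=True)
    tiers = []
    for idx, (name, score, se) in enumerate(ranked):
        if idx == 0:
            tiers.append([name])
            continue
        _, prev_score, prev_se = ranked[idx - 1]
        gap = prev_score - score
        margin = Z_95 * math.hypot(prev_se, se)
        if gap > margin:
            tiers.append([name])
        else:
            tiers[-1].append(name)
    return tiers
```

For example, `assign_tiers([("model_a", 1200.0, 10.0), ("model_b", 1190.0, 10.0), ("model_c", 1100.0, 10.0)])` places the first two models in one tier, because their 10-point gap is inside the margin of about 27.7 points, and puts the third in a tier of its own.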

To determine which matchups need more data, the platform uses an active sampling system that steers evaluations toward uncertain or newly added pairings. LinkedIn cited prior research indicating this approach can reach reliable rankings with up to 35% fewer total comparisons.
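
LinkedIn has not published its sampling rule, so the following is only an illustrative heuristic. It scores each pairing by how close the predicted outcome is to even, divided by how many times the pair has already met, and picks the highest-scoring pair. The function name, inputs, and scoring formula are all assumptions made for this sketch.

```python
from itertools import combinations


def next_pair(strength, pair_counts):
    """Pick the pairing whose next battle is expected to be most informative.

    strength: dict of model name -> positive Bradley-Terry strength.
    pair_counts: dict of frozenset({a, b}) -> battles already recorded.
    """
    def priority(pair):
        a, b = pair
        p = strength[a] / (strength[a] + strength[b])
        n = pair_counts.get(frozenset(pair), 0)
        # p * (1 - p) peaks for even matchups; dividing by (n + 1)
        # favors pairings that have rarely or never been compared.
        return p * (1.0 - p) / (n + 1)

    return max(combinations(sorted(strength), 2), key=priority)
```

A newly added model has no recorded battles, so under this heuristic its pairings rise to the front of the queue until enough results accumulate.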

Trust in the results depends partly on who is doing the evaluating. LinkedIn said it uses its professional identity verification and content moderation systems to reduce the risk of manipulation. Evaluators are identified members with career and education histories on file, and all prompts pass through the same abuse-detection systems used across the platform.

LinkedIn was direct about two risks it has not solved. AI providers could test multiple model versions privately and submit only the best performer, pushing up scores in ways that more data would not correct. Providers could also use access to the platform’s evaluation data to tune their models specifically for Crosscheck, producing gains that might not hold in real work. LinkedIn said it is developing submission policies to address the first issue and that its segmented design makes the second harder to pull off, though not impossible.

For now, the leaderboard supports filtering by role and industry. LinkedIn said remaining dimensions are built and ready but will only go live once enough evaluation data exists in each segment to produce reliable results.

VT Newsroom
