The Version Shell Game

Anthropic

March 4, 2024

Felony

89.3K views

When Anthropic announced Claude 3 Opus in March 2024, their benchmark charts showed it decisively beating GPT-4. There was just one problem: they compared against the original GPT-4 from March 2023, not the GPT-4 Turbo variant that had been publicly available since November 2023.

Evidence: The Version Shell Game — EXHIBIT A • Click to enlarge

Chart source: Anthropic Claude 3 Model Card

The Violation

1
Compared Claude 3 Opus (March 2024) against original GPT-4 (March 2023), ignoring GPT-4 Turbo
2
GPT-4 Turbo had been available for 4 months at time of comparison
3
Charts labeled 'GPT-4' without specifying which version, implying most current
4
Omission made competitive gap appear larger than reality

Why This Matters

In fast-moving AI development, comparing against year-old baselines while ignoring current versions creates a false narrative of competitive advantage. When readers see 'GPT-4' they assume the current version, not a model from a year prior that had already been superseded.

Community Verdict

"Why is Anthropic comparing Claude 3 to GPT-4 and not GPT-4 Turbo? Turbo has been out since November. This seems deliberately misleading."
— OpenAI Community Forum discussion

"The model card doesn't specify GPT-4 vs GPT-4 Turbo. Given the timing, it's clearly the older version, which changes the story significantly."
— ML researcher on X/Twitter

The Defense

Imagined company response

"Anthropic's technical documentation did eventually clarify model versions in footnotes, but the main promotional charts and headlines emphasized 'beating GPT-4' without version specificity."

Related Crimes

OpenAI

Felony

The Mega Chart Screwup

xAI

Felony

The Missing cons@64

The Benchmark Bait and Switch

Google

Felony

The Arena Advantage

Spot a similar crime?

Help us document chart crimes in the wild