CASE #002

The Version Shell Game

Anthropic
March 4, 2024
Felony

When Anthropic announced Claude 3 Opus in March 2024, its benchmark charts showed the model decisively beating GPT-4. There was just one problem: the comparison used the original GPT-4 from March 2023, not the GPT-4 Turbo variant that had been publicly available since November 2023.

EXHIBIT A: The Version Shell Game

Chart source: Anthropic Claude 3 Model Card

The Violation

  1. Compared Claude 3 Opus (March 2024) against the original GPT-4 (March 2023), ignoring GPT-4 Turbo.

  2. GPT-4 Turbo had been available for about four months at the time of the comparison.

  3. Charts were labeled 'GPT-4' without specifying which version, implying the most current one.

  4. The omission made the competitive gap appear larger than it was.
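The timing argument in the list above can be checked with a few lines of date arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the day-of-month values for the two OpenAI releases are assumptions, since this page gives only the months.

```python
from datetime import date

# Release dates. The page gives only months for the OpenAI models,
# so the specific days below are assumptions made for illustration.
RELEASES = {
    "GPT-4": date(2023, 3, 14),
    "GPT-4 Turbo": date(2023, 11, 6),
}

# Date of the Claude 3 announcement, taken from the case header.
COMPARISON_DATE = date(2024, 3, 4)


def days_since_release(model: str, as_of: date) -> int:
    """Number of days the model had been out on the given date."""
    return (as_of - RELEASES[model]).days


def newer_versions_available(baseline: str, as_of: date) -> list[str]:
    """Versions released after the baseline but on or before as_of."""
    baseline_date = RELEASES[baseline]
    return [
        name
        for name, released in RELEASES.items()
        if baseline_date < released <= as_of
    ]


if __name__ == "__main__":
    stale = newer_versions_available("GPT-4", COMPARISON_DATE)
    for name in stale:
        days = days_since_release(name, COMPARISON_DATE)
        print(f"Baseline 'GPT-4' was superseded by {name}, out for {days} days")
```

Under these assumed dates, the check flags GPT-4 Turbo as a newer version that was already available when the comparison was published, roughly four months after its release.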

Why This Matters

In fast-moving AI development, comparing against year-old baselines while ignoring current versions creates a false narrative of competitive advantage. When readers see 'GPT-4' they assume the current version, not a model from a year prior that had already been superseded.

Community Verdict

"Why is Anthropic comparing Claude 3 to GPT-4 and not GPT-4 Turbo? Turbo has been out since November. This seems deliberately misleading."

OpenAI Community Forum discussion

"The model card doesn't specify GPT-4 vs GPT-4 Turbo. Given the timing, it's clearly the older version, which changes the story significantly."

ML researcher on X/Twitter

The Defense

Imagined company response

"Anthropic's technical documentation did eventually clarify model versions in footnotes, but the main promotional charts and headlines emphasized 'beating GPT-4' without version specificity."
