Anthropic Benchmarked Its Latest AI Model Using Pokémon

Sabnam Hossain

9 months ago


In a blog post, Anthropic said it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.
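The setup described above (pixels in, button presses out, plus a small memory) can be pictured as a simple loop. The Python below is only an illustrative sketch, not Anthropic's actual harness: `ToyEmulator`, `Agent`, `choose_button`, and `walk_right` are hypothetical names, the "game" is a 3x3 grid rather than Pokémon Red, and the model call is replaced by a placeholder function.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List

BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")
Pixels = List[List[int]]


@dataclass
class ToyEmulator:
    """Hypothetical stand-in for a Game Boy emulator: a player on a small grid."""
    size: int = 3
    x: int = 0
    y: int = 0

    def screen_pixels(self) -> Pixels:
        # Render the "screen": 1 where the player stands, 0 elsewhere.
        return [[int((col, row) == (self.x, self.y)) for col in range(self.size)]
                for row in range(self.size)]

    def press(self, button: str) -> None:
        # Only the direction buttons do anything in this toy world.
        dx, dy = {"left": (-1, 0), "right": (1, 0),
                  "up": (0, -1), "down": (0, 1)}.get(button, (0, 0))
        self.x = min(max(self.x + dx, 0), self.size - 1)
        self.y = min(max(self.y + dy, 0), self.size - 1)


@dataclass
class Agent:
    # choose_button stands in for the model call: (pixels, memory) -> button name.
    choose_button: Callable[[Pixels, List[str]], str]
    # Basic memory (assumed here): only the last three actions are kept.
    memory: Deque[str] = field(default_factory=lambda: deque(maxlen=3))

    def step(self, emulator: ToyEmulator) -> str:
        pixels = emulator.screen_pixels()
        button = self.choose_button(pixels, list(self.memory))
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        emulator.press(button)
        self.memory.append(f"pressed {button}")
        return button


def walk_right(pixels: Pixels, memory: List[str]) -> str:
    # Trivial placeholder policy; a real setup would query the model here.
    return "right"


emulator = ToyEmulator()
agent = Agent(choose_button=walk_right)
for _ in range(4):
    agent.step(emulator)
print(emulator.x, list(agent.memory))
```

Each call to `step` counts as one action in this sketch. A real run would swap `walk_right` for a call to the model and `ToyEmulator` for an actual emulator.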

One of Claude 3.7 Sonnet’s distinctive features is its capacity for “extended thinking.” Like OpenAI’s o3-mini and DeepSeek’s R1, Claude 3.7 Sonnet can “reason” through difficult problems by applying more computing power and taking more time.

That came in handy in Pokémon Red.

Claude 3.0 Sonnet, an earlier version of the model, could not even leave the house in Pallet Town, where the story begins. Claude 3.7 Sonnet, by contrast, defeated three Pokémon gym leaders and won their badges.

It’s unclear how much computing power and how much time Claude 3.7 Sonnet needed to reach those milestones. According to Anthropic, the model performed 35,000 actions to reach Surge, the last of those gym leaders.

It surely won’t be long before an enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything else. Nonetheless, games have long been used as benchmarks for artificial intelligence. In the last few months alone, several new platforms and applications have emerged to assess models’ game-playing skills in titles such as Pictionary and Street Fighter.
