Last month, I wrote about Mercor’s new benchmark measuring AI agents’ capabilities on professional tasks like law and corporate analysis. At the time, the scores were pretty dismal, with every major lab scoring below 25%, so we concluded lawyers were safe from AI displacement, at least for now.
But AI capabilities can change a lot in a few weeks.
This week’s release of Opus 4.6 shook up the leaderboards, with Anthropic’s new model scoring just shy of 30% in one-shot trials and averaging 45% when given a few more cracks at the problem. Notably, the release included a bunch of new agentic features, including “agent swarms,” which may have helped with this kind of multi-step problem-solving.
Regardless, the score is a giant leap from the previous state of the art, and a sign that progress on foundation models isn’t slowing down. Mercor CEO Brendan Foody, who was particularly impressed, said, “leaping from 18.4% to 29.8% in a couple of months is insane.”
Thirty percent is still a long way from 100%, so it’s not like lawyers need to worry about getting replaced by machines next week. But they should be a lot less confident than they were last month!