On Thursday, OpenAI released GPT-5.4, a new foundation model billed as “our most capable and efficient frontier model for professional work.” In addition to the standard version, GPT-5.4 will also be available as a reasoning model (GPT-5.4 Thinking) or optimized for high performance (GPT-5.4 Pro).
The API version of the model will be available with context windows as large as 1 million tokens, by far the largest context window available from OpenAI.
OpenAI also emphasized improved token efficiency, saying GPT-5.4 was able to solve the same problems with significantly fewer tokens than its predecessor.
The new model comes with significantly improved benchmark results, including record scores on the computer-use benchmarks OSWorld-Verified and WebArena Verified. It also scored a record 83% on OpenAI’s GDPval test for knowledge work tasks.
GPT-5.4 also took the lead on Mercor’s APEX-Agents benchmark, designed to test professional skills in law and finance, according to a statement from Mercor CEO Brendan Foody.
“[GPT-5.4] excels at creating long-horizon deliverables such as slide decks, financial models, and legal analysis,” Foody said in the statement, “delivering top performance while running faster and at a lower cost than competitive frontier models.”
GPT-5.4 continues the company’s efforts to limit hallucinations and factual errors. OpenAI said the new model was 33% less likely to make errors in individual claims compared with GPT-5.2, and its overall responses were 18% less likely to contain errors.
As part of the launch, OpenAI has reworked how the API version of GPT-5.4 handles tool calling, introducing a new system called Tool Search. Previously, system prompts would lay out definitions for all available tools when calling the model, a process that could consume a lot of tokens as the number of available tools grew. The new system lets models look up tool definitions as needed, resulting in faster and cheaper requests in systems with many available tools.
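OpenAI's announcement does not describe how Tool Search works internally, so the following is only a hypothetical sketch of the general idea: instead of sending every tool definition with each request, the prompt carries one search tool, and only the definitions that search returns get added. Every name here (`Tool`, `search_tools`, `tool_search`, `eager_prompt`, `lazy_prompt`) is invented for illustration and is not OpenAI's API, and the 4-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```python
import json
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str
    parameters: dict


def approx_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token (not a real tokenizer).
    return max(1, len(text) // 4)


def definition(tool: Tool) -> str:
    # Serialize one tool definition the way it might appear in a prompt.
    return json.dumps(
        {"name": tool.name, "description": tool.description, "parameters": tool.parameters}
    )


def eager_prompt(tools: list[Tool]) -> str:
    # Old approach: every tool definition is placed in the prompt up front.
    return "\n".join(definition(t) for t in tools)


def search_tools(tools: list[Tool], query: str, limit: int = 3) -> list[Tool]:
    # Hypothetical keyword search: score each tool by how many query words
    # appear in its name or description, keep only tools that match.
    words = query.lower().split()
    hits = []
    for t in tools:
        text = (t.name + " " + t.description).lower()
        score = sum(w in text for w in words)
        if score > 0:
            hits.append((score, t))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in hits[:limit]]


# Hypothetical search tool that stands in for the full tool list.
SEARCH_TOOL = Tool("tool_search", "Look up tool definitions by keyword", {"query": "string"})


def lazy_prompt(tools: list[Tool], query: str) -> str:
    # Sketch of the on-demand approach: the search tool plus only the
    # definitions it returns for this query.
    found = search_tools(tools, query)
    return "\n".join(definition(t) for t in [SEARCH_TOOL, *found])


if __name__ == "__main__":
    tools = [Tool(f"tool_{i}", f"Performs task number {i}", {"arg": "string"}) for i in range(200)]
    tools.append(Tool("get_weather", "Return the weather forecast for a city", {"city": "string"}))
    query = "weather forecast"
    print("eager:", approx_tokens(eager_prompt(tools)), "tokens")
    print("lazy:", approx_tokens(lazy_prompt(tools, query)), "tokens")
```

With 201 registered tools, the eager prompt carries all 201 definitions, while the lazy prompt carries two (the search tool and `get_weather`). That gap is the kind of saving the announcement points to for systems with many tools, though a production system would use a much better search than keyword matching.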
OpenAI has also included a new safety evaluation to test its models’ chain-of-thought, the running commentary the models give to show their thought process through multi-step tasks. AI safety researchers have long worried that reasoning models could misrepresent their chain-of-thought, and testing shows it can happen under the right circumstances.
OpenAI’s new research shows that deception is less likely to happen in the Thinking version of GPT-5.4, “suggesting that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool.”