AI is already having a seismic impact on how software is written, with much of the grunt work of programming now carried out by swarms of agents and subagents. But as developers experiment with new interfaces and form factors for human-AI collaboration, it’s become hard for even the most advanced AI labs to keep up.
The current trend is toward agentic software development (systems where AI agents can work independently on coding tasks), epitomized by the Claude Code and Cowork apps. In the meantime, OpenAI has been gradually building out its Codex software, which launched as a command line tool last April and expanded to a web interface one month later.
Now OpenAI is taking a major step toward catching up. On Monday, the company released a new macOS app for Codex, integrating many of the agentic practices that have become popular in the past year. The new app is designed to work with multiple agents in parallel, integrating agent skills and other state-of-the-art workflows. The launch also comes less than two months after the release of GPT-5.2-Codex, OpenAI’s most powerful coding model, which the company hopes will be enough to tempt over Claude Code users.
“If you really want to do sophisticated work on something complex, 5.2 is the strongest model by far,” CEO Sam Altman told reporters on a press call. “However, it’s been harder to use, so taking that level of model capability and putting it in a more flexible interface, we think is going to matter quite a bit.”
While Altman’s confidence in GPT-5.2 is understandable, coding benchmarks tell a more complicated story. GPT-5.2 does hold the top spot on TerminalBench (a test measuring how well AI handles command-line programming tasks), at least as of press time. But agents from Gemini 3 and Claude Opus have logged roughly equal scores: lower, but within the benchmark’s margin of error. Results from SWE-bench, another coding benchmark that tests AI’s ability to fix real-world software bugs, are similar, showing no clear advantage for GPT-5.2. However, agentic use cases have been difficult to benchmark effectively, and state-of-the-art models can vary significantly in user experience.
The Codex app also comes with a range of new features that OpenAI says will help it achieve parity with or, in some cases, outpace the various Claude apps. The Codex app will allow for automations that can be set to run in the background on an automatic schedule, with results placed in a queue to be reviewed when the user returns. Users can also choose different personalities for the agent, from pragmatic to empathetic, depending on their working style.
But for the company, the biggest selling point is the sheer speed of development that’s made possible by AI. “You can use this from a clean sheet of paper, brand new, to make a really quite sophisticated piece of software in a few hours,” Altman said. “As fast as I can type in new ideas, that’s the limit of what can get built.”


















