On December 11, less than a month after the release of GPT-5.1, OpenAI unveiled GPT-5.2, the company's new series of flagship AI models: Instant, Thinking, and Pro. OpenAI positions GPT-5.2 as a model designed specifically to deliver economic value in real-world workflows. GPT-5.2 excels at creating spreadsheets and presentations, writing code, perceiving images, understanding long contexts, using tools, and managing complex multi-step tasks. Instant is faster for writing and searching for information, Thinking is better suited for structured work such as programming, and Pro is intended for the most complex tasks.
Earlier this month, OpenAI CEO Sam Altman declared a "code red" in an internal memo, directing internal resources toward accelerating improvements to ChatGPT. The reason is simple: OpenAI was once considered the undisputed leader in AI development, but it now faces intensified competition from Google and Anthropic, both of which have recently introduced powerful models of their own. Google's Gemini 3 has received rave reviews for its reasoning and coding abilities. The launch of GPT-5.2 is part of OpenAI's strategy to attract more business customers and increase revenue.
A key feature of the release is the model's performance on GDPval, a new benchmark designed to assess AI's ability to perform real-world work across 44 occupations. Models create presentations, fill out Excel spreadsheets, write Word documents, and even render videos - tasks that may seem simple, but it is precisely these tasks that mark the first steps toward widespread AI adoption.
According to OpenAI, GPT-5.2 Thinking is the first model to perform at or above the level of a human expert. In blind comparisons judged by industry professionals (evaluators did not know which answers came from the AI and which from human experts), GPT-5.2 Thinking outperformed or matched leading experts on 70.9% of tasks. Tasks included creating complex spreadsheets, developing presentations, and preparing technical documentation. OpenAI notes that the model performs these tasks more than 11 times faster and at less than 1% of the cost of human experts.
For software engineers, OpenAI says that GPT-5.2 Thinking has set a new bar for quality.
On SWE-Bench Pro, one of the most rigorous real-world tests of software engineering skill, spanning multiple programming languages and complex codebases, GPT-5.2 Thinking solves 55.6% of tasks, compared to 50.8% for GPT-5.1. On the simpler SWE-Bench Verified, it solves 80%. This improvement points to less need for manual patch refinement and greater stability when working with large repositories. In essence, the model is turning from a support assistant into a tool capable of fixing bugs and implementing features almost autonomously.
Early testers highlight progress in front-end development. The system can generate complex interfaces, including non-trivial 3D elements and UI components, from a single prompt. GPT-5.2 is increasingly viewed as a comprehensive solution for full-stack tasks rather than a simple code generator.
The model also has improved visual capabilities. OpenAI claims it now interprets visual data more accurately, including technical diagrams, dashboards, graphs, and user interfaces. Errors in recognizing and analyzing GUIs have been cut almost in half, and tasks that require reading diagrams and explaining processes are performed with greater accuracy. In addition, OpenAI claims that the new Thinking model hallucinates 30% less than its predecessor and shows better "long-horizon reasoning" indicators.
GPT-5.2 Thinking performs significantly better with contexts of up to 256k tokens. On the MRCRv2 benchmark, which evaluates a model's ability to integrate information across long documents, GPT-5.2 Thinking sets the leading score, extracting the required information almost perfectly even when it is buried among hundreds of thousands of tokens. In practice, this means you can load long contracts, multi-file projects, large reports, or correspondence, and the model does not lose track; responses remain consistent. In addition, a new compact mode has been introduced that lets the system function and "think" beyond the standard window, which is critical for long-running agent scenarios.
A significant improvement in predictability is observed when using external tools. On the Tau2-bench Telecom benchmark, GPT-5.2 achieves a 98.7% tool-call success rate, and even in fast mode (reasoning.effort='none') accuracy increases dramatically. Several corporate clients have already reported opportunities to simplify their architectures, replacing a set of highly specialized small agents with a single "mega-agent" managing more than 20 tools, as sketched below.
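Purely as an illustration, here is a minimal sketch of what such a single tool-driven agent call might look like, assuming the Responses API shape of the current official OpenAI Python SDK carries over to the gpt-5.2 identifier named later in this article; the tool name, its parameters, and the prompt are hypothetical.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical tool schema; the name and arguments are illustrative only.
tools = [
    {
        "type": "function",
        "name": "lookup_customer_plan",
        "description": "Return the current telecom plan for a customer ID.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    # ...additional tool definitions, up to the ~20 tools mentioned above
]

response = client.responses.create(
    model="gpt-5.2",               # identifier as stated in this article
    reasoning={"effort": "none"},  # the "fast mode" referenced above
    tools=tools,
    input="Customer C-1042 reports dropped calls. Diagnose and propose a fix.",
)

# A real agent loop would inspect response.output for function calls and
# execute them; output_text only shows the model's final text, if any.
print(response.output_text)
```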
And finally, the ARC Prize organization has already published results on its ARC-AGI benchmarks, which are considered challenging tests of abstract thinking and are often referred to as the "AGI exam." ARC-AGI tests an AI's ability to transfer acquired skills to similar tasks. First, the model analyzes two visual puzzles, each presented as a starting condition and its correctly solved version. Its task is to deduce the rule used to solve these puzzles and then apply it to a third one. People solve such puzzles relatively easily, whereas AI struggled with them until recently.
The model ranked first on both ARC-AGI-1 and ARC-AGI-2. A year ago, the ARC Prize tested o3 (High), which scored 88%, but at a cost of about $4,500 per task. On ARC-AGI-1, GPT-5.2 now scores 90.5% at a cost of $11.64 per task.
Pricing and API Access
The GPT-5.2 model is already integrated into the ChatGPT interface and has been available to users with Plus, Pro, Business, and Enterprise subscriptions since the launch date. Developers can access it through the API under the identifiers gpt-5.2 and gpt-5.2-chat-latest. OpenAI is not disabling the older GPT-5.1 models for the time being; they will remain available for another three months.
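For orientation, a minimal call against the second of these identifiers might look like the following sketch, again assuming the current official OpenAI Python SDK; the prompt is purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Minimal sketch using the gpt-5.2-chat-latest identifier named above.
response = client.responses.create(
    model="gpt-5.2-chat-latest",
    input="Summarize the key risks in this vendor contract in five bullets.",
)

print(response.output_text)
```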
The API pricing has been revised upwards compared to version 5.1:
Input tokens: $1.75 per 1 million tokens.
Output tokens: $14 per 1 million tokens.
Cached input tokens: a 90% discount is provided.
OpenAI emphasizes that despite the increased pricing, the final business costs may be lower. These savings are achieved by reducing the number of "unnecessary" tokens and iterations: GPT-5.2 performs the same amount of work faster and requires fewer refinements.
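To make the pricing concrete, here is a rough back-of-the-envelope cost calculation based on the figures listed above; the token counts in the example are invented for illustration.

```python
# Rough cost estimate for a single GPT-5.2 request at the listed API prices.
PRICE_INPUT = 1.75 / 1_000_000     # USD per input token
PRICE_OUTPUT = 14.00 / 1_000_000   # USD per output token
PRICE_CACHED = PRICE_INPUT * 0.10  # cached input tokens get a 90% discount

def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one API request."""
    fresh_input = input_tokens - cached_tokens
    return (
        fresh_input * PRICE_INPUT
        + cached_tokens * PRICE_CACHED
        + output_tokens * PRICE_OUTPUT
    )

# Example: a 40k-token prompt, of which 30k is a cached context prefix,
# producing a 2k-token answer.
print(f"${request_cost(40_000, 30_000, 2_000):.4f}")  # ≈ $0.0508
```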
Bottom Line
In the face of rising competition, GPT-5.2 shows that OpenAI is doubling down on performance, predictability, and enterprise value. While pricing has increased, the efficiency gains (fewer iterations, fewer mistakes, and stronger autonomy) suggest that total operational costs may ultimately decrease. As enterprise adoption accelerates and benchmarks continue to evolve, GPT-5.2 sets a new performance baseline that raises expectations for the next generation of AI systems.
FAQ
When was GPT-5.2 launched?
GPT-5.2 was officially released on December 11, 2025.
What versions of GPT-5.2 are available?
GPT-5.2 comes in three main versions. Instant is optimized for fast writing and information retrieval. Thinking is designed for structured work like coding and complex tasks. Pro is the most powerful, for advanced, multi-step workflows and professional use.
What are the main differences between GPT-5.1 and GPT-5.2?
GPT-5.2 shows major improvements in reasoning, long-context handling, tool use, visual interpretation, and real-world workflow automation. It performs tasks faster, with fewer hallucinations, and requires fewer refinements.
Who can access GPT-5.2?
All versions are available to ChatGPT Plus, Pro, Business, and Enterprise users, as well as to developers via the API, from launch day.
What’s new for developers in GPT-5.2?
GPT-5.2 Thinking significantly improves bug fixing, code generation, and working with large repositories. It solves 55.6% of tasks on SWE-Bench Pro and 80% on SWE-Bench Verified.
What is GDPval, and why is it important?
GDPval is a new benchmark that measures a model’s ability to perform real-world work tasks, like creating presentations, Excel sheets, or Word documents, across 44 occupations. It reflects how useful the model is in practical business scenarios.
Can GPT-5.2 really outperform human experts?
According to OpenAI, in blind evaluations, GPT-5.2 Thinking matched or outperformed human experts on 70.9% of tasks, including spreadsheets, presentations, and technical documentation.
Does GPT-5.2 work better with long documents?
GPT-5.2 Thinking handles up to 256k tokens reliably, making it suitable for large reports, contracts, multi-file codebases, and long correspondence without losing context.
What is ARC-AGI, and how did GPT-5.2 perform?
ARC-AGI is a benchmark that tests abstract reasoning and generalization, often called the “AGI exam.” GPT-5.2 ranked first in both ARC-AGI-1 and ARC-AGI-2.
How much does it cost to use GPT-5.2?
API prices have increased to $1.75 per 1 million input tokens and $14 per 1 million output tokens. However, OpenAI claims overall costs may drop due to faster execution, fewer mistakes, and fewer unnecessary tokens.