When you ask a language model a question, what happens internally is statistical prediction: the model calculates which sequence of text has the highest probability of being an appropriate response, based on patterns learned from enormous volumes of data. This mechanism is extraordinarily fast and works well for the vast majority of tasks: summarising a document, drafting an email, explaining a concept.
But there is a category of problems where this approach fails predictably. These are problems that cannot be solved by pattern association, but require chained reasoning: following steps, maintaining internal coherence, verifying hypotheses mid-process. That is precisely the type of task where reasoning models make a real difference.
The difference between generating and reasoning
A standard model generates text from left to right. Each token (a word or word fragment) is chosen as a likely continuation of everything written so far. This process is elegant and efficient, but it has a structural weakness: the model commits to a direction from the very first token and cannot “go back” to reconsider a previous step.
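As a rough illustration of left-to-right generation, here is a toy sketch in Python. The probability table is invented for the example, and it is a deliberate simplification: this toy looks only at the previous token and always picks the single most probable one, whereas a real model conditions on the whole context and usually samples from the distribution.

```python
# Toy next-token table: for each token, the probability of each possible next token.
# All values are invented for illustration only.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"answer": 0.7, "question": 0.3},
    "a": {"question": 0.8, "answer": 0.2},
    "answer": {"is": 0.9, "<end>": 0.1},
    "question": {"is": 0.4, "<end>": 0.6},
    "is": {"42": 0.55, "unknown": 0.45},
    "42": {"<end>": 1.0},
    "unknown": {"<end>": 1.0},
}


def generate_greedy(max_tokens: int = 10) -> list[str]:
    """Generate tokens one at a time, never revisiting an earlier choice."""
    tokens = []
    current = "<start>"
    for _ in range(max_tokens):
        candidates = NEXT_TOKEN_PROBS[current]
        # Commit to the most probable continuation; there is no backtracking.
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
        current = next_token
    return tokens


print(" ".join(generate_greedy()))  # prints "the answer is 42"
```

Once the loop has chosen “the”, every later token builds on that choice. Nothing in the loop can return to an earlier position and pick differently.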
Think about a multi-step maths problem. A model that generates directly may produce plausible intermediate steps without verifying that each one is correct. The final result can look reasoned even if there is a silent error in step two.
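A small sketch can make the “silent error in step two” concrete. The worked example and the helper name first_wrong_step are hypothetical; the point is only that a chain of steps can be internally consistent after an early mistake, and that the mistake surfaces only when each step is recomputed.

```python
import operator

# Supported operations for the toy checker.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Each step is (operation, operand, value the solution claims to reach).
# Step 2 is wrong (36 + 15 is 51, not 41), but step 3 follows correctly
# from the wrong value, so the final answer still looks reasoned.
CLAIMED_STEPS = [
    ("*", 3, 36),   # 12 * 3 = 36   (correct)
    ("+", 15, 41),  # 36 + 15 = 41  (silent error)
    ("-", 1, 40),   # 41 - 1 = 40   (consistent with the error)
]


def first_wrong_step(start, steps):
    """Return the 1-based index of the first step that does not check out, or None."""
    value = start
    for index, (op, operand, claimed) in enumerate(steps, start=1):
        if OPS[op](value, operand) != claimed:
            return index
        value = claimed
    return None


print(first_wrong_step(12, CLAIMED_STEPS))  # prints 2
```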
Reasoning models introduce a prior phase: before generating the visible response, the model produces an internal deliberation process. In that phase, it can test hypotheses, detect contradictions and recalculate before committing to an answer. It is not “thinking” in the human sense, but the practical effect is similar: responses that require sequential attention improve significantly.
The most useful analogy is the difference between answering by intuition and answering by analysis. To know if it is cold outside, you just look. To decide whether to accept a job offer, you sit down and work through the numbers, consider alternatives, and review the decision before communicating it. Reasoning models are the computational version of that second process.
How reasoning models work
OpenAI popularised this approach in 2024 with its o1 series, followed by o3 and later variants. Anthropic introduced extended thinking in Claude in early 2025. Other labs like Google DeepMind have developed equivalent capabilities under different names.
The core mechanism is the chain of thought: the model generates an intermediate block of text, which may or may not be visible to the user, in which it works through the problem before producing the final response. This block can include assumptions, partial calculations, error detection and changes of approach.
What distinguishes reasoning models from simply asking a standard model to “think step by step” is training. Reasoning models have been optimised specifically so that this internal process is useful, not merely decorative. They have learned when to doubt their own responses, how to detect inconsistencies and how to generate intermediate steps that genuinely reduce uncertainty.
Another relevant feature is that the amount of reasoning is often adjustable. In some systems, the user can set how much time, or how many “thinking tokens”, the model may spend on the problem. More reasoning generally translates to greater accuracy on complex problems, though with diminishing returns.
When to use a reasoning model
Not every problem needs a reasoning model. Using one for simple tasks is inefficient: slower, more expensive, and with no perceptible improvement in the result. The key is to identify what type of problem you have before choosing the model.
Reasoning models deliver real value in situations like these:
- Maths and logic: Multi-step problems where an intermediate error ruins the final result.
- Debugging complex code: When you need to trace an error through multiple layers of logic.
- Constrained planning: Decisions where several conditions must be satisfied simultaneously.
- Legal or contractual analysis: Interpreting texts that contain conditions, exceptions and cross-references.
- Argument evaluation: Identifying fallacies, implicit assumptions or contradictions in a line of reasoning.
By contrast, for these tasks a standard model is sufficient and more practical:
- Summarising or reformatting text.
- Answering direct factual questions.
- Drafting creative copy.
- Translating or adapting the tone of a document.
- Handling routine support queries.
The most useful criterion is to ask yourself: “Does this problem have a solution I could verify step by step if I had the time?” If yes, it probably benefits from extended reasoning.
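The two lists and the verification question above can be sketched as a small routing function. The category labels and the name choose_tier are made up for this example; a real application would need its own way of classifying incoming tasks.

```python
from enum import Enum


class ModelTier(Enum):
    STANDARD = "standard"
    REASONING = "reasoning"


# Hypothetical labels mirroring the two lists in the text.
REASONING_TASKS = {
    "maths_logic",
    "complex_debugging",
    "constrained_planning",
    "legal_analysis",
    "argument_evaluation",
}
STANDARD_TASKS = {
    "summarising",
    "factual_question",
    "creative_copy",
    "translation",
    "routine_support",
}


def choose_tier(task_type: str, verifiable_step_by_step: bool = False) -> ModelTier:
    """Pick a model tier from the task category, falling back to the criterion."""
    if task_type in REASONING_TASKS:
        return ModelTier.REASONING
    if task_type in STANDARD_TASKS:
        return ModelTier.STANDARD
    # Unknown category: apply the "could I verify this step by step?" question.
    return ModelTier.REASONING if verifiable_step_by_step else ModelTier.STANDARD


print(choose_tier("translation").value)        # prints "standard"
print(choose_tier("legal_analysis").value)     # prints "reasoning"
print(choose_tier("tax_puzzle", True).value)   # prints "reasoning"
```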
Limitations and real costs
Reasoning models have clear advantages in certain contexts, but also drawbacks worth knowing before using them indiscriminately.
They are considerably slower. Where a standard model responds in seconds, a reasoning model can take minutes on complex problems. This makes them unsuitable for real-time applications, or for any product where response speed is critical to the user experience.
They are more expensive. The internal “thinking” tokens count towards the total cost. For applications that process thousands of queries a day, the difference can be substantial.
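A back-of-the-envelope calculation shows how thinking tokens change the bill. Every number below is hypothetical: the prices, the token counts and the daily volume are placeholders, and the sketch assumes thinking tokens are billed at the same rate as output tokens and that both model types share one price list. Check your provider's actual pricing before relying on figures like these.

```python
def query_cost(input_tokens, thinking_tokens, output_tokens,
               input_price_per_million, output_price_per_million):
    """Cost of one query, assuming thinking tokens are billed as output tokens."""
    billed_output = thinking_tokens + output_tokens
    total = (input_tokens * input_price_per_million
             + billed_output * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices per million tokens and a hypothetical workload.
INPUT_PRICE = 1.0
OUTPUT_PRICE = 5.0
QUERIES_PER_DAY = 5_000

standard = query_cost(500, 0, 300, INPUT_PRICE, OUTPUT_PRICE)
reasoning = query_cost(500, 4_000, 300, INPUT_PRICE, OUTPUT_PRICE)

print(f"standard:  {standard * QUERIES_PER_DAY:.2f} per day")
print(f"reasoning: {reasoning * QUERIES_PER_DAY:.2f} per day")
```

With these placeholder values the visible answer is the same length in both cases. The entire difference comes from the 4,000 thinking tokens added to each query.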
They can over-analyse simple problems. There are documented cases where these models generate elaborate reasoning to arrive at answers that any basic model would give correctly and immediately. They do not have a perfect mechanism for detecting when a problem does not justify the effort.
They do not eliminate hallucinations. Reasoning more is not the same as reasoning correctly. These models still make errors, especially when the problem requires factual knowledge not present in their training data or when the required reasoning exceeds their capacity.
Transparency of the process is limited. Although some systems show the model’s “thinking”, that text is not necessarily a faithful transcript of the underlying computational process. It should be interpreted as an approximate representation, not an exact technical explanation.
What this means for you
The arrival of reasoning models does not fundamentally change how most users interact with AI. The large majority of everyday tasks (finding information, drafting, summarising, translating) do not require extended reasoning, and for those a fast standard model is the right choice.
What does change is the ceiling of what is possible. Certain problems that previously produced incorrect or superficial responses now have a solution. The range of tasks you can confidently delegate expands.
The practical heuristic is simple: always start with the fastest and simplest model available. If the result is unsatisfactory because the problem was too complex, move it to a reasoning model. In most cases, you will quickly learn which types of tasks justify that switch.
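This start-simple-then-escalate heuristic can be sketched in a few lines. The two models are passed in as plain functions, and both they and the is_satisfactory check are stand-ins invented for the example. In practice the check might be a test suite, a validation rule or a human review.

```python
from typing import Callable


def answer_with_escalation(
    prompt: str,
    fast_model: Callable[[str], str],
    reasoning_model: Callable[[str], str],
    is_satisfactory: Callable[[str], bool],
) -> tuple[str, str]:
    """Try the fast model first; escalate only if its answer fails the check."""
    draft = fast_model(prompt)
    if is_satisfactory(draft):
        return draft, "standard"
    return reasoning_model(prompt), "reasoning"


# Stand-in models for demonstration only: the fast one gets the sum wrong.
def fake_fast_model(prompt: str) -> str:
    return "41"


def fake_reasoning_model(prompt: str) -> str:
    return "51"


answer, used = answer_with_escalation(
    "What is 12 * 3 + 15?",
    fake_fast_model,
    fake_reasoning_model,
    is_satisfactory=lambda text: text == str(12 * 3 + 15),
)
print(answer, used)  # prints "51 reasoning"
```

The slow, expensive model is called only when the cheap attempt fails, so easy queries never pay the reasoning cost.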
The reasoning model is not a replacement for the standard model. It is a specialised tool for a specific category of problems. Like any specialised tool, its value depends on knowing when to reach for it.