Cursor Max Mode: Extend Context for Large Codebases
Max Mode expands the active context window to the largest size a given model supports — up to 1 million tokens for supported models. Turn it on per-request from the model selector in the chat or agent panel. It bills at token-based rates, so one Max Mode request costs significantly more than a normal one.
What does Max Mode actually do?
Each model has a default context window — the number of tokens it reads in one pass. In normal mode, Cursor uses a working window that fits most tasks efficiently. Max Mode raises the ceiling to the model's technical maximum: Claude Sonnet 4.6 goes from 200k to 1M tokens; other models scale similarly.
A bigger context window means the agent can hold more of your codebase in view simultaneously — useful when a change touches many files, when you're debugging a sprawling call chain, or when you need the agent to reason about the whole repo without missing dependencies.
Because Max Mode uses far more tokens per call, a single agent turn can consume substantially more of your usage budget. Enable it for genuinely large tasks; for most everyday coding, the standard window is sufficient.
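To get a feel for when a task outgrows the standard window, here is a minimal sketch that estimates token counts for a set of files and compares them with the two window sizes quoted above for Claude Sonnet 4.6. The 4-characters-per-token ratio is a rough rule of thumb, not Cursor's tokenizer, and the function names and file sizes are hypothetical. The estimate also ignores chat history, system prompts, and output tokens, all of which take up window space.

```python
# Rough heuristic, NOT Cursor's tokenizer: assume ~4 characters per token.
CHARS_PER_TOKEN = 4

# Window sizes quoted in this article for Claude Sonnet 4.6; other models differ.
STANDARD_WINDOW = 200_000
MAX_MODE_WINDOW = 1_000_000


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def window_needed(files: dict[str, str]) -> str:
    """Classify a set of files (path -> contents) by the window they would need."""
    total = sum(estimate_tokens(body) for body in files.values())
    if total <= STANDARD_WINDOW:
        return "standard"
    if total <= MAX_MODE_WINDOW:
        return "max"
    return "too large"


# Hypothetical file sizes, for illustration only.
small_task = {"app/util.py": "x" * 40_000}  # roughly 10k tokens
big_refactor = {f"svc/mod_{i}.py": "x" * 100_000 for i in range(12)}  # roughly 300k tokens

print(window_needed(small_task))    # standard
print(window_needed(big_refactor))  # max
```

Treat the result as a first guess: the agent rarely loads every file in full, so the real footprint of a task can be smaller or larger than this estimate.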
How do I turn on Max Mode?
1. Open the model selector in your chat or agent panel (click the model name at the top of the panel).
2. Find the model you want to use and toggle Max Mode on.
3. The toggle applies per-session: you can turn it on for a large refactor and off again for quick questions.
Max Mode is available for all state-of-the-art models in Cursor. The exact context ceiling depends on the model; the model selector shows the Max Mode window size next to each option.
When should I use Max Mode?
| Task | Max Mode needed? |
|---|---|
| Fix a single bug in one file | No — standard window handles it |
| Refactor a function and its callers across 10+ files | Yes — benefits from seeing all call sites at once |
| Add a feature to a large monorepo with shared types | Yes — prevents missed dependencies |
| Debug a request that spans API, service and DB layers | Yes — agent needs the full chain in view |
| Write a new component with clear isolated scope | No — standard context is more efficient |
| Generate documentation for the whole repo | Yes — complete picture reduces gaps |
Use Max Mode for tasks where missing context causes wrong assumptions, not just for any large file.
How does Max Mode relate to extended thinking?
Some models support thinking mode — an extended reasoning pass before producing output. For Sonnet 4.6 and Opus 4.6, Cursor exposes this as a separate thinking variant in the model selector. Thinking and Max Mode are independent: you can use Max Mode without thinking enabled, and vice versa.
- Max Mode: increases the context window, meaning how much code the model can see at once.
- Thinking / Extended thinking: increases the reasoning depth, meaning how much the model deliberates before answering.
- Max Mode + Thinking: both large context and deep reasoning. Highest cost; best for the hardest architectural problems.
How is Max Mode billed?
Max Mode uses the standard token-based rates — there is no separate 'Max' surcharge. The cost is high simply because each request uses many more tokens. Sonnet 4.6 in Max Mode bills at the same per-token rate as normal mode, including when context exceeds 200k; there is no long-context multiplier for Sonnet 4.6.
If you have a monthly spend cap set, a complex Max Mode agent run can exhaust it faster than expected. Check Settings → Usage for your current consumption. Verify current pricing at cursor.com/docs/models-and-pricing.
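Because the per-token rate stays flat, cost scales linearly with the tokens each turn reads. The sketch below shows that arithmetic and how quickly a spend cap is reached. The rate is a placeholder chosen for illustration and is not Cursor's actual price; the function names are hypothetical, and the calculation counts input tokens only, ignoring output tokens and any caching.

```python
from decimal import Decimal

# Placeholder rate for illustration only; NOT Cursor's actual price.
HYPOTHETICAL_USD_PER_MILLION_INPUT_TOKENS = Decimal("3.00")


def input_cost_usd(tokens: int) -> Decimal:
    """Cost of reading `tokens` input tokens at the flat placeholder rate."""
    return Decimal(tokens) / Decimal(1_000_000) * HYPOTHETICAL_USD_PER_MILLION_INPUT_TOKENS


def turns_until_cap(cap_usd: Decimal, tokens_per_turn: int) -> int:
    """Whole turns that fit under a spend cap when every turn reads the same amount."""
    return int(cap_usd // input_cost_usd(tokens_per_turn))


cap = Decimal("20")
print(input_cost_usd(50_000))         # a modest turn at the placeholder rate
print(input_cost_usd(800_000))        # a large Max Mode turn at the same rate
print(turns_until_cap(cap, 50_000))   # turns available with small context
print(turns_until_cap(cap, 800_000))  # turns available with large context
```

The rate is identical in both cases; only the token volume differs, which is why the number of turns that fit under the same cap drops so sharply.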
Where does Max Mode fit among the model categories?
It helps to think of Cursor's models in three buckets. Standard models answer quickly at a normal context window — your everyday driver. Thinking models add an extended reasoning pass before they answer. Max is not a model; it is a per-request toggle that pushes whichever model you picked up to its largest context window. So Max Mode sits on top of a standard or thinking model rather than replacing it.
- Standard models: Normal context window. Fast, cheapest per request. Use for the bulk of edits and questions.
- Thinking models: Extended reasoning before output. Better on tricky logic and planning. Costs more because of the extra deliberation.
- Max Mode: A toggle, not a model. Pushes context to the model's ceiling. Reserve for genuinely large-scope tasks.
What a Max Mode chat actually costs in request terms
Cursor's usage is metered in model spend, but a handy mental model is that one normal request maps to roughly 8 cents of usage. A single Max Mode chat reads far more tokens per turn, and an agent chat is many turns, so the same conversation can map to around 30 requests of equivalent spend (about $2.40 at that rate). That is the gap people notice when one Max Mode session draws down their allowance the way about 30 normal requests would.
- One normal request: ≈ 8 cents of usage.
- One Max Mode chat: can map to ~30 requests of equivalent spend.
- Why: far more tokens read per turn, multiplied across the turns in an agent chat.
An approximation to build intuition, not an official rate card. Verify live pricing at cursor.com/docs/models-and-pricing.
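The request-equivalent arithmetic can be sketched in a few lines. The 8-cent figure is this article's approximation, and the 15-turn, 16-cents-per-turn chat is a hypothetical example chosen to land on the ~30 figure; neither is an official number.

```python
# The article's rough approximation, not an official rate.
CENTS_PER_NORMAL_REQUEST = 8


def request_equivalents(spend_cents: int) -> float:
    """Express a spend amount as a number of normal requests."""
    return spend_cents / CENTS_PER_NORMAL_REQUEST


def chat_spend_cents(turns: int, cents_per_turn: int) -> int:
    """Total spend for a chat where every turn costs the same amount."""
    return turns * cents_per_turn


# Hypothetical Max Mode agent chat: 15 turns at 16 cents each.
spend = chat_spend_cents(turns=15, cents_per_turn=16)
print(spend)                       # 240 (cents)
print(request_equivalents(spend))  # 30.0
```

Either factor can push the total up: more turns in the chat, or more tokens read per turn.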
More context can also degrade accuracy: the model has more to wade through and can lose the thread among less-relevant code. Reach for Max Mode on tasks that genuinely need the whole picture, and consider pairing it with plan mode so the large context is spent on producing a solid plan rather than a sprawling, hard-to-review edit.
Frequently asked questions
Why did one Max Mode chat cost 30 requests?
Because Max Mode reads far more tokens per turn, and an agent chat is many turns. If a normal request is roughly 8 cents of usage, a single Max Mode conversation can read enough tokens across its turns to map to about 30 normal requests of equivalent spend. Nothing is broken — the meter is just counting the much larger volume of tokens the bigger context window pulled in.
Does Max Mode make the agent more accurate or just give it more context?
Primarily more context. The model's intelligence is the same; what changes is how much of your codebase it can see in one pass. On large codebases, that often improves accuracy by reducing the chance the agent misses a dependency or type definition. On small, well-scoped tasks, the extra context adds cost without that benefit.
Can I use Max Mode with every model in Cursor?
Max Mode is available for all state-of-the-art models Cursor supports, though the maximum window size varies by model. The model selector shows Max Mode availability and the window size for each option.
Is Max Mode on by default?
No — it is off by default to keep costs predictable. You toggle it per-session in the model selector. Once toggled on, it stays on for that session until you turn it off.
Will Max Mode use my full usage quota faster?
Yes. A single complex Max Mode request can use the token equivalent of many standard requests. If you are close to a usage cap or spend limit, enable Max Mode deliberately for the tasks that need it.
Sources & last verified
- Cursor - Max Mode
- Cursor - Models and Pricing
- Cursor - Claude Sonnet 4.6 Docs
- Cursor Changelog - Simplified Pricing, Background Agent
Cursor ships frequently. Facts verified against primary sources on June 25, 2026.