
Context Windows Are Getting Stronger

What Opus 4.6 changes when working with Claude Code

If you use Claude Code every day, you know the routine. You start a task, keep one eye on the status line, and quietly hope the model finishes before the context window fills up. We have all been there, watching the token count climb and hoping the job completes before context rot sets in.

Most people using coding models try to stay at around half the window; fifty to sixty percent is generally considered the safe zone. Push beyond that and the model stops performing at its best. It begins to lose clarity. It may hallucinate. It can still summarize what is already in the session, but it becomes harder for it to solve new problems well. For most of us, that is the signal not to start another task unless the existing context is truly necessary and we deliberately choose to compact.
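The safe-zone heuristic above can be sketched in a few lines. This is a rough illustration, not anything Claude Code actually does: the 200k window size is an assumption, and the four-characters-per-token rule is a crude estimate (a real client should use the model's own tokenizer).

```python
# A rough sketch of the "stay around half the window" heuristic.
# CONTEXT_WINDOW and the chars-per-token ratio are assumptions.

CONTEXT_WINDOW = 200_000   # assumed window size, in tokens
SAFE_FRACTION = 0.55       # the 50-60% "safe zone" described above


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English prose."""
    return max(1, len(text) // 4)


def should_compact(session_text: str) -> bool:
    """True once the session crosses the safe-zone threshold."""
    used = estimate_tokens(session_text)
    return used / CONTEXT_WINDOW > SAFE_FRACTION


print(should_compact("short session"))   # a tiny session: False
print(should_compact("x" * 1_000_000))   # ~250k tokens: True
```

The exact threshold is a judgment call; the point is simply that the decision to compact can be made from a cheap estimate of usage rather than a feeling.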

So before starting the next large task, we clear the context and begin again. Anyone who uses Claude Code knows caching is not really the answer here. The usual move is simple. Restart the session. Start fresh. Give the model a clean working space.

That assumption may start to change.

Opus 4.6 handles long context much better

With Opus 4.6, something interesting is happening. The model is holding its reasoning ability much deeper into the context window than earlier versions.

One way this improvement shows up is in evaluations that test whether a model can find specific information buried inside large inputs. These tests simulate situations where the model has to search through long blocks of text and still identify the correct details.

Reportedly, Opus 4.6 performs noticeably better than earlier models in these evaluations, even with context windows reaching hundreds of thousands of tokens.
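To make the evaluation style concrete, here is a minimal sketch of how such a "find the buried detail" test can be built: hide one specific fact inside a long block of filler, then check whether the model's answer contains it. The helper names are mine, and the call to a real model is left as a placeholder.

```python
import random

# Minimal needle-in-a-haystack harness sketch. `build_haystack` buries
# one fact ("the needle") at a random position in filler text;
# `score` checks whether an answer recovered it.


def build_haystack(needle: str, filler_sentences: int, seed: int = 0) -> str:
    """Return a long text with `needle` inserted at a random position."""
    random.seed(seed)
    filler = ["The sky was a flat shade of grey that afternoon."] * filler_sentences
    position = random.randrange(len(filler) + 1)
    filler.insert(position, needle)
    return " ".join(filler)


def score(answer: str, expected: str) -> bool:
    """Case-insensitive check that the answer contains the buried fact."""
    return expected.lower() in answer.lower()


needle = "The deploy token is ZX-9."
haystack = build_haystack(needle, filler_sentences=5_000)
# In a real evaluation, `haystack` plus a retrieval question would be
# sent to the model, and `score` applied to its reply.
print(needle in haystack)   # True: the fact is in the input
```

Published long-context benchmarks are more elaborate, varying needle depth and haystack length systematically, but the shape of the test is the same.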

To be honest, I have not seen this clearly yet in my own Claude Code sessions. But if this improvement holds in real-world use, it is a welcome change. Anything that helps the model reason better in longer sessions is good for all of us.

More context can now be useful

This does not mean the context window should always be full. Shorter contexts are still easier for models to process, and many tasks will remain faster and cleaner in smaller sessions.

What is changing is the amount of usable space we will have. When working with Claude Code, you will be able to keep more information in the session before needing to compact or restart. Logs, code fragments, debugging output, and terminal traces can stay visible to the model longer.

For real debugging work, that matters. Software issues rarely exist in a single file. They show up across logs, UI behavior, stack traces, and surrounding code. When more of that information stays inside the same session, the model has a better chance of understanding the full problem instead of piecing it together from fragments.

The context window is becoming working memory

When I work inside Claude Code, I often think of the context window as the model's working memory.

Earlier models felt like they had very limited working memory. Once the window filled up, reasoning became unreliable and it was usually time to clear the session and start again. Newer models appear to be getting better at navigating larger context spaces, and if this continues to improve, Claude Code sessions could start to feel very different.

Instead of constantly clearing context just to preserve reasoning quality, the model could operate on much larger slices of the system before a reset becomes necessary. That would be a meaningful shift for anyone using Claude Code regularly.

The context window has mostly felt like a limit to manage. If models continue improving at long-context reasoning, it may start to feel more like working memory the model can actually use.

PS: Claude Code has also recently introduced a new feature called MEMORY.md inside the .claude directory. The idea is simple. As you work, the system can record small notes about mistakes, preferences, or patterns it observes, and then load those notes into the system prompt in future sessions. In theory this creates a lightweight memory layer that helps the model build on previous experience. I have not spent enough time experimenting with it yet to speak to its signal-to-noise ratio. But the direction is interesting. It suggests more effort is going into scaling memory and improving how context persists across sessions.
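The mechanism described above is easy to picture as code. This is a sketch of the idea only, not Claude Code's implementation: the file name and location follow the post, but the note format and the prompt-assembly step are my assumptions.

```python
from pathlib import Path

# Sketch of a lightweight memory layer: append short notes during a
# session, then prepend them to the next session's system prompt.
# Only the .claude/MEMORY.md path comes from the post; the rest is
# a hypothetical illustration.

MEMORY_FILE = Path(".claude") / "MEMORY.md"


def record_note(note: str) -> None:
    """Append a one-line observation to the memory file."""
    MEMORY_FILE.parent.mkdir(parents=True, exist_ok=True)
    with MEMORY_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")


def load_memory(base_prompt: str) -> str:
    """Prepend any stored notes to a base system prompt."""
    if not MEMORY_FILE.exists():
        return base_prompt
    notes = MEMORY_FILE.read_text(encoding="utf-8").strip()
    return f"{base_prompt}\n\nNotes from earlier sessions:\n{notes}"


record_note("This repo prefers pytest over unittest.")
print(load_memory("You are a coding assistant."))
```

The appeal of this shape is that the notes survive a context reset: the working memory gets cleared, but a small curated slice of it carries forward.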