8/11/2025

Unlocking Your Ollama Model's Full Potential: How to Easily Increase the Context Window Size

Hey there, fellow AI enthusiast! So, you've been playing around with Ollama & the amazing large language models it lets you run locally. It's pretty awesome, right? But have you ever been in the middle of a complex task, feeding a model a large document or a long chat history, only to have it forget the beginning of your conversation? Yeah, it's a frustratingly common problem, & it all comes down to something called the "context window."
Honestly, understanding the context window is a game-changer for getting the most out of your local LLMs. In this guide, I'm going to break down what it is, why it matters, & most importantly, show you a couple of straightforward ways to increase it in Ollama. Let's get into it.

First Off, What Exactly is a "Context Window"?

Think of the context window as a model's short-term memory. It's the amount of text (the "context") that the model can "see" at any given time when it's generating a response. This context includes your initial prompt, any documents you've provided, & the conversation history.
This "memory" is measured in tokens. A token is roughly equivalent to a word or a part of a word. For instance, the word "eating" might be broken down into two tokens: "eat" & "ing". A good rule of thumb is that 1,000 tokens is about 750 words.
Here's the thing: by default, many models in Ollama have a surprisingly small context window, often around 2048 tokens (2k). That's only a few pages of text! If your input exceeds this limit, the model starts to lose track of the earlier parts of the conversation, leading to less accurate or even nonsensical responses. This can be a real pain, especially for tasks like summarizing long documents, analyzing code, or having an extended, detailed conversation.
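If you want a quick, back-of-the-envelope feel for these numbers, here's a tiny Python sketch based on the 1,000-tokens-per-750-words rule of thumb from above. It's a rough estimate, not a real tokenizer (actual token counts vary by model), so treat the results as ballpark figures:

  # Rough token estimate using the ~750 words per 1,000 tokens rule of thumb.
  # A real tokenizer (which differs from model to model) will give other counts.
  def estimate_tokens(text: str) -> int:
      words = len(text.split())
      return round(words / 0.75)  # ~1.33 tokens per word

  # How many words fit in the common 2048-token default window?
  print(int(2048 * 0.75))  # -> 1536 words, i.e. only a few pages

In other words, a single long document or chat transcript can blow past the default window surprisingly fast.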

Why You'd Want a Bigger Context Window

The benefits of a larger context window are pretty clear once you start pushing the limits of the default settings. A bigger window allows your models to:
  • Handle Larger Documents: You can feed the model entire research papers, legal documents, or book chapters & have it answer questions or summarize the content accurately.
  • Maintain Longer Conversations: The model can remember the entire thread of a long, evolving conversation, leading to more coherent & relevant responses.
  • Improve Complex Reasoning: With more context, the model can better understand the nuances of your request & perform more complex reasoning tasks.
  • Enhance In-Context Learning: You can provide more examples in your prompt to "teach" the model what you want, leading to better-tailored outputs.
Of course, there's a trade-off. A larger context window requires more VRAM (the memory on your graphics card). So, while you might be tempted to crank it up to the max, you'll need to be mindful of your hardware's limitations. We'll touch on that a bit more later.

How to Check Your Model's Current Context Window

Before you start making changes, it's a good idea to see what you're working with. You can do this by running the ollama show command in your terminal. For example, to check the details of the llama3.1 model, you would run:
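  ollama show llama3.1

The exact output depends on your Ollama version, but recent releases include a "context length" line under the Model section. As a rough illustration (the values here are just an example for llama3.1, not guaranteed):

  Model
    architecture        llama
    parameters          8.0B
    context length      131072
    embedding length    4096
    quantization        Q4_K_M

Keep in mind that this "context length" is the maximum the model's architecture supports, not necessarily the window Ollama actually uses at runtime, which, as mentioned above, often defaults to something much smaller.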
