Run Epic Roleplaying Sessions with Local LLMs

8/11/2025

Beyond 40 Messages: How to Run Epic, Novel-Length Roleplaying Sessions with Local LLMs

Hey everyone, let's talk about something that's both SUPER exciting & a little bit frustrating in the world of AI roleplaying: hitting the wall. You know what I mean. You're deep into an amazing story, your AI companion is brilliant, the plot is thickening... & then, suddenly, the AI forgets who you are, what you were doing, & that the dragon you were fighting wasn't, in fact, a particularly aggressive house cat.

This, my friends, is the dreaded context window limit. It's the maximum amount of text an LLM can "remember" at any given time. Once your conversation exceeds this limit, the oldest messages start to get pushed out, & with them, all the precious plot points & character development you've built up. For a while, it seemed like truly long-form, epic roleplaying sessions were just out of reach for us running models locally.

But here's the thing: it's a solvable problem. It takes a little bit of setup, a little bit of know-how, but I'm here to tell you that running roleplaying sessions that are hundreds, even thousands of messages long is not only possible, it's AMAZING. We're going to dive deep into the strategies & tools you can use to give your AI an effectively infinite memory.

The Core of the Problem: That Pesky Context Window

So, what exactly is this "context window" we're talking about? Think of it like the LLM's short-term memory. It's a fixed-size buffer, measured in tokens (chunks of text roughly the size of a short word), that holds the conversation so far. When you send a message, the AI looks at everything in that window (your character's description, the chat history, the world info) to figure out what to say next.

The problem, as many of us have discovered, is that this window ain't infinite. Different models have different context sizes, but they all have a limit. When your roleplay gets longer than that limit, the AI starts to suffer from a kind of digital amnesia. It can't see the beginning of the conversation anymore, so it loses track of key details, character relationships, & ongoing quests. This is why, somewhere around 40 or 50 messages (sooner with a small context size or long messages, later with a bigger one), you might notice the quality of the roleplay start to degrade. The AI might repeat itself, contradict earlier information, or just generally lose the plot.
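To make the "pushed out" effect concrete, here's a minimal sketch of a fixed token budget at work. It's a toy, not how any particular backend does it: `fit_to_window` & `count_tokens` are made-up names, & tokens are approximated by counting words (real backends use the model's own tokenizer).

```python
# Toy illustration: how a fixed context budget drops the oldest messages.
# Assumption: tokens are approximated by whitespace-separated words.

def count_tokens(text: str) -> int:
    return len(text.split())


def fit_to_window(system_prompt: str, history: list[str], budget: int) -> list[str]:
    """Return the newest messages that still fit in `budget` after the system prompt."""
    remaining = budget - count_tokens(system_prompt)
    kept: list[str] = []
    # Walk backwards from the newest message & stop when we run out of room.
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    system_prompt = "You are Mira, a sarcastic elven ranger."
    history = [f"Message {i}: something happens in the story" for i in range(1, 11)]
    visible = fit_to_window(system_prompt, history, budget=40)
    print(f"{len(visible)} of {len(history)} messages still visible")
    print("Oldest visible:", visible[0])
```

Everything older than the oldest visible message simply never reaches the model, which is exactly the amnesia described above.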

So, how do we get around this? How do we give our AI a long-term memory? The answer lies in a few clever techniques & some pretty awesome tools.

Level 1: The Simple Summary – A Good First Step

The most straightforward approach to extending a roleplay is summarization. The idea is simple: as the conversation gets long, you periodically ask the AI to summarize what's happened so far. Then, you can start a new session with that summary as the starting prompt.

This is a decent starting point, & it's a feature built right into one of the most essential tools for AI roleplaying: SillyTavern. SillyTavern is a fantastic front-end that gives you a ton of control over your roleplaying experience. It has a built-in summarization feature that can automatically create a summary of your chat at set intervals.

How it works: You set a message limit, & once the chat reaches that limit, SillyTavern sends a request to your LLM to create a summary. This summary is then tucked into a special part of the prompt, keeping the AI up-to-date on the major plot points.

The downside: The quality of the summary is only as good as the model you're using. Sometimes, the AI can get things wrong in the summary – it might mix up characters or misinterpret events. A bad summary can sometimes be worse than no summary at all, as it will reinforce incorrect information. It's a good first step, but it's not the ultimate solution.
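If you're curious what a rolling summary looks like under the hood, here's a rough sketch. To be clear, this is not SillyTavern's actual code: `roll_up`, `build_prompt`, `every`, & `keep_recent` are names I made up, & `fake_summarize` is a placeholder where a real setup would send a "summarize this" request to your LLM.

```python
from typing import Callable

# A summarizer takes (previous summary, older messages) & returns a new summary.
Summarizer = Callable[[str, list[str]], str]


def roll_up(history: list[str], summary: str, summarize: Summarizer,
            every: int = 20, keep_recent: int = 6) -> tuple[list[str], str]:
    """Once history reaches `every` messages, fold the older ones into the summary."""
    if len(history) < every:
        return history, summary
    split = max(len(history) - keep_recent, 0)
    old, recent = history[:split], history[split:]
    return recent, summarize(summary, old)


def build_prompt(summary: str, history: list[str]) -> str:
    """Put the summary ahead of the recent messages, as a reminder for the model."""
    parts = [f"[Story so far: {summary}]"] if summary else []
    parts.extend(history)
    return "\n".join(parts)


def fake_summarize(previous: str, messages: list[str]) -> str:
    # Placeholder only: a real version would ask the LLM to write the summary.
    return f"{previous} (+{len(messages)} older messages condensed)".strip()


if __name__ == "__main__":
    history = [f"message {i}" for i in range(20)]
    history, summary = roll_up(history, "", fake_summarize)
    print(build_prompt(summary, history))
```

Notice that the summary replaces the old messages entirely. If the summarizer gets something wrong, the original text is no longer there to correct it, which is the downside described above.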

Level 2: Curated Memory – The Power of Lorebooks

If you want more control over what your AI remembers, you need to get familiar with Lorebooks (SillyTavern's own docs call the feature World Info). This is another amazing feature in SillyTavern, & it's a game-changer for long-term consistency.

A Lorebook is essentially a collection of manually written notes about your world, characters, items, & past events. Each entry in the lorebook has a set of keywords associated with it. When one of those keywords shows up in the recent chat, SillyTavern automatically grabs the relevant lorebook entry & stuffs it into the context for that turn.

Here's an example:

Let's say you have a recurring NPC named "Silas the Shady." You can create a lorebook entry for him that details his personality, his backstory, his motives, & any important interactions you've had with him. You could set the keywords to "Silas," "the shady merchant," etc.

Now, whenever you mention "Silas" in your chat, the AI will get a fresh reminder of exactly who he is & everything it needs to know about him. This is incredibly powerful for maintaining character consistency over a long period. You can create lorebook entries for:

Key Characters: Their personalities, goals, & relationships.
Locations: Descriptions of cities, dungeons, & important landmarks.
Plot Points: Reminders of quests, prophecies, & important past events.
Items: The history & abilities of magical artifacts.

Lorebooks are more work than automatic summarization, since you have to write the entries yourself, but they are FAR more reliable. You have complete control over what the AI remembers, ensuring that the most important details are never lost.
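Here's a small sketch of the keyword-trigger idea, using the Silas example. It's a simplified stand-in, not SillyTavern's real matching logic: `LoreEntry` & `triggered_entries` are hypothetical names, & this version only checks a single message for whole-word, case-insensitive matches.

```python
import re
from dataclasses import dataclass


@dataclass
class LoreEntry:
    keywords: list[str]
    content: str


def triggered_entries(message: str, lorebook: list[LoreEntry]) -> list[LoreEntry]:
    """Return every entry that has at least one keyword in the message."""
    lowered = message.lower()
    hits: list[LoreEntry] = []
    for entry in lorebook:
        for keyword in entry.keywords:
            # Whole-word match, so "Silas" doesn't fire on "Silasville".
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, lowered):
                hits.append(entry)
                break  # one matching keyword is enough for this entry
    return hits


if __name__ == "__main__":
    lorebook = [
        LoreEntry(["Silas", "the shady merchant"],
                  "Silas the Shady: a merchant with flexible morals."),
        LoreEntry(["Emberfall"], "Emberfall: a mining town built on a dormant volcano."),
    ]
    for entry in triggered_entries("I ask Silas where he got the map.", lorebook):
        print("Injecting:", entry.content)
```

Only the entries that fire get added to the prompt, so a big lorebook costs you context space only when its entries are actually relevant.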

And here's a pro-tip: you can combine summarization & lorebooks. After a major story arc, you can have the AI generate a summary, then you can edit it for accuracy & add it to your lorebook as a new entry. This gives you the best of both worlds: the convenience of AI-generated summaries with the reliability of manual curation.

Level 3: The Ultimate Solution – Building an External Brain with RAG

Alright, now we're getting to the REALLY cool stuff. If you want to create a truly epic, novel-length roleplaying experience, you need to give your AI an external brain. This is where Retrieval-Augmented Generation (RAG) comes in.

It sounds complicated, but the concept is actually pretty simple. With RAG, you take all of your roleplaying information – your entire chat history, your lorebooks, character sheets, world maps, EVERYTHING – & you store it in a special kind of database called a vector database (like ChromaDB).

When you send a message to the AI, the RAG system first converts your message into an embedding, which is a list of numbers that captures what the message is about. Then, it searches the vector database for the stored pieces of text whose embeddings are closest to it. It might pull up a few past messages that are similar to the current situation, a lorebook entry about the character you're talking to, & a summary of the last session.

Finally, it takes all that retrieved information & combines it with your message to create a super-detailed prompt for the LLM. The AI then generates its response with all of this rich, relevant context.

The result? The AI has access to your ENTIRE roleplaying history, but it only "sees" the parts that the search judges relevant to the current moment. This gets around the context window limit in a HUGE way. It's like having a conversation with someone who has a searchable memory of everything you've ever said. It isn't flawless (the search can miss things or pull up the wrong memory), but it's a massive step up from forgetting everything past the window.
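The retrieve-then-prompt loop is easier to see in code, so here's a toy version. Big caveat: a real RAG setup uses a trained embedding model & a vector database. This sketch swaps in plain word counts & cosine similarity so it runs with nothing but the standard library, & `embed`, `retrieve`, & `build_prompt` are illustrative names, not any library's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in 'embedding': a bag-of-words count vector (not a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    """Return the `top_k` stored memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, memories: list[str], top_k: int = 2) -> str:
    """Combine the retrieved memories with the user's message."""
    context = "\n".join(f"- {m}" for m in retrieve(query, memories, top_k))
    return f"Relevant memories:\n{context}\n\nUser: {query}"


if __name__ == "__main__":
    memories = [
        "Silas the Shady sold us a cursed map in Emberfall.",
        "The dragon Vexmaw guards the northern pass.",
        "Mira lost her bow during the river crossing.",
    ]
    print(build_prompt("What did Silas sell us?", memories, top_k=1))
```

Word-count matching only finds memories that share literal words with the query. Real embeddings can also match on meaning ("merchant" vs. "trader"), which is a big part of why they're worth the extra setup.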

SillyTavern, once again, is the hero here. It has a built-in RAG feature called the "Data Bank," which works together with its Vector Storage extension to embed & search your documents. There are some great tutorials on YouTube that walk you through the process of setting this up, but here's the general workflow:

Chat & Summarize: Have your roleplaying sessions in SillyTavern as usual. Periodically, create summaries of your chat logs.
Create Your Knowledge Base: Take your summaries, your lorebook entries, & any other documents you want the AI to remember, & put them into a text file.
Vectorize Your Data: In SillyTavern's Data Bank, you'll "vectorize" this text file. This is the process of splitting the text into chunks & converting each chunk into an embedding (a list of numbers) that can be searched by similarity.
Enable RAG: Once your data is vectorized, you can enable the RAG feature. Now, whenever you chat, SillyTavern will automatically search your knowledge base & provide the AI with the most relevant information.
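One detail worth knowing about the "vectorize" step: before text gets embedded, it's typically split into smaller chunks, often with a bit of overlap so a sentence cut at a boundary still shows up whole in one of them. Here's a minimal sketch of that splitting. `chunk_text`, `chunk_size`, & `overlap` are my own illustrative names & values, & SillyTavern's actual chunking settings may work differently.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Each chunk repeats the last `overlap` words of the previous one.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


if __name__ == "__main__":
    knowledge_base = (
        "Session 12 summary: the party reached Emberfall & met Silas the Shady, "
        "who sold them a cursed map. Mira lost her bow during the river crossing. "
        "The dragon Vexmaw was last seen guarding the northern pass."
    )
    for i, chunk in enumerate(chunk_text(knowledge_base, chunk_size=15, overlap=3)):
        print(f"chunk {i}: {chunk}")
```

Smaller chunks give more precise retrieval but less surrounding context per hit, so it's worth experimenting with the size once your knowledge base grows.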

This is, without a doubt, the most powerful way to run long-term roleplaying sessions. It takes some setup, but it's a game-changer. It's how you go from a 40-message chat to a 4,000-message epic.

It's Not Just About Memory, It's About Better Business Too

Now, this might sound like it's just for fun & games, but these same principles have some pretty serious applications in the business world. Think about it: what is a long-term roleplay if not an extended, evolving conversation?

This is where a tool like Arsturn comes in. Arsturn helps businesses create custom AI chatbots trained on their own data. Sound familiar? It's the same idea as the RAG system we just talked about. A business can feed all of its product information, support documents, & company policies into Arsturn. Then, when a customer asks a question, the Arsturn-powered chatbot can retrieve the exact right piece of information & provide an instant, accurate answer.

It's all about creating a better conversational experience. Whether you're a dungeon master trying to maintain plot continuity or a business trying to provide top-notch customer support, the goal is the same: to have a meaningful, intelligent conversation where no information is ever lost. With Arsturn, businesses can build no-code AI chatbots that engage with website visitors 24/7, answer questions instantly, & even help with lead generation. It's the same "external brain" concept, but applied to the world of customer engagement. Pretty cool, right?

Tying It All Together: Your Workflow for Epic Roleplays

So, to sum it all up, here's a step-by-step guide to running incredibly long & detailed roleplaying sessions with your local LLM:

Get the Right Tools: You'll want SillyTavern as your front-end & a backend like Oobabooga's Text Generation WebUI to run your models.
Start Simple: Begin by using SillyTavern's built-in summarization feature to get a feel for how it works.
Build Your Lore: As your world develops, start creating Lorebook entries for all the important details. This will be your curated source of truth.
Embrace RAG: When you're ready to take things to the next level, set up SillyTavern's Data Bank with a vector database. Start feeding it your chat summaries & lorebooks.
Choose the Right Model: Don't forget that the quality of your roleplay still depends on the LLM you're using. Check out some of the leaderboards for roleplaying models to find one that's creative, coherent, & good at following instructions.

It might seem like a lot, but trust me, once you get it set up, you'll wonder how you ever roleplayed without it. You'll be far less constrained by the context window, able to build truly massive, sprawling narratives with an AI that can look up the details that matter when it needs them.

So go forth, build your worlds, & tell your epic stories. The only limit now is your imagination.

Hope this was helpful! Let me know what you think, & if you have any other cool tricks for long-form roleplaying, I'd love to hear them.