The Unscripted Future: Creating Next-Gen AI NPCs with Local LLMs & Ollama
Zack Saadioui
8/10/2025
Hey everyone, let's talk about something that's been bubbling up in the game dev world & is, frankly, SUPER exciting. We've all been there: you're hours deep into an epic RPG, the world is stunning, the quests are legendary, but then you talk to a shopkeeper who has the same three lines of dialogue on a loop. It kind of shatters the illusion, right? For decades, Non-Player Characters (NPCs) have been the lifeblood of game worlds, but they've also been one of the biggest technical & creative limitations. They're essentially puppets, moving along pre-defined paths & speaking from a script.
But what if they weren't? What if an NPC could actually understand you, remember your past conversations, & generate unique, unscripted dialogue in real-time? This isn't science fiction anymore. It's happening right now, thanks to Large Language Models (LLMs)—the same tech behind things like ChatGPT.
Here’s the thing, though. Most of the early experiments with LLM-powered NPCs have relied on cloud-based APIs. This means you need a constant internet connection, & the developer (or even the player) has to pay for every single interaction. It's a cool tech demo, but it's not exactly practical for a mainstream game.
This is where the REAL game-changer comes in: running those LLMs locally on the player's own machine. We're talking about a future where every NPC has a brain of its own, running right on your PC or console. And a huge part of making this accessible to more developers is a tool called Ollama. It's pretty revolutionary, & it’s opening the door to some truly mind-bending possibilities for game immersion.
From Puppet to Person: The Old Way vs. The New AI
Honestly, to appreciate where we're going, we have to remember where we've been.
Traditional NPCs: For as long as we've had games, NPCs have been built on complex but ultimately rigid systems. Think of them as intricate flowcharts. They use things like:
Dialogue Trees: A branching set of pre-written responses. You pick option A, they say line A. You pick option B, they say line B. It offers choice, but it's a finite set of choices (see the toy sketch after this list).
State Machines: These control an NPC's behavior. They have a "patrol" state, an "attack" state, a "flee" state, etc. They're reactive, but only in ways the developer has explicitly coded.
Scripts: Every word, every action is meticulously scripted by writers & programmers. This gives developers immense control over the story, which is great, but it’s also incredibly time-consuming & limited.
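To make that finiteness concrete, here's a toy dialogue tree in Python. Every node name & line is invented for illustration, but the shape is the point: the NPC can only ever say what's authored into the structure.

```python
# A toy dialogue tree: every possible exchange is authored up front.
# All node names & lines are invented for illustration.
dialogue_tree = {
    "greeting": {
        "npc_line": "Welcome to my shop. What do you need?",
        "choices": {
            "Show me your wares.": "browse",
            "Heard any rumors?": "rumors",
        },
    },
    "browse": {"npc_line": "Finest steel in the province.", "choices": {}},
    "rumors": {"npc_line": "Goblins in the hills, they say.", "choices": {}},
}

def run_node(node_id: str) -> None:
    node = dialogue_tree[node_id]
    print(f"NPC: {node['npc_line']}")
    for i, choice in enumerate(node["choices"], start=1):
        print(f"  {i}. {choice}")

run_node("greeting")  # ask anything not listed here & the tree has no answer
```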
This approach has given us some of the most iconic characters in gaming history, no doubt. But the illusion is fragile. You can't ask a question the developer didn't anticipate. The NPC has no real memory of your unique journey beyond a few key plot flags.
LLM-Powered NPCs: Now, imagine a different kind of NPC. Instead of a script, you give it a "personality." You feed it a backstory, some key knowledge about the game world, & a set of rules for how to behave. From that point on, it generates its own dialogue.
Dynamic Conversations: You can type or speak a question, & the NPC understands the intent behind it, not just a keyword. It can have a real, flowing conversation. In one demo, a developer was able to just react to what an AI-powered character was saying, with zero scripting involved.
Emergent Behavior: When you have multiple LLM-powered NPCs in the same space, they can interact with each other. Imagine walking through a city & overhearing two guards having a genuine, unscripted conversation about the weather or a recent dragon attack. It makes the world feel ALIVE in a way that's been impossible until now.
Personalization: These NPCs can remember you. They can recall that you sold them a bunch of junk an hour ago or that you asked them about their family. This creates a persistent, evolving relationship with the characters in the world.
Why Go Local? The Magic of On-Device AI
So, if LLMs can do this, why aren't they in every game? The big barrier has been cost & latency. Sending every conversation to a cloud server costs money & takes time, which can lead to awkward pauses in dialogue. Running the LLM locally solves a bunch of these problems.
Cost-Efficiency: This is the big one. If the AI is running on the player's hardware, there are no API calls to pay for. A game called AI People even lets players switch to a local LLM to enable unlimited gameplay without consuming any credits. This makes truly dynamic, talkative NPCs financially viable for both developers & players.
No Internet? No Problem: One of the most obvious benefits is that the game can be played entirely offline. The AI is self-contained. This is a HUGE deal for game preservation & for players who don't have a stable internet connection.
Full Developer Control: When you're running a local model, you have total control. You don't have to worry about a third-party API changing, going down, or implementing content filters that might not align with your creative vision. You choose the model, you fine-tune it, & you own the experience.
Privacy: All the interactions are happening on the player's machine. No conversations are being sent to a server, which is a nice little bonus for player privacy.
Meet Ollama: Your Friendly Neighborhood LLM Server
This is where it all starts to get really practical. The idea of setting up a local LLM used to be a super technical, intimidating process. You had to wrestle with Python libraries, manage dependencies, & it was just a headache.
Ollama changes all of that.
In simple terms, Ollama is a tool that makes it incredibly easy to download, set up, & run powerful open-source LLMs (like Llama 3, Mistral, etc.) on your own computer. You install it, and with one command in your terminal, you can have a model running. It essentially turns your machine into a local AI server.
For a game developer, this is amazing. Instead of building a complex backend, your game just needs to send a simple request to the Ollama server running on the same machine. This is often done using a lightweight web server framework like Flask in Python, which acts as a bridge between the game engine (like Unreal or Unity) & the LLM. It’s a clean, straightforward way to integrate cutting-edge AI without a massive engineering overhead.
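Here's a minimal sketch of that bridge, assuming Flask & the requests library. The endpoint name, port 5000, & the persona text are placeholders I've picked for illustration; the Ollama chat endpoint & its default port 11434 are the real ones.

```python
# A minimal Flask bridge: the game engine POSTs the player's line here,
# & this server forwards it to the Ollama API on the same machine.
# The /npc route & persona are illustrative placeholders.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local port

PERSONA = "You are Kaelen, a grumpy dwarven blacksmith. Stay in character."

@app.post("/npc")
def npc():
    player_line = request.json["text"]
    payload = {
        "model": "llama3",
        "stream": False,  # one complete reply instead of a token stream
        "messages": [
            {"role": "system", "content": PERSONA},
            {"role": "user", "content": player_line},
        ],
    }
    reply = requests.post(OLLAMA_URL, json=payload, timeout=60).json()
    return jsonify({"reply": reply["message"]["content"]})

if __name__ == "__main__":
    app.run(port=5000)
```

From the engine's side, talking to an NPC is now just one HTTP POST to http://localhost:5000/npc with the player's line in the body.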
A Peek Under the Hood: How Does It Actually Work?
Alright, so let's get a little more granular. How do you go from installing Ollama to having a smart-mouthed goblin in your game? The process, at a high level, looks something like this:
Choose Your Brain: First, you pick an LLM. Different models have different strengths. A massive model might be great at general conversation, but a smaller model fine-tuned for roleplay, like "Nemotron-Mini", might generate better character responses. You pull this model down using Ollama.
The Prompt is Everything: You can't just let an LLM run wild. You need to give it context. This is called prompt engineering. For each NPC, you'd create a master prompt that acts as its personality file. It might look something like this:
"You are Kaelen, a grumpy dwarven blacksmith in the town of Stonehaven. You are 250 years old. You are proud of your work & suspicious of outsiders, especially elves. You know about the local mines & the recent goblin raids. You will respond to the player in short, gruff sentences. Do not break character."
The Game Engine Connection: Your game, whether it's built in Unreal Engine or Unity, needs to talk to Ollama. As mentioned, a common way is to make an HTTP request from the game's code (C++ in Unreal, C# in Unity) to a local server. So when a player talks, the game sends their words, along with the NPC's prompt, to the LLM.
Generating a Response: The LLM takes the player's input & the NPC's "personality prompt" and generates a response that fits the character. For example, if the player asks, "What's new?" Kaelen might reply, "More goblin trouble. Bad for business. Good for my forge." (A runnable sketch of this loop follows these steps.)
Bringing it to Life: The game engine receives this text back. From there, it can be displayed as on-screen text, or you can use Text-to-Speech (TTS) services to generate an audio file on the fly, and even use other plugins to create real-time lip-syncing for the character model.
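Putting steps 2 through 4 together, here's a rough sketch of the conversation loop, assuming the official ollama Python client (pip install ollama). The Kaelen persona comes from the example above; the rest is illustrative.

```python
# A sketch of the conversation loop for the Kaelen example, using the
# `ollama` Python client. History grows turn by turn so the NPC
# remembers the exchange within a session.
import ollama

persona = (
    "You are Kaelen, a grumpy dwarven blacksmith in the town of Stonehaven. "
    "You respond to the player in short, gruff sentences. Do not break character."
)
history = [{"role": "system", "content": persona}]

def talk(player_line: str) -> str:
    history.append({"role": "user", "content": player_line})
    response = ollama.chat(model="llama3", messages=history)
    npc_line = response["message"]["content"]  # dict-style access to the reply
    # Store the NPC's own words so later turns stay consistent with them.
    history.append({"role": "assistant", "content": npc_line})
    return npc_line

print(talk("What's new?"))  # e.g. "More goblin trouble. Bad for business."
```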
It's a multi-step process, but each step is becoming more and more accessible thanks to tools like Ollama & the amazing work being done by the open-source community. There are already tutorials on YouTube showing how to set up everything from the tokenizer to the final actor in Unreal Engine 5.
The Hurdles & Headaches: This Isn't a Silver Bullet (Yet)
Now, it's easy to get carried away by the hype, but we need to be realistic. This technology is still in its early days, & there are some MAJOR challenges to overcome.
Hardware is a BEAST: Running LLMs locally is demanding. The single biggest requirement is VRAM (video card memory). A decent conversational model might need a minimum of 8GB of VRAM, with 12GB or more being recommended for smoother performance. This immediately limits the potential audience to players with higher-end gaming PCs.
The Hallucination Problem: LLMs have a tendency to "hallucinate," or make things up. An NPC might confidently tell you about a quest that doesn't exist or a location that isn't in the game. This can be immersion-breaking at best & a confusing mess at worst. Developers need to build "guardrails" and use techniques like Retrieval-Augmented Generation (RAG) to keep the AI grounded in the game's actual lore & state (a minimal sketch follows this list).
Integration with Game Logic: This is probably the hardest problem to solve. It's one thing for an NPC to talk about a missing sheep. It's another thing for the game to actually spawn a sheep, create a quest objective, & provide a reward when you find it. Bridging the gap between the LLM's generated dialogue & the game's actual systems is a massive design & technical challenge (one common pattern is sketched after this list).
Consistency is Key: LLMs can be forgetful. An NPC might be friendly in one conversation & hostile in the next for no reason. Maintaining a consistent personality & memory over long play sessions is a tough nut to crack, especially as the conversation history grows (see the memory sketch below).
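For the hallucination problem, here's a minimal sketch of what RAG-style grounding can look like. The lore entries are invented, & the naive keyword scorer is just standing in for a proper embedding search; the idea is to hand the model only facts that actually exist in the game.

```python
# A minimal RAG-style guardrail: before each reply, retrieve the most
# relevant lore entries & inject them into the prompt so the model only
# talks about quests & places that exist. Lore text is invented; the
# keyword scorer stands in for a real embedding search.
LORE = [
    "The Stonehaven mines closed after the goblin raids last spring.",
    "The only active quest in town is 'Find the missing sheep'.",
    "There is no dragon anywhere in this region.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    words = set(query.lower().split())
    by_overlap = sorted(LORE, key=lambda doc: -len(words & set(doc.lower().split())))
    return by_overlap[:k]

def grounded_prompt(persona: str, player_line: str) -> str:
    facts = "\n".join(retrieve(player_line))
    return (
        f"{persona}\n\n"
        f"Known facts (only mention quests & places listed here):\n{facts}"
    )
```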
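For bridging dialogue & game logic, one common pattern is to have the model reply in structured JSON (Ollama's API can be asked to return JSON via its format option) & let the game dispatch only the actions it recognizes. The schema & handler below are invented for illustration.

```python
# One way to connect dialogue to game systems: the model replies with
# both a spoken line & a machine-readable action, & the game only acts
# on actions it knows. The schema & quest handler are illustrative.
import json

ACTION_PROMPT = (  # appended to the NPC's system prompt
    'Reply ONLY with JSON like '
    '{"say": "...", "action": "none|give_quest", "quest_id": null}, '
    "using quest ids from the known facts."
)

def handle_npc_reply(raw_reply: str) -> str:
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return raw_reply  # fall back to treating it as plain dialogue
    if data.get("action") == "give_quest" and data.get("quest_id"):
        start_quest(data["quest_id"])  # hook into your engine's quest system
    return data.get("say", "")

def start_quest(quest_id: str) -> None:
    print(f"[game] quest started: {quest_id}")  # stand-in for real game logic
```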
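And for consistency, a typical tactic is to pin the persona in place, keep the last few exchanges verbatim, & compress everything older into a short summary. Here's a sketch of that; the summarizer is a stub, & in practice you'd likely ask the LLM itself to write the summary.

```python
# A consistency sketch: keep the persona pinned, the last MAX_TURNS
# messages verbatim, & fold older turns into one summary message.
MAX_TURNS = 8

def trim_history(history: list[dict]) -> list[dict]:
    persona, turns = history[0], history[1:]  # history[0] is the system persona
    if len(turns) <= MAX_TURNS:
        return history
    old, recent = turns[:-MAX_TURNS], turns[-MAX_TURNS:]
    summary = {"role": "system", "content": "Earlier conversation: " + summarize(old)}
    return [persona, summary] + recent

def summarize(turns: list[dict]) -> str:
    # Stub: a real version would call the model to compress these turns.
    return f"{len(turns)} earlier messages about the player's past visits."
```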
From In-Game NPCs to Real-World Engagement
The cool thing about this technology is that its applications go way beyond just gaming. The core idea—creating an AI personality trained on specific data to interact with users—is incredibly powerful for businesses, too.
Think about it. The same way a developer feeds an NPC a backstory & world lore, a business can train an AI on its product documentation, FAQs, & brand voice. This is exactly the kind of solution that companies are building with platforms like Arsturn.
While a game developer is trying to create an immersive character, a business is trying to create a helpful, engaging customer experience. With Arsturn, a business can build a no-code AI chatbot that's trained on their own data. This isn't a generic chatbot; it's a custom AI that can answer specific customer questions, provide instant support, & engage with website visitors 24/7. It's about moving from a static, scripted FAQ page to a dynamic, conversational assistant that provides personalized help—much like the leap from a scripted NPC to a dynamic one. It’s all about using AI to build meaningful, personalized connections with your audience.
So, You Want to Build Your First AI NPC?
Feeling inspired? The best part about all this is that you can start experimenting RIGHT NOW. Here are a few first steps:
Install Ollama: Head to their website. It's a super simple installation process for Windows, Mac, & Linux.
Download a Model: Open your terminal & type `ollama pull llama3`. This will download a powerful, general-purpose model to get you started. For more character-focused stuff, look for models specifically fine-tuned for roleplaying on platforms like Hugging Face.
Play with Prompts: Before you even touch a game engine, just chat with the model through Ollama in your terminal (`ollama run llama3`). See if you can create a compelling character just through a well-written prompt.
Check out the Community: Dive into YouTube tutorials & Reddit communities like r/LocalLLaMA. There are people sharing code, offering advice, & showcasing their projects every single day.
The journey from a simple chat to a fully integrated in-game character is a long one, but it's not an impossible one. Not anymore.
We're at the very beginning of a new era in interactive entertainment. The line between a background character & a main character is about to get incredibly blurry. It's going to be a wild ride, with plenty of bumps along the way, but the potential for creating truly living, breathing worlds is immense.
Hope this was helpful! I'm super excited to see what kinds of crazy, unscripted adventures you all create. Let me know what you think.