8/10/2025

You Don't Need the Cloud: Building a Browser Extension for Local AI Chat with Ollama

Hey everyone, let's talk about something pretty cool. We're all getting used to AI assistants, right? They're everywhere. But most of them live on a server somewhere far away, which means your data is going there too. What if you could have a powerful AI, right in your browser, that runs entirely on your own machine? No cloud, no data sharing, just pure, private AI goodness.
Turns out, you absolutely can.
We're going to dive deep into how to build your very own browser extension that chats with a local AI model using a fantastic tool called Ollama. This isn't just a theoretical exercise; by the end of this, you'll have a solid blueprint for creating a practical, private AI assistant that can do things like summarize web pages, answer questions about what you're reading, or just be a handy brainstorming partner.
Honestly, once you get the hang of it, it's surprisingly straightforward & a TON of fun.

So, What's the Big Deal with Local AI?

First, why even bother with a local AI?
  1. Privacy is EVERYTHING: This is the big one. When you use a commercial AI service, your prompts, the text you highlight, everything you do gets sent to their servers. With a local model running via Ollama, not a single byte of your conversational data ever leaves your computer. It stays between your browser & your local machine. That's a HUGE win for privacy.
  2. No More API Bills: Running models locally means you're not paying per token or per API call. You download the model once, & you can use it as much as you want. It's free.
  3. Customization & Control: You get to choose the model. Want something small & fast like Llama 3's 8B version? Go for it. Need a more powerful model for complex tasks? You can run that too (hardware permitting). You're in complete control.
  4. It's Just Plain Cool: Let's be real, there's a certain magic to running a powerful language model on your own laptop. It feels like you're living in the future.
This is where Ollama comes in. Think of it like Docker, but for large language models. It's a tool that makes it incredibly easy to download, manage, & run open-source models like Llama 3, Gemma, & many others right on your own Mac, Windows, or Linux machine. It also conveniently exposes a REST API, which is the key to letting our browser extension talk to it.
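To give you a taste of what that API looks like (we'll actually install Ollama in a minute), a completion is just an HTTP POST to the local server on its default port, 11434. The model name below is only an example — you'd use whichever one you've pulled:

  curl http://localhost:11434/api/generate -d '{
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'

Our extension will do essentially the same thing, just with fetch() from a background script instead of curl.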

The Game Plan: Architecting Our Extension

Before we start slinging code, let's get a bird's-eye view of what we're building. A browser extension isn't just one single program; it's a collection of components that work together. For our local AI chat extension, we'll need a few key parts, all tied together by the manifest.json file.
  • manifest.json: This is the heart of the extension. It's a JSON file that tells Chrome (or any Chromium-based browser) everything it needs to know: the extension's name, version, what files it uses, & what permissions it needs. (There's a minimal sketch of one just after this list.)
  • Popup (popup.html & popup.js): This is our user interface. When you click the extension's icon in the toolbar, a little window pops up. This is where our chat interface will live. It's just a standard HTML file styled with CSS & powered by JavaScript.
  • Background Script (background.js): This is the workhorse. Since the popup window can be closed at any time, we need a script that lives outside the popup to manage the connection to Ollama & handle the core logic of the chat. In Manifest V3, this is a "service worker" — not truly persistent (the browser can suspend it when idle), but it's the right home for this logic.
  • Content Script (content.js, optional but recommended): If we want our extension to interact with the content of the web page we're on (for example, to grab the text of an article for summarization), we need a content script. This script gets injected directly into the web page.
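To make that concrete, here's a minimal sketch of what our manifest.json could look like. The name, permissions, & filenames are just one reasonable setup for this project, not the only way to do it:

  {
    "manifest_version": 3,
    "name": "Local AI Chat",
    "version": "1.0",
    "description": "Chat with a local Ollama model, right in your browser.",
    "action": { "default_popup": "popup.html" },
    "background": { "service_worker": "background.js" },
    "permissions": ["activeTab", "scripting"],
    "host_permissions": ["http://localhost:11434/*"]
  }

The host_permissions entry is what lets the extension call the local Ollama server from the background script. Depending on your setup, Ollama may also need to be told to accept requests from the extension's origin (the OLLAMA_ORIGINS environment variable controls this).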
The flow will look something like this:
  1. The user types a message in the popup UI.
  2. The popup.js sends that message to our background.js service worker.
  3. The background.js script makes an HTTP request to the local Ollama server (running on http://localhost:11434).
  4. Ollama processes the request with the chosen AI model & streams the response back.
  5. The background.js script receives the streaming response & sends it back to popup.js.
  6. The popup.js updates the chat UI in real-time, showing the AI's response as it's being generated.
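Here's a rough sketch of what steps 2 through 5 could look like from the background.js side. The port name, message shape, & model are placeholders I've picked for illustration, & error handling is left out to keep it short:

  // background.js (Manifest V3 service worker)
  // Waits for the popup to connect, forwards the conversation to Ollama,
  // & streams the reply back to the popup one chunk at a time.
  chrome.runtime.onConnect.addListener((port) => {
    if (port.name !== "ollama-chat") return; // port name is arbitrary

    port.onMessage.addListener(async (msg) => {
      const response = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "llama3",          // whichever model you've pulled
          messages: msg.messages,   // [{ role: "user", content: "..." }, ...]
          stream: true,
        }),
      });

      // Ollama streams newline-delimited JSON objects.
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffered = "";

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffered += decoder.decode(value, { stream: true });
        const lines = buffered.split("\n");
        buffered = lines.pop(); // hold on to any partial line for the next chunk
        for (const line of lines) {
          if (!line.trim()) continue;
          const chunk = JSON.parse(line);
          port.postMessage({ token: chunk.message?.content ?? "", done: chunk.done });
        }
      }
    });
  });

On the other end, popup.js would open that same port with chrome.runtime.connect({ name: "ollama-chat" }), send the conversation when the user hits enter, & append each token it gets back to the chat window.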
Pretty neat, right? Now let's get our hands dirty.

Step 1: Getting Ollama Up & Running

This is the easiest part.
  1. Head over to ollama.com & download the application for your operating system.
  2. Install it.
  3. Open your terminal or command prompt & pull a model. A great starting point is Llama 3's 8-billion parameter model. It's powerful but still relatively lightweight.
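Assuming the installer put the ollama command on your PATH, pulling & test-driving that model looks like this (the 8b tag is the 8-billion parameter variant):

  ollama pull llama3:8b
  ollama run llama3:8b "Say hello in exactly five words."

Once that works, the Ollama app keeps a local server listening on http://localhost:11434, which is exactly what our extension is going to talk to.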
