How to Run a Local GPT-OSS Agent on Your M-Series Mac
Zack Saadioui
8/11/2025
So You Want to Run a Local GPT-OSS Agent on Your M-Series Mac? Here's How.
Hey everyone, hope you're doing well. So, there's been a TON of buzz lately around running large language models locally. & if you're like me, a tinkerer with a shiny M-series Mac sitting on your desk, you've probably been wondering, "Can I get in on this action?" The answer is a resounding YES. & honestly, it's not as complicated as you might think.
OpenAI recently dropped their open-weight model, GPT-OSS, & the community has been going wild with it. The cool thing about this isn't just that you can run a powerful model for free on your own machine, but that you can give it agentic capabilities. We're talking about letting it use tools, browse the web, & even interact with your local files. This is all made possible with something called MCP support, which stands for Model Context Protocol.
In this guide, I'm going to walk you through EVERYTHING you need to know to get a GPT-OSS agent, complete with MCP support, running on your M-series Mac. We'll talk about the tools you'll need, why M-series Macs are weirdly perfect for this, & how to get it all set up, step-by-step. It’s gonna be fun.
First Off, Why Bother Running a Local LLM?
Before we dive into the "how," let's quickly cover the "why." Why go through the trouble of setting up a local model when you can just use ChatGPT or Claude via their websites?
Privacy, Baby! When you run a model locally, your data stays on your machine. Period. You're not sending your super-secret startup ideas or your cringey fan fiction to a third-party server. This is a HUGE deal for a lot of people & businesses.
No Internet? No Problem. Once you've downloaded the models, you can use them completely offline. This is awesome for when you're on the go or just have spotty Wi-Fi.
No Rate Limits or Fees. When you're running the model on your own hardware, you're not at the mercy of API rate limits or pay-per-token fees. You can experiment, tinker, & generate text to your heart's content without worrying about a bill.
Customization & Control. This is the big one for developers & power users. You can fine-tune these models on your own data, swap out different versions, & have granular control over things like the context window size.
Now, let's be real for a second. A local model like GPT-OSS isn't going to be as powerful as the full-blown GPT-4o or Claude 3.5 Sonnet. Those models run on massive server farms with more computing power than we have in our entire neighborhood. But for many tasks, especially development, testing agentic workflows, & personal productivity, these local models are more than capable.
M-Series Macs: The Unsung Heroes of Local AI
Here's something you might not know: Apple's M-series chips (M1, M2, M3, & now M4) are uniquely suited for running these AI models. Why? It all comes down to their Unified Memory Architecture.
In a traditional computer, the CPU & GPU have their own separate pools of memory. If the GPU needs data that's in the CPU's memory, it has to be copied over, which takes time & can be a bottleneck. Apple's M-series chips, on the other hand, have a single pool of memory that both the CPU & GPU can access directly.
This is a GAME CHANGER for AI models, which are often very memory-intensive. It means that these models can run much more efficiently on a Mac with, say, 32GB of unified memory than on a PC with a separate 16GB of RAM & 16GB of VRAM. This efficiency is further boosted by Apple's MLX framework, a library specifically designed to optimize machine learning on Apple silicon. Some implementations of GPT-OSS using MLX have seen performance of up to 40 tokens per second, which is blazingly fast for local inference.
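If you're curious what using MLX directly looks like (LM Studio & Ollama handle this for you under the hood), here's a minimal sketch using the mlx-lm package. The model path is a placeholder I made up, so swap in whichever MLX-converted gpt-oss build you actually find on Hugging Face:

```python
# Minimal sketch of running a model through Apple's MLX stack via the mlx-lm
# package (pip install mlx-lm). The repo name below is a placeholder; point
# load() at whatever MLX-converted gpt-oss build you actually downloaded.
from mlx_lm import load, generate

model, tokenizer = load("your-mlx-gpt-oss-repo-or-local-path")  # placeholder path
text = generate(
    model,
    tokenizer,
    prompt="Why is unified memory nice for local LLMs?",
    max_tokens=200,
)
print(text)
```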
The Tools of the Trade: LM Studio vs. Ollama
There are two main players in the game when it comes to running local LLMs on a Mac: LM Studio & Ollama. Both are fantastic tools that essentially download, manage, & serve up these models for you to use.
Ollama: This is a super popular, command-line-first tool. It's lightweight, easy to use if you're comfortable in the terminal, & has a huge community. You can download it, pull models with a simple command like ollama pull gpt-oss:20b, & then interact with it via the command line or by hooking it up to other applications. Ollama also has a basic UI & can be configured to work with various front-ends.
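As a quick taste of the "hooking it up to other applications" part, here's a minimal sketch of calling a locally running Ollama from Python. It assumes you've already pulled gpt-oss:20b & that Ollama is serving on its default port (11434):

```python
# Minimal sketch of talking to a locally running Ollama instance from Python.
# Assumes you've already run `ollama pull gpt-oss:20b` and that Ollama is
# serving on its default port (11434); adjust the model tag if yours differs.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gpt-oss:20b",      # the model tag you pulled
        "prompt": "Explain unified memory in one sentence.",
        "stream": False,             # return one JSON response instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])       # the generated text
```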
LM Studio: This is a more GUI-focused application. It has a really nice interface that lets you discover & download models, chat with them, & most importantly for our purposes, easily configure a local server with MCP support. While I've used Ollama for a long time, for getting started with agentic capabilities, I've recently switched to LM Studio. It just makes the whole process a bit more transparent & user-friendly, especially when it comes to managing context windows & tools.
For this guide, we're going to focus on LM Studio because of its excellent, built-in support for MCP, which is crucial for building a true "agent."
So, What Exactly IS "MCP Support"?
Okay, let's demystify this. MCP stands for Model Context Protocol. Think of it as a standardized way for an AI model to request access to tools. It's a bit like a universal remote for AI agents.
When an agent has MCP support, it means it can say, "Hey, I need to use the 'web search' tool," or "I need to use the 'read file' tool," in a way that the host application (like LM Studio) understands. The application then performs the action on the agent's behalf & feeds the result back to the model.
This is what elevates a simple chatbot into a powerful agent. It's the difference between a model that can only talk about what it was trained on & a model that can actively seek out new information, interact with your system, & perform tasks. Some common MCP "servers" or tools you might use include:
Basic Memory: Gives the agent a way to remember things between sessions.
Write Data: Allows the agent to save information to files.
Context7: Pulls up-to-date documentation for libraries & frameworks into the conversation, so the agent isn't limited to whatever its training data happened to include.
Web Searching: Lets the agent search the web for real-time information.
Python Code Execution: A VERY powerful tool that allows the agent to write & run Python code.
By enabling these tools, you're essentially giving your local GPT-OSS agent a set of superpowers.
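To make that loop concrete, here's a rough sketch of what "model asks for a tool, host runs it, result goes back" looks like from the API side once the local server from Step 4 is up. It uses the OpenAI-style tools field rather than the MCP wire protocol itself, & the web_search function (plus the model name) is a stand-in I made up for illustration, not something LM Studio ships:

```python
# Rough sketch of the "model asks for a tool, host runs it, result goes back" loop.
# Assumes LM Studio's OpenAI-compatible server is running on localhost:1234 and
# that the model decides to call the (made-up) web_search tool.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return a short snippet of results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the latest stable Python release?"}]
first = client.chat.completions.create(model="gpt-oss-20b", messages=messages, tools=tools)

call = first.choices[0].message.tool_calls[0]    # the model asked to use a tool
result = "Python 3.x.y (placeholder result from your real search tool)"

messages.append(first.choices[0].message)        # keep the tool request in the history
messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

final = client.chat.completions.create(model="gpt-oss-20b", messages=messages, tools=tools)
print(final.choices[0].message.content)          # answer grounded in the tool result
```

When you use LM Studio's own chat or server with MCP tools enabled, it handles this loop for you; the sketch is just to show the mechanics.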
Step-by-Step Guide: Running GPT-OSS with MCP on Your M-Series Mac
Alright, let's get our hands dirty. Here's how to get everything up and running.
Step 1: Download & Install LM Studio
First things first, head over to the lmstudio.ai website. Download the version for Apple Silicon (it's a .dmg file). The installation is as straightforward as it gets: just drag the LM Studio app into your Applications folder.
When you first launch it, you'll be greeted by a clean, modern interface. Take a moment to look around. You'll see a few main sections on the left-hand side: a search/discover page, a chat page, & a local server page.
Step 2: Download the GPT-OSS Model
Now for the fun part.
Click on the magnifying glass icon (the search page) in the top left.
In the search bar, type gpt-oss. You should see it pop up as one of the top results, likely from the official OpenAI Hugging Face repository.
On the right side of the screen, you'll see a list of different versions or "quantizations" of the model. These are essentially compressed versions that trade a tiny bit of performance for a much smaller file size & lower memory usage. For an M-series Mac with 16GB of RAM or more, you should be able to comfortably run the gpt-oss-20b model. I'd recommend starting with one of the Q5_K_M or Q4_K_M GGUF files. These tend to offer a good balance of quality & performance.
Click the Download button next to the model file you've chosen. It's a big file, so it might take a little while depending on your internet connection. You can see the download progress at the bottom of the app.
Step 3: Load the Model & Configure It
Once the download is complete, it's time to load the model into memory.
Click on the chat icon (the speech bubble) on the left-hand side.
At the top of the screen, you'll see a dropdown menu that says "Select a model to load." Click it & choose the gpt-oss model you just downloaded.
Now, this is an important tip: on the right-hand side, look for the Model Configuration panel. I highly suggest toggling on "Manually choose model load parameters." This gives you more control.
One of the most critical settings here is the Context Window Size (n_ctx). By default, LM Studio sets this to 4096 tokens, which is pretty good. But depending on your Mac's RAM, you can often push this higher. The context window is the model's "short-term memory." A larger context window means it can remember more of your conversation & any tool descriptions you've loaded. Try bumping it up to 8192 or even 16384 if you have 32GB of RAM or more. Just be aware that a larger context window will use more memory.
Also in the configuration panel, you can set a Preset. I recommend starting with the "LM Studio" or "ChatML" preset, as these are well-suited for conversational AI.
Once you've configured everything, the model will load into your Mac's memory. You can see the progress & resource usage (RAM, CPU) on the right side. It might take a minute or two. Once it's loaded, you can try having a basic chat with it right in this window to make sure it's working!
Step 4: Setting Up the Local Server with MCP Support
This is where the magic happens. We're going to start a local server that exposes the GPT-OSS model through an OpenAI-compatible API, and—crucially—we're going to enable MCP.
Click on the local server icon (it looks like <->) on the left.
At the top, make sure your gpt-oss model is selected.
On the right-hand side, you'll see a section for Tools (MCP). This is the promised land!
Click on "Select a Tools (MCP) configuration." You'll see an option to use a "Simple" server or to add your own. To get started, you can actually just start enabling the built-in tools. You'll see toggles for things like
1
core_memory
,
1
agent_manager
, etc.
To install new tools, you typically use an mcp.json file. Many open-source agentic tools will provide you with one. You can copy the contents of that JSON file into the configuration in LM Studio. For now, let's just get the server running.
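Before we do, just so you know what to expect, here's roughly what one of those mcp.json files looks like, using the common mcpServers layout & the official filesystem server as the example. The directory path is a placeholder, & the exact keys your tool expects may vary a bit, so defer to its docs:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/Documents"]
    }
  }
}
```

Once something like that is in place, the tools that server exposes should show up alongside the built-in ones.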
Click the big green "Start Server" button.
You should see logs start to appear in the center panel, indicating that the server is running. It will usually be on localhost:1234. You've now got a local, OpenAI-compatible API endpoint for your GPT-OSS model!
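If you want to double-check it from the code side, here's a tiny sanity check in Python. The only assumption is the default port of 1234; the model name is read straight from the server rather than hard-coded:

```python
# Quick sanity check that the LM Studio server is up, assuming the default
# port of 1234. The model name comes from the /v1/models listing, so there's
# nothing to hard-code.
import requests

# List the models the server is exposing.
models = requests.get("http://localhost:1234/v1/models").json()
print([m["id"] for m in models["data"]])

# Fire off a one-shot chat completion against the first listed model.
reply = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": models["data"][0]["id"],
        "messages": [{"role": "user", "content": "Say hello in five words."}],
    },
    timeout=300,
).json()
print(reply["choices"][0]["message"]["content"])
```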
What Now? Connecting an Agent to Your Local Server
Okay, so you have a server running. Now what?
Your local server at http://localhost:1234/v1/ can be used by any application that supports a custom OpenAI endpoint. This is incredibly powerful. You can point other AI development tools, coding assistants, or custom scripts at your local model instead of hitting OpenAI's paid API.
For example, you could use a tool like Sidekick or WaveTerm & configure them to use your local server. In the settings of these apps, you would:
Add a new custom model.
Give it a name, like "Local GPT-OSS".
Override the base URL to point to http://localhost:1234/v1/.
Set the API key to anything (it's not needed for the local server, but some apps require the field to be filled).
Now, when you use that application, it will be sending its requests to the GPT-OSS model running on your Mac, complete with the MCP tools you enabled in LM Studio. You can watch the server logs in LM Studio to see the requests coming in, including the tool calls! It's pretty cool to see the agent reasoning about which tool to use, making the call, & then getting the result back.
A Quick Word on Business Applications: Local vs. Cloud
Now, running a local agent is AMAZING for development, testing, & personal use. It's a fantastic way to learn about how AI agents work without racking up a huge bill.
However, if you're a business looking to deploy a customer-facing chatbot or an internal AI assistant, running it on a local Mac probably isn't the right move. You need something that's scalable, reliable, & easy for your whole team to manage.
This is where a platform like Arsturn comes in. Honestly, while I love tinkering with this local stuff, for my actual business needs, a managed solution is a no-brainer. Arsturn helps businesses build no-code AI chatbots trained on their own data. So, instead of messing with model files & server configurations, you can just upload your documents, connect your data sources, & deploy a custom chatbot to your website in minutes. It's designed to provide instant customer support, answer questions, & engage with website visitors 24/7, all without the headache of managing the underlying infrastructure. It's a conversational AI platform that helps businesses build meaningful connections with their audience through personalized chatbots.
So, think of it this way: use local agents on your Mac to experiment & learn, but when you're ready to build something for your business that needs to be robust & always-on, look to a solution like Arsturn.
Tying It All Together
And there you have it. That's the full rundown on how to get a powerful GPT-OSS agent running locally on your M-series Mac, complete with the agentic superpowers of MCP.
We covered why local LLMs are worth your time, how Apple's M-series chips give you an edge, & the difference between tools like Ollama & LM Studio. We walked through downloading & configuring the model in LM Studio, and, most importantly, how to start a local server with MCP support enabled.
The world of local AI is moving at a breakneck pace, & it's genuinely exciting that we can now run these incredibly capable models on our personal computers. It opens up a whole new world of possibilities for developers, researchers, & hobbyists.
So go ahead, give it a shot. Download LM Studio, grab the GPT-OSS model, & start experimenting. See what you can build. You might be surprised at what your Mac is capable of.
Hope this was helpful! Let me know what you think, or if you run into any issues. Happy tinkering!