Beyond Chatbots: How to Set Up a Truly Agentic CLI with Local LLMs
Zack Saadioui
8/11/2025
Alright, let's talk. We've all gotten pretty used to chatbots popping up on websites, answering our basic questions. And honestly, they're getting good. But what if you could take that same AI power, rip it out of the web browser, & run it right in your command line? Not just as a chat buddy, but as an agent—something that can actually do things on your computer.
We're talking about a level of automation & assistance that goes way beyond simple Q&A. Imagine typing "summarize this PDF & create a presentation for me," & watching it happen. Or "find the bug in this code, fix it, & commit the changes with a good message." This isn't science fiction anymore; it's what's possible with an agentic CLI powered by local Large Language Models (LLMs).
This is for the developers, the power users, the folks who live in the terminal & want to build their own custom AI assistant that’s private, powerful, & tailored to their exact workflow. It's a bit of a journey, but TOTALLY worth it.
Why Bother with a Local, Agentic CLI?
Before we dive into the "how," let's get into the "why." Why not just use a cloud-based service?
Here’s the thing:
Privacy is HUGE. When you're working with your own source code, sensitive documents, or proprietary data, sending it off to a third-party API is a non-starter for many. With a local LLM, everything—your prompts, your data, the model's responses—stays on your machine. Period. This is critical for industries like healthcare, finance, & law where compliance with regulations like HIPAA or GDPR is mandatory.
Ultimate Control & Customization. You pick the model. You fine-tune it on your own data. You define the tools it can use. You're not stuck with the "one-size-fits-all" approach of a public service. You can build a specialized agent that understands your codebase, your company's jargon, or your personal writing style.
No More API Fees. Those cloud API calls can add up, especially if you're doing a lot of experimentation. Running a model locally is a one-time hardware cost (or you can use what you already have). It's just more cost-effective in the long run, especially for heavy users.
Offline Capability. Once you have your setup, you don't need an internet connection to use it. This is pretty cool for working on the go or in environments with spotty connectivity.
It's the Future, Honestly. The command line is where well-defined tasks get done. It's the natural environment for engineering work, & integrating an LLM that can execute commands, manage files, & interact with your tools is the next logical step in developer productivity.
The Core Components of Your Agentic CLI
Okay, so you're sold on the idea. What do you actually need to build this thing? The architecture is generally made up of a few key parts:
The Local LLM & Its Runner: This is the brain of your operation. You'll need an open-source LLM & a way to run it efficiently on your machine.
The Agentic Framework: This is the "nervous system." It's the code or library that takes your instructions, reasons about them, & decides which tools to use to accomplish the task. It's what makes the LLM "agentic."
The Command-Line Interface (CLI): This is your entry point. It's how you'll interact with your agent from the terminal.
The Tools: These are the "hands" of your agent. They are scripts or functions that allow the agent to perform actions, like reading a file, searching the web, or executing a shell command.
Let's break down how to set each of these up.
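But first, to make that architecture concrete, here's the whole thing compressed into a rough Python skeleton. This is a minimal sketch, not production code: it assumes a local model server (we'll set up Ollama in Step 1) & optimistically assumes the model replies with valid JSON. The function names & prompts are ours, purely for illustration. Real frameworks add retries, parsing guards, & proper prompt templates.

import json
import subprocess
import sys

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# The "tools": plain Python functions the agent is allowed to call.
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def run_shell(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

TOOLS = {"read_file": read_file, "run_shell": run_shell}

# The "LLM runner": one HTTP call to the local model server.
def ask_llm(prompt: str) -> str:
    r = requests.post(OLLAMA_URL, json={"model": "llama3:8b", "prompt": prompt, "stream": False})
    return r.json()["response"]

# The "agentic framework": loop until the model stops asking for tools.
def run_agent(task: str, max_steps: int = 5) -> str:
    history = (
        f"Task: {task}\nAvailable tools: {list(TOOLS)}\n"
        'Reply ONLY with JSON: {"tool": "...", "arg": "..."} or {"answer": "..."}'
    )
    for _ in range(max_steps):
        decision = json.loads(ask_llm(history))  # optimistic: assumes valid JSON back
        if "answer" in decision:
            return decision["answer"]
        result = TOOLS[decision["tool"]](decision["arg"])
        history += f"\nTool {decision['tool']} returned:\n{result}"
    return "Ran out of steps."

# The "CLI": your entry point in the terminal.
if __name__ == "__main__":
    print(run_agent(" ".join(sys.argv[1:])))

That's really all an "agent" is: a loop that lets the model pick tools & see their results. Everything else is refinement.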
Step 1: Choosing & Running Your Local LLM
This is probably the most intimidating part, but it's gotten SO much easier. You don't need to be an AI researcher to get a powerful model running locally.
The Tool You Absolutely Need: Ollama
Honestly, just start with Ollama. It's a game-changer. Ollama is a desktop platform that lets you download, run, & manage a huge library of open-source LLMs with simple, one-line commands. Think of it like a package manager for AI models. It handles all the complexity of setting up the model & provides a local API endpoint that your agent can talk to.
To get started with Ollama:
Download & install it from ollama.com.
Open your terminal & pull a model. A great starting point is a versatile, smaller model. Try one of these:
ollama pull llama3:8b (Meta's latest powerful model)
ollama pull mistral (A very popular & capable model)
ollama pull codellama (Specialized for coding tasks)
And that's it. You now have a powerful LLM running on your machine. Pretty cool, right?
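To double-check that the local API endpoint is actually live, you can hit it from Python. A tiny sanity check, assuming you pulled llama3:8b & Ollama is on its default port (11434):

import requests

# Ask the local Ollama server for a one-shot, non-streamed reply.
reply = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3:8b", "prompt": "Say hi in five words.", "stream": False},
)
print(reply.json()["response"])

If that prints a greeting, your agent has a brain to talk to.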
Which Model Should You Choose?
The "best" model depends on your hardware & your tasks.
For General Purpose & Reasoning: Llama 3 (8B or 70B) is the current king of open-source models, great for dialogue & instruction-following. Mixtral 8x22B is another powerhouse, known for its strong multilingual & coding capabilities.
For Coding-Specific Tasks: CodeLlama is fine-tuned for code generation & understanding. If you're building a coding assistant, this is a must-try.
For Enterprise & RAG: Cohere's Command R+ has an open version available & is optimized for complex workflows that involve retrieving information from documents.
Start with a smaller model (like a 7B or 8B parameter version) to see how it performs on your machine. You can always upgrade to a larger, more powerful model if your hardware can handle it. Remember, larger models require more RAM & a beefier GPU.
Step 2: Picking an Agentic Framework
Now you need the logic that turns your LLM from a simple chatbot into an agent that can take action. You have two main paths here: use a pre-built agentic CLI tool or build your own with a framework.
Path A: Use a Pre-built Agentic CLI (The Fast Track)
Several open-source projects already provide a feature-rich CLI agent out of the box. These are awesome for getting started quickly.
agent-cli: This is a fantastic local-first toolkit. It comes with agents for autocorrecting text, transcribing audio with Whisper, text-to-speech, & a conversational chat agent with tool-calling abilities. It's designed to run 100% locally & can be set up with a docker-compose.yml file that spins up Ollama & other necessary services.
easy-llm-cli: A very user-friendly option built on Node.js. It's compatible with multiple LLM providers (including local ones) & allows you to automate tasks like analyzing a codebase or handling git operations. You can get started with a simple npx easy-llm-cli.
cli-agent: This project, which made the rounds on Reddit, is another great example of an agentic framework designed for arbitrary LLMs, including local ones run with Ollama. It supports sub-agents for complex tasks like "deep research," where it can spawn multiple "researcher" agents & a "summarizer" agent to create a report.
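You can mimic that sub-agent pattern yourself in a few lines. To be clear, this is NOT cli-agent's actual code, just a sketch of the shape of the idea, reusing the same Ollama endpoint as before; the model choices, prompts, & function names are all ours:

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(model: str, prompt: str) -> str:
    # One call to the local Ollama server; stream=False returns the full reply at once.
    r = requests.post(OLLAMA_URL, json={"model": model, "prompt": prompt, "stream": False})
    return r.json()["response"]

def deep_research(topic: str, angles: list[str]) -> str:
    # Spawn one "researcher" per angle, then hand all the notes to a "summarizer".
    notes = [ask("llama3:8b", f"Research {topic}, focusing on {a}. Be concise.") for a in angles]
    return ask("mistral", f"Combine these notes into a short report on {topic}:\n" + "\n---\n".join(notes))

print(deep_research("local LLM runtimes", ["performance", "hardware needs", "licensing"]))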
Path B: Build Your Own with a Framework (The Custom Route)
If you want more control & want to understand the nuts & bolts, building your own agent with a framework like LangChain, LlamaIndex, or AutoGen is the way to go.
LangGraph: Part of the LangChain ecosystem, LangGraph is perfect for building stateful, multi-step agents. You define your agent as a graph where nodes are actions (like querying the LLM or calling a tool) & edges are the transitions between them. A tutorial by Youness Mansar shows how you can build an agent with LangGraph, serve it with a FastAPI UI, & even deploy it.
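To give you a feel for it, here's a minimal two-node LangGraph sketch: a "draft" node & a "polish" node, both backed by a local model. It assumes the langgraph & langchain-ollama packages are installed & Ollama is running; the node names & the draft/polish split are illustrative, not from the tutorial.

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_ollama import ChatOllama

# State that flows between nodes: the user's question & the evolving answer.
class AgentState(TypedDict):
    question: str
    answer: str

llm = ChatOllama(model="llama3:8b")  # assumes Ollama is serving this model locally

def draft(state: AgentState) -> AgentState:
    reply = llm.invoke(state["question"])
    return {"question": state["question"], "answer": reply.content}

def polish(state: AgentState) -> AgentState:
    reply = llm.invoke(f"Tighten this answer: {state['answer']}")
    return {"question": state["question"], "answer": reply.content}

# Nodes are actions; edges are the transitions between them.
graph = StateGraph(AgentState)
graph.add_node("draft", draft)
graph.add_node("polish", polish)
graph.set_entry_point("draft")
graph.add_edge("draft", "polish")
graph.add_edge("polish", END)

app = graph.compile()
print(app.invoke({"question": "What is an agentic CLI?", "answer": ""})["answer"])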
AutoGen: Developed by Microsoft, AutoGen is a framework for building applications with multiple agents that can converse with each other to solve tasks. You could have a "coder" agent & a "critic" agent that work together to write & refine code. A popular YouTube tutorial by Matthew Berman shows how to power individual AutoGen agents with different local models served by Ollama & LiteLLM. This allows you to use a specialized model for each agent's role (e.g., CodeLlama for the coder, Mistral for the generalist assistant).
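Conceptually, the wiring looks something like this. This sketch assumes the classic pyautogen API & points it straight at Ollama's OpenAI-compatible /v1 endpoint rather than going through LiteLLM as the tutorial does; either way, each agent just needs an OpenAI-style URL & model name.

from autogen import AssistantAgent, UserProxyAgent

# Point AutoGen at Ollama's OpenAI-compatible endpoint.
llm_config = {
    "config_list": [{
        "model": "codellama",
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",  # Ollama ignores the key, but the client requires one
    }]
}

coder = AssistantAgent("coder", llm_config=llm_config,
                       system_message="You write clean Python code.")
user = UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)

user.initiate_chat(coder, message="Write a function that reverses a string.", max_turns=2)

Swap the model name per agent (CodeLlama for the coder, Mistral for a generalist) & you've got the multi-model setup from the video.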
A Simple DIY Example using Python & Flask
To give you a concrete idea, here's a conceptual "hello world" for an agentic CLI, inspired by a Medium tutorial.
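Since we're not reproducing the tutorial's code, treat this as our own minimal sketch of the idea: a tiny Flask service that wraps Ollama, asks the model to turn a task into a shell command, runs it, & explains the output. The route name, prompts, & model choice are all illustrative, & a real agent would sandbox that subprocess call rather than trusting the model.

import subprocess

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_llm(prompt: str) -> str:
    # One non-streamed call to the local model.
    r = requests.post(OLLAMA_URL, json={"model": "llama3:8b", "prompt": prompt, "stream": False})
    return r.json()["response"]

@app.route("/agent", methods=["POST"])
def agent():
    task = request.json["task"]
    # Step 1: ask the model to turn the task into a single shell command.
    cmd = ask_llm(f"Reply with ONLY a safe, read-only shell command for: {task}")
    # Step 2: execute it & capture the output (sandbox this in anything real!).
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=30).stdout
    # Step 3: let the model explain the result in plain English.
    summary = ask_llm(f"The command `{cmd}` printed:\n{out}\nSummarize for the user.")
    return jsonify({"command": cmd, "output": out, "summary": summary})

if __name__ == "__main__":
    app.run(port=5000)

Run it, then POST {"task": "how much disk space is left?"} to http://localhost:5000/agent from another terminal, & you'll get back the command the model chose, the raw output, & a plain-English summary. That's the whole agentic loop, end to end, on your own machine.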