8/27/2024

Setting Up Local RAG with Ollama

If you’re diving into the world of Retrieval-Augmented Generation (RAG) and like the idea of running it locally, you’ve come to the right place! With tools like Ollama for serving Large Language Models (LLMs) on your own hardware, it’s now possible to run a sophisticated chatbot or data retrieval system right from your own machine. This guide will walk you through the steps to set up RAG with Ollama, providing you with a handy tool for enhancing your applications.

What is RAG and Why Use It?

RAG stands for Retrieval-Augmented Generation, an approach that combines the generative capabilities of LLMs with retrieval from your own databases or knowledge bases, so answers are grounded in your data. Here’s why you might want to set up RAG locally:
  • Full Customization: Running RAG locally means complete control over the setup and customization of your model. You have the power to fine-tune your system without relying on external services.
  • Enhanced Privacy: Maintaining sensitive data locally helps mitigate risks associated with sending confidential information over the internet.
  • Data Security: Using local models means you’re less exposed to potential data breaches and misuse from remote servers.
  • Independence from Internet Connectivity: When everything is set up locally, you can run your chatbot or AI-based applications even without a stable internet connection.
Let’s explore how you can set this up step-by-step!

Prerequisites for Setting Up RAG

Before you dive into installation, make sure you have the following:
  • Python 3: You’ll need this versatile programming language to write code for your RAG application. Check Python's official page for installation.
  • ChromaDB: This is a vector database used to store and manage embeddings data. It allows you to work with large datasets efficiently.
  • Ollama: Head over to the Ollama download page to install the LLM server that runs the models locally.

Step-by-Step Setup

Step 1: Installing Python 3

  • Download & install Python from the official website. Make sure to add Python to your system PATH.
  • Verify your installation by running this command in your terminal:
```bash
python3 --version
# Should report version 3.11.7 or later
```

Step 2: Create a Virtual Environment

Creating a virtual environment helps keep your projects clean and organized. Set it up as follows:
```bash
mkdir local-rag
cd local-rag
python3 -m venv venv
source venv/bin/activate  # On Windows, it's venv\Scripts\activate
```

Step 3: Installing Required Libraries

With your virtual environment activated, install the required dependencies. Run:
```bash
pip install -q chromadb
pip install -q unstructured langchain langchain-community langchain-text-splitters
pip install -q flask python-dotenv
```

Step 4: Installing Ollama

You should get Ollama running on your machine. Here’s how:
  • Download Ollama from its official website for your operating system.
  • Confirm that Ollama is installed correctly:
```bash
ollama --version
# Expected output: ollama version x.xx.x
```
  • Pull in the model you’ll need. For example, let’s grab the Mistral model:
```bash
ollama pull mistral
```
  • If you need a text embedding model, pull it as well:
```bash
ollama pull nomic-embed-text
```
  • Start the Ollama server so the models are available to your application:

```bash
ollama serve
```
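Once the server is up, you can optionally sanity-check it from another terminal; `ollama list` should show the models you pulled, and Ollama's HTTP API listens on port 11434 by default:

```bash
# List the models that have been pulled locally
ollama list

# Or hit the HTTP API directly (11434 is Ollama's default port)
curl http://localhost:11434/api/tags
```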

Step 5: Build Your RAG Application Code

Create a new file named `app.py` where you’ll define the main functionality. This file will handle requests and manage actions related to your RAG app. Here’s a simple example:

```python
import os
from dotenv import load_dotenv

load_dotenv()

from flask import Flask, request, jsonify
from embed import embed
from query import query
from get_vector_db import get_vector_db

TEMP_FOLDER = os.getenv('TEMP_FOLDER', './_temp')
os.makedirs(TEMP_FOLDER, exist_ok=True)

# Initialize Flask application
app = Flask(__name__)

# Route for embedding an uploaded file into the vector database
@app.route('/embed', methods=['POST'])
def route_embed():
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400
    embedded = embed(file)
    if embedded:
        return jsonify({'message': 'File embedded successfully'}), 200
    return jsonify({'error': 'File embedding failed'}), 400

# Route for querying the model against the embedded documents
@app.route('/query', methods=['POST'])
def route_query():
    data = request.get_json()
    response = query(data.get('query'))
    if response:
        return jsonify({'message': response}), 200
    return jsonify({'error': 'Something went wrong'}), 400

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=True)
```

This code initializes a Flask app with two main routes: one for embedding files and one for querying the model.
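The `app.py` above (and the `embed.py` and `query.py` files you’ll create next) all import a small helper called `get_vector_db`, which the snippets assume but don’t show. Here’s a minimal sketch of what `get_vector_db.py` could look like, assuming you store embeddings in ChromaDB and use the `nomic-embed-text` model pulled earlier; the environment variable names and defaults are placeholders you can adjust:

```python
import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Placeholder settings; adjust paths and names to your setup
CHROMA_PATH = os.getenv('CHROMA_PATH', 'chroma')
COLLECTION_NAME = os.getenv('COLLECTION_NAME', 'local-rag')
TEXT_EMBEDDING_MODEL = os.getenv('TEXT_EMBEDDING_MODEL', 'nomic-embed-text')

# Return a Chroma vector store backed by Ollama embeddings
def get_vector_db():
    embedding = OllamaEmbeddings(model=TEXT_EMBEDDING_MODEL)
    return Chroma(
        collection_name=COLLECTION_NAME,
        persist_directory=CHROMA_PATH,
        embedding_function=embedding,
    )
```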

Step 6: Handle Embedding with `embed.py`

In a new file named `embed.py`, handle the embedding process:

```python
import os
from datetime import datetime
from werkzeug.utils import secure_filename
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from get_vector_db import get_vector_db

TEMP_FOLDER = os.getenv('TEMP_FOLDER', './_temp')

# Check whether the uploaded file is allowed (PDFs only)
def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() == 'pdf'

# Save the uploaded file to the temp folder with a timestamped name
def save_file(file):
    ct = datetime.now()
    ts = ct.timestamp()
    filename = str(ts) + '_' + secure_filename(file.filename)
    file_path = os.path.join(TEMP_FOLDER, filename)
    file.save(file_path)
    return file_path

# Main embedding function: load the PDF, split it into chunks, and store them in the vector database
def embed(file):
    if file.filename == '' or not allowed_file(file.filename):
        return False
    file_path = save_file(file)
    loader = UnstructuredPDFLoader(file_path=file_path)
    data = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
    chunks = text_splitter.split_documents(data)

    db = get_vector_db()
    db.add_documents(chunks)
    db.persist()
    os.remove(file_path)
    return True
```
This code manages file uploads, validates them, and processes them into usable text chunks for embedding.

Step 7: Process User Queries in `query.py`

Now, let’s create `query.py` to handle user queries and generate answers:
```python
import os
from langchain_community.chat_models import ChatOllama
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from get_vector_db import get_vector_db

LLM_MODEL = os.getenv('LLM_MODEL', 'mistral')

# Process a user query: retrieve relevant chunks, then ask the LLM to answer from them
def query(question):
    db = get_vector_db()
    retriever = db.as_retriever(search_type='similarity')

    # Fetch the document chunks most similar to the question
    docs = retriever.get_relevant_documents(question)
    context = '\n\n'.join(doc.page_content for doc in docs)

    # Ask the model to answer using the retrieved context
    llm = ChatOllama(model=LLM_MODEL)
    prompt = PromptTemplate.from_template(
        'Answer the question based on the following context:\n\n{context}\n\nQuestion: {question}'
    )
    chain = prompt | llm | StrOutputParser()
    final_response = chain.invoke({'context': context, 'question': question})
    return final_response
```
This will manage how queries from users are processed through the model.

Step 8: Testing Your Local RAG Application

With all the pieces in place, it’s time to test whether your setup works. Start your server by running:
```bash
python3 app.py
```
The server listens at `http://localhost:8080`. You can build a simple form or use a tool like Postman to submit files you want to embed and to send queries.
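If you prefer the command line, here’s roughly how you could exercise both routes with curl; the file name and question below are just placeholders:

```bash
# Embed a PDF (replace document.pdf with your own file)
curl -X POST -F 'file=@document.pdf' http://localhost:8080/embed

# Ask a question about the embedded content
curl -X POST -H 'Content-Type: application/json' \
     -d '{"query": "What is this document about?"}' \
     http://localhost:8080/query
```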

Why Ollama is the Best Choice

Ease of Installation

Ollama makes it easy to spin up a local environment with powerful models without going through complex processes that tend to derail most beginners. You just have to pull the models you want, and Ollama does the heavy lifting behind the scenes.

Deep Integration with Other Tools

Many examples showcase LangChain, ChromaDB, or even FAISS in conjunction with Ollama, emphasizing its versatility across projects. Want to enhance your RAG implementation? Integrate with tools that suit your specific needs effortlessly!
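For example, if you wanted to experiment with FAISS instead of ChromaDB, a hypothetical variant of the vector store helper might look like this (assuming you also `pip install faiss-cpu`; the path and function name are placeholders):

```python
import os
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

FAISS_PATH = os.getenv('FAISS_PATH', 'faiss_index')  # placeholder path

# Build a FAISS index from already-split document chunks and save it to disk
def build_faiss_index(chunks):
    embedding = OllamaEmbeddings(model='nomic-embed-text')
    db = FAISS.from_documents(chunks, embedding)
    db.save_local(FAISS_PATH)
    return db
```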

Boosting Your Project with Arsturn

Once you’ve got your local RAG set up and running, you might want to think about engaging your audience differently. That’s where Arsturn comes in! It lets you instantly create custom ChatGPT chatbots for your website, enhancing user engagement and conversions. You can create chatbots tailored specifically to your audience’s needs without any complex coding knowledge required!
  • Effortless Customization: You can create functional chatbots tailored to your brand image.
  • Gain Insights: Every interaction offers analytics you can use to improve your services.
  • Seamless Integration: Embed your Arsturn chatbot in just a few clicks.
Head to Arsturn.com for more information and start crafting AI-driven engaging chat experiences!

Conclusion

Setting up a local RAG system with Ollama can be an exciting journey into the capabilities of AI and LLMs. With the guidelines laid out in this post, you’re well-equipped to build your very own local system. Also, don’t forget the potential of enhancing your audience interaction with tools like Arsturn. Embrace this cutting-edge technology to keep moving forward in today’s digital landscape!
Happy coding!

Copyright © Arsturn 2024