8/27/2024

The Power of Integrating Ollama with MongoDB Atlas Vector Search

Have you been wondering how to start creating your own RAG (Retrieval-Augmented Generation) application without getting bogged down in the nitty-gritty of initializing and running large language models (LLMs) locally? Well, today we’re diving deep into the open-source tool Ollama. This tool allows you to kickstart popular LLMs like Llama2, Mistral, and others with incredible ease, abstracting away the complexities of model management behind a simple library & API.
In this blog, we’ll explore step-by-step how to build a powerful RAG application by integrating Ollama with MongoDB Atlas Vector Search, along with the versatile Langchain framework. So, grab your coding hat & let’s get cracking on creating the NEXT game-changing AI product together!

Why Use MongoDB Atlas Vector Search for RAG Applications?

MongoDB Atlas Vector Search brings a compelling solution to the table for developers building RAG applications. It pairs standard database management with advanced search capabilities, allowing efficient storage & retrieval of vectorized data alongside operational and transactional data through its flexible schema. For applications that rely on understanding & processing large volumes of text & complex data types, fast and accurate search within high-dimensional vector spaces proves invaluable. Here’s why it’s a no-brainer to team up MongoDB Atlas with Ollama:
  • Efficient information retrieval: Quick retrieval of relevant information enhances the performance of RAG applications, leading to more precise & contextual responses.
  • Robust management: MongoDB Atlas complements Ollama’s processing prowess, letting you easily manage & analyze large volumes of data.

What is Ollama?

Ollama is a lightweight, flexible framework that simplifies the deployment of LLMs on consumer-grade hardware. The best part? It allows you to harness the power of advanced AI models without the need for powerful GPUs. Ollama's architecture bundles model weights, configurations, & data into a unified package, making it easy to interact with various LLMs, like Llama2, Mistral, & Gemma. The added flexibility of deploying on standard consumer hardware, including Macs, Linux, & Windows systems, means you can create AI applications virtually anywhere!
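To get a feel for just how lightweight this is in practice, here’s a minimal sketch that calls Ollama’s local REST API directly using Python’s `requests` library. It assumes the Ollama server is already running on its default port (11434) and that the llama2 model has been pulled, which we cover below:
```python
# A minimal sketch of calling Ollama's local REST API directly.
# Assumes the Ollama server is running on its default port (11434)
# and that the llama2 model has already been pulled.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Explain RAG in one sentence.", "stream": False},
)
print(response.json()["response"])
```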

Prerequisites

Before we jump into the coding part, make sure you have the following:
  • Basic knowledge of Python & MongoDB.
  • An environment capable of running Python programs (like a local machine or a cloud-based IDE).
  • A MongoDB Atlas account & a cluster set up.
  • Access to Ollama & the necessary Python packages, including `langchain`, `pymongo`, etc.

Step-by-Step Guide to Integration

Let’s break down how we can seamlessly integrate Ollama with MongoDB in a few simple steps:

Step 1: Environment Setup

Start by installing the required Python packages. Ensure you have the `streamlit`, `pymongo`, & `langchain` dependencies installed. Don't forget to create a requirements.txt file to keep track of what you need:
```
langchain
pymongo
streamlit
sentence_transformers
```
Also, download & install Ollama from the official Ollama website to kick things off. After you've set everything up, run the following command to pull the Llama2 model locally:
```bash
ollama pull llama2
```

Step 2: MongoDB Atlas Configuration

Next, you’ll want to utilize MongoDB Atlas to store & manage your dataset. For this example, we’ll use a collection called `movies`, which will serve as the foundation of our RAG application. Sign into your MongoDB Atlas account, create a cluster (try the free M0 tier), & ensure your database is populated with relevant data (for instance, movie titles & plots).
Follow the steps in MongoDB's sample data documentation to load the sample dataset (which includes `sample_mflix`) into your MongoDB cluster.
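If you'd like to confirm the data landed before moving on, a quick `pymongo` connectivity check works as a minimal sketch (the connection string below is a placeholder; substitute your own Atlas URI):
```python
# Sanity check: confirm the sample_mflix.movies collection is populated.
# Replace the placeholder URI with your own Atlas connection string.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:password@<your-atlas-uri>/?retryWrites=true&w=majority")
print(client["sample_mflix"]["movies"].count_documents({}))  # should print a non-zero count
```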

Step 3: Initialize Ollama & MongoDB Clients

Now it’s time to integrate the Ollama model capabilities with MongoDB. This step involves setting up the database connection using your MongoDB URI & initializing the Ollama model with your desired configuration. Here’s how you can do that:
Create a config.py file with the following code:
```python
mongo_uri = "mongodb+srv://user:password@<your-atlas-uri>/?retryWrites=true&w=majority"
db_name = "sample_mflix"
coll_name = "movies"
```
You will also need to create a MongoDB Atlas Vector Search index (name it `vector_index`, which the app code below expects). The index definition for vector search will look something like this:
```json
{
  "fields": [
    {
      "numDimensions": 384,
      "path": "plot_embedding_hf",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
```
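A note on `numDimensions`: 384 isn't arbitrary; it matches the output dimension of the all-MiniLM-L6-v2 embedding model used throughout this guide. If you ever swap in a different model, a quick check like this tells you what value the index needs:
```python
# Confirm the embedding dimension matches numDimensions in the index definition
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
print(len(model.encode("dimension check")))  # prints 384 for all-MiniLM-L6-v2
```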

Step 4: Encode the Documents

To make the MongoDB collection efficiently retrievable, configure text embeddings for vector search using HuggingFace embeddings. This means you’ll transform movie descriptions into searchable vectors, making it easy to retrieve relevant information based on user queries.
Create an encoder.py file and add the following code to encode the movie documents:
```python
from sentence_transformers import SentenceTransformer
import pymongo

import config

# Initialize the database connection
client = pymongo.MongoClient(config.mongo_uri)
collection = client[config.db_name][config.coll_name]

# Set up the transformer model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Embed documents that don't have a vector yet
# (limited to 10 per run for demo purposes; remove .limit() to embed everything)
for doc in collection.find({"plot_embedding_hf": {"$exists": False}}).limit(10):
    title = doc["title"]
    print("computing vector... title: " + title)

    # Embed the title plus the full plot, when one is available
    text = title
    fullplot = doc.get("fullplot")
    if fullplot:
        text += ". " + fullplot

    vector = model.encode(text).tolist()
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"plot_embedding_hf": vector}},
    )
    print("vector computed: " + str(doc["_id"]))
```
Run this encoder.py file to see the movie documents embedded with new vectors.
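Before wiring up Langchain, you can optionally exercise the index directly with MongoDB's `$vectorSearch` aggregation stage. This is a hedged sketch; it assumes the `vector_index` search index defined above has finished building:
```python
# Test the Atlas vector index directly with a $vectorSearch aggregation
from pymongo import MongoClient
from sentence_transformers import SentenceTransformer

import config

client = MongoClient(config.mongo_uri)
collection = client[config.db_name][config.coll_name]
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode the query with the same model used for the documents
query_vector = model.encode("a detective investigates a murder").tolist()

results = collection.aggregate([
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "plot_embedding_hf",
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
])
for doc in results:
    print(doc["title"], doc["score"])
```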

Step 5: Building the Streamlit App

Let’s create an interactive, user-friendly interface using Streamlit. This app will contain an input field for user queries & a button to trigger the retrieval and generation process.
Create a main.py file & add the following code:
```python
import streamlit as st
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from pymongo import MongoClient

import config

# Initialize MongoDB client
client = MongoClient(config.mongo_uri)
collection = client[config.db_name][config.coll_name]

# Initialize text embedding model (encoder)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

index_name = "vector_index"
vector_field_name = "plot_embedding_hf"
text_field_name = "title"

# Point the vector store at the MongoDB Atlas collection & search index
vectorStore = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=embeddings,
    index_name=index_name,
    embedding_key=vector_field_name,
    text_key=text_field_name,
)

# Callbacks for token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Run the Llama2 model through Ollama
llm = Ollama(model="llama2", callback_manager=callback_manager)


# Streamlit app
def main():
    st.title("Movies Retrieval GPT App")

    # User input
    query = st.text_input("Enter query:")

    # Retrieve context data
    retriever = vectorStore.as_retriever()

    # Query the LLM with the user input & retrieved context
    if st.button("Query LLM"):
        with st.spinner("Querying LLM..."):
            qa = RetrievalQA.from_chain_type(
                llm, chain_type="stuff", retriever=retriever
            )
            response = qa({"query": query})
            st.text("Llama2 Response:")
            st.text(response["result"])


if __name__ == "__main__":
    main()
```
Run `streamlit run main.py`, then navigate to `localhost:8501` to start asking questions about movies. When a user query comes in, Langchain will use the configured vector search to retrieve relevant data from MongoDB Atlas, passing the context along to Ollama to generate a tailor-made response. Check the MongoDB database to verify that the movie “Space Jam” is there, then try asking about it!
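One knob worth knowing about: `as_retriever` accepts a `search_kwargs` dictionary, which controls how many context documents get stuffed into the prompt. The `k=5` below is just an illustrative choice:
```python
# Retrieve the top 5 most similar movie documents for each query
retriever = vectorStore.as_retriever(search_kwargs={"k": 5})
```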

Step 6: Performance Optimization & Troubleshooting

As you dive deeper into building with Ollama & MongoDB, it's critical to monitor performance & iron out any potential issues. Here are some pointers to optimize your applications:
  • Make sure to set proper indexing on your MongoDB collections to speed up retrieval times.
  • For your embeddings, consider using different models based on the nature of the data (complex vs simple queries).
  • Utilize caching mechanisms to speed up responses (see the sketch below).
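On that last point, Streamlit's built-in `st.cache_resource` decorator is a natural fit: it keeps heavyweight objects alive across app reruns instead of reloading them on every interaction. A minimal sketch, assuming the same models used above:
```python
# Cache heavyweight objects so they load once, not on every Streamlit rerun
import streamlit as st
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama


@st.cache_resource
def load_embeddings():
    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


@st.cache_resource
def load_llm():
    return Ollama(model="llama2")
```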

Conclusion: The Future of AI Applications

The integration of Ollama with MongoDB really opens up a plethora of opportunities for developers looking to build cutting-edge RAG applications with minimal hassle. Whether you're a fledgling developer or a seasoned pro, leveraging the strengths of both technologies can lead to innovative solutions that have the potential to revolutionize how we perceive & interact with AI.
If you’re keen on maximizing your audience engagement & boosting conversions, I highly recommend checking out Arsturn. Arsturn empowers you to create custom chatbots utilizing powerful conversational AI without the need for coding. Their platform allows you to build AI solutions tailored specifically for your audience, growing your brand effortlessly!
The tools available today can dramatically improve your user experience, so dive in headfirst & start building some incredible AI-powered applications. Happy building, everyone!

Copyright © Arsturn 2024