8/26/2024

LlamaIndex Tutorial: Your First Steps

Welcome to the LlamaIndex adventure! If you’re digging into the world of Large Language Models (LLMs), then you're in for a real treat. This tutorial is packed with everything you need to get started with LlamaIndex and create your mini query engine using OpenAI's powerful models. Let’s dive right in!

What Is LlamaIndex?

LlamaIndex is your go-to framework for building context-augmented applications powered by LLMs. It's crafted to bridge the gap between these powerful AI models and your own private, domain-specific data. With LlamaIndex, you get structured ingestion, organization, and querying of diverse data sources, including APIs, databases, and documents. So, whether you're a novice just starting out or a savvy developer looking for advanced customization, there's something in the LlamaIndex toolkit for everyone!

Why Use LlamaIndex?

  • Seamless Integration: Easily link various data sources like PDFs or SQL databases with LLMs.
  • Efficient Querying: Natural language querying is made simple, letting you sift through your private data without a hitch.
  • Customizable Options: From high-level APIs for beginners to low-level access for experts, explore LlamaIndex's extensive capabilities.

Setting Up LlamaIndex

Before we start coding, there are a few things we need to set up.

1. Installation

To install LlamaIndex, you'll want to use `pip`. Simply run this command:

```bash
pip install llama-index
```
Don’t forget to install any additional requirements if prompted. You should also check that you have a recent version of Python; LlamaIndex supports Python 3.8 and later.
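Since the minimum Python version matters here, a quick check from Python itself can confirm your interpreter qualifies. This is an optional convenience sketch, not part of the official setup; the `python_ok` helper is hypothetical:

```python
import sys

def python_ok(minimum=(3, 8)):
    """Return True when the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum

print("Python version OK:", python_ok())
```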

2. Download the Data

For our first example, we’re going to use Paul Graham's essay, “What I Worked On.” It’s a great piece to show how LlamaIndex processes various kinds of text. You can easily grab this data by creating a folder called `data` and downloading the text file from the following link: Download Paul Graham’s Essay.

3. Set Up Your OpenAI API Key

LlamaIndex uses OpenAI's `gpt-3.5-turbo` model by default. You’ll need to set your OpenAI API key to access this. Here’s how you can do it:

  • On macOS/Linux, use this command in your terminal:

```bash
export OPENAI_API_KEY=YOUR_API_KEY_HERE
```

  • If you’re on Windows, the command looks a bit different:

```bash
set OPENAI_API_KEY=YOUR_API_KEY_HERE
```

Replace `YOUR_API_KEY_HERE` with your actual OpenAI API key, which you can get from OpenAI’s API key page.
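Before running any scripts, you might want to verify that the key is actually visible to Python. This is an optional sanity-check sketch; the `api_key_present` helper is hypothetical and does not validate the key against OpenAI's servers:

```python
import os

def api_key_present():
    """Check that OPENAI_API_KEY is set and is not still the placeholder."""
    key = os.environ.get("OPENAI_API_KEY", "")
    return bool(key) and key != "YOUR_API_KEY_HERE"

if not api_key_present():
    print("OPENAI_API_KEY is missing or still set to the placeholder.")
```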

Your First LlamaIndex Script

Now that we've got all our ducks in a row, let’s jump into the heart of the action. Create a file named `starter.py` in your working directory. We’ll kick things off with the following code:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load every document in the data folder and build a vector index from it.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
```

This simple snippet loads the documents from the `data` folder and creates an index from them. How easy was that?

Visualizing Your Structure

Your directory should look something like this:

```
├── starter.py
└── data
    └── paul_graham_essay.txt
```

Querying Your Data

With your index built, let’s move on to asking some questions! We can add the following lines to our `starter.py` file:

```python
query_engine = index.as_query_engine()
response = query_engine.query("What did the author focus on growing up?")
print(response)
```
When you run your script now, it should give you a brief answer based on the essay. It might say something like, "The author focused on writing and programming outside of school," or similar context that matches the query.

Logging: Peek Under the Hood

If you want to see more of what’s going on behind the scenes while the program runs, you can add logging. Just add these lines to the top of your `starter.py`:

```python
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
```

This will give you a more verbose output! You can set the level to `INFO` if you don’t want as much detail.

Storing Your Index

By default, your index lives only in memory, which is fine for one-off runs but wasteful if you rerun the script, since the embeddings get rebuilt every time. To avoid that, let’s store the index. Add this line:

```python
index.storage_context.persist()
```

This command persists your index to disk, making it quicker to load next time. It stores your embeddings in a `storage` directory.

Loading Existing Index

We can also check if the stored index exists before creating a new one. Here’s how:

```python
import os.path

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # First run: build the index from the documents and persist it.
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Later runs: reload the stored index instead of rebuilding it.
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
```

With this code in place, your script will now check whether a previous index exists; if it does, it loads that instead of creating a new one.
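If you want to see this "build once, load after" control flow in isolation, here is a stdlib-only sketch of the same pattern with no LlamaIndex or API calls involved; the `build_or_load` helper is a hypothetical stand-in for the index-building code:

```python
import json
import os
import tempfile

def build_or_load(persist_dir, build):
    """Run `build()` and persist its result on the first call; load it afterwards."""
    path = os.path.join(persist_dir, "index.json")
    if not os.path.exists(path):
        data = build()  # the expensive step, e.g. embedding documents
        os.makedirs(persist_dir, exist_ok=True)
        with open(path, "w") as f:
            json.dump(data, f)
        return data, "built"
    with open(path) as f:
        return json.load(f), "loaded"

tmp = tempfile.mkdtemp()
_, first = build_or_load(tmp, lambda: {"docs": 1})
_, second = build_or_load(tmp, lambda: {"docs": 1})
print(first, second)  # → built loaded
```

The guard on `os.path.exists` is what makes the script safe to rerun: the costly build happens exactly once per persist directory.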

Efficient Querying

Now you can efficiently query your index! Whether it’s freshly built or loaded from disk, the querying code stays the same; just change the question to explore the essay further:

```python
query_engine = index.as_query_engine()
response = query_engine.query("What did the author focus on growing up?")
print(response)
```

Exploring Further with LlamaIndex

Congratulations! You've just built your first application using LlamaIndex. However, this is just the beginning.

Key Features to Explore

  • Natural Language Queries: Ask questions in plain language to fetch data.
  • Index Customization: Play around with different indexing strategies based on your data.
  • LLM Integration: Utilize various LLMs for advanced use cases and richer responses.

Arsturn: Unlock Your Chatbot Potential

As you continue exploring and building on LlamaIndex, why not consider integrating conversational AI into your projects? With Arsturn, you can effortlessly create customized chatbots that boost audience engagement & conversions. Whether you need an FAQ bot or a personal assistant, Arsturn allows you to design, train, and deploy bots that fit your needs. Plus, it’s super user-friendly – no coding skills required! So, take your bot to the NEXT LEVEL and engage your audience like never before with Arsturn.

Wrapping It Up

Moving forward, don't forget to check out high-level concepts like RAG (Retrieval-Augmented Generation) for information retrieval alongside LLMs. Whether you're diving deeper into integration or simply refining your querying capabilities, there's a lot to learn. If you're curious about customization or specific modules, LlamaIndex has an extensive range of component guides to help you out!
Now get out there & start creating with LlamaIndex. Happy coding!

Time to take your skills for a spin and find amazing use-cases for LlamaIndex that you will surely love! The possibilities are endless, and with the right tools & a little creativity, who knows what you might concoct?

Copyright © Arsturn 2024