8/27/2024

Setting Up Ollama on AWS: Your Comprehensive Guide

In today's world, deploying a Large Language Model (LLM) has become a necessary skill for developers, researchers, and businesses alike. With the rapid advancement of AI technology, frameworks like Ollama allow you to harness the power of models such as Llama 2 and Code Llama for various applications. This guide will take you through the step-by-step process of setting up Ollama on AWS, ensuring you can quickly deploy your AI-powered chatbots or applications.

Why Choose Ollama?

Ollama stands out as an open-source solution for deploying local LLMs effortlessly. Its features include:
  • Local Access: Enables complete control over data and model usage.
  • Customization: Tailor models to suit specific applications.
  • Compliance & Cost-effectiveness: Because your data never has to leave your infrastructure, it is easier to meet regulatory requirements while keeping costs predictable.
These benefits make Ollama an ideal choice for both small developers & large enterprises looking to enhance their digital interactions. Not to mention, Ollama supports GPU acceleration for enhanced performance on platforms like macOS & Linux.

Getting Started with AWS

First off, you need an AWS account. If you don’t have one, head over to the AWS Sign-Up page. Once you're on your cloud journey, let’s set up Ollama.

Step 1: Initialize Your EC2 Instance

Start by creating an EC2 instance. You can choose various options based on your requirement, but for this guide, we recommend a GPU-enabled instance for optimal performance. Here’s the recommended configuration:
  • Instance Type: g4dn.xlarge (approximately $390/month)
  • vCPU: 4
  • RAM: 16 GB
  • GPU: 1 (VRAM: 16 GB)
  • EBS Volume: 100 GB (gp3)
  • Operating System: Amazon Linux 2
  • SSH Key: Required for login via PuTTY or similar tools.
Once you configure these settings, launch your instance.
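If you prefer scripting the launch over clicking through the console, a CLI call along these lines should match the configuration above; every ID below (AMI, key pair, security group, subnet) is a placeholder to replace with your own:

```bash
# Launch a g4dn.xlarge running Amazon Linux 2 with a 100 GB gp3 root volume
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --subnet-id subnet-xxxxxxxxxxxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]'
```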

Step 2: Configure the AWS CLI

After launching your instance, it’s time to configure the AWS Command Line Interface (CLI). Here’s how:
  1. Amazon Linux 2 comes with AWS CLI pre-installed.
  2. Connect to your instance via SSH.
  3. Run `aws configure` and enter your default region. Leave the access key & secret access key blank when using an AWS instance role.
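If you'd rather skip the interactive prompts entirely, you can set just the region non-interactively (us-east-1 below is only an example):

```bash
# Credentials come from the instance role, so only the region needs setting
aws configure set region us-east-1
```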

Step 3: Create an Instance Role

To download the NVIDIA drivers from the S3 bucket where AWS hosts them, your EC2 instance needs an instance role that grants S3 access. Create one in the IAM console and attach it to your instance; for testing purposes, the managed AmazonS3FullAccess policy is the simplest option (read-only access would also suffice for the driver download).
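If you'd rather script this than click through the IAM console, a sketch along these lines should work; the role and profile names (ollama-ec2-role, ollama-ec2-profile) and the instance ID are placeholders, not anything AWS expects:

```bash
# Trust policy that lets EC2 assume the role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Create the role and attach full S3 access (fine for testing; scope down for production)
aws iam create-role --role-name ollama-ec2-role \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name ollama-ec2-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# Wrap the role in an instance profile and attach it to the running instance
aws iam create-instance-profile --instance-profile-name ollama-ec2-profile
aws iam add-role-to-instance-profile --instance-profile-name ollama-ec2-profile \
  --role-name ollama-ec2-role
aws ec2 associate-iam-instance-profile --instance-id i-xxxxxxxxxxxxxxxxx \
  --iam-instance-profile Name=ollama-ec2-profile
```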

Step 4: Verify S3 Access

You can verify that your instance has access to S3 by executing the following command:
```bash
aws s3 ls
```
If this lists your S3 buckets, you are good to go!

Step 5: Install NVIDIA GRID Drivers

Ollama needs NVIDIA GRID drivers to use the GPU on EC2 instances, and AWS distributes these drivers through a public S3 bucket. Here's what you need to do:
  1. Update & Install Tools: Run:

    ```bash
    sudo yum update -y
    sudo yum install gcc
    sudo reboot
    ```
  2. Install the drivers: Here's the command combo:

    ```bash
    sudo yum install -y gcc kernel-devel-$(uname -r)
    cd /home/ec2-user
    aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .
    chmod +x NVIDIA-Linux-x86_64*.run
    sudo ./NVIDIA-Linux-x86_64*.run
    ```
  3. Verify the Drivers: After installation, confirm the drivers are correctly installed using:

    ```bash
    nvidia-smi -q | head
    ```

Step 6: Setting Up Docker Engine

Docker is essential for running Ollama smoothly. Here’s how to install Docker on Amazon Linux 2:
  1. Install Docker: Run the following commands:

    ```bash
    sudo yum update -y
    sudo yum install docker
    ```
  2. Add User: Add the ec2-user to the docker group so you can run Docker commands without `sudo` (log out and back in for the group change to take effect):

    ```bash
    sudo usermod -a -G docker ec2-user
    ```
  3. Start Docker: Finally, start Docker and enable it to run on boot:

    ```bash
    sudo systemctl enable docker.service
    sudo systemctl start docker.service
    sudo systemctl status docker.service
    ```
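As a quick sanity check (after reconnecting so the group change applies), you can run Docker's test image:

```bash
# Pulls and runs hello-world; prints "Hello from Docker!" on success
docker run --rm hello-world
```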

Step 7: Install NVIDIA Docker Toolkit

Now, you can install the NVIDIA container toolkit to allow Docker containers to use the GPU:
  1. Add Repo & Install: Add NVIDIA's repository and install the toolkit:

    ```bash
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
      sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    sudo yum install -y nvidia-container-toolkit
    ```
  2. Configure Docker: Configure the Docker runtime to use the toolkit, then restart the daemon:

    ```bash
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    ```
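To confirm containers can actually see the GPU, it's worth running a small sample workload; this should print the same table you saw with nvidia-smi on the host:

```bash
# Run nvidia-smi inside a plain Ubuntu container through the NVIDIA runtime
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```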

Step 8: Install Ollama Server Docker Container

With your Docker environment set, you can deploy Ollama. Run:
```bash
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama --restart always ollama/ollama
```
This command will run the Ollama server, exposing it on port 11434.
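Before moving on, you can confirm the server is reachable by hitting the API root from the instance itself; it should return a short status message:

```bash
# The server answers plain HTTP on port 11434; expect "Ollama is running"
curl http://localhost:11434
```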

Step 9: Pull Required LLM Models

Next, you’ll want to pull the models you plan to use. A smaller model is easier for a first test. For example:

```bash
docker exec -it ollama ollama pull llama2
```
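Once the pull completes, you can confirm which models the server has available:

```bash
# List the models stored in the ollama volume
docker exec -it ollama ollama list
```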

Step 10: Install Ollama Web UI Container

To make your chatbot or application visually appealing and easier to use, you can install the Ollama Web UI. Run:

```bash
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v ollama-webui:/app/backend/data --name ollama-webui --restart always ghcr.io/ollama-webui/ollama-webui:main
```
This container will be your graphical interface for interacting with your models!

Step 11: Accessing the Ollama Web UI

You’ve built it, now let’s access it. Open a web browser and navigate to:

```
http://<your-ec2-public-ip>:3000
```

Replace `<your-ec2-public-ip>` with the public DNS name or IP address of your EC2 instance (and make sure your security group allows inbound traffic on port 3000). Here you'll find the interface to chat with your deployed models.
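The Web UI isn't the only way in: the same server exposes Ollama's REST API on port 11434, so your applications can call the model directly (you'd need to open that port in your security group for remote access). A minimal generation request looks like this:

```bash
# Ask llama2 a question; "stream": false returns a single JSON response
curl http://<your-ec2-public-ip>:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```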

Final Thoughts

Setting up Ollama on AWS can seem daunting, but by following each step carefully, you can successfully deploy a powerful Large Language Model that will enhance your business operations or personal projects.
The process not only positions you to leverage the latest advances in AI but also empowers you to keep control over your data and customize your models. So, embrace the change and build your own conversational AI solutions!
But before you dive deeper into this AI revolution, don’t forget to check out Arsturn. With Arsturn, you can effortlessly create customized ChatGPT chatbots to engage your audience effectively. It’s designed for everyone, ensuring you get to enchant your users without needing any coding skills. Plus, you benefit from insightful analytics that help tailor responses based on real-time interactions. No credit cards are required to start, so give it a go and enhance your audience engagement today!
By following these steps & utilizing resources effectively, you’ll be set on the right track to harnessing the potential of AI through Ollama on AWS! Happy deploying!

Copyright © Arsturn 2024