8/27/2024

Creating a News Aggregator with Ollama

In today's fast-paced digital world, staying up-to-date with the latest news can be quite a challenge. With the overwhelming amount of information available online, it’s easy to feel lost. That’s where a personalized news aggregator comes into play. By leveraging Ollama, you can curate articles from various sources into a single, easy-to-navigate platform. This blog post will walk you through creating your very own news aggregator using Ollama, the Llama3 language model, and LangChain to simplify content retrieval and summarization.

Why Use a News Aggregator?

  1. Save Time: Using a news aggregator allows you to efficiently gather information from multiple sources without having to visit each site individually.
  2. Personalization: You can tailor your news feed based on your interests, filtering out the noise to focus only on what matters to you.
  3. Stay Informed: With a centralized place for news, you’re less likely to miss crucial updates on topics you care about.

Overview of the Technology Stack

Creating a news aggregator can be boiled down to a few essential components:
  • Ollama: An innovative platform that allows you to run and host various language models including Llama3.
  • LangChain: A powerful framework that helps with data processing and text generation across multiple sources.
  • News API: A reliable service for fetching current and relevant news articles from various publishers.

Getting Started with Ollama

To kick off your news aggregator project, you'll need to set up Ollama. Here’s how:
  1. Installing Ollama: This application is available on multiple platforms. For example;
  2. Create an Account: You'll need to create an account to use the service. This allows you to save your preferences and data securely.

Setting Environment & Dependencies

After setting up Ollama, we need to install various Python packages to get everything running smoothly. You’ll need:
  • 1 langchain
  • 1 ollama
  • 1 newsapi-python
You can install these via pip:
1 2 bash pip install langchain ollama newsapi-python
Next, you need to get your News API key by signing up on their site, which will allow your aggregator to fetch news articles.

Building the Core Functions

1. Fetching News

First, let’s create the function that will fetch news articles from the News API. This is crucial for gathering the latest and most relevant content. Here’s how you can do it: ```python from newsapi import NewsApiClient import json
newsapi = NewsApiClient(api_key='YOUR_API_KEY')
def latest_news(query):
all_articles = newsapi.get_everything(q=query, language='en', sort_by='publishedAt')
extracted_data = [] for article in all_articleslen(extracted_data) >= 10: break extracted_data.append({ 'title': article.get('title', 'No title'), 'description': article.get('description', 'No description available'), 'url': article.get('url', 'No Url') })
with open('news.json', 'w') as p: json.dump(extracted_data, p) ``` This function queries the news based on a keyword and fetches the latest articles, limiting the number of articles saved to 10 for simplicity.

2. Splitting Text Processing

After fetching articles, you will need to process the content. This is where LangChain comes in handy. The following function loads and splits your articles into manageable chunks: ```python from langchain.text_splitter import RecursiveCharacterTextSplitter from langchain.community.document_loaders import JSONLoader
def load_documents(file_path): loader = JSONLoader(file_path=file_path, jq_schema='.[].{description: .description, url: .url}', text_content=False) return loader.load()
def split_documents(documents): text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=128) return text_splitter.split_documents(documents) ``` This workflow helps to prepare your text data for the next step, which involves creating embeddings for easier retrieval!

3. Creating Embeddings

To enable fast search and retrieval, we'll create a vector store document embedding: ```python from langchain.community.vectorstores import Chroma from langchain.community.embeddings import OllamaEmbeddings
def create_vector_store(documents): embedding_model = OllamaEmbeddings(model='llama3') vector_store = Chroma.from_documents(documents=documents, embedding=embedding_model) return vector_store.as_retriever() ``` This function utilizes the OllamaEmbeddings model to generate embeddings for the articles you have loaded.

4. Generate a Newsletter

With everything set up, let’s assemble the pieces together to generate a simple newsletter: ```python datetime import date
def generate_newsletter(topic): latest_news(topic) documents = load_documents('news.json') document_splits = split_documents(documents) retriever = create_vector_store(document_splits)
1 2 # Create a formatted newsletter formatted_newsletter = f'# Daily Digest: {date.today()}

{topic.capitalize()} Updates\n\n"

1 2 3 4 for document in retriever.invoke(topic): formatted_newsletter += f'### [{document.title}]({document.url})\n**Summary**: {document.description}\n\n' return formatted_newsletter
1 2 3 4 This generates a well-structured newsletter that can be sent out to your audience or displayed on your site. ## Sample Output When you run the above code, you will get a markdown-styled output similar to this:

Daily Digest: 2024-05-07

World News Updates

Title of Main News Article

Summary: Brief description of the main news article.

Second News Article Title

Summary: Summary of the second article. ```

Designing Your Aggregator

This implementation provides a good starting point. But you can also enhance your news aggregator with more features:
  • Search Functionality: Let users search for specific topics.
  • Themed Newsletters: Categories based on interests (e.g., technology, politics).
  • Custom Appearance: Adjust styles to match your brand.
  • Integration: Use APIs from other services for social media or blogs.

Conclusion

Creating a personalized news aggregator can be easy with all the powerful tools available today like Ollama and LangChain. An aggregator not only helps in pulling the latest information from varied sources, but also provides a tailored look at news that matters to you.
And speaking of harnessing powerful tools for your needs, have you checked out Arsturn?
With Arsturn, businesses can easily create customizable and interactive chatbots without any coding skills. Engage your audience in conversations that matter and increase your conversion rates. Arsturn lets you channel the power of AI, making it easy to connect with your audience in meaningful ways from day one!
So, if you're ready to take your engagement to the next level, give Arsturn a try today! No credit card is required, so what are you waiting for?
Happy coding, and good luck with your news aggregator!
--- Acknowledgments: Much credit goes to the developers of Ollama and LangChain for creating a fantastic framework that simplifies these processes.
Source: This guide is inspired by various discussions and examples found on platforms like Medium, GitHub, and the official Ollama Documentation.

Copyright © Arsturn 2024