NewsMate Tailwind Templates

Newsmate

NewsMate is a full-stack AI chatbot that answers news-related queries using a RAG pipeline. It features Xenova embeddings, Qdrant vector search, Gemini API, Redis-based session handling, and a modern React + Tailwind CSS frontend.

🧠 NewsMate: A RAG-Based News Chatbot

πŸ“Œ Objective

NewsMate is an intelligent chatbot that provides real-time, conversational answers based on the latest news articles. It leverages Retrieval-Augmented Generation (RAG) to ground LLM responses in fresh, factual news content sourced from RSS feeds.

πŸ› οΈ Tech Stack

  • Frontend: React.js with Vite
  • Backend: Node.js with Express
  • Embeddings: @xenova/transformers (MiniLM)
  • Vector Store: Qdrant (Cloud-hosted)
  • Database/Cache: Redis (Session management)
  • Language Model: Gemini API
  • Scheduler: Custom Node.js cron-like job

πŸ” Key Features & Architecture

  • RSS Feed Ingestion

    • Regularly fetches articles from multiple RSS sources (e.g., NYTimes).
    • Extracts title, link, and content.
  • Embedding Generation

    • Uses @xenova/transformers to create dense vector representations of article content.
    • Ensures semantic similarity can be measured during retrieval.
  • Deduplication

    • Each article’s link is hashed using SHA-256 and formatted into a UUID-like string.
    • This unique ID prevents duplicate insertion into the vector store.
  • Vector Storage (Qdrant)

    • Embeddings are upserted into the news_articles collection.
    • Qdrant is queried later for semantically similar chunks during a chatbot session.
  • Chatbot (RAG)

    • User question triggers a semantic search in Qdrant.
    • Top-k relevant context chunks are injected into the Gemini API prompt.
    • The model generates context-grounded answers.
  • Session Management

    • Redis is used to track ongoing conversations and maintain continuity.

⚠️ Difficulties Faced & Resolutions | Problem | Cause | Resolution | | ------------------------------------------ | ---------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | | "Bad Request: Invalid ID" from Qdrant | Direct use of URLs as point IDs (invalid format) | Introduced hashing (SHA-256) of links into UUID-like strings | | Duplicate Data Despite No Uploads | Same link repeatedly inserted without proper deduplication logic | Added pre-check using qdrant.retrieve() to skip already embedded articles | | ReferenceError: stats is not defined | Misuse of Qdrant collection metadata in ensureCollectionExists | Removed incorrect stats reference and replaced with proper collection existence check | | Embedding Failures | Some articles had malformed or insufficient content | Added validation and fallback logic per article to skip bad entries | | Silent Failures / Poor Logging | Errors weren't specific or granular | Improved error logs and debug messaging for each critical operation (embedding, upsert, fetch) |

βœ… Outcomes

  • βœ… Successfully built a full-stack news chatbot using RAG.
  • βœ… Embedded 50+ articles into Qdrant and enabled semantic search.
  • βœ… Integrated Gemini for high-quality LLM responses.
  • βœ… Achieved auto-updating context through periodic embedding refreshes.

πŸ“˜ Future Improvements

  • Add user authentication and persistent chat history.
  • Integrate summarization and source citation.
  • Scale to handle multi-lingual feeds.

Top categories

Loading Svelte Themes