Retrieval Augmented Generation (RAG)

Implement RAG systems to enhance Claude's responses with external knowledge bases

Last updated: May 2025

What is RAG?

Retrieval Augmented Generation (RAG) combines Claude's generative capabilities with external knowledge retrieval. Instead of relying solely on training data, RAG systems dynamically fetch relevant information from external sources to provide more accurate, up-to-date, and contextually relevant responses.

Key Benefits

• Access to real-time information
• Domain-specific knowledge integration
• Reduced hallucinations
• Improved factual accuracy
• Scalable knowledge updates
• Source attribution and transparency

Common Applications

• Customer support chatbots
• Technical documentation Q&A
• Research assistants
• Legal document analysis
• Medical information systems
• Enterprise knowledge bases

RAG Architecture

1. Indexing

Process and store documents as embeddings in a vector database

2. Retrieval

Find relevant documents based on query similarity

3. Generation

Use retrieved context to generate informed responses

Detailed Workflow

1.Document Processing: Split documents into chunks, generate embeddings
2.Query Processing: Convert user query into embedding vector
3.Similarity Search: Find most relevant document chunks
4.Context Injection: Combine retrieved content with user query
5.Response Generation: Claude generates response using retrieved context

Technology Stack

Vector Databases

Supabase + pgvector

PostgreSQL extension with vector support

Pinecone

Managed vector database service

Weaviate

Open-source vector search engine

ChromaDB

Lightweight embedding database

Embedding Models

OpenAI Ada v2

Popular choice for general text embeddings

Sentence Transformers

Open-source models for specialized domains

Cohere Embed

Multilingual embedding capabilities

Voyage AI

High-performance embedding models

Implementation Example

Basic RAG System with Supabase

1. Database Setup

-- Enable pgvector extension
create extension vector;

-- Create documents table
create table documents (
  id bigserial primary key,
  content text,
  metadata jsonb,
  embedding vector(1536)
);

-- Create index for similarity search
create index on documents 
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

2. Document Indexing (Python)

import openai
from supabase import create_client
import tiktoken

class DocumentIndexer:
    def __init__(self, supabase_url, supabase_key, openai_key):
        self.supabase = create_client(supabase_url, supabase_key)
        openai.api_key = openai_key
        self.encoder = tiktoken.get_encoding("cl100k_base")
    
    def chunk_text(self, text, chunk_size=500, overlap=50):
        &quot;&quot;&quot;Split text into overlapping chunks&quot;&quot;&quot;
        tokens = self.encoder.encode(text)
        chunks = []
        
        for i in range(0, len(tokens), chunk_size - overlap):
            chunk_tokens = tokens[i:i + chunk_size]
            chunk_text = self.encoder.decode(chunk_tokens)
            chunks.append(chunk_text)
            
        return chunks
    
    def get_embedding(self, text):
        &quot;&quot;&quot;Generate embedding for text&quot;&quot;&quot;
        response = openai.Embedding.create(
            input=text,
            model=&quot;text-embedding-ada-002&quot;
        )
        return response[&apos;data&apos;][0][&apos;embedding&apos;]
    
    def index_document(self, content, metadata=None):
        &quot;&quot;&quot;Index a document into the vector database&quot;&quot;&quot;
        chunks = self.chunk_text(content)
        
        for i, chunk in enumerate(chunks):
            embedding = self.get_embedding(chunk)
            
            doc_metadata = {
                **(metadata or {}),
                &quot;chunk_index&quot;: i,
                &quot;total_chunks&quot;: len(chunks)
            }
            
            self.supabase.table(&apos;documents&apos;).insert({
                &apos;content&apos;: chunk,
                &apos;metadata&apos;: doc_metadata,
                &apos;embedding&apos;: embedding
            }).execute()

Retrieval and Generation

RAG Query System

import anthropic

class RAGSystem:
    def __init__(self, supabase_client, anthropic_key):
        self.supabase = supabase_client
        self.claude = anthropic.Anthropic(api_key=anthropic_key)
    
    def retrieve_documents(self, query, limit=5):
        &quot;&quot;&quot;Retrieve relevant documents for a query&quot;&quot;&quot;
        # Generate embedding for query
        query_embedding = self.get_embedding(query)
        
        # Perform similarity search
        result = self.supabase.rpc(
            &apos;match_documents&apos;,
            {
                &apos;query_embedding&apos;: query_embedding,
                &apos;match_threshold&apos;: 0.7,
                &apos;match_count&apos;: limit
            }
        ).execute()
        
        return result.data
    
    def generate_response(self, query, retrieved_docs):
        &quot;&quot;&quot;Generate response using Claude with retrieved context&quot;&quot;&quot;
        # Construct context from retrieved documents
        context = "

".join([
            f&quot;Source: {doc[&apos;metadata&apos;].get(&apos;title&apos;, &apos;Unknown&apos;)}
{doc[&apos;content&apos;]}&quot;
            for doc in retrieved_docs
        ])
        
        prompt = f&quot;&quot;&quot;Based on the following context, please answer the user&apos;s question. 
If the context doesn&apos;t contain enough information to answer the question, 
please say so and explain what information would be needed.

Context:
{context}

Question: {query}

Please provide a helpful and accurate response based on the context provided.&quot;&quot;&quot;

        message = self.claude.messages.create(
            model=&quot;claude-3-5-sonnet-20241022&quot;,
            max_tokens=1000,
            messages=[{
                &quot;role&quot;: &quot;user&quot;,
                &quot;content&quot;: prompt
            }]
        )
        
        return message.content[0].text
    
    def query(self, question):
        &quot;&quot;&quot;Complete RAG workflow&quot;&quot;&quot;
        # Step 1: Retrieve relevant documents
        retrieved_docs = self.retrieve_documents(question)
        
        # Step 2: Generate response with context
        response = self.generate_response(question, retrieved_docs)
        
        return {
            &quot;answer&quot;: response,
            &quot;sources&quot;: [doc[&apos;metadata&apos;] for doc in retrieved_docs],
            &quot;context_used&quot;: len(retrieved_docs)
        }

Advanced RAG Techniques

Hybrid Search

Combine semantic similarity with keyword search for better results.

Implementation:

• Vector similarity (semantic)
• BM25 keyword matching
• Weighted score combination
• Reranking algorithms

Query Expansion

Enhance queries to improve retrieval accuracy.

Techniques:

• Synonym expansion
• Query rewriting with Claude
• Multi-query approaches
• Context-aware expansion

Contextual Compression

Compress retrieved content to include only relevant information.

Benefits:

• Reduced token usage
• Improved focus
• Better context utilization
• Faster processing

Multi-Agent RAG

Use multiple specialized agents for complex queries.

Agents:

• Query router
• Domain specialists
• Fact checker
• Response synthesizer

Best Practices

Do's ✓

•Optimize chunk size for your domain
•Include metadata for better filtering
•Implement relevance scoring
•Use clear source attribution
•Monitor and evaluate performance
•Handle missing context gracefully

Don'ts ✗

•Ignore document quality and preprocessing
•Use overly large or small chunks
•Skip relevance threshold tuning
•Overwhelm context with irrelevant data
•Neglect embedding model selection
•Forget to update stale documents

Practice Projects

Project 1: Documentation Q&A

Build a RAG system for technical documentation that can answer user questions about APIs, setup instructions, and troubleshooting.

Skills: Document processing, embedding generation, similarity search

Project 2: Research Assistant

Create a research assistant that can search through academic papers, extract relevant information, and provide evidence-based answers.

Skills: Academic paper processing, citation handling, multi-document synthesis

Project 3: Customer Support Bot

Develop a customer support system that can access company knowledge bases, product manuals, and FAQ documents to provide accurate support responses.

Skills: Multi-modal content, conversation context, escalation handling

Back to AI Journey