Retrieval Augmented Generation (RAG)
Implement RAG systems to enhance Claude's responses with external knowledge bases
Last updated: May 2025
What is RAG?
Retrieval Augmented Generation (RAG) combines Claude's generative capabilities with external knowledge retrieval. Instead of relying solely on training data, RAG systems dynamically fetch relevant information from external sources to provide more accurate, up-to-date, and contextually relevant responses.
Key Benefits
- Access to real-time information
- Domain-specific knowledge integration
- Reduced hallucinations
- Improved factual accuracy
- Scalable knowledge updates
- Source attribution and transparency
Common Applications
- Customer support chatbots
- Technical documentation Q&A
- Research assistants
- Legal document analysis
- Medical information systems
- Enterprise knowledge bases
RAG Architecture
1. Indexing
Process and store documents as embeddings in a vector database
2. Retrieval
Find relevant documents based on query similarity
3. Generation
Use retrieved context to generate informed responses
Detailed Workflow
1. Document Processing: Split documents into chunks and generate an embedding for each chunk
2. Query Processing: Convert the user query into an embedding vector
3. Similarity Search: Find the most relevant document chunks
4. Context Injection: Combine the retrieved content with the user query
5. Response Generation: Claude generates a response using the retrieved context
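The five steps above can be sketched end to end without any external service. This is a minimal illustration only: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the final prompt is built but not sent to Claude. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, index, k=2):
    """Steps 2 and 3: embed the query, rank chunks by similarity, keep the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, chunks):
    """Step 4: inject the retrieved chunks ahead of the question."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Step 1: "index" a few pre-split chunks by storing each with its embedding.
chunks = [
    "Invoices are sent on the first day of each month.",
    "Password resets are handled from the account settings page.",
    "Refunds are processed within five business days.",
]
index = [(c, embed(c)) for c in chunks]

question = "how are refunds processed"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)  # Step 5 would send this prompt to Claude
```

In a real system the only parts that change are the embedding function and the store; the order of operations stays the same.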
Technology Stack
Vector Databases
Supabase + pgvector
PostgreSQL extension with vector support
Pinecone
Managed vector database service
Weaviate
Open-source vector search engine
ChromaDB
Lightweight embedding database
Embedding Models
OpenAI Ada v2
Popular choice for general text embeddings
Sentence Transformers
Open-source models for specialized domains
Cohere Embed
Multilingual embedding capabilities
Voyage AI
High-performance embedding models
Implementation Example
Basic RAG System with Supabase
1. Database Setup
    -- Enable pgvector extension
    create extension vector;

    -- Create documents table
    create table documents (
      id bigserial primary key,
      content text,
      metadata jsonb,
      embedding vector(1536)
    );

    -- Create index for similarity search
    create index on documents
      using ivfflat (embedding vector_cosine_ops)
      with (lists = 100);
2. Document Indexing (Python)
    import tiktoken
    from openai import OpenAI
    from supabase import create_client


    class DocumentIndexer:
        def __init__(self, supabase_url, supabase_key, openai_key):
            self.supabase = create_client(supabase_url, supabase_key)
            # openai>=1.0 uses a client object; the module-level
            # openai.Embedding.create call was removed in that release.
            self.openai = OpenAI(api_key=openai_key)
            self.encoder = tiktoken.get_encoding("cl100k_base")

        def chunk_text(self, text, chunk_size=500, overlap=50):
            """Split text into overlapping chunks of at most chunk_size tokens."""
            if overlap >= chunk_size:
                raise ValueError("overlap must be smaller than chunk_size")
            tokens = self.encoder.encode(text)
            chunks = []
            # Each window starts (chunk_size - overlap) tokens after the previous one.
            for i in range(0, len(tokens), chunk_size - overlap):
                chunk_tokens = tokens[i:i + chunk_size]
                chunks.append(self.encoder.decode(chunk_tokens))
            return chunks

        def get_embedding(self, text):
            """Generate an embedding for text (1536 dimensions for this model)."""
            response = self.openai.embeddings.create(
                input=text,
                model="text-embedding-ada-002",
            )
            return response.data[0].embedding

        def index_document(self, content, metadata=None):
            """Index a document into the vector database, one row per chunk."""
            chunks = self.chunk_text(content)
            for i, chunk in enumerate(chunks):
                embedding = self.get_embedding(chunk)
                doc_metadata = {
                    **(metadata or {}),
                    "chunk_index": i,
                    "total_chunks": len(chunks),
                }
                self.supabase.table('documents').insert({
                    'content': chunk,
                    'metadata': doc_metadata,
                    'embedding': embedding,
                }).execute()
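To make the overlap arithmetic in `chunk_text` concrete, here is a small sketch that runs the same sliding window over a plain list of integers standing in for token IDs. The function name `chunk_tokens` and the early exit are choices made for this illustration, not part of the indexer above.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Sliding window over a token list; consecutive windows share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


tokens = list(range(10))  # stand-in for ten token IDs
chunks = chunk_tokens(tokens, chunk_size=4, overlap=1)
```

With a chunk size of 4 and an overlap of 1, windows start at positions 0, 3, and 6, so each chunk repeats the last token of the previous one. Without the early exit, a plain `range` loop would emit one more chunk holding only the final token, which is already covered by the chunk before it; whether to keep or drop such a tail is a judgment call.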
Retrieval and Generation
RAG Query System
    import anthropic


    class RAGSystem:
        def __init__(self, supabase_client, anthropic_key, embed_fn):
            self.supabase = supabase_client
            self.claude = anthropic.Anthropic(api_key=anthropic_key)
            # embed_fn maps text to a vector. It must use the same model as
            # indexing did, for example DocumentIndexer.get_embedding.
            self.get_embedding = embed_fn

        def retrieve_documents(self, query, limit=5):
            """Retrieve relevant documents for a query"""
            # Generate embedding for query
            query_embedding = self.get_embedding(query)

            # Perform similarity search. match_documents is a Postgres function
            # that must be created separately; the database setup shown
            # earlier does not define it.
            result = self.supabase.rpc(
                'match_documents',
                {
                    'query_embedding': query_embedding,
                    'match_threshold': 0.7,
                    'match_count': limit,
                },
            ).execute()
            return result.data

        def generate_response(self, query, retrieved_docs):
            """Generate response using Claude with retrieved context"""
            # Construct context from retrieved documents, one block per source
            context = "\n\n".join(
                f"Source: {doc['metadata'].get('title', 'Unknown')}\n{doc['content']}"
                for doc in retrieved_docs
            )

            prompt = f"""Based on the following context, please answer the user's question. If the context doesn't contain enough information to answer the question, please say so and explain what information would be needed.

    Context:
    {context}

    Question: {query}

    Please provide a helpful and accurate response based on the context provided."""

            message = self.claude.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1000,
                messages=[{"role": "user", "content": prompt}],
            )
            return message.content[0].text

        def query(self, question):
            """Complete RAG workflow"""
            # Step 1: Retrieve relevant documents
            retrieved_docs = self.retrieve_documents(question)

            # Step 2: Generate response with context
            response = self.generate_response(question, retrieved_docs)

            return {
                "answer": response,
                "sources": [doc['metadata'] for doc in retrieved_docs],
                "context_used": len(retrieved_docs),
            }
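The `match_documents` function called through `rpc` is not defined anywhere in this guide. Assuming it returns the rows whose cosine similarity to the query exceeds `match_threshold`, ordered from most to least similar and capped at `match_count`, its logic can be mirrored in plain Python as below. This is a sketch of that assumed behavior over an in-memory list, not the database function itself, and the `similarity` field it adds is also an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_documents(query_embedding, rows, match_threshold=0.7, match_count=5):
    """Keep rows above the threshold, most similar first, at most match_count."""
    scored = []
    for row in rows:
        sim = cosine_similarity(query_embedding, row["embedding"])
        if sim > match_threshold:
            scored.append({**row, "similarity": sim})
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:match_count]


rows = [
    {"content": "a", "embedding": [1.0, 0.0]},
    {"content": "b", "embedding": [0.8, 0.6]},
    {"content": "c", "embedding": [0.0, 1.0]},
]
matches = match_documents([1.0, 0.0], rows, match_threshold=0.7, match_count=5)
```

Here row "c" is orthogonal to the query and is dropped by the threshold, while "a" and "b" are returned in that order.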
Advanced RAG Techniques
Hybrid Search
Combine semantic similarity with keyword search so that exact terms (product names, error codes) are matched alongside paraphrases.
- Vector similarity (semantic)
- BM25 keyword matching
- Weighted score combination
- Reranking algorithms
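One simple way to realize the weighted score combination is to rescale each score set to the range 0 to 1 and blend them with a weight. The sketch below assumes the vector and keyword scores have already been computed elsewhere; `alpha` and both function names are hypothetical, and min-max scaling is only one of several reasonable normalizations.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend normalized scores: alpha weights the vector side, 1 - alpha the keyword side."""
    v = min_max(vector_scores) if vector_scores else {}
    k = min_max(keyword_scores) if keyword_scores else {}
    combined = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Example scores: cosine similarities on one side, BM25-style scores on the other.
vector_scores = {"doc1": 0.82, "doc2": 0.78, "doc3": 0.40}
keyword_scores = {"doc2": 12.0, "doc3": 3.0, "doc4": 7.5}
ranked = hybrid_scores(vector_scores, keyword_scores, alpha=0.6)
```

A document missing from one result set contributes zero from that side, so items found by both searches tend to rise to the top. The normalization step matters because raw BM25 scores and cosine similarities are on different scales.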
Query Expansion
Enhance queries to improve retrieval accuracy.
- Synonym expansion
- Query rewriting with Claude
- Multi-query approaches
- Context-aware expansion
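Synonym expansion and the multi-query approach can be combined: generate a few variants of the query, retrieve for each, and merge the results without duplicates. In this sketch the synonym table, `expand_query`, `multi_query_retrieve`, and the stub retriever are all hypothetical; in practice the variants could instead come from a query rewriting call to Claude.

```python
# Hypothetical synonym table; a real one would be domain specific.
SYNONYMS = {"refund": ["reimbursement"], "cancel": ["terminate"]}


def expand_query(query, synonyms=SYNONYMS):
    """Return the original query plus one variant per known synonym."""
    variants = [query]
    words = query.lower().split()
    for i, word in enumerate(words):
        for alt in synonyms.get(word, []):
            variants.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return variants


def multi_query_retrieve(query, retrieve, limit=5):
    """Retrieve for every variant and merge, keeping first-seen order."""
    seen, merged = set(), []
    for variant in expand_query(query):
        for doc_id in retrieve(variant):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:limit]


def fake_retrieve(q):
    """Stub retriever for the example: returns doc IDs whose key term appears in q."""
    corpus = {"refund": ["d1", "d2"], "reimbursement": ["d2", "d3"]}
    return [d for term, ids in corpus.items() if term in q for d in ids]


variants = expand_query("refund policy")
docs = multi_query_retrieve("refund policy", fake_retrieve)
```

The second variant surfaces `d3`, which the original wording alone would have missed, while `d2` is returned only once.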
Contextual Compression
Compress retrieved content to include only relevant information.
- Reduced token usage
- Improved focus
- Better context utilization
- Faster processing
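A very basic form of contextual compression keeps only the sentences of a retrieved chunk that share a term with the query. The sketch below uses naive regex sentence splitting and a tiny hand-picked stopword list, both of which are simplifying assumptions; production systems often use a model to extract or summarize the relevant passage instead.

```python
import re

# Small illustrative stopword list; not exhaustive.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "how", "what", "do", "i"}


def compress(chunk, query, min_overlap=1):
    """Keep sentences that share at least min_overlap non-stopword terms with the query."""
    query_terms = set(re.findall(r"\w+", query.lower())) - STOPWORDS
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [
        s for s in sentences
        if len(query_terms & set(re.findall(r"\w+", s.lower()))) >= min_overlap
    ]
    return " ".join(kept)


chunk = (
    "Our office is open on weekdays. "
    "Refunds are issued within five business days. "
    "Shipping is free for orders over fifty dollars."
)
compressed = compress(chunk, "How long do refunds take?")
```

Only the refund sentence survives here, so the prompt carries one sentence instead of three. Exact term matching will miss paraphrases, which is the main limitation of this approach.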
Multi-Agent RAG
Use multiple specialized agents for complex queries.
- Query router
- Domain specialists
- Fact checker
- Response synthesizer
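The query router is the simplest of these roles to illustrate. The sketch below picks a specialist by keyword overlap and falls back to a general handler; the routing table, agent names, and lambda "specialists" are hypothetical placeholders, and a real router might ask Claude to classify the query instead. The fact checker and synthesizer stages are left out.

```python
# Hypothetical routing table: agent name -> keywords that suggest it.
ROUTES = {
    "billing": {"invoice", "refund", "payment"},
    "technical": {"error", "install", "api"},
}


def route(query, routes=ROUTES, default="general"):
    """Pick the agent whose keywords overlap the query most; else the default."""
    words = set(query.lower().split())
    best, best_hits = default, 0
    for name, keywords in routes.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = name, hits
    return best


def handle(query, specialists):
    """Route the query, then let the chosen specialist produce a draft answer."""
    name = route(query)
    return {"agent": name, "draft": specialists[name](query)}


# Stand-ins for specialists; each would normally run its own retrieval and prompt.
specialists = {
    "billing": lambda q: f"[billing] {q}",
    "technical": lambda q: f"[technical] {q}",
    "general": lambda q: f"[general] {q}",
}
result = handle("refund for duplicate invoice", specialists)
fallback = handle("hello there", specialists)
```

Keeping the router separate from the specialists means each specialist can use its own index, threshold, and prompt without affecting the others.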
Best Practices
Do's ✓
- Optimize chunk size for your domain
- Include metadata for better filtering
- Implement relevance scoring
- Use clear source attribution
- Monitor and evaluate performance
- Handle missing context gracefully
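Relevance scoring and graceful handling of missing context can be combined in one small guard: filter retrieved documents by score and return a fixed message when nothing passes. The sketch assumes each retrieved document carries `similarity` and `title` fields, which depends on what your search function returns; `generate` is a placeholder for the call to Claude.

```python
NO_CONTEXT_MESSAGE = (
    "I could not find relevant information in the knowledge base for this question."
)


def select_context(docs, threshold=0.7, max_docs=3):
    """Keep documents at or above the threshold, most similar first."""
    relevant = [d for d in docs if d["similarity"] >= threshold]
    relevant.sort(key=lambda d: d["similarity"], reverse=True)
    return relevant[:max_docs]


def answer_or_fallback(question, docs, generate, threshold=0.7):
    """Answer from context when there is any; otherwise say so without calling the model."""
    context = select_context(docs, threshold)
    if not context:
        return {"answer": NO_CONTEXT_MESSAGE, "sources": []}
    return {
        "answer": generate(question, context),
        "sources": [d["title"] for d in context],
    }


docs = [
    {"title": "Returns policy", "similarity": 0.83, "content": "Items can be returned within 30 days."},
    {"title": "Shipping FAQ", "similarity": 0.41, "content": "Orders ship in two days."},
]


def stub_generate(question, context):
    """Placeholder for the model call."""
    return f"answered from {len(context)} source(s)"


hit = answer_or_fallback("Can I return an item?", docs, stub_generate)
miss = answer_or_fallback("Can I return an item?", docs, stub_generate, threshold=0.9)
```

Returning the sources alongside the answer also covers the source attribution item in the list above.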
Don'ts ✗
- Ignore document quality and preprocessing
- Use chunks that are too large or too small
- Skip relevance threshold tuning
- Overwhelm context with irrelevant data
- Neglect embedding model selection
- Forget to update stale documents
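Threshold tuning is easier with a small labeled set. The sketch below sweeps a few thresholds over scored results for one query and reports precision and recall at each; the scores and relevance labels are made-up illustration data, and the function name is hypothetical.

```python
def precision_recall(scored, relevant_ids, threshold):
    """Precision and recall of the documents scoring at or above the threshold."""
    retrieved = {doc_id for doc_id, score in scored if score >= threshold}
    hits = len(retrieved & relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


# Illustration data: (doc_id, similarity) pairs for one query, plus the
# IDs a human judged relevant.
scored = [("d1", 0.91), ("d2", 0.74), ("d3", 0.66), ("d4", 0.52)]
relevant = {"d1", "d3"}

sweep = {t: precision_recall(scored, relevant, t) for t in (0.5, 0.7, 0.9)}
```

A low threshold returns everything relevant but fills the context with noise, while a high one keeps the context clean at the cost of missing `d3`. Averaging these numbers over many labeled queries gives a more reliable basis for picking a value than a single example.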
Practice Projects
Project 1: Documentation Q&A
Build a RAG system for technical documentation that can answer user questions about APIs, setup instructions, and troubleshooting.
Project 2: Research Assistant
Create a research assistant that can search through academic papers, extract relevant information, and provide evidence-based answers.
Project 3: Customer Support Bot
Develop a customer support system that can access company knowledge bases, product manuals, and FAQ documents to provide accurate support responses.