# AutoRAG Knowledge Bases

## Overview
AutoRAG (now called AI Search) is Cloudflare’s fully managed Retrieval-Augmented Generation (RAG) service that enables NPCs to have personalized knowledge bases with semantic search capabilities. This feature allows each NPC to retrieve contextually relevant information from their knowledge base during conversations, enabling more accurate and grounded responses.
**Status:** Planned for v1.4.0

## What is AutoRAG?
AutoRAG abstracts the complexity of building RAG pipelines by integrating:
| Component | Description | Cloudflare Service |
|---|---|---|
| Data Storage | Store knowledge documents | R2 Buckets |
| Vector Database | Semantic embeddings storage | Vectorize |
| Embedding Generation | Convert text to vectors | Workers AI |
| Query Rewriting | Improve search queries | Workers AI |
| Response Generation | Generate AI responses | Workers AI |
| API Management | Monitor and control usage | AI Gateway |
### Key Benefits for NPCs
- **Semantic Search**: NPCs can find contextually relevant information, not just keyword matches
- **Automatic Embeddings**: No need to manually manage vector databases
- **Per-NPC Knowledge**: Each NPC can have domain-specific knowledge
- **Grounded Responses**: Responses are based on actual knowledge, reducing hallucinations
- **Managed Infrastructure**: Cloudflare handles chunking, indexing, and retrieval
## Architecture

### Current Knowledge Flow (Without RAG)

```
Knowledge Files (R2) → npc_knowledge_data (D1) → Manual filtering → NPC Prompt
                              ↑ No embeddings
                              ↑ No semantic search
```
### Proposed Architecture (With AutoRAG)

```
┌──────────────────────────────────────────────────────────────┐
│                     AutoRAG Architecture                     │
└──────────────────────────────────────────────────────────────┘

R2 Buckets (Per-NPC Knowledge)
├── padawanforge-npc-pythagoras/ → AutoRAG "pythagoras-kb"
├── padawanforge-npc-einstein/   → AutoRAG "einstein-kb"
└── padawanforge-npc-{name}/     → AutoRAG "{name}-kb"

NPC Chat Request
      │
      ▼
┌──────────────────────────────────────────────────────────────┐
│ 1. Retrieve relevant context from AutoRAG                    │
│    env.AI.autorag(npc.knowledge_base).search({               │
│      query: playerMessage,                                   │
│      rewrite_query: true,                                    │
│      max_results: 5                                          │
│    })                                                        │
└──────────────────────────────────────────────────────────────┘
      │
      ▼
┌──────────────────────────────────────────────────────────────┐
│ 2. Inject retrieved context into CoSTAR prompt               │
└──────────────────────────────────────────────────────────────┘
      │
      ▼
┌──────────────────────────────────────────────────────────────┐
│ 3. Generate response with augmented knowledge                │
└──────────────────────────────────────────────────────────────┘
```
## API Methods
AutoRAG provides two primary methods for querying knowledge bases:
### 1. `aiSearch()` - Full RAG Response
Returns an AI-generated answer along with source citations.
```typescript
const answer = await env.AI.autorag("npc-knowledge-base").aiSearch({
  query: "What is the theory of relativity?",
  stream: false, // Set to true for streaming responses
  model: "@cf/meta/llama-3.1-8b-instruct" // Optional: specify model
});

// Response includes:
// - Generated answer text
// - Source citations with filenames
// - Confidence scores
```
Use Case: Direct Q&A where the full response is generated by AutoRAG.
### 2. `search()` - Retrieval Only
Returns relevant document chunks without generating a response.
```typescript
const searchResult = await env.AI.autorag("npc-knowledge-base").search({
  query: playerMessage,
  rewrite_query: true, // Improve query before search
  max_results: 5,      // Number of results (1-50, default 10)
  rerank: true         // Reorder by semantic relevance
});

// Response includes:
// - Array of matching documents
// - Filename, content, and relevance scores
```
Use Case: Context injection into existing CoSTAR prompts (recommended for NPCs).
### Recommended Approach for NPCs

Use `search()` for NPCs to maintain the CoSTAR prompt structure and NPC personality:
```typescript
// 1. Retrieve relevant context
const context = await env.AI.autorag(npc.knowledge_base_name).search({
  query: playerMessage,
  rewrite_query: true,
  max_results: 5
});

// 2. Format context for prompt injection
const ragContext = context.data
  .map(doc => `[Source: ${doc.filename}]\n${doc.content}`)
  .join('\n\n');

// 3. Inject into CoSTAR prompt
const prompt = buildCoSTARPrompt({
  context: `${npcContext}\n\n## Retrieved Knowledge\n${ragContext}`,
  // ... rest of CoSTAR structure
});

// 4. Generate response with Workers AI
const response = await env.AI.run(model, { prompt });
```
## Configuration

### Wrangler Configuration
No additional configuration is required. The existing AI binding supports AutoRAG:
```jsonc
// wrangler.jsonc
{
  "ai": {
    "binding": "AI" // Supports env.AI.autorag()
  }
}
```
### Database Schema Updates
New columns for the `npcs` table:
```sql
-- Add AutoRAG support to NPCs
ALTER TABLE npcs ADD COLUMN knowledge_base_name TEXT;
ALTER TABLE npcs ADD COLUMN autorag_enabled INTEGER DEFAULT 0;

-- Index for quick lookup
CREATE INDEX idx_npcs_autorag ON npcs(autorag_enabled) WHERE autorag_enabled = 1;
```
### NPC Configuration
```typescript
interface NPCWithAutoRAG {
  // Existing fields...
  id: number;
  name: string;
  system_prompt: string;

  // New AutoRAG fields
  knowledge_base_name: string | null; // AutoRAG instance name
  autorag_enabled: boolean;           // Feature flag
}
```
## Setup Guide

### Prerequisites
- Cloudflare account with Workers AI access
- R2 bucket for knowledge storage
- AI Gateway (created automatically or manually)
### Step 1: Create R2 Bucket for NPC Knowledge
```sh
# Using Wrangler CLI
wrangler r2 bucket create padawanforge-npc-pythagoras
```
Or via Cloudflare Dashboard:

- Navigate to R2 → Create bucket
- Name: `padawanforge-npc-{npc-name}`
- Upload knowledge documents (TXT, PDF, MD, JSON)
### Step 2: Create AutoRAG Instance
Via Cloudflare Dashboard:

- Navigate to AI → AI Search (formerly AutoRAG)
- Click Create
- Configure:
  - Name: `{npc-name}-kb` (e.g., `pythagoras-kb`)
  - R2 Bucket: Select the NPC's bucket
  - Embedding Model: Default (recommended)
  - LLM: Default or custom model
  - AI Gateway: Select or create new
### Step 3: Upload Knowledge Documents
Supported formats:

- Plain text: `.txt` (max 4 MB)
- Rich format: `.pdf`, `.md`, `.json` (max 1 MB)
```sh
# Upload via Wrangler
wrangler r2 object put padawanforge-npc-pythagoras/geometry-basics.md --file ./docs/geometry.md
wrangler r2 object put padawanforge-npc-pythagoras/pythagorean-theorem.pdf --file ./docs/theorem.pdf
```
### Step 4: Enable AutoRAG for NPC
```sql
UPDATE npcs
SET knowledge_base_name = 'pythagoras-kb',
    autorag_enabled = 1
WHERE name = 'Pythagoras';
```
## Implementation

### `NpcAgent.ts` Enhancements
```typescript
// src/lib/services/NpcAgent.ts
export class NpcAgent {
  private db: D1Database;
  private ai: any; // Cloudflare AI binding

  constructor(db: D1Database, ai: any) {
    this.db = db;
    this.ai = ai;
  }

  /**
   * Retrieve relevant context from the AutoRAG knowledge base.
   */
  async retrieveKnowledgeContext(
    npcId: number,
    query: string
  ): Promise<string> {
    // Get the NPC's AutoRAG configuration
    const npc = await this.db.prepare(`
      SELECT knowledge_base_name, autorag_enabled
      FROM npcs WHERE id = ?
    `).bind(npcId).first();

    // Return empty if AutoRAG is not enabled
    if (!npc?.autorag_enabled || !npc?.knowledge_base_name) {
      return '';
    }

    try {
      // Query AutoRAG for relevant context
      const result = await this.ai.autorag(npc.knowledge_base_name).search({
        query: query,
        rewrite_query: true,
        max_results: 5,
        rerank: true
      });

      // Format retrieved chunks for prompt injection
      if (!result.data || result.data.length === 0) {
        return '';
      }

      return result.data
        .map((doc: any, index: number) =>
          `[Source ${index + 1}: ${doc.filename}]\n${doc.content}`
        )
        .join('\n\n---\n\n');
    } catch (error) {
      console.error('[NpcAgent] AutoRAG retrieval failed:', error);
      return ''; // Graceful fallback
    }
  }

  /**
   * Generate an NPC response with RAG-augmented context.
   */
  async generateNPCChatResponseWithRAG(
    npcId: number,
    playerMessage: string,
    conversationHistory: Array<{ role: 'user' | 'assistant'; content: string }>,
    chatContext: ChatContext
  ): Promise<string> {
    // Retrieve knowledge context
    const ragContext = await this.retrieveKnowledgeContext(npcId, playerMessage);

    // Get NPC configuration
    const config = await this.getNPCConfiguration(npcId);

    // Build the CoSTAR prompt with RAG context
    const prompt = this.buildCoSTARChatPromptWithRAG(config, {
      ...chatContext,
      ragContext
    });

    // Generate the response
    return this.generateResponse(prompt);
  }
}
```
### Updated CoSTAR Prompt Structure
```typescript
private buildCoSTARChatPromptWithRAG(
  config: NPCConfiguration,
  chatContext: ChatContextWithRAG
): string {
  const ragSection = chatContext.ragContext
    ? `
## Retrieved Knowledge Base Context

The following information was retrieved from your knowledge base and is relevant
to the current conversation. Use this information to provide accurate, grounded
responses:

${chatContext.ragContext}

IMPORTANT: Base your response on this retrieved knowledge when applicable.
Cite sources naturally in your response when using specific facts.
`
    : '';

  return this.buildCoSTARPrompt({
    context: `You are ${chatContext.npcName}, an AI tutor...
${ragSection}
Your Core Identity:
- Name: ${chatContext.npcName}
- Personality: ${chatContext.npcPersonality}
...`,
    objective: `${config.objective}
When knowledge base context is available, prioritize using that information
to provide accurate, grounded responses.`,
    // ... rest of CoSTAR structure
  });
}
```
## Limits and Pricing

### AutoRAG Limits
| Resource | Limit |
|---|---|
| AutoRAG instances per account | 10 |
| Files per instance | 100,000 |
| Plain text file size | 4 MB |
| Rich format file size | 1 MB |
| Max results per query | 50 |
### Pricing (Open Beta)
AutoRAG is currently in open beta and free to enable. Underlying services may incur charges:
| Service | Free Tier | Paid |
|---|---|---|
| R2 Storage | 10 GB | $0.015/GB-month |
| Vectorize | 5M vectors | Pay-per-use |
| Workers AI | 10k neurons/day | Pay-per-use |
| AI Gateway | Free | Free |
## Migration Strategy

### Phase 1: Parallel Implementation
- Add AutoRAG support alongside the existing knowledge system
- Feature flag per NPC (`autorag_enabled`)
- Test with a single NPC first (e.g., Pythagoras)
- Monitor quality and latency
### Phase 2: Knowledge Migration

- Export existing `npc_knowledge_data` entries
- Convert to document format (MD/TXT)
- Upload to NPC-specific R2 buckets
- Create AutoRAG instances
- Validate retrieval quality
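The export/convert steps above could be sketched as a small pure helper that turns a knowledge entry into a Markdown document ready for R2 upload. This is a sketch only: the row shape (`topic`, `content`, `tags`) is an assumption about the `npc_knowledge_data` schema, and `toMarkdownDoc` is a hypothetical name.

```typescript
// Sketch of the Phase 2 conversion: one knowledge row → one Markdown
// document. The field names are assumptions about npc_knowledge_data.
interface KnowledgeRow {
  topic: string;
  content: string;
  tags?: string[];
}

function toMarkdownDoc(row: KnowledgeRow): { key: string; body: string } {
  // Derive an R2 object key from the topic, e.g.
  // "Pythagorean Theorem" → "pythagorean-theorem.md"
  const key = row.topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "") + ".md";

  const tagLine = row.tags?.length ? `\nTags: ${row.tags.join(", ")}\n` : "";
  const body = `# Topic: ${row.topic}\n${tagLine}\n${row.content}\n`;
  return { key, body };
}

const doc = toMarkdownDoc({
  topic: "Pythagorean Theorem",
  content: "In a right triangle, a² + b² = c².",
  tags: ["geometry"],
});
// doc.key → "pythagorean-theorem.md"
```

The resulting `key`/`body` pairs can then be uploaded with `wrangler r2 object put` exactly as shown in Step 3 of the setup guide.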
### Phase 3: Full Rollout
- Enable for all NPCs with knowledge bases
- Deprecate manual knowledge injection
- Update admin UI for knowledge management
- Document new workflows
## Best Practices

### Knowledge Document Formatting
```markdown
# Topic: Pythagorean Theorem

## Definition
The Pythagorean theorem states that in a right triangle,
the square of the hypotenuse equals the sum of the squares
of the other two sides: a² + b² = c²

## Key Concepts
- Only applies to right triangles
- The hypotenuse is the longest side
- Can be used to find unknown side lengths

## Examples
Example 1: If a = 3 and b = 4, then c = 5
Because: 3² + 4² = 9 + 16 = 25 = 5²

## Common Misconceptions
- Does NOT apply to all triangles
- The formula is not a² + b² = c
```
### Chunking Considerations
AutoRAG automatically chunks documents, but for best results:
- Keep related information together
- Use clear headings and structure
- Avoid very long paragraphs
- Include context within each section
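As a rough preflight for the points above, documents could be linted before upload. This is a hypothetical helper (not part of AutoRAG) that flags missing headings and overly long paragraphs; the threshold is an arbitrary assumption.

```typescript
// Hypothetical pre-upload lint for knowledge documents: warns about
// missing Markdown headings and paragraphs long enough to chunk badly.
function lintKnowledgeDoc(text: string, maxParagraphChars = 1200): string[] {
  const warnings: string[] = [];

  // Clear headings give the chunker natural boundaries.
  if (!/^#{1,6}\s/m.test(text)) {
    warnings.push("No Markdown headings found; add '##' sections for better chunking.");
  }

  // Very long paragraphs may be split mid-thought by the chunker.
  for (const para of text.split(/\n\s*\n/)) {
    if (para.length > maxParagraphChars) {
      warnings.push(`Paragraph of ${para.length} chars exceeds ${maxParagraphChars}; consider splitting.`);
    }
  }
  return warnings;
}
```

Running this over a document with no headings returns one warning; a well-structured document like the formatting example above passes cleanly.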
### Query Optimization
```typescript
// Enable query rewriting for better results
const result = await env.AI.autorag("kb").search({
  query: playerMessage,
  rewrite_query: true, // Improves ambiguous queries
  max_results: 5,      // Balance between context and token usage
  rerank: true         // Better relevance ordering
});
```
## Troubleshooting

### Common Issues
1. Empty retrieval results
   - Verify documents are uploaded to R2
   - Check AutoRAG indexing status in the dashboard
   - Ensure the query is relevant to the knowledge base content

2. Slow response times
   - AutoRAG adds ~200-500 ms of latency
   - Consider caching frequent queries
   - Reduce `max_results` if the context is too large

3. Irrelevant results
   - Enable `rerank: true` for better relevance
   - Review document structure and formatting
   - Consider query rewriting
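The caching suggestion under "Slow response times" could start as the minimal in-memory sketch below. It lives per Worker isolate, so it is best-effort only; Workers KV or the Cache API would persist across isolates. The helper name, key scheme, and TTL are assumptions, not an existing API.

```typescript
// Minimal per-isolate cache for frequent AutoRAG retrievals.
// Entries expire after ttlMs; the key normalizes the query so
// trivially different phrasings ("Hello" vs "hello ") share a hit.
const ragCache = new Map<string, { value: string; expires: number }>();

async function cachedRetrieve(
  kb: string,
  query: string,
  fetcher: () => Promise<string>, // e.g. wraps env.AI.autorag(kb).search(...)
  ttlMs = 60_000
): Promise<string> {
  const key = `${kb}:${query.toLowerCase().trim()}`;
  const hit = ragCache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;

  const value = await fetcher();
  ragCache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```

Because the cache sits in front of `retrieveKnowledgeContext`, a miss costs one AutoRAG round trip and every repeat within the TTL costs nothing.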
### Monitoring
Use AI Gateway to monitor:
- Query volume and latency
- Token usage per NPC
- Error rates and failures
- Cost breakdown
## Related Documentation
- NPC System - Core NPC framework
- AI Integration - Multi-provider AI service
- Circuit Breaker - Service resilience patterns
- Database Schema - Complete database reference