# AutoRAG Knowledge Bases

## Overview
AutoRAG (now called AI Search) is Cloudflare’s fully managed Retrieval-Augmented Generation (RAG) service that enables NPCs to have personalized knowledge bases with semantic search capabilities. This feature allows each NPC to retrieve contextually relevant information from their knowledge base during conversations, enabling more accurate and grounded responses.
**Status:** Planned for v1.4.0

## What is AutoRAG?
AutoRAG abstracts the complexity of building RAG pipelines by integrating:
| Component | Description | Cloudflare Service |
|---|---|---|
| Data Storage | Store knowledge documents | R2 Buckets |
| Vector Database | Semantic embeddings storage | Vectorize |
| Embedding Generation | Convert text to vectors | Workers AI |
| Query Rewriting | Improve search queries | Workers AI |
| Response Generation | Generate AI responses | Workers AI |
| API Management | Monitor and control usage | AI Gateway |
### Key Benefits for NPCs
- **Semantic Search**: NPCs can find contextually relevant information, not just keyword matches
- **Automatic Embeddings**: No need to manually manage vector databases
- **Per-NPC Knowledge**: Each NPC can have domain-specific knowledge
- **Grounded Responses**: Responses are based on actual knowledge, reducing hallucinations
- **Managed Infrastructure**: Cloudflare handles chunking, indexing, and retrieval
## Architecture

### Current Knowledge Flow (Without RAG)

```
Knowledge Files (R2) → npc_knowledge_data (D1) → Manual filtering → NPC Prompt
                              ↑ No embeddings
                              ↑ No semantic search
```
### Proposed Architecture (With AutoRAG)

```
┌──────────────────────────────────────────────────────────────┐
│                     AutoRAG Architecture                     │
└──────────────────────────────────────────────────────────────┘

R2 Buckets (Per-NPC Knowledge)
├── padawanforge-npc-pythagoras/ → AutoRAG "pythagoras-kb"
├── padawanforge-npc-einstein/   → AutoRAG "einstein-kb"
└── padawanforge-npc-{name}/     → AutoRAG "{name}-kb"

NPC Chat Request
      │
      ▼
┌──────────────────────────────────────────────────────────────┐
│ 1. Retrieve relevant context from AutoRAG                    │
│    env.AI.autorag(npc.knowledge_base).search({               │
│      query: playerMessage,                                   │
│      rewrite_query: true,                                    │
│      max_results: 5                                          │
│    })                                                        │
└──────────────────────────────────────────────────────────────┘
      │
      ▼
┌──────────────────────────────────────────────────────────────┐
│ 2. Inject retrieved context into CoSTAR prompt               │
└──────────────────────────────────────────────────────────────┘
      │
      ▼
┌──────────────────────────────────────────────────────────────┐
│ 3. Generate response with augmented knowledge                │
└──────────────────────────────────────────────────────────────┘
```
## API Methods
AutoRAG provides two primary methods for querying knowledge bases:
### 1. `aiSearch()` - Full RAG Response
Returns an AI-generated answer along with source citations.
```typescript
const answer = await env.AI.autorag("npc-knowledge-base").aiSearch({
  query: "What is the theory of relativity?",
  stream: false, // Set to true for streaming responses
  model: "@cf/meta/llama-3.1-8b-instruct" // Optional: specify model
});

// Response includes:
// - Generated answer text
// - Source citations with filenames
// - Confidence scores
```
Use Case: Direct Q&A where the full response is generated by AutoRAG.
### 2. `search()` - Retrieval Only
Returns relevant document chunks without generating a response.
```typescript
const searchResult = await env.AI.autorag("npc-knowledge-base").search({
  query: playerMessage,
  rewrite_query: true, // Improve query before search
  max_results: 5,      // Number of results (1-50, default 10)
  rerank: true         // Reorder by semantic relevance
});

// Response includes:
// - Array of matching documents
// - Filename, content, and relevance scores
```
Use Case: Context injection into existing CoSTAR prompts (recommended for NPCs).
### Recommended Approach for NPCs

Use `search()` for NPCs to maintain the CoSTAR prompt structure and NPC personality:
```typescript
// 1. Retrieve relevant context
const context = await env.AI.autorag(npc.knowledge_base_name).search({
  query: playerMessage,
  rewrite_query: true,
  max_results: 5
});

// 2. Format context for prompt injection
const ragContext = context.data
  .map(doc => `[Source: ${doc.filename}]\n${doc.content}`)
  .join('\n\n');

// 3. Inject into CoSTAR prompt
const prompt = buildCoSTARPrompt({
  context: `${npcContext}\n\n## Retrieved Knowledge\n${ragContext}`,
  // ... rest of CoSTAR structure
});

// 4. Generate response with Workers AI
const response = await env.AI.run(model, { prompt });
```
## Configuration

### Wrangler Configuration
No additional configuration is required. The existing AI binding supports AutoRAG:
```jsonc
// wrangler.jsonc
{
  "ai": {
    "binding": "AI" // Supports env.AI.autorag()
  }
}
```
### Database Schema Updates
New columns for the `npcs` table:
```sql
-- Add AutoRAG support to NPCs
ALTER TABLE npcs ADD COLUMN knowledge_base_name TEXT;
ALTER TABLE npcs ADD COLUMN autorag_enabled INTEGER DEFAULT 0;

-- Index for quick lookup
CREATE INDEX idx_npcs_autorag ON npcs(autorag_enabled) WHERE autorag_enabled = 1;
```
### NPC Configuration
```typescript
interface NPCWithAutoRAG {
  // Existing fields...
  id: number;
  name: string;
  system_prompt: string;

  // New AutoRAG fields
  knowledge_base_name: string | null; // AutoRAG instance name
  autorag_enabled: boolean;           // Feature flag
}
```
## Setup Guide

### Prerequisites
- Cloudflare account with Workers AI access
- R2 bucket for knowledge storage
- AI Gateway (created automatically or manually)
### Step 1: Create R2 Bucket for NPC Knowledge
```sh
# Using Wrangler CLI
wrangler r2 bucket create padawanforge-npc-pythagoras
```
Or via Cloudflare Dashboard:

- Navigate to R2 → Create bucket
- Name: `padawanforge-npc-{npc-name}`
- Upload knowledge documents (TXT, PDF, MD, JSON)
### Step 2: Create AutoRAG Instance
Via Cloudflare Dashboard:

- Navigate to AI → AI Search (formerly AutoRAG)
- Click Create
- Configure:
  - Name: `{npc-name}-kb` (e.g., `pythagoras-kb`)
  - R2 Bucket: Select the NPC's bucket
  - Embedding Model: Default (recommended)
  - LLM: Default or custom model
  - AI Gateway: Select or create new
### Step 3: Upload Knowledge Documents
Supported formats:

- Plain text: `.txt` (max 4 MB)
- Rich format: `.pdf`, `.md`, `.json` (max 1 MB)
```sh
# Upload via Wrangler
wrangler r2 object put padawanforge-npc-pythagoras/geometry-basics.md --file ./docs/geometry.md
wrangler r2 object put padawanforge-npc-pythagoras/pythagorean-theorem.pdf --file ./docs/theorem.pdf
```
### Step 4: Enable AutoRAG for NPC
```sql
UPDATE npcs
SET knowledge_base_name = 'pythagoras-kb',
    autorag_enabled = 1
WHERE name = 'Pythagoras';
```
## Implementation

### `NpcAgent.ts` Enhancements
```typescript
// src/lib/services/NpcAgent.ts
export class NpcAgent {
  private db: D1Database;
  private ai: any; // Cloudflare AI binding

  constructor(db: D1Database, ai: any) {
    this.db = db;
    this.ai = ai;
  }

  /**
   * Retrieve relevant context from the AutoRAG knowledge base.
   */
  async retrieveKnowledgeContext(
    npcId: number,
    query: string
  ): Promise<string> {
    // Get the NPC's AutoRAG configuration
    const npc = await this.db.prepare(`
      SELECT knowledge_base_name, autorag_enabled
      FROM npcs WHERE id = ?
    `).bind(npcId).first();

    // Return empty if AutoRAG is not enabled
    if (!npc?.autorag_enabled || !npc?.knowledge_base_name) {
      return '';
    }

    try {
      // Query AutoRAG for relevant context
      const result = await this.ai.autorag(npc.knowledge_base_name).search({
        query: query,
        rewrite_query: true,
        max_results: 5,
        rerank: true
      });

      // Format retrieved chunks for prompt injection
      if (!result.data || result.data.length === 0) {
        return '';
      }

      return result.data
        .map((doc: any, index: number) =>
          `[Source ${index + 1}: ${doc.filename}]\n${doc.content}`
        )
        .join('\n\n---\n\n');
    } catch (error) {
      console.error('[NpcAgent] AutoRAG retrieval failed:', error);
      return ''; // Graceful fallback
    }
  }

  /**
   * Generate an NPC response with RAG-augmented context.
   */
  async generateNPCChatResponseWithRAG(
    npcId: number,
    playerMessage: string,
    conversationHistory: Array<{ role: 'user' | 'assistant'; content: string }>,
    chatContext: ChatContext
  ): Promise<string> {
    // Retrieve knowledge context
    const ragContext = await this.retrieveKnowledgeContext(npcId, playerMessage);

    // Get NPC configuration
    const config = await this.getNPCConfiguration(npcId);

    // Build the CoSTAR prompt with RAG context
    const prompt = this.buildCoSTARChatPromptWithRAG(config, {
      ...chatContext,
      ragContext
    });

    // Generate the response
    return this.generateResponse(prompt);
  }
}
```
### Updated CoSTAR Prompt Structure
```typescript
private buildCoSTARChatPromptWithRAG(
  config: NPCConfiguration,
  chatContext: ChatContextWithRAG
): string {
  const ragSection = chatContext.ragContext
    ? `
## Retrieved Knowledge Base Context

The following information was retrieved from your knowledge base and is relevant
to the current conversation. Use this information to provide accurate, grounded
responses:

${chatContext.ragContext}

IMPORTANT: Base your response on this retrieved knowledge when applicable.
Cite sources naturally in your response when using specific facts.
`
    : '';

  return this.buildCoSTARPrompt({
    context: `You are ${chatContext.npcName}, an AI tutor...
${ragSection}
Your Core Identity:
- Name: ${chatContext.npcName}
- Personality: ${chatContext.npcPersonality}
...`,
    objective: `${config.objective}
When knowledge base context is available, prioritize using that information
to provide accurate, grounded responses.`,
    // ... rest of CoSTAR structure
  });
}
```
## Limits and Pricing

### AutoRAG Limits
| Resource | Limit |
|---|---|
| AutoRAG instances per account | 10 |
| Files per instance | 100,000 |
| Plain text file size | 4 MB |
| Rich format file size | 1 MB |
| Max results per query | 50 |
### Pricing (Open Beta)
AutoRAG is currently in open beta and free to enable. Underlying services may incur charges:
| Service | Free Tier | Paid |
|---|---|---|
| R2 Storage | 10 GB | $0.015/GB-month |
| Vectorize | 5M vectors | Pay-per-use |
| Workers AI | 10k neurons/day | Pay-per-use |
| AI Gateway | Free | Free |
## Migration Strategy

### Phase 1: Parallel Implementation
- Add AutoRAG support alongside the existing knowledge system
- Feature flag per NPC (`autorag_enabled`)
- Test with a single NPC first (e.g., Pythagoras)
- Monitor quality and latency
### Phase 2: Knowledge Migration

- Export existing `npc_knowledge_data` entries
- Convert to document format (MD/TXT)
- Upload to NPC-specific R2 buckets
- Create AutoRAG instances
- Validate retrieval quality
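The export/convert steps above could be sketched as a small pure helper that turns a knowledge entry into a Markdown document ready for R2 upload. This is a sketch only: the row shape (`topic`, `content`, `tags`) is an assumption about the `npc_knowledge_data` schema, and `toMarkdownDoc` is a hypothetical name.

```typescript
// Sketch of the Phase 2 conversion: one knowledge row → one Markdown
// document. The field names are assumptions about npc_knowledge_data.
interface KnowledgeRow {
  topic: string;
  content: string;
  tags?: string[];
}

function toMarkdownDoc(row: KnowledgeRow): { key: string; body: string } {
  // Derive an R2 object key from the topic, e.g.
  // "Pythagorean Theorem" → "pythagorean-theorem.md"
  const key = row.topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "") + ".md";

  const tagLine = row.tags?.length ? `\nTags: ${row.tags.join(", ")}\n` : "";
  const body = `# Topic: ${row.topic}\n${tagLine}\n${row.content}\n`;
  return { key, body };
}

const doc = toMarkdownDoc({
  topic: "Pythagorean Theorem",
  content: "In a right triangle, a² + b² = c².",
  tags: ["geometry"],
});
// doc.key → "pythagorean-theorem.md"
```

The resulting `key`/`body` pairs can then be uploaded with `wrangler r2 object put` exactly as shown in Step 3 of the setup guide.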
### Phase 3: Full Rollout
- Enable for all NPCs with knowledge bases
- Deprecate manual knowledge injection
- Update admin UI for knowledge management
- Document new workflows
## Best Practices

### Knowledge Document Formatting
```markdown
# Topic: Pythagorean Theorem

## Definition
The Pythagorean theorem states that in a right triangle,
the square of the hypotenuse equals the sum of the squares
of the other two sides: a² + b² = c²

## Key Concepts
- Only applies to right triangles
- The hypotenuse is the longest side
- Can be used to find unknown side lengths

## Examples
Example 1: If a = 3 and b = 4, then c = 5
Because: 3² + 4² = 9 + 16 = 25 = 5²

## Common Misconceptions
- Does NOT apply to all triangles
- The formula is not a² + b² = c
```
### Chunking Considerations
AutoRAG automatically chunks documents, but for best results:
- Keep related information together
- Use clear headings and structure
- Avoid very long paragraphs
- Include context within each section
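As a rough preflight for the points above, documents could be linted before upload. This is a hypothetical helper (not part of AutoRAG) that flags missing headings and overly long paragraphs; the threshold is an arbitrary assumption.

```typescript
// Hypothetical pre-upload lint for knowledge documents: warns about
// missing Markdown headings and paragraphs long enough to chunk badly.
function lintKnowledgeDoc(text: string, maxParagraphChars = 1200): string[] {
  const warnings: string[] = [];

  // Clear headings give the chunker natural boundaries.
  if (!/^#{1,6}\s/m.test(text)) {
    warnings.push("No Markdown headings found; add '##' sections for better chunking.");
  }

  // Very long paragraphs may be split mid-thought by the chunker.
  for (const para of text.split(/\n\s*\n/)) {
    if (para.length > maxParagraphChars) {
      warnings.push(`Paragraph of ${para.length} chars exceeds ${maxParagraphChars}; consider splitting.`);
    }
  }
  return warnings;
}
```

Running this over a document with no headings returns one warning; a well-structured document like the formatting example above passes cleanly.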
### Query Optimization
```typescript
// Enable query rewriting for better results
const result = await env.AI.autorag("kb").search({
  query: playerMessage,
  rewrite_query: true, // Improves ambiguous queries
  max_results: 5,      // Balance between context and token usage
  rerank: true         // Better relevance ordering
});
```
## Troubleshooting

### Common Issues
1. Empty retrieval results
   - Verify documents are uploaded to R2
   - Check AutoRAG indexing status in the dashboard
   - Ensure the query is relevant to the knowledge base content

2. Slow response times
   - AutoRAG adds ~200-500 ms of latency
   - Consider caching frequent queries
   - Reduce `max_results` if the context is too large

3. Irrelevant results
   - Enable `rerank: true` for better relevance
   - Review document structure and formatting
   - Consider query rewriting
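The caching suggestion under "Slow response times" could start as the minimal in-memory sketch below. It lives per Worker isolate, so it is best-effort only; Workers KV or the Cache API would persist across isolates. The helper name, key scheme, and TTL are assumptions, not an existing API.

```typescript
// Minimal per-isolate cache for frequent AutoRAG retrievals.
// Entries expire after ttlMs; the key normalizes the query so
// trivially different phrasings ("Hello" vs "hello ") share a hit.
const ragCache = new Map<string, { value: string; expires: number }>();

async function cachedRetrieve(
  kb: string,
  query: string,
  fetcher: () => Promise<string>, // e.g. wraps env.AI.autorag(kb).search(...)
  ttlMs = 60_000
): Promise<string> {
  const key = `${kb}:${query.toLowerCase().trim()}`;
  const hit = ragCache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;

  const value = await fetcher();
  ragCache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}
```

Because the cache sits in front of `retrieveKnowledgeContext`, a miss costs one AutoRAG round trip and every repeat within the TTL costs nothing.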
### Monitoring
Use AI Gateway to monitor:
- Query volume and latency
- Token usage per NPC
- Error rates and failures
- Cost breakdown
## Related Documentation
- NPC System - Core NPC framework
- AI Integration - Multi-provider AI service
- Circuit Breaker - Service resilience patterns
- Database Schema - Complete database reference