AutoRAG Knowledge Bases

Overview

AutoRAG (now called AI Search) is Cloudflare’s fully managed Retrieval-Augmented Generation (RAG) service. Integrating it gives each NPC a personalized knowledge base with semantic search: during a conversation, the NPC retrieves contextually relevant information from that knowledge base, producing more accurate, grounded responses.

Status: Planned for v1.4.0


What is AutoRAG?

AutoRAG abstracts the complexity of building RAG pipelines by integrating:

Component              Description                   Cloudflare Service
Data Storage           Store knowledge documents     R2 Buckets
Vector Database        Semantic embeddings storage   Vectorize
Embedding Generation   Convert text to vectors       Workers AI
Query Rewriting        Improve search queries        Workers AI
Response Generation    Generate AI responses         Workers AI
API Management         Monitor and control usage     AI Gateway

Key Benefits for NPCs

  1. Semantic Search: NPCs can find contextually relevant information, not just keyword matches
  2. Automatic Embeddings: No need to manually manage vector databases
  3. Per-NPC Knowledge: Each NPC can have domain-specific knowledge
  4. Grounded Responses: Responses are based on actual knowledge, reducing hallucinations
  5. Managed Infrastructure: Cloudflare handles chunking, indexing, and retrieval

Architecture

Current Knowledge Flow (Without RAG)

Knowledge Files (R2) → npc_knowledge_data (D1) → Manual filtering → NPC Prompt
                       ↑ No embeddings
                       ↑ No semantic search

Proposed Architecture (With AutoRAG)

┌─────────────────────────────────────────────────────────────────┐
│                     AutoRAG Architecture                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  R2 Buckets (Per-NPC Knowledge)                                │
│  ├── padawanforge-npc-pythagoras/  → AutoRAG "pythagoras-kb"   │
│  ├── padawanforge-npc-einstein/    → AutoRAG "einstein-kb"     │
│  └── padawanforge-npc-{name}/      → AutoRAG "{name}-kb"       │
│                                                                 │
│  NPC Chat Request                                               │
│       │                                                         │
│       ▼                                                         │
│  ┌─────────────────────────────────────────────────────────────┐
│  │ 1. Retrieve relevant context from AutoRAG                   │
│  │    env.AI.autorag(npc.knowledge_base).search({              │
│  │      query: playerMessage,                                   │
│  │      rewrite_query: true,                                    │
│  │      max_results: 5                                          │
│  │    })                                                        │
│  └─────────────────────────────────────────────────────────────┘
│       │                                                         │
│       ▼                                                         │
│  ┌─────────────────────────────────────────────────────────────┐
│  │ 2. Inject retrieved context into CoSTAR prompt              │
│  └─────────────────────────────────────────────────────────────┘
│       │                                                         │
│       ▼                                                         │
│  ┌─────────────────────────────────────────────────────────────┐
│  │ 3. Generate response with augmented knowledge               │
│  └─────────────────────────────────────────────────────────────┘
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

API Methods

AutoRAG provides two primary methods for querying knowledge bases:

1. aiSearch() - Full RAG Response

Returns an AI-generated answer along with source citations.

const answer = await env.AI.autorag("npc-knowledge-base").aiSearch({
  query: "What is the theory of relativity?",
  stream: false,  // Set to true for streaming responses
  model: "@cf/meta/llama-3.1-8b-instruct"  // Optional: specify model
});

// Response includes:
// - Generated answer text
// - Source citations with filenames
// - Relevance scores for the retrieved chunks

Use Case: Direct Q&A where the full response is generated by AutoRAG.
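
To consume the non-streamed result, the generated answer and its citations can be read off the response object. This is a sketch; the field names follow the AutoRAG response shape at the time of writing, so verify against the current docs:

// Continuing from the `answer` above (stream: false)
console.log(answer.response);                 // generated answer text
for (const doc of answer.data ?? []) {
  console.log(`cited: ${doc.filename} (score ${doc.score})`);
}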

2. search() - Retrieval Only

Returns relevant document chunks without generating a response.

const searchResult = await env.AI.autorag("npc-knowledge-base").search({
  query: playerMessage,
  rewrite_query: true,   // Improve query before search
  max_num_results: 5,    // Number of results (1-50, default 10)
  rerank: true           // Reorder by semantic relevance
});

// Response includes:
// - Array of matching documents
// - Filename, content chunks, and relevance scores

Use Case: Context injection into existing CoSTAR prompts (recommended for NPCs).

Use search() for NPCs to maintain the CoSTAR prompt structure and NPC personality:

// 1. Retrieve relevant context
const context = await env.AI.autorag(npc.knowledge_base_name).search({
  query: playerMessage,
  rewrite_query: true,
  max_num_results: 5
});

// 2. Format context for prompt injection
const ragContext = context.data
  .map(doc => `[Source: ${doc.filename}]\n${doc.content.map(c => c.text).join('\n')}`)
  .join('\n\n');

// 3. Inject into CoSTAR prompt
const prompt = buildCoSTARPrompt({
  context: `${npcContext}\n\n## Retrieved Knowledge\n${ragContext}`,
  // ... rest of CoSTAR structure
});

// 4. Generate response with Workers AI
const response = await env.AI.run(model, { prompt });

Configuration

Wrangler Configuration

No additional configuration is required. The existing AI binding supports AutoRAG:

// wrangler.jsonc
{
  "ai": {
    "binding": "AI"  // Supports env.AI.autorag()
  }
}
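
For end-to-end type checking in the Worker, the binding can also be declared on the Env interface. This is a sketch: the Ai and D1Database types come from @cloudflare/workers-types, and the binding names here mirror this document's examples but are assumptions.

// Illustrative Env typing, with @cloudflare/workers-types available to the project
interface Env {
  AI: Ai;           // supports env.AI.autorag(...) and env.AI.run(...)
  DB: D1Database;   // D1 binding used by NpcAgent (binding name assumed)
}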

Database Schema Updates

New columns for NPCs table:

-- Add AutoRAG support to NPCs
ALTER TABLE npcs ADD COLUMN knowledge_base_name TEXT;
ALTER TABLE npcs ADD COLUMN autorag_enabled INTEGER DEFAULT 0;

-- Index for quick lookup
CREATE INDEX idx_npcs_autorag ON npcs(autorag_enabled) WHERE autorag_enabled = 1;

NPC Configuration

interface NPCWithAutoRAG {
  // Existing fields...
  id: number;
  name: string;
  system_prompt: string;

  // New AutoRAG fields
  knowledge_base_name: string | null;  // AutoRAG instance name
  autorag_enabled: boolean;             // Feature flag
}
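
Since D1 stores autorag_enabled as an INTEGER, a small mapper keeps the boolean flag consistent when rows are read. A sketch against the schema above; loadNpc is a hypothetical helper:

// Sketch: load an NPC row and normalize the AutoRAG fields
async function loadNpc(db: D1Database, id: number): Promise<NPCWithAutoRAG | null> {
  const row = await db
    .prepare('SELECT id, name, system_prompt, knowledge_base_name, autorag_enabled FROM npcs WHERE id = ?')
    .bind(id)
    .first<Record<string, unknown>>();

  if (!row) return null;

  return {
    id: row.id as number,
    name: row.name as string,
    system_prompt: row.system_prompt as string,
    knowledge_base_name: (row.knowledge_base_name as string | null) ?? null,
    autorag_enabled: row.autorag_enabled === 1   // INTEGER 0/1 → boolean
  };
}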

Setup Guide

Prerequisites

  1. Cloudflare account with Workers AI access
  2. R2 bucket for knowledge storage
  3. AI Gateway (created automatically or manually)

Step 1: Create R2 Bucket for NPC Knowledge

# Using Wrangler CLI
wrangler r2 bucket create padawanforge-npc-pythagoras

Or via Cloudflare Dashboard:

  1. Navigate to R2 → Create bucket
  2. Name: padawanforge-npc-{npc-name}
  3. Upload knowledge documents (TXT, PDF, MD, JSON)

Step 2: Create AutoRAG Instance

Via Cloudflare Dashboard:

  1. Navigate to AI → AI Search (formerly AutoRAG)
  2. Click Create
  3. Configure:
    • Name: {npc-name}-kb (e.g., pythagoras-kb)
    • R2 Bucket: Select the NPC’s bucket
    • Embedding Model: Default (recommended)
    • LLM: Default or custom model
    • AI Gateway: Select or create new

Step 3: Upload Knowledge Documents

Supported formats:

  • Plain Text: .txt (max 4MB)
  • Rich Format: .pdf, .md, .json (max 1MB)

# Upload via Wrangler
wrangler r2 object put padawanforge-npc-pythagoras/geometry-basics.md --file ./docs/geometry.md
wrangler r2 object put padawanforge-npc-pythagoras/pythagorean-theorem.pdf --file ./docs/theorem.pdf

Step 4: Enable AutoRAG for NPC

UPDATE npcs
SET knowledge_base_name = 'pythagoras-kb',
    autorag_enabled = 1
WHERE name = 'Pythagoras';
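
The same statement can be applied to the deployed database with wrangler d1 execute. The database name padawanforge-db below is an assumption; substitute the project's real D1 name:

# Hypothetical database name; adjust before running
wrangler d1 execute padawanforge-db --remote \
  --command "UPDATE npcs SET knowledge_base_name = 'pythagoras-kb', autorag_enabled = 1 WHERE name = 'Pythagoras';"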

Implementation

NpcAgent.ts Enhancements

// src/lib/services/NpcAgent.ts

export class NpcAgent {
  private db: D1Database;
  private ai: any;  // Cloudflare AI binding

  constructor(db: D1Database, ai: any) {
    this.db = db;
    this.ai = ai;
  }

  /**
   * Retrieve relevant context from AutoRAG knowledge base
   */
  async retrieveKnowledgeContext(
    npcId: number,
    query: string
  ): Promise<string> {
    // Get NPC AutoRAG configuration
    const npc = await this.db.prepare(`
      SELECT knowledge_base_name, autorag_enabled
      FROM npcs WHERE id = ?
    `).bind(npcId).first();

    // Return empty if AutoRAG not enabled
    if (!npc?.autorag_enabled || !npc?.knowledge_base_name) {
      return '';
    }

    try {
      // Query AutoRAG for relevant context
      const result = await this.ai.autorag(npc.knowledge_base_name).search({
        query: query,
        rewrite_query: true,
        max_num_results: 5,
        rerank: true
      });

      // Format retrieved chunks for prompt injection
      if (!result.data || result.data.length === 0) {
        return '';
      }

      return result.data
        .map((doc: any, index: number) =>
          // Each result's content is an array of text chunks
          `[Source ${index + 1}: ${doc.filename}]\n` +
          doc.content.map((c: any) => c.text).join('\n')
        )
        .join('\n\n---\n\n');

    } catch (error) {
      console.error('[NpcAgent] AutoRAG retrieval failed:', error);
      return ''; // Graceful fallback
    }
  }

  /**
   * Generate NPC response with RAG-augmented context
   */
  async generateNPCChatResponseWithRAG(
    npcId: number,
    playerMessage: string,
    conversationHistory: Array<{ role: 'user' | 'assistant'; content: string }>,
    chatContext: ChatContext
  ): Promise<string> {
    // Retrieve knowledge context
    const ragContext = await this.retrieveKnowledgeContext(npcId, playerMessage);

    // Get NPC configuration
    const config = await this.getNPCConfiguration(npcId);

    // Build CoSTAR prompt with RAG context
    const prompt = this.buildCoSTARChatPromptWithRAG(
      config,
      {
        ...chatContext,
        ragContext
      }
    );

    // Generate response
    return this.generateResponse(prompt);
  }
}

Updated CoSTAR Prompt Structure

private buildCoSTARChatPromptWithRAG(
  config: NPCConfiguration,
  chatContext: ChatContextWithRAG
): string {
  const ragSection = chatContext.ragContext
    ? `
## Retrieved Knowledge Base Context

The following information was retrieved from your knowledge base and is relevant
to the current conversation. Use this information to provide accurate, grounded
responses:

${chatContext.ragContext}

IMPORTANT: Base your response on this retrieved knowledge when applicable.
Cite sources naturally in your response when using specific facts.
`
    : '';

  return this.buildCoSTARPrompt({
    context: `You are ${chatContext.npcName}, an AI tutor...
${ragSection}

Your Core Identity:
- Name: ${chatContext.npcName}
- Personality: ${chatContext.npcPersonality}
...`,

    objective: `${config.objective}

When knowledge base context is available, prioritize using that information
to provide accurate, grounded responses.`,

    // ... rest of CoSTAR structure
  });
}

Limits and Pricing

AutoRAG Limits

Resource                        Limit
AutoRAG instances per account   10
Files per instance              100,000
Plain text file size            4 MB
Rich format file size           1 MB
Max results per query           50

Pricing (Open Beta)

AutoRAG is currently in open beta and free to enable. Underlying services may incur charges:

Service      Free Tier          Paid
R2 Storage   10 GB              $0.015/GB-month
Vectorize    5M vectors         Pay-per-use
Workers AI   10k neurons/day    Pay-per-use
AI Gateway   Free               Free

Migration Strategy

Phase 1: Parallel Implementation

  1. Add AutoRAG support alongside existing knowledge system
  2. Feature flag per NPC (autorag_enabled)
  3. Test with single NPC first (e.g., Pythagoras)
  4. Monitor quality and latency

Phase 2: Knowledge Migration

  1. Export existing npc_knowledge_data entries (a sketch of steps 1-3 follows this list)
  2. Convert to document format (MD/TXT)
  3. Upload to NPC-specific R2 buckets
  4. Create AutoRAG instances
  5. Validate retrieval quality
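
A minimal sketch of steps 1-3, assuming npc_knowledge_data has npc_id, title, and content columns (hypothetical; adjust to the real schema) and that the migration runs with a D1 binding plus an R2 binding pointing at the NPC's bucket:

// Hypothetical migration sketch: export D1 knowledge rows as Markdown and push them to R2
interface KnowledgeRow { npc_id: number; title: string; content: string }  // assumed columns

async function migrateNpcKnowledge(
  db: D1Database,
  bucket: R2Bucket,   // assumed to be bound to padawanforge-npc-{name}
  npcId: number
): Promise<void> {
  const { results } = await db
    .prepare('SELECT npc_id, title, content FROM npc_knowledge_data WHERE npc_id = ?')
    .bind(npcId)
    .all<KnowledgeRow>();

  for (const row of results) {
    // One Markdown document per knowledge entry, keyed by a slug of its title
    const key = `${row.title.toLowerCase().replace(/[^a-z0-9]+/g, '-')}.md`;
    await bucket.put(key, `# ${row.title}\n\n${row.content}\n`);
  }
}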

Phase 3: Full Rollout

  1. Enable for all NPCs with knowledge bases
  2. Deprecate manual knowledge injection
  3. Update admin UI for knowledge management
  4. Document new workflows

Best Practices

Knowledge Document Formatting

# Topic: Pythagorean Theorem

## Definition
The Pythagorean theorem states that in a right triangle,
the square of the hypotenuse equals the sum of squares
of the other two sides: a² + b² = c²

## Key Concepts
- Only applies to right triangles
- The hypotenuse is the longest side
- Can be used to find unknown side lengths

## Examples
Example 1: If a = 3 and b = 4, then c = 5
Because: 3² + 4² = 9 + 16 = 25 = 5²

## Common Misconceptions
- Does NOT apply to all triangles
- The formula is not a² + b² = c

Chunking Considerations

AutoRAG automatically chunks documents, but for best results:

  • Keep related information together
  • Use clear headings and structure
  • Avoid very long paragraphs
  • Include context within each section

Query Optimization

// Enable query rewriting for better results
const result = await env.AI.autorag("kb").search({
  query: playerMessage,
  rewrite_query: true,  // Improves ambiguous queries
  max_num_results: 5,   // Balance between context and token usage
  rerank: true          // Better relevance ordering
});

Troubleshooting

Common Issues

1. Empty retrieval results

  • Verify documents are uploaded to R2
  • Check AutoRAG indexing status in dashboard
  • Ensure query is relevant to knowledge base content

2. Slow response times

  • AutoRAG adds ~200-500ms latency
  • Consider caching frequent queries (see the sketch after this list)
  • Reduce max_num_results if context is too large
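
One way to cache frequent queries is a short-lived KV cache in front of the search call. This is a sketch; the RAG_CACHE KV binding and the 10-minute TTL are assumptions:

// Sketch: cache AutoRAG search results in Workers KV for repeated queries
async function cachedSearch(env: Env, kbName: string, query: string): Promise<unknown> {
  const cacheKey = `rag:${kbName}:${query.trim().toLowerCase()}`;

  // RAG_CACHE is an assumed KV namespace binding
  const cached = await env.RAG_CACHE.get(cacheKey, 'json');
  if (cached) return cached;

  const result = await env.AI.autorag(kbName).search({
    query,
    rewrite_query: true,
    max_num_results: 5
  });

  // Keep entries for 10 minutes; tune to how often the knowledge base changes
  await env.RAG_CACHE.put(cacheKey, JSON.stringify(result), { expirationTtl: 600 });
  return result;
}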

3. Irrelevant results

  • Enable rerank: true for better relevance
  • Review document structure and formatting
  • Consider query rewriting

Monitoring

Use AI Gateway to monitor:

  • Query volume and latency
  • Token usage per NPC
  • Error rates and failures
  • Cost breakdown

External Resources

PadawanForge v1.4.1