Flow: Agentic Search (AI-Powered Conversational Search)

AI-powered search using Google Gemini to categorize queries, generate summaries, and support multi-turn conversations.

Request Path

graph TD
    A["Customer App"]

    subgraph proxy["Cloudflare Worker (search_proxy: {env}-ecom-api)"]
        subgraph agentic["Cloudflare Worker (agentic_search: {env}-agentic-search)"]
            B{"Cache check: CF Cache → XOR filter → DynamoDB"}
            C["Initial Marqo search (via search_proxy RPC)"]
            D["Google Gemini: function calling → query expansions + categories"]
            E["Per-category Marqo searches (parallel, via search_proxy RPC)"]
            F["Gemini: summary generation (streamed)"]
            G["Durable Object: save conversation state"]
        end
        H["SSE stream (init, category-hits, delta, stream-end)"]
    end

    I["Customer App"]

    A --> B
    B -->|cached| H
    B -->|uncached| C
    C --> D
    D --> E
    E --> H
    E -.->|optional| F
    F -.->|optional| G
    G -.-> H
    H --> I

Step-by-Step

1. Search Proxy Routes to Agentic Worker

Where: components/search_proxy/src/worker.ts

The search proxy receives the agentic search request and calls AGENTIC_SEARCH_WORKER via Cloudflare service binding RPC.

Inspect: Tail both workers — see Cloudflare Workers.

npx wrangler tail {env}-ecom-api
npx wrangler tail {env}-agentic-search

2. Payload Validation & Feature Check

Where: components/agentic_search/src/index.ts

The Base64-encoded payload is decoded and validated against a Zod schema. The handler then checks that feature_flags.agentic_search is enabled.

Failure: returns 400 if the payload is invalid or the feature is not enabled.

3. Three-Tier Cache Check

Where: components/agentic_search/src/ddb-cached-lookup.ts

  1. Cloudflare Cache API (2hr TTL, with 60s stale-while-revalidate). Key: https://internal/agentic-cached-query/{accountId}/{indexName}/{normalized_query}. Entries are treated as stale after 1hr; the Cloudflare max-age is set to 2x that threshold (7200s) so stale entries stay available to serve while they are revalidated.
  2. XOR filter (a probabilistic membership filter, similar in purpose to a Bloom filter). Acts as a negative filter: if the query is not in the filter, the DDB lookup is skipped entirely.
  3. DynamoDB ({env}-AgenticCachedQueriesTable) — direct key lookup
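The lookup order above can be sketched as follows. This is a minimal sketch, not the code in ddb-cached-lookup.ts: the `CacheTiers` interface, `lookupCachedQuery`, the normalization rule (trim, lowercase, collapse whitespace), and URL-encoding the query in the cache key are all assumptions made for illustration. Only the key shapes and the 1hr/2hr values come from this page.

```typescript
// Hypothetical identifiers throughout; only the key formats and TTL values are from the doc.
interface QueryId {
  accountId: string;
  indexName: string;
  normalizedQuery: string;
}

interface CachedEntry {
  summary: string;
  storedAtMs: number; // assumed: write time in epoch milliseconds
}

interface CacheTiers {
  edgeGet(id: QueryId): Promise<CachedEntry | undefined>; // tier 1: Cloudflare Cache API
  filterMayContain(id: QueryId): boolean;                  // tier 2: false means "definitely absent"
  ddbGet(id: QueryId): Promise<CachedEntry | undefined>;   // tier 3: DynamoDB key lookup
}

interface LookupResult {
  source: "edge" | "ddb" | "miss";
  entry?: CachedEntry;
  needsRefresh: boolean;
}

const STALE_AFTER_MS = 60 * 60 * 1000; // 1hr staleness threshold
const EDGE_MAX_AGE_S = 2 * 60 * 60;    // 7200s, 2x the staleness threshold

// Assumed normalization; the real rule may differ.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

function edgeCacheKey(id: QueryId): string {
  const q = encodeURIComponent(id.normalizedQuery); // encoding is an assumption
  return `https://internal/agentic-cached-query/${id.accountId}/${id.indexName}/${q}`;
}

function ddbPartitionKey(id: QueryId): string {
  return `${id.accountId}#${id.indexName}#${id.normalizedQuery}`;
}

async function lookupCachedQuery(tiers: CacheTiers, id: QueryId, nowMs: number): Promise<LookupResult> {
  const edge = await tiers.edgeGet(id);
  if (edge) {
    // Serve the entry even if stale, but flag it for background revalidation.
    return { source: "edge", entry: edge, needsRefresh: nowMs - edge.storedAtMs > STALE_AFTER_MS };
  }
  if (!tiers.filterMayContain(id)) {
    return { source: "miss", needsRefresh: false }; // negative filter: skip DDB entirely
  }
  const row = await tiers.ddbGet(id);
  return row ? { source: "ddb", entry: row, needsRefresh: false } : { source: "miss", needsRefresh: false };
}
```

Because the filter is probabilistic, a "may contain" answer can still end in a DDB miss; only the negative answer is authoritative.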

If cached and eligible (based on cached_query_percentage config + session bucketing):

  • Sends cached summary + executes category searches
  • Skips LLM entirely
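The eligibility check could look roughly like the sketch below. How the worker actually buckets sessions is not described on this page, so the hash function, the 0-99 bucket range, the reading of cached_query_percentage as a 0-100 value, and the names `sessionBucket` and `isEligibleForCachedResponse` are all assumptions.

```typescript
// Simple deterministic string hash (31-multiplier); an assumed stand-in for the real hash.
function hashString(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    // hash stays below 2^32, so hash * 31 + code is exact in a double before truncation.
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Maps a session to a stable bucket in [0, 100).
function sessionBucket(sessionId: string): number {
  return hashString(sessionId) % 100;
}

// A session gets cached responses when its bucket falls below the configured percentage.
function isEligibleForCachedResponse(sessionId: string, cachedQueryPercentage: number): boolean {
  return sessionBucket(sessionId) < cachedQueryPercentage;
}
```

Bucketing on the session rather than per request keeps one session consistently on the cached or the LLM path, which is what makes the percentage usable for comparisons.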

Inspect: Check cached queries — see DynamoDB.

aws dynamodb get-item --table-name {env}-AgenticCachedQueriesTable \
  --key '{"pk": {"S": "{accountId}#{indexName}#{normalized_query}"}}'

4. Short Query Bypass

Queries with fewer than 4 words skip the LLM entirely and return basic Marqo search results.
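A minimal sketch of the bypass check, assuming words are counted by splitting on whitespace (the real tokenization is not documented here). `countWords` and `shouldBypassLlm` are hypothetical names.

```typescript
const MIN_WORDS_FOR_LLM = 4; // threshold from the doc: fewer than 4 words bypasses the LLM

// Assumed word-counting rule: whitespace-separated tokens.
function countWords(query: string): number {
  const trimmed = query.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function shouldBypassLlm(query: string): boolean {
  return countWords(query) < MIN_WORDS_FOR_LLM;
}
```

Under this rule "waterproof hiking boots" (3 words) bypasses the LLM, while "waterproof hiking boots for winter" does not. This is worth keeping in mind when debugging empty categories.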

5. Initial Marqo Search (Parallel)

Calls SEARCH_PROXY_WORKER.handleSearch() via service binding RPC. Runs in parallel with LLM call.

6. LLM Phase 1: Function Calling

Where: components/agentic_search/src/search/agentic-search.ts

Calls Google Gemini (gemini-2.5-flash) with:

  • System instructions (base prompt or per-account custom prompt)
  • Conversation history (if continuing)
  • execute_search function tool

LLM generates JSON with:

  • query_expansions: array of [query, category_label, confidence]
  • When agentic_config.filter_facets.enabled is true and facet context is available, each query_expansions entry may include an optional 4th element: a Marqo filter DSL string
  • query_complements: related search terms
  • summary: text with angle-bracket wrapped product links
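The query_expansions tuples can be normalized into objects before the category searches run. The sketch below is illustrative only: `QueryExpansion` and `parseExpansion` are hypothetical names, and it assumes confidence is a number, which this page does not state.

```typescript
// Hypothetical normalized shape for one expansion tuple.
interface QueryExpansion {
  query: string;
  categoryLabel: string;
  confidence: number; // assumed numeric
  filter?: string;    // optional 4th element: Marqo filter DSL string
}

// Accepts [query, category_label, confidence] or the 4-element variant; rejects anything else.
function parseExpansion(raw: unknown): QueryExpansion | undefined {
  if (!Array.isArray(raw) || raw.length < 3 || raw.length > 4) return undefined;
  const [query, categoryLabel, confidence, filter] = raw as unknown[];
  if (typeof query !== "string" || typeof categoryLabel !== "string" || typeof confidence !== "number") {
    return undefined;
  }
  if (filter !== undefined && typeof filter !== "string") return undefined;
  return filter === undefined
    ? { query, categoryLabel, confidence }
    : { query, categoryLabel, confidence, filter };
}

// Example LLM output (made-up values, for illustration only).
const llmOutput = JSON.parse(
  '{"query_expansions": [["trail running shoes", "Trail Running", 0.9], ["running socks", "Accessories", 0.6, "color:black"]]}'
) as { query_expansions: unknown[] };

const expansions = llmOutput.query_expansions
  .map(parseExpansion)
  .filter((e): e is QueryExpansion => e !== undefined);
```

Dropping malformed tuples rather than failing the whole request fits the streaming model, where each expansion is handled as soon as it is parsed.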

Streaming: Uses stream-json library to parse LLM JSON output in real-time. Category searches fire as soon as each expansion is parsed (not waiting for full response).

Inspect: Check Gemini API key — see Secrets Manager. Key stored as GOOGLE_API_KEY env var.

7. Category Searches (Parallel)

For each LLM-generated query expansion, executes a Marqo search via SEARCH_PROXY_WORKER.handleSearch(). Up to 6 parallel searches.

If an agent-constructed filter is present, it is merged with any client-provided filter. If the merged filter yields 0 hits, the worker retries without the agent filter and marks the streamed category payload with filterDropped + originalFilter.

The filter that was actually applied is surfaced as appliedFilter on each categoryHits[] item.
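The merge-and-retry behavior can be sketched as below. Several details are assumptions: that filters are merged with AND, that originalFilter carries the merged filter that returned 0 hits, and the names `mergeFilters`, `searchCategory`, and `SearchFn`. The `search` callback stands in for the SEARCH_PROXY_WORKER.handleSearch() RPC.

```typescript
interface CategoryResult {
  hits: unknown[];
  appliedFilter?: string;  // filter actually used for the returned hits
  filterDropped?: boolean; // set when the agent filter was removed on retry
  originalFilter?: string; // assumed: the merged filter that yielded 0 hits
}

// Stand-in for the search proxy RPC call.
type SearchFn = (query: string, filter?: string) => Promise<unknown[]>;

// Assumed merge rule: both filters must match.
function mergeFilters(clientFilter?: string, agentFilter?: string): string | undefined {
  if (clientFilter && agentFilter) return `(${clientFilter}) AND (${agentFilter})`;
  return clientFilter || agentFilter || undefined;
}

async function searchCategory(
  search: SearchFn,
  query: string,
  clientFilter?: string,
  agentFilter?: string
): Promise<CategoryResult> {
  const merged = mergeFilters(clientFilter, agentFilter);
  const hits = await search(query, merged);
  if (hits.length > 0 || !agentFilter) {
    return { hits, appliedFilter: merged };
  }
  // 0 hits with an agent filter: retry with only the client-provided filter.
  const fallbackFilter = clientFilter || undefined;
  const fallbackHits = await search(query, fallbackFilter);
  return { hits: fallbackHits, appliedFilter: fallbackFilter, filterDropped: true, originalFilter: merged };
}
```

The client filter is never dropped in this sketch; only the agent-constructed part is treated as optional.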

8. LLM Phase 2: Summary Generation (Optional)

If agentic_config.enable_summary is true:

  • Second Gemini call with function calling DISABLED
  • Synthesizes summary from search results
  • Streamed as delta SSE events with text chunks

9. Conversation State (Optional)

Where: components/agentic_search/src/conversation-do.ts

If enableConversation is true:

  • Durable Object (ConversationSqlDO) stores conversation context in SQLite
  • Trimmed to 10 interactions for LLM, 100 for storage
  • Auto-cleanup after 30 days of inactivity (DO alarm)
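The two trimming limits can be illustrated with a small sketch. It assumes the most recent interactions are the ones kept, and the `Interaction` shape and function names are hypothetical, not taken from conversation-do.ts.

```typescript
const MAX_STORED_INTERACTIONS = 100; // kept in the Durable Object's SQLite storage
const MAX_LLM_INTERACTIONS = 10;     // passed to the LLM as conversation history

// Hypothetical shape of one stored turn.
interface Interaction {
  userQuery: string;
  assistantReply: string;
}

// Returns the last n items (assumes the most recent interactions are retained).
function lastN<T>(items: readonly T[], n: number): T[] {
  return items.slice(Math.max(0, items.length - n));
}

function trimForStorage(history: readonly Interaction[]): Interaction[] {
  return lastN(history, MAX_STORED_INTERACTIONS);
}

function trimForLlm(history: readonly Interaction[]): Interaction[] {
  return lastN(history, MAX_LLM_INTERACTIONS);
}
```

When debugging "context lost" reports, note that anything older than the last 10 interactions is outside the LLM window even though it may still be in storage.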

Inspect: Durable Objects visible in Cloudflare dashboard.

10. SSE Response Stream

Event types sent to client (agentic search path):

init          → { categories: boolean, summary: boolean }
delta         → { summary?, categoryHits?, hits?, facets?, status?, error? }
stream-end    → {}
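For reference, a sketch of how these events might look on the wire, using standard Server-Sent Events framing. Whether the worker sends the type in a named `event:` field (as assumed here) or inside the data payload is not stated on this page; `formatSseEvent` and `parseSseFrame` are hypothetical helpers.

```typescript
type AgenticEventType = "init" | "delta" | "stream-end";

// Serializes one event as an SSE frame: event line, data line, blank line.
function formatSseEvent(event: AgenticEventType, data: object): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Parses a single SSE frame back into its event type and JSON payload.
function parseSseFrame(frame: string): { event: string; data: unknown } | undefined {
  let event = "message"; // SSE default when no event field is present
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip blank lines and comment lines
    const field = line.slice(0, colon);
    let value = line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return undefined;
  return { event, data: JSON.parse(dataLines.join("\n")) };
}

// A typical agentic-search stream under these assumptions.
const frames = [
  formatSseEvent("init", { categories: true, summary: false }),
  formatSseEvent("delta", { summary: "Hello" }),
  formatSseEvent("stream-end", {}),
];
```

A client should keep any categoryHits or hits it has already received if a later delta carries an error, since earlier frames remain valid.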

The converse path (handleConverse) uses different event types:

message         → text delta chunks
category-hits   → search category results
conversation-id → { conversationId }
error           → error details
stream-end      → {}

Converse (Multi-Turn Chat)

Similar to agentic search but:

  • Always uses function calling (no separate summary phase)
  • Supports document context injection (fetches docs by ID)
  • Supports image analysis (up to 5 images, base64-encoded)
  • Conversation history preserved in Durable Object

Chat Suggestions

handleChatSuggestions() — generates 3-5 follow-up questions for a document:

  1. Fetch document via search proxy RPC
  2. Call Gemini Lite (gemini-2.5-flash-lite)
  3. Cache result in Cloudflare Cache API (30-day TTL)

Performance Metrics

Where: components/agentic_search/src/timer.ts

Logged at completion as agentic_search_completed:

{
  "source": "llm|cached|short_query_bypass",
  "timings": {
    "E2E": 2500,
    "INITIAL_MARQO_SEARCH": 200,
    "LLM_FIRST_CALL": 1800,
    "LLM_SECOND_CALL": 800,
    "CATEGORY_SEARCH_TOTAL": 400,
    "CACHED_QUERY_LOOKUP": 15
  }
}
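A phase timer producing a log of this shape could be sketched as follows. This is not the code in timer.ts: the `PhaseTimer` class, its methods, and the injectable clock are assumptions, and timings are assumed to be milliseconds.

```typescript
type Phase =
  | "E2E"
  | "INITIAL_MARQO_SEARCH"
  | "LLM_FIRST_CALL"
  | "LLM_SECOND_CALL"
  | "CATEGORY_SEARCH_TOTAL"
  | "CACHED_QUERY_LOOKUP";

type Source = "llm" | "cached" | "short_query_bypass";

class PhaseTimer {
  private readonly now: () => number;
  private readonly starts: Partial<Record<Phase, number>> = {};
  private readonly timings: Partial<Record<Phase, number>> = {};

  // The clock is injectable so the timer can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  start(phase: Phase): void {
    this.starts[phase] = this.now();
  }

  // Records elapsed time for a phase; ignored if the phase was never started.
  end(phase: Phase): void {
    const startedAt = this.starts[phase];
    if (startedAt === undefined) return;
    this.timings[phase] = this.now() - startedAt;
  }

  // Builds the payload logged as agentic_search_completed.
  toLog(source: Source): { source: Source; timings: Partial<Record<Phase, number>> } {
    return { source, timings: { ...this.timings } };
  }
}
```

Phases that never ran (for example LLM_SECOND_CALL when summaries are disabled, or every LLM phase on the cached path) are simply absent from timings in this sketch.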

What to Look For

| Symptom | Where to Check |
| --- | --- |
| Agentic search not available | feature_flags.agentic_search in settings. Check Settings Sync. |
| LLM errors | Gemini API key missing/invalid. Check GOOGLE_API_KEY in Secrets Manager. |
| Slow responses | Check timings in completion log. LLM latency? Marqo latency? |
| Cache not working | Check DDB table. Check XOR filter rebuild. Check CF cache headers (2hr max-age, 1hr staleness threshold). Check cached_query_percentage config and session bucketing. |
| Conversation context lost | Durable Object cleanup (30-day inactivity). Check DO console. |
| 5xx from agentic worker | Tail {env}-agentic-search worker. Check Gemini API status. |
| Empty categories | LLM returned no query expansions. Check system prompt. Short query bypass. |
| Streaming errors | Check SSE error events in delta. Partial results may still be valid. |