Flow: Agentic Search (AI-Powered Conversational Search)

AI-powered search using Google Gemini to categorize queries, generate summaries, and support multi-turn conversations.

Request Path

graph TD
    A["Customer App"]

    subgraph proxy["Cloudflare Worker (search_proxy: {env}-ecom-api)"]
        subgraph agentic["Cloudflare Worker (agentic_search: {env}-agentic-search)"]
            B{"Cache check: CF Cache → XOR filter → DynamoDB"}
            C["Initial Marqo search (via search_proxy RPC)"]
            D["Google Gemini: function calling → query expansions + categories"]
            E["Per-category Marqo searches (parallel, via search_proxy RPC)"]
            F["Gemini: summary generation (streamed)"]
            G["Durable Object: save conversation state"]
        end
        H["SSE stream (init, delta, stream-end)"]
    end

    I["Customer App"]

    A --> B
    B -->|cached| H
    B -->|uncached| C
    C --> D
    D --> E
    E --> H
    E -.->|optional| F
    F -.->|optional| G
    G -.-> H
    H --> I

Step-by-Step

1. Search Proxy Routes to Agentic Worker

Where: components/search_proxy/src/worker.ts

The search proxy receives the agentic search request and calls AGENTIC_SEARCH_WORKER via Cloudflare service binding RPC.

Inspect: Tail both workers — see Cloudflare Workers.

npx wrangler tail {env}-ecom-api
npx wrangler tail {env}-agentic-search

2. Payload Validation & Feature Check

Where: components/agentic_search/src/index.ts

The base64-encoded payload is decoded and validated against a Zod schema. The worker then checks that feature_flags.agentic_search is enabled.

Failure: 400 if feature not enabled or invalid payload.
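
A minimal sketch of this step, not the actual code in index.ts. The payload shape (q, accountId, indexName), the settings shape, and the function names are all hypothetical, and a hand-written type guard stands in for the Zod schema so the example has no dependencies.

```typescript
// Hypothetical payload shape; the real Zod schema lives in index.ts.
interface AgenticPayload {
  q: string;
  accountId: string;
  indexName: string;
}

interface Settings {
  feature_flags: { agentic_search?: boolean };
}

type CheckResult =
  | { ok: true; payload: AgenticPayload }
  | { ok: false; status: 400; reason: string };

// Stand-in for a Zod safeParse: a plain structural type guard.
function isAgenticPayload(value: unknown): value is AgenticPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["q"] === "string" &&
    typeof v["accountId"] === "string" &&
    typeof v["indexName"] === "string"
  );
}

function checkRequest(encoded: string, settings: Settings): CheckResult {
  // Feature flag first: a disabled feature is a 400 regardless of payload.
  if (settings.feature_flags.agentic_search !== true) {
    return { ok: false, status: 400, reason: "agentic_search not enabled" };
  }
  let decoded: unknown;
  try {
    // atob handles Latin-1 only; non-ASCII queries would need UTF-8 decoding.
    decoded = JSON.parse(atob(encoded));
  } catch {
    return { ok: false, status: 400, reason: "payload is not valid base64 JSON" };
  }
  if (!isAgenticPayload(decoded)) {
    return { ok: false, status: 400, reason: "payload failed schema validation" };
  }
  return { ok: true, payload: decoded };
}
```

Both failure modes map to the same 400 status, so the reason string is what distinguishes them when tailing the worker.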

3. Three-Tier Cache Check

Where: components/agentic_search/src/ddb-cached-lookup.ts

Cloudflare Cache API (2hr TTL, with 60s stale-while-revalidate) — key: https://internal/agentic-cached-query/{accountId}/{indexName}/{normalized_query}. Staleness check uses 1hr threshold; Cloudflare max-age is set to 2x (7200s) so entries survive past staleness for the stale-while-revalidate pattern.
XOR filter (a probabilistic membership structure similar to a Bloom filter): acts as a negative filter. If the query is not in the filter, DDB is skipped entirely.
DynamoDB ({env}-AgenticCachedQueriesTable) — direct key lookup
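
The lookup order and the staleness rule can be sketched as follows. This is an illustration under assumptions, not the contents of ddb-cached-lookup.ts: the Tiers interface, the storedAtMs field, the URL-encoding of the query segment, and keying the XOR filter by the DynamoDB partition key are all hypothetical. The key formats and the 3600s / 7200s / 60s values come from this page.

```typescript
const STALE_AFTER_S = 3600; // application-level staleness threshold (1hr)
const CF_MAX_AGE_S = 2 * STALE_AFTER_S; // 7200s: entries outlive the staleness threshold
const SWR_S = 60; // stale-while-revalidate window

interface CachedEntry {
  summary: string;
  storedAtMs: number; // hypothetical field: when the entry was written
}

// Hypothetical stand-ins for the Cache API, the XOR filter, and DynamoDB.
interface Tiers {
  edgeGet(url: string): Promise<CachedEntry | undefined>;
  filterMayContain(pk: string): boolean;
  ddbGet(pk: string): Promise<CachedEntry | undefined>;
}

type LookupResult =
  | { source: "edge"; entry: CachedEntry; stale: boolean }
  | { source: "ddb"; entry: CachedEntry }
  | { source: "miss" };

function edgeCacheUrl(accountId: string, indexName: string, normalizedQuery: string): string {
  // URL-encoding the query segment is an assumption.
  return `https://internal/agentic-cached-query/${accountId}/${indexName}/${encodeURIComponent(normalizedQuery)}`;
}

function ddbPk(accountId: string, indexName: string, normalizedQuery: string): string {
  return `${accountId}#${indexName}#${normalizedQuery}`;
}

function cacheControlHeader(): string {
  return `max-age=${CF_MAX_AGE_S}, stale-while-revalidate=${SWR_S}`;
}

function isStale(entry: CachedEntry, nowMs: number): boolean {
  return nowMs - entry.storedAtMs > STALE_AFTER_S * 1000;
}

async function lookupCachedQuery(
  tiers: Tiers,
  accountId: string,
  indexName: string,
  normalizedQuery: string,
  nowMs: number,
): Promise<LookupResult> {
  // Tier 1: edge cache. A stale entry is still returned; the caller refreshes it.
  const fromEdge = await tiers.edgeGet(edgeCacheUrl(accountId, indexName, normalizedQuery));
  if (fromEdge !== undefined) {
    return { source: "edge", entry: fromEdge, stale: isStale(fromEdge, nowMs) };
  }
  // Tier 2: the filter gives definite negatives, so a "no" skips DynamoDB.
  const pk = ddbPk(accountId, indexName, normalizedQuery);
  if (!tiers.filterMayContain(pk)) return { source: "miss" };
  // Tier 3: direct key lookup. A filter "yes" can be a false positive.
  const fromDdb = await tiers.ddbGet(pk);
  return fromDdb !== undefined ? { source: "ddb", entry: fromDdb } : { source: "miss" };
}
```

Because the edge max-age is twice the staleness threshold, an entry between 1hr and 2hr old is still readable and is reported as stale rather than missing.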

If cached and eligible (based on cached_query_percentage config + session bucketing):

Sends cached summary + executes category searches
Skips LLM entirely
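
One way the eligibility check could work, as a sketch only. It assumes cached_query_percentage is a number from 0 to 100 and that sessions are bucketed by hashing a session ID; the FNV-1a hash and all function names here are stand-ins, since the worker's actual bucketing is not described on this page.

```typescript
// FNV-1a (32-bit) over UTF-16 code units. A stand-in hash chosen for brevity.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Map a session to a stable bucket in [0, 100).
function sessionBucket(sessionId: string): number {
  return fnv1a(sessionId) % 100;
}

// A session is eligible when its bucket falls below the configured percentage.
function isEligibleForCached(sessionId: string, cachedQueryPercentage: number): boolean {
  return sessionBucket(sessionId) < cachedQueryPercentage;
}
```

Hashing keeps a given session on the same side of the split for every request, which matters when comparing cached and LLM-generated responses across sessions.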

Inspect: Check cached queries — see DynamoDB.

aws dynamodb get-item --table-name {env}-AgenticCachedQueriesTable \
  --key '{"pk": {"S": "{accountId}#{indexName}#{normalized_query}"}}'

4. Short Query Bypass

Queries with fewer than 4 words skip the LLM entirely and return basic Marqo search results.
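
A sketch of the bypass condition, assuming words are counted by splitting on whitespace. The actual tokenization in the worker is not documented here, and the function names are illustrative.

```typescript
const MIN_WORDS_FOR_LLM = 4;

// Count whitespace-separated words; leading and trailing whitespace is ignored.
function countWords(query: string): number {
  const trimmed = query.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when the query is too short to be worth an LLM call.
function shouldBypassLlm(query: string): boolean {
  return countWords(query) < MIN_WORDS_FOR_LLM;
}
```

Under this rule "red shoes" bypasses the LLM, while "red running shoes for trail" does not.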

5. Initial Marqo Search (Parallel)

Calls SEARCH_PROXY_WORKER.handleSearch() via service binding RPC. The search runs in parallel with the LLM call (step 6).

6. LLM Phase 1: Function Calling

Where: components/agentic_search/src/search/agentic-search.ts

Calls Google Gemini (gemini-2.5-flash) with:

System instructions (base prompt or per-account custom prompt)
Conversation history (if continuing)
execute_search function tool

LLM generates JSON with:

query_expansions: array of [query, category_label, confidence]
When agentic_config.filter_facets.enabled is true and facet context is available, each expansion may include an optional 4th element: a Marqo filter DSL string
query_complements: related search terms
summary: text with angle-bracket wrapped product links
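
The output shape above can be read into typed objects roughly as follows. This is a sketch: the interface and function names are hypothetical, and dropping malformed expansions silently is an assumption about error handling rather than documented behaviour.

```typescript
interface QueryExpansion {
  query: string;
  categoryLabel: string;
  confidence: number;
  filter?: string; // optional 4th tuple element: Marqo filter DSL string
}

interface LlmOutput {
  expansions: QueryExpansion[];
  complements: string[];
  summary: string;
}

// Convert one [query, category_label, confidence, filter?] tuple.
function parseExpansion(raw: unknown): QueryExpansion | undefined {
  if (!Array.isArray(raw) || raw.length < 3) return undefined;
  const [query, label, confidence, filter] = raw as unknown[];
  if (typeof query !== "string" || typeof label !== "string" || typeof confidence !== "number") {
    return undefined;
  }
  const expansion: QueryExpansion = { query, categoryLabel: label, confidence };
  if (typeof filter === "string" && filter !== "") expansion.filter = filter;
  return expansion;
}

function parseLlmOutput(json: string): LlmOutput {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("LLM output is not a JSON object");
  }
  const raw = parsed as Record<string, unknown>;

  const rawExpansions: unknown = raw["query_expansions"];
  const expansions: QueryExpansion[] = [];
  if (Array.isArray(rawExpansions)) {
    for (const item of rawExpansions as unknown[]) {
      const expansion = parseExpansion(item);
      if (expansion !== undefined) expansions.push(expansion); // skip malformed tuples
    }
  }

  const rawComplements: unknown = raw["query_complements"];
  const complements: string[] = Array.isArray(rawComplements)
    ? (rawComplements as unknown[]).filter((c: unknown): c is string => typeof c === "string")
    : [];

  const rawSummary: unknown = raw["summary"];
  return { expansions, complements, summary: typeof rawSummary === "string" ? rawSummary : "" };
}
```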

Streaming: The worker uses the stream-json library to parse the LLM's JSON output incrementally. Category searches fire as soon as each expansion is parsed, without waiting for the full response.
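
To illustrate the idea of emitting each expansion before the response is complete, here is a dependency-free sketch. It does not use stream-json or reproduce its API; it is a simplified hand-rolled scanner that assumes the only arrays nested inside another array in the output are the query_expansions tuples. The function name and callback are hypothetical.

```typescript
// Returns a push(chunk) function. Each time an inner array (bracket depth 2)
// closes, it is parsed and handed to onExpansion immediately.
function createExpansionScanner(onExpansion: (raw: unknown) => void): (chunk: string) => void {
  let depth = 0; // current '[' nesting depth
  let inString = false; // inside a JSON string literal
  let escaped = false; // previous character was a backslash inside a string
  let current = ""; // text of the inner array being captured

  return function push(chunk: string): void {
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk.charAt(i);
      if (depth >= 2) current += ch; // capture everything inside an inner array

      if (inString) {
        // Brackets inside strings must not affect depth.
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }

      if (ch === '"') {
        inString = true;
      } else if (ch === "[") {
        depth++;
        if (depth === 2) current = "["; // start capturing a new tuple
      } else if (ch === "]") {
        depth--;
        if (depth === 1) {
          onExpansion(JSON.parse(current)); // tuple complete: fire right away
          current = "";
        }
      }
    }
  };
}
```

State persists across calls, so a tuple split across network chunks is emitted as soon as the chunk containing its closing bracket arrives.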

Inspect: Check Gemini API key — see Secrets Manager. Key stored as GOOGLE_API_KEY env var.

7. Category Searches (Parallel)

For each LLM-generated query expansion, executes a Marqo search via SEARCH_PROXY_WORKER.handleSearch(). Up to 6 parallel searches.

If an agent-constructed filter is present, it is merged with any client-provided filter. If the merged filter yields 0 hits, the worker retries without the agent filter and marks the streamed category payload with filterDropped + originalFilter.

The filter that was actually applied is surfaced as appliedFilter on each categoryHits[] item.
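
The merge-and-retry behaviour might look like the sketch below. Assumptions: SearchFn is a hypothetical wrapper around SEARCH_PROXY_WORKER.handleSearch(), filters are combined with a parenthesised AND, and originalFilter carries the merged filter that produced 0 hits (it could equally be the agent filter alone; this page does not say).

```typescript
interface SearchResult {
  hits: unknown[];
}

// Hypothetical wrapper around the search proxy RPC.
type SearchFn = (query: string, filter?: string) => Promise<SearchResult>;

interface CategoryHits {
  hits: unknown[];
  appliedFilter: string | undefined; // the filter that was actually used
  filterDropped?: boolean;
  originalFilter?: string | undefined;
}

// Combine client and agent filters; either may be absent.
function mergeFilters(clientFilter?: string, agentFilter?: string): string | undefined {
  if (clientFilter && agentFilter) return `(${clientFilter}) AND (${agentFilter})`;
  return clientFilter || agentFilter || undefined;
}

async function searchCategory(
  search: SearchFn,
  query: string,
  clientFilter?: string,
  agentFilter?: string,
): Promise<CategoryHits> {
  const merged = mergeFilters(clientFilter, agentFilter);
  const first = await search(query, merged);

  // Keep the result if it has hits, or if there is no agent filter to drop.
  if (first.hits.length > 0 || !agentFilter) {
    return { hits: first.hits, appliedFilter: merged };
  }

  // 0 hits with an agent filter: retry with only the client filter.
  const retry = await search(query, clientFilter);
  return {
    hits: retry.hits,
    appliedFilter: clientFilter,
    filterDropped: true,
    originalFilter: merged,
  };
}
```

The client filter is never dropped in this sketch, only the agent-constructed part, so a retry cannot widen results beyond what the client asked for.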

8. LLM Phase 2: Summary Generation (Optional)

If agentic_config.enable_summary is true:

Second Gemini call with function calling DISABLED
Synthesizes summary from search results
Streamed as delta SSE events with text chunks

9. Conversation State (Optional)

Where: components/agentic_search/src/conversation-do.ts

If enableConversation is true:

Durable Object (ConversationSqlDO) stores conversation context in SQLite
History is trimmed to 10 interactions for the LLM context and 100 interactions for storage
Auto-cleanup after 30 days of inactivity (DO alarm)
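
The limits above can be expressed as a short sketch. It assumes trimming keeps the most recent interactions and that the cleanup time is pushed forward on each activity; neither detail is confirmed here, and the Interaction shape and function names are hypothetical.

```typescript
const LLM_HISTORY_LIMIT = 10;
const STORAGE_HISTORY_LIMIT = 100;
const INACTIVITY_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// Hypothetical shape of one stored turn.
interface Interaction {
  user: string;
  assistant: string;
}

// Keep at most the last 100 interactions when persisting.
function trimForStorage(history: Interaction[]): Interaction[] {
  return history.slice(-STORAGE_HISTORY_LIMIT);
}

// Keep at most the last 10 interactions when building the LLM context.
function trimForLlm(history: Interaction[]): Interaction[] {
  return history.slice(-LLM_HISTORY_LIMIT);
}

// Timestamp at which the cleanup alarm would fire if no further activity occurs.
function nextCleanupAt(lastActivityMs: number): number {
  return lastActivityMs + INACTIVITY_MS;
}
```

When debugging "context lost" reports, note that under these assumptions the LLM sees far less history than is stored: an interaction can still be in the Durable Object but outside the 10-interaction window.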

Inspect: Durable Objects visible in Cloudflare dashboard.

10. SSE Response Stream

Event types sent to client (agentic search path):

init          → { categories: boolean, summary: boolean }
delta         → { summary?, categoryHits?, hits?, facets?, status?, error? }
stream-end    → {}
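
For reference when reading raw responses, the sketch below frames and parses these events in the standard text/event-stream format. It assumes each payload is JSON-encoded on a single data line; the function names are illustrative and not taken from the worker.

```typescript
// Event names on the agentic search path, as listed above.
type AgenticEventName = "init" | "delta" | "stream-end";

interface SseEvent {
  event: string;
  data: string;
}

// Serialize one event: "event:" and "data:" fields, terminated by a blank line.
function formatSseEvent(event: AgenticEventName, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Split a fully buffered stream into events. Frames are separated by blank lines.
function parseSseStream(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of raw.split("\n\n")) {
    let event = "message"; // SSE default when no event field is present
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Per the SSE format, one leading space after the colon is stripped.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

A saved response body (for example from curl with buffering disabled) can be passed to parseSseStream to check that the stream ends with stream-end and whether any delta carries an error field.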

The converse path (handleConverse) uses different event types:

message         → text delta chunks
category-hits   → search category results
conversation-id → { conversationId }
error           → error details
stream-end      → {}

Converse (Multi-Turn Chat)

Similar to agentic search but:

Always uses function calling (no separate summary phase)
Supports document context injection (fetches docs by ID)
Supports image analysis (up to 5 images, base64-encoded)
Conversation history preserved in Durable Object

Chat Suggestions

handleChatSuggestions() — generates 3-5 follow-up questions for a document:

Fetch document via search proxy RPC
Call Gemini Lite (gemini-2.5-flash-lite)
Cache result in Cloudflare Cache API (30-day TTL)

Performance Metrics

Where: components/agentic_search/src/timer.ts

Logged at completion as agentic_search_completed:

{
  "source": "llm|cached|short_query_bypass",
  "timings": {
    "E2E": 2500,
    "INITIAL_MARQO_SEARCH": 200,
    "LLM_FIRST_CALL": 1800,
    "LLM_SECOND_CALL": 800,
    "CATEGORY_SEARCH_TOTAL": 400,
    "CACHED_QUERY_LOOKUP": 15
  }
}
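
A sketch of how such timings could be collected and triaged. The createTimer API is hypothetical and not the interface of timer.ts; slowestPhase is an illustrative helper for reading a completion log.

```typescript
type Timings = Record<string, number>;

// Hypothetical timer: records elapsed milliseconds per label.
// The clock is injectable so the sketch can be exercised deterministically.
function createTimer(now: () => number = Date.now) {
  const starts = new Map<string, number>();
  const timings: Timings = {};
  return {
    start(label: string): void {
      starts.set(label, now());
    },
    stop(label: string): void {
      const startedAt = starts.get(label);
      if (startedAt !== undefined) timings[label] = now() - startedAt;
    },
    snapshot(): Timings {
      return { ...timings };
    },
  };
}

// Return the label with the largest duration, ignoring the E2E total.
function slowestPhase(timings: Timings): string | undefined {
  let worstLabel: string | undefined;
  let worstMs = -1;
  for (const label of Object.keys(timings)) {
    const ms = timings[label];
    if (label === "E2E" || ms === undefined) continue;
    if (ms > worstMs) {
      worstMs = ms;
      worstLabel = label;
    }
  }
  return worstLabel;
}
```

In the sample log above the individual phases sum to more than E2E (3215 vs 2500), which is consistent with phases running in parallel, so compare phases against each other rather than expecting them to add up.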

What to Look For

Symptom	Where to Check
Agentic search not available	`feature_flags.agentic_search` in settings. Check Settings Sync.
LLM errors	Gemini API key missing/invalid. Check `GOOGLE_API_KEY` in Secrets Manager.
Slow responses	Check `timings` in completion log. LLM latency? Marqo latency?
Cache not working	Check DDB table. Check XOR filter rebuild. Check CF cache headers (2hr max-age, 1hr staleness threshold). Check `cached_query_percentage` config and session bucketing.
Conversation context lost	Durable Object cleanup (30-day inactivity). Check DO console.
5xx from agentic worker	Tail `{env}-agentic-search` worker. Check Gemini API status.
Empty categories	LLM returned no query expansions. Check the system prompt. Check whether the short query bypass (fewer than 4 words) applied.
Streaming errors	Check the `error` field of SSE `delta` events. Partial results may still be valid.

Agentic Search — worker bindings, DO details, cache tables
Search Proxy — how agentic search is invoked
Cloudflare Workers — tailing workers, DO inspection
DynamoDB — cached queries table
Secrets Manager — Gemini API key
Search — underlying search execution