Flow: Agentic Search (AI-Powered Conversational Search)
AI-powered search using Google Gemini to categorize queries, generate summaries, and support multi-turn conversations.
Request Path
graph TD
A["Customer App"]
subgraph proxy["Cloudflare Worker (search_proxy: {env}-ecom-api)"]
subgraph agentic["Cloudflare Worker (agentic_search: {env}-agentic-search)"]
B{"Cache check: CF Cache → XOR filter → DynamoDB"}
C["Initial Marqo search (via search_proxy RPC)"]
D["Google Gemini: function calling → query expansions + categories"]
E["Per-category Marqo searches (parallel, via search_proxy RPC)"]
F["Gemini: summary generation (streamed)"]
G["Durable Object: save conversation state"]
end
H["SSE stream (init, category-hits, delta, stream-end)"]
end
I["Customer App"]
A --> B
B -->|cached| H
B -->|uncached| C
C --> D
D --> E
E --> H
E -.->|optional| F
F -.->|optional| G
G -.-> H
H --> I
Step-by-Step
1. Search Proxy Routes to Agentic Worker
Where: components/search_proxy/src/worker.ts
The search proxy receives the agentic search request and calls AGENTIC_SEARCH_WORKER via Cloudflare service binding RPC.
Inspect: Tail both workers — see Cloudflare Workers.
npx wrangler tail {env}-ecom-api
npx wrangler tail {env}-agentic-search
2. Payload Validation & Feature Check
Where: components/agentic_search/src/index.ts
The payload is Base64-decoded and validated against a Zod schema. The worker then checks that feature_flags.agentic_search is enabled.
Failure: returns 400 if the feature is not enabled or the payload is invalid.
3. Three-Tier Cache Check
Where: components/agentic_search/src/ddb-cached-lookup.ts
- Cloudflare Cache API (2hr TTL, with 60s stale-while-revalidate). Key: https://internal/agentic-cached-query/{accountId}/{indexName}/{normalized_query}. The staleness check uses a 1hr threshold; the Cloudflare max-age is set to 2x that (7200s) so entries survive past staleness for the stale-while-revalidate pattern.
- XOR Filter (a Bloom-filter-like probabilistic set), used as a negative filter: if the query is not in the filter, skip DDB entirely
- DynamoDB ({env}-AgenticCachedQueriesTable): direct key lookup
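The relationship between the 1hr staleness threshold and the 7200s max-age can be sketched as below. This is a minimal illustration, not the code in ddb-cached-lookup.ts: the function names, the normalization rule (trim, lowercase, collapse whitespace), the URL-encoding of the query, and the idea of classifying an entry by a stored timestamp are all assumptions.

```typescript
// Hypothetical constants mirroring the values quoted above.
const STALE_AFTER_SECONDS = 3600; // 1hr staleness threshold
const MAX_AGE_SECONDS = 2 * STALE_AFTER_SECONDS; // 7200s Cloudflare max-age

// Assumed normalization: trim, lowercase, collapse runs of whitespace.
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

// Builds the internal cache key in the documented shape.
// URL-encoding the query segment is an assumption.
function buildCacheKey(accountId: string, indexName: string, query: string): string {
  const normalized = encodeURIComponent(normalizeQuery(query));
  return `https://internal/agentic-cached-query/${accountId}/${indexName}/${normalized}`;
}

type Freshness = "fresh" | "stale" | "expired";

// "stale" entries are still present in the cache (age < max-age) and can be
// served while a refresh runs; "expired" entries are past max-age.
function classifyEntry(cachedAtMs: number, nowMs: number): Freshness {
  const ageSeconds = (nowMs - cachedAtMs) / 1000;
  if (ageSeconds < STALE_AFTER_SECONDS) return "fresh";
  if (ageSeconds < MAX_AGE_SECONDS) return "stale";
  return "expired";
}
```

Setting max-age to twice the staleness threshold leaves a one-hour window in which a stale entry is still retrievable, which is what makes serving it during revalidation possible.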
If cached and eligible (based on cached_query_percentage config + session bucketing):
- Sends cached summary + executes category searches
- Skips LLM entirely
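The tier order and the eligibility check can be sketched together. Everything here is illustrative: the CacheTiers interface, lookupCached, bucketOf, and isEligibleForCache are hypothetical names, the FNV-1a hash is a stand-in for whatever bucketing the worker actually uses, and cached_query_percentage is assumed to be on a 0-100 scale.

```typescript
interface CachedQuery {
  summary: string;
}

// Hypothetical abstraction over the three tiers so the order can be shown
// without Cloudflare or AWS dependencies.
interface CacheTiers {
  cfCacheGet(key: string): Promise<CachedQuery | undefined>;
  xorFilterMayContain(key: string): boolean;
  ddbGet(key: string): Promise<CachedQuery | undefined>;
}

async function lookupCached(key: string, tiers: CacheTiers): Promise<CachedQuery | undefined> {
  const edgeHit = await tiers.cfCacheGet(key);
  if (edgeHit) return edgeHit;
  // A negative answer from the filter is definitive, so DynamoDB is skipped.
  if (!tiers.xorFilterMayContain(key)) return undefined;
  // A positive answer may be a false positive, so this read can still miss.
  return tiers.ddbGet(key);
}

// Assumed bucketing: 32-bit FNV-1a hash of the session ID, reduced to 0..99.
function bucketOf(sessionId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < sessionId.length; i++) {
    hash ^= sessionId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

// A session is eligible when its bucket falls below the configured percentage.
function isEligibleForCache(sessionId: string, cachedQueryPercentage: number): boolean {
  return bucketOf(sessionId) < cachedQueryPercentage;
}
```

Hashing the session ID keeps a given session consistently in or out of the cached path, which matters when debugging a "cache not working" report for a single user.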
Inspect: Check cached queries — see DynamoDB.
aws dynamodb get-item --table-name {env}-AgenticCachedQueriesTable \
--key '{"pk": {"S": "{accountId}#{indexName}#{normalized_query}"}}'
4. Short Query Bypass
Queries with fewer than 4 words skip the LLM entirely and return basic Marqo search results.
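As a rough illustration of the threshold, the check might look like the following. Splitting on whitespace is an assumption about how words are counted, and both function names are hypothetical.

```typescript
const MIN_WORDS_FOR_LLM = 4;

// Assumed tokenization: whitespace-separated words after trimming.
function countWords(query: string): number {
  const trimmed = query.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when the query is short enough to skip the LLM.
function shouldBypassLlm(query: string): boolean {
  return countWords(query) < MIN_WORDS_FOR_LLM;
}
```

Under this rule "red running shoes" bypasses the LLM, while "red running shoes women" does not.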
5. Initial Marqo Search (Parallel)
Calls SEARCH_PROXY_WORKER.handleSearch() via service binding RPC. Runs in parallel with LLM call.
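One way to picture "runs in parallel" is to start both calls before awaiting either. The sketch below takes the two operations as plain functions so it does not depend on the real binding types; runInitialPhase and SearchResult are hypothetical names, and whether the worker uses Promise.all or another combinator is not stated in the source.

```typescript
interface SearchResult {
  hits: unknown[];
}

// Starts the Marqo search and the LLM call together and waits for both.
async function runInitialPhase<L>(
  search: () => Promise<SearchResult>,
  callLlm: () => Promise<L>,
): Promise<{ initial: SearchResult; llm: L }> {
  // Both functions are invoked while the array is built, before any await,
  // so the two requests overlap instead of running back to back.
  const [initial, llm] = await Promise.all([search(), callLlm()]);
  return { initial, llm };
}
```

In the worker, the first argument would wrap SEARCH_PROXY_WORKER.handleSearch(). Note that Promise.all rejects as soon as either call fails; a production path may prefer to handle the two failures separately.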
6. LLM Phase 1: Function Calling
Where: components/agentic_search/src/search/agentic-search.ts
Calls Google Gemini (gemini-2.5-flash) with:
- System instructions (base prompt or per-account custom prompt)
- Conversation history (if continuing)
- execute_search function tool
LLM generates JSON with:
- query_expansions: array of [query, category_label, confidence]
  - When agentic_config.filter_facets.enabled is true and facet context is available, each expansion may include an optional 4th element: a Marqo filter DSL string
- query_complements: related search terms
- summary: text with angle-bracket wrapped product links
Streaming: Uses stream-json library to parse LLM JSON output in real-time. Category searches fire as soon as each expansion is parsed (not waiting for full response).
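The "fire as soon as parsed" behavior can be sketched independently of the parser. The code below does not use stream-json; it assumes some parser hands over each expansion as it completes and shows only the dispatch side. QueryExpansion, parseExpansion, and createDispatcher are hypothetical names, and treating confidence as a number is an assumption.

```typescript
// [query, category_label, confidence] with an optional 4th filter element.
type QueryExpansion =
  | [string, string, number]
  | [string, string, number, string];

// Validates one raw array from the LLM output; returns undefined if malformed.
function parseExpansion(raw: unknown): QueryExpansion | undefined {
  if (!Array.isArray(raw) || raw.length < 3 || raw.length > 4) return undefined;
  const [query, label, confidence, filter] = raw;
  if (typeof query !== "string" || typeof label !== "string" || typeof confidence !== "number") {
    return undefined;
  }
  if (raw.length === 3) return [query, label, confidence];
  return typeof filter === "string" ? [query, label, confidence, filter] : undefined;
}

// Starts a category search for each expansion the moment it arrives.
function createDispatcher<R>(searchCategory: (e: QueryExpansion) => Promise<R>) {
  const inFlight: Promise<R>[] = [];
  return {
    // Called by the streaming parser for every completed expansion.
    onExpansion(raw: unknown): boolean {
      const expansion = parseExpansion(raw);
      if (!expansion) return false;
      inFlight.push(searchCategory(expansion)); // started, not awaited
      return true;
    },
    // Called once the LLM stream ends; resolves when all searches finish.
    done(): Promise<R[]> {
      return Promise.all(inFlight);
    },
  };
}
```

Because onExpansion never awaits, the first category search can already be in flight while the model is still generating later expansions.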
Inspect: Check Gemini API key — see Secrets Manager. Key stored as GOOGLE_API_KEY env var.
7. Category Searches (Parallel)
For each LLM-generated query expansion, executes a Marqo search via SEARCH_PROXY_WORKER.handleSearch(). Up to 6 parallel searches.
If an agent-constructed filter is present, it is merged with any client-provided filter. If the merged filter yields 0 hits, the worker retries without the agent filter and marks the streamed category payload with filterDropped + originalFilter.
The filter that was actually applied is surfaced as appliedFilter on each categoryHits[] item.
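The merge-then-retry behavior described above might be sketched as follows. The field names filterDropped, originalFilter, and appliedFilter come from the text; everything else is an assumption, including the SearchFn signature, combining filters with AND, and reporting the merged filter (rather than the agent filter alone) as originalFilter.

```typescript
interface CategoryHits {
  hits: unknown[];
  appliedFilter?: string;
  filterDropped?: boolean;
  originalFilter?: string;
}

// Hypothetical stand-in for a search call that accepts an optional filter.
type SearchFn = (query: string, filter?: string) => Promise<unknown[]>;

// Assumed merge rule: both filters must match.
function mergeFilters(clientFilter?: string, agentFilter?: string): string | undefined {
  if (clientFilter && agentFilter) return `(${clientFilter}) AND (${agentFilter})`;
  return clientFilter || agentFilter || undefined;
}

async function searchWithFilterFallback(
  search: SearchFn,
  query: string,
  clientFilter?: string,
  agentFilter?: string,
): Promise<CategoryHits> {
  const merged = mergeFilters(clientFilter, agentFilter);
  const hits = await search(query, merged);
  // Keep the result if anything matched or there is no agent filter to drop.
  if (hits.length > 0 || !agentFilter) return { hits, appliedFilter: merged };
  // Zero hits: retry with only the client-provided filter.
  const retryHits = await search(query, clientFilter);
  return {
    hits: retryHits,
    appliedFilter: clientFilter,
    filterDropped: true,
    originalFilter: merged,
  };
}
```

The retry never drops the client-provided filter, so a user's explicit facet selection still applies even when the agent's narrowing is discarded.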
8. LLM Phase 2: Summary Generation (Optional)
If agentic_config.enable_summary is true:
- Second Gemini call with function calling DISABLED
- Synthesizes summary from search results
- Streamed as delta SSE events with text chunks
9. Conversation State (Optional)
Where: components/agentic_search/src/conversation-do.ts
If enableConversation is true:
- Durable Object (ConversationSqlDO) stores conversation context in SQLite
- Trimmed to 10 interactions for LLM, 100 for storage
- Auto-cleanup after 30 days of inactivity (DO alarm)
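The two trimming limits can be illustrated with a small helper. This assumes trimming keeps the most recent interactions, which the source does not state explicitly; the Interaction shape and function names are hypothetical.

```typescript
const LLM_HISTORY_LIMIT = 10; // interactions sent to the LLM
const STORAGE_HISTORY_LIMIT = 100; // interactions kept in the Durable Object

interface Interaction {
  user: string;
  assistant: string;
}

// Returns the last n items without mutating the input.
function lastN<T>(items: readonly T[], n: number): T[] {
  return n <= 0 ? [] : items.slice(-n);
}

function trimForLlm(history: readonly Interaction[]): Interaction[] {
  return lastN(history, LLM_HISTORY_LIMIT);
}

function trimForStorage(history: readonly Interaction[]): Interaction[] {
  return lastN(history, STORAGE_HISTORY_LIMIT);
}
```

With limits like these, a "conversation context lost" report on a long chat may simply mean the relevant turn fell outside the 10 interactions passed to the model, even though it is still stored.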
Inspect: Durable Objects visible in Cloudflare dashboard.
10. SSE Response Stream
Event types sent to client (agentic search path):
init → { categories: boolean, summary: boolean }
delta → { summary?, categoryHits?, hits?, facets?, status?, error? }
stream-end → {}
The converse path (handleConverse) uses different event types:
message → text delta chunks
category-hits → search category results
conversation-id → { conversationId }
error → error details
stream-end → {}
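For readers inspecting the stream by hand, the wire format of one event can be sketched as below. The framing follows the standard Server-Sent Events format; that each payload is a single line of JSON is an assumption, and formatSseEvent and parseSseFrame are hypothetical helpers rather than functions from the worker.

```typescript
// Serializes one event in standard SSE framing: event name, data, blank line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Parses a single SSE frame back into its event name and JSON payload.
function parseSseFrame(frame: string): { event: string; data: unknown } | undefined {
  let event = "message"; // SSE default when no "event:" field is present
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    if (line.startsWith("event:")) {
      event = line.slice("event:".length).trim();
    } else if (line.startsWith("data:")) {
      // Per the SSE format, one leading space after the colon is dropped.
      dataLines.push(line.slice("data:".length).replace(/^ /, ""));
    }
  }
  if (dataLines.length === 0) return undefined;
  return { event, data: JSON.parse(dataLines.join("\n")) };
}
```

A frame for the agentic path's first event would then read `event: init` followed by `data: {"categories":true,"summary":false}` and a blank line.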
Converse (Multi-Turn Chat)
Similar to agentic search but:
- Always uses function calling (no separate summary phase)
- Supports document context injection (fetches docs by ID)
- Supports image analysis (up to 5 images, base64-encoded)
- Conversation history preserved in Durable Object
Chat Suggestions
handleChatSuggestions() — generates 3-5 follow-up questions for a document:
- Fetch document via search proxy RPC
- Call Gemini Lite (gemini-2.5-flash-lite)
- Cache result in Cloudflare Cache API (30-day TTL)
Performance Metrics
Where: components/agentic_search/src/timer.ts
Logged at completion as agentic_search_completed:
{
"source": "llm|cached|short_query_bypass",
"timings": {
"E2E": 2500,
"INITIAL_MARQO_SEARCH": 200,
"LLM_FIRST_CALL": 1800,
"LLM_SECOND_CALL": 800,
"CATEGORY_SEARCH_TOTAL": 400,
"CACHED_QUERY_LOOKUP": 15
}
}
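When triaging a slow request, a small helper over this log shape can point at the dominant phase. The sketch below is not part of timer.ts; CompletionLog and slowestPhase are hypothetical names, and the timing values are assumed to be milliseconds.

```typescript
interface CompletionLog {
  source: "llm" | "cached" | "short_query_bypass";
  timings: Record<string, number>;
}

// Returns the phase with the largest timing, ignoring the E2E total.
function slowestPhase(log: CompletionLog): { phase: string; ms: number } | undefined {
  let slowest: { phase: string; ms: number } | undefined;
  for (const phase of Object.keys(log.timings)) {
    if (phase === "E2E") continue;
    const ms = log.timings[phase];
    if (!slowest || ms > slowest.ms) slowest = { phase, ms };
  }
  return slowest;
}
```

For the sample log above this reports LLM_FIRST_CALL at 1800. The per-phase values do not need to sum to E2E, since the initial Marqo search runs in parallel with the LLM call.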
What to Look For
| Symptom | Where to Check |
|---|---|
| Agentic search not available | feature_flags.agentic_search in settings. Check Settings Sync. |
| LLM errors | Gemini API key missing/invalid. Check GOOGLE_API_KEY in Secrets Manager. |
| Slow responses | Check timings in completion log. LLM latency? Marqo latency? |
| Cache not working | Check DDB table. Check XOR filter rebuild. Check CF cache headers (2hr max-age, 1hr staleness threshold). Check cached_query_percentage config and session bucketing. |
| Conversation context lost | Durable Object cleanup (30-day inactivity). Check DO console. |
| 5xx from agentic worker | Tail {env}-agentic-search worker. Check Gemini API status. |
| Empty categories | LLM returned no query expansions. Check system prompt. Check whether the short query bypass applied (fewer than 4 words). |
| Streaming errors | Check for error fields in delta SSE events. Partial results may still be valid. |
Related Docs
- Agentic Search — worker bindings, DO details, cache tables
- Search Proxy — how agentic search is invoked
- Cloudflare Workers — tailing workers, DO inspection
- DynamoDB — cached queries table
- Secrets Manager — Gemini API key
- Search — underlying search execution