Summary

The Admin Lambda provides a REST API for forking indexes. Forking makes a copy of a customer's index and (optionally) transparently switches serving to the copy. This can be used for development and testing against a customer index, for blue-green deployments that change otherwise immutable settings on an index (e.g. model, account, cluster), or for blue-green deployments of infrastructure changes.

Concepts

  • Index name: the name of a Marqo Cloud index. This is unique per account, and in the Marqo Classic API, it is immutable per index and thus a unique identifier.
  • Index ID: a general unique identifier for an index (e.g. {system_account_id}-{index_name}). In the ecom world, a request for an index by ID (via the x-marqo-index-id header) does not necessarily map to an index with the specified name.
  • Index settings: the immutable settings of a Marqo index, sent in the Classic API create index request body.

Components

Fork will interact with the following components:

  • Data:
      • AccountsTable / UsersAccountsTable
      • CustomerIndexConfigTable
      • env-EcomIndexSettingsTable
      • Index query configs table
      • Merchandising table
      • Feature flags JSON
  • Infra:
      • kops clusters
      • Multitenant clusters
      • Workflows
  • Tests:
      • Canary tests

Steps

Describe

Get all the necessary details about the source and target.

The set of details is mostly covered by Ops/Tactical/Validation, which checks general index readiness, and Ecom Canary Testing, which validates changes to configs.

  • Source and target account and cluster details
      • IDs
      • Feature flags
  • Source index details
      • Name
      • Settings
      • Infrastructure
      • Bespoke infra config (e.g. scaled out API nodes)
      • URLs
  • Target index
      • Exists?
  • Source ecom index settings
      • Configs (add_docs_config, collections_config, search_config)
      • Infrastructure (especially the queue ARN)
  • Query configs
      • Configs
  • Merchandising
      • Config
      • Rules
  • Pixel
  • Mappings for automatic doc updates

TODO: In general, how to behave if the target resources/config already exist.

Create

Create a new index with the desired immutable configuration.

Once the queue is created, also deactivate the trigger for the ecom indexer so the queue isn't consumed until we're ready.
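
A minimal sketch of the trigger deactivation, assuming the ecom indexer is a Lambda function that consumes the queue through an SQS event source mapping (the "SQS ESM" in the workflow table suggests this). The helper name set_queue_trigger_enabled and its parameters are hypothetical; list_event_source_mappings and update_event_source_mapping are the standard boto3 Lambda client calls. Pagination of the list call is ignored here for brevity.

```python
def set_queue_trigger_enabled(lambda_client, queue_arn, function_name, enabled):
    """Enable or disable every event source mapping from queue_arn to function_name.

    lambda_client is expected to behave like boto3.client("lambda").
    Returns the UUIDs of the mappings that were updated.
    """
    response = lambda_client.list_event_source_mappings(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
    )
    updated = []
    for mapping in response.get("EventSourceMappings", []):
        # Setting Enabled to its current value is harmless, so a retry is safe.
        lambda_client.update_event_source_mapping(
            UUID=mapping["UUID"],
            Enabled=enabled,
        )
        updated.append(mapping["UUID"])
    return updated
```

Calling the same helper with enabled=True later would re-enable the trigger once the target is ready to drain queued writes.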

Configure

Create or update any mutable configuration (most of the things in "Describe"). Each value defaults to a copy of the source index's value and can be overridden at fork time.
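
The "copy by default, override where requested" behaviour could be sketched as a recursive merge. This is an illustration only: merge_config is a hypothetical helper, and the example values under search_config are made up; only the top-level config names come from this document.

```python
import copy


def merge_config(source_config, overrides):
    """Return a deep copy of source_config with overrides applied recursively."""
    merged = copy.deepcopy(source_config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical values, for illustration only.
source = {"search_config": {"limit": 10, "filters": ["in_stock"]}}
target = merge_config(source, {"search_config": {"limit": 20}})
```

Here target keeps the source's filters and takes the overridden limit, while source itself is left unmodified.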

Transfer

Once the target index is ready, update the source index's add_docs_config.index_write_aliases to start forking all subsequent writes to the target index.

In parallel, begin the transfer operation for the existing docs (either a manual snapshot to be restored, or the reindexing pipeline).
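
A sketch of the write-alias update, assuming that index_write_aliases is a list of target index identifiers nested under add_docs_config. That shape is a guess for illustration, not something this document specifies, and both helper names are hypothetical.

```python
import copy


def add_write_alias(ecom_settings, target_index_id):
    """Return a copy of ecom_settings with target_index_id in index_write_aliases."""
    updated = copy.deepcopy(ecom_settings)
    add_docs = updated.setdefault("add_docs_config", {})
    aliases = add_docs.setdefault("index_write_aliases", [])
    # Idempotent: adding an alias that is already present changes nothing.
    if target_index_id not in aliases:
        aliases.append(target_index_id)
    return updated


def remove_write_alias(ecom_settings, target_index_id):
    """Return a copy of ecom_settings without target_index_id as a write alias."""
    updated = copy.deepcopy(ecom_settings)
    add_docs = updated.get("add_docs_config", {})
    aliases = add_docs.get("index_write_aliases", [])
    add_docs["index_write_aliases"] = [a for a in aliases if a != target_index_id]
    return updated
```

Returning a modified copy rather than mutating in place keeps the original settings available, which is convenient if they need to be restored on rollback or abort.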

Persistence

Fork details are stored in a DynamoDB table called {env}-IndexForksTable. A new record is created for each state change of each fork.

| Column | Description |
| --- | --- |
| pk | Source system account ID |
| sk | (Source index name)#(System timestamp, ISO format) |
| fork_id | Fork ID |
| status | pending, in_progress, ready, failed, rolled_back, aborted, complete |
| source_cell_id | Source cell ID |
| source_system_account_id | Source system account ID |
| source_index_name | Source index name |
| target_cell_id | Target cell ID |
| target_system_account_id | Target system account ID |
| target_index_name | Target index name |
| created_at | Timestamp of creation of this record (particular status reached) |
| updated_at | Timestamp of last update of this record |

For each fork, we store:

  • One record with all the context with which the fork was created.
  • One record for each status change with timestamps.

Access patterns:

  • List all forks for a given index ID (account ID + index name) and their latest status
  • Get the latest status for a given fork ID
  • List the history of a given fork ID
  • Create a new fork record
  • Update the status of a fork (by creating a new record with the same fork ID and a new timestamp)
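
The key layout and the "latest status" access pattern can be sketched in plain Python. The helper names are hypothetical. The sketch assumes all timestamps are written in UTC with a fixed precision, so that sorting by sk within one index name is also chronological.

```python
from datetime import datetime, timezone


def fork_record_key(source_system_account_id, source_index_name, when=None):
    """Build the pk/sk pair for a new fork status record."""
    when = when or datetime.now(timezone.utc)
    return {
        "pk": source_system_account_id,
        # Fixed-width ISO timestamp so lexicographic order matches time order.
        "sk": f"{source_index_name}#{when.isoformat(timespec='microseconds')}",
    }


def latest_status_by_fork(records):
    """Given all records for one index, return {fork_id: most recent record}."""
    latest = {}
    for record in sorted(records, key=lambda r: r["sk"]):
        # Later records overwrite earlier ones for the same fork.
        latest[record["fork_id"]] = record
    return latest
```

With this layout, listing the forks of one index would be a query on pk with a sort-key prefix of the index name followed by "#". Looking a fork up by fork ID alone would need either the account and index from the request path or a secondary index on fork_id; which of the two is intended is not stated here.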

API

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks

Create a new fork.

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cutover

Cut over to the target index, routing all search traffic to the target index. Leaves the source index untouched.

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/rollback

Revert the necessary configs to serve all traffic from the source index.

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cleanup

Check that the target index is successfully serving all traffic, and tear down the source index.

POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/abort

Abort the fork, tearing down the target index and restoring the source index to its original state.

Implementation Plan

1. Persistence Layer

  • Schema Design: Define a DynamoDB table schema for ForksTable to store fork ID, status, source/target details, and step progress.
  • Service: Create a ForkService to handle CRUD operations for fork records.
  • Deployment: Deploy the new table to the production environment via admin_stack in CDK.

2. Core Fork Logic (Orchestrator)

  • Describe & Validation: Implement logic to fetch source index details and validate target index parameters.
  • Resource Creation: Integrate with IndexSettingsService to create the target index (immutable settings).
  • Configuration Sync: Implement logic to copy and merge mutable settings (Ecom, Query Configs, Merchandising, Pixel) from source to target.
  • Write Aliasing: Implement the update of add_docs_config on the source index to alias writes to the target.

3. API Implementation

  • POST /forks:
      • Generate Fork ID.
      • Create initial record in ForksTable.
      • Trigger the asynchronous fork workflow (likely via Step Functions or async Lambda invocation).
      • Return Fork ID and pending status.
  • POST /cutover:
      • Retrieve fork record.
      • Verify fork is in ready state.
      • Update routing configuration (Index Registry/DNS/Gateway) to point search traffic to target.
      • Update status to complete.
  • POST /rollback:
      • Revert write aliases on source index.
      • Revert search routing if cutover was attempted.
      • Update status to rolled_back.
  • POST /cleanup:
      • Verify traffic is serving correctly on target.
      • Delete source index resources.
  • POST /abort:
      • Revert any changes to source (aliases).
      • Delete target index resources.
      • Mark fork as aborted.
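
The status check in the cutover endpoint generalises to a small guard that every endpoint could call before acting. Only the rule that cutover requires ready comes from the plan above; the allowed statuses for rollback and abort are assumptions made for illustration, as are the names ALLOWED_FROM, InvalidForkStateError and check_action_allowed.

```python
ALLOWED_FROM = {
    # From the plan: cutover requires the fork to be in the ready state.
    "cutover": {"ready"},
    # Assumption: rollback is valid before or after a cutover.
    "rollback": {"ready", "complete"},
    # Assumption: abort is valid any time before cutover has completed.
    "abort": {"pending", "in_progress", "ready", "failed"},
}


class InvalidForkStateError(Exception):
    """The requested action is not allowed in the fork's current status."""


def check_action_allowed(action, current_status):
    """Raise if action cannot be applied to a fork in current_status."""
    allowed = ALLOWED_FROM.get(action)
    if allowed is None:
        raise ValueError(f"Unknown fork action: {action!r}")
    if current_status not in allowed:
        raise InvalidForkStateError(
            f"Cannot {action} a fork in status {current_status!r}; "
            f"expected one of {sorted(allowed)}"
        )
```

Keeping the rules in one table makes it easy to review which transitions are intended and to return a consistent error from each endpoint.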

4. Asynchronous Workflow

The fork workflow is orchestrated by a Step Functions state machine (AdminIndexForkWorkflow). Each step invokes the Admin Lambda with a specific action:

fork.ensure_target → fork.configure_target → fork.prepare_transfer → snapshot → restore → fork.activate_target → fork.verify → succeed

| Step | Lambda Action | Description |
| --- | --- | --- |
| Ensure Target | fork.ensure_target | Create target index if missing, single readiness check (SFN retries on TargetIndexNotReadyError), validate infra compatibility |
| Configure Target | fork.configure_target | Export source config, import into target, validate post-import export matches |
| Prepare Transfer | fork.prepare_transfer | Disable target SQS ESM, add write alias (source → target) |
| Snapshot | (cross-account SFN) | Snapshot source index documents |
| Restore | (cross-account SFN) | Restore snapshot onto target index |
| Activate Target | fork.activate_target | Re-enable target SQS ESM so queued writes drain |
| Verify | fork.verify | Compare search results between source and target, mark READY or FAILED |

All steps are idempotent for safe Step Functions retries. Failures mark the fork as FAILED with a descriptive message.
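
A minimal sketch of how the Lambda might dispatch on the action and keep a step idempotent, using fork.ensure_target as the example. The event shape (an action key plus target_index_name), the handle function, and the in-memory indexes dict standing in for real infrastructure are all assumptions; only the action name and TargetIndexNotReadyError come from the table above.

```python
class TargetIndexNotReadyError(Exception):
    """Retryable: the target index exists but is not ready yet."""


def ensure_target(event, indexes):
    """Create the target if missing, then do a single readiness check."""
    name = event["target_index_name"]
    # Idempotent: a retry finds the index already present and skips creation.
    indexes.setdefault(name, "creating")
    if indexes[name] != "ready":
        # Raising lets the state machine's retry policy re-invoke this step.
        raise TargetIndexNotReadyError(name)
    return {"action": event["action"], "target_index_name": name}


# Hypothetical action-to-handler table; the other actions would be added here.
HANDLERS = {"fork.ensure_target": ensure_target}


def handle(event, indexes):
    """Route a workflow event to the handler for its action."""
    handler = HANDLERS.get(event.get("action"))
    if handler is None:
        raise ValueError(f"Unsupported action: {event.get('action')!r}")
    return handler(event, indexes)
```

The step does one readiness check per invocation and leaves waiting to the state machine, which matches the "single readiness check" wording in the table.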

5. Testing

  • Unit Tests: Test individual components (Service, API models, Logic).
  • Integration Tests: Test the full flow with mocked infrastructure calls.