Summary
The Admin Lambda provides a REST API for forking indexes. Forking is a mechanism for making a copy of a customer's index and (optionally) transparently switching to serve from the copy. It can be used for development and testing against a customer's index, for blue-green deployments that change otherwise immutable settings on an index (e.g. model, account, cluster), or for blue-green deployment of infra changes.
Concepts
- Index name: the name of a Marqo Cloud index. This is unique per account, and in the Marqo Classic API, it is immutable per index and thus a unique identifier.
- Index ID: a general unique identifier for an index (e.g. `{system_account_id}-{index_name}`). In the ecom world, a request for an index by ID (via the `x-marqo-index-id` header) does not necessarily map to an index with the specified name.
- Index settings: the immutable settings of a Marqo index, sent in the Classic API create index request body.
Components
Fork will interact with the following components:
- Data:
  - AccountsTable / UsersAccountsTable
  - CustomerIndexConfigTable
  - env-EcomIndexSettingsTable
  - Index query configs table
  - Merchandising table
  - Feature flags JSON
- Infra:
  - kops clusters
  - Multitenant clusters
  - Workflows
- Tests:
  - Canary tests
Steps
Describe
Get all the necessary details about the source and target.
The set of details is mostly covered by Ops/Tactical/Validation (general index readiness) and Ecom Canary Testing, which validates changes to configs.
- Source and target account and cluster details
  - IDs
  - Feature flags
- Source index details
  - Name
  - Settings
  - Infrastructure
    - Bespoke infra config (e.g. scaled out API nodes)
  - URLs
- Target index
  - Exists?
- Source ecom index settings
  - Configs (add_docs_config, collections_config, search_config)
  - Infrastructure (especially the queue ARN)
- Query configs
  - Configs
- Merchandising
  - Config
  - Rules
- Pixel
- Mappings for automatic doc updates
TODO: In general, how to behave if the target resources/config already exist.
Create
Create a new index with the desired immutable configuration.
Once the queue is created, also deactivate the trigger for the ecom indexer so the queue isn't consumed until we're ready.
Configure
Create or update any mutable configuration (most of the things in "Describe"). Values default to copies of the source index's configuration and can be overridden at fork time.
Transfer
Once the target index is ready, update the source index's add_docs_config.index_write_aliases to start forking all subsequent writes to the target index.
In parallel, begin the transfer operation for the existing docs (either a manual snapshot to be restored, or the reindexing pipeline).
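The write-forking update can be sketched as a pure function over the source index's ecom settings. This is a minimal sketch, assuming `index_write_aliases` is a list of index names under `add_docs_config`; the real shape of that field is not specified here, and the function name `add_write_alias` is hypothetical.

```python
import copy


def add_write_alias(ecom_settings: dict, target_index_name: str) -> dict:
    """Return a copy of the settings with the target added as a write alias.

    Assumption: add_docs_config.index_write_aliases is a list of index names.
    The input is not mutated, and adding the same target twice has no effect,
    so the step is safe to retry.
    """
    updated = copy.deepcopy(ecom_settings)
    add_docs_config = updated.setdefault("add_docs_config", {})
    aliases = add_docs_config.setdefault("index_write_aliases", [])
    if target_index_name not in aliases:
        aliases.append(target_index_name)
    return updated
```

Returning a new dict rather than mutating in place keeps the original settings available, which may help when the alias later needs to be reverted on rollback or abort.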
Persistence
Fork details are stored in a DynamoDB table called {env}-IndexForksTable. A new record is created for each state change of each fork.
| Column | Description |
|---|---|
| pk | Source system account ID |
| sk | (Source index name)#(System timestamp, ISO format) |
| fork_id | Fork ID |
| status | pending, in_progress, ready, failed, rolled_back, aborted, complete |
| source_cell_id | Source cell ID |
| source_system_account_id | Source system account ID |
| source_index_name | Source index name |
| target_cell_id | Target cell ID |
| target_system_account_id | Target system account ID |
| target_index_name | Target index name |
| created_at | Timestamp of creation of this record (particular status reached) |
| updated_at | Timestamp of last update of this record |
For each fork, we store:
- One record with all the context with which the fork was created.
- One record for each status change with timestamps.
Access patterns:
- List all forks for a given index ID (account ID + index name) and their latest status
- Get the latest status for a given fork ID
- List the history of a given fork ID
- Create a new fork record
- Update the status of a fork (by creating a new record with the same fork ID and a new timestamp)
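The key layout and the "latest status" access pattern can be illustrated without touching DynamoDB. This is a sketch under assumptions: the helper names are hypothetical, timestamps are written in UTC with a fixed-width ISO format so that sort keys order chronologically, and the input records are assumed to come from a single Query on `pk` with a `begins_with` condition on `sk` for one index name.

```python
from datetime import datetime, timezone


def fork_record_keys(source_system_account_id: str,
                     source_index_name: str,
                     at: datetime) -> dict:
    """Build pk/sk for one status record, following the table layout above."""
    # Fixed-width UTC timestamps keep lexicographic and chronological order aligned.
    stamp = at.astimezone(timezone.utc).isoformat(timespec="microseconds")
    return {"pk": source_system_account_id, "sk": f"{source_index_name}#{stamp}"}


def latest_status_by_fork(records: list[dict]) -> dict[str, str]:
    """Map fork_id -> most recent status, given all records for one index."""
    latest: dict[str, dict] = {}
    for record in records:
        current = latest.get(record["fork_id"])
        # Within one index name, a larger sk means a later timestamp.
        if current is None or record["sk"] > current["sk"]:
            latest[record["fork_id"]] = record
    return {fork_id: record["status"] for fork_id, record in latest.items()}
```

Because status changes are appended as new records rather than updated in place, the history of a fork is simply the same set of records filtered by `fork_id` and sorted by `sk`.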
API
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks
Create a new fork.
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cutover
Cut over to the target index, routing all search traffic to the target index. Leaves the source index untouched.
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/rollback
Revert the necessary configs to serve all traffic from the source index.
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/cleanup
Check that the target index is successfully serving all traffic, and tear down the source index.
POST /api/v1/accounts/{account_id}/indexes/{index_name}/forks/{fork_id}/abort
Abort the fork, tearing down the target index and restoring the source index to its original state.
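For client code or tests, the endpoint paths above can be built with a small helper. This is only a sketch: the function names are hypothetical, and percent-encoding of path segments is an assumption about how identifiers with special characters should be handled.

```python
from urllib.parse import quote

# Actions that operate on an existing fork, as listed above.
FORK_ACTIONS = {"cutover", "rollback", "cleanup", "abort"}


def forks_path(account_id: str, index_name: str) -> str:
    """Path for creating a fork (POST)."""
    return (f"/api/v1/accounts/{quote(account_id, safe='')}"
            f"/indexes/{quote(index_name, safe='')}/forks")


def fork_action_path(account_id: str, index_name: str,
                     fork_id: str, action: str) -> str:
    """Path for an action on an existing fork (POST)."""
    if action not in FORK_ACTIONS:
        raise ValueError(f"unknown fork action: {action}")
    return f"{forks_path(account_id, index_name)}/{quote(fork_id, safe='')}/{action}"
```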
Implementation Plan
1. Persistence Layer
- Schema Design: Define a DynamoDB table schema for `ForksTable` to store fork ID, status, source/target details, and step progress.
- Service: Create a `ForkService` to handle CRUD operations for fork records.
- Deployment: Deploy the new table to the production environment via `admin_stack` in CDK.
2. Core Fork Logic (Orchestrator)
- Describe & Validation: Implement logic to fetch source index details and validate target index parameters.
- Resource Creation: Integrate with `IndexSettingsService` to create the target index (immutable settings).
- Configuration Sync: Implement logic to copy and merge mutable settings (Ecom, Query Configs, Merchandising, Pixel) from source to target.
- Write Aliasing: Implement the update of `add_docs_config` on the source index to alias writes to the target.
3. API Implementation
- `POST /forks`:
  - Generate Fork ID.
  - Create initial record in `ForksTable`.
  - Trigger the asynchronous fork workflow (likely via Step Functions or async Lambda invocation).
  - Return Fork ID and `pending` status.
- `POST /cutover`:
  - Retrieve fork record.
  - Verify fork is in `ready` state.
  - Update routing configuration (Index Registry/DNS/Gateway) to point search traffic to target.
  - Update status to `complete`.
- `POST /rollback`:
  - Revert write aliases on source index.
  - Revert search routing if cutover was attempted.
  - Update status to `rolled_back`.
- `POST /cleanup`:
  - Verify traffic is serving correctly on target.
  - Delete source index resources.
- `POST /abort`:
  - Revert any changes to source (aliases).
  - Delete target index resources.
  - Mark fork as `aborted`.
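The status checks in the endpoints above can be sketched as a small guard function. Only the transitions stated in the plan are encoded: cutover requires `ready` and results in `complete`, rollback results in `rolled_back`, and abort results in `aborted`. The plan does not state preconditions for rollback or abort, nor a resulting status for cleanup, so those are deliberately left out. The names `next_status` and `InvalidForkStateError` are hypothetical.

```python
class InvalidForkStateError(Exception):
    """Raised when an action is requested in a status that does not allow it."""


# Preconditions stated in the plan; actions not listed here are unconstrained
# in this sketch, which is an assumption rather than a design decision.
REQUIRED_STATUS = {"cutover": {"ready"}}

# Status written after each action succeeds, as stated in the plan.
RESULTING_STATUS = {
    "cutover": "complete",
    "rollback": "rolled_back",
    "abort": "aborted",
}


def next_status(action: str, current_status: str) -> str:
    """Validate the action against the current status and return the new status."""
    if action not in RESULTING_STATUS:
        raise ValueError(f"no status transition defined for action: {action}")
    allowed = REQUIRED_STATUS.get(action)
    if allowed is not None and current_status not in allowed:
        raise InvalidForkStateError(
            f"{action} requires status in {sorted(allowed)}, got {current_status}"
        )
    return RESULTING_STATUS[action]
```

Since status changes are persisted as new records, the value returned here would be written as a fresh record for the same fork ID rather than as an in-place update.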
4. Asynchronous Workflow
The fork workflow is orchestrated by a Step Functions state machine (AdminIndexForkWorkflow). Each step invokes the Admin Lambda with a specific action:
fork.ensure_target → fork.configure_target → fork.prepare_transfer → snapshot → restore → fork.activate_target → fork.verify → succeed
| Step | Lambda Action | Description |
|---|---|---|
| Ensure Target | fork.ensure_target | Create target index if missing, single readiness check (SFN retries on TargetIndexNotReadyError), validate infra compatibility |
| Configure Target | fork.configure_target | Export source config, import into target, validate post-import export matches |
| Prepare Transfer | fork.prepare_transfer | Disable target SQS ESM, add write alias (source → target) |
| Snapshot | (cross-account SFN) | Snapshot source index documents |
| Restore | (cross-account SFN) | Restore snapshot onto target index |
| Activate Target | fork.activate_target | Re-enable target SQS ESM so queued writes drain |
| Verify | fork.verify | Compare search results between source and target, mark READY or FAILED |
All steps are idempotent for safe Step Functions retries. Failures mark the fork as FAILED with a descriptive message.
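The action-based dispatch and the retry-friendly readiness check can be sketched as follows. This is a minimal illustration, not the Admin Lambda's actual code: the event shape (`action`, `target_index_name`), the `handler` function, and the in-memory `INDEXES` dict standing in for real index infrastructure are all assumptions. Only the action name `fork.ensure_target` and the error name `TargetIndexNotReadyError` come from the table above.

```python
class TargetIndexNotReadyError(Exception):
    """Raised so the Step Functions retry policy can re-invoke the step."""


# Hypothetical stand-in for index infrastructure: index name -> ready flag.
INDEXES: dict[str, bool] = {}


def ensure_target(event: dict) -> dict:
    """Create the target index if missing, then do a single readiness check."""
    name = event["target_index_name"]
    if name not in INDEXES:
        # Creation happens at most once; retries fall through to the check.
        INDEXES[name] = False
    if not INDEXES[name]:
        raise TargetIndexNotReadyError(name)
    return {**event, "target_ready": True}


# Remaining actions (fork.configure_target, etc.) would be registered here.
ACTIONS = {"fork.ensure_target": ensure_target}


def handler(event: dict, context=None) -> dict:
    """Route a workflow step to the function registered for its action."""
    action = event.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ACTIONS[action](event)
```

Doing one readiness check per invocation and raising on "not ready" leaves the waiting and backoff to the state machine, so the Lambda itself never sleeps or polls.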
5. Testing
- Unit Tests: Test individual components (Service, API models, Logic).
- Integration Tests: Test the full flow with mocked infrastructure calls.