Backup and Restore
The backup system protects persistent data (persistence DB, agent memory, and company configuration) through automated and manual backups with configurable retention policies and validated restore.
Architecture
- BackupService: Central orchestrator coordinating component handlers, manifests, compression, and scheduling
- ComponentHandler protocol: Pluggable interface for backing up and restoring individual data components
  - `SQLitePersistenceComponentHandler`: SQLite `VACUUM INTO` for consistent point-in-time copies
  - `PostgresPersistenceComponentHandler`: `pg_dump` / `pg_restore` shellouts with `PGPASSWORD` injected via the child environment (never on argv) and a per-invocation timeout
  - `MemoryComponentHandler`: `shutil.copytree` with `symlinks=True` for agent memory data directory
  - `ConfigComponentHandler`: `shutil.copy2` for company YAML configuration
- PERSISTENCE_BACKUP_HANDLER_REGISTRY: `StrategyRegistry` keyed on `config.persistence.backend` ("sqlite" / "postgres"); `_build_persistence_handler` dispatches by backend so swapping SQLite for Postgres at deploy time picks the matching `VACUUM INTO` / `pg_dump` implementation without editing the factory.
- BackupScheduler: Background asyncio task for periodic backups with interruptible sleep via `asyncio.Event`
- RetentionManager: Prunes old backups by count and age; never prunes the most recent backup or `pre_migration`-tagged backups
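To make the handler idea concrete, here is a minimal sketch of a pluggable handler interface with a SQLite implementation built on `VACUUM INTO`. The method names (`backup`, `restore`), the `name` attribute, and the class `SQLiteHandlerSketch` are assumptions for illustration; the real `ComponentHandler` protocol may look different.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import Protocol


class ComponentHandler(Protocol):
    """Hypothetical shape of the pluggable handler interface."""

    name: str

    def backup(self, target_dir: Path) -> Path: ...

    def restore(self, source: Path) -> None: ...


class SQLiteHandlerSketch:
    """Illustrative handler; not the project's actual implementation."""

    name = "persistence"

    def __init__(self, db_path: Path) -> None:
        self._db_path = db_path

    def backup(self, target_dir: Path) -> Path:
        target_dir.mkdir(parents=True, exist_ok=True)
        dest = target_dir / self._db_path.name
        # isolation_level=None keeps the connection in autocommit mode;
        # VACUUM cannot run inside an open transaction.
        conn = sqlite3.connect(self._db_path, isolation_level=None)
        try:
            # VACUUM INTO writes a consistent copy of the database to a
            # new file. The destination must not already exist.
            conn.execute("VACUUM INTO ?", (str(dest),))
        finally:
            conn.close()
        return dest

    def restore(self, source: Path) -> None:
        # Simplified: copy the snapshot back over the live database file.
        shutil.copy2(source, self._db_path)
```

Because `SQLiteHandlerSketch` satisfies the protocol structurally, a registry keyed on the backend name could return it without the caller knowing which database is in use.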
Backup Triggers
| Trigger | When | Behaviour |
|---|---|---|
| Scheduled | Configurable interval (default: 6h) | Background, non-blocking |
| Pre-shutdown | `Company.shutdown()` / SIGTERM | Synchronous, skips compression |
| Post-startup | After config load, before accepting tasks | Snapshot as recovery point |
| Manual | `POST /api/v1/admin/backups` | On-demand, returns manifest. Requires the `Idempotency-Key` header (RFC-style retry-safe key, max 255 chars); identical keys within 24h return the cached manifest instead of starting a second backup, so a 5xx-driven client retry cannot launch concurrent backups and violate the at-most-one-running invariant. A missing or empty header yields HTTP 400. |
| Pre-migration | Before restore operations | Safety net, automatic |
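The idempotency behaviour of the manual trigger can be sketched as a small in-memory cache. Everything named here (`IdempotentBackupTrigger`, `BadRequest`, the injected `clock`) is hypothetical, and rejecting over-long keys with the same 400 error is an assumption; the source only states that a missing or empty header yields HTTP 400.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # identical keys are honoured for 24h
MAX_KEY_LENGTH = 255


class BadRequest(Exception):
    """Stands in for an HTTP 400 response in this sketch."""


@dataclass
class IdempotentBackupTrigger:
    run_backup: Callable[[], dict]
    clock: Callable[[], float] = time.monotonic
    _cache: Dict[str, Tuple[float, dict]] = field(default_factory=dict)

    def trigger(self, idempotency_key: Optional[str]) -> dict:
        if not idempotency_key:
            raise BadRequest("Idempotency-Key header is required")
        if len(idempotency_key) > MAX_KEY_LENGTH:
            # Assumption: over-long keys are also rejected as bad requests.
            raise BadRequest("Idempotency-Key exceeds 255 characters")

        now = self.clock()
        cached = self._cache.get(idempotency_key)
        if cached is not None and now - cached[0] < TTL_SECONDS:
            # Retry with a known key: return the stored manifest
            # instead of starting a second backup.
            return cached[1]

        manifest = self.run_backup()
        self._cache[idempotency_key] = (now, manifest)
        return manifest
```

This sketch only caches completed backups. A production version would also need to record the key before the backup starts, so that a retry arriving while the first request is still running does not start a second one.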
Restore Flow
- Validate `backup_id` format (12-char hex)
- Load and verify manifest (structural validation)
- Re-compute and verify SHA-256 checksum against manifest
- Validate component sources (handler-specific checks)
- Create safety backup (pre-migration trigger)
- Atomic restore per component (`.bak` rollback on failure)
- Return `RestoreResponse` with safety backup ID
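Three of the steps above (ID validation, checksum verification, and `.bak` rollback) can be sketched with the standard library alone. The function names are hypothetical, lowercase hex is assumed for the ID, and the checksum is computed over a single file here because the source does not say exactly which bytes the manifest checksum covers.

```python
import hashlib
import re
import shutil
from pathlib import Path

# Assumption: IDs are exactly 12 lowercase hex characters.
_BACKUP_ID_RE = re.compile(r"[0-9a-f]{12}")


def is_valid_backup_id(backup_id: str) -> bool:
    # fullmatch rejects path separators and anything else that could
    # escape the backup directory.
    return _BACKUP_ID_RE.fullmatch(backup_id) is not None


def sha256_of(path: Path) -> str:
    """Stream the file so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_with_rollback(source: Path, target: Path) -> None:
    """Replace target with source, restoring the old file on failure."""
    bak = target.with_name(target.name + ".bak")
    had_original = target.exists()
    if had_original:
        target.replace(bak)  # move the current file aside
    try:
        shutil.copy2(source, target)
    except Exception:
        if target.exists():
            target.unlink()  # discard a partially written copy
        if had_original:
            bak.replace(target)  # roll back to the previous state
        raise
    if had_original:
        bak.unlink()  # success: the safety copy is no longer needed
```

A caller would compare `sha256_of(archive)` with the checksum recorded in the manifest and abort before touching any component if they differ.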
Configuration
Backup settings live in the `backup` namespace with runtime editability via `BackupSettingsSubscriber`:
- `enabled`: Toggle scheduler start/stop
- `schedule_hours`: Reschedule interval (1 to 168 hours)
- `compression`, `on_shutdown`, `on_startup`: Advisory (read at use time)
- `path`: Requires restart (not dispatched)
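As an illustration of how a `schedule_hours` change could take effect without a restart, the sketch below combines the 1 to 168 hour range check with an interruptible sleep built on `asyncio.Event`. The class `SchedulerSketch` and its methods are hypothetical and are not the real `BackupScheduler` API.

```python
import asyncio
from typing import Awaitable, Callable


class SchedulerSketch:
    """Illustrative periodic runner with an interruptible sleep."""

    def __init__(
        self, schedule_hours: int, job: Callable[[], Awaitable[None]]
    ) -> None:
        self._interval_s = schedule_hours * 3600
        self._job = job
        self._wake = asyncio.Event()
        self._stopped = False

    def reschedule(self, schedule_hours: int) -> None:
        if not 1 <= schedule_hours <= 168:
            raise ValueError("schedule_hours must be between 1 and 168")
        self._interval_s = schedule_hours * 3600
        self._wake.set()  # interrupt the current sleep

    def stop(self) -> None:
        self._stopped = True
        self._wake.set()

    async def run(self) -> None:
        while not self._stopped:
            try:
                # Sleep for one interval unless the event fires first.
                await asyncio.wait_for(
                    self._wake.wait(), timeout=self._interval_s
                )
            except asyncio.TimeoutError:
                await self._job()  # full interval elapsed: run the backup
            else:
                # Woken early by reschedule() or stop(): re-read state.
                self._wake.clear()
```

Waiting on the event with a timeout, rather than calling `asyncio.sleep`, means a stop request or a new interval is picked up immediately instead of after the remaining hours have elapsed.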
REST API
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/v1/admin/backups` | Trigger manual backup |
| `GET` | `/api/v1/admin/backups` | List available backups |
| `GET` | `/api/v1/admin/backups/{id}` | Get backup details |
| `DELETE` | `/api/v1/admin/backups/{id}` | Delete a specific backup |
| `POST` | `/api/v1/admin/backups/restore` | Restore from backup (requires `confirm=true`) |
Error responses
Every endpoint surfaces a structured RFC 9457 envelope on failure (see errors reference). Status codes are produced in two layers.
Layer 1: controller-specific translation (in
src/synthorg/api/controllers/backup.py). The restore endpoint
translates internal-detail exceptions into sanitised
HTTP-aware domain errors so the response body never echoes raw
manifest-parse internals; the original exception is preserved on
__cause__ for the structured log emitted by the centralised
handler. The controller does not build its own Response envelope;
it raises the typed error and the centralised handler maps it.
| Endpoint | Caught exception | Re-raised as | Resulting status |
|---|---|---|---|
| `POST /api/v1/admin/backups/restore` | `ManifestError` | `ValidationError("Invalid backup manifest")` | 422 |
| `POST /api/v1/admin/backups/restore` | `RestoreError` | `InternalServerException("Restore operation failed")` | 500 |
| `POST /api/v1/admin/backups`, `POST /api/v1/admin/backups/restore` | `BackupInProgressError` | `ConflictError("A backup operation is already in progress")` | 409 |
| `GET /api/v1/admin/backups/{id}`, `DELETE /api/v1/admin/backups/{id}`, `POST /api/v1/admin/backups/restore` | `BackupNotFoundError` | propagated unchanged (carries `RECORD_NOT_FOUND`) | 404 |
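The translation pattern in Layer 1 relies on Python exception chaining. The sketch below is a simplified stand-in: the exception classes are redefined locally so the example runs on its own, and `restore_endpoint` is a hypothetical name, not the actual controller.

```python
from typing import Callable


# Local stand-ins for the project's exception types.
class BackupError(Exception): ...
class ManifestError(BackupError): ...
class RestoreError(BackupError): ...
class BackupInProgressError(BackupError): ...
class ValidationError(Exception): ...
class InternalServerException(Exception): ...
class ConflictError(Exception): ...


def restore_endpoint(restore: Callable[[str], dict], backup_id: str) -> dict:
    """Translate internal errors into sanitised, HTTP-aware ones."""
    try:
        return restore(backup_id)
    except ManifestError as exc:
        # "from exc" stores the original on __cause__ for the log,
        # while the client only ever sees the fixed message.
        raise ValidationError("Invalid backup manifest") from exc
    except RestoreError as exc:
        raise InternalServerException("Restore operation failed") from exc
    except BackupInProgressError as exc:
        raise ConflictError(
            "A backup operation is already in progress"
        ) from exc
    # Anything else (for example a not-found error) propagates unchanged.
```

Because the function raises rather than returning a response, the envelope is still built in exactly one place: the centralised handler.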
Layer 2: centralised mapping via handle_backup_error in
src/synthorg/api/exception_handlers.py. Catches every BackupError
subtype not translated by the controller (for example, ManifestError
raised from GET /api/v1/admin/backups/{id} since that endpoint does
not translate it explicitly):
| Exception | Status | error_code |
|---|---|---|
| `BackupNotFoundError` | 404 | `RECORD_NOT_FOUND` |
| `BackupInProgressError` | 409 | `RESOURCE_CONFLICT` |
| Any other `BackupError` subtype (`ManifestError`, `RestoreError`, `RetentionError`, `ComponentBackupError`, plain `BackupError`) | 500 | `INTERNAL_ERROR` (detail "Backup operation failed") |
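The Layer 2 table amounts to a most-specific-first type check with a catch-all. A rough sketch follows, using locally defined stand-in exceptions and a hypothetical `map_backup_error` function. Passing the exception message through as the detail for 404 and 409 is an assumption; the source only specifies the fixed detail for the 500 case.

```python
from typing import Tuple


# Local stand-ins for the project's exception hierarchy.
class BackupError(Exception): ...
class BackupNotFoundError(BackupError): ...
class BackupInProgressError(BackupError): ...
class ManifestError(BackupError): ...


def map_backup_error(exc: BackupError) -> Tuple[int, str, str]:
    """Return (status, error_code, detail) for a BackupError."""
    if isinstance(exc, BackupNotFoundError):
        return 404, "RECORD_NOT_FOUND", str(exc)
    if isinstance(exc, BackupInProgressError):
        return 409, "RESOURCE_CONFLICT", str(exc)
    # Every other subtype collapses to a generic 500 so that internal
    # details (parse errors, file paths) never reach the client.
    return 500, "INTERNAL_ERROR", "Backup operation failed"
```

The catch-all branch is what keeps newly added `BackupError` subtypes safe by default: until they get an explicit mapping, they surface as an opaque 500.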
See Also
- Persistence: repository protocol, migrations, schema
- Deployment: container runtime
- Design Overview: full index