Incident Recovery
This page is the operator playbook for the most common DaoFlow failures.
First Five Minutes
Collect the state before you make changes:
cd /opt/daoflow
docker compose ps
docker compose logs --tail=200 daoflow
docker compose logs --tail=100 postgres redis temporal temporal-ui
curl http://127.0.0.1:3000/trpc/health
If the web UI is reachable, also capture:
- the failing deployment or backup run ID
- the server name involved
- the last successful deployment or backup before the incident
- the deployment state artifact export from the deployment details panel
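The collection commands above can be wrapped so the evidence lands in one directory before anything is changed. This is a minimal sketch, not part of DaoFlow: the function name `capture_state`, the output file names, and the default output location under `/tmp` are assumptions for illustration. Only the `docker compose` and `curl` invocations come from this page.

```shell
# capture_state [INSTALL_DIR] [OUT_DIR]
# Hypothetical helper: saves the first-five-minutes evidence into OUT_DIR
# and prints the directory it wrote to.
capture_state() {
  local install_dir="${1:-/opt/daoflow}"
  local out_dir="${2:-/tmp/daoflow-incident-$(date +%Y%m%dT%H%M%S)}"

  mkdir -p "$out_dir" || return 1
  # Resolve to an absolute path so the redirects still work after the cd below.
  out_dir="$(cd "$out_dir" && pwd)" || return 1

  (
    cd "$install_dir" || exit 1
    # Each command is allowed to fail; its error text is kept as evidence too.
    docker compose ps > "$out_dir/compose-ps.txt" 2>&1
    docker compose logs --tail=200 daoflow > "$out_dir/daoflow.log" 2>&1
    docker compose logs --tail=100 postgres redis temporal temporal-ui \
      > "$out_dir/dependencies.log" 2>&1
    curl -sS --max-time 5 http://127.0.0.1:3000/trpc/health \
      > "$out_dir/health.txt" 2>&1
    true
  ) || return 1

  echo "$out_dir"
}
```

Calling `capture_state` with no arguments uses `/opt/daoflow` and a timestamped directory. Attach the resulting directory to the incident notes along with the IDs listed above.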
Control Plane Will Not Start
Check:
- .env contains non-empty BETTER_AUTH_SECRET, ENCRYPTION_KEY, POSTGRES_PASSWORD, and TEMPORAL_POSTGRES_PASSWORD
- BETTER_AUTH_URL matches the public origin operators are actually using
- /var/run/docker.sock is mounted into the daoflow container
- postgres is healthy before daoflow starts
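The first check, that the four required secrets are present and non-empty, is easy to script. The sketch below is an illustration rather than a DaoFlow tool: `check_env_keys` is a hypothetical name, and it assumes `.env` uses plain `KEY=value` lines with no `export` prefix. It prints key names only, never values.

```shell
# check_env_keys [ENV_FILE]
# Hypothetical helper: lists required keys that are missing or empty.
# Returns 0 when all four keys have a value, 1 otherwise.
check_env_keys() {
  local env_file="${1:-.env}"
  local missing=0
  local key value

  for key in BETTER_AUTH_SECRET ENCRYPTION_KEY POSTGRES_PASSWORD TEMPORAL_POSTGRES_PASSWORD; do
    # Take the last assignment for the key, then drop quotes and whitespace
    # so that KEY="" and KEY= are both treated as empty.
    value="$(sed -n "s/^${key}=//p" "$env_file" | tail -n 1 | tr -d "\"' \t\r")"
    if [ -z "$value" ]; then
      echo "missing or empty: $key"
      missing=1
    fi
  done

  return "$missing"
}
```

Run it as `check_env_keys /opt/daoflow/.env`. An empty result with exit status 0 means all four keys are set; it does not confirm that the values are correct.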
Useful commands:
docker compose logs --tail=200 daoflow
docker compose logs --tail=200 postgres
If startup fails during owner bootstrap, correct the DAOFLOW_INITIAL_ADMIN_* values and restart daoflow.
Deployments Fail Or Stall
Use the deployment ID from the dashboard or CLI response:
daoflow logs --deployment <deployment-id> --json
daoflow status --json
daoflow doctor --json
Check:
- target server SSH connectivity
- Docker and Docker Compose availability on the managed host
- whether the Compose deploy required a context upload and that the staging workspace is writable
- whether the failure is in plan generation, artifact staging, Docker execution, or post-start health
- whether the dashboard deployment details show a difference between declared config, frozen deployment input, and last observed live state
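The first two checks, SSH connectivity and Docker availability on the managed host, can be probed from the control-plane host in one step. This is a sketch under assumptions: `check_target` is a hypothetical helper, the SSH target is whatever user and host you normally use for that server, and the probe relies only on standard `ssh` options and the stock `docker version` and `docker compose version` commands.

```shell
# check_target SSH_TARGET
# Hypothetical helper: verifies SSH reachability, then Docker and Docker Compose
# on the managed host. SSH_TARGET is e.g. user@host (an assumed value).
check_target() {
  local target="$1"
  local out rc

  # BatchMode makes ssh fail instead of prompting; ConnectTimeout bounds the wait.
  if out="$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" \
      'docker version >/dev/null && docker compose version' 2>&1)"; then
    echo "ok: $target ($out)"
  else
    rc=$?
    # ssh itself exits 255 on connection or authentication errors;
    # any other status comes from the remote docker commands.
    if [ "$rc" -eq 255 ]; then
      echo "ssh failed: $target"
    else
      echo "docker check failed on $target (exit $rc)"
    fi
    return 1
  fi
}
```

A result of "ssh failed" points at connectivity or keys, while "docker check failed" means the host is reachable but Docker or the Compose plugin is not usable by that account.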
From the dashboard:
- open the failed service or deployment record
- expand the deployment details
- copy or download the deployment state artifact JSON
- compare the frozen deployment input with the live runtime section before changing anything
If Temporal mode is enabled, also inspect:
docker compose logs --tail=200 temporal temporal-ui daoflow
Emergency fallback:
- set DAOFLOW_ENABLE_TEMPORAL=false in .env
- docker compose up -d daoflow
That returns the system to the legacy in-process worker while you investigate Temporal separately.
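If you prefer not to hand-edit `.env` during an incident, the fallback can be applied with a small helper that rewrites one key and keeps a backup. This is a sketch, not a DaoFlow command: `set_env_var` and the `.bak` suffix are assumptions, and it expects plain `KEY=value` lines.

```shell
# set_env_var FILE KEY VALUE
# Hypothetical helper: sets KEY=VALUE in FILE, replacing any existing
# assignment, and leaves the previous contents in FILE.bak.
set_env_var() {
  local file="$1" key="$2" value="$3"
  local tmp

  tmp="$(mktemp)" || return 1
  # Copy every line except existing assignments of the key.
  # grep exits non-zero when nothing is left, which is not an error here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"

  # Keep a backup with the same permissions, then overwrite in place so the
  # original file's owner and mode are preserved.
  cp -p "$file" "$file.bak" && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For the fallback above, run `set_env_var /opt/daoflow/.env DAOFLOW_ENABLE_TEMPORAL false` followed by `docker compose up -d daoflow`. The backup file contains secrets, so remove it once the incident is closed.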
Backups Fail
daoflow backup list --json
daoflow backup destination test --id <destination-id>
daoflow backup run --policy <policy-id> --yes
Most backup failures reduce to one of:
- destination credentials or bucket and path permissions
- SSH or Docker access to the target host
- not enough disk space in the staging or destination path
Failed backup runs are preserved as first-class records. Do not delete them until you have captured their error detail.
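The disk-space cause is quick to rule in or out with `df`. The helper below is a sketch under assumptions: `has_free_space` is a hypothetical name, and the path and threshold you pass are yours to choose, because this page does not state where DaoFlow stages backups.

```shell
# has_free_space PATH MIN_KB
# Hypothetical helper: succeeds when the filesystem holding PATH has at
# least MIN_KB KiB available, and prints the measured value either way.
has_free_space() {
  local path="$1" min_kb="$2"
  local avail_kb

  # -P forces one POSIX-format line per filesystem; with -k, column 4 is
  # the available space in KiB.
  avail_kb="$(df -Pk "$path" | awk 'NR==2 {print $4}')"
  [ -n "$avail_kb" ] || { echo "could not read free space for $path"; return 1; }

  echo "$path: $avail_kb KiB available"
  [ "$avail_kb" -ge "$min_kb" ]
}
```

Run it on the host that stages the backup, for example over SSH, with a threshold at least as large as the expected artifact size.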
Restore Or Verification Fails
daoflow backup restore --backup-run-id <run-id> --yes
daoflow backup verify --backup-run-id <run-id> --yes
daoflow backup download --backup-run-id <run-id> --json
Current product behavior is artifact-oriented:
- restore requests resolve the backup run and download the artifact from the configured destination
- success or failure is recorded in restore metadata, audit entries, and events
- application-specific volume or database rehydration may still require manual operator steps
If you need manual recovery today:
- use daoflow backup download --backup-run-id <run-id> --json to discover the artifact path
- use your storage backend tooling or rclone to fetch the artifact
- restore the data with the application or database-specific procedure
- record the manual action in your incident notes
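For the fetch step, a thin wrapper around `rclone copyto` can confirm that something usable actually arrived before you start a restore. This is only a sketch: `fetch_artifact` is a hypothetical helper, and the remote path must be taken from the `daoflow backup download` output and your own rclone configuration. The format of that JSON output is not documented here, so the sketch does not try to parse it.

```shell
# fetch_artifact REMOTE_PATH LOCAL_FILE
# Hypothetical helper: copies one artifact with rclone and refuses to
# report success for a missing or empty file.
fetch_artifact() {
  local remote_path="$1" local_file="$2"

  # copyto copies a single object to an exact destination file name.
  rclone copyto "$remote_path" "$local_file" || return 1

  if [ ! -s "$local_file" ]; then
    echo "artifact missing or empty: $local_file"
    return 1
  fi

  echo "fetched $local_file ($(wc -c < "$local_file" | tr -d ' ') bytes)"
}
```

Note the printed byte count in the incident notes so the restored artifact can be matched against the backup run later.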
Compose State Recovery
When a Compose-backed service looks wrong but the host is still reachable:
- open the service Compose tab to copy or download the DaoFlow-managed override layer
- open the latest deployment details and export the deployment state artifact JSON
- compare the declared config, frozen deployment input, and live runtime sections
- only then fall back to host-level docker compose ps, docker inspect, or manual file inspection
This keeps DaoFlow Compose-first while still giving operators a visible escape hatch into the exact state the control plane believes it manages.
Upgrade Regression
If a newly pulled DaoFlow image regresses:
- pin DAOFLOW_VERSION back to the previous known-good tag
- docker compose pull && docker compose up -d
- if the database schema changed incompatibly, restore your pre-upgrade database backup before bringing the older image back online
See Upgrading for the normal upgrade path.
Escalation Checklist
- Captured docker compose ps
- Captured logs from daoflow and the affected dependency
- Recorded the failing deployment, backup, or restore ID
- Confirmed whether Temporal mode was enabled
- Verified whether the incident is control-plane-local or remote-target-specific