PITR Backup
CloudNative-PG (CNPG) handles automatic backups for the Postgres addon. Two backup tiers ship today, daily and pitr, mapped from the addon plan; the free plan has no backups:
| Plan | Backup tier | Retention | WAL streaming |
|---|---|---|---|
| free | none | n/a | n/a |
| standard | daily | 7 days | scheduled base backup only |
| pro | pitr | 35 days | continuous (point-in-time recovery) |
The tier is rendered as a spec.backup stanza on the CNPG
Cluster CR. Barman uploads the base backups and WAL segments
to the operator-provided S3 bucket, so, as with cert-manager,
the platform operator has nothing to do per tenant.
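The plan-to-tier mapping in the table can be read as a small lookup. The following is a minimal Python sketch of that table, not the platform's implementation; `BackupTier`, `PLAN_TIERS`, and `tier_for_plan` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupTier:
    name: str                      # "none", "daily" or "pitr"
    retention_days: Optional[int]  # None when the plan has no backups
    continuous_wal: bool           # True only for point-in-time recovery


# Hypothetical lookup mirroring the plan table; values are copied from it.
PLAN_TIERS = {
    "free": BackupTier("none", None, False),
    "standard": BackupTier("daily", 7, False),
    "pro": BackupTier("pitr", 35, True),
}


def tier_for_plan(plan: str) -> BackupTier:
    """Return the backup tier for an addon plan, rejecting unknown plans."""
    try:
        return PLAN_TIERS[plan]
    except KeyError:
        raise ValueError(f"unknown addon plan: {plan!r}") from None
```

Failing loudly on an unknown plan, rather than silently falling back to no backups, is a design choice of this sketch; the real mapping may behave differently.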
How it works
sequenceDiagram
participant U as Operator
participant CP as Control Plane
participant K as Kubernetes
participant CNPG as CNPG operator
participant Barman as barman-cloud
participant S3 as OVH S3
Note over CP: cycle 3 wire-up — for now the spec is built but<br/>create_addon_generic doesn't yet pass it through.
CP->>CP: build_cluster_spec_full(version, plan, "pitr", tenant, app)
CP->>K: create Cluster CR with spec.backup stanza
K->>CNPG: reconciles Cluster
loop continuous WAL
CNPG->>Barman: stream WAL segment
Barman->>S3: upload to s3://paas-pg-backups/{tenant}/{app}/wals/
end
loop daily / scheduled
CNPG->>Barman: trigger base backup
Barman->>S3: upload to s3://paas-pg-backups/{tenant}/{app}/base/
end
U->>CP: POST /v1/apps/{id}/addons/{addon}/restore { timestamp }
CP->>CP: validate ISO 8601 timestamp
CP->>CP: audit_repository::log_event("addon.restore_pitr", …)
CP-->>U: { restore_id, status: "restore_triggered", target: "new_cluster" }
Note over CP,K: cycle 3 wire-up — calls create_recovery_cluster<br/>which spawns a NEW Cluster CR with spec.bootstrap.recovery
API endpoints
| Verb | Path | Body | Notes |
|---|---|---|---|
| GET | /v1/apps/{app_id}/addons/{addon_id}/backups | none | barman base + WAL backups, projected from CNPG Backup CRs |
| POST | /v1/apps/{app_id}/addons/{addon_id}/restore | { "timestamp": "ISO 8601", "target"?: "new_cluster" } | fires addon.restore_pitr audit event, returns restore_id |
The pre-AE/44 flat stubs at /v1/addons/{addon_id}/backups and
/v1/addons/{addon_id}/restore remain wired for backward
compatibility, but the dashboard targets the nested paths.
Restore example
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"timestamp":"2026-05-04T15:30:00Z"}' \
  https://runtime.di2amp.com/api/v1/apps/$APP/addons/$ADDON/restore
Response:
{
  "data": {
    "restore_id": "0bd91…",
    "app_id": "…",
    "addon_id": "db-acme-app-1",
    "status": "restore_triggered",
    "timestamp": "2026-05-04T15:30:00Z",
    "target": "new_cluster"
  }
}
| target | Effect |
|---|---|
| new_cluster (default) | spawns a new CNPG Cluster alongside the source via spec.bootstrap.recovery; the operator promotes manually after verification |
| in_place | reserved; not implemented in cycle 2, but the validate path accepts it for forward-compat |
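The request validation described above (an ISO 8601 timestamp plus an optional target) can be sketched in Python as follows. This is an illustration under stated assumptions, not the control plane's code: `validate_restore_request` is a hypothetical name, and requiring a UTC offset or `Z` is an extra rule of this sketch that the source does not state.

```python
from datetime import datetime, timezone

# in_place is accepted for forward compatibility but is not implemented yet.
ALLOWED_TARGETS = {"new_cluster", "in_place"}


def validate_restore_request(body: dict) -> dict:
    """Normalise a restore body: UTC timestamp string and a default target."""
    raw = body.get("timestamp")
    if not isinstance(raw, str):
        raise ValueError("timestamp is required and must be a string")
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so rewrite it as an explicit offset first.
    candidate = raw[:-1] + "+00:00" if raw.endswith("Z") else raw
    try:
        parsed = datetime.fromisoformat(candidate)
    except ValueError:
        raise ValueError(f"timestamp is not ISO 8601: {raw!r}") from None
    # Assumption of this sketch: reject timestamps without a timezone,
    # because a naive recovery target is ambiguous.
    if parsed.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset or 'Z'")
    target = body.get("target", "new_cluster")
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    return {
        # Seconds precision, always rendered in UTC.
        "timestamp": parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "target": target,
    }
```

Normalising to UTC before logging keeps the audit row and the eventual recovery target comparable regardless of the offset the caller sent.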
Backup spec on the Cluster CR
The spec.backup stanza emitted by
paas_database::cnpg::backup_config_for_plan(tier, tenant, app):
spec:
  backup:
    barmanObjectStore:
      destinationPath: "s3://paas-pg-backups/{tenant}/{app}"
      endpointURL: "https://s3.gra.io.cloud.ovh.net"
      s3Credentials:
        accessKeyId: { name: ovh-s3-creds, key: AWS_ACCESS_KEY_ID }
        secretAccessKey: { name: ovh-s3-creds, key: AWS_SECRET_ACCESS_KEY }
      wal:
        compression: gzip
        archivingMode: continuous # only on `pitr`
    retentionPolicy: 35d # 7d on daily, 35d on pitr
Naming pinned by tests:
- destination path is per-tenant per-app (no shared bucket prefix
across tenants),
- credentials live in a single ovh-s3-creds Secret managed by
the platform operator (NOT auto-provisioned per tenant),
- compression is always gzip.
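The three pinned invariants can be illustrated with a small Python re-expression of the builder. This is only a sketch: the real builder is the Rust function named above, `sketch_backup_config` is a hypothetical name, and both returning `None` for the `none` tier and placing `archivingMode` under `wal` are assumptions made here.

```python
from typing import Optional

BUCKET = "paas-pg-backups"
ENDPOINT = "https://s3.gra.io.cloud.ovh.net"
SECRET_NAME = "ovh-s3-creds"  # single platform-managed Secret, not per tenant


def sketch_backup_config(tier: str, tenant: str, app: str) -> Optional[dict]:
    """Build a spec.backup dict for 'daily' or 'pitr'; None for 'none'."""
    if tier == "none":
        return None  # assumption: no stanza is emitted at all
    if tier not in ("daily", "pitr"):
        raise ValueError(f"unknown backup tier: {tier!r}")
    wal = {"compression": "gzip"}  # compression is always gzip
    if tier == "pitr":
        wal["archivingMode"] = "continuous"  # assumed to sit under wal
    return {
        "barmanObjectStore": {
            # Per-tenant, per-app prefix: no prefix is shared across tenants.
            "destinationPath": f"s3://{BUCKET}/{tenant}/{app}",
            "endpointURL": ENDPOINT,
            "s3Credentials": {
                "accessKeyId": {"name": SECRET_NAME, "key": "AWS_ACCESS_KEY_ID"},
                "secretAccessKey": {"name": SECRET_NAME, "key": "AWS_SECRET_ACCESS_KEY"},
            },
            "wal": wal,
        },
        "retentionPolicy": "35d" if tier == "pitr" else "7d",
    }
```

Keeping the builder pure (no Kubernetes calls, no I/O) is what makes invariants like these cheap to pin in unit tests.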
Recovery cluster spec
The recovery path emits a Cluster with
spec.bootstrap.recovery.recoveryTarget.targetTime and an
externalClusters[] entry pointing at the source's S3 archive:
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage: { size: 50Gi }
  bootstrap:
    recovery:
      source: db-acme-app-1
      recoveryTarget:
        targetTime: "2026-05-04T15:30:00Z"
  externalClusters:
    - name: db-acme-app-1
      barmanObjectStore:
        destinationPath: "s3://paas-pg-backups/db-acme-app-1"
        endpointURL: "https://s3.gra.io.cloud.ovh.net"
        s3Credentials: …
The new Cluster runs alongside the source — destructive in-place restore is intentionally not supported in cycle 2 (a botched recovery should never destroy live data).
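A recovery manifest of that shape could be assembled as in the Python sketch below. It is an illustration, not the platform's `create_recovery_cluster`: the function name and parameters are hypothetical, the defaults are copied from the example above, and the caller supplies the source's `barmanObjectStore` block because the credentials are elided in this document.

```python
def recovery_cluster_manifest(
    new_name: str,
    source_name: str,
    target_time: str,
    source_store: dict,
    instances: int = 3,
    image: str = "ghcr.io/cloudnative-pg/postgresql:16",
    storage_size: str = "50Gi",
) -> dict:
    """Build a CNPG Cluster manifest that recovers source_name to target_time."""
    # The recovery cluster must be a new object so the source is never touched.
    if new_name == source_name:
        raise ValueError("recovery must target a new Cluster, not the source")
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": new_name},
        "spec": {
            "instances": instances,
            "imageName": image,
            "storage": {"size": storage_size},
            "bootstrap": {
                "recovery": {
                    # Must match the externalClusters entry name below.
                    "source": source_name,
                    "recoveryTarget": {"targetTime": target_time},
                }
            },
            "externalClusters": [
                {"name": source_name, "barmanObjectStore": dict(source_store)}
            ],
        },
    }
```

The name check encodes the rule stated above: the restored data always lands in a separate Cluster, and promotion stays a manual step.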
Audit trail
Every POST /restore writes a row to paas_audit_events (action
addon.restore_pitr) with details_jsonb carrying the app_id,
addon_id, and timestamp. The fire-and-forget call doesn't
block the response — a transient DB hiccup never fails the
restore start.
SELECT created_at, target_id, details_jsonb
FROM paas_audit_events
WHERE action = 'addon.restore_pitr'
ORDER BY created_at DESC
LIMIT 10;
Operator setup
Backups need an S3 bucket and a Kubernetes Secret with the OVH credentials. One-time runbook (the platform doesn't auto-provision the bucket — operator-owned):
# 1. Create the OVH S3 bucket (or any S3-compatible endpoint).
ovhai bucket create paas-pg-backups
# 2. Generate AWS-flavored credentials.
ovhai credentials create --bucket paas-pg-backups
# 3. Plant the Secret in the addon namespace (paas-apps).
kubectl create secret generic ovh-s3-creds \
  --namespace paas-apps \
  --from-literal=AWS_ACCESS_KEY_ID=… \
  --from-literal=AWS_SECRET_ACCESS_KEY=…
Until step 3 is done, CNPG will emit BackupFailed events on
every scheduled run — the dashboard's backup list shows
status: "failed" and the operator gets an actionable error.
Failure modes
| Symptom | Likely cause | Recovery |
|---|---|---|
| status: "failed" on every backup | ovh-s3-creds Secret missing or wrong | re-run the operator runbook above |
| Restore returns restore_id but no Cluster appears | wire-up not landed yet (cycle 3) | expected; the audit row is the proof the request was logged |
| BackupFailed: cannot reach endpoint | network policy denies egress to S3 | add an allow-https-egress exception for the S3 endpoint CIDR |
Phase 2
- Wire build_cluster_spec_full into create_addon_generic: cycle 3 (44f) routes the addon plan's backup field through to the Cluster CR so a pro plan actually emits the backup stanza.
- Wire create_recovery_cluster into restore_addon_pitr: cycle 3 calls the kube helper after the audit log fires.
- In-place restore (target: "in_place"): destructive but faster; lands when the platform has a "blue-green" deploy story for addons.
Related
- Postgres Addon: the lifecycle this hangs off
- Add-ons: the unified catalogue
- crates/database/src/cnpg.rs::backup_config_for_plan: the pure builder pinned by 4 tests