Rolling Deploy
PaaS Runtime ships every deploy as a Kubernetes RollingUpdate with
maxSurge: 1 and maxUnavailable: 0 — one extra pod comes up before
the old one is torn down, so single-replica apps keep serving traffic
through the roll.
How it works
sequenceDiagram
participant CP as Control Plane
participant K as Kubernetes
participant LB as Service / Traefik
participant U as User
CP->>K: apply Deployment (image vN)
K->>K: maxSurge +1 → spawn pod vN
K->>K: readinessProbe GET / on :8080
Note over K: pod vN ready → endpoints add
LB->>U: route traffic to vN AND vN-1
K->>K: pod vN-1 marked Terminating, endpoints remove
K->>K: preStop sleep 5s (drain LB)
K->>K: SIGTERM pod vN-1
Note over K: terminationGracePeriod 30s
K->>K: pod vN-1 exits
LB->>U: route traffic to vN only
A paas.io/deploy-id={uuid} label is stamped on the pod template at
roll time, so operators can grep logs for the exact roll under
investigation (see Filter pods by deploy).
Zero-downtime guarantee
| Knob | Value | Effect |
|---|---|---|
| maxSurge | 1 | Spawn one extra pod before tearing down the old one |
| maxUnavailable | 0 | Never drop below the desired replica count |
| readinessProbe | GET / :8080 (5s/10s/3 fails) | LB only sends traffic when the new pod actually serves |
| preStop | sleep 5 | Existing connections drain before SIGTERM |
| terminationGracePeriodSeconds | 30 | App has 25s of graceful shutdown after the preStop sleep returns |
Together these make a single-replica deploy survive a roll without dropping a single in-flight request — the LB sees the new pod ready before it removes the old one from its endpoint set.
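The arithmetic behind the table can be checked with a short sketch. The class and function names here are illustrative, not part of the runtime; the values are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollSettings:
    """Knobs from the table above; field names are illustrative."""
    max_surge: int = 1
    max_unavailable: int = 0
    pre_stop_sleep_s: int = 5
    termination_grace_s: int = 30


def pod_count_bounds(desired: int, s: RollSettings) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a roll."""
    return desired - s.max_unavailable, desired + s.max_surge


def app_shutdown_budget_s(s: RollSettings) -> int:
    """Seconds the app has between SIGTERM and the end of the grace period.

    The grace period starts when the pod is marked for deletion, so the
    preStop sleep is paid out of the same budget.
    """
    return max(s.termination_grace_s - s.pre_stop_sleep_s, 0)


settings = RollSettings()
print(pod_count_bounds(1, settings))    # (1, 2)
print(app_shutdown_budget_s(settings))  # 25
```

For a single-replica app the bounds are (1, 2): one pod is always available, and at most two exist while the roll is in flight.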
Configure via paas.toml
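Default values match the podspec.rs hard-coded strategy, so an
empty [deploy] section is a no-op. The example below spells out every
field; only readiness_probe_path differs from its default ("/"):
[deploy]
surge = 1
max_unavailable = 0
timeout = 600 # progress deadline, seconds
readiness_probe_path = "/healthz" # overrides the default "/"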
| Field | Default | Meaning |
|---|---|---|
| surge | 1 | maxSurge for the RollingUpdate |
| max_unavailable | 0 | maxUnavailable |
| timeout | 600 | Seconds before the rollout is marked Timeout |
| readiness_probe_path | "/" | HTTP path the readiness probe hits |
A higher surge (2 or more) speeds up multi-replica rolls but uses
more headroom on the node — keep it 1 unless you have spare
capacity.
Rollout status API
The dashboard's RolloutProgress component polls this endpoint every 3s
while a deploy is in flight and stops as soon as the status is terminal:
curl -sf -H "Authorization: Bearer $TOKEN" \
https://runtime.di2amp.com/api/v1/apps/$APP_ID/deploys/latest \
| jq '{ id, status, rollout_status, replicas }'
Sample response while a roll is in progress:
{
"id": "deploy-7b2c…",
"status": "deploying",
"rollout_status": "InProgress",
"replicas": { "desired": 3, "ready": 1, "available": 1 }
}
| status | Meaning |
|---|---|
| queued | Deploy created, waiting for build to finish |
| building | Tekton PipelineRun in progress |
| deploying | Image pushed, RollingUpdate in flight |
| ready | All pods ready, rollout complete |
| failed | Build failed, image push failed, or rollout failed |
| timeout | Rollout still in progress after timeout seconds |
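A script can follow the same polling pattern as the dashboard. The sketch below assumes that ready, failed, and timeout are the terminal statuses (inferred from the table, not confirmed by the API) and that APP_ID and TOKEN are set in the environment as in the curl example; the function names are illustrative.

```python
import json
import os
import time
import urllib.request

# Assumption: the last three statuses in the table are terminal.
TERMINAL_STATUSES = {"ready", "failed", "timeout"}


def fetch_latest_deploy(app_id: str, token: str) -> dict:
    """GET the same endpoint as the curl example and decode the JSON body."""
    url = f"https://runtime.di2amp.com/api/v1/apps/{app_id}/deploys/latest"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_terminal(fetch, interval_s: float = 3.0, sleep=time.sleep) -> dict:
    """Call fetch() every interval_s seconds until the status is terminal."""
    while True:
        deploy = fetch()
        if deploy["status"] in TERMINAL_STATUSES:
            return deploy
        sleep(interval_s)


# Only talk to the API when run as a script with credentials present.
if __name__ == "__main__" and all(k in os.environ for k in ("APP_ID", "TOKEN")):
    final = wait_for_terminal(
        lambda: fetch_latest_deploy(os.environ["APP_ID"], os.environ["TOKEN"])
    )
    print(final["status"], final.get("rollout_status"))
```

Passing fetch and sleep as parameters keeps the loop testable without a network connection or real delays.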
Rollout timeout
When the rollout doesn't reach Completed within the
paas.toml [deploy] timeout (default 600s), the control plane
marks the deploy timeout and stops polling. The pods themselves
keep going, so the operator can inspect the K8s events
(kubectl describe deployment …) to see whether the cause was a stuck image
pull, a failing readiness probe, or resource pressure.
A future Phase 2 enhancement may auto-rollback on timeout; today the operator runs the rollback explicitly.
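The timeout rule can be expressed as a small decision function. This is a hypothetical helper rather than the control plane's code: only the Completed and timeout cases come from the text above, rollout failures are left out, and treating the deadline as inclusive (>=) is an assumption.

```python
def deploy_status_for(rollout_status: str, elapsed_s: float, timeout_s: int = 600) -> str:
    """Map one rollout observation to a deploy status.

    Hypothetical sketch: covers only the Completed and timeout cases.
    """
    if rollout_status == "Completed":
        return "ready"
    if elapsed_s >= timeout_s:  # assumption: the deadline is inclusive
        return "timeout"
    return "deploying"


print(deploy_status_for("InProgress", 120))  # deploying
print(deploy_status_for("InProgress", 600))  # timeout
print(deploy_status_for("Completed", 599))   # ready
```

Because the control plane stops polling once a deploy is marked timeout, a rollout that completes later is not reclassified; the function models a single observation only.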
Filter pods by deploy
The paas.io/deploy-id label is the cleanest cross-section for
diagnostics:
# All pods of a specific roll
kubectl get pod -l paas.io/deploy-id=$DEPLOY_ID
# Logs of every container of every pod in this roll
kubectl logs -l paas.io/deploy-id=$DEPLOY_ID --all-containers --tail=500
# Compare CPU between the new and old roll mid-flight
kubectl top pod -l paas.io/app=$APP_ID --no-headers \
| sort -k2 -h
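To see which pods belong to the old roll and which to the new one while both are running, the pod list can be grouped by the deploy-id label. A minimal sketch, assuming kubectl is on the PATH and pods carry the paas.io/app and paas.io/deploy-id labels used above; the helper names are illustrative.

```python
import json
import subprocess
from collections import defaultdict

DEPLOY_LABEL = "paas.io/deploy-id"


def group_pods_by_deploy(pod_list: dict) -> dict[str, list[str]]:
    """Map each deploy-id label value to the names of the pods carrying it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        deploy_id = meta.get("labels", {}).get(DEPLOY_LABEL, "<unlabelled>")
        groups[deploy_id].append(meta["name"])
    return dict(groups)


def pods_for_app(app_id: str) -> dict:
    """Fetch the app's pods as parsed JSON via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "pod", "-l", f"paas.io/app={app_id}", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling group_pods_by_deploy(pods_for_app(app_id)) mid-roll should return two keys, one per deploy id, which can then be matched against the kubectl top output.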
Related
- Build Cache — what runs before the rollout
- Dockerfile Support — image-build path
- Buildpacks Detection — Paketo path