Build Cache

PaaS Runtime keeps a per-app build cache so subsequent deploys reuse buildpack layers (Paketo) and image layers (Kaniko) instead of re-downloading and re-compiling everything from scratch. The cache lives in a dedicated 5 Gi ReadWriteOnce PVC named build-cache-<app_id> in the paas-build namespace.

How it works

sequenceDiagram
  participant U as User
  participant CP as Control Plane
  participant K as Kubernetes
  participant TK as Tekton (paas-build)

  U->>CP: POST /v1/apps/{id}/builds
  CP->>K: GET pvc build-cache-{app_id}
  alt PVC missing (Cold)
    CP->>K: CREATE pvc 5Gi RWO
    CP->>CP: cache_state = Cold
  else PVC found (Warm)
    CP->>CP: cache_state = Warm
  end
  CP->>CP: paas_build_start_total{cache_state, pipeline} += 1
  CP->>TK: PipelineRun (workspace cache → PVC)
  TK->>TK: kaniko / paketo reads $WORKSPACES_CACHE_PATH
  TK-->>CP: image URL

The control plane decides Cold vs Warm right before kicking off the PipelineRun. The decision is recorded in the build's BuildResult (cache_state field on the JSON returned to the API caller) and in the paas_build_start_total Prometheus counter.
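The decision described above can be sketched in a few lines of Python. This is only an illustration, not the control plane's actual code: `FakeCluster`, `decide_cache_state`, the in-memory `build_start_total` counter, and the `"paketo"` pipeline label value are hypothetical stand-ins for the Kubernetes client and the Prometheus counter.

```python
from collections import Counter

NAMESPACE = "paas-build"


class FakeCluster:
    """Hypothetical in-memory stand-in for the Kubernetes API."""

    def __init__(self):
        self.pvcs = set()

    def pvc_exists(self, namespace, name):
        return (namespace, name) in self.pvcs

    def create_pvc(self, namespace, name):
        self.pvcs.add((namespace, name))


# Stand-in for paas_build_start_total{cache_state, pipeline}.
build_start_total = Counter()


def decide_cache_state(cluster, app_id, pipeline):
    """Return "Cold" if the cache PVC had to be created, else "Warm"."""
    name = f"build-cache-{app_id}"
    if cluster.pvc_exists(NAMESPACE, name):
        state = "Warm"
    else:
        cluster.create_pvc(NAMESPACE, name)
        state = "Cold"
    # The PromQL examples on this page use lowercase label values.
    build_start_total[(state.lower(), pipeline)] += 1
    return state


cluster = FakeCluster()
first = decide_cache_state(cluster, "my-app", "paketo")
second = decide_cache_state(cluster, "my-app", "paketo")
print(first, second)  # Cold Warm
```

Note that in this sketch the state depends only on whether the PVC exists at build start, which mirrors the sequence diagram.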

States — Cold vs Warm

| State | When | What's reused |
|-------|------|---------------|
| Cold | First build for this app, or after a manual flush | Nothing (PVC was just created) |
| Warm | At least one prior build completed | Buildpack layers (Paketo /cache), kaniko local cache |

A typical Node.js app sees a 5–10× speed-up on Warm builds (the heavy npm ci step is cached as a buildpack layer).

Buildpack-specific cache paths

| Language | Cached content |
|----------|----------------|
| Node.js | node_modules, npm metadata |
| Python | pip wheels, pyproject.toml resolved deps |
| Go | $GOMODCACHE, compiled module cache |
| Rust | ~/.cargo/registry, target/ (incremental) |
| Ruby | Gemfile.lock resolved gems |
| PHP | Composer downloads |
| Java | Maven ~/.m2, Gradle cache |
| Elixir | deps/, _build/ |

Paketo writes these under $CNB_CACHE_DIR, which the pipeline maps to the cache workspace.

Manual cache flush

Use this when a buildpack upgrade or a corrupted layer is making subsequent builds flaky. The next build will run Cold:

# Log in and capture an access token (requires jq).
TOKEN=$(curl -sf -X POST "https://runtime.di2amp.com/api/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"you@example.com","password":"…"}' | jq -r '.data.access_token')

# Delete the cache PVC for the app identified by $APP_ID.
curl -s -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "https://runtime.di2amp.com/api/v1/apps/$APP_ID/build_cache/flush"

Response:

{
  "data": {
    "app_id": "my-app",
    "flushed": true,
    "message": "Build cache PVC for app 'my-app' deleted."
  }
}

flushed: false is returned (HTTP 200) if there was no PVC to delete. This is a no-op, not an error.
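A client script may want to tell the two outcomes apart. The sketch below parses a response body and summarizes it; `describe_flush` is a hypothetical helper, and the shape of the `flushed: false` body is an assumption (same envelope as the documented response).

```python
import json


def describe_flush(body):
    """Summarize a flush response body of the documented shape."""
    data = json.loads(body)["data"]
    if data["flushed"]:
        return f"cache for {data['app_id']} deleted; next build runs Cold"
    # Assumed: the no-op response keeps the same envelope with flushed=false.
    return f"no cache PVC for {data['app_id']}; nothing to do"


deleted = describe_flush(
    '{"data": {"app_id": "my-app", "flushed": true, '
    '"message": "Build cache PVC for app \'my-app\' deleted."}}'
)
noop = describe_flush('{"data": {"app_id": "my-app", "flushed": false}}')
print(deleted)
print(noop)
```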

Inspect cache pointer

curl -s -H "Authorization: Bearer $TOKEN" \
  "https://runtime.di2amp.com/api/v1/apps/$APP_ID/build_cache"

Response:

{
  "data": {
    "app_id": "my-app",
    "pvc_name": "build-cache-my-app",
    "namespace": "paas-build"
  }
}

This doesn't read the PVC contents — it just tells you where to look in kubectl.
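To go from the pointer to an actual lookup, the response fields can be turned into a kubectl invocation. A small sketch, assuming the response shape shown above; `kubectl_describe_command` is a hypothetical helper name, and running the resulting command requires cluster access to the paas-build namespace.

```python
import json
import shlex


def kubectl_describe_command(body):
    """Build a `kubectl describe pvc` command from a cache-pointer response."""
    data = json.loads(body)["data"]
    args = ["kubectl", "describe", "pvc", data["pvc_name"], "-n", data["namespace"]]
    return shlex.join(args)


response = (
    '{"data": {"app_id": "my-app", '
    '"pvc_name": "build-cache-my-app", "namespace": "paas-build"}}'
)
command = kubectl_describe_command(response)
print(command)  # kubectl describe pvc build-cache-my-app -n paas-build
```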

Metrics

Every start_build increments paas_build_start_total with the {cache_state, pipeline} labels. Useful queries:

# Cache hit ratio (last 5 min)
sum(rate(paas_build_start_total{cache_state="warm"}[5m]))
  / sum(rate(paas_build_start_total[5m]))

# Cold builds per pipeline (last hour)
sum by (pipeline) (
  increase(paas_build_start_total{cache_state="cold"}[1h])
)

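A sustained low cache-hit ratio usually means either that manual flushes are being triggered repeatedly or that the buildpack version changed (forcing a Cold rebuild). The counter carries only the cache_state and pipeline labels, so these queries can narrow the ratio down to a pipeline but not to a single app.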
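As a worked example of what the first query computes, the ratio is simply Warm starts divided by all starts over the window. The numbers below are made up for illustration, and `warm_ratio` is a hypothetical helper, not part of the platform.

```python
from typing import Optional


def warm_ratio(warm: int, cold: int) -> Optional[float]:
    """Fraction of build starts that were Warm, or None if there were no builds."""
    total = warm + cold
    if total == 0:
        return None  # PromQL would yield no meaningful value here either
    return warm / total


ratio = warm_ratio(warm=45, cold=5)
print(ratio)  # 0.9
```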

Cache invalidation (Phase 2)

Today the cache is invalidated only by a manual DELETE /v1/apps/{app_id}/build_cache/flush. Phase 2 will add automatic lock-file-change detection so that a single dependency upgrade triggers a selective reset rather than a full Cold rebuild. Lock files planned for detection:

| Ecosystem | Lock file(s) |
|-----------|--------------|
| Node.js | package-lock.json, yarn.lock, pnpm-lock.yaml |
| Python | requirements.txt, Pipfile.lock, poetry.lock |
| Go | go.sum |
| Rust | Cargo.lock |
| Java | pom.xml, gradle.lockfile |
| Ruby | Gemfile.lock |
| PHP | composer.lock |
| Elixir | mix.lock |

When a lock file's hash changes between two builds, the dependency cache layers will be invalidated (and re-downloaded from the registry), while the build-output cache (compiled artefacts that don't depend on the lock content) will be preserved, so only the affected layers rebuild.
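The detection step could look roughly like the sketch below: fingerprint the lock files present in a workspace, then compare fingerprints between two builds. Phase 2 is not implemented yet, so everything here is an assumption: the function names are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
import tempfile
from pathlib import Path

# Lock file names taken from the table above.
LOCK_FILES = (
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "requirements.txt", "Pipfile.lock", "poetry.lock",
    "go.sum", "Cargo.lock", "pom.xml", "gradle.lockfile",
    "Gemfile.lock", "composer.lock", "mix.lock",
)


def lock_fingerprint(workspace):
    """Map each lock file present in the workspace to its SHA-256 digest."""
    root = Path(workspace)
    return {
        name: hashlib.sha256((root / name).read_bytes()).hexdigest()
        for name in LOCK_FILES
        if (root / name).is_file()
    }


def changed_lock_files(previous, current):
    """Lock files that were added, removed, or modified between two builds."""
    names = previous.keys() | current.keys()
    return sorted(n for n in names if previous.get(n) != current.get(n))


with tempfile.TemporaryDirectory() as workspace:
    lock = Path(workspace) / "package-lock.json"
    lock.write_text('{"lockfileVersion": 3}')
    before = lock_fingerprint(workspace)
    lock.write_text('{"lockfileVersion": 3, "packages": {}}')
    after = lock_fingerprint(workspace)

changed = changed_lock_files(before, after)
print(changed)  # ['package-lock.json']
```

A non-empty result would be the signal to reset the dependency layers for that ecosystem while leaving the rest of the cache alone.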

Current behaviour: the cache PVC is monolithic, with no per-layer hashing. A lock file change does not trigger automatic invalidation. Workaround: DELETE /v1/apps/{app_id}/build_cache/flush for a full manual reset, then trigger a rebuild.

Cache size & eviction (Phase 2)

Current limit: 5 Gi PVC per app (ReadWriteOnce, storageClassName: default). The PVC is created the first time the app is built; it persists until DELETE /build_cache/flush or until the app is deleted.
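For orientation, a PVC with these properties would correspond to a manifest along the following lines. This is a sketch assembled from the values stated on this page (name pattern, namespace, size, access mode, storage class); `build_cache_pvc` is a hypothetical helper, and the manifest the control plane really submits may carry additional labels or fields.

```python
import json


def build_cache_pvc(app_id):
    """Return a PersistentVolumeClaim manifest for an app's build cache."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"build-cache-{app_id}",
            "namespace": "paas-build",
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "default",
            "resources": {"requests": {"storage": "5Gi"}},
        },
    }


manifest = build_cache_pvc("my-app")
print(json.dumps(manifest, indent=2))
```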

Phase 2: LRU eviction when a cache PVC exceeds 80% capacity:

  • A scheduled CronJob will scan every build-cache-* PVC in the paas-build namespace.
  • An init container will run du -sh /workspace/cache to measure actual usage.
  • LRU eviction will remove the least recently used cache directories until usage drops below 60%.
  • The tenant dashboard will surface a warning when cache_used_percent > 80.
  • A new build will be refused (HTTP 507 Insufficient Storage) if it would push the PVC past 5 Gi; the tenant will be told to flush the cache first.
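The eviction rule in the list above can be sketched as a pure function over directory metadata. Phase 2 is not implemented, so this is an assumption-laden illustration: `CacheDir`, `plan_eviction`, and the example paths are hypothetical, and how "last used" is actually measured is left open.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
CAPACITY_BYTES = 5 * GIB
HIGH_WATERMARK = 0.80  # eviction starts above this fraction of capacity
LOW_WATERMARK = 0.60   # eviction stops once usage drops below this fraction


@dataclass
class CacheDir:
    path: str
    size_bytes: int
    last_used: float  # Unix timestamp of the most recent use


def plan_eviction(dirs):
    """Return the paths to remove, least recently used first."""
    used = sum(d.size_bytes for d in dirs)
    if used <= HIGH_WATERMARK * CAPACITY_BYTES:
        return []
    evict = []
    for d in sorted(dirs, key=lambda d: d.last_used):
        if used < LOW_WATERMARK * CAPACITY_BYTES:
            break
        evict.append(d.path)
        used -= d.size_bytes
    return evict


dirs = [
    CacheDir("/workspace/cache/a", 1 * GIB, last_used=100.0),
    CacheDir("/workspace/cache/b", 5 * GIB // 2, last_used=300.0),
    CacheDir("/workspace/cache/c", 1 * GIB, last_used=200.0),
]
plan = plan_eviction(dirs)
print(plan)  # ['/workspace/cache/a', '/workspace/cache/c']
```

In this example usage starts at 90% (4.5 GiB of 5 GiB). Removing the two least recently used directories brings it to 50%, so the most recently used directory survives.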

Recommendation today: keep the workspace lean. Use multi-stage builds, a tight .dockerignore, and prefer dependency lock files (reproducible) over re-resolving from latest constraints (always pulls fresh layers).