TLS Auto

Every custom domain attached to an app gets an HTTPS certificate automatically. There's no operator step beyond pointing DNS at the platform — cert-manager runs the ACME HTTP-01 challenge against Let's Encrypt, traefik picks up the issued secret, and the next TLS handshake on https://your-domain serves the new certificate.

How it works

sequenceDiagram
  participant CP as Control Plane
  participant T as Traffic Service
  participant CM as cert-manager
  participant LE as Let's Encrypt (ACMEv2)
  participant K as Kubernetes
  participant Tr as Traefik

  CP->>T: add_custom_domain(app, domain)
  T->>K: patch IngressRoute (add host)
  T->>K: ensure_certificate(name, dnsNames=[domain], secret=tls-…)
  K->>CM: Certificate CR (issuerRef=letsencrypt-prod)
  CM->>LE: order, HTTP-01 challenge
  LE->>Tr: GET /.well-known/acme-challenge/<token>
  Tr-->>LE: <validation>
  LE-->>CM: certificate issued
  CM->>K: Secret tls-… (cert + key)
  Note over Tr: traefik watches Secret, hot-reloads TLS
  CP->>CM: get Certificate.status.conditions[Ready]
  CP->>CP: parse_cert_status → "issued"
  CP-->>operator: GET /domains/{domain} → tls_status="issued"

ClusterIssuer

deploy/k8s/cert-manager-clusterissuer-prod.yaml registers a ClusterIssuer named letsencrypt-prod, and the platform code hard-codes that name in crates/traffic/src/tls.rs::CertificateSpec. Switching to a different issuer (staging, internal CA, DNS-01 for wildcards) means applying a new ClusterIssuer and tweaking that one constant.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@di2amp.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: traefik

Live check after kubectl apply:

$ kubectl get clusterissuer letsencrypt-prod \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
True
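To make the "one constant" point concrete, here is a minimal sketch of how a hard-coded issuer name could flow into each Certificate manifest. The constant `CLUSTER_ISSUER_NAME` and the function `render_certificate_manifest` are hypothetical names chosen for illustration; the real `CertificateSpec` in tls.rs is not shown on this page and may be structured differently.

```rust
/// Hypothetical stand-in for the issuer name hard-coded in tls.rs.
/// Switching issuers would mean changing this single value.
pub const CLUSTER_ISSUER_NAME: &str = "letsencrypt-prod";

/// Renders a cert-manager Certificate manifest as YAML text.
/// Illustrative only: real code would more likely build a typed object.
pub fn render_certificate_manifest(
    namespace: &str,
    name: &str,
    secret_name: &str,
    dns_names: &[&str],
) -> String {
    let mut yaml = String::new();
    yaml.push_str("apiVersion: cert-manager.io/v1\n");
    yaml.push_str("kind: Certificate\n");
    yaml.push_str("metadata:\n");
    yaml.push_str(&format!("  name: {}\n", name));
    yaml.push_str(&format!("  namespace: {}\n", namespace));
    yaml.push_str("spec:\n");
    yaml.push_str(&format!("  secretName: {}\n", secret_name));
    yaml.push_str("  issuerRef:\n");
    // The issuer name comes from the single constant above.
    yaml.push_str(&format!("    name: {}\n", CLUSTER_ISSUER_NAME));
    yaml.push_str("    kind: ClusterIssuer\n");
    yaml.push_str("  dnsNames:\n");
    for dns_name in dns_names {
        yaml.push_str(&format!("    - {}\n", dns_name));
    }
    yaml
}
```

Calling `render_certificate_manifest("paas-apps", "cert-app-123-blog-example-com", "tls-app-123-blog-example-com", &["blog.example.com"])` yields a manifest whose `issuerRef` points at `letsencrypt-prod`.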

Certificate CR

tls::ensure_certificate(client, ns, name, dns_names, secret) is idempotent, so the custom-domain flow can call it on every POST /domains without hitting a 409 AlreadyExists. If the Certificate already exists, its dnsNames are merged in via a server-side apply patch with field manager paas-control-plane, so ownership is visible in managedFields:

$ kubectl get certificate -n paas-apps cert-app-123-blog-example-com \
    -o yaml --show-managed-fields
# abridged output
metadata:
  managedFields:
    - manager: paas-control-plane
      operation: Apply
spec:
  secretName: tls-app-123-blog-example-com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - blog.example.com
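The merge step that keeps repeated calls idempotent can be sketched as an order-preserving union of hostnames. This is an assumption about the behavior, not the code in tls.rs; `merge_dns_names` is a hypothetical helper, and the case-insensitive comparison is a design choice made here because DNS names are case-insensitive.

```rust
/// Returns `existing` followed by any entries of `requested` that are not
/// already present (compared case-insensitively). Hypothetical helper.
pub fn merge_dns_names(existing: &[String], requested: &[String]) -> Vec<String> {
    let mut merged: Vec<String> = existing.to_vec();
    for name in requested {
        let already_present = merged.iter().any(|m| m.eq_ignore_ascii_case(name));
        if !already_present {
            merged.push(name.clone());
        }
    }
    merged
}
```

Applying the same request twice leaves the list unchanged, which is the property that lets the custom-domain flow call the ensure step on every POST /domains.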

Naming convention (pinned by tests in tls.rs):

| Resource | Pattern |
| --- | --- |
| Certificate name | cert-{app_id}-{domain.replace('.', '-')} |
| Secret name | tls-{app_id}-{domain.replace('.', '-')} |
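As a minimal sketch of the naming convention, the two patterns can be written as small functions. The function names `certificate_name` and `secret_name` are illustrative and may not match the helpers in tls.rs.

```rust
/// Certificate name: `cert-{app_id}-{domain with '.' replaced by '-'}`.
pub fn certificate_name(app_id: &str, domain: &str) -> String {
    format!("cert-{}-{}", app_id, domain.replace('.', "-"))
}

/// Secret name: `tls-{app_id}-{domain with '.' replaced by '-'}`.
pub fn secret_name(app_id: &str, domain: &str) -> String {
    format!("tls-{}-{}", app_id, domain.replace('.', "-"))
}
```

With app id `app-123` and domain `blog.example.com`, these produce `cert-app-123-blog-example-com` and `tls-app-123-blog-example-com`, matching the names in the kubectl output earlier on this page.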

Status vocabulary

tls::parse_cert_status projects cert-manager's status.conditions[type=Ready] into a small operator-facing vocabulary the dashboard renders:

| cert-manager condition | tls_status |
| --- | --- |
| Ready=True | issued |
| Ready=False | failed |
| missing / not yet | pending |
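The projection in the table can be sketched as a single match. The real signature of parse_cert_status is not shown on this page, so `project_ready_condition` and `TlsStatus` are hypothetical; the sketch assumes the caller has already extracted the `status` string of the Ready condition, and it treats any value other than "True" or "False" (including "Unknown") as pending, which is an assumption.

```rust
/// Operator-facing TLS status vocabulary from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TlsStatus {
    Issued,
    Failed,
    Pending,
}

impl TlsStatus {
    /// The string surfaced as `tls_status` in API responses.
    pub fn as_str(self) -> &'static str {
        match self {
            TlsStatus::Issued => "issued",
            TlsStatus::Failed => "failed",
            TlsStatus::Pending => "pending",
        }
    }
}

/// `ready` is the `status` field of the condition with type Ready,
/// or `None` when the Certificate has no such condition yet.
pub fn project_ready_condition(ready: Option<&str>) -> TlsStatus {
    match ready {
        Some("True") => TlsStatus::Issued,
        Some("False") => TlsStatus::Failed,
        // Missing condition or any other value: assumed to mean "not yet".
        _ => TlsStatus::Pending,
    }
}
```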

Surfaced via:

$ curl -H "Authorization: Bearer $TOKEN" \
    https://runtime.di2amp.com/api/v1/apps/$APP/domains/$DOMAIN
{ "data": { "domain": "...", "tls_status": "issued", "dns_configured": true } }

Renewal

cert-manager renews automatically 30 days before expiration. Operators don't touch this: traefik hot-reloads the new Secret and serves it on the next handshake, without restarting any pod. A separate tls_status: "renewing" value is not surfaced today, because the Ready condition that parse_cert_status reads stays True while a renewal is in flight; the row just stays issued while a fresh cert quietly replaces the old one.
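For a sense of the timing, the 30-day window can be worked out from a certificate's expiry. This is a sketch of the arithmetic only, not of cert-manager's scheduler; `RENEW_BEFORE`, `renewal_time`, and `is_in_renewal_window` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

/// Renewal lead time described above: 30 days before expiration.
const RENEW_BEFORE: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// The instant at which renewal becomes due, or `None` if subtracting the
/// lead time from `not_after` is not representable as a `SystemTime`.
pub fn renewal_time(not_after: SystemTime) -> Option<SystemTime> {
    not_after.checked_sub(RENEW_BEFORE)
}

/// True once `now` has reached the renewal instant.
pub fn is_in_renewal_window(now: SystemTime, not_after: SystemTime) -> bool {
    match renewal_time(not_after) {
        Some(due) => now >= due,
        // If the renewal instant cannot be computed, treat renewal as due.
        None => true,
    }
}
```

For a certificate valid for 90 days, renewal becomes due 60 days after issuance.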

Wildcard / DNS-01 (Phase 2)

HTTP-01 only validates exact hostnames — wildcard certs (*.example.com) need DNS-01. deploy/k8s/cert-manager-clusterissuer-prod-dns.yaml ships as a placeholder for an OVH-DNS DNS-01 ClusterIssuer; activation needs the OVH API credentials in a Kubernetes Secret. See the operator runbook in bilans/ad32-summary.md for the bring-up steps.

Failures and recovery

| Symptom | Likely cause | Recovery |
| --- | --- | --- |
| tls_status: "pending" for >5 min | DNS not propagated | check CNAME with dig, then re-call POST /verify |
| tls_status: "failed" | rate-limit, invalid domain, HTTP-01 challenge couldn't reach traefik | kubectl describe certificate ..., fix the underlying issue, then kubectl delete certificate ... (cert-manager re-creates) |
| tls_status: "issued" but browser still shows old cert | traefik / HSTS cache | give traefik 30s to pick up the new Secret |
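The first two rows of the table can be turned into a small triage helper, for example in a script that polls the domain endpoint. This is a sketch under stated assumptions: `recovery_hint` is a hypothetical function, the caller is assumed to track how long the domain has been in its current state, and the third row is left out because a stale certificate in the browser cannot be detected from tls_status alone.

```rust
use std::time::Duration;

/// Suggests a recovery step for a given `tls_status` and the time spent in
/// that state, or `None` when no action is suggested yet.
pub fn recovery_hint(tls_status: &str, in_state_for: Duration) -> Option<&'static str> {
    // "pending" only becomes a symptom after more than 5 minutes.
    const PENDING_GRACE: Duration = Duration::from_secs(5 * 60);
    match tls_status {
        "pending" if in_state_for > PENDING_GRACE => {
            Some("check the CNAME with dig, then re-call POST /verify")
        }
        "failed" => Some(
            "run kubectl describe certificate, fix the underlying issue, \
             then kubectl delete certificate",
        ),
        _ => None,
    }
}
```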

  • Custom Domain: the lifecycle the TLS path hangs off (POST /domains → ensure_certificate → verify)
  • Apps: the auto-subdomain app-{id}.runtime.di2amp.com uses the same ClusterIssuer
  • crates/traffic/src/tls.rs: Rust implementation of all of the above, with the parse_cert_status projection contract pinned by 6 tests (cycle 1)