OpenSSH CA in production: a complete guide

OpenSSH has supported certificates since version 5.4, released in 2010. Despite that, most teams that try to roll out an SSH Certificate Authority run into the same five problems in the first week. They are all small, all configurable, and all easy to debug once you know they exist. This article walks through each in order, with the configuration you need to ship, and the verification step to prove it works.

The reference target is OpenSSH 8.4 (released September 2020) or later on both client and server. Earlier versions work, with one substantive caveat covered below. The relevant standards are RFC 4252 (SSH authentication protocol), RFC 4253 (transport layer), and the certificate format from PROTOCOL.certkeys in the OpenSSH source tree. The certificate format is an OpenSSH extension that was never published as an RFC, but the de-facto standard works fine.

Pitfall 1: certificate TTL chosen badly

The first decision is how long a certificate is valid. The two failure modes are symmetric. Too long and revocation becomes a real problem (you have to push KRLs, see below). Too short and engineers re-authenticate constantly and start scripting around the CA, which defeats the audit story.

The sweet spot for human users is 4 to 8 hours. That is one work session. The engineer signs in once in the morning, the cert covers them until lunch or until end-of-day, and they re-authenticate cleanly. For service accounts and CI workloads, a much shorter TTL (5 to 15 minutes) is correct, because the certificate is requested programmatically per job and there is no friction.

The cert validity window is set at signing time:

# Sign a user cert valid for 4 hours, identity alice@example.com
ssh-keygen -s ca_user -I "alice@example.com" \
  -n "alice,alice@example.com" \
  -V +4h \
  -O source-address=10.0.0.0/8,192.168.0.0/16 \
  -z $(date +%s) \
  alice_id_ed25519.pub

# Inspect the resulting cert
ssh-keygen -L -f alice_id_ed25519-cert.pub

The -V +4h sets the validity period. The -n flag is the principal list (covered next). The -z sets a serial number, useful for revocation. The source-address critical option binds the cert to a set of CIDR ranges: sshd refuses the certificate for any connection from outside those ranges. Other authentication methods that are enabled on the host are not affected, so the binding is only as strong as the rest of the sshd configuration (the reference config at the end of this article disables passwords). With that in place, a cert leaked outside your VPN range is useless.
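The TTL policy from this section can be folded into a small signing wrapper so that nobody picks a validity by hand. The sketch below is hypothetical: the function names, the two account classes, the CA_KEY variable, and the 10-minute service default are assumptions made for illustration, not part of OpenSSH.

```shell
#!/bin/sh
# Hypothetical helper: map an account class to an ssh-keygen -V interval.
# The class names and exact values are assumptions; adjust to your policy.
validity_for_class() {
  case "$1" in
    human)   echo "+4h"  ;;   # one work session
    service) echo "+10m" ;;   # per-job cert, inside the 5 to 15 minute band
    *)       echo "unknown account class: $1" >&2; return 1 ;;
  esac
}

# Sign a public key with the validity that matches its account class.
# Usage: sign_user_cert <class> <principal> <pubkey-file>
# CA_KEY is an assumed environment variable; it defaults to ./ca_user.
sign_user_cert() {
  validity=$(validity_for_class "$1") || return 1
  ssh-keygen -s "${CA_KEY:-ca_user}" \
    -I "$2" -n "$2" \
    -V "$validity" \
    -z "$(date +%s)" \
    "$3"
}
```

A call such as sign_user_cert human alice@example.com alice_id_ed25519.pub would then produce a 4-hour cert, and an unknown class fails before ssh-keygen is ever invoked.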

Pitfall 2: principals do not match what sshd expects

The -n flag stamps a list of principals onto the cert. With AuthorizedPrincipalsFile configured, the host's sshd only accepts the cert if at least one of those principals appears in the principals file for the Unix user the engineer is logging in as. A common foot-gun: the cert says alice@example.com and the principals file on the host says alice, with no email suffix. The cert is valid, the signature checks out, and sshd still rejects the login. The message in the sshd log (journalctl -u sshd, or -u ssh on Debian and Ubuntu) only says that the certificate does not contain an authorized principal, which is unhelpfully terse.

Pick one principal format, write it down in the runbook, and use it everywhere. Most teams settle on the IdP email (alice@example.com) because it survives the engineer changing teams. The AuthorizedPrincipalsFile looks like this:

# /etc/ssh/sshd_config
TrustedUserCAKeys /etc/ssh/ca_users.pub
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
RevokedKeys /etc/ssh/revoked_keys.krl

# /etc/ssh/auth_principals/ubuntu
alice@example.com
bob@example.com
oncall@example.com

# /etc/ssh/auth_principals/postgres
db-admin@example.com

The %u token expands to the local Unix username at login time. So a single sshd config supports per-account principal lists, with no service restart needed when you add a user.
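Because the sshd log does not say which principal it wanted, it can help to compare a cert against a principals file before shipping either. The following is a minimal sketch with hypothetical function names. It assumes the ssh-keygen -L layout printed by recent OpenSSH releases (a "Principals:" header followed by indented names, then "Critical Options:") and a principals file with one bare principal per line, without per-line options.

```shell
#!/bin/sh
# Extract the principal names from `ssh-keygen -L` output read on stdin.
cert_principals() {
  awk '
    /^[ \t]*Principals:/       { inside = 1; next }
    /^[ \t]*Critical Options:/ { inside = 0 }
    inside { gsub(/^[ \t]+|[ \t]+$/, ""); if ($0 != "") print }
  '
}

# Succeed if at least one principal on stdin appears as an exact,
# whole line in the principals file given as $1.
principal_authorized() {
  grep -Fxq -f - "$1"
}
```

One way to use it: ssh-keygen -L -f alice_id_ed25519-cert.pub | cert_principals | principal_authorized /etc/ssh/auth_principals/ubuntu && echo "match". An exact whole-line comparison is deliberate here, since alice and alice@example.com must not be treated as the same principal.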

Pitfall 3: KRL distribution lag

Certificates expire on their own, which is the whole point. Sometimes you need to revoke a cert before it expires (a laptop is lost, an engineer is offboarded mid-session). OpenSSH supports this via the Key Revocation List (KRL): a compact binary file listing revoked certs by serial number or fingerprint. The RevokedKeys directive points sshd at it.
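On the CA side, a KRL is built with ssh-keygen -k from a small text spec. The sketch below turns a list of serials into that spec and builds or extends the KRL. The function names and the one-serial-per-line input format are assumptions; the ssh-keygen flags (-k, -u, -f, -s) are the documented ones.

```shell
#!/bin/sh
# Hypothetical helper: turn one serial per line (stdin) into a KRL spec.
# Empty or non-numeric lines are rejected rather than passed through.
krl_spec_from_serials() {
  while IFS= read -r serial; do
    case "$serial" in
      ''|*[!0-9]*) echo "not a serial: $serial" >&2; return 1 ;;
      *)           echo "serial: $serial" ;;
    esac
  done
}

# Build a KRL, or extend it with -u if it already exists, for certs
# issued by the CA whose public key is $1.
# Usage: build_krl <ca-pubkey> <krl-file> < serials.txt
build_krl() {
  spec=$(mktemp) || return 1
  krl_spec_from_serials > "$spec" || { rm -f "$spec"; return 1; }
  update_flag=""
  [ -f "$2" ] && update_flag="-u"
  # update_flag is intentionally unquoted so an empty value disappears.
  ssh-keygen -k $update_flag -f "$2" -s "$1" "$spec"
  status=$?
  rm -f "$spec"
  return "$status"
}
```

After building, ssh-keygen -Q -f revoked_keys.krl alice_id_ed25519-cert.pub reports whether a given cert is covered by the KRL, which is a quick sanity check before distribution.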

The catch: the KRL file is on the host. If you generate a new KRL on the CA side and your hosts never see it, the revocation is theatre. Three distribution patterns work in practice:

Pull at boot plus periodic. The host agent polls the control plane every 30 to 60 seconds for the current KRL hash. If the hash changed, fetch the new file. Simple, slightly chatty, works air-gapped if the control plane lives inside the VPC.
Push via configuration management. Ansible / Chef / Puppet run distributes the KRL on each cycle. Pro: existing tooling. Con: the cycle is rarely under 15 minutes, so revocation lag is meaningful.
Push via the agent over a long-lived gRPC stream. The control plane streams KRL updates to every host within seconds. This is what Linux Identity does in production: revocation propagates to every host in the fleet within 60 seconds, typically under 5.
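The pull pattern can be sketched in a few shell functions. Everything about the control plane here is an assumption: KRL_URL is a hypothetical endpoint serving the KRL, with its SHA-256 hex digest at the same URL plus a .sha256 suffix, and KRL_PATH is an assumed override for the on-disk location. The sketch relies on sha256sum and curl being present on the host.

```shell
#!/bin/sh
# Hash of the KRL currently on disk; empty output if the file is missing.
local_krl_hash() {
  [ -f "$1" ] && sha256sum "$1" | cut -d' ' -f1
}

# Succeed when the advertised hash ($1) differs from the file on disk ($2).
krl_changed() {
  [ "$(local_krl_hash "$2")" != "$1" ]
}

# Replace the live KRL ($2) with the staged file ($1) via rename, so sshd
# never reads a half-written file. Both must be on the same filesystem.
install_krl() {
  chmod 0644 "$1" && mv -f "$1" "$2"
}

# One polling cycle against the hypothetical control-plane endpoint.
poll_once() {
  live=${KRL_PATH:-/etc/ssh/revoked_keys.krl}
  remote_hash=$(curl -fsS "$KRL_URL.sha256") || return 1
  krl_changed "$remote_hash" "$live" || return 0
  staged=$(mktemp "$live.XXXXXX") || return 1
  curl -fsS -o "$staged" "$KRL_URL" &&
    [ "$(local_krl_hash "$staged")" = "$remote_hash" ] &&
    install_krl "$staged" "$live" || { rm -f "$staged"; return 1; }
}
```

Running poll_once from a timer or a loop with a 30 to 60 second sleep gives the behaviour described in the first pattern. The download is verified against the advertised hash before it replaces the live file, so a truncated transfer leaves the previous KRL in place.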

The KRL is small. A fleet with 5,000 revoked serials across a year has a KRL of about 80 KB. Distribution bandwidth is not a concern.

The verification step: revoke a known-good cert, attempt to connect, and confirm the connection fails. Then time how long it took from revocation to that failure across a sample of hosts. If the answer is "15 minutes," you have a configuration-management-paced KRL and need to think about whether that is good enough for your threat model.
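The timing part of that check is easy to script. The sketch below polls an arbitrary probe command until it fails and prints the elapsed seconds. The function name and the POLL_INTERVAL and MAX_WAIT variables are assumptions. Note that the probe also fails on network errors, so a result is only meaningful if the host was reachable throughout.

```shell
#!/bin/sh
# Run the probe command ("$@") repeatedly until it fails, then print how
# many seconds that took. Gives up with status 1 after MAX_WAIT seconds.
seconds_until_rejected() {
  start=$(date +%s)
  deadline=$(( start + ${MAX_WAIT:-1800} ))
  while "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "${POLL_INTERVAL:-5}"
  done
  echo $(( $(date +%s) - start ))
}
```

Start it immediately after revoking a canary cert, for example with seconds_until_rejected ssh -o BatchMode=yes -o ConnectTimeout=5 -i ~/.ssh/canary ubuntu@web-001.prod.example.com true, and repeat across a sample of hosts. The measurement resolution is bounded by POLL_INTERVAL plus the connection time of each probe.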

Pitfall 4: forgetting host certificates

Most teams that adopt SSH CAs do user certs only and skip host certs. That is fine on day one and a tech-debt note for day 100. Without host certs, every engineer's laptop carries ~/.ssh/known_hosts with one line per host key. Adding a new host requires either accepting its key fingerprint manually (TOFU) or distributing host keys out of band.

Host certs solve this. The CA signs the host's public key with a host-cert variant, and the client trusts the CA via @cert-authority entries in known_hosts:

# Sign a host cert valid for 90 days, naming the host's DNS names
ssh-keygen -s ca_host -I "web-001.prod" \
  -h \
  -n "web-001.prod.example.com,web-001" \
  -V +90d \
  /etc/ssh/ssh_host_ed25519_key.pub

# Distribute to the host as /etc/ssh/ssh_host_ed25519_key-cert.pub
# Then in sshd_config:
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

# On the client side, ~/.ssh/known_hosts:
@cert-authority *.prod.example.com ssh-ed25519 AAAAC3Nz... ca_host

The -h flag is what makes it a host cert, not a user cert. The -n list contains the DNS names or hostnames the cert is valid for. A connecting client sees the host cert, validates it against the CA public key in known_hosts, and confirms the requested hostname matches. No TOFU prompt. New hosts join the fleet with no client-side configuration change.
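The @cert-authority line is mechanical to produce from the CA public key file, which makes it a good candidate for generation rather than copy and paste. A minimal sketch follows; the function name and the fixed ca_host comment are assumptions.

```shell
#!/bin/sh
# Print a known_hosts @cert-authority entry.
# Usage: cert_authority_line <host-pattern> <ca-pubkey-file>
# Only the key type and base64 blob are taken from the file; the original
# comment field is replaced with a fixed "ca_host" label.
cert_authority_line() {
  awk -v pattern="$1" '
    NF >= 2 { print "@cert-authority", pattern, $1, $2, "ca_host"; exit }
  ' "$2"
}
```

For example, cert_authority_line '*.prod.example.com' ca_host.pub >> ~/.ssh/known_hosts adds the trust entry for one user. The same line can be placed in /etc/ssh/ssh_known_hosts to cover every user on a machine.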

Pitfall 5: the ssh-rsa cert algorithm regression

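OpenSSH 8.2 (February 2020) announced the deprecation of the ssh-rsa signature algorithm (RSA with SHA-1, which is within reach of practical collision attacks) and removed it from the default CASignatureAlgorithms list, so 8.2+ peers reject certificates whose CA signature uses it. OpenSSH 8.8 (September 2021) then disabled ssh-rsa signatures for user and host keys by default. This bites two cohorts. First, anyone whose RSA CA still signs certificates with the legacy ssh-rsa algorithm; modern clients and servers refuse those certs. Second, anyone running CentOS 7 or Ubuntu 18.04 servers, which ship OpenSSH 7.4 and 7.6 respectively. Those releases predate the rsa-sha2 certificate signature types added in 7.8, so an 8.8+ client holding an RSA user certificate can end up with only the disabled legacy algorithm to offer them.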

The fix on the CA side is to use either ed25519 or ECDSA P-256 keys. Both are well-supported across OpenSSH 8.x and 9.x. Avoid generating a fresh RSA CA in 2026; if you have a legacy RSA CA, plan a CA rotation rather than papering over the algorithm choice with client overrides such as -o PubkeyAcceptedAlgorithms=+ssh-rsa.

The fix on the legacy-server side is to upgrade. Ubuntu 18.04 is past its standard support window; CentOS 7 ended in June 2024. If you cannot upgrade those hosts, the SSH-CA rollout is a forcing function: RSA certificates only keep working against them if you re-enable the deprecated algorithm, typically with PubkeyAcceptedAlgorithms +ssh-rsa on the client side (HostKeyAlgorithms +ssh-rsa covers host keys only, and CASignatureAlgorithms governs which CA signatures are accepted), which is exactly the kind of audit finding you are trying to avoid.
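Before planning a rotation it helps to know which algorithm the existing certs were actually signed with. Recent ssh-keygen releases print it on the "Signing CA:" line of ssh-keygen -L as "(using ...)"; older releases omit it, in which case the sketch below prints nothing. The function names are hypothetical.

```shell
#!/bin/sh
# Extract the CA signature algorithm from `ssh-keygen -L` output on stdin.
# Prints nothing if the listing has no "(using ...)" suffix.
cert_signature_alg() {
  sed -n 's/^[[:space:]]*Signing CA:.*(using \([^)]*\)).*$/\1/p'
}

# Succeed if the algorithm name is the deprecated RSA/SHA-1 one.
is_legacy_signature() {
  [ "$1" = "ssh-rsa" ]
}
```

A check such as is_legacy_signature "$(ssh-keygen -L -f alice_id_ed25519-cert.pub | cert_signature_alg)" && echo "re-issue this cert" can run over a directory of issued certs to size the problem. An RSA CA that signs with rsa-sha2-256 or rsa-sha2-512 is not flagged, since only the SHA-1 variant is disabled by default.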

One more thing: source-address binding

Not a pitfall, an opportunity. The source-address critical option on user certs is one of the most underused features. It binds the cert to one or more CIDR ranges. A cert with source-address=10.0.0.0/8,192.168.0.0/16 only validates from those ranges; presented from anywhere else, sshd refuses it and logs that authentication was tried with a valid certificate but not from a permitted source address.

That is useful in two scenarios. First, when an engineer's laptop is on the corporate VPN with a known CIDR; binding the cert to that range means a leaked cert outside the VPN is dead on arrival. Second, for break-glass certs that should only work from a designated jump host or operations subnet.
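A signing service can pre-check that the requester's address falls inside the list it is about to stamp, which avoids issuing a cert that will be refused on first use. The following sketch is IPv4 only, uses hypothetical function names, and does not validate its input; it only illustrates the containment test.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside CIDR $2 (for example 10.0.0.0/8).
ip_in_cidr() {
  addr_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( addr_int & mask )) -eq $(( net_int & mask )) ]
}

# Succeed if address $1 lies inside any CIDR of the comma-separated
# list $2, the same syntax the source-address option takes.
address_allowed() {
  old_ifs=$IFS; IFS=,
  set -- "$1" $2
  IFS=$old_ifs
  candidate=$1; shift
  for cidr in "$@"; do
    ip_in_cidr "$candidate" "$cidr" && return 0
  done
  return 1
}
```

With the list from the signing example, address_allowed 192.168.5.9 "10.0.0.0/8,192.168.0.0/16" succeeds and address_allowed 8.8.8.8 with the same list fails.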

Watch list: CVEs and version pins

OpenSSH is well-maintained but not bug-free. CVE-2023-38408 (July 2023) was a remote code execution in the ssh-agent forwarding path involving PKCS#11 provider loading; affected versions are OpenSSH 5.5 through 9.3p1. Most distributions backported the fix, so ssh -V alone is not conclusive: it reports the upstream version, and the installed package version is what has to be checked against your distribution's security advisory. CVE-2024-6387 (regreSSHion, July 2024) was a race condition in sshd signal handling on glibc Linux, affecting OpenSSH 8.5p1 through 9.7p1 (and earlier than 4.4p1 in a different path); the fix landed in 9.8p1 and was backported. If you operate any internet-exposed bastions, your patch SLA on openssh-server needs to be measured in days, not weeks.
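For a first-pass triage across a fleet, the upstream version string can be compared against the regreSSHion range quoted above. This is only a sketch with hypothetical function names. It looks at the upstream major.minor number alone, ignores the separate pre-4.4p1 path, and knows nothing about distribution backports, so a hit means "check the advisory", not "vulnerable".

```shell
#!/bin/sh
# Extract "major minor" from a version banner such as "OpenSSH_9.6p1 ...".
upstream_version() {
  sed -n 's/^OpenSSH_\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/\1 \2/p'
}

# Succeed if the banner in $1 is within 8.5 <= version < 9.8.
# Returns 2 if the banner cannot be parsed.
in_regresshion_range() {
  set -- $(printf '%s\n' "$1" | upstream_version)
  [ $# -eq 2 ] || return 2
  v=$(( $1 * 100 + $2 ))
  [ "$v" -ge 805 ] && [ "$v" -lt 908 ]
}
```

Since ssh -V writes to stderr, a call looks like in_regresshion_range "$(ssh -V 2>&1)" && echo "check advisory". That reports the client binary's version; the server package normally comes from the same source package, but that is worth confirming per distribution.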

The certificate format itself has been remarkably stable. No certificate-issuance CVEs in OpenSSH have surfaced since the feature shipped in 2010.

Putting it together: a working sshd_config

# /etc/ssh/sshd_config — production-ready SSH-CA host config
Port 22
AddressFamily inet
PermitRootLogin no
PasswordAuthentication no
ChallengeResponseAuthentication no
UsePAM yes

# Cert authority for users
TrustedUserCAKeys /etc/ssh/ca_users.pub
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
RevokedKeys /etc/ssh/revoked_keys.krl

# Host certificate
HostKey /etc/ssh/ssh_host_ed25519_key
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

# Modern algorithms only
KexAlgorithms curve25519-sha256@libssh.org,curve25519-sha256
HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,ssh-ed25519
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com

LogLevel VERBOSE
ClientAliveInterval 300
ClientAliveCountMax 2
MaxAuthTries 3
LoginGraceTime 30

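Drop that on a host, run sshd -t to syntax-check, restart sshd, and verify with ssh -v from the client that authentication succeeds via publickey with the cert. On the server side, the sshd log entry for each accepted login records the certificate's key ID, serial, and signing CA fingerprint, which is what ties the session back to an identity.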
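One check worth running before the restart is that every file the config points at is actually present, because sshd -t does not cover all of them, and a RevokedKeys file that sshd cannot read causes public key authentication to be refused for all users. The sketch below uses a hypothetical function name and only looks at the four directives used in this article.

```shell
#!/bin/sh
# Print each path referenced by the CA-related directives in the
# sshd_config given as $1 that is missing or unreadable.
# Commented-out directives are ignored.
missing_config_files() {
  awk '
    tolower($1) ~ /^(trustedusercakeys|revokedkeys|hostkey|hostcertificate)$/ {
      print $2
    }
  ' "$1" | while IFS= read -r path; do
    [ -r "$path" ] || echo "$path"
  done
}
```

An empty result from missing_config_files /etc/ssh/sshd_config means all referenced files are readable by the user running the check. Run it as root, since host private keys are not readable by other users.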

Where to go from here

The companion piece on replacing static SSH keys over 90 days puts these configurations into a rollout schedule. The Series A overview explains why this pattern beats a static-key fleet for a 5 to 15 person team. The security page documents how Linux Identity custodies the CA private key (managed KMS, ECDSA P-256, FIPS 140-2 Level 2).