Background
We recently provisioned two new staging environments (staging1.mydomain.com and staging2.mydomain.com) mirroring our production Rails infrastructure. Production uses a Cloudflare load balancer fronting multiple origin servers running nginx, with a cron-driven script to renew Let’s Encrypt certificates across all origins.
When we added the staging environments, the existing renew_certificate.sh cron script wasn’t set up on them yet — so the certificates expired. This post documents everything we encountered trying to fix it, every error we hit, and how we resolved each one.
The Renewal Architecture
Before diving in, it’s worth understanding how SSL renewal works in this setup:
Cloudflare (DNS + Load Balancer)
        ↓
nginx (origin server)
  /apps/mydomain/current   ← Rails app lives here
  /etc/letsencrypt/live/   ← Certs live here
The renewal script (scripts/cron/renew_certificate.sh) does the following:
- Fetches the load balancer pool config from the Cloudflare API
- Disables all origin servers in the pool except the GCP instance (takes servers out of rotation)
- Turns off Cloudflare’s “Always Use HTTPS” setting (allows HTTP for the ACME challenge)
- Runs sudo certbot renew locally
- Copies the new cert to all other origin servers via SCP and SSH
- Re-enables all origin servers in the Cloudflare pool
- Re-enables “Always Use HTTPS”
The problem: this script was never added to the cron on staging1/staging2, so the certs expired.
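The ordering of those steps has one sharp edge: if certbot fails halfway, the pool and the HTTPS setting must still be restored. The real script is not shown in this post, so the following is only a minimal sketch of that shape. Every function name is hypothetical, and each placeholder just prints what it would do instead of calling the Cloudflare API, certbot, or scp.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: in a real script each would wrap a Cloudflare API
# call or an scp/ssh loop. Here they only print what they would do.
disable_other_origins() { echo "pool: disable all origins except this one"; }
enable_all_origins()    { echo "pool: re-enable all origins"; }
set_always_https()      { echo "always_use_https: $1"; }
renew_locally()         { echo "certbot renew"; }
copy_cert_to_origins()  { echo "copy new cert to other origins"; }

restore_cloudflare() {
  enable_all_origins
  set_always_https on
}

# The function body is a subshell, so the EXIT trap fires when it ends,
# whether the steps succeeded or one of them failed.
run_renewal() (
  trap restore_cloudflare EXIT
  disable_other_origins
  set_always_https off
  renew_locally || exit 1
  copy_cert_to_origins
)
```

Calling run_renewal prints the steps in order. If renew_locally fails, the copy step is skipped but the two restore steps still run.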
First Attempt: Running the Renewal Script Manually
SSH’d into staging2 and ran:
bash /apps/mydomain/current/scripts/cron/renew_certificate.sh
Error #1: RSpec / webmock LoadError
An error occurred while loading spec_helper. - Did you mean?
                    rspec ./spec/helper.rb
Failure/Error: require 'webmock/rspec'
LoadError:
  cannot load such file -- webmock/rspec
What happened: The script calls bundle exec rake google_chat:send_message[...] to send failure notifications to Google Chat. On staging, test gems like webmock aren’t installed in the bundle, so the rake task blew up loading the Rails environment.
Lesson: This is a notification side-effect, not the core renewal logic. But it masked the real error.
Error #2: certbot failing because port 80 was in use
After isolating the issue, running sudo certbot renew directly gave:
Renewing an existing certificate for staging2.mydomain.ca and www.staging2.mydomain.ca
Failed to renew certificate staging2.mydomain.ca with error: Could not bind TCP port 80 because it is already in use by another process on this system (such as a web server). Please stop the program in question and then try again.
What happened: The original certificate was issued using certbot’s standalone authenticator, which spins up its own HTTP server on port 80 to answer the ACME challenge. Since nginx was already running on port 80, the renewal failed.
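Each certificate's authenticator is recorded in its renewal config, so this kind of conflict can be spotted before a renewal runs. A small sketch, assuming certbot's usual /etc/letsencrypt/renewal directory; the helper name list_authenticators is made up for illustration.

```shell
#!/usr/bin/env bash
# Print "<cert-name> <authenticator>" for each renewal config in a directory.
# The default path is certbot's usual location; reading it may require sudo.
list_authenticators() {
  local dir="${1:-/etc/letsencrypt/renewal}"
  local conf
  for conf in "$dir"/*.conf; do
    [ -e "$conf" ] || continue
    printf '%s %s\n' \
      "$(basename "$conf" .conf)" \
      "$(sed -n 's/^authenticator *= *//p' "$conf")"
  done
}
```

Any line that ends in standalone on a host where nginx is always running is a renewal waiting to fail. To see which process currently holds port 80, sudo ss -ltnp 'sport = :80' is one option.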
Meanwhile, there was a second certificate (staging2.mydomain.ca-0001) that had been created earlier with sudo certbot --nginx -d staging2.mydomain.ca. This cert was valid, but it created a mess.
Inspecting the Damage
sudo certbot certificates
Output:
Renewal configuration file /etc/letsencrypt/renewal/staging2.mydomain.ca.conf produced an unexpected error: expected /etc/letsencrypt/live/staging2.mydomain.ca-0001/cert.pem to be a symlink. Skipping.

The following renewal configurations were invalid:
  /etc/letsencrypt/renewal/staging2.mydomain.ca.conf
The nginx config at /etc/nginx/sites-enabled/mydomain was also a mess: certbot had injected its own server block for the HTTP→HTTPS redirect, and the two 443 server blocks were pointing to different cert paths:
# Certbot-injected block (unwanted)
server {
    if ($host = staging2.mydomain.ca) {
        return 301 https://$host$request_uri;
    } # managed by Certbot
    ...
}

# Redirect server pointing to -0001 certs (also unwanted)
server {
    server_name staging2.mydomain.ca;
    listen 443 ssl http2;
    ssl_certificate /etc/letsencrypt/live/staging2.mydomain.ca-0001/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/staging2.mydomain.ca-0001/privkey.pem; # managed by Certbot
    ...
}

# Main www server pointing to original path
server {
    server_name www.staging2.mydomain.ca;
    listen 443 ssl http2;
    ssl_certificate /etc/letsencrypt/live/staging2.mydomain.ca/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/staging2.mydomain.ca/privkey.pem;
    ...
}
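A quick way to spot this kind of drift is to list every certificate path the nginx config references and count the distinct ones. This is a sketch only; the helper name list_cert_paths is hypothetical, and the default directory is the one used on this server.

```shell
#!/usr/bin/env bash
# Count each distinct ssl_certificate / ssl_certificate_key directive under a
# config directory. -R follows symlinks, which sites-enabled usually contains.
list_cert_paths() {
  local dir="${1:-/etc/nginx/sites-enabled}"
  grep -RhE '^[[:space:]]*ssl_certificate(_key)?[[:space:]]' "$dir" \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]*;.*$//' \
    | sort | uniq -c
}
```

On a healthy single-cert host this should print two lines, one fullchain.pem path and one privkey.pem path. More than two, as in the config above, means the server blocks disagree about which certificate to serve.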
The Fix
Step 1: Remove all broken certbot state
sudo rm -f /etc/letsencrypt/renewal/staging2.mydomain.ca.conf
sudo rm -f /etc/letsencrypt/renewal/staging2.mydomain.ca-0001.conf
sudo rm -rf /etc/letsencrypt/live/staging2.mydomain.ca
sudo rm -rf /etc/letsencrypt/live/staging2.mydomain.ca-0001
sudo rm -rf /etc/letsencrypt/archive/staging2.mydomain.ca
sudo rm -rf /etc/letsencrypt/archive/staging2.mydomain.ca-0001
Step 2: Stop nginx and get a fresh cert with standalone authenticator
sudo service nginx stop
sudo certbot certonly --standalone -d staging2.mydomain.ca -d www.staging2.mydomain.ca
sudo service nginx start
This gave us a clean, single certificate at /etc/letsencrypt/live/staging2.mydomain.ca/.
Step 3: Clean up the nginx config
Removed the certbot-injected if ($host = ...) server block, and updated both 443 server blocks to point to the same cert path:
ssl_certificate /etc/letsencrypt/live/staging2.mydomain.ca/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/staging2.mydomain.ca/privkey.pem;
Reloaded nginx:
sudo service nginx reload
The site was live again with a valid cert.
Making Future Renewals Work Without Stopping nginx
The next problem: the cert renewal config was still using the standalone authenticator. Future automated renewals would fail again whenever nginx was running.
The fix is to switch to the webroot authenticator. Our nginx config already had an ACME challenge location block:
location ^~ /.well-known/acme-challenge/ {
    root /apps/certbot;
    default_type "text/plain";
    allow all;
}
This means certbot can write a challenge file to /apps/certbot and nginx will serve it over HTTP — no need to stop nginx.
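Before trusting that location block, it can help to confirm nginx really serves files from the webroot. A minimal sketch: the helper probe_webroot is hypothetical and only writes a throwaway file under the challenge directory, then prints the URL that should return it.

```shell
#!/usr/bin/env bash
# Write a throwaway file where an ACME challenge would go and print the URL
# that should serve it. Arguments: webroot directory, hostname.
probe_webroot() {
  local webroot="$1" host="$2"
  local token="probe-$$"
  mkdir -p "$webroot/.well-known/acme-challenge" || return 1
  printf '%s\n' "$token" > "$webroot/.well-known/acme-challenge/$token" || return 1
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$host" "$token"
}
```

Usage, assuming the current user can write to the webroot: curl -s "$(probe_webroot /apps/certbot staging2.mydomain.ca)" should echo the token back. A redirect or a 404 here suggests the ACME challenge would fail the same way. Remove the probe file afterwards.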
Attempt 1: Manually edit the renewal config
Edited /etc/letsencrypt/renewal/staging2.mydomain.ca.conf:
[renewalparams]
authenticator = webroot
server = https://acme-v02.api.letsencrypt.org/directory
key_type = ecdsa
[[webroot]]
staging2.mydomain.ca = /apps/certbot
www.staging2.mydomain.ca = /apps/certbot
Dry run:
sudo certbot renew --dry-run
Error #3: webroot mapping not found
Failed to renew certificate staging2.mydomain.ca with error: Missing command line flag or config entry for this setting:
Input the webroot for staging2.mydomain.ca:
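The config looked correct, but certbot was still prompting for the webroot interactively. Hand-converting a standalone config to webroot is unreliable because certbot is strict about its internal config format. One likely culprit here: renewal configs that certbot generates itself store the mapping under [[webroot_map]] (alongside a webroot_path entry), not [[webroot]].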
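Renewal configs that certbot writes itself for the webroot authenticator keep the domain-to-directory mapping in a [[webroot_map]] section. A rough sanity check for a hand-edited file could look like the sketch below; the helper name check_webroot_conf is hypothetical, and passing the check does not guarantee certbot will accept the file.

```shell
#!/usr/bin/env bash
# Report whether a renewal config has the two pieces a certbot-generated
# webroot config contains. Returns 0 if both are present, 1 otherwise.
check_webroot_conf() {
  local conf="$1" rc=0
  if ! grep -q '^authenticator *= *webroot' "$conf"; then
    echo "authenticator is not webroot"
    rc=1
  fi
  if ! grep -qF '[[webroot_map]]' "$conf"; then
    echo "no [[webroot_map]] section"
    rc=1
  fi
  return "$rc"
}
```

Run against the edited file above, this prints "no [[webroot_map]] section" and returns 1.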
Attempt 2: Delete and re-issue with webroot from the start (this worked)
sudo certbot delete --cert-name staging2.mydomain.ca
sudo mkdir -p /apps/certbot
sudo certbot certonly --webroot -w /apps/certbot \
  -d staging2.mydomain.ca \
  -d www.staging2.mydomain.ca
This time certbot generated the renewal config correctly itself. Dry run:
sudo certbot renew --dry-run
Simulating renewal of an existing certificate for staging2.mydomain.ca and www.staging2.mydomain.ca

Congratulations, all simulated renewals succeeded:
  /etc/letsencrypt/live/staging2.mydomain.ca/fullchain.pem (success)
Key Lessons
- Never run certbot --nginx on a server where you manage the nginx config manually. It injects its own server blocks and creates confusing duplicate certs with -0001 suffixes.
- Standalone vs webroot authenticator: Standalone is simpler to set up initially but requires stopping nginx. Webroot is the right choice for servers where nginx runs continuously, as long as you have the ACME challenge location block configured.
- Manually editing certbot renewal configs is fragile. Let certbot generate the renewal config by passing the correct authenticator flags at issuance time.
- certbot renew --dry-run is your best friend. Always confirm future renewals will work before leaving the server. Discovering a broken renewal config 2 days before expiry is stressful.
- Let’s Encrypt ACME server outages are real but brief. If dry-run fails with “The service is down for maintenance”, check https://letsencrypt.status.io/ and retry in a few hours.
A Clean Auto-Renewal Script for nginx + webroot
Here’s a standalone script you can drop into any server using this stack. It handles renewal, nginx reload, and sends a notification if anything fails. It assumes the webroot authenticator is already configured in the certbot renewal config.
#!/bin/bash
# Call from /etc/cron.d/certbot-renew or a crontab
# Requires: certbot, nginx, curl (for Slack/Google Chat webhook)
set -euo pipefail

CERT_NAME="${CERT_NAME:-}"            # e.g. staging2.mydomain.ca
NOTIFY_WEBHOOK="${NOTIFY_WEBHOOK:-}"  # Slack or Google Chat webhook URL
ACME_WEBROOT="${ACME_WEBROOT:-/apps/certbot}"
LOG_FILE="/var/log/certbot-renew.log" # must be writable by the user running this script
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

log() {
  echo "[$TIMESTAMP] $*" | tee -a "$LOG_FILE"
}

notify() {
  local message="$1"
  log "NOTIFY: $message"
  if [[ -n "$NOTIFY_WEBHOOK" ]]; then
    curl -s -X POST "$NOTIFY_WEBHOOK" \
      -H "Content-Type: application/json" \
      -d "{\"text\": \"$message\"}" \
      >> "$LOG_FILE" 2>&1 || true
  fi
}

# Ensure webroot directory exists
mkdir -p "$ACME_WEBROOT"

log "Starting certificate renewal..."

# Attempt renewal.
# Do not pass --quiet: it silences the success message that the check below looks for.
RENEW_OUTPUT=$(sudo certbot renew \
  --non-interactive \
  ${CERT_NAME:+--cert-name "$CERT_NAME"} \
  2>&1) || {
  notify "SSL RENEWAL FAILED on $(hostname): $RENEW_OUTPUT"
  log "ERROR: $RENEW_OUTPUT"
  exit 1
}

# Check if any cert was actually renewed (certbot exits 0 even if nothing renewed)
if echo "$RENEW_OUTPUT" | grep -q "Congratulations"; then
  log "Certificate renewed. Reloading nginx..."
  sudo service nginx reload || {
    notify "SSL RENEWAL WARNING on $(hostname): cert renewed but nginx reload failed!"
    exit 1
  }
  notify "SSL cert successfully renewed on $(hostname)"
  log "Done."
else
  log "No certificates due for renewal. Nothing to do."
fi
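One fragile spot in the script above: notify pastes the message straight into a JSON string, and certbot's error output often contains double quotes and newlines, which would make the payload invalid. A minimal sketch of an escaping helper follows; json_escape is a hypothetical name, and it covers only backslashes, double quotes, and newlines.

```shell
#!/usr/bin/env bash
# Escape backslashes, double quotes and newlines so a string can sit inside a
# JSON string literal. Other control characters are not handled.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}
```

Inside notify, the curl payload could then become -d "{\"text\": \"$(json_escape "$message")\"}". If jq is available on the host, building the payload with it is likely more robust than escaping by hand.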
Usage
# Make the script executable
chmod +x /usr/local/bin/certbot-renew.sh

# Set environment variables and run
# (quote the webhook URL so the shell does not interpret ? or &)
CERT_NAME=staging2.mydomain.ca \
NOTIFY_WEBHOOK='https://chat.googleapis.com/v1/spaces/.../messages?key=...' \
/usr/local/bin/certbot-renew.sh
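Cron entry for /etc/cron.d/certbot-renew (runs twice daily, as Let’s Encrypt recommends). The deployer user field applies only to /etc/cron.d and /etc/crontab; omit it in a per-user crontab.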
0 3,15 * * * deployer CERT_NAME=staging2.mydomain.ca NOTIFY_WEBHOOK=https://... /usr/local/bin/certbot-renew.sh
Running twice daily means that if one attempt fails due to a transient ACME server issue, the next attempt 12 hours later can retry, which leaves plenty of time before expiry.
Summary
| Problem | Root Cause | Fix |
|---|---|---|
| certbot failed to bind port 80 | standalone authenticator conflicted with nginx | Switch to webroot authenticator |
| Duplicate -0001 cert created | Ran certbot --nginx after standalone cert existed | Delete all cert state, re-issue cleanly |
| nginx serving expired cert | Mixed cert paths after certbot injected its own config | Manually fix nginx config to consistent paths |
| Manual webroot config edit didn’t work | Certbot’s conf format is fragile when converted manually | Delete and re-issue with --webroot flag from scratch |
Happy Debugging!