LetsEncrypt – The Rails Drop

Background

We recently provisioned two new staging environments (staging1.mydomain.ca and staging2.mydomain.ca) that mirror our production Rails infrastructure. Production uses a Cloudflare load balancer in front of multiple origin servers running nginx, with a cron-driven script that renews Let’s Encrypt certificates across all origins.

When we added the staging environments, the existing renew_certificate.sh cron script was never installed on them, so the certificates expired. This post documents each error we hit while fixing that and how we resolved it.

The Renewal Architecture

Before diving in, it’s worth understanding how SSL renewal works in this setup:

			
Cloudflare (DNS + Load Balancer)
        ↓
   nginx (origin server)
   /apps/mydomain/current  ← Rails app lives here
   /etc/letsencrypt/live/   ← Certs live here

		

The renewal script (scripts/cron/renew_certificate.sh) does the following:

1. Fetches the load balancer pool config from the Cloudflare API
2. Disables all origin servers in the pool except the GCP instance (takes servers out of rotation)
3. Turns off Cloudflare’s “Always Use HTTPS” setting (allows HTTP for the ACME challenge)
4. Runs sudo certbot renew locally
5. Copies the new cert to all other origin servers via SCP and SSH
6. Re-enables all origin servers in the Cloudflare pool
7. Re-enables “Always Use HTTPS”
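The sequence above can be sketched as a small shell skeleton. This is an illustration, not our actual renew_certificate.sh: every function name below is a hypothetical placeholder, and the bodies only echo what the real step would do (Cloudflare API calls, certbot, SCP/SSH). What it shows is the ordering: the Cloudflare changes are undone even when a middle step fails, so a failed renewal does not leave origins out of rotation.

```shell
# Hypothetical placeholders: replace each body with the real command(s).
fetch_pool_config()      { echo "fetch pool config from Cloudflare"; }
drain_other_origins()    { echo "disable other origins in the pool"; }
disable_always_https()   { echo "turn off Always Use HTTPS"; }
renew_certificate()      { echo "run certbot renew"; }
distribute_certificate() { echo "copy cert to other origins"; }
restore_origins()        { echo "re-enable origins in the pool"; }
enable_always_https()    { echo "turn on Always Use HTTPS"; }

run_renewal() {
  local status=0
  # Nothing has been changed yet, so it is safe to bail out here.
  fetch_pool_config || return 1
  # Stop at the first failing step and remember its exit code.
  drain_other_origins && disable_always_https \
    && renew_certificate && distribute_certificate || status=$?
  # Always undo the Cloudflare changes, even when a step above failed.
  restore_origins || status=$?
  enable_always_https || status=$?
  return "$status"
}
```

With the stubs as written, run_renewal prints the seven steps in order. If renew_certificate is replaced by a failing command, the copy step is skipped, the two restore steps still run, and the function returns the failing exit code.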

The problem: this script was never added to the cron on staging1/staging2, so the certs expired.

First Attempt: Running the Renewal Script Manually

SSH’d into staging2 and ran:

bash /apps/mydomain/current/scripts/cron/renew_certificate.sh

Error #1: RSpec / webmock LoadError

			
An error occurred while loading spec_helper. - Did you mean?
                    rspec ./spec/helper.rb
Failure/Error: require 'webmock/rspec'
LoadError:
  cannot load such file -- webmock/rspec

		

What happened: The script calls bundle exec rake google_chat:send_message[...] to send failure notifications to Google Chat. On staging, test gems like webmock aren’t installed in the bundle, so the rake task blew up loading the Rails environment.

Lesson: This is a notification side-effect, not the core renewal logic. But it masked the real error.
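One way to keep a notifier from hiding the underlying failure is to treat the notification as best-effort: capture the command's output and exit code first, print them, and only then try to notify. The sketch below assumes a hypothetical send_notification function standing in for the rake task or a webhook call; run_step is likewise a made-up name, and none of this is code from our script.

```shell
# Hypothetical notifier; replace the body with a webhook call or rake task.
send_notification() { echo "notify: $1"; }

# Run a command. On failure, report the real error first, then notify best-effort.
run_step() {
  local output status=0
  output=$("$@" 2>&1) || status=$?
  if [ "$status" -eq 0 ]; then
    printf '%s\n' "$output"
    return 0
  fi
  # The original error is printed before the notifier gets a chance to fail.
  echo "step failed (exit $status): $output" >&2
  send_notification "step failed on $(uname -n): $*" \
    || echo "warning: notification itself failed" >&2
  return "$status"
}
```

For example, run_step sudo certbot renew would print certbot's own error and exit code even if the notifier then fails to load.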

Error #2: certbot failing because port 80 was in use

After isolating the issue, running sudo certbot renew directly gave:

			
Renewing an existing certificate for staging2.mydomain.ca and www.staging2.mydomain.ca
Failed to renew certificate staging2.mydomain.ca with error: Could not bind TCP port 80
because it is already in use by another process on this system (such as a web server).
Please stop the program in question and then try again.

What happened: The original certificate was issued using certbot’s standalone authenticator, which spins up its own HTTP server on port 80 to answer the ACME challenge. Since nginx was already running on port 80, the renewal failed.
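To confirm this diagnosis on a server, two things are worth checking: which process holds port 80, and which authenticator the renewal config records. The sketch below assumes the standard /etc/letsencrypt/renewal/ layout. renewal_authenticator is a hypothetical helper name, and the ss invocation is shown as a comment because it needs root on the affected host.

```shell
# Print the authenticator recorded in a certbot renewal config file.
renewal_authenticator() {
  sed -n 's/^authenticator[[:space:]]*=[[:space:]]*//p' "$1"
}

# Example on the server (run manually):
#   sudo ss -ltnp 'sport = :80'      # shows the process listening on port 80
#   renewal_authenticator /etc/letsencrypt/renewal/staging2.mydomain.ca.conf
```

If the helper prints standalone while nginx owns port 80, certbot renew will fail with the bind error shown above.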

Meanwhile there was a second certificate (staging2.mydomain.ca-0001) that had been created earlier with sudo certbot --nginx -d staging2.mydomain.ca. This cert was valid — but it created a mess.

Inspecting the Damage

sudo certbot certificates

Output:

			
Renewal configuration file /etc/letsencrypt/renewal/staging2.mydomain.ca.conf produced
an unexpected error: expected /etc/letsencrypt/live/staging2.mydomain.ca-0001/cert.pem
to be a symlink. Skipping.
The following renewal configurations were invalid:
  /etc/letsencrypt/renewal/staging2.mydomain.ca.conf

		

The nginx config at /etc/nginx/sites-enabled/mydomain was also a mess — certbot had injected its own server block for the HTTP→HTTPS redirect, and the two 443 server blocks were pointing to different cert paths:

			
# Certbot-injected block (unwanted)
server {
    if ($host = staging2.mydomain.ca) {
        return 301 https://$host$request_uri;
    } # managed by Certbot
  ...
}
# Redirect server pointing to -0001 certs (also unwanted)
server {
  server_name   staging2.mydomain.ca;
  listen        443 ssl http2;
  ssl_certificate /etc/letsencrypt/live/staging2.mydomain.ca-0001/fullchain.pem; # managed by Certbot
  ssl_certificate_key /etc/letsencrypt/live/staging2.mydomain.ca-0001/privkey.pem; # managed by Certbot
  ...
}
# Main www server pointing to original path
server {
  server_name   www.staging2.mydomain.ca;
  listen        443 ssl http2;
  ssl_certificate     /etc/letsencrypt/live/staging2.mydomain.ca/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/staging2.mydomain.ca/privkey.pem;
  ...
}

		

The Fix

Step 1: Remove all broken certbot state

			
sudo rm -f /etc/letsencrypt/renewal/staging2.mydomain.ca.conf
sudo rm -f /etc/letsencrypt/renewal/staging2.mydomain.ca-0001.conf
sudo rm -rf /etc/letsencrypt/live/staging2.mydomain.ca
sudo rm -rf /etc/letsencrypt/live/staging2.mydomain.ca-0001
sudo rm -rf /etc/letsencrypt/archive/staging2.mydomain.ca
sudo rm -rf /etc/letsencrypt/archive/staging2.mydomain.ca-0001

		

Step 2: Stop nginx and get a fresh cert with the standalone authenticator

			
sudo service nginx stop
sudo certbot certonly --standalone -d staging2.mydomain.ca -d www.staging2.mydomain.ca
sudo service nginx start

This gave us a clean, single certificate at /etc/letsencrypt/live/staging2.mydomain.ca/.

Step 3: Clean up the nginx config

Removed the certbot-injected if ($host = ...) server block, and updated both 443 server blocks to point to the same cert path:

			
ssl_certificate     /etc/letsencrypt/live/staging2.mydomain.ca/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/staging2.mydomain.ca/privkey.pem;

Reloaded nginx:

sudo service nginx reload

The site was live again with a valid cert.
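Before a reload like this, it can be worth checking that every server block really references the same certificate. A minimal sketch, assuming grep, awk, and sort are available; list_cert_paths is a hypothetical helper name, not part of certbot or nginx.

```shell
# Print each distinct ssl_certificate path found under an nginx config directory.
# -R follows symlinks, which sites-enabled usually consists of.
list_cert_paths() {
  grep -RhE '^[[:space:]]*ssl_certificate[[:space:]]' "$1" \
    | awk '{ sub(/;.*/, "", $2); print $2 }' \
    | sort -u
}

# Example on the server (run manually):
#   list_cert_paths /etc/nginx/sites-enabled     # expect exactly one path
#   sudo nginx -t && sudo service nginx reload   # test the config, then reload
```

More than one line of output means the server blocks still disagree about which certificate to serve.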

Making Future Renewals Work Without Stopping nginx

The next problem: the renewal config for the new cert still used the standalone authenticator, so future automated renewals would fail again whenever nginx was running.

The fix is to switch to the webroot authenticator. Our nginx config already had an ACME challenge location block:

			
location ^~ /.well-known/acme-challenge/ {
    root         /apps/certbot;
    default_type "text/plain";
    allow        all;
}

		

This means certbot can write a challenge file under /apps/certbot/.well-known/acme-challenge/ and nginx will serve it over HTTP, so there is no need to stop nginx.
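Before pointing certbot at the webroot, it can help to confirm that nginx really serves files from it. The sketch below writes a probe file and fetches it back. check_acme_webroot and fetch_url are hypothetical helper names, and the curl options are one reasonable choice rather than a requirement.

```shell
# Fetch a URL; split out so it can be swapped for another HTTP client.
fetch_url() { curl -fsSL --max-time 10 "$1"; }

# Write a probe file into the webroot and confirm it is served back unchanged.
check_acme_webroot() {
  local webroot="$1" base_url="$2"
  local dir="$webroot/.well-known/acme-challenge"
  local token="probe-$$" body status=0
  mkdir -p "$dir" || return 1
  printf '%s\n' "$token" > "$dir/$token" || return 1
  body=$(fetch_url "$base_url/.well-known/acme-challenge/$token") || status=$?
  # Clean up the probe file whether or not the fetch worked.
  rm -f "$dir/$token"
  [ "$status" -eq 0 ] && [ "$body" = "$token" ]
}
```

A possible invocation, run as a user that can write to the webroot: check_acme_webroot /apps/certbot http://staging2.mydomain.ca && echo "webroot OK".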

Attempt 1: Manually edit the renewal config

Edited /etc/letsencrypt/renewal/staging2.mydomain.ca.conf:

			
[renewalparams]
authenticator = webroot
server = https://acme-v02.api.letsencrypt.org/directory
key_type = ecdsa
[[webroot]]
staging2.mydomain.ca = /apps/certbot
www.staging2.mydomain.ca = /apps/certbot

		

Dry run:

sudo certbot renew --dry-run

Error #3: webroot mapping not found

			
Failed to renew certificate staging2.mydomain.ca with error: Missing command line flag or
config entry for this setting:
Input the webroot for staging2.mydomain.ca:

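The config looked correct, but certbot was still prompting interactively. One likely reason: renewal configs that certbot generates itself for the webroot authenticator store the per-domain paths in a [[webroot_map]] section (alongside a webroot_path entry), not [[webroot]] as in our hand edit, so certbot found no mapping for the domain. Whatever the exact cause, hand-converting a standalone config to webroot is fragile because the renewal file is an internal format that is easy to get subtly wrong.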

Attempt 2: Delete and re-issue with webroot from the start (this worked)

			
sudo certbot delete --cert-name staging2.mydomain.ca
sudo mkdir -p /apps/certbot
sudo certbot certonly --webroot -w /apps/certbot \
  -d staging2.mydomain.ca \
  -d www.staging2.mydomain.ca

		

This time certbot generated the renewal config correctly itself. Dry run:

sudo certbot renew --dry-run

			
Simulating renewal of an existing certificate for staging2.mydomain.ca and www.staging2.mydomain.ca
Congratulations, all simulated renewals succeeded:
  /etc/letsencrypt/live/staging2.mydomain.ca/fullchain.pem (success)

Key Lessons

Never run certbot --nginx on a server where you manage the nginx config manually. It injects its own server blocks and creates confusing duplicate certs with -0001 suffixes.
Standalone vs webroot authenticator: Standalone is simpler to set up initially but requires stopping nginx. Webroot is the right choice for servers where nginx runs continuously — as long as you have the ACME challenge location block configured.
Manually editing certbot renewal configs is fragile. Let certbot generate the renewal config by passing the correct authenticator flags at issuance time.
certbot renew --dry-run is your best friend. Always confirm future renewals will work before leaving the server. Discovering a broken renewal config 2 days before expiry is stressful.
Let’s Encrypt ACME server outages are real but brief. If dry-run fails with “The service is down for maintenance”, check https://letsencrypt.status.io/ and retry in a few hours.
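A dry run tells you whether renewal would work; a separate expiry check tells you whether it has been working. The sketch below uses openssl x509 -checkend. cert_expires_within is a hypothetical helper name, and the 14-day threshold in the usage comment is an arbitrary choice.

```shell
# Succeeds (exit 0) when the certificate expires within the given number of days.
# An unreadable certificate file also counts as "needs attention".
cert_expires_within() {
  local days="$1" cert="$2"
  # openssl's -checkend exits 0 only if the cert is still valid N seconds from now.
  ! openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null 2>&1
}

# Example on the server (run as a user that can read the file):
#   cert_expires_within 14 /etc/letsencrypt/live/staging2.mydomain.ca/fullchain.pem \
#     && echo "certificate expires in under 14 days"
```

Hooked into monitoring or a daily cron job, a check like this surfaces a silently failing renewal well before the last two days.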

A Clean Auto-Renewal Script for nginx + webroot

Here’s a self-contained script you can drop onto any server using this stack. It runs the renewal, reloads nginx when a certificate was actually renewed, and sends a notification on success or failure. It assumes the webroot authenticator is already configured in the certbot renewal config.

			
#!/bin/bash
# Install as /usr/local/bin/certbot-renew.sh and call it from cron.
# Requires: certbot, nginx, curl (for Slack/Google Chat webhook)
set -euo pipefail
CERT_NAME="${CERT_NAME:-}"                          # e.g. staging2.mydomain.ca
NOTIFY_WEBHOOK="${NOTIFY_WEBHOOK:-}"                # Slack or Google Chat webhook URL
ACME_WEBROOT="${ACME_WEBROOT:-/apps/certbot}"
LOG_FILE="/var/log/certbot-renew.log"               # must be writable by the cron user
log() {
  # Timestamp each entry at the moment it is written
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}
notify() {
  local message="$1" escaped
  log "NOTIFY: $message"
  if [[ -n "$NOTIFY_WEBHOOK" ]]; then
    # Escape backslashes, quotes, and newlines so multi-line certbot output stays valid JSON
    escaped=${message//\\/\\\\}
    escaped=${escaped//\"/\\\"}
    escaped=${escaped//$'\n'/\\n}
    curl -s -X POST "$NOTIFY_WEBHOOK" \
      -H "Content-Type: application/json" \
      -d "{\"text\": \"$escaped\"}" \
      >> "$LOG_FILE" 2>&1 || true
  fi
}
# Ensure webroot directory exists
mkdir -p "$ACME_WEBROOT"
log "Starting certificate renewal..."
# Attempt renewal. No --quiet here: it would suppress the success message checked below.
RENEW_OUTPUT=$(sudo certbot renew \
  --non-interactive \
  ${CERT_NAME:+--cert-name "$CERT_NAME"} \
  2>&1) || {
  notify "SSL RENEWAL FAILED on $(hostname): $RENEW_OUTPUT"
  log "ERROR: $RENEW_OUTPUT"
  exit 1
}
# Check if any cert was actually renewed (certbot exits 0 even if nothing renewed)
if echo "$RENEW_OUTPUT" | grep -q "Congratulations"; then
  log "Certificate renewed. Reloading nginx..."
  sudo service nginx reload || {
    notify "SSL RENEWAL WARNING on $(hostname): cert renewed but nginx reload failed!"
    exit 1
  }
  notify "SSL cert successfully renewed on $(hostname)"
  log "Done."
else
  log "No certificates due for renewal. Nothing to do."
fi

		

Usage

			
# Make the script executable
chmod +x /usr/local/bin/certbot-renew.sh
# Set environment variables and run
# (quote the URL: it contains shell metacharacters such as ? and &)
CERT_NAME=staging2.mydomain.ca \
NOTIFY_WEBHOOK='https://chat.googleapis.com/v1/spaces/.../messages?key=...' \
/usr/local/bin/certbot-renew.sh

		

Cron entry in /etc/cron.d format, which includes the user field (runs twice daily, the schedule the Certbot documentation recommends)

			
0 3,15 * * * deployer CERT_NAME=staging2.mydomain.ca NOTIFY_WEBHOOK='https://...' /usr/local/bin/certbot-renew.sh

Running twice daily means that if one attempt fails because of a transient ACME server issue, the next attempt 12 hours later gets another chance, which leaves plenty of time before expiry.

Summary

| Problem | Root Cause | Fix |
| --- | --- | --- |
| certbot failed to bind port 80 | `standalone` authenticator conflicted with nginx | Switch to `webroot` authenticator |
| Duplicate `-0001` cert created | Ran `certbot --nginx` after standalone cert existed | Delete all cert state, re-issue cleanly |
| nginx serving expired cert | Mixed cert paths after certbot injected its own config | Manually fix nginx config to consistent paths |
| Manual webroot config edit didn’t work | Certbot’s conf format is fragile when converted manually | Delete and re-issue with `--webroot` flag from scratch |

Happy Debugging!



Fixing Let’s Encrypt SSL Certificate Renewal on a Server: A Step-by-Step Guide