Stack: Ruby 3+, Rails 7+
Audience: Backend engineers building or maintaining production-grade Rails services
Goal: Add real-time observability and on-call alerting to a critical business process
## Introduction
When you’re running an enterprise web application, two questions keep engineering teams up at night:
- “Is our system healthy right now?”
- “If something breaks at 3 AM, will we know before our customers do?”
Datadog and PagerDuty together answer both. Datadog gives you the metrics, dashboards, and visibility. PagerDuty turns critical metrics into actionable alerts that reach the right person at the right time. This post walks you through integrating both into a Rails 7+ application — from gem installation to a live production dashboard — using a real-world billing health monitor as the example.
## What is Datadog?
Datadog is a cloud-based observability and monitoring platform. It collects metrics, traces, and logs from your infrastructure and applications and surfaces them in a unified UI.
Core capabilities relevant to Rails apps:
| Feature | What it does |
|---|---|
| APM (Application Performance Monitoring) | Traces every Rails request, shows latency, errors, and bottlenecks |
| StatsD / DogStatsD | Accepts custom business metrics (gauges, counters, histograms) via UDP |
| Dashboards | Visualize any metric over time — single chart or full ops dashboard |
| Monitors & Alerts | Trigger notifications when a metric crosses a threshold |
| Log Management | Centralized log search and correlation with traces |
| Infrastructure Monitoring | CPU, memory, disk — the full host/container picture |
For this guide, we focus on custom business metrics via DogStatsD — the most powerful and underused feature for application teams.
## What is PagerDuty?
PagerDuty is an incident management platform. When something breaks in production, PagerDuty decides who gets notified, how (phone call, SMS, push notification, Slack), and when to escalate if the alert isn’t acknowledged.
Key concepts:
| Concept | Description |
|---|---|
| Service | A logical grouping of alerts (e.g., “Billing Service”) |
| Integration Key | The secret key your app uses to send events to a PagerDuty service |
| Incident | A triggered alert that requires human acknowledgment |
| Dedup Key | A unique string that prevents duplicate incidents for the same root cause |
| Escalation Policy | Defines who gets paged and in what order if the incident isn’t acknowledged |
| Severity | critical, error, warning, or info |
PagerDuty integrates with Datadog (you can alert from DD monitors), but for critical business logic alerts — like a billing pipeline failing — it’s often better to trigger PagerDuty directly from your application code, giving you full control over deduplication and context.
## Why These Are Must-Have Integrations for Enterprise Apps
If you’re running any of the following, you need both:
- Scheduled jobs / cron tasks that process money, orders, or user data
- Background workers (Sidekiq, Delayed Job) that can silently fail
- Third-party payment or fulfillment pipelines with no built-in alerting
- SLAs that require uptime or processing guarantees
- On-call rotations where the right person needs to be paged — not just an email inbox
The core problem both solve: Rails applications fail silently. A `rescue` clause that logs an error to `Rails.logger` does nothing at 2 AM. A Sidekiq deadlock on your billing job won’t send you an email. Without Datadog and PagerDuty:
- You find out about failures from customers, not dashboards
- You can’t tell when a metric degraded or how long it’s been broken
- There’s no escalation path — the alert that fires at 3 AM goes nowhere
With both integrated, you get: visibility (Datadog) + accountability (PagerDuty).
## Architecture Overview
```
Rails App / Cron Job
  │
  ├──► Datadog Agent (UDP :8125)
  │         │
  │         └──► Datadog Cloud ──► Dashboard / Monitor
  │
  └──► PagerDuty Events API (HTTPS)
            │
            └──► On-call Engineer ──► Slack / Phone / SMS
```
The Datadog Agent runs as a daemon on your server or as a sidecar container. Your app sends lightweight UDP packets to it (fire-and-forget). The agent batches and forwards them to Datadog’s cloud.
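Under the hood, each metric is a single plain-text datagram in the DogStatsD wire format (`metric.name:value|type`, optionally followed by tags). A minimal stdlib-only sketch, with a throwaway UDP listener standing in for the agent, shows the fire-and-forget exchange:

```ruby
require 'socket'

# Throwaway UDP listener standing in for the Datadog Agent, on a random port
receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0)
port = receiver.addr[1]

# Fire-and-forget: the app writes one datagram and moves on immediately.
# DogStatsD wire format: <metric.name>:<value>|<type>[|#tag1:val1,tag2:val2]
sender = UDPSocket.new
sender.send('billing.unbilled_orders:5|g|#env:production', 0, '127.0.0.1', port)

payload, _addr = receiver.recvfrom(512)
puts payload # => billing.unbilled_orders:5|g|#env:production
```

Note that if nothing were listening on the port, `send` would still succeed without raising, which is exactly why missing metrics fail silently rather than crashing your app.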
PagerDuty receives events over HTTPS directly from your app — no local agent needed.
## Part 1: Datadog Integration
### 1.1 Install the Gems

```ruby
# Gemfile
gem 'datadog', '~> 2.0'         # APM tracing (dd-trace-rb v2; the gem was renamed from ddtrace)
gem 'dogstatsd-ruby', '~> 5.0'  # Custom metrics via StatsD
```

```shell
bundle install
```
### 1.2 Configure the Datadog Initializer

Create `config/initializers/datadog.rb`:
```ruby
# config/initializers/datadog.rb
require 'datadog/statsd'
require 'datadog'

# The flag lives at the top level of the per-environment credentials file
enabled = Rails.application.credentials[:datadog_integration_enabled]
service_name = "myapp-#{Rails.env}"

Datadog.configure do |c|
  c.tracing.enabled = enabled
  c.runtime_metrics.enabled = enabled
  c.tracing.instrument :rails, service_name: service_name
  c.tracing.instrument :rake, enabled: false # avoid tracing long-running tasks

  # Consolidate HTTP client spans under one service name to reduce noise
  c.tracing.instrument :faraday, service_name: service_name
  c.tracing.instrument :httpclient, service_name: service_name
  c.tracing.instrument :http, service_name: service_name
  c.tracing.instrument :rest_client, service_name: service_name
end
```
Store the flag in Rails credentials:
```shell
rails credentials:edit --environment production
```

```yaml
# config/credentials/production.yml.enc
datadog_integration_enabled: true
```
**Important:** The `datadog_integration_enabled` flag controls APM tracing only. Custom StatsD metrics (gauges, counters) are sent by `Datadog::Statsd` regardless of this flag, as long as the Datadog Agent is running.
### 1.3 Install and Configure the Datadog Agent
The Datadog Agent must be running on the host where your app runs. It listens for UDP packets on port 8125 and forwards them to Datadog’s cloud.
Docker Compose (recommended for containerized apps):
```yaml
# docker-compose.yml
services:
  app:
    environment:
      DD_AGENT_HOST: datadog-agent
      DD_DOGSTATSD_PORT: 8125

  datadog-agent:
    image: datadog/agent:latest
    environment:
      DD_API_KEY: ${DATADOG_API_KEY}
      DD_DOGSTATSD_NON_LOCAL_TRAFFIC: "true"
    ports:
      - "8125:8125/udp"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
```
Bare metal / VM:
```shell
DD_API_KEY=your_api_key bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
```
### 1.4 Emit Custom Business Metrics
Now the interesting part — emitting metrics from your business logic.
Create a service class for a billing health check at `app/lib/monitoring/billing_health_check.rb`:
```ruby
# frozen_string_literal: true

class Monitoring::BillingHealthCheck
  UNBILLED_THRESHOLD = ENV.fetch('BILLING_UNBILLED_THRESHOLD', 10).to_i

  def initialize(date:)
    @date = date
  end

  def run
    results = collect_metrics
    fire_datadog_metrics(results)
    alert_if_unhealthy(results)
    results
  end

  private

  def collect_metrics
    billed_ids = BillingRecord.where(date: @date).pluck(:order_id)
    # Orders for this date that have no corresponding billing record
    missing_order_ids = Order.where(date: @date).ids - billed_ids
    unbilled_count = Order.active.where(date: @date, billed: false).count
    failed_charges = Order.joins(:bills)
                          .where(date: @date, billed: false, bills: { success: false })
                          .distinct
                          .count

    {
      missing_order_ids: missing_order_ids,
      missing_billing_records_count: missing_order_ids.size,
      unbilled_orders_count: unbilled_count,
      failed_charges_count: failed_charges
    }
  end

  def fire_datadog_metrics(results)
    host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
    port = ENV.fetch('DD_DOGSTATSD_PORT', 8125).to_i
    statsd = Datadog::Statsd.new(host, port)

    statsd.gauge('billing.unbilled_orders', results[:unbilled_orders_count])
    statsd.gauge('billing.missing_billing_records', results[:missing_billing_records_count])
    statsd.gauge('billing.failed_charges', results[:failed_charges_count])

    statsd.close
  rescue => e
    Rails.logger.error("Failed to emit Datadog metrics: #{e.message}")
  end

  # ... alerting covered in Part 2
end
```
Why `Datadog::Statsd.new(host, port)` instead of `Datadog::Statsd.new`?
The no-argument form defaults to 127.0.0.1:8125. In containerized environments, the Datadog Agent runs as a separate container/service with a different hostname. Always read the host from an environment variable so the code works in every environment without changes.
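As a small sketch of that rule (the `agent_address` helper below is illustrative, not part of dogstatsd-ruby), the same code resolves to the local agent on bare metal and to the sidecar container under Docker Compose:

```ruby
# Resolve the agent's address from the environment, falling back to the
# local default. (agent_address is a hypothetical helper for illustration.)
def agent_address
  host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
  port = ENV.fetch('DD_DOGSTATSD_PORT', '8125').to_i
  [host, port]
end

# Bare metal / VM: no env vars set, so we fall back to the local agent
ENV.delete('DD_AGENT_HOST')
ENV.delete('DD_DOGSTATSD_PORT')
puts agent_address.inspect # => ["127.0.0.1", 8125]

# Docker Compose: DD_AGENT_HOST points at the sidecar container
ENV['DD_AGENT_HOST'] = 'datadog-agent'
puts agent_address.inspect # => ["datadog-agent", 8125]
```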
### 1.5 Choosing the Right Metric Type
| Type | Method | Use when |
|---|---|---|
| Gauge | statsd.gauge('name', value) | Current snapshot value (queue depth, count at a point in time) |
| Counter | statsd.increment('name') | Counting occurrences (requests, errors) |
| Histogram | statsd.histogram('name', value) | Distribution of values (response times, batch sizes) |
| Timing | statsd.timing('name', ms) | Duration in milliseconds |
For billing health metrics — unbilled orders, failed charges — gauge is correct because you want the current count, not a running total.
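To make the distinction concrete, here is a toy in-memory aggregator (illustrative only; this is not how DogStatsD is implemented): counters accumulate occurrences between flushes, while gauges keep only the most recent snapshot.

```ruby
# Toy model of counter vs. gauge semantics (not the real DogStatsD aggregator)
class ToyAggregator
  def initialize
    @counters = Hash.new(0)
    @gauges = {}
  end

  # Counter: every call adds one occurrence
  def increment(name)
    @counters[name] += 1
  end

  # Gauge: every call overwrites the previous snapshot
  def gauge(name, value)
    @gauges[name] = value
  end

  def flush
    { counters: @counters.dup, gauges: @gauges.dup }
  end
end

agg = ToyAggregator.new
3.times { agg.increment('billing.charge_attempts') } # 3 occurrences counted
agg.gauge('billing.unbilled_orders', 12)             # snapshot...
agg.gauge('billing.unbilled_orders', 7)              # ...replaced by the latest value

p agg.flush
```

If you emitted unbilled-order counts as a counter, repeated health-check runs would inflate the total; as a gauge, the chart always shows the current backlog.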
### 1.6 Debugging: Why Aren’t My Metrics Appearing?
This is the most common issue. Because StatsD uses UDP, failures are completely silent.
Checklist:
```ruby
# 1. Is the Datadog Agent reachable from your app container/host?
#    Run in a Rails console:
require 'socket'
UDPSocket.new.send("test:1|g", 0, ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), 8125)

# 2. Send a test gauge and wait 2-3 minutes
statsd = Datadog::Statsd.new(ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), 8125)
statsd.gauge('debug.connectivity_test', 1)
statsd.close
puts "Sent — check the Datadog Metrics Explorer in 2-3 minutes"

# 3. Check whether the integration flag is blocking APM
#    (it does not affect custom metrics, but it is worth knowing)
Rails.application.credentials[:datadog_integration_enabled]
```
Then in the Datadog UI:
- Go to Metrics → Explorer
- Type your metric name (e.g., `billing.`) in the graph field — it should autocomplete
- If it doesn’t autocomplete after 5 minutes, the agent is not receiving the packets
Common root causes in staging/dev environments:
| Symptom | Likely cause |
|---|---|
| No metrics in any env | Agent not running or wrong host |
| Metrics in production only | DD_AGENT_HOST not set, defaults to 127.0.0.1 but agent is on a different host in staging |
| Intermittent metrics | UDP packet loss (rare, but can happen under high load) |
## Part 2: PagerDuty Integration
### 2.1 Install the Gem

```ruby
# Gemfile
gem 'pagerduty', '~> 3.0'
```

```shell
bundle install
```
### 2.2 Create a PagerDuty Service

1. Log in to PagerDuty → Services → Service Directory → + New Service
2. Name it (e.g., “Billing Pipeline”)
3. Under Integrations, select “Use our API directly” → choose Events API v2
4. Copy the Integration Key — you’ll need this in credentials
### 2.3 Store Credentials Securely

```shell
rails credentials:edit --environment production
```

```yaml
# config/credentials/production.yml.enc
pagerduty_billing_integration_key: your_integration_key_here
google_chat_monitoring_webhook: https://chat.googleapis.com/v1/spaces/...
```
### 2.4 Create a PagerDuty Wrapper

Create a lightweight wrapper at `app/lib/pagerduty/wrapper.rb`:
```ruby
# frozen_string_literal: true

class Pagerduty::Wrapper
  def initialize(integration_key:, api_version: 2)
    @integration_key = integration_key
    @api_version = api_version
  end

  def client
    @client ||= Pagerduty.build(
      integration_key: @integration_key,
      api_version: @api_version
    )
  end
end
```
### 2.5 Wire Up Alerting in Your Service Class
Continuing the billing health check class:
```ruby
def alert_if_unhealthy(results)
  issues = []

  if results[:missing_billing_records_count] > 0
    missing_ids = results[:missing_order_ids].join(', ')
    issues << "Missing billing records for orders: #{missing_ids}"
  end

  if results[:unbilled_orders_count] > UNBILLED_THRESHOLD
    issues << "#{results[:unbilled_orders_count]} unbilled orders (threshold: #{UNBILLED_THRESHOLD})"
  end

  return if issues.empty?

  summary = build_alert_summary(results, issues)
  trigger_pagerduty(summary)
  send_google_chat_notification(summary)
end

def build_alert_summary(results, issues)
  [
    "Billing Health Check FAILED at #{Time.zone.now.strftime('%Y-%m-%d %H:%M:%S %Z')}",
    "Date: #{@date}",
    *issues,
    "Failed charges: #{results[:failed_charges_count]}"
  ].join(" | ")
end

def trigger_pagerduty(summary)
  dedup_key = "billing-health-#{@date}"
  Pagerduty::Wrapper.new(
    integration_key: pagerduty_integration_key
  ).client.incident(dedup_key).trigger(
    summary: summary,
    source: Rails.application.routes.default_url_options[:host],
    severity: "critical"
  )
rescue => e
  Rails.logger.error("Failed to trigger PagerDuty: #{e.message}")
end

def send_google_chat_notification(message)
  # Post to your team's Google Chat / Slack webhook
  HTTParty.post(
    google_chat_webhook,
    body: { text: message }.to_json,
    headers: { 'Content-Type' => 'application/json' }
  )
rescue => e
  Rails.logger.error("Failed to send Google Chat notification: #{e.message}")
end

def pagerduty_integration_key
  Rails.application.credentials[:pagerduty_billing_integration_key]
end

def google_chat_webhook
  Rails.application.credentials[:google_chat_monitoring_webhook]
end
```
### 2.6 The Dedup Key — Why It Matters
```ruby
dedup_key = "billing-health-#{@date}"
```
PagerDuty uses the dedup_key to group events about the same incident. If your billing check runs at 8:30 AM and again at 9:00 AM (e.g., after a retry), PagerDuty will update the existing incident instead of creating a second one and paging your on-call engineer twice.
Best practices for dedup keys:
- Make them specific to the root cause, not the timestamp
- Include the resource identifier (week date, job ID, etc.)
- Use a format like `{service}-{resource}-{date}` for easy filtering in PagerDuty
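The convention above can be captured in a small helper (`dedup_key_for` is a hypothetical function sketched for illustration, not part of the pagerduty gem):

```ruby
require 'date'

# Build a dedup key in the {service}-{resource}-{date} format.
# Same root cause + same resource + same date => same key => one incident.
def dedup_key_for(service:, resource:, date:)
  [service, resource, date.strftime('%Y-%m-%d')].join('-')
end

# Two runs on the same day produce the same key, so PagerDuty updates
# one incident instead of paging the on-call engineer twice:
morning   = dedup_key_for(service: 'billing', resource: 'health', date: Date.new(2025, 3, 3))
retry_run = dedup_key_for(service: 'billing', resource: 'health', date: Date.new(2025, 3, 3))

puts morning              # => billing-health-2025-03-03
puts morning == retry_run # => true
```

Because the key contains no timestamp, retries collapse into the existing incident; a key built from `Time.now` would page on every run.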
Happy Integration!