How to Integrate Datadog and PagerDuty into an Enterprise Rails Application – Part 1

Stack: Ruby 3+, Rails 7+
Audience: Backend engineers building or maintaining production-grade Rails services
Goal: Add real-time observability and on-call alerting to a critical business process


Introduction

When you’re running an enterprise web application, two questions keep engineering teams up at night:

  1. “Is our system healthy right now?”
  2. “If something breaks at 3 AM, will we know before our customers do?”

Datadog and PagerDuty together answer both. Datadog gives you the metrics, dashboards, and visibility. PagerDuty turns critical metrics into actionable alerts that reach the right person at the right time. This post walks you through integrating both into a Rails 7+ application — from gem installation to a live production dashboard — using a real-world billing health monitor as the example.

What is Datadog?

Datadog is a cloud-based observability and monitoring platform. It collects metrics, traces, and logs from your infrastructure and applications and surfaces them in a unified UI.

Core capabilities relevant to Rails apps:

Feature | What it does
--- | ---
APM (Application Performance Monitoring) | Traces every Rails request, shows latency, errors, and bottlenecks
StatsD / DogStatsD | Accepts custom business metrics (gauges, counters, histograms) via UDP
Dashboards | Visualize any metric over time, from a single chart to a full ops dashboard
Monitors & Alerts | Trigger notifications when a metric crosses a threshold
Log Management | Centralized log search and correlation with traces
Infrastructure Monitoring | CPU, memory, disk: the full host/container picture

For this guide, we focus on custom business metrics via DogStatsD, a feature that application teams often underuse.


What is PagerDuty?

PagerDuty is an incident management platform. When something breaks in production, PagerDuty decides who gets notified, how (phone call, SMS, push notification, Slack), and when to escalate if the alert isn’t acknowledged.

Key concepts:

Concept | Description
--- | ---
Service | A logical grouping of alerts (e.g., “Billing Service”)
Integration Key | The secret key your app uses to send events to a PagerDuty service
Incident | A triggered alert that requires human acknowledgment
Dedup Key | A unique string that prevents duplicate incidents for the same root cause
Escalation Policy | Defines who gets paged and in what order if the incident isn’t acknowledged
Severity | One of critical, error, warning, or info

PagerDuty integrates with Datadog (you can alert from DD monitors), but for critical business logic alerts — like a billing pipeline failing — it’s often better to trigger PagerDuty directly from your application code, giving you full control over deduplication and context.


Why These Are Must-Have Integrations for Enterprise Apps

If you’re running any of the following, you need both:

  • Scheduled jobs / cron tasks that process money, orders, or user data
  • Background workers (Sidekiq, Delayed Job) that can silently fail
  • Third-party payment or fulfillment pipelines with no built-in alerting
  • SLAs that require uptime or processing guarantees
  • On-call rotations where the right person needs to be paged — not just an email inbox

The core problem both solve: Rails applications fail silently. A rescue clause that logs an error to Rails.logger does nothing at 2 AM. A Sidekiq deadlock on your billing job won’t send you an email. Without Datadog and PagerDuty:

  • You find out about failures from customers, not dashboards
  • You can’t tell when a metric degraded or how long it’s been broken
  • There’s no escalation path — the alert that fires at 3 AM goes nowhere

With both integrated, you get: visibility (Datadog) + accountability (PagerDuty).


Architecture Overview

Rails App / Cron Job
├──► Datadog Agent (UDP :8125)
│    └──► Datadog Cloud ──► Dashboard / Monitor
└──► PagerDuty Events API (HTTPS)
     └──► On-call Engineer ──► Slack / Phone / SMS

The Datadog Agent runs as a daemon on your server or as a sidecar container. Your app sends lightweight UDP packets to it (fire-and-forget). The agent batches and forwards them to Datadog’s cloud.

PagerDuty receives events over HTTPS directly from your app — no local agent needed.
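
To make the fire-and-forget behavior concrete, here is a minimal sketch that uses only Ruby's standard library. It does not touch the Datadog gems: a throwaway local UDP socket stands in for the Agent, and send_datagram is a hypothetical helper name chosen for this illustration. The payload follows the DogStatsD datagram format (name:value|type).

```ruby
require 'socket'

# Hypothetical helper: sends one raw DogStatsD datagram over UDP.
# It returns the number of bytes handed to the OS. There is no reply and
# no delivery confirmation, which is what "fire-and-forget" means here.
def send_datagram(payload, host:, port:)
  socket = UDPSocket.new
  socket.send(payload, 0, host, port)
ensure
  socket&.close
end

# Stand-in for the Datadog Agent: a UDP listener on an ephemeral local port.
listener = UDPSocket.new
listener.bind('127.0.0.1', 0)
port = listener.addr[1]

bytes_sent = send_datagram('billing.unbilled_orders:3|g', host: '127.0.0.1', port: port)
received, _sender = listener.recvfrom(1024)
listener.close

puts "sent #{bytes_sent} bytes, listener saw: #{received}"
```

The sender never learns whether anything was listening. That keeps metric emission cheap, and it is also why a misconfigured Agent address produces no error in your app.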


Part 1: Datadog Integration

1.1 Install the Gems

# Gemfile
gem 'datadog', '~> 2.0'         # APM tracing (this gem was named ddtrace before 2.0)
gem 'dogstatsd-ruby', '~> 5.0'  # Custom metrics via StatsD

Then run:

bundle install

1.2 Configure the Datadog Initializer

Create config/initializers/datadog.rb:

require 'datadog/statsd'
require 'datadog'

# Per-environment credentials files keep their keys at the top level,
# so the flag is read directly rather than through an environment key.
# Comparing with true turns a missing key into false.
enabled = Rails.application.credentials[:datadog_integration_enabled] == true
service_name = "myapp-#{Rails.env}"

Datadog.configure do |c|
  c.tracing.enabled = enabled
  c.runtime_metrics.enabled = enabled

  c.tracing.instrument :rails, service_name: service_name
  c.tracing.instrument :rake, enabled: false # avoid tracing long-running tasks

  # Consolidate HTTP client spans under one service name to reduce noise
  c.tracing.instrument :faraday, service_name: service_name
  c.tracing.instrument :httpclient, service_name: service_name
  c.tracing.instrument :http, service_name: service_name
  c.tracing.instrument :rest_client, service_name: service_name
end

Store the flag in Rails credentials:

rails credentials:edit --environment production

# config/credentials/production.yml.enc
datadog_integration_enabled: true

Important: The datadog_integration_enabled flag controls only APM tracing and runtime metrics. Custom StatsD metrics (gauges, counters) are sent by Datadog::Statsd regardless of this flag, as long as the Datadog Agent is running and reachable.

1.3 Install and Configure the Datadog Agent

The Datadog Agent must run where your app can reach it: on the same host, or in a separate container on the same network. It listens for UDP packets on port 8125 and forwards them to Datadog’s cloud.

Docker Compose (recommended for containerized apps):

# docker-compose.yml
services:
  app:
    environment:
      DD_AGENT_HOST: datadog-agent
      DD_DOGSTATSD_PORT: 8125
  datadog-agent:
    image: datadog/agent:latest
    environment:
      DD_API_KEY: ${DATADOG_API_KEY}
      DD_DOGSTATSD_NON_LOCAL_TRAFFIC: "true"
    ports:
      - "8125:8125/udp"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro

Bare metal / VM:

DD_API_KEY=your_api_key bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"

1.4 Emit Custom Business Metrics

Now the interesting part — emitting metrics from your business logic.

Create a service class for a billing health check at app/lib/monitoring/billing_health_check.rb:

# frozen_string_literal: true

class Monitoring::BillingHealthCheck
  UNBILLED_THRESHOLD = ENV.fetch('BILLING_UNBILLED_THRESHOLD', 10).to_i

  def initialize(date:)
    @date = date
  end

  def run
    results = collect_metrics
    fire_datadog_metrics(results)
    alert_if_unhealthy(results)
    results
  end

  private

  def collect_metrics
    # Orders for @date that have no matching billing record.
    billed_order_ids = BillingRecord.where(date: @date).pluck(:order_id)
    missing_order_ids = Order.where(date: @date).ids - billed_order_ids
    unbilled_count = Order.active.where(week: @date, billed: false).count
    failed_charges = Order.joins(:bills)
                          .where(date: @date, billed: false, bills: { success: false })
                          .distinct
                          .count

    {
      missing_order_ids: missing_order_ids,
      missing_billing_records_count: missing_order_ids.size,
      unbilled_orders_count: unbilled_count,
      failed_charges_count: failed_charges
    }
  end

  def fire_datadog_metrics(results)
    host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
    port = ENV.fetch('DD_DOGSTATSD_PORT', 8125).to_i
    statsd = Datadog::Statsd.new(host, port)

    statsd.gauge('billing.unbilled_orders', results[:unbilled_orders_count])
    statsd.gauge('billing.missing_billing_records', results[:missing_billing_records_count])
    statsd.gauge('billing.failed_charges', results[:failed_charges_count])
    statsd.close # flushes buffered metrics, then releases the socket
  rescue StandardError => e
    # A metrics failure must never break the health check itself.
    Rails.logger.error("Failed to emit Datadog metrics: #{e.message}")
  end

  # ... alerting covered in Part 2
end

Why Datadog::Statsd.new(host, port) instead of Datadog::Statsd.new?

With no arguments and no overriding environment variables, the client targets 127.0.0.1:8125. In containerized environments, the Datadog Agent usually runs as a separate container or service with its own hostname. Reading the host and port from environment variables explicitly keeps the target visible in the code and lets the same code run in every environment without changes.
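
The lookup can be pulled into a small function so it is easy to test. The sketch below is one possible shape, not part of any Datadog gem: dogstatsd_address is a hypothetical helper, and it takes any Hash-like object so that a test can pass a plain Hash instead of ENV.

```ruby
# Hypothetical helper: resolves the DogStatsD address from an env-like object.
# DD_AGENT_HOST and DD_DOGSTATSD_PORT are the variable names used in this post;
# the fallbacks match the local-Agent defaults (127.0.0.1:8125).
def dogstatsd_address(env = ENV)
  host = env.fetch('DD_AGENT_HOST', '127.0.0.1')
  # Integer() raises on garbage such as "abc", whereas to_i would return 0.
  port = Integer(env.fetch('DD_DOGSTATSD_PORT', 8125))
  [host, port]
end

p dogstatsd_address({})
p dogstatsd_address({ 'DD_AGENT_HOST' => 'datadog-agent', 'DD_DOGSTATSD_PORT' => '8125' })
```

Using Integer() rather than to_i is a deliberate choice in this sketch: a mistyped port fails loudly at boot instead of silently sending metrics to port 0.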

1.5 Choosing the Right Metric Type

Type | Method | Use when
--- | --- | ---
Gauge | statsd.gauge('name', value) | Current snapshot value (queue depth, count at a point in time)
Counter | statsd.increment('name') | Counting occurrences (requests, errors)
Histogram | statsd.histogram('name', value) | Distribution of values (response times, batch sizes)
Timing | statsd.timing('name', ms) | Duration in milliseconds

For billing health metrics — unbilled orders, failed charges — gauge is correct because you want the current count, not a running total.
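
On the wire, the metric type is a short suffix in the datagram. The following sketch formats datagrams by hand to show the difference. format_datagram and the two metric names other than billing.unbilled_orders are hypothetical and exist only for this illustration; in application code the dogstatsd-ruby client does this formatting for you.

```ruby
# Type suffixes from the DogStatsD datagram format.
METRIC_TYPE_SUFFIX = {
  gauge: 'g',
  counter: 'c',
  histogram: 'h',
  timing: 'ms'
}.freeze

# Hypothetical helper: formats "name:value|type", optionally followed by tags.
def format_datagram(type, name, value, tags: [])
  suffix = METRIC_TYPE_SUFFIX.fetch(type) # raises KeyError for unknown types
  datagram = "#{name}:#{value}|#{suffix}"
  datagram += '|#' + tags.join(',') unless tags.empty?
  datagram
end

puts format_datagram(:gauge, 'billing.unbilled_orders', 12)
puts format_datagram(:counter, 'billing.charge_attempts', 1)
puts format_datagram(:timing, 'billing.run_duration', 320, tags: ['env:staging'])
```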

1.6 Debugging: Why Aren’t My Metrics Appearing?

This is the most common issue. Because StatsD uses UDP, failures are completely silent.

Checklist:

# 1. Can a UDP packet be sent toward the Agent from your app container/host?
#    Run in the Rails console. A successful send only shows that the host name
#    resolves and the packet left your process; UDP never confirms receipt.
require 'socket'
UDPSocket.new.send("test:1|g", 0, ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), 8125)

# 2. Send a test gauge and wait 2-3 minutes
statsd = Datadog::Statsd.new(ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), 8125)
statsd.gauge('debug.connectivity_test', 1)
statsd.close
puts "Sent. Check Metrics → Explorer in Datadog in 2-3 minutes"

# 3. Check whether the integration flag is disabling APM (it does not affect custom metrics)
Rails.application.credentials[:datadog_integration_enabled]

Then in the Datadog UI:

  • Go to Metrics → Explorer
  • Type your metric name (e.g., billing.) in the graph field — it should autocomplete
  • If it doesn’t autocomplete after 5 minutes, the agent is not receiving the packets

Common root causes in staging/dev environments:

Symptom | Likely cause
--- | ---
No metrics in any env | Agent not running or wrong host
Metrics in production only | DD_AGENT_HOST not set, defaults to 127.0.0.1 but agent is on a different host in staging
Intermittent metrics | UDP packet loss (rare, but can happen under high load)
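
A quick way to rule out the "wrong host" cause is to check that the configured host name resolves at all. The sketch below uses only the standard library. agent_host_resolvable? is a hypothetical helper name, and a true result is only a first step, not proof that the Agent is listening.

```ruby
require 'socket'

# Hypothetical helper: true if the host name resolves to at least one address.
# This only checks name resolution. It cannot prove the Agent is listening,
# because UDP gives no acknowledgement.
def agent_host_resolvable?(host, port = 8125)
  Addrinfo.getaddrinfo(host, port, nil, :DGRAM).any?
rescue SocketError
  false
end

host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
puts "#{host} resolvable: #{agent_host_resolvable?(host)}"
```

If this prints false inside your app container, fix the host name or the container network before looking at anything else.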

Part 2: PagerDuty Integration

2.1 Install the Gem

# Gemfile
gem 'pagerduty', '~> 3.0'

Then run:

bundle install

2.2 Create a PagerDuty Service

  1. Log in to PagerDuty → Services → Service Directory → + New Service
  2. Name it (e.g., “Billing Pipeline”)
  3. Under Integrations, select “Use our API directly” → choose Events API v2
  4. Copy the Integration Key — you’ll need this in credentials

2.3 Store Credentials Securely

rails credentials:edit --environment production

# config/credentials/production.yml.enc
pagerduty_billing_integration_key: your_integration_key_here
google_chat_monitoring_webhook: https://chat.googleapis.com/v1/spaces/...

2.4 Create a PagerDuty Wrapper

Create a lightweight wrapper at app/lib/pagerduty/wrapper.rb:

# frozen_string_literal: true

class Pagerduty::Wrapper
  def initialize(integration_key:, api_version: 2)
    @integration_key = integration_key
    @api_version = api_version
  end

  def client
    @client ||= Pagerduty.build(
      integration_key: @integration_key,
      api_version: @api_version
    )
  end
end
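
For orientation, it helps to see roughly what a trigger event looks like at the HTTP level. The sketch below builds an Events API v2 payload with the standard library only. It is an illustration under stated assumptions, not the gem's internals: build_trigger_event and post_event are hypothetical helpers, and the key, host name, and summary are placeholders. In Events API v2 the integration key is sent as routing_key.

```ruby
require 'json'
require 'net/http'
require 'uri'

EVENTS_API_URL = URI('https://events.pagerduty.com/v2/enqueue')

# Hypothetical helper: builds the body of an Events API v2 "trigger" event.
def build_trigger_event(routing_key:, dedup_key:, summary:, source:, severity: 'critical')
  {
    routing_key: routing_key,
    event_action: 'trigger',
    dedup_key: dedup_key,
    payload: { summary: summary, source: source, severity: severity }
  }
end

# Hypothetical helper: posts the event as JSON over HTTPS.
# It is defined but not called here, since it needs a real integration key.
def post_event(event)
  Net::HTTP.post(EVENTS_API_URL, JSON.generate(event), 'Content-Type' => 'application/json')
end

event = build_trigger_event(
  routing_key: 'YOUR_INTEGRATION_KEY',       # placeholder
  dedup_key: 'billing-health-2024-01-01',    # placeholder
  summary: 'Billing Health Check FAILED',
  source: 'billing.example.com'              # placeholder
)
puts JSON.pretty_generate(event)
```

The wrapper above remains the recommended path in this guide; the raw payload is mainly useful when debugging with curl or reading PagerDuty's event logs.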

2.5 Wire Up Alerting in Your Service Class

Continuing the billing health check class:

def alert_if_unhealthy(results)
  issues = []

  if results[:missing_order_ids].any?
    missing_ids = results[:missing_order_ids].join(', ')
    issues << "Missing billing records for order IDs: #{missing_ids}"
  end

  if results[:unbilled_orders_count] > UNBILLED_THRESHOLD
    issues << "#{results[:unbilled_orders_count]} unbilled orders (threshold: #{UNBILLED_THRESHOLD})"
  end

  return if issues.empty?

  summary = build_alert_summary(results, issues)
  trigger_pagerduty(summary)
  send_google_chat_notification(summary)
end

private

def build_alert_summary(results, issues)
  [
    "Billing Health Check FAILED at #{Time.zone.now.strftime('%Y-%m-%d %H:%M:%S %Z')}",
    "Week: #{@date}",
    *issues,
    "Failed charges: #{results[:failed_charges_count]}"
  ].join(" | ")
end

def trigger_pagerduty(summary)
  # Same week, same key: repeated runs update one incident instead of opening new ones.
  dedup_key = "billing-health-#{@date}"

  Pagerduty::Wrapper.new(
    integration_key: pagerduty_integration_key
  ).client.incident(dedup_key).trigger(
    summary: summary,
    # Fall back to the machine's host name if no default URL host is configured.
    source: Rails.application.routes.default_url_options[:host] || Socket.gethostname,
    severity: "critical"
  )
rescue StandardError => e
  Rails.logger.error("Failed to trigger PagerDuty: #{e.message}")
end

def send_google_chat_notification(message)
  # Post to your team's Google Chat / Slack webhook (requires the httparty gem)
  HTTParty.post(
    google_chat_webhook,
    body: { text: message }.to_json,
    headers: { 'Content-Type' => 'application/json' }
  )
rescue StandardError => e
  Rails.logger.error("Failed to send Google Chat notification: #{e.message}")
end

# Per-environment credentials files keep their keys at the top level.
def pagerduty_integration_key
  Rails.application.credentials[:pagerduty_billing_integration_key]
end

def google_chat_webhook
  Rails.application.credentials[:google_chat_monitoring_webhook]
end
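
The decision about whether to alert is easiest to test when it is a pure function with no Rails or network dependencies. The sketch below is a simplified stand-in for the checks in alert_if_unhealthy. billing_issues is a hypothetical name, and the hash keys are assumptions made for this example.

```ruby
# Hypothetical pure function: returns a list of human-readable issues.
# An empty list means "healthy, do not page anyone".
def billing_issues(results, threshold:)
  issues = []

  missing = results.fetch(:missing_order_ids, [])
  issues << "Missing billing records for order IDs: #{missing.join(', ')}" if missing.any?

  unbilled = results.fetch(:unbilled_orders_count, 0)
  # Strictly greater than: a count equal to the threshold does not alert.
  issues << "#{unbilled} unbilled orders (threshold: #{threshold})" if unbilled > threshold

  issues
end

healthy   = billing_issues({ missing_order_ids: [], unbilled_orders_count: 10 }, threshold: 10)
unhealthy = billing_issues({ missing_order_ids: [41, 42], unbilled_orders_count: 11 }, threshold: 10)

p healthy
p unhealthy
```

Keeping this logic separate from the PagerDuty and chat calls means the boundary cases, such as a count exactly at the threshold, can be covered by fast unit tests.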

2.6 The Dedup Key — Why It Matters

dedup_key = "billing-health-#{@date}"

PagerDuty uses the dedup_key to group events about the same incident. If your billing check runs at 8:30 AM and again at 9:00 AM (e.g., after a retry), PagerDuty will update the existing incident instead of creating a second one and paging your on-call engineer twice.

Best practices for dedup keys:

  • Make them specific to the root cause, not the timestamp
  • Include the resource identifier (week date, job ID, etc.)
  • Use a format like {service}-{resource}-{date} for easy filtering in PagerDuty
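
These practices can be captured in a small key builder. The sketch below is one possible shape; dedup_key_for is a hypothetical helper, and the service and resource names are examples only.

```ruby
require 'date'

# Hypothetical helper following the {service}-{resource}-{date} pattern.
# The key depends only on what is broken, never on when the check ran.
def dedup_key_for(service:, resource:, date:)
  "#{service}-#{resource}-#{date.iso8601}"
end

week = Date.new(2024, 1, 1)
first_run = dedup_key_for(service: 'billing', resource: 'health', date: week)
retry_run = dedup_key_for(service: 'billing', resource: 'health', date: week)
next_week = dedup_key_for(service: 'billing', resource: 'health', date: week + 7)

puts first_run
puts "retry reuses key: #{first_run == retry_run}"
puts "new week gets new key: #{first_run != next_week}"
```

Because the 8:30 AM run and the 9:00 AM retry produce the same key, PagerDuty treats them as the same incident, while a failure in the following week opens a new one.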

Happy Integration!