Stack: Ruby 3+, Rails 7+
Audience: Backend engineers building or maintaining production-grade Rails services
Goal: Add real-time observability and on-call alerting to a critical business process
## Introduction
When you’re running an enterprise web application, two questions keep engineering teams up at night:
- “Is our system healthy right now?”
- “If something breaks at 3 AM, will we know before our customers do?”
Datadog and PagerDuty together answer both. Datadog gives you the metrics, dashboards, and visibility. PagerDuty turns critical metrics into actionable alerts that reach the right person at the right time. This post walks you through integrating both into a Rails 7+ application — from gem installation to a live production dashboard — using a real-world billing health monitor as the example.
## What is Datadog?
Datadog is a cloud-based observability and monitoring platform. It collects metrics, traces, and logs from your infrastructure and applications and surfaces them in a unified UI.
Core capabilities relevant to Rails apps:
| Feature | What it does |
|---|---|
| APM (Application Performance Monitoring) | Traces every Rails request, shows latency, errors, and bottlenecks |
| StatsD / DogStatsD | Accepts custom business metrics (gauges, counters, histograms) via UDP |
| Dashboards | Visualize any metric over time — single chart or full ops dashboard |
| Monitors & Alerts | Trigger notifications when a metric crosses a threshold |
| Log Management | Centralized log search and correlation with traces |
| Infrastructure Monitoring | CPU, memory, disk — the full host/container picture |
For this guide, we focus on custom business metrics via DogStatsD — the most powerful and underused feature for application teams.
## What is PagerDuty?
PagerDuty is an incident management platform. When something breaks in production, PagerDuty decides who gets notified, how (phone call, SMS, push notification, Slack), and when to escalate if the alert isn’t acknowledged.
Key concepts:
| Concept | Description |
|---|---|
| Service | A logical grouping of alerts (e.g., “Billing Service”) |
| Integration Key | The secret key your app uses to send events to a PagerDuty service |
| Incident | A triggered alert that requires human acknowledgment |
| Dedup Key | A unique string that prevents duplicate incidents for the same root cause |
| Escalation Policy | Defines who gets paged and in what order if the incident isn’t acknowledged |
| Severity | critical, error, warning, or info |
PagerDuty integrates with Datadog (you can alert from DD monitors), but for critical business logic alerts — like a billing pipeline failing — it’s often better to trigger PagerDuty directly from your application code, giving you full control over deduplication and context.
## Why These Are Must-Have Integrations for Enterprise Apps
If you’re running any of the following, you need both:
- Scheduled jobs / cron tasks that process money, orders, or user data
- Background workers (Sidekiq, Delayed Job) that can silently fail
- Third-party payment or fulfillment pipelines with no built-in alerting
- SLAs that require uptime or processing guarantees
- On-call rotations where the right person needs to be paged — not just an email inbox
The core problem both solve: Rails applications fail silently. A `rescue` clause that logs an error to `Rails.logger` does nothing at 2 AM. A Sidekiq deadlock on your billing job won’t send you an email. Without Datadog and PagerDuty:
- You find out about failures from customers, not dashboards
- You can’t tell when a metric degraded or how long it’s been broken
- There’s no escalation path — the alert that fires at 3 AM goes nowhere
With both integrated, you get: visibility (Datadog) + accountability (PagerDuty).
## Architecture Overview
```
Rails App / Cron Job
  │
  ├──► Datadog Agent (UDP :8125)
  │         │
  │         └──► Datadog Cloud ──► Dashboard / Monitor
  │
  └──► PagerDuty Events API (HTTPS)
            │
            └──► On-call Engineer ──► Slack / Phone / SMS
```
The Datadog Agent runs as a daemon on your server or as a sidecar container. Your app sends lightweight UDP packets to it (fire-and-forget). The agent batches and forwards them to Datadog’s cloud.
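Under the hood, each metric is a single plain-text datagram in the DogStatsD wire format (`metric.name:value|type`, optionally followed by tags). A minimal stdlib-only sketch, with a throwaway UDP listener standing in for the agent, shows the fire-and-forget exchange:

```ruby
require 'socket'

# Throwaway UDP listener standing in for the Datadog Agent, on a random port
receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0)
port = receiver.addr[1]

# Fire-and-forget: the app writes one datagram and moves on immediately.
# DogStatsD wire format: <metric.name>:<value>|<type>[|#tag1:val1,tag2:val2]
sender = UDPSocket.new
sender.send('billing.unbilled_orders:5|g|#env:production', 0, '127.0.0.1', port)

payload, _addr = receiver.recvfrom(512)
puts payload # => billing.unbilled_orders:5|g|#env:production
```

Note that if nothing were listening on the port, `send` would still succeed without raising, which is exactly why missing metrics fail silently rather than crashing your app.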
PagerDuty receives events over HTTPS directly from your app — no local agent needed.
## Part 1: Datadog Integration
### 1.1 Install the Gems

```ruby
# Gemfile
gem 'datadog', '~> 2.0'         # APM tracing (dd-trace-rb v2; the gem was renamed from ddtrace)
gem 'dogstatsd-ruby', '~> 5.0'  # Custom metrics via StatsD
```

```shell
bundle install
```
### 1.2 Configure the Datadog Initializer

Create `config/initializers/datadog.rb`:
```ruby
# config/initializers/datadog.rb
require 'datadog/statsd'
require 'datadog'

# The flag lives at the top level of the per-environment credentials file
enabled = Rails.application.credentials[:datadog_integration_enabled]
service_name = "myapp-#{Rails.env}"

Datadog.configure do |c|
  c.tracing.enabled = enabled
  c.runtime_metrics.enabled = enabled
  c.tracing.instrument :rails, service_name: service_name
  c.tracing.instrument :rake, enabled: false # avoid tracing long-running tasks

  # Consolidate HTTP client spans under one service name to reduce noise
  c.tracing.instrument :faraday, service_name: service_name
  c.tracing.instrument :httpclient, service_name: service_name
  c.tracing.instrument :http, service_name: service_name
  c.tracing.instrument :rest_client, service_name: service_name
end
```
Store the flag in Rails credentials:
```shell
rails credentials:edit --environment production
```

```yaml
# config/credentials/production.yml.enc
datadog_integration_enabled: true
```
**Important:** The `datadog_integration_enabled` flag controls APM tracing only. Custom StatsD metrics (gauges, counters) are sent by `Datadog::Statsd` regardless of this flag, as long as the Datadog Agent is running.
### 1.3 Install and Configure the Datadog Agent
The Datadog Agent must be running on the host where your app runs. It listens for UDP packets on port 8125 and forwards them to Datadog’s cloud.
Docker Compose (recommended for containerized apps):
```yaml
# docker-compose.yml
services:
  app:
    environment:
      DD_AGENT_HOST: datadog-agent
      DD_DOGSTATSD_PORT: 8125

  datadog-agent:
    image: datadog/agent:latest
    environment:
      DD_API_KEY: ${DATADOG_API_KEY}
      DD_DOGSTATSD_NON_LOCAL_TRAFFIC: "true"
    ports:
      - "8125:8125/udp"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
```
Bare metal / VM:
```shell
DD_API_KEY=your_api_key bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
```
### 1.4 Emit Custom Business Metrics
Now the interesting part — emitting metrics from your business logic.
Create a service class for a billing health check at `app/lib/monitoring/billing_health_check.rb`:
```ruby
# frozen_string_literal: true

class Monitoring::BillingHealthCheck
  UNBILLED_THRESHOLD = ENV.fetch('BILLING_UNBILLED_THRESHOLD', 10).to_i

  def initialize(date:)
    @date = date
  end

  def run
    results = collect_metrics
    fire_datadog_metrics(results)
    alert_if_unhealthy(results)
    results
  end

  private

  def collect_metrics
    billed_ids = BillingRecord.where(date: @date).pluck(:order_id)
    # Orders for this date that have no corresponding billing record
    missing_order_ids = Order.where(date: @date).ids - billed_ids
    unbilled_count = Order.active.where(date: @date, billed: false).count
    failed_charges = Order.joins(:bills)
                          .where(date: @date, billed: false, bills: { success: false })
                          .distinct
                          .count

    {
      missing_order_ids: missing_order_ids,
      missing_billing_records_count: missing_order_ids.size,
      unbilled_orders_count: unbilled_count,
      failed_charges_count: failed_charges
    }
  end

  def fire_datadog_metrics(results)
    host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
    port = ENV.fetch('DD_DOGSTATSD_PORT', 8125).to_i
    statsd = Datadog::Statsd.new(host, port)

    statsd.gauge('billing.unbilled_orders', results[:unbilled_orders_count])
    statsd.gauge('billing.missing_billing_records', results[:missing_billing_records_count])
    statsd.gauge('billing.failed_charges', results[:failed_charges_count])

    statsd.close
  rescue => e
    Rails.logger.error("Failed to emit Datadog metrics: #{e.message}")
  end

  # ... alerting covered in Part 2
end
```
Why `Datadog::Statsd.new(host, port)` instead of `Datadog::Statsd.new`?
The no-argument form defaults to 127.0.0.1:8125. In containerized environments, the Datadog Agent runs as a separate container/service with a different hostname. Always read the host from an environment variable so the code works in every environment without changes.
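As a small sketch of that rule (the `agent_address` helper below is illustrative, not part of dogstatsd-ruby), the same code resolves to the local agent on bare metal and to the sidecar container under Docker Compose:

```ruby
# Resolve the agent's address from the environment, falling back to the
# local default. (agent_address is a hypothetical helper for illustration.)
def agent_address
  host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
  port = ENV.fetch('DD_DOGSTATSD_PORT', '8125').to_i
  [host, port]
end

# Bare metal / VM: no env vars set, so we fall back to the local agent
ENV.delete('DD_AGENT_HOST')
ENV.delete('DD_DOGSTATSD_PORT')
puts agent_address.inspect # => ["127.0.0.1", 8125]

# Docker Compose: DD_AGENT_HOST points at the sidecar container
ENV['DD_AGENT_HOST'] = 'datadog-agent'
puts agent_address.inspect # => ["datadog-agent", 8125]
```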
### 1.5 Choosing the Right Metric Type
| Type | Method | Use when |
|---|---|---|
| Gauge | statsd.gauge('name', value) | Current snapshot value (queue depth, count at a point in time) |
| Counter | statsd.increment('name') | Counting occurrences (requests, errors) |
| Histogram | statsd.histogram('name', value) | Distribution of values (response times, batch sizes) |
| Timing | statsd.timing('name', ms) | Duration in milliseconds |
For billing health metrics — unbilled orders, failed charges — gauge is correct because you want the current count, not a running total.
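To make the distinction concrete, here is a toy in-memory aggregator (illustrative only; this is not how DogStatsD is implemented): counters accumulate occurrences between flushes, while gauges keep only the most recent snapshot.

```ruby
# Toy model of counter vs. gauge semantics (not the real DogStatsD aggregator)
class ToyAggregator
  def initialize
    @counters = Hash.new(0)
    @gauges = {}
  end

  # Counter: every call adds one occurrence
  def increment(name)
    @counters[name] += 1
  end

  # Gauge: every call overwrites the previous snapshot
  def gauge(name, value)
    @gauges[name] = value
  end

  def flush
    { counters: @counters.dup, gauges: @gauges.dup }
  end
end

agg = ToyAggregator.new
3.times { agg.increment('billing.charge_attempts') } # 3 occurrences counted
agg.gauge('billing.unbilled_orders', 12)             # snapshot...
agg.gauge('billing.unbilled_orders', 7)              # ...replaced by the latest value

p agg.flush
```

If you emitted unbilled-order counts as a counter, repeated health-check runs would inflate the total; as a gauge, the chart always shows the current backlog.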
### 1.6 Debugging: Why Aren’t My Metrics Appearing?
This is the most common issue. Because StatsD uses UDP, failures are completely silent.
Checklist:
```ruby
# 1. Is the Datadog Agent reachable from your app container/host?
#    Run in a Rails console:
require 'socket'
UDPSocket.new.send("test:1|g", 0, ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), 8125)

# 2. Send a test gauge and wait 2-3 minutes
statsd = Datadog::Statsd.new(ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), 8125)
statsd.gauge('debug.connectivity_test', 1)
statsd.close
puts "Sent — check the Datadog Metrics Explorer in 2-3 minutes"

# 3. Check whether the integration flag is blocking APM
#    (it does not affect custom metrics, but it is worth knowing)
Rails.application.credentials[:datadog_integration_enabled]
```
Then in the Datadog UI:
- Go to Metrics → Explorer
- Type your metric name (e.g., `billing.`) in the graph field — it should autocomplete
- If it doesn’t autocomplete after 5 minutes, the agent is not receiving the packets
Common root causes in staging/dev environments:
| Symptom | Likely cause |
|---|---|
| No metrics in any env | Agent not running or wrong host |
| Metrics in production only | DD_AGENT_HOST not set, defaults to 127.0.0.1 but agent is on a different host in staging |
| Intermittent metrics | UDP packet loss (rare, but can happen under high load) |
## Part 2: PagerDuty Integration
### 2.1 Install the Gem

```ruby
# Gemfile
gem 'pagerduty', '~> 3.0'
```

```shell
bundle install
```
### 2.2 Create a PagerDuty Service

1. Log in to PagerDuty → Services → Service Directory → + New Service
2. Name it (e.g., “Billing Pipeline”)
3. Under Integrations, select “Use our API directly” → choose Events API v2
4. Copy the Integration Key — you’ll need this in credentials
### 2.3 Store Credentials Securely

```shell
rails credentials:edit --environment production
```

```yaml
# config/credentials/production.yml.enc
pagerduty_billing_integration_key: your_integration_key_here
google_chat_monitoring_webhook: https://chat.googleapis.com/v1/spaces/...
```
### 2.4 Create a PagerDuty Wrapper

Create a lightweight wrapper at `app/lib/pagerduty/wrapper.rb`:
```ruby
# frozen_string_literal: true

class Pagerduty::Wrapper
  def initialize(integration_key:, api_version: 2)
    @integration_key = integration_key
    @api_version = api_version
  end

  def client
    @client ||= Pagerduty.build(
      integration_key: @integration_key,
      api_version: @api_version
    )
  end
end
```
### 2.5 Wire Up Alerting in Your Service Class
Continuing the billing health check class:
```ruby
def alert_if_unhealthy(results)
  issues = []

  if results[:missing_billing_records_count] > 0
    missing_ids = results[:missing_order_ids].join(', ')
    issues << "Missing billing records for orders: #{missing_ids}"
  end

  if results[:unbilled_orders_count] > UNBILLED_THRESHOLD
    issues << "#{results[:unbilled_orders_count]} unbilled orders (threshold: #{UNBILLED_THRESHOLD})"
  end

  return if issues.empty?

  summary = build_alert_summary(results, issues)
  trigger_pagerduty(summary)
  send_google_chat_notification(summary)
end

def build_alert_summary(results, issues)
  [
    "Billing Health Check FAILED at #{Time.zone.now.strftime('%Y-%m-%d %H:%M:%S %Z')}",
    "Date: #{@date}",
    *issues,
    "Failed charges: #{results[:failed_charges_count]}"
  ].join(" | ")
end

def trigger_pagerduty(summary)
  dedup_key = "billing-health-#{@date}"
  Pagerduty::Wrapper.new(
    integration_key: pagerduty_integration_key
  ).client.incident(dedup_key).trigger(
    summary: summary,
    source: Rails.application.routes.default_url_options[:host],
    severity: "critical"
  )
rescue => e
  Rails.logger.error("Failed to trigger PagerDuty: #{e.message}")
end

def send_google_chat_notification(message)
  # Post to your team's Google Chat / Slack webhook
  HTTParty.post(
    google_chat_webhook,
    body: { text: message }.to_json,
    headers: { 'Content-Type' => 'application/json' }
  )
rescue => e
  Rails.logger.error("Failed to send Google Chat notification: #{e.message}")
end

def pagerduty_integration_key
  Rails.application.credentials[:pagerduty_billing_integration_key]
end

def google_chat_webhook
  Rails.application.credentials[:google_chat_monitoring_webhook]
end
```
### 2.6 The Dedup Key — Why It Matters
```ruby
dedup_key = "billing-health-#{@date}"
```
PagerDuty uses the dedup_key to group events about the same incident. If your billing check runs at 8:30 AM and again at 9:00 AM (e.g., after a retry), PagerDuty will update the existing incident instead of creating a second one and paging your on-call engineer twice.
Best practices for dedup keys:
- Make them specific to the root cause, not the timestamp
- Include the resource identifier (week date, job ID, etc.)
- Use a format like `{service}-{resource}-{date}` for easy filtering in PagerDuty
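The convention above can be captured in a small helper (`dedup_key_for` is a hypothetical function sketched for illustration, not part of the pagerduty gem):

```ruby
require 'date'

# Build a dedup key in the {service}-{resource}-{date} format.
# Same root cause + same resource + same date => same key => one incident.
def dedup_key_for(service:, resource:, date:)
  [service, resource, date.strftime('%Y-%m-%d')].join('-')
end

# Two runs on the same day produce the same key, so PagerDuty updates
# one incident instead of paging the on-call engineer twice:
morning   = dedup_key_for(service: 'billing', resource: 'health', date: Date.new(2025, 3, 3))
retry_run = dedup_key_for(service: 'billing', resource: 'health', date: Date.new(2025, 3, 3))

puts morning              # => billing-health-2025-03-03
puts morning == retry_run # => true
```

Because the key contains no timestamp, retries collapse into the existing incident; a key built from `Time.now` would page on every run.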
Happy Integration!