How to Integrate Datadog and PagerDuty into an Enterprise Rails Application – Part 2

Stack: Ruby 3+, Rails 7+
Audience: Backend engineers building or maintaining production-grade Rails services
Goal: Add real-time observability and on-call alerting to a critical business process

Part 3: Hooking It All Together — Rake Task + Cron

3.1 Rake Task

Create lib/tasks/billing.rake:

namespace :billing do
  desc "Run billing health check: emit Datadog metrics and alert if unhealthy"
  task health_check: :environment do
    Monitoring::BillingHealthCheck.new(
      billing_week: BillingWeek.current
    ).run
  end
end

Run it manually:

bundle exec rake billing:health_check

3.2 Cron Script

Create scripts/cron/billing_health_check.sh:

#!/bin/bash
set -euo pipefail
source /apps/myapp/current/scripts/env.sh
cd /apps/myapp/current  # ensure bundler can find the app's Gemfile
bundle exec rake billing:health_check

Using Healthchecks.io (or similar) to wrap the cron gives you a second layer of alerting: if the cron doesn’t ping within the expected window, you get an alert even if the cron never fires or the app fails to boot.
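
As a sketch of that pattern (the check UUID below is a placeholder; substitute the one Healthchecks.io assigns to your check):

```shell
# Crontab entry wrapped with Healthchecks.io pings (YOUR-CHECK-UUID is a placeholder).
# The /start ping marks the beginning of the run; the final ping marks success.
# If neither arrives within the schedule's grace period, Healthchecks.io alerts you.
30 5 * * 4 curl -fsS -m 10 --retry 3 https://hc-ping.com/YOUR-CHECK-UUID/start && /bin/bash /apps/myapp/current/scripts/cron/billing_health_check.sh && curl -fsS -m 10 https://hc-ping.com/YOUR-CHECK-UUID
```

Pinging /start as well as on completion lets Healthchecks.io distinguish "never ran" from "started but failed".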

3.3 Crontab Entry

# Run billing health check every Thursday at 5:30 AM
30 5 * * 4 /bin/bash /apps/myapp/current/scripts/cron/billing_health_check.sh

⚠️ Important for managed deployments: If your crontab is version-controlled but not auto-deployed (e.g., Capistrano without cron management), changes to the file in your repo do not automatically update the server. Always verify with crontab -l after deploying.
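
One way to close that gap is the whenever gem, which generates the crontab from a version-controlled Ruby DSL and can update it as part of a Capistrano deploy. A minimal sketch of config/schedule.rb (verify the DSL against the gem's README for your version):

```ruby
# config/schedule.rb — whenever DSL.
# Run `whenever --update-crontab` on deploy, or add
# `require "whenever/capistrano"` to your Capfile to automate it.
every :thursday, at: '5:30 am' do
  rake 'billing:health_check'
end
```

With this in place, the crontab on the server is regenerated from the repo on every deploy, so crontab -l and your repo can no longer drift apart.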


Part 4: Building the Datadog Dashboard

Once metrics are flowing, set up a dashboard for at-a-glance visibility.

4.1 Create the Dashboard

  1. Datadog → Dashboards → New Dashboard
  2. Name it: “Billing Health Monitor”
  3. Click + Add Widgets

4.2 Add Timeseries Widgets

For each metric, add a Timeseries widget:

| Widget title | Metric | Visualization |
| --- | --- | --- |
| Unbilled Orders | billing.unbilled_orders | Line chart |
| Missing Billing Records | billing.missing_billing_records | Line chart |
| Failed Charges | billing.failed_charges | Line chart |

Widget configuration:

  • Graph: select metric → billing.unbilled_orders
  • Display as: Line
  • Timeframe: Set to “Past 1 Week” or “Past 1 Month” after data starts flowing (not “Past 1 Hour” which shows nothing between weekly runs)

4.3 Add Reference Lines (Optional but Useful)

For the unbilled orders widget, add a constant line at your alert threshold:

  • In the widget editor → Markers → Add marker at y = 10 (your BILLING_UNBILLED_THRESHOLD)
  • Color it red to make the threshold visually obvious
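
If you prefer dashboards-as-code, the same widget (marker included) can be created through Datadog's Dashboard API. A hedged sketch using only Net::HTTP; the payload shape mirrors the UI steps above, but check the current API docs before relying on it:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Builds the payload for a one-widget "Billing Health Monitor" dashboard.
# The timeseries widget graphs billing.unbilled_orders with a red dashed
# marker at the alert threshold (y = 10), matching the UI configuration above.
def billing_dashboard_payload(threshold: 10)
  {
    title: 'Billing Health Monitor',
    layout_type: 'ordered',
    widgets: [
      {
        definition: {
          type: 'timeseries',
          title: 'Unbilled Orders',
          requests: [{ q: 'avg:billing.unbilled_orders{*}', display_type: 'line' }],
          markers: [{ value: "y = #{threshold}", display_type: 'error dashed' }]
        }
      }
    ]
  }
end

# POSTs the dashboard to Datadog; requires API + application keys in the env.
def create_billing_dashboard!
  uri = URI('https://api.datadoghq.com/api/v1/dashboard')
  req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json',
                                 'DD-API-KEY' => ENV.fetch('DD_API_KEY'),
                                 'DD-APPLICATION-KEY' => ENV.fetch('DD_APP_KEY'))
  req.body = billing_dashboard_payload.to_json
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
end
```

Keeping the dashboard definition in code means threshold changes (the y = 10 marker) go through review alongside the BILLING_UNBILLED_THRESHOLD they mirror.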

4.4 Where to Find Your Custom Metrics

Your custom metrics appear in the Metrics Explorer (/metric/explorer) once the agent has forwarded them; search for the billing. prefix. If a metric doesn’t show up there yet, check Metric Volume (/metric/volume) first to confirm Datadog received it at all (see Gotcha 2 below).


Part 5: Testing the Integration End-to-End

5.1 Test Datadog Metrics (no alerts, safe in any env)

# Rails console
require 'datadog/statsd'
host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
statsd = Datadog::Statsd.new(host, 8125)
statsd.gauge('billing.unbilled_orders', 0)
statsd.gauge('billing.missing_billing_records', 0)
statsd.gauge('billing.failed_charges', 0)
statsd.close
puts "Sent — check /metric/explorer in Datadog in ~2-3 minutes"

5.2 Test PagerDuty (staging)

# Rails console — staging
# First, verify the key exists:
Rails.application.credentials.dig(:staging, :pagerduty_billing_integration_key).present?
# Then trigger a test incident:
svc = Monitoring::BillingHealthCheck.new(billing_week: BillingWeek.current)
svc.send(:trigger_pagerduty, "TEST: Billing health check — staging validation #{Time.current}")
# Remember to resolve the incident in PagerDuty UI immediately after!

5.3 Test PagerDuty (production) — Preferred Method

Use PagerDuty’s built-in test instead of triggering from code:

  1. PagerDuty → Services → Billing Pipeline → Integrations
  2. Find the integration → click “Send Test Event”

This fires through the same pipeline without touching your app or risking a real alert.

5.4 Test PagerDuty (production) — via Rails Console

If you must test via code in production, use a unique dedup key so it doesn’t collide with real billing alerts, and coordinate with your on-call engineer first:

svc = Monitoring::BillingHealthCheck.new(billing_week: BillingWeek.current)
Pagerduty::Wrapper.new(
  integration_key: svc.send(:pagerduty_integration_key)
).client.incident("billing-health-test-#{Time.current.to_i}").trigger(
  summary: "TEST ONLY — please ignore — integration validation",
  source: "rails-console",
  severity: "critical"
)

5.5 Test the Full Service Class (production, after billing has run)

Once billing has completed successfully for the week, all counts will be 0 and no PagerDuty alert will fire:

result = Monitoring::BillingHealthCheck.new(billing_week: BillingWeek.current).run
puts result
# => { unbilled_orders_count: 0, missing_billing_records_count: 0, failed_charges_count: 0, ... }

Common Gotchas

1. StatsD is Fire-and-Forget

UDP has no acknowledgment. If the agent isn’t running, your statsd.gauge() calls return normally with no error. Always verify the agent is reachable by checking for your metric in the Datadog UI after sending — don’t rely on exception-free code as proof of delivery.
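
For a rough pre-flight check, you can at least confirm an agent process is running on the target host. This is a heuristic sketch: dogstatsd itself is UDP (no handshake to probe), so it probes the trace agent's default TCP port (8126) on the same host as a proxy; it does not prove dogstatsd is listening on 8125.

```ruby
require 'socket'
require 'timeout'

# Heuristic reachability probe for the Datadog agent host. A successful TCP
# connect to the trace agent port suggests an agent is running there; a
# refused or timed-out connection suggests it is not. Never a substitute for
# verifying the metric in the Datadog UI.
def datadog_agent_reachable?(host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), port: 8126, timeout: 1)
  Timeout.timeout(timeout) { TCPSocket.new(host, port).close }
  true
rescue StandardError
  false
end
```

Logging the result of this probe before emitting gauges gives you a breadcrumb when metrics silently fail to appear.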

2. Metric Volume vs Metric Explorer

  • Metric Volume (/metric/volume): Confirms Datadog received the metric. Good for first-time setup verification.
  • Metric Explorer (/metric/explorer): Lets you actually graph and analyze the metric over time. This is where you do your monitoring work.

3. Rescue Around Everything

Both emit_datadog_metrics and trigger_pagerduty should have rescue blocks. Your monitoring code must never crash your main business process: a job that fails to send its alert is bad, but a job that aborts because the alerting code raised an exception is worse.

def emit_datadog_metrics(results)
  # ... emit metrics
rescue => e
  Rails.logger.error("Failed to emit Datadog metrics: #{e.message}")
  # Do NOT re-raise — monitoring failure is never a reason to abort the job
end

4. Environment Parity for the Datadog Agent

In production the agent runs as a sidecar or daemon. In local development and staging, it often doesn’t. This is fine — just make sure your code uses ENV.fetch('DD_AGENT_HOST', '127.0.0.1') so the host is configurable per environment, and don’t be alarmed when staging metrics don’t appear in Datadog.

5. PagerDuty Dedup Keys Prevent Double-Paging

If your cron job or health check can run more than once for the same underlying issue (retry logic, manual reruns), always use a stable dedup_key tied to the resource and time period — not a timestamp. A timestamp-based key creates a new PagerDuty incident on every run.
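
A hypothetical helper illustrating the stable-key approach, derived from the ISO week being billed rather than the wall clock:

```ruby
require 'date'

# Dedup key tied to the billing period, not the clock. Every run for the
# same ISO billing week yields the same key, so PagerDuty folds repeated
# triggers into a single incident instead of paging on each rerun.
def billing_dedup_key(week_start_date)
  "billing-health-#{week_start_date.strftime('%G-W%V')}"
end
```

Re-running the check on Thursday and again on Friday for the same billing week pages once, not twice.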


Summary

| Concern | Tool | How |
| --- | --- | --- |
| Custom business metrics | Datadog StatsD | Datadog::Statsd#gauge via local agent (UDP) |
| APM / request tracing | Datadog ddtrace | Datadog.configure initializer |
| Metric visualization | Datadog Dashboards | Timeseries widgets per metric |
| Critical alert on failure | PagerDuty Events API v2 | Pagerduty::Wrapper + dedup key |
| Secondary notification | Google Chat / Slack webhook | HTTP POST to webhook URL |
| Scheduled execution | Cron + Rake | Shell script wrapping bundle exec rake |
| Cron liveness monitoring | Healthchecks.io | Ping before/after cron run |

Both integrations together give you a complete observability loop: your scheduled jobs run on time, emit metrics to Datadog for trending and analysis, and page the right engineer via PagerDuty the moment something goes wrong — before any customer notices.


Further Reading

Happy Integration!


Author: Abhilash

Hi, I’m Abhilash! A seasoned web developer with 15 years of experience specializing in Ruby and Ruby on Rails. Since 2010, I’ve built scalable, robust web applications and worked with frameworks like Angular, Sinatra, Laravel, Node.js, Vue and React. Passionate about clean, maintainable code and continuous learning, I share insights, tutorials, and experiences here. Let’s explore the ever-evolving world of web development together!
