How to Integrate Datadog and PagerDuty into an Enterprise Rails Application – Part 2

Stack: Ruby 3+, Rails 7+
Audience: Backend engineers building or maintaining production-grade Rails services
Goal: Add real-time observability and on-call alerting to a critical business process

Part 3: Hooking It All Together — Rake Task + Cron

3.1 Rake Task

Create lib/tasks/billing.rake:

namespace :billing do
  desc "Run billing health check: emit Datadog metrics and alert if unhealthy"
  task health_check: :environment do
    Monitoring::BillingHealthCheck.new(
      billing_week: BillingWeek.current
    ).run
  end
end

Run it manually:

bundle exec rake billing:health_check

3.2 Cron Script

Create scripts/cron/billing_health_check.sh:

#!/bin/bash
set -euo pipefail
source /apps/myapp/current/scripts/env.sh
cd /apps/myapp/current  # ensure bundler can find the app's Gemfile
bundle exec rake billing:health_check

Using Healthchecks.io (or similar) to wrap the cron gives you a second layer of alerting: if the cron doesn’t ping within the expected window, you get an alert even if the cron never fires or the app fails to boot.
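
As a sketch of that pattern (the check UUID below is a placeholder; substitute the one Healthchecks.io assigns to your check):

```shell
# Crontab entry wrapped with Healthchecks.io pings (YOUR-CHECK-UUID is a placeholder).
# The /start ping marks the beginning of the run; the final ping marks success.
# If neither arrives within the schedule's grace period, Healthchecks.io alerts you.
30 5 * * 4 curl -fsS -m 10 --retry 3 https://hc-ping.com/YOUR-CHECK-UUID/start && /bin/bash /apps/myapp/current/scripts/cron/billing_health_check.sh && curl -fsS -m 10 https://hc-ping.com/YOUR-CHECK-UUID
```

Pinging /start as well as on completion lets Healthchecks.io distinguish "never ran" from "started but failed".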

3.3 Crontab Entry

# Run billing health check every Thursday at 5:30 AM
30 5 * * 4 /bin/bash /apps/myapp/current/scripts/cron/billing_health_check.sh

⚠️ Important for managed deployments: If your crontab is version-controlled but not auto-deployed (e.g., Capistrano without cron management), changes to the file in your repo do not automatically update the server. Always verify with crontab -l after deploying.
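
One way to close that gap is the whenever gem, which generates the crontab from a version-controlled Ruby DSL and can update it as part of a Capistrano deploy. A minimal sketch of config/schedule.rb (verify the DSL against the gem's README for your version):

```ruby
# config/schedule.rb — whenever DSL.
# Run `whenever --update-crontab` on deploy, or add
# `require "whenever/capistrano"` to your Capfile to automate it.
every :thursday, at: '5:30 am' do
  rake 'billing:health_check'
end
```

With this in place, the crontab on the server is regenerated from the repo on every deploy, so crontab -l and your repo can no longer drift apart.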


Part 4: Building the Datadog Dashboard

Once metrics are flowing, set up a dashboard for at-a-glance visibility.

4.1 Create the Dashboard

  1. Datadog → Dashboards → New Dashboard
  2. Name it: “Billing Health Monitor”
  3. Click + Add Widgets

4.2 Add Timeseries Widgets

For each metric, add a Timeseries widget:

| Widget title | Metric | Visualization |
| --- | --- | --- |
| Unbilled Orders | billing.unbilled_orders | Line chart |
| Missing Billing Records | billing.missing_billing_records | Line chart |
| Failed Charges | billing.failed_charges | Line chart |

Widget configuration:

  • Graph: select metric → billing.unbilled_orders
  • Display as: Line
  • Timeframe: Set to “Past 1 Week” or “Past 1 Month” after data starts flowing (not “Past 1 Hour” which shows nothing between weekly runs)

4.3 Add Reference Lines (Optional but Useful)

For the unbilled orders widget, add a constant line at your alert threshold:

  • In the widget editor → Markers → Add marker at y = 10 (your BILLING_UNBILLED_THRESHOLD)
  • Color it red to make the threshold visually obvious
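
If you prefer dashboards-as-code, the same widget (marker included) can be created through Datadog's Dashboard API. A hedged sketch using only Net::HTTP; the payload shape mirrors the UI steps above, but check the current API docs before relying on it:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Builds the payload for a one-widget "Billing Health Monitor" dashboard.
# The timeseries widget graphs billing.unbilled_orders with a red dashed
# marker at the alert threshold (y = 10), matching the UI configuration above.
def billing_dashboard_payload(threshold: 10)
  {
    title: 'Billing Health Monitor',
    layout_type: 'ordered',
    widgets: [
      {
        definition: {
          type: 'timeseries',
          title: 'Unbilled Orders',
          requests: [{ q: 'avg:billing.unbilled_orders{*}', display_type: 'line' }],
          markers: [{ value: "y = #{threshold}", display_type: 'error dashed' }]
        }
      }
    ]
  }
end

# POSTs the dashboard to Datadog; requires API + application keys in the env.
def create_billing_dashboard!
  uri = URI('https://api.datadoghq.com/api/v1/dashboard')
  req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json',
                                 'DD-API-KEY' => ENV.fetch('DD_API_KEY'),
                                 'DD-APPLICATION-KEY' => ENV.fetch('DD_APP_KEY'))
  req.body = billing_dashboard_payload.to_json
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
end
```

Keeping the dashboard definition in code means threshold changes (the y = 10 marker) go through review alongside the BILLING_UNBILLED_THRESHOLD they mirror.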

4.4 Where to Find Your Custom Metrics

Your custom metrics appear in the Metrics Explorer (/metric/explorer) once the agent has forwarded them; search for the billing. prefix. If a metric doesn’t show up there yet, check Metric Volume (/metric/volume) first to confirm Datadog received it at all (see Gotcha 2 below).


Part 5: Testing the Integration End-to-End

5.1 Test Datadog Metrics (no alerts, safe in any env)

# Rails console
require 'datadog/statsd'
host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1')
statsd = Datadog::Statsd.new(host, 8125)
statsd.gauge('billing.unbilled_orders', 0)
statsd.gauge('billing.missing_billing_records', 0)
statsd.gauge('billing.failed_charges', 0)
statsd.close
puts "Sent — check /metric/explorer in Datadog in ~2-3 minutes"

5.2 Test PagerDuty (staging)

# Rails console — staging
# First, verify the key exists:
Rails.application.credentials.dig(:staging, :pagerduty_billing_integration_key).present?
# Then trigger a test incident:
svc = Monitoring::BillingHealthCheck.new(billing_week: BillingWeek.current)
svc.send(:trigger_pagerduty, "TEST: Billing health check — staging validation #{Time.current}")
# Remember to resolve the incident in PagerDuty UI immediately after!

5.3 Test PagerDuty (production) — Preferred Method

Use PagerDuty’s built-in test instead of triggering from code:

  1. PagerDuty → Services → Billing Pipeline → Integrations
  2. Find the integration → click “Send Test Event”

This fires through the same pipeline without touching your app or risking a real alert.

5.4 Test PagerDuty (production) — via Rails Console

If you must test via code in production, use a unique dedup key so it doesn’t collide with real billing alerts, and coordinate with your on-call engineer first:

svc = Monitoring::BillingHealthCheck.new(billing_week: BillingWeek.current)
Pagerduty::Wrapper.new(
  integration_key: svc.send(:pagerduty_integration_key)
).client.incident("billing-health-test-#{Time.current.to_i}").trigger(
  summary: "TEST ONLY — please ignore — integration validation",
  source: "rails-console",
  severity: "critical"
)

5.5 Test the Full Service Class (production, after billing has run)

Once billing has completed successfully for the week, all counts will be 0 and no PagerDuty alert will fire:

result = Monitoring::BillingHealthCheck.new(billing_week: BillingWeek.current).run
puts result
# => { unbilled_orders_count: 0, missing_billing_records_count: 0, failed_charges_count: 0, ... }

Common Gotchas

1. StatsD is Fire-and-Forget

UDP has no acknowledgment. If the agent isn’t running, your statsd.gauge() calls return normally with no error. Always verify the agent is reachable by checking for your metric in the Datadog UI after sending — don’t rely on exception-free code as proof of delivery.
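
For a rough pre-flight check, you can at least confirm an agent process is running on the target host. This is a heuristic sketch: dogstatsd itself is UDP (no handshake to probe), so it probes the trace agent's default TCP port (8126) on the same host as a proxy; it does not prove dogstatsd is listening on 8125.

```ruby
require 'socket'
require 'timeout'

# Heuristic reachability probe for the Datadog agent host. A successful TCP
# connect to the trace agent port suggests an agent is running there; a
# refused or timed-out connection suggests it is not. Never a substitute for
# verifying the metric in the Datadog UI.
def datadog_agent_reachable?(host = ENV.fetch('DD_AGENT_HOST', '127.0.0.1'), port: 8126, timeout: 1)
  Timeout.timeout(timeout) { TCPSocket.new(host, port).close }
  true
rescue StandardError
  false
end
```

Logging the result of this probe before emitting gauges gives you a breadcrumb when metrics silently fail to appear.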

2. Metric Volume vs Metric Explorer

  • Metric Volume (/metric/volume): Confirms Datadog received the metric. Good for first-time setup verification.
  • Metric Explorer (/metric/explorer): Lets you actually graph and analyze the metric over time. This is where you do your monitoring work.

3. Rescue Around Everything

Both emit_datadog_metrics and trigger_pagerduty should have rescue blocks. Your monitoring code must never crash your main business process: a job that fails to send its alert is bad, but a job that aborts because the alerting code raised an exception is worse.

def emit_datadog_metrics(results)
  # ... emit metrics
rescue => e
  Rails.logger.error("Failed to emit Datadog metrics: #{e.message}")
  # Do NOT re-raise — monitoring failure is never a reason to abort the job
end

4. Environment Parity for the Datadog Agent

In production the agent runs as a sidecar or daemon. In local development and staging, it often doesn’t. This is fine — just make sure your code uses ENV.fetch('DD_AGENT_HOST', '127.0.0.1') so the host is configurable per environment, and don’t be alarmed when staging metrics don’t appear in Datadog.

5. PagerDuty Dedup Keys Prevent Double-Paging

If your cron job or health check can run more than once for the same underlying issue (retry logic, manual reruns), always use a stable dedup_key tied to the resource and time period — not a timestamp. A timestamp-based key creates a new PagerDuty incident on every run.
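
A hypothetical helper illustrating the stable-key approach, derived from the ISO week being billed rather than the wall clock:

```ruby
require 'date'

# Dedup key tied to the billing period, not the clock. Every run for the
# same ISO billing week yields the same key, so PagerDuty folds repeated
# triggers into a single incident instead of paging on each rerun.
def billing_dedup_key(week_start_date)
  "billing-health-#{week_start_date.strftime('%G-W%V')}"
end
```

Re-running the check on Thursday and again on Friday for the same billing week pages once, not twice.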


Summary

| Concern | Tool | How |
| --- | --- | --- |
| Custom business metrics | Datadog StatsD | Datadog::Statsd#gauge via local agent (UDP) |
| APM / request tracing | Datadog ddtrace | Datadog.configure initializer |
| Metric visualization | Datadog Dashboards | Timeseries widgets per metric |
| Critical alert on failure | PagerDuty Events API v2 | Pagerduty::Wrapper + dedup key |
| Secondary notification | Google Chat / Slack webhook | HTTP POST to webhook URL |
| Scheduled execution | Cron + Rake | Shell script wrapping bundle exec rake |
| Cron liveness monitoring | Healthchecks.io | Ping before/after cron run |

Both integrations together give you a complete observability loop: your scheduled jobs run on time, emit metrics to Datadog for trending and analysis, and page the right engineer via PagerDuty the moment something goes wrong — before any customer notices.


Further Reading

Happy Integration!


Author: Abhilash

Hi, I’m Abhilash! A seasoned web developer with 15 years of experience specializing in Ruby and Ruby on Rails. Since 2010, I’ve built scalable, robust web applications and worked with frameworks like Angular, Sinatra, Laravel, Node.js, Vue and React. Passionate about clean, maintainable code and continuous learning, I share insights, tutorials, and experiences here. Let’s explore the ever-evolving world of web development together!
