Classic Performance Debugging Problems in Rails Apps 🔬 — Part 3: Advanced Techniques: Query Plans, Indexing, Profiling & Production Diagnostics

🧭 Overview — what we’ll cover

  • How to read and act on EXPLAIN ANALYZE output (Postgres) — with exact commands and examples.
  • Index strategy: b-tree, composite, INCLUDE, covering indexes, partials, and GIN (including trigram GIN via pg_trgm) where relevant.
  • Practical before/after for the Flipper join query.
  • Database-level tooling: pg_stat_statements, slow query logging, ANALYZE, vacuum, stats targets.
  • Advanced Rails-side profiling: CPU sampling (rbspy), Ruby-level profilers (stackprof, ruby-prof), flamegraphs, allocation profiling.
  • Memory profiling & leak hunting: derailed_benchmarks, memory_profiler, allocation tracing.
  • Production-safe profiling and APMs: Skylight, New Relic, Datadog, and guidelines for low-risk sampling.
  • Other advanced optimizations: connection pool sizing, backgrounding heavy work, keyset pagination, materialized views, denormalization, and caching patterns.
  • A checklist & playbook you can run when a high-traffic route is slow.

1) Deep dive: EXPLAIN ANALYZE (Postgres)

Why use it

`EXPLAIN` shows the planner's chosen plan. `EXPLAIN ANALYZE` runs the query and shows *actual* times and row counts. This is the single most powerful tool to understand why a query is slow.

Run it from psql

EXPLAIN ANALYZE
SELECT flipper_features.key AS feature_key,
       flipper_gates.key,
       flipper_gates.value
FROM flipper_features
LEFT OUTER JOIN flipper_gates
  ON flipper_features.key = flipper_gates.feature_key;

Or add verbosity, buffers and JSON output:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT ...;

Then pipe JSON to jq for readability:

psql -c "EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ..." | jq .

Run it from Rails console

res = ActiveRecord::Base.connection.execute(<<~SQL)
  EXPLAIN ANALYZE
  SELECT ...
SQL
puts res.values.flatten.join("\n")

`res.values.flatten` will give the lines of the textual plan.
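
Relations also expose `explain` directly; a minimal sketch, assuming a hypothetical `User` model (passing options through requires Rails 7.1+):

# Prints the database's EXPLAIN output for the query the relation would run
puts User.where(active: true).explain

# Rails 7.1+ forwards options to the database:
# puts User.where(active: true).explain(:analyze, :buffers)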

How to read the plan (key fields)

A typical node line:

Nested Loop (cost=0.00..123.45 rows=100 width=48) (actual time=0.123..5.678 rows=100 loops=1)

  • **Plan node**: e.g., Seq Scan, Index Scan, Nested Loop, Hash Join, Merge Join.
  • **cost=** planner estimates (startup..total). Not actual time.
  • **actual time=** real measured times: start..end. The end value for the top node is the total time.
  • **rows=** estimated rows; the actual row count appears in the `actual time` block. If estimates are very different from actuals → bad statistics or wrong assumptions.
  • **loops=** how many times the node ran (outer loop counts). Multiply loops × actual time to know the total work.
  • **Buffers** (if `BUFFERS` was requested) show disk vs shared-buffer I/O — important for I/O-bound queries.

Interpretation checklist

  • Is Postgres doing a `Seq Scan` on a table that should use an index? → candidate for an index.
  • Are `actual rows` much larger than `estimated rows`? → statistics are outdated (run `ANALYZE`) or the stats target is insufficient.
  • Is the planner using a `Nested Loop` with a large inner table and many outer loops? → you might need a different join strategy, indexes to support index scans, or a query rewrite.
  • High `buffers` read from disk → cold cache or I/O pressure. Consider tuning or adding indexes to reduce full scans, or faster disks/IO.


2) Indexing strategies — practical rules

B-tree indexes (default)

  • Good for equality (`=`) and range (`<`, `>`) queries and joins on scalar columns.
  • Add a single-column index when you join on that column often.

Migration example:

class AddIndexToFlipperGatesFeatureKey < ActiveRecord::Migration[7.0]
  def change
    add_index :flipper_gates, :feature_key, name: 'index_flipper_gates_on_feature_key'
  end
end

Composite index

  • Useful when a WHERE or JOIN uses multiple columns together, in order.
  • The left-most prefix rule: an index on `(a,b,c)` supports lookups on `a`, `a,b`, and `a,b,c` — not `b` alone (see the sketch below).
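
A quick sketch of the prefix rule, using a hypothetical composite index on an `orders` table:

CREATE INDEX index_orders_on_user_status_created
  ON orders (user_id, status, created_at);

-- Can use the index (left-most prefix):
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders WHERE user_id = 42 AND status = 'paid';

-- Cannot use it efficiently (skips the leading column):
SELECT * FROM orders WHERE status = 'paid';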

`INCLUDE` for covering indexes (Postgres)

  • Use `INCLUDE` to add non-key columns to the index payload so the planner can do an index-only scan.

add_index :orders, [:user_id, :created_at], include: [:total_amount]

This avoids a heap lookup for the included columns.

Partial indexes

  • Index only the subset of rows that queries often match:

add_index :users, :email, unique: true, where: "email IS NOT NULL" 

GIN / GiST indexes

  • For full-text search or array/JSONB columns: use GIN (or a trigram GIN index for `ILIKE` fuzzy matches).

  • Example: `CREATE INDEX ON table USING GIN (jsonb_col);`
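
For the trigram/`ILIKE` case, the index needs the `pg_trgm` extension; a sketch assuming a hypothetical `users.name` column:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX index_users_on_name_trgm ON users USING GIN (name gin_trgm_ops);

-- Queries like this can now use the index:
SELECT * FROM users WHERE name ILIKE '%smith%';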

Index maintenance

  • Run `ANALYZE` after large data loads to keep statistics fresh.
  • Consider `REINDEX` if index bloat occurs.
  • Use `pg_stat_user_indexes` to check index usage (see the query below).
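
To spot rarely used indexes, a query over the standard `pg_stat_user_indexes` view works:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
LIMIT 20;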


3) Example: Flipper join query — BEFORE & AFTER

Problem query (recap)

SELECT flipper_features.key AS feature_key,
       flipper_gates.key,
       flipper_gates.value
FROM flipper_features
LEFT OUTER JOIN flipper_gates
  ON flipper_features.key = flipper_gates.feature_key;

This was running repeatedly and slowly (60–200ms) in many requests.

Diagnosis

  • The `flipper_gates` table had a composite index `(feature_key, key, value)`. Because your join only used `feature_key`, Postgres sometimes didn't pick the composite index effectively, or the planner preferred a seq scan due to small table size or outdated stats.
  • Repetition (many calls to `Flipper.enabled?`) magnified the cost.

Fix 1 — Add a direct index on `feature_key`

Migration:

class AddIndexFlipperGatesOnFeatureKey < ActiveRecord::Migration[7.0]
  def change
    add_index :flipper_gates, :feature_key, name: 'index_flipper_gates_on_feature_key'
  end
end

Fix 2 — Optionally make it a covering index (if you select `key, value` often)

add_index :flipper_gates, :feature_key,
          name: 'index_flipper_gates_on_feature_key_include',
          using: :btree,
          include: [:key, :value]

This lets Postgres perform an index-only scan without touching the heap for `key` and `value`.

EXPLAIN ANALYZE before vs after (expected)

BEFORE (hypothetical):

Nested Loop
  -> Seq Scan on flipper_features (cost=...)
  -> Seq Scan on flipper_gates (cost=...)  <-- heavy
Actual Total Time: 120ms

AFTER:

Nested Loop
  -> Seq Scan on flipper_features (small)
  -> Index Scan using index_flipper_gates_on_feature_key on flipper_gates (cost=... actual time=0.2ms)
Actual Total Time: 1.5ms

Add EXPLAIN ANALYZE to your pipeline and confirm the plan uses Index Scan rather than Seq Scan.

Important note

On tiny tables, Postgres may still choose a Seq Scan (it's cheap), but when the query runs many times per request, even small scans add up. An index ensures stable, predictable behaviour as usage grows.


4) Database-level tools & monitoring

`pg_stat_statements` (must be enabled)

Aggregate query statistics (calls, total time). Great for finding heavy queries across the whole DB. Query example:

SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

On PostgreSQL 13+ the timing columns are `total_exec_time` and `mean_exec_time`. This points to the most expensive queries over time (not just a single slow execution).
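
Enabling it is a standard two-step process; the `shared_preload_libraries` change requires a server restart:

-- postgresql.conf (or ALTER SYSTEM), then restart Postgres:
--   shared_preload_libraries = 'pg_stat_statements'

-- Then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;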

Slow query logging

Enable `log_min_duration_statement` in `postgresql.conf` (e.g., 200ms) to log slow queries. Then analyze logs with `pgbadger` or `pg_activity`.
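
For example, to log anything slower than 200ms without editing the file by hand (standard Postgres commands, run as a superuser):

ALTER SYSTEM SET log_min_duration_statement = '200ms';
SELECT pg_reload_conf();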

`ANALYZE`, `VACUUM`

  • `ANALYZE` updates table statistics — it helps the planner choose better plans. Run it after bulk loads.
  • `VACUUM` frees up space and maintains the visibility map; `VACUUM FULL` locks the table — use carefully.

Lock and activity checks

See long-running queries and blocking:

SELECT pid, query, state, age(now(), query_start) AS runtime
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 seconds';


5) Ruby / Rails advanced profiling

You already use rack-mini-profiler. For CPU & allocation deep dives, combine sampling profilers and Ruby-level profilers.

Sampling profilers (production-safe-ish)

rbspy (an external sampling profiler for Ruby processes) — low overhead, no code changes:

# Attach to a running process (rbspy writes a flamegraph SVG by default):
rbspy record --pid <PID>

# Or spawn and profile a new process:
rbspy record -- ruby bin/rails server

rbspy samples the Ruby call stack from outside the process and generates a flamegraph. Good for CPU hotspots in production.

rbspy notes

  • Does not modify code; low overhead.
  • Requires installing rbspy on the host.

stackprof + flamegraph (Ruby-level)

Add to Gemfile (in safe envs):

gem 'stackprof'
gem 'flamegraph'

Run a block you want to profile:

require 'stackprof'

StackProf.run(mode: :wall, out: 'tmp/stackprof.dump', raw: true) do
  # run code you want to profile (a request, a job, etc)
end

# to read the dump (run from your shell):
#   stackprof tmp/stackprof.dump --text

# or generate a flamegraph with the flamegraph gem:
require 'flamegraph'
Flamegraph.generate('tmp/fg.svg') { your_code_here }

ruby-prof (detailed callgraphs)

Much higher overhead; generates call-graphs. Use in QA or staging, not production.

require 'ruby-prof'

RubyProf.start
# run code
result = RubyProf.stop

printer = RubyProf::GraphHtmlPrinter.new(result)
printer.print(File.open("tmp/ruby_prof.html", "w"), {})

Allocation profiling

Use `derailed_benchmarks` gem for bundle and memory allocations:

bundle exec derailed bundle:mem
bundle exec derailed exec perf:objects   # or perf:mem

The `memory_profiler` gem gives detailed allocations:

require 'memory_profiler'

report = MemoryProfiler.report { run_code }
report.pretty_print(to_file: 'tmp/memory_report.txt')

Flamegraphs for request lifecycles

You can capture a request lifecycle and render a flamegraph using stackprof or rbspy, then open the SVG.
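
In development, rack-mini-profiler can render a per-request flamegraph once stackprof is in the bundle; append `pp=flamegraph` to the URL (the path below is just an example):

# Gemfile (development group)
gem 'rack-mini-profiler'
gem 'stackprof'

# Then load any page with the query param:
#   http://localhost:3000/posts?pp=flamegraph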


6) Memory & leak investigations

Symptoms

  • Memory grows over time in production processes.
  • Frequent GC pauses.
  • OOM kills.

Tools

  • `derailed_benchmarks` (hotspots and gem bloat).
  • `memory_profiler` for allocation snapshots (see above).
  • The built-in `objspace` inspector (`ObjectSpace.each_object(Class)` helps count objects).
  • Heap dumps with `rbtrace` or `memory_profiler` for object graphs.

Common causes & fixes

  • Caching big objects in-process (use Redis instead).
  • Retaining references in global arrays or singletons.
  • Large temporary arrays in the request lifecycle — memoize or stream responses.

Example patterns to avoid

  • Avoid storing large AR model sets in global constants.
  • Use `find_each` to iterate large result sets (see the sketch below).
  • Use streaming responses for very large JSON/XML.
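
A minimal sketch of the `find_each` pattern, assuming a hypothetical `User` model:

# Iterates in batches of 1,000 instead of loading every row into memory at once
User.where(active: true).find_each(batch_size: 1_000) do |user|
  # process each user without retaining the whole result set
end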


7) Production profiling — safe practices & APMs

APMs

  • **Skylight / New Relic / Datadog / Scout** — they give per-endpoint timings, slow traces, and SQL breakdowns in production with low overhead. Use them to find hotspots without heavy manual profiling.

Sampling vs continuous profiling

  • Use *sampling* profilers (rbspy, production profilers) in short windows to avoid high overhead.
  • Continuous APM tracing (like New Relic) integrates naturally and is production-friendly.

Instrument carefully

  • Only enable heavy profiling when you have a plan; capture for short durations.
  • Prefer off-peak hours or blue/green deployments to avoid affecting users.


8) Other advanced DB & Rails optimizations

Connection pool tuning

Puma workers & threads must match the DB pool size. Example `database.yml`:

production:
  pool: <%= ENV.fetch("DB_POOL", 5) %>

If Puma threads > DB pool, requests will block waiting for a DB connection — this can appear as slow requests.

Background jobs

  • Anything non-critical to request latency (e.g., sending emails, analytics, resizing images) should be moved to background jobs (Sidekiq, ActiveJob).
  • Synchronous mailers or external API calls are common causes of slow requests.

Keyset pagination (avoid OFFSET)

For large result sets use keyset pagination:

SELECT * FROM posts
WHERE (created_at, id) < (?, ?)
ORDER BY created_at DESC, id DESC
LIMIT 20;

This is far faster than `OFFSET` for deep pages.

Materialized views for heavy aggregations

Pre-compute heavy joins/aggregates into materialized views and refresh them periodically or via triggers.

Denormalization & caching

  • Counter caches: store counts in a column and update via callbacks to avoid COUNT(*) queries (see the sketch after this list).
  • Cache pre-rendered fragments or computed JSON blobs for heavy pages (with care about invalidation).
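
A minimal counter-cache sketch, assuming hypothetical Post/Comment models:

# Migration: add the counter column (convention: <association>_count)
class AddCommentsCountToPosts < ActiveRecord::Migration[7.0]
  def change
    add_column :posts, :comments_count, :integer, default: 0, null: false
  end
end

# Model: Rails keeps posts.comments_count up to date on create/destroy
class Comment < ApplicationRecord
  belongs_to :post, counter_cache: true
end

# post.comments_count now reads a column instead of running COUNT(*)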


9) Serialization & JSON performance

Problems

  • Serializing huge AR objects or many associations can be expensive.

Solutions

  • Use serializers that only include necessary fields: `fast_jsonapi` (jsonapi-serializer) or `JBuilder` with a simple `as_json(only: …)`.
  • Return minimal payloads and paginate.
  • Use `pluck` when you only need a few columns (see the sketch below).
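
A quick sketch of the `pluck` point, assuming a hypothetical `User` model:

# Instantiates full ActiveRecord objects just to read two attributes
User.where(active: true).map { |u| [u.id, u.email] }

# Selects only those columns and skips model instantiation entirely
User.where(active: true).pluck(:id, :email)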


10) Playbook: step-by-step when a route is slow (quick reference)

  1. Reproduce the slow request locally or in staging if possible.
  2. Tail the logs (tail -f log/production.log) and check SQL statements and controller timings.
  3. Run EXPLAIN (ANALYZE, BUFFERS) for suspect queries.
  4. If Seq Scan appears where you expect an index, add or adjust indexes. Run ANALYZE.
  5. Check for N+1 queries with Bullet or rack-mini-profiler and fix with includes.
  6. If many repeated small DB queries (Flipper-like), add caching (Redis or adapter-specific cache) or preloading once per request.
  7. If CPU-bound, collect a sampling profile (rbspy) for 30–60s and generate a flamegraph — find hot Ruby methods. Use stackprof for a deeper dive.
  8. If memory-bound, run memory_profiler or derailed, find object retainers.
  9. If urgent and unknown, turn on APM traces for a short window to capture slow traces in production.
  10. After changes, run a load test (k6, wrk; see the example below) if at scale, and monitor pg_stat_statements to confirm the improvement.
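
For example, a basic wrk run against a staging endpoint (URL, thread, and connection counts are placeholders):

# 4 threads, 64 connections, for 30 seconds against the slow route
wrk -t4 -c64 -d30s https://staging.example.com/slow_route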

11) Example commands and snippets (cheat-sheet)

EXPLAIN ANALYZE psql

psql -d mydb -c "EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;" | jq .

EXPLAIN from Rails console

res = ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE SELECT ...")
puts res.values.flatten.join("\n")

Add index migration

class AddIndexFlipperGatesOnFeatureKey < ActiveRecord::Migration[7.0]
  def change
    add_index :flipper_gates, :feature_key, name: 'index_flipper_gates_on_feature_key'
  end
end

ANALYZE

ANALYZE flipper_gates;
ANALYZE flipper_features;

pg_stat_statements

SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC LIMIT 10;
-- On PostgreSQL 13+, use total_exec_time / mean_exec_time instead

rbspy recording (system-level sampling)

# record writes a flamegraph SVG (plus a raw data file that can be re-processed later)
rbspy record --pid <PID>
# regenerate a report from a previously saved raw data file
rbspy report --format flamegraph --input <raw_file> --output /tmp/flame.svg

stackprof example

StackProf.run(mode: :cpu, out: 'tmp/stackprof.dump') do
  # exercise the app code, e.g. issue a request
end
stackprof tmp/stackprof.dump --text

memory_profiler

report = MemoryProfiler.report { run_heavy_code }
report.pretty_print
report.write_html('/tmp/memory_report.html') if report.respond_to?(:write_html)

Check DB connections

SELECT count(*) FROM pg_stat_activity WHERE state='active';


12) Final notes & best practices

  • Measure before you change. Always capture a baseline (EXPLAIN ANALYZE, APM traces).
  • Small fixes add up — one index + reducing a few N+1s often reduces response times dramatically.
  • Cache with thought — caching solves many repeated-read problems (the Flipper example), but invalidation must be correct. Use namespaced keys and hook invalidation to mutations.
  • Use the right tool for the job: sampling profilers for CPU hotspots; EXPLAIN ANALYZE for DB; memory tools for leaks.
  • Automate monitoring — pg_stat_statements, APMs, dashboards, and alerts let you spot regressions early.

If you want, I’ll now:

  • Generate a ready-to-run EXPLAIN ANALYZE before/after script for your Flipper query and the index migration, or
  • Provide the exact commands to run in staging to capture a stackprof or rbspy flamegraph for a slow request (and a sample SVG), or
  • Draft a one-page playbook you can paste in a team wiki for on-call performance steps.

Which of those would you like me to produce next?

Profiling 📊 Ruby on Rails 8 Applications: Essential Tools and Techniques

Introduction

Performance optimization is critical for delivering fast, responsive Rails applications. This comprehensive guide covers the most important profiling tools you should implement in your Rails 8 application, complete with setup instructions and practical examples.

Why Profiling Matters

Before diving into tools, let’s understand why profiling is essential:

  1. Identify bottlenecks: Pinpoint exactly which parts of your application are slowing things down
  2. Optimize resource usage: Reduce memory consumption and CPU usage
  3. Improve user experience: Faster response times lead to happier users
  4. Reduce infrastructure costs: Efficient applications require fewer server resources

Essential Profiling Tools for Rails 8

1. Rack MiniProfiler

What it does: Provides real-time profiling of your application’s performance directly in your browser.

Why it’s important: It’s the quickest way to see performance metrics without leaving your development environment.

Installation:

# Gemfile
gem 'rack-mini-profiler', group: :development

Usage example:
After installation, it automatically appears in your browser’s corner showing:

  • SQL query times
  • Ruby execution time
  • Memory allocation
  • Flamegraphs (with additional setup)

Advantages:

  • No configuration needed for basic setup
  • Shows N+1 query warnings
  • Integrates with Rails out of the box
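
If you later want to expose it to admins in production, rack-mini-profiler is allow-list based; a sketch (the `current_user&.admin?` check is an assumption about your auth setup):

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_action do
    Rack::MiniProfiler.authorize_request if current_user&.admin?
  end
end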

GitHub: https://github.com/MiniProfiler/rack-mini-profiler
Documentation: https://miniprofiler.com/

2. Bullet

What it does: Detects N+1 queries, unused eager loading, and missing counter caches.

Why it’s important: N+1 queries are among the most common performance issues in Rails applications.

Installation:

# Gemfile
gem 'bullet', group: :development

Configuration:

# config/environments/development.rb
config.after_initialize do
  Bullet.enable = true
  Bullet.alert = true
  Bullet.bullet_logger = true
  Bullet.console = true
  Bullet.rails_logger = true
end

Example output:

GET /posts
USE eager loading detected
  Post => [:comments]
  Add to your query: Post.includes([:comments])

Advantages:

  • Catches common ORM performance issues early
  • Provides specific recommendations for fixes
  • Works across all environments

GitHub: https://github.com/flyerhzm/bullet
Documentation: https://github.com/flyerhzm/bullet/blob/master/README.md

3. Ruby Prof (and StackProf)

What it does: Low-level Ruby code profiler that shows exactly where time is being spent.

Why it’s important: When you need deep insight into method-level performance characteristics.

Installation:

# Gemfile
gem 'ruby-prof', group: :development
gem 'stackprof', group: :development

Usage example:

# In your controller or service object
result = RubyProf.profile do
  # Code you want to profile
end

printer = RubyProf::GraphPrinter.new(result)
printer.print(STDOUT, {})

For StackProf:

StackProf.run(mode: :cpu, out: 'tmp/stackprof.dump') do
  # Code to profile
end
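
To inspect the dump, the `stackprof` CLI that ships with the gem prints a summary or drills into a single method (the method name below is just an example):

stackprof tmp/stackprof.dump --text
stackprof tmp/stackprof.dump --method 'PostsController#index'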

Advantages:

  • Method-level granularity
  • Multiple output formats (call graphs, flamegraphs)
  • StackProf is sampling-based so has lower overhead

GitHub: https://github.com/ruby-prof/ruby-prof
Documentation: https://github.com/ruby-prof/ruby-prof/blob/master/README.md

StackProf Alternative:
GitHub: https://github.com/tmm1/stackprof
Documentation: https://github.com/tmm1/stackprof/blob/master/README.md

4. Memory Profiler

What it does: Tracks memory allocations and helps identify memory bloat.

Why it’s important: Memory issues can lead to slow performance and even crashes.

Installation:

# Gemfile
gem 'memory_profiler', group: :development

Usage example:

report = MemoryProfiler.report do
  # Code to profile
end

report.pretty_print(to_file: 'memory_report.txt')

Advantages:

  • Shows allocated objects by class and location
  • Tracks retained memory after GC
  • Helps find memory leaks

GitHub: https://github.com/SamSaffron/memory_profiler
Documentation: https://github.com/SamSaffron/memory_profiler/blob/master/README.md

5. Skylight

What it does: Production-grade application performance monitoring (APM).

Why it’s important: Understanding real-world performance characteristics is different from development profiling.

Installation:

# Gemfile
gem 'skylight'

Configuration:

# config/skylight.yml
production:
  authentication: [YOUR_AUTH_TOKEN]

Advantages:

  • Low-overhead production profiling
  • Endpoint-level performance breakdowns
  • Database query analysis
  • Exception tracking

Website: https://www.skylight.io
Documentation: https://docs.skylight.io
GitHub: https://github.com/skylightio/skylight-ruby

6. AppSignal

What it does: Full-stack performance monitoring and error tracking.

Why it’s important: Provides comprehensive insights across your entire application stack.

Installation:

# Gemfile
gem 'appsignal'

Then run:

bundle exec appsignal install YOUR_PUSH_API_KEY

Advantages:

  • Error tracking alongside performance
  • Host metrics integration
  • Background job monitoring
  • Magic Dashboard for quick insights

Website: https://appsignal.com
Documentation: https://docs.appsignal.com/ruby
GitHub: https://github.com/appsignal/appsignal-ruby

7. Derailed Benchmarks

What it does: Suite of benchmarks and performance tests for your application.

Why it’s important: Helps catch performance regressions before they hit production.

Installation:

# Gemfile
group :development, :test do
  gem 'derailed_benchmarks'
end

Usage examples:

# Memory usage at boot
bundle exec derailed bundle:mem

# Requests against a single endpoint (set PATH_TO_HIT to choose the route)
bundle exec derailed exec perf:test

Advantages:

  • CI-friendly performance testing
  • Memory usage analysis
  • Route-based performance testing

GitHub: https://github.com/schneems/derailed_benchmarks
Documentation: https://github.com/schneems/derailed_benchmarks/blob/master/README.md

8. Flamegraph Generation

What it does: Visual representation of where time is being spent in your application.

Why it’s important: Provides an intuitive way to understand call stacks and hot paths.

Installation:

# Gemfile
gem 'flamegraph'
gem 'stackprof' # if not already installed

Usage example:

Flamegraph.generate('flamegraph.html') do
  # Code to profile
end

Advantages:

  • Visual representation of performance
  • Easy to spot hot paths
  • Interactive exploration

GitHub: https://github.com/SamSaffron/flamegraph
Documentation: http://samsaffron.github.io/flamegraph/rails-startup.html

Additional Helpful Tools 🔧

9. Benchmark-ips

Benchmark-ips (iterations per second) is a superior benchmarking tool compared to Ruby’s standard Benchmark library. It provides:

  1. Iterations-per-second measurement – More intuitive than raw time measurements
  2. Statistical analysis – Shows standard deviation between runs
  3. Comparison mode – Easily compare different implementations
  4. Warmup phase – Accounts for JIT and cache warming effects

Benchmark-ips is particularly valuable for:

  • Comparing algorithm implementations
  • Testing performance optimizations
  • Benchmarking gem alternatives
  • Validating performance-critical code

GitHub: https://github.com/evanphx/benchmark-ips
Documentation: https://github.com/evanphx/benchmark-ips/blob/master/README.md

Installation

# Gemfile
gem 'benchmark-ips', group: :development

Basic Usage:

require 'benchmark/ips'

Benchmark.ips do |x|
  x.report("addition") { 1 + 2 }
  x.report("addition with to_s") { (1 + 2).to_s }
  x.compare!
end

Advanced Features:

Benchmark.ips do |x|
  x.time = 5 # Run each benchmark for 5 seconds
  x.warmup = 2 # Warmup time of 2 seconds
  
  x.report("Array#each") { [1,2,3].each { |i| i * i } }
  x.report("Array#map") { [1,2,3].map { |i| i * i } }
  
  # Add custom statistics
  x.config(stats: :bootstrap, confidence: 95)
  
  x.compare!
end

# Memory measurement (requires the separate benchmark-memory gem)
require 'benchmark/memory'

Benchmark.memory do |x|
  x.report("method1") { ... }
  x.report("method2") { ... }
  x.compare!
end

# Disable GC for more consistent results
# (GCSuite here is a small custom suite class you define yourself; see the benchmark-ips README)
Benchmark.ips do |x|
  x.config(time: 5, warmup: 2, suite: GCSuite.new)
end

Sample Output:

Warming up --------------------------------------
            addition    281.899k i/100ms
  addition with to_s    261.831k i/100ms
Calculating -------------------------------------
            addition      8.614M (± 1.2%) i/s -     43.214M in   5.015800s
  addition with to_s      7.017M (± 1.8%) i/s -     35.347M in   5.038446s

Comparison:
            addition:  8613594.0 i/s
  addition with to_s:  7016953.3 i/s - 1.23x slower

Key Advantages

  1. Accurate comparisons with statistical significance
  2. Warmup phase eliminates JIT/caching distortions
  3. Memory measurements available through extensions
  4. Customizable reporting with various statistics options

10. Rails Performance (Dashboard)

What is Rails Performance?

Rails Performance is a self-hosted alternative to New Relic/Skylight that provides:

  1. Request performance tracking
  2. Background job monitoring
  3. Slowest endpoints identification
  4. Error tracking
  5. Custom event monitoring

Why It’s Important

For teams that:

  • Can’t use commercial SaaS solutions
  • Need to keep performance data in-house
  • Want historical performance tracking
  • Need simple setup without complex infrastructure

GitHub: https://github.com/igorkasyanchuk/rails_performance
Documentation: https://github.com/igorkasyanchuk/rails_performance/blob/master/README.md

Installation

# Gemfile
gem 'rails_performance', group: :development

Then run:

rails g rails_performance:install
rake db:migrate

Configuration

# config/initializers/rails_performance.rb
RailsPerformance.setup do |config|
  config.redis = Redis.new # optional, will use Rails.cache otherwise
  config.duration = 4.hours # store requests for 4 hours
  config.enabled = Rails.env.production?
  config.http_basic_authentication_enabled = true
  config.http_basic_authentication_user_name = 'admin'
  config.http_basic_authentication_password = 'password'
end

Accessing the Dashboard:

After installation, access the dashboard at:

http://localhost:3000/rails/performance

Custom Tracking:

# Track custom events
RailsPerformance.trace("custom_event", tags: { type: "import" }) do
  # Your code here
end

# Track background jobs
class MyJob < ApplicationJob
  around_perform do |job, block|
    RailsPerformance.trace(job.class.name, tags: job.arguments) do
      block.call
    end
  end
end
# Add custom fields to requests
RailsPerformance.attach_extra_payload do |payload|
  payload[:user_id] = current_user.id if current_user
end

# Track slow queries
ActiveSupport::Notifications.subscribe("sql.active_record") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  if event.duration > 100 # ms
    RailsPerformance.trace("slow_query", payload: {
      sql: event.payload[:sql],
      duration: event.duration
    })
  end
end

Sample Dashboard Views:

  1. Requests Overview:
    • Average response time
    • Requests per minute
    • Slowest actions
  2. Detailed Request View:
    • SQL queries breakdown
    • View rendering time
    • Memory allocation
  3. Background Jobs:
    • Job execution time
    • Failures
    • Queue times

Key Advantages

  1. Self-hosted solution – No data leaves your infrastructure
  2. Simple setup – No complex dependencies
  3. Historical data – Track performance over time
  4. Custom events – Track any application events
  5. Background jobs – Full visibility into async processes

Implementing a Complete Profiling Strategy

For a comprehensive approach, combine these tools at different stages:

  1. Development:
    • Rack MiniProfiler (always on)
    • Bullet (catch N+1s early)
    • RubyProf/StackProf (for deep dives)
  2. CI Pipeline:
    • Derailed Benchmarks
    • Memory tests
  3. Production:
    • Skylight or AppSignal
    • Error tracking with performance context

Sample Rails 8 Configuration

Here’s how to set up a complete profiling environment in a new Rails 8 app:

# Gemfile

# Development profiling
group :development do
  # Basic profiling
  gem 'rack-mini-profiler'
  gem 'bullet'
  
  # Deep profiling
  gem 'ruby-prof'
  gem 'stackprof'
  gem 'memory_profiler'
  gem 'flamegraph'
  
  # Benchmarking
  gem 'derailed_benchmarks', require: false
  gem 'benchmark-ips'
  
  # Dashboard
  gem 'rails_performance'
end

# Production monitoring (choose one)
group :production do
  gem 'skylight'
  # or
  gem 'appsignal'
  # or
  gem 'newrelic_rpm' # Alternative option
end

Then create an initializer for development profiling:

# config/initializers/profiling.rb
if Rails.env.development?
  require 'rack-mini-profiler'
  Rack::MiniProfilerRails.initialize!(Rails.application)

  Rails.application.config.after_initialize do
    Bullet.enable = true
    Bullet.alert = true
    Bullet.bullet_logger = true
    Bullet.rails_logger = true
  end
end

Conclusion

Profiling your Rails 8 application shouldn’t be an afterthought. By implementing these tools throughout your development lifecycle, you’ll catch performance issues early, maintain a fast application, and provide better user experiences.

Remember:

  • Use development tools like MiniProfiler and Bullet daily
  • Run deeper profiles with RubyProf before optimization work
  • Monitor production with Skylight or AppSignal
  • Establish performance benchmarks with Derailed

With this toolkit, you’ll be well-equipped to build and maintain high-performance Rails 8 applications.

Enjoy Rails! 🚀