🧭 Overview — what we’ll cover
- How to read and act on `EXPLAIN ANALYZE` output (Postgres) — with exact commands and examples.
- Index strategy: b-tree, composite, `INCLUDE`, covering indexes, partials, GIN / trigram GIN where relevant.
- Practical before/after for the Flipper join query.
- Database-level tooling: `pg_stat_statements`, slow query logging, `ANALYZE`, vacuum, stats targets.
- Advanced Rails-side profiling: CPU sampling (rbspy), Ruby-level profilers (stackprof, ruby-prof), flamegraphs, allocation profiling.
- Memory profiling & leak hunting: derailed_benchmarks, memory_profiler, allocation tracing.
- Production-safe profiling and APMs: Skylight, New Relic, Datadog, and guidelines for low-risk sampling.
- Other advanced optimizations: connection pool sizing, backgrounding heavy work, keyset pagination, materialized views, denormalization, and caching patterns.
- A checklist & playbook you can run when a high-traffic route is slow.
1) Deep dive: EXPLAIN ANALYZE (Postgres)
Why use it
`EXPLAIN` shows the planner's chosen plan. `EXPLAIN ANALYZE` runs the query and shows *actual* times and row counts. This is the single most powerful tool to understand why a query is slow.
Run it from psql
EXPLAIN ANALYZE
SELECT flipper_features.key AS feature_key,
       flipper_gates.key,
       flipper_gates.value
FROM flipper_features
LEFT OUTER JOIN flipper_gates
  ON flipper_features.key = flipper_gates.feature_key;
Or add verbosity, buffers and JSON output:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT ...;
Then pipe JSON to jq for readability:
psql -c "EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ..." | jq .
Run it from Rails console
res = ActiveRecord::Base.connection.execute(<<~SQL)
  EXPLAIN ANALYZE
  SELECT ...
SQL
puts res.values.flatten.join("\n")
`res.values.flatten` will give the lines of the textual plan.
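ActiveRecord can also emit plans straight from a relation. A minimal sketch (Model is a placeholder for one of your models; passing options such as `:analyze` to `explain` requires Rails 7.1 or newer, so check your version):

# Rails console sketch — Model and feature_key are placeholders for your own model/column.
puts Model.where(feature_key: "search").explain
# On Rails 7.1+ the options are forwarded to Postgres EXPLAIN:
# puts Model.where(feature_key: "search").explain(:analyze, :buffers)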
How to read the plan (key fields)
A typical node line:

Nested Loop (cost=0.00..123.45 rows=100 width=48) (actual time=0.123..5.678 rows=100 loops=1)

- **Plan node**: e.g., Seq Scan, Index Scan, Nested Loop, Hash Join, Merge Join.
- **cost=** planner estimates (startup..total). Not actual time.
- **actual time=** real measured times: start..end. The end value on the top node is the total query time.
- **rows=** in the cost block is the estimated row count; the rows value inside the `actual time` block is what really came back. If estimates are very different from actuals → bad statistics or wrong assumptions.
- **loops=** how many times the node ran (outer loop count). Multiply actual time per loop × loops to know the node's total work (e.g., 0.05 ms per loop × 10,000 loops = 500 ms).
- **Buffers** (if `BUFFERS` requested) show disk vs shared-buffer I/O — important for I/O-bound queries.

Interpretation checklist
- Is Postgres doing a `Seq Scan` on a table that should use an index? → candidate for an index.
- Are `actual rows` much larger than estimated rows? → statistics outdated (`ANALYZE`) or stats target insufficient.
- Is the planner using a `Nested Loop` with a large inner table and many outer loops? → might need a different join strategy, indexes to support index scans on the inner side, or a query rewrite.
- High `buffers` read from disk → cold cache or I/O pressure. Consider adding indexes to reduce full scans, tuning shared buffers, or faster disks/IO.
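When a plan stubbornly shows a Seq Scan and you want to know whether Postgres could use an index at all, one experiment is to disable seq scans for your session and compare plans. A sketch (the query is a placeholder; `enable_seqscan` is a standard planner setting meant for experimentation, not production config):

# Rails console sketch: compare the default plan with one where seq scans are disallowed.
conn = ActiveRecord::Base.connection
sql  = "SELECT * FROM flipper_gates WHERE feature_key = 'search'"  # placeholder query

default_plan = conn.execute("EXPLAIN ANALYZE #{sql}").values.flatten.join("\n")
conn.execute("SET enable_seqscan = off")   # session-local; affects only this connection
forced_plan  = conn.execute("EXPLAIN ANALYZE #{sql}").values.flatten.join("\n")
conn.execute("RESET enable_seqscan")

puts "--- default ---", default_plan, "--- seqscan disabled ---", forced_plan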
2) Indexing strategies — practical rules
B-tree indexes (default)
- Good for equality (`=`) and range (`<`, `>`) queries and joins on scalar columns.
- Add a single-column index when you join on that column often.
Migration example:
class AddIndexToFlipperGatesFeatureKey < ActiveRecord::Migration[7.0]
def change
add_index :flipper_gates, :feature_key, name: 'index_flipper_gates_on_feature_key'
end
end
Composite index
- Useful when WHERE or JOIN uses multiple columns together, in order.
- The left-most prefix rule: an index on `(a, b, c)` supports lookups on `a`, `a,b`, and `a,b,c` — not `b` alone.

INCLUDE for covering indexes (Postgres)
- Use `INCLUDE` to add non-key columns to the index payload so the planner can do an index-only scan.

add_index :orders, [:user_id, :created_at], include: [:total_amount]

This avoids heap lookups for the included columns. (The `include:` option needs a fairly recent Rails; on older versions use raw SQL: `CREATE INDEX ... ON orders (user_id, created_at) INCLUDE (total_amount)`.)

Partial indexes
- Index only the subset of rows a condition frequently matches:
add_index :users, :email, unique: true, where: "email IS NOT NULL"
GIN / GIST indexes
- For full-text search or array/JSONB: use GIN (or trigram GIN for `ILIKE` fuzzy matches).
- Example: `CREATE INDEX ON table USING GIN (jsonb_col);`
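A hedged migration sketch of both cases (table and column names are made up; the `opclass:` option needs a reasonably recent Rails, and the pg_trgm extension must be available on the server, otherwise fall back to raw CREATE INDEX statements):

class AddGinIndexes < ActiveRecord::Migration[7.0]
  def change
    enable_extension "pg_trgm"                                         # needed for trigram matching
    add_index :events, :payload, using: :gin                          # jsonb containment queries (@>)
    add_index :users,  :name,    using: :gin, opclass: :gin_trgm_ops  # speeds up ILIKE '%term%' searches
  end
end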
Index maintenance
- Run `ANALYZE` after large data loads to keep statistics fresh.
- Consider `REINDEX` if index bloat occurs.
- Use `pg_stat_user_indexes` to check index usage.
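A small console sketch for the `pg_stat_user_indexes` check (read-only; the column aliases are arbitrary). Indexes with `idx_scan` near zero are candidates for removal, since they cost writes but serve no reads:

# Rails console sketch: least-used indexes first.
rows = ActiveRecord::Base.connection.select_all(<<~SQL)
  SELECT relname AS table_name, indexrelname AS index_name, idx_scan
  FROM pg_stat_user_indexes
  ORDER BY idx_scan ASC
  LIMIT 20
SQL
rows.each { |r| puts format("%-30s %-50s scans=%s", r["table_name"], r["index_name"], r["idx_scan"]) }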
3) Example: Flipper join query — BEFORE & AFTER
Problem query (recap)
SELECT flipper_features.key AS feature_key,
       flipper_gates.key,
       flipper_gates.value
FROM flipper_features
LEFT OUTER JOIN flipper_gates
  ON flipper_features.key = flipper_gates.feature_key;
This query ran repeatedly and was slow (60–200 ms) across many requests.
Diagnosis
- The `flipper_gates` table had a composite index `(feature_key, key, value)`. Because the join only used `feature_key`, Postgres sometimes didn't pick the composite index effectively, or the planner preferred a seq scan due to the small table size or outdated stats.
- Repetition (many calls to `Flipper.enabled?`) magnified the cost.
Fix 1 — Add a direct index on `feature_key`
Migration:

class AddIndexFlipperGatesOnFeatureKey < ActiveRecord::Migration[7.0]
  def change
    add_index :flipper_gates, :feature_key, name: 'index_flipper_gates_on_feature_key'
  end
end
Fix 2 — Optionally make it a covering index (if you select `key, value` often)
add_index :flipper_gates, :feature_key,
          name: 'index_flipper_gates_on_feature_key_include',
          using: :btree,
          include: [:key, :value]

This lets Postgres perform an index-only scan without touching the heap for `key` and `value`.
EXPLAIN ANALYZE before vs after (expected)
BEFORE (hypothetical):
Nested Loop
-> Seq Scan on flipper_features (cost=...)
-> Seq Scan on flipper_gates (cost=...) <-- heavy
Actual Total Time: 120ms
AFTER:
Nested Loop
-> Seq Scan on flipper_features (small)
-> Index Scan using index_flipper_gates_on_feature_key on flipper_gates (cost=... actual time=0.2ms)
Actual Total Time: 1.5ms
Re-run EXPLAIN ANALYZE after adding the index and confirm the plan uses an Index Scan rather than a Seq Scan.
Important note
On tiny tables Postgres may still choose a Seq Scan (it's cheap), but when the query runs many times per request even small scans add up. An index ensures stable, predictable behaviour as usage grows.
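Independently of the index, the repeated `Flipper.enabled?` calls can be collapsed to at most one adapter read per request with Flipper's memoizing middleware. A sketch only; option names have shifted between flipper versions (`preload` vs `preload_all`), so check the docs for the version you run:

# config/initializers/flipper.rb — minimal sketch
# Memoizes adapter reads for the duration of each request; with preload enabled,
# all features are fetched once up front instead of one query per Flipper.enabled? call.
Rails.application.config.middleware.use Flipper::Middleware::Memoizer, preload: true

Newer flipper releases may insert this middleware automatically; `bin/rails middleware` shows whether it is already in the stack.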
4) Database-level tools & monitoring
pg_stat_statements (must be enabled)
Aggregate query statistics (calls, total time). Great for finding heavy queries across the whole DB. Enable it by adding `pg_stat_statements` to `shared_preload_libraries` and running `CREATE EXTENSION pg_stat_statements;`. Query example:

SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

This points to the most expensive queries over time (not just a single slow execution).
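Note that PostgreSQL 13 renamed the timing columns (`total_time`/`mean_time` became `total_exec_time`/`mean_exec_time`). A console sketch that tolerates either naming (the output formatting is arbitrary):

# Rails console sketch: top queries by total execution time, regardless of PG version.
conn  = ActiveRecord::Base.connection
cols  = conn.select_values("SELECT column_name FROM information_schema.columns WHERE table_name = 'pg_stat_statements'")
total = cols.include?("total_exec_time") ? "total_exec_time" : "total_time"
mean  = cols.include?("mean_exec_time")  ? "mean_exec_time"  : "mean_time"

conn.select_all("SELECT query, calls, #{total} AS total_ms, #{mean} AS mean_ms " \
                "FROM pg_stat_statements ORDER BY #{total} DESC LIMIT 20")
    .each { |r| puts format("%10.1f ms total  x%-7s %s", r["total_ms"].to_f, r["calls"], r["query"].to_s[0, 90]) }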
Slow query logging
Enable `log_min_duration_statement` in `postgresql.conf` (e.g., 200ms) to log slow queries. Then analyze logs with `pgbadger` or `pg_activity`.
`ANALYZE`, `VACUUM`
- `ANALYZE` updates table statistics — helps the planner choose better plans. Run it after bulk loads.
- `VACUUM` frees up space and maintains the visibility map; `VACUUM FULL` locks the table — use it carefully.
Lock and activity checks
See long-running queries and blocking:
SELECT pid, query, state, age(now(), query_start) AS runtime
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 seconds';
5) Ruby / Rails advanced profiling
You already use rack-mini-profiler. For CPU & allocation deep dives, combine sampling profilers and Ruby-level profilers.
Sampling profilers (production-safe-ish)
rbspy (a sampling profiler for Ruby processes) — low overhead, no code changes:

rbspy record --pid <PID>

rbspy samples the Ruby call stack from outside the process and writes a flamegraph SVG by default; `rbspy report` converts saved raw data into other formats, and `rbspy record --help` shows how to launch and profile a command directly. Good for CPU hotspots in production.
rbspy notes
- Does not modify code; low overhead.
- Requires installing rbspy on the host (attaching to another user's process usually needs elevated privileges).
stackprof + flamegraph (Ruby-level)
Add to Gemfile (in safe envs):
gem 'stackprof'
gem 'flamegraph'
Run a block you want to profile:
require 'stackprof'
StackProf.run(mode: :wall, out: 'tmp/stackprof.dump', raw: true) do
# run code you want to profile (a request, a job, etc)
end
# to read
stackprof tmp/stackprof.dump --text
# or generate flamegraph with stackprof or use flamegraph gem:
require 'flamegraph'
Flamegraph.generate('tmp/fg.svg') { your_code_here }
ruby-prof (detailed callgraphs)
Much higher overhead; generates call-graphs. Use in QA or staging, not production.
require 'ruby-prof'

RubyProf.start
# run code
result = RubyProf.stop

printer = RubyProf::GraphHtmlPrinter.new(result)
printer.print(File.open("tmp/ruby_prof.html", "w"), {})
Allocation profiling
Use the `derailed_benchmarks` gem for bundle bloat and allocation hotspots:

bundle exec derailed bundle:mem
bundle exec derailed exec perf:objects   # see `derailed exec` for the other perf tasks

The `memory_profiler` gem gives detailed allocation reports:

require 'memory_profiler'

report = MemoryProfiler.report { run_code }
report.pretty_print(to_file: 'tmp/memory_report.txt')
Flamegraphs for request lifecycles
You can capture a request lifecycle and render a flamegraph using stackprof or rbspy, then open SVG.
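One way to wire that up is an around_action that wraps a single request in StackProf when a query param is present. A sketch for development/staging only; the param name and dump path are arbitrary choices, not an established convention:

# app/controllers/application_controller.rb — minimal sketch
class ApplicationController < ActionController::Base
  around_action :profile_request, if: -> { !Rails.env.production? && params[:profile] == "1" }

  private

  def profile_request
    require "stackprof"
    out = Rails.root.join("tmp", "stackprof-#{controller_name}-#{action_name}-#{Time.now.to_i}.dump")
    StackProf.run(mode: :wall, out: out.to_s, raw: true) { yield }
    # Inspect afterwards with: stackprof tmp/stackprof-*.dump --text
  end
end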
6) Memory & leak investigations
Symptoms
- Memory grows over time in production processes.
- Frequent GC pauses.
- OOM kills.
Tools
- `derailed_benchmarks` (hotspots and gem bloat).
- `memory_profiler` for allocation snapshots (see above).
- The built-in `objspace` inspector (`ObjectSpace.each_object(Class)` helps count objects).
- Heap dumps with `rbtrace` or `memory_profiler` for object graphs.

Common causes & fixes
- Caching big objects in-process (use Redis instead).
- Retaining references in global arrays or singletons.
- Large temporary arrays in the request lifecycle — memoize or stream responses.

Example patterns to avoid
- Avoid storing large AR model sets in global constants.
- Use `find_each` to iterate large result sets.
- Use streaming responses for very large JSON/XML.
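For a quick, dependency-free sanity check of what a suspect block retains, an `ObjectSpace.count_objects` diff points at the object types that grow (`run_suspect_code` is a placeholder; counts are approximate and include interpreter internals):

# Console sketch: object-count delta across a suspect block.
GC.start
before = ObjectSpace.count_objects
run_suspect_code                      # placeholder for the code under suspicion
GC.start                              # collect garbage so only retained objects remain
after = ObjectSpace.count_objects

growth = after.each_with_object({}) do |(type, count), h|
  delta = count - before.fetch(type, 0)
  h[type] = delta unless delta.zero?
end
pp growth   # e.g. { :T_STRING => 12_345, :T_ARRAY => 2_001 } hints at what is being retained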
7) Production profiling — safe practices & APMs

APMs
- **Skylight / New Relic / Datadog / Scout** — they give per-endpoint timings, slow traces, and SQL breakdowns in production with low overhead. Use them to find hotspots without heavy manual profiling.

Sampling vs continuous profiling
- Use *sampling* profilers (rbspy, production profilers) in short windows to avoid high overhead.
- Continuous APM tracing (like New Relic) integrates naturally and is production-friendly.

Instrument carefully
- Only enable heavy profiling when you have a plan; capture for short durations.
- Prefer off-peak hours or blue/green deployments to avoid affecting users.
8) Other advanced DB & Rails optimizations

Connection pool tuning
- Puma workers & threads must match the DB pool size. Example database.yml:

production:
  pool: <%= ENV.fetch("DB_POOL", 5) %>

- If Puma threads > DB pool, requests block waiting for a DB connection — this can appear as slow requests.

Background jobs
- Anything non-critical to request latency (e.g., sending emails, analytics, resizing images) should be moved to background jobs (Sidekiq, ActiveJob).
- Synchronous mailers or external API calls are common causes of slow requests.

Keyset pagination (avoid OFFSET)
- For large result sets use keyset pagination (an ActiveRecord sketch follows after this section):

SELECT * FROM posts
WHERE (created_at, id) < (?, ?)
ORDER BY created_at DESC, id DESC
LIMIT 20

This is far faster than `OFFSET` for deep pages.

Materialized views for heavy aggregations
- Pre-compute heavy joins/aggregates into materialized views and refresh periodically or via triggers.

Denormalization & caching
- Counter caches: store counts in a column and update via callbacks to avoid COUNT(*) queries.
- Cache pre-rendered fragments or computed JSON blobs for heavy pages (with care about invalidation).
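A minimal ActiveRecord sketch of the keyset pattern above, assuming a Post model and an index on (created_at, id); the cursor handling is deliberately simplistic:

# Keyset ("seek") pagination sketch: each page starts where the previous one ended.
def posts_page(cursor: nil, per_page: 20)
  scope = Post.order(created_at: :desc, id: :desc).limit(per_page)
  scope = scope.where("(created_at, id) < (?, ?)", cursor[:created_at], cursor[:id]) if cursor
  scope
end

page1 = posts_page
last  = page1.last
page2 = posts_page(cursor: { created_at: last.created_at, id: last.id })

Unlike OFFSET, the cost stays flat however deep you page, because Postgres can seek straight to the cursor position in the index.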
9) Serialization & JSON performance

Problems
- Serializing huge AR objects or many associations can be expensive.

Solutions
- Use serializers that only include necessary fields: `fast_jsonapi` (now `jsonapi-serializer`) or JBuilder with simple `as_json(only: ...)`.
- Return minimal payloads and paginate.
- Use `pluck` when you only need a few columns.
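Two small sketches of the "fetch only what you serialize" idea (User, its `active` scope, and the column names are placeholders):

# Only the columns the client needs, straight from the DB — no AR objects instantiated:
rows    = User.active.order(:id).limit(50).pluck(:id, :email)
payload = rows.map { |id, email| { id: id, email: email } }

# Or constrain as_json instead of serializing whole records plus associations:
json = User.active.limit(50).as_json(only: [:id, :email])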
10) Playbook: step-by-step when a route is slow (quick reference)
- Reproduce the slow request locally or in staging if possible.
- Tail the logs (`tail -f log/production.log`) and check SQL statements and controller timings.
- Run `EXPLAIN (ANALYZE, BUFFERS)` for suspect queries.
- If a `Seq Scan` appears where you expect an index, add or adjust indexes. Run `ANALYZE`.
- Check for N+1 queries with Bullet or rack-mini-profiler and fix with `includes`.
- If there are many repeated small DB queries (Flipper-like), add caching (Redis or an adapter-specific cache) or preload once per request.
- If CPU-bound, collect a sampling profile (rbspy) for 30–60 s and generate a flamegraph — find hot Ruby methods. Use stackprof for a deeper dive.
- If memory-bound, run memory_profiler or derailed_benchmarks and find object retainers.
- If urgent and unknown, turn on APM traces for a short window to capture slow traces in production.
- After changes, run a load test (k6, wrk) if operating at scale, and monitor `pg_stat_statements` to confirm the improvement.
11) Example commands and snippets (cheat-sheet)
EXPLAIN ANALYZE psql
psql -d mydb -c "EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;" | jq .
EXPLAIN from Rails console
res = ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE SELECT ...")
puts res.values.flatten.join("\n")
Add index migration
class AddIndexFlipperGatesOnFeatureKey < ActiveRecord::Migration[7.0]
def change
add_index :flipper_gates, :feature_key, name: 'index_flipper_gates_on_feature_key'
end
end
ANALYZE
ANALYZE flipper_gates;
ANALYZE flipper_features;
pg_stat_statements
SELECT query, calls, total_time, mean_time  -- total_exec_time / mean_exec_time on PostgreSQL 13+
FROM pg_stat_statements
ORDER BY total_time DESC LIMIT 10;
rbspy recording (system-level sampling)
rbspy record --pid <PID> --duration 30   # writes a flamegraph SVG by default
rbspy report --help                      # options for converting saved raw data to other formats
stackprof example
StackProf.run(mode: :cpu, out: 'tmp/stackprof.dump') do
# exercise the app code, e.g. issue a request
end
stackprof tmp/stackprof.dump --text
memory_profiler
report = MemoryProfiler.report { run_heavy_code }
report.pretty_print
report.pretty_print(to_file: '/tmp/memory_report.txt')
Check DB connections
SELECT count(*) FROM pg_stat_activity WHERE state='active';
12) Final notes & best practices
- Measure before you change. Always capture a baseline (`EXPLAIN ANALYZE`, APM traces).
- Small fixes add up — one index plus fixing a few N+1s often reduces response times dramatically.
- Cache with thought — caching solves many repeated-read problems (the Flipper example), but invalidation must be correct. Use namespaced keys and hook invalidation to mutations.
- Use the right tool for the job: sampling profilers for CPU hotspots; `EXPLAIN ANALYZE` for the DB; memory tools for leaks.
- Automate monitoring — `pg_stat_statements`, APMs, dashboards, and alerts let you spot regressions early.
If you want, I’ll now:
- Generate a ready-to-run `EXPLAIN ANALYZE` before/after script for your Flipper query and the index migration, or
- Provide the exact commands to run in staging to capture a stackprof or rbspy flamegraph for a slow request (and a sample SVG), or
- Draft a one-page playbook you can paste into a team wiki for on-call performance steps.
Which of those would you like me to produce next?


