🐘 Fixing PostgreSQL Startup Issues on macOS (Homebrew): A Real-World Troubleshooting Guide

Introduction

Recently, I encountered an interesting PostgreSQL issue on my MacBook.

PostgreSQL was installed via Homebrew and worked perfectly on one macOS user account. However, when switching to another account on the same machine, I was unable to connect to PostgreSQL using psql.

The error looked like this:

psql postgres
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
No such file or directory
Is the server running locally and accepting connections on that socket?

This article walks through the investigation, root cause analysis, and final solution.


Understanding the Error

When PostgreSQL starts successfully, it creates a Unix socket file:

/tmp/.s.PGSQL.5432

The psql client uses this socket by default to connect to the local PostgreSQL server.

The error indicates one of two possibilities:

  1. PostgreSQL is not running.
  2. PostgreSQL is running but not listening on the expected socket.

In my case, PostgreSQL was simply not running for the current macOS user account.


Initial Verification

Verify PostgreSQL Client Installation

which psql

Output:

/opt/homebrew/bin/psql

Check version:

psql --version

Output:

psql (PostgreSQL) 14.17 (Homebrew)

This confirmed that PostgreSQL client tools were correctly installed.

Verify Installed PostgreSQL Version

brew list | grep postgres

Output:

postgresql@14

Check Whether PostgreSQL Is Running

pg_isready

Output:

/tmp:5432 - no response

This confirmed that PostgreSQL was not accepting connections.

Manual Startup Worked

Interestingly, PostgreSQL could be started manually:

/opt/homebrew/opt/postgresql@14/bin/pg_ctl \
-D /opt/homebrew/var/postgresql@14 \
-l /opt/homebrew/var/log/postgresql.log start

Output:

waiting for server to start.... done
server started

This was a critical clue.

It told us:

  • PostgreSQL binaries were healthy.
  • Database files were healthy.
  • Data directory was healthy.
  • The issue was likely related to Homebrew services or macOS LaunchAgents.

Investigating Homebrew Services

Checking service status:

brew services list

Output:

Name Status User
postgresql@14 error 78 abhilash

Attempting to start the service:

brew services start postgresql@14

Result:

Bootstrap failed: 5: Input/output error
launchctl bootstrap gui/501

This indicated a problem with the macOS LaunchAgent used by Homebrew.


Root Cause

Homebrew services rely on macOS launchctl.

Each macOS user account gets its own LaunchAgents configuration.

Although PostgreSQL was installed globally under Homebrew, the LaunchAgent configuration for this specific user account had become corrupted or stale.

As a result:

  • Manual startup worked.
  • Automatic startup through Homebrew failed.

Fixing the LaunchAgent

Stop Existing Service

brew services stop postgresql@14

Remove Existing LaunchAgent

rm ~/Library/LaunchAgents/homebrew.mxcl.postgresql@14.plist

Clean Up Homebrew Services

brew services cleanup

Verify Ownership

ls -ld /opt/homebrew/var/postgresql@14

If ownership is incorrect:

sudo chown -R $(whoami):staff /opt/homebrew/var/postgresql@14

Recreate the Service

After cleanup:

brew services start postgresql@14

Output:

Successfully started `postgresql@14`

Checking status:

brew services list

Output:

postgresql@14 started

Success!


Verifying PostgreSQL Is Running

pg_isready

Output:

/tmp:5432 - accepting connections

Connecting:

psql postgres

Output:

postgres=#

PostgreSQL was now functioning normally.


Understanding a New Error

While reviewing PostgreSQL logs, I noticed:

FATAL: database "abhilash" does not exist

At first glance, this looked concerning.

However, this is normal behavior.

When you run:

psql

PostgreSQL automatically tries to connect to a database matching your operating system username.

For example:

macOS username = abhilash

PostgreSQL attempts:

CONNECT TO abhilash;

Since that database didn’t exist, PostgreSQL logged:

FATAL: database "abhilash" does not exist

Creating a Personal Database

To make plain psql work:

CREATE DATABASE abhilash;

Now simply running:

psql

works because PostgreSQL can find a matching database.


Key Lessons Learned

1. Verify Whether PostgreSQL Is Actually Running

pg_isready

is often the fastest diagnostic tool.

2. Manual Startup Helps Isolate the Problem

If pg_ctl start works, your PostgreSQL installation and data files are probably fine.

3. Homebrew Services Depend on macOS LaunchAgents

A corrupted LaunchAgent can prevent PostgreSQL from auto-starting even when PostgreSQL itself is healthy.

4. Don’t Reinstall Immediately

Many developers jump directly to:

brew uninstall postgresql
brew install postgresql

In this case, reinstalling would not have fixed the issue and could have introduced additional problems.

5. Read the PostgreSQL Logs

Logs quickly reveal whether you’re dealing with:

  • Permission issues
  • Missing databases
  • Port conflicts
  • Startup failures
  • Authentication errors

Final Verification Checklist

brew services list
pg_isready
psql postgres

Expected results:

postgresql@14 started
/tmp:5432 - accepting connections
postgres=#

At this point, PostgreSQL is healthy and configured to start automatically after reboot.


Conclusion

What initially appeared to be a PostgreSQL installation problem turned out to be a macOS LaunchAgent issue specific to one user account.

By methodically checking:

  • PostgreSQL installation
  • Server status
  • Homebrew services
  • LaunchAgent configuration
  • PostgreSQL logs

we were able to restore automatic startup without reinstalling PostgreSQL or risking data loss.

This experience serves as a reminder that startup problems are often service-management issues rather than database issues.

Happy Debugging! 🚀

GCP Cloud SQL Disaster Recovery: A Practical Guide for Developers

When a production database goes down – whether from a bad migration, an accidental DROP TABLE, or a rogue script – the clock starts ticking. Every minute of downtime is lost revenue, broken trust, and a very stressful Slack channel.

This post walks through how Google Cloud SQL’s backup and recovery features work, common disaster scenarios, and the recovery playbook a developer should follow for each. The examples use a typical SaaS application backed by PostgreSQL on Cloud SQL, but the principles apply broadly.

Cloud SQL Backup Fundamentals

Before anything goes wrong, you need to understand what Cloud SQL gives you out of the box and what you need to configure yourself.

Automated Backups

Cloud SQL can take daily automated backups of your instance. These are full snapshots of the entire database and are retained for a configurable window (default 7 days, max 365).

# gcloud: verify automated backups are enabled
gcloud sql instances describe my-instance \
  --format="value(settings.backupConfiguration)"

Key settings to configure:

SettingRecommendationWhy
backupConfiguration.enabledtrueNon-negotiable for production
backupConfiguration.startTimeOff-peak hours (e.g. 04:00 UTC)Minimizes performance impact
backupConfiguration.backupRetentionSettings.retainedBackups14-30Gives you a wider recovery window
backupConfiguration.pointInTimeRecoveryEnabledtrueEnables PITR (see below)
backupConfiguration.transactionLogRetentionDays7How far back PITR can reach

Point-in-Time Recovery (PITR)

Automated backups give you daily snapshots. PITR fills the gaps by continuously archiving write-ahead logs (WAL for PostgreSQL, binary logs for MySQL). This lets you restore to any second within the retention window — not just to the time of the last backup.

# Enable PITR on an existing instance
gcloud sql instances patch my-instance \
  --enable-point-in-time-recovery \
  --retained-transaction-log-days=7

PITR is the single most important setting for disaster recovery. Without it, you lose every write between your last automated backup and the incident.

On-Demand Backups

You can trigger a backup manually before risky operations:

gcloud sql backups create --instance=my-instance \
  --description="pre-migration-backup-2026-04-08"

Rule of thumb: always take an on-demand backup before running migrations, bulk data operations, or any ad-hoc SQL against production.


Disaster Scenarios and Recovery Playbooks

Scenario 1: Accidental Table Drop or Data Deletion

What happened: A developer ran a DROP TABLE or DELETE FROM without a WHERE clause against production. Maybe it was a script meant for staging. Maybe an AI-generated SQL statement was executed without review.

Impact: One or more tables are gone or empty. The application is throwing 500s.

Recovery options:

Option A: PITR (best if available)

Restore to the moment just before the destructive command. You’ll need the approximate timestamp.

# Restore to a clone instance first — never restore directly over production
gcloud sql instances clone my-instance my-instance-recovery \
  --point-in-time="2026-04-08T10:59:00Z"

This creates a new instance with the database state at that exact second. You can then:

  1. Verify the data on the clone
  2. Export the affected tables from the clone
  3. Import them back into the production instance
# Export a specific table from the recovery clone
gcloud sql export sql my-instance-recovery gs://my-bucket/recovery/users-table.sql \
  --database=myapp_production \
  --table=users

# Import into production
gcloud sql import sql my-instance gs://my-bucket/recovery/users-table.sql \
  --database=myapp_production

Option B: Restore from automated backup

If PITR is not enabled, restore the most recent automated backup that predates the incident.

# List available backups
gcloud sql backups list --instance=my-instance

# Restore a specific backup (this overwrites the instance)
gcloud sql backups restore BACKUP_ID --restore-instance=my-instance

Warning: Restoring a backup directly onto your production instance overwrites everything. All writes since that backup are lost. Prefer cloning to a recovery instance first.

The data gap problem:

When you restore from a backup taken at, say, 4:00 AM, but the incident happened at 11:00 AM, you lose 7 hours of data. This is the gap you’ll need to address manually. Common strategies:

  • Application-level event logs: If your app publishes events to a message queue (Kafka, Pub/Sub), you can replay them.
  • Analytics replicas: If you replicate data to BigQuery, Snowflake, or another analytics store, you can query the missing records from there and re-import them.
  • Audit tables: If your application logs changes to an audit table in a separate database, those records survive.
-- Example: querying BigQuery for records created during the gap window
SELECT *
FROM `project.dataset.user_actions`
WHERE created_at BETWEEN TIMESTAMP('2026-04-08 04:00:00', 'America/Vancouver')
  AND TIMESTAMP('2026-04-08 11:00:00', 'America/Vancouver')
  AND action_type = 'account_status_change'

You then re-ingest these records into production, typically via a script run in your application’s console or through a migration task.


Scenario 2: Interrupted Background Job

What happened: A critical scheduled job — say, one that generates weekly records for all active users — was running when the incident occurred. The database was restored from backup, but the job was killed mid-execution. Some users got their records; others didn’t.

Impact: No application errors (the data that exists is valid), but there’s a silent gap. Some users are missing records they should have.

Recovery playbook:

Step 1 — Quantify the gap

Before doing anything, measure what’s missing:

# Find users who should have a record but don't
target_date = Date.parse('2026-05-30')
users_missing = User.where(status: ['active', 'subscribed'])
  .where.not(id: WeeklyRecord.where(week_date: target_date).select(:user_id))
users_missing.count

Record the count. You’ll need it for verification later.

Step 2 – Understand the generation logic

Before re-running anything, understand what the job does:

  • Does it check for existing records before creating? (idempotent?)
  • Does it behave differently based on user status? (e.g., suspended users get a different treatment)
  • Does it trigger side effects? (emails, webhooks, billing)

If the job is idempotent — meaning running it twice for the same user produces the same result without duplicates — you can safely re-run it for all users, not just the ones missing records. This is simpler and safer than trying to target only the gap.

Step 3 – Re-run with guardrails

Write a targeted script rather than re-triggering the entire job:

target_date = Date.parse('2026-05-30')
# Pre-check
baseline_count = WeeklyRecord.where(week_date: target_date).count
puts "Records before: #{baseline_count}"
# Find and process missing users
users_missing = User.where(status: ['active', 'subscribed'])
.where.not(id: WeeklyRecord.where(week_date: target_date).select(:user_id))
puts "Users missing records: #{users_missing.count}"
users_missing.find_each do |user|
WeeklyRecordGenerator.new(user).generate(target_date)
rescue => e
puts "Failed for User ##{user.id}: #{e.message}"
end
# Post-check
new_count = WeeklyRecord.where(week_date: target_date).count
puts "Records after: #{new_count}"
puts "Delta: #{new_count - baseline_count}"

Step 4 – Verify

Check that:

  • The record count increased by the expected amount
  • No duplicates were created
  • No users are still missing records
  • Any status-dependent logic was applied correctly (e.g., suspended users got the right treatment)

Scenario 3: Corrupted Data from a Bad Migration

What happened: A migration altered a column type, dropped a constraint, or backfilled data incorrectly. The application is running but producing wrong results.

Impact: Data is present but incorrect. This is often harder to detect than missing data.

Recovery playbook:

  1. Don’t panic-restore. If the app is functional (just producing wrong data), you have time to assess.
  2. Clone to a recovery instance from a backup predating the migration: gcloud sql instances clone my-instance pre-migration-clone \ --point-in-time="2026-04-07T23:00:00Z"
  3. Diff the data between production and the clone to understand exactly what changed: -- Compare row counts SELECT 'production' as source, count(*) FROM production.orders UNION ALL SELECT 'backup' as source, count(*) FROM backup_clone.orders; -- Find rows that differ SELECT p.id, p.amount as prod_amount, b.amount as backup_amount FROM production.orders p JOIN backup_clone.orders b ON p.id = b.id WHERE p.amount != b.amount;
  4. Write a targeted fix rather than a full restore (which would lose post-migration legitimate writes).
  5. Write a rollback migration if the schema change itself was the problem.

Scenario 4: Full Instance Failure

What happened: The Cloud SQL instance is unreachable – maybe a zone outage, maybe accidental instance deletion.

Recovery options:

If the instance still exists (zone outage):

Cloud SQL instances configured for high availability will automatically failover to a standby in another zone. If you don’t have HA enabled:

# Enable HA (requires instance restart)
gcloud sql instances patch my-instance --availability-type=REGIONAL

If the instance was deleted:

Deleted instances can be recovered within a limited window if deletion protection wasn’t bypassed:

# Enable deletion protection
gcloud sql instances patch my-instance --deletion-protection

If truly gone, restore from the most recent backup to a new instance:

gcloud sql instances create my-instance-restored \
--source-backup=BACKUP_ID \
--tier=db-custom-4-16384 \
--region=us-west1

Then update your application’s database connection string to point to the new instance.


Prevention Checklist

The best disaster recovery is the one you never need. Here’s what to set up before things go wrong:

Cloud SQL Configuration

# The production-ready configuration checklist
gcloud sql instances patch my-instance \
--backup-start-time=04:00 \
--enable-point-in-time-recovery \
--retained-transaction-log-days=7 \
--retained-backups-count=30 \
--deletion-protection \
--availability-type=REGIONAL

Operational Practices

1. Never run ad-hoc SQL directly against production

Use a read replica for investigative queries. If you must write, use a transaction with a manual ROLLBACK checkpoint:

BEGIN;

-- Your change here
UPDATE users SET status = 'inactive' WHERE last_login < '2025-01-01';

-- Verify before committing
SELECT count(*) FROM users WHERE status = 'inactive';

-- Only if the count looks right:
COMMIT;
-- Otherwise:
ROLLBACK;

2. Take on-demand backups before risky operations

gcloud sql backups create --instance=my-instance \
--description="pre-bulk-update-$(date +%Y%m%d-%H%M%S)"

3. Review AI-generated SQL before executing

AI tools are excellent at generating SQL, but they don’t understand your data invariants. A syntactically correct DROP TABLE or DELETE without a WHERE clause is still catastrophic. Always:

  • Read the generated SQL line by line
  • Run it on staging first
  • Wrap destructive operations in a transaction
  • Have a second pair of eyes for DDL changes

4. Maintain an analytics replica

Replicate critical tables to BigQuery or another analytics store. This serves as both an analytics platform and a recovery source. If your primary database loses data, you can query the replica for the gap window and re-ingest.

# Set up a BigQuery data transfer from Cloud SQL
bq mk --transfer_config \
--target_dataset=sql_replica \
--display_name="Production SQL Replica" \
--data_source=scheduled_query \
--schedule="every 1 hours"

5. Use IAM to restrict destructive operations

Not every developer needs cloudsql.instances.delete or direct SQL access to production:

# Create a read-only role for most developers
gcloud projects add-iam-policy-binding my-project \
--member="group:developers@company.com" \
--role="roles/cloudsql.viewer"
# Grant write access only to the ops team
gcloud projects add-iam-policy-binding my-project \
--member="group:database-ops@company.com" \
--role="roles/cloudsql.admin"

The Recovery Timeline: What Happens in Practice

Here’s what a real recovery typically looks like, end to end:

T+0min Incident detected (alerts fire, app errors spike)
T+5min Confirm the issue — is it a code bug or data loss?
T+10min Identify the last good backup / PITR target
T+15min Clone instance from backup (takes 5-30 min depending on size)
T+45min Verify restored data on the clone
T+60min Restore production from clone or selectively import tables
T+90min Identify the data gap (writes between backup and incident)
T+120min Query analytics replica / event logs for gap data
T+150min Re-ingest gap data, verify counts
T+180min Re-run interrupted jobs with verification
T+210min Final validation — all counts match, no duplicates, app healthy
T+240min Post-incident review

The total time depends on database size, gap complexity, and whether you had PITR enabled. With PITR, the gap is seconds. Without it, you could be looking at hours of manual data reconciliation.


Key Takeaways

  1. Enable PITR. It’s the difference between losing seconds of data and losing hours.
  2. Always clone to a recovery instance first. Never restore directly over production unless you have no other option.
  3. Maintain an analytics replica. It’s your insurance policy for the data gap.
  4. Quantify before you fix. Record counts before and after every recovery step. You can’t verify what you didn’t measure.
  5. Understand your jobs’ idempotency. If a background job was interrupted, knowing whether it’s safe to re-run is the difference between a smooth recovery and creating a bigger mess.
  6. Take on-demand backups before risky operations. The 30 seconds it takes could save you 4 hours of recovery.
  7. Review all SQL before execution. Especially AI-generated SQL. Trust, but verify.

Production incidents are stressful, but with the right configuration and a clear playbook, they don’t have to be catastrophic. Set up your backups today — future you will be grateful.

Happy fixing!


The Complete Guide to Rails Database Commands: From Basics to Production

Managing databases in Rails can seem overwhelming with all the available commands. This comprehensive guide will walk you through every essential Rails database command, from basic operations to complex real-world scenarios.

Basic Database Commands

Core Database Operations

# Create the database
rails db:create

# Drop (delete) the database
rails db:drop

# Run pending migrations
rails db:migrate

# Rollback the last migration
rails db:rollback

# Rollback multiple migrations
rails db:rollback STEP=3

Schema Management

# Load current schema into database
rails db:schema:load

# Dump current database structure to schema.rb
rails db:schema:dump

# Load structure from structure.sql (for complex databases)
rails db:structure:load

# Dump database structure to structure.sql
rails db:structure:dump

Seed Data

# Run the seed file (db/seeds.rb)
rails db:seed

Combined Commands: The Powerhouses

rails db:setup

What it does: Sets up database from scratch

rails db:setup

Equivalent to:

rails db:create
rails db:schema:load  # Loads from schema.rb
rails db:seed

When to use:

  • First time setting up project on new machine
  • Fresh development environment
  • CI/CD pipeline setup

rails db:reset

What it does: Nuclear option – completely rebuilds database

rails db:reset

Equivalent to:

rails db:drop
rails db:create
rails db:schema:load
rails db:seed

When to use:

  • Development when you want clean slate
  • After major schema changes
  • When your database is corrupted

⚠️ Warning: Destroys all data!

rails db:migrate:reset

What it does: Rebuilds database using migrations

rails db:migrate:reset

Equivalent to:

rails db:drop
rails db:create
rails db:migrate  # Runs all migrations from scratch

When to use:

  • Testing that migrations run cleanly
  • Debugging migration issues
  • Ensuring migration sequence works

Advanced Database Commands

Migration Management

# Rollback to specific migration
rails db:migrate:down VERSION=20240115123456

# Re-run specific migration
rails db:migrate:up VERSION=20240115123456

# Get current migration version
rails db:version

# Check migration status
rails db:migrate:status

Database Information

# Show database configuration
rails db:environment

# Validate database and pending migrations
rails db:abort_if_pending_migrations

# Check if database exists
rails db:check_protected_environments

Environment-Specific Commands

# Run commands on specific environment
rails db:create RAILS_ENV=production
rails db:migrate RAILS_ENV=staging
rails db:seed RAILS_ENV=test

Real-World Usage Scenarios

Scenario 1: New Developer Onboarding

# New developer joins the team
git clone project-repo
cd project
bundle install

# Set up database
rails db:setup

# Or if you prefer running migrations
rails db:create
rails db:migrate
rails db:seed

Scenario 2: Production Deployment

# Safe production deployment
rails db:migrate RAILS_ENV=production

# Never run these in production:
# rails db:reset        ❌ Will destroy data!
# rails db:schema:load  ❌ Will overwrite everything!

Scenario 3: Development Workflow

# Daily development cycle
git pull origin main
rails db:migrate          # Run any new migrations

# If you have conflicts or issues
rails db:rollback         # Undo last migration
# Fix migration file
rails db:migrate          # Re-run

# Major cleanup during development
rails db:reset           # Nuclear option

Scenario 4: Testing Environment

# Fast test database setup
rails db:schema:load RAILS_ENV=test

# Or use the test-specific command
rails db:test:prepare

Environment-Specific Best Practices

Development Environment

# Liberal use of reset commands
rails db:reset              # ✅ Safe to use
rails db:migrate:reset      # ✅ Safe to use
rails db:setup              # ✅ Safe for fresh start

Staging Environment

# Mirror production behavior
rails db:migrate RAILS_ENV=staging  # ✅ Recommended
rails db:seed RAILS_ENV=staging     # ✅ If needed

# Avoid
rails db:reset RAILS_ENV=staging    # ⚠️ Use with caution

Production Environment

# Only safe commands
rails db:migrate RAILS_ENV=production     # ✅ Safe
rails db:rollback RAILS_ENV=production    # ⚠️ With backup

# Never use in production
rails db:reset RAILS_ENV=production       # ❌ NEVER!
rails db:drop RAILS_ENV=production        # ❌ NEVER!
rails db:schema:load RAILS_ENV=production # ❌ NEVER!

Pro Tips and Gotchas

Migration vs Schema Loading

# For existing databases with data
rails db:migrate          # ✅ Incremental, safe

# For fresh databases
rails db:schema:load      # ✅ Faster, clean slate

Data vs Schema

Remember that some operations preserve data differently:

  • db:migrate: Preserves existing data, applies incremental changes
  • db:schema:load: Loads clean schema, no existing data
  • db:reset: Destroys everything, starts fresh

Common Workflow Commands

# The "fix everything" development combo
rails db:reset && rails db:migrate

# The "fresh start" combo  
rails db:drop db:create db:migrate db:seed

# The "production-safe" combo
rails db:migrate db:seed

Quick Reference Cheat Sheet

CommandUse CaseData SafetySpeed
db:migrateIncremental updates✅ SafeMedium
db:setupInitial setup✅ Safe (new DB)Fast
db:resetClean slate❌ Destroys allFast
db:migrate:resetTest migrations❌ Destroys allSlow
db:schema:loadFresh schema❌ No data migrationFast
db:seedAdd sample data✅ AdditiveFast

Conclusion

Understanding Rails database commands is crucial for efficient development and safe production deployments. Start with the basics (db:create, db:migrate, db:seed), get comfortable with the combined commands (db:setup, db:reset), and always remember the golden rule: be very careful with production databases!

The key is knowing when to use each command:

  • Development: Feel free to experiment with db:reset and friends
  • Production: Stick to db:migrate and always have backups
  • Team collaboration: Use migrations to keep everyone in sync

Remember: migrations tell the story of how your database evolved, while schema files show where you ended up. Both are important, and now you know how to use all the tools Rails gives you to manage them effectively.


Rails 8 App: Create an Academic software app using SQL without using ActiveRecord- Part 3

In this episode we move on from creating the tables with constraints, foreign keys, proper indexes, enums, reversing each migrations, seeded data etc. Now let’s check our seeded data with some ui tool that help us smell and modify our db data.

Setup a UI tool for analysing SQL and data

W’re using PostgreSQL. Here are the best SQL GUI tools for pure SQL data analysis and manipulation on macOS:

🏆 Top Recommendations for SQL Analysis

1. TablePlus ⭐️ (Highly Recommended)

  • Best for: Fast SQL queries, data filtering, before/after comparisons
  • Strengths:
    Lightning-fast query execution
    Excellent data filtering UI with SQL query builder
    Beautiful native macOS interface
    Export to CSV/JSON/SQL
    Query history and favorites
    Cost: Free tier (2 tabs), Pro $89
    Perfect for: Your use case of checking seeded data

2. Postico (macOS Native)

  • Best for: PostgreSQL-specific features and analysis
  • Strengths:
    Built specifically for PostgreSQL
    Excellent for large dataset analysis
    Advanced filtering and sorting
    Beautiful data visualization
    Cost: $49 (one-time)
    PostgreSQL-optimized: Uses all PostgreSQL features

3. pgAdmin (Free, Official)

  • Best for: Advanced PostgreSQL administration and complex queries
  • Strengths:
    Official PostgreSQL tool
    Advanced query planner visualization
    Excellent for performance analysis
    Complete database management
    Cost: Free
    Learning curve: Steeper but very powerful

4. DBeaver (Free, Cross-platform)

  • Best for: Advanced SQL analysis and scripting
  • Strengths:
    Powerful SQL editor with autocomplete
    Data export in multiple formats
    Query execution plans
    Visual query builder
    Cost: Free (Community), Pro $10/month
    Great for: Complex data analysis workflows

Quick Setup Commands

For TablePlus (easiest to get started):

# Install via Homebrew
brew install --cask tableplus

For pgAdmin:

# Install via Homebrew
brew install --cask pgadmin4

For Postico:

# Install via Homebrew
brew install --cask postico

🔧 Connection Details You’ll Need

Your PostgreSQL connection details:

  • Host: localhost (default)
  • Port: 5432 (default)
  • Database: academic_sql_software_development
  • Username: Your macOS username (default)
  • Password: None (default for local development)

💡 Pro Tips for Data Analysis

Once connected, you can:

  1. Check seeded data:
   SELECT COUNT(*) FROM users;
   SELECT COUNT(*) FROM orders;
   SELECT COUNT(*) FROM products;
  1. Analyze relationships:
   SELECT 
     u.first_name, u.last_name, 
     COUNT(o.id) as order_count
   FROM users u 
   LEFT JOIN orders o ON u.id = o.user_id 
   GROUP BY u.id, u.first_name, u.last_name
   ORDER BY order_count DESC;
  1. Filter and export specific datasets for before/after comparisons

My Recommendation: Start with TablePlus – it’s the most intuitive for our workflow of checking and filtering seeded data, with excellent performance for the data volumes we’re working with (10k users, 5k orders, etc.).

Let’s Go with TablePlus 🥳

🚀 Step 1: Install TablePlus

brew install --cask tableplus

📊 Step 2: Check Our Database Schema

 We have a great setup for learning SQL with realistic relationships. Let’s create a progressive SQL learning path using our actual data.

🔗 Step 3: Connect to Your Database

TablePlus Connection Details:

  • Host: localhost
  • Port: 5432
  • Database: academic_sql_software_development
  • User: (your macOS username)
  • Password: (leave blank)

📚 SQL Learning Path: Basic to Advanced

Change Font size, colour, theme etc:

Level 1: Basic SELECT Queries

-- 1. View all users
SELECT * FROM users LIMIT 10;

-- 2. Count total records
SELECT COUNT(*) FROM users;
SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM products;

-- 3. Filter data
SELECT first_name, last_name, email 
FROM users 
WHERE gender = 'female' 
LIMIT 10;

-- 4. Sort data
SELECT first_name, last_name, date_of_birth 
FROM users 
ORDER BY date_of_birth DESC 
LIMIT 10;

-- 5. Filter with conditions
SELECT title, price, category 
FROM products 
WHERE price > 50 AND category = 'men' 
ORDER BY price DESC;

Level 2: Basic Aggregations

-- 1. Count by category
SELECT category, COUNT(*) as product_count 
FROM products 
GROUP BY category;

-- 2. Average prices by category
SELECT category, 
       AVG(price) as avg_price,
       MIN(price) as min_price,
       MAX(price) as max_price
FROM products 
GROUP BY category;

-- 3. Users by gender
SELECT gender, COUNT(*) as user_count 
FROM users 
WHERE gender IS NOT NULL
GROUP BY gender;

-- 4. Products with low stock
SELECT COUNT(*) as low_stock_products 
FROM products 
WHERE stock_quantity < 10;

Level 3: Inner Joins

-- 1. Users with their orders
SELECT u.first_name, u.last_name, u.email, o.id as order_id, o.created_at
FROM users u
INNER JOIN orders o ON u.id = o.user_id
ORDER BY o.created_at DESC
LIMIT 20;

-- 2. Orders with product details
SELECT o.id as order_id, 
       p.title as product_name, 
       p.price, 
       p.category,
       o.created_at
FROM orders o
INNER JOIN products p ON o.product_id = p.id
ORDER BY o.created_at DESC
LIMIT 20;

-- 3. Complete order information (3-table join)
SELECT u.first_name, u.last_name,
       p.title as product_name,
       p.price,
       p.category,
       o.created_at as order_date
FROM orders o
INNER JOIN users u ON o.user_id = u.id
INNER JOIN products p ON o.product_id = p.id
ORDER BY o.created_at DESC
LIMIT 20;

Level 4: Left Joins (Show Missing Data)

-- 1. All users and their order count (including users with no orders)
SELECT u.first_name, u.last_name, u.email,
       COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.first_name, u.last_name, u.email
ORDER BY order_count DESC;

-- 2. Users who haven't placed any orders
SELECT u.first_name, u.last_name, u.email, u.created_at
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL
ORDER BY u.created_at DESC;

-- 3. Products that have never been ordered
SELECT p.title, p.price, p.category, p.stock_quantity
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
WHERE o.id IS NULL
ORDER BY p.price DESC;

Level 5: Advanced Aggregations & Grouping

-- 1. Top customers by order count
SELECT u.first_name, u.last_name,
       COUNT(o.id) as total_orders,
       SUM(p.price) as total_spent
FROM users u
INNER JOIN orders o ON u.id = o.user_id
INNER JOIN products p ON o.product_id = p.id
GROUP BY u.id, u.first_name, u.last_name
HAVING COUNT(o.id) > 1
ORDER BY total_spent DESC
LIMIT 10;

-- 2. Most popular products
SELECT p.title, p.category, p.price,
       COUNT(o.id) as times_ordered,
       SUM(p.price) as total_revenue
FROM products p
INNER JOIN orders o ON p.id = o.product_id
GROUP BY p.id, p.title, p.category, p.price
ORDER BY times_ordered DESC
LIMIT 10;

-- 3. Monthly order analysis
SELECT DATE_TRUNC('month', o.created_at) as month,
       COUNT(o.id) as order_count,
       COUNT(DISTINCT o.user_id) as unique_customers,
       SUM(p.price) as total_revenue
FROM orders o
INNER JOIN products p ON o.product_id = p.id
GROUP BY DATE_TRUNC('month', o.created_at)
ORDER BY month;

Level 6: Student Enrollment Analysis (Complex Joins)

-- 1. Students with their course and school info
SELECT u.first_name, u.last_name,
       c.title as course_name,
       s.title as school_name,
       st.enrolment_date
FROM students st
INNER JOIN users u ON st.user_id = u.id
INNER JOIN courses c ON st.course_id = c.id
INNER JOIN schools s ON st.school_id = s.id
ORDER BY st.enrolment_date DESC
LIMIT 20;

-- 2. Course popularity by school
SELECT s.title as school_name,
       c.title as course_name,
       COUNT(st.id) as student_count
FROM students st
INNER JOIN courses c ON st.course_id = c.id
INNER JOIN schools s ON st.school_id = s.id
GROUP BY s.id, s.title, c.id, c.title
ORDER BY student_count DESC;

-- 3. Schools with enrollment stats
SELECT s.title as school_name,
       COUNT(st.id) as total_students,
       COUNT(DISTINCT st.course_id) as courses_offered,
       MIN(st.enrolment_date) as first_enrollment,
       MAX(st.enrolment_date) as latest_enrollment
FROM schools s
LEFT JOIN students st ON s.id = st.school_id
GROUP BY s.id, s.title
ORDER BY total_students DESC;

Level 7: Advanced Concepts

-- 1. Subqueries: Users who spent more than average
WITH user_spending AS (
  SELECT u.id, u.first_name, u.last_name,
         SUM(p.price) as total_spent
  FROM users u
  INNER JOIN orders o ON u.id = o.user_id
  INNER JOIN products p ON o.product_id = p.id
  GROUP BY u.id, u.first_name, u.last_name
)
SELECT first_name, last_name, total_spent
FROM user_spending
WHERE total_spent > (SELECT AVG(total_spent) FROM user_spending)
ORDER BY total_spent DESC;

-- 2. Window functions: Ranking customers
SELECT u.first_name, u.last_name,
       COUNT(o.id) as order_count,
       SUM(p.price) as total_spent,
       RANK() OVER (ORDER BY SUM(p.price) DESC) as spending_rank
FROM users u
INNER JOIN orders o ON u.id = o.user_id
INNER JOIN products p ON o.product_id = p.id
GROUP BY u.id, u.first_name, u.last_name
ORDER BY spending_rank
LIMIT 20;

-- 3. Case statements for categorization
SELECT u.first_name, u.last_name,
       COUNT(o.id) as order_count,
       CASE 
         WHEN COUNT(o.id) >= 5 THEN 'VIP Customer'
         WHEN COUNT(o.id) >= 2 THEN 'Regular Customer'
         ELSE 'New Customer'
       END as customer_type
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.first_name, u.last_name
ORDER BY order_count DESC;

Level 8: Self-Joins & Advanced Analysis

-- 1. Find users enrolled in the same course (pseudo self-join)
SELECT DISTINCT 
       u1.first_name || ' ' || u1.last_name as student1,
       u2.first_name || ' ' || u2.last_name as student2,
       c.title as course_name
FROM students s1
INNER JOIN students s2 ON s1.course_id = s2.course_id AND s1.user_id < s2.user_id
INNER JOIN users u1 ON s1.user_id = u1.id
INNER JOIN users u2 ON s2.user_id = u2.id
INNER JOIN courses c ON s1.course_id = c.id
ORDER BY c.title, student1
LIMIT 20;

-- 2. Complex business question: Multi-role users
SELECT u.first_name, u.last_name, u.email,
       COUNT(DISTINCT o.id) as orders_placed,
       COUNT(DISTINCT st.id) as courses_enrolled,
       CASE 
         WHEN COUNT(DISTINCT o.id) > 0 AND COUNT(DISTINCT st.id) > 0 THEN 'Customer & Student'
         WHEN COUNT(DISTINCT o.id) > 0 THEN 'Customer Only'
         WHEN COUNT(DISTINCT st.id) > 0 THEN 'Student Only'
         ELSE 'No Activity'
       END as user_type
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
LEFT JOIN students st ON u.id = st.user_id
GROUP BY u.id, u.first_name, u.last_name, u.email
ORDER BY orders_placed DESC, courses_enrolled DESC;

🎯 Our Learning Strategy:

  1. Start with Level 1-2 in TablePlus to get comfortable
  2. Progress through each level – try to understand each query before moving on
  3. Modify the queries – change filters, add fields, etc.
  4. Create your own variations based on business questions

to be continued … 🚀

Rails 8 App: Create an Academic software app using SQL without using ActiveRecord – Part 1 | users | products | orders

Let’s create a Rails 8 app which use SQL queries with raw SQL instead of ActiveRecord. Let’s use the full Rails environment with ActiveRecord for infrastructure, but bypass AR’s ORM features for pure SQL writing. Let me guide you through this step by step:

Step 1: Create the Rails App with ActiveRecord and PostgreSQL (skipping unnecessary components)

rails new academic-sql-software --database=postgresql --skip-action-cable --skip-jbuilder --skip-solid --skip-kamal

What we’re skipping and why:

  • –skip-action-cable: No WebSocket functionality needed
  • –skip-jbuilder: No JSON API views needed for our SQL practice app
  • –skip-solid: Skips Solid Cache and Solid Queue (we don’t need caching or background jobs)
  • –skip-kamal: No deployment configuration needed

What we’re keeping:

  • ActiveRecord: For database connection management and ActiveRecord::Base.connection.execute()
  • ActionController: For creating web interfaces to display our SQL query results
  • ActionView: For creating simple HTML pages to showcase our SQL learning exercises
  • PostgreSQL: Our database for practicing advanced SQL features

Why this setup is perfect for App with raw SQL:

  • Minimal Rails app focused on database interactions
  • Full Rails environment for development conveniences
  • ActiveRecord infrastructure without ORM usage
  • Clean setup without unnecessary overhead

=> Open config/application.rb and comment the following for now:

# require "active_job/railtie"
...
# require "active_storage/engine"
...
# require "action_mailer/railtie"
# require "action_mailbox/engine"
...
# require "action_cable/engine"

=> Open config/environments/development.rb config/environments/production.rb config/environments/test.rb comment action_mailer

🤔 Why I am using ActiveRecord (even though I don’t want the ORM):

  • Database Connection Management: ActiveRecord provides robust connection pooling, reconnection handling, and connection management
  • Rails Integration: Seamless integration with Rails console, database tasks (rails db:create, rails db:migrate), and development tools
  • Raw SQL Execution: We get ActiveRecord::Base.connection.execute() which is perfect for our raw SQL writing.
  • Migration System: Easy table creation and schema management with migrations (even though we’ll query with raw SQL)
  • Database Configuration: Rails handles database.yml configuration, environment switching, and connection setup
  • Development Tools: Access to Rails console for testing queries, database tasks, and debugging

Our Learning Strategy: We’ll use ActiveRecord’s infrastructure but completely bypass its ORM methods. Instead of Student.where(), we’ll use ActiveRecord::Base.connection.execute("SELECT * FROM students WHERE...")

Step 2: Navigate to the project directory

cd academic-sql-software

Step 3: Verify PostgreSQL setup

# Check if PostgreSQL is running
brew services list | grep postgresql
# or
pg_ctl status

Database Foundation: PostgreSQL gives us advanced SQL features:

  • Complex JOINs (INNER, LEFT, RIGHT, FULL OUTER)
  • Window functions (ROW_NUMBER, RANK, LAG, LEAD)
  • Common Table Expressions (CTEs)
  • Advanced aggregations and subqueries

Step 4: Install dependencies

bundle install

What this gives us:

  • pg gem: Pure PostgreSQL adapter (already included with --database=postgresql)
  • ActiveRecord: For connection management only
  • Rails infrastructure: Console, generators, rake tasks

Step 5: Create the PostgreSQL databases

✗ rails db:create
Created database 'academic_sql_software_development'
Created database 'academic_sql_software_test

Our Development Environment:

  • Creates academic_sql_software_development and academic_sql_software_test
  • Sets up connection pooling and management
  • Enables us to use Rails console for testing queries: rails console then ActiveRecord::Base.connection.execute("SELECT 1")

Our Raw SQL Approach:

# We'll use this pattern throughout our app:
connection = ActiveRecord::Base.connection
result = connection.execute("SELECT s.name, t.subject FROM students s INNER JOIN teachers t ON s.teacher_id = t.id")

Why not pure pg gem:

  • Would require manual connection management
  • No Rails integration (no console, no rake tasks)
  • More boilerplate code for connection handling
  • Loss of Rails development conveniences

Why not pure ActiveRecord ORM:

  • We want to do SQL query writing, not ActiveRecord methods.
  • Need to understand database performance implications.
  • Want to practice complex queries that might be harder to express in ActiveRecord.

Step 6: Create Users table

mkdir -p db/migrate
class CreateUsers < ActiveRecord::Migration[8.0]
  def up
    # create users table
    execute <<~SQL
      CREATE TABLE users (
        id INT,
        username VARCHAR(200),
        email VARCHAR(150),
        phone_number VARCHAR(20)
      );
    SQL
  end

  def down
    execute <<~SQL
      DROP TABLE users;
    SQL
  end
end

class CreateOrders < ActiveRecord::Migration[8.0]
  def up
    # create table orders
    execute <<~SQL
    SQL
  end

  def down
    execute <<~SQL
    SQL
  end
end

execute <<~SQL is a Rails migration method that allows you to run raw SQL statements. Let me break it down:

Components:

  1. execute – A Rails migration method that executes raw SQL directly against the database
  2. <<~SQL – Ruby’s “squiggly heredoc” syntax for multi-line strings that automatically strips leading whitespace (read: https://www.rubyguides.com/2018/11/ruby-heredoc/)

Usage:

class SomeMigration < ActiveRecord::Migration[8.0]
  def change
    execute <<~SQL
      CREATE INDEX CONCURRENTLY idx_users_email_lower 
      ON users (LOWER(email));
    SQL
  end
end

Why use it?

  • Database-specific features: When you need PostgreSQL-specific syntax, MySQL features, etc.
  • Complex SQL: For operations that Rails DSL doesn’t support easily
  • Performance: Direct SQL can sometimes be more efficient
  • Raw SQL requirements: When you specifically need to write SQL (as mentioned in our requirement)

Example for creating a users table with raw SQL:

class CreateUsers < ActiveRecord::Migration[8.0]
  def up
    execute <<~SQL
      CREATE TABLE users (
        id BIGSERIAL PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        email VARCHAR(255) NOT NULL,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
      );

      CREATE UNIQUE INDEX idx_users_email ON users(email);
    SQL
  end

  def down
    execute <<~SQL
      DROP TABLE IF EXISTS users;
    SQL
  end
end

The <<~SQL heredoc ends when it encounters the closing SQL keyword, and the ~ strips indentation so your SQL can be properly formatted within the Ruby code.

Q) 🤔 Is it a best practise to repeat the database constraints in model validation in Rails? DB level vs App level? or both?

Best Practice: Use BOTH 🎯

Why both levels are recommended:

Database Level (Last Line of Defense)

-- Database constraints
price DECIMAL(10,2) NOT NULL CHECK (price > 0),
category product_category NOT NULL,
title VARCHAR(250) NOT NULL

Pros:

  • Data integrity guarantee – No bad data can enter, ever
  • Performance – Database enforces constraints efficiently
  • Bulletproof – Protects against bulk operations, direct SQL, data imports
  • Concurrency safe – Prevents race conditions
  • Multi-application protection – If other apps access same DB

Application Level (User Experience)

validates :price, presence: true, numericality: { greater_than: 0 }
validates :category, inclusion: { in: %w[men women kids infants] }

Pros:

  • User-friendly errors – “Price must be greater than 0” vs “CHECK constraint violated”
  • Better UX – Validation before form submission
  • Easier testing – Can unit test validations
  • Conditional logic – Complex business rules
  • Framework features – Callbacks, custom validators

Real-world scenarios where each matters:

Database saves you when:

# Bulk operations bypass Rails validations
Product.update_all(price: -10)  # DB constraint prevents this

# Direct SQL injection attempts
# DB constraints are your last line of defense

App validations save you when:

# User gets friendly error instead of:
# PG::CheckViolation: ERROR: new row violates check constraint
@product = Product.new(price: -5)
@product.valid? # => false
@product.errors.full_messages # => ["Price must be greater than 0"]

Practical Implementation:

class Product < ApplicationRecord
  # App-level validations for UX
  validates :title, presence: true, length: { maximum: 250 }
  validates :price, presence: true, numericality: { greater_than: 0 }
  validates :category, inclusion: { in: %w[men women kids infants] }

  # Don't duplicate precision validation if DB handles it
  # The DECIMAL(10,2) constraint is sufficient at DB level
end
-- DB-level constraints for data integrity
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  title VARCHAR(250) NOT NULL,
  price DECIMAL(10,2) NOT NULL CHECK (price > 0),
  category product_category NOT NULL,
  -- DB handles precision automatically with DECIMAL(10,2)
);

What NOT to duplicate:

  • Precision constraintsDECIMAL(10,2) handles this perfectly
  • Data type validation – DB enforces INTEGER, BOOLEAN, etc.
  • Complex regex patterns – Better handled in app layer

Conclusion:

Use both, but strategically:

  • Database: Core data integrity, type constraints, foreign keys
  • Application: User experience, business logic, conditional rules
  • Don’t over-duplicate simple type/precision constraints that DB handles well

This approach gives you belt and suspenders protection with optimal user experience.

to be continued … 🚀

Rails 8 App: Setup Test DB in PostgreSQL | Write SQL Queries | Operators | Joins

Here’s a list of commonly used SQL comparison operators with brief explanations and examples:

📋 Basic Comparison Operators:

OperatorMeaningExampleResult
=Equal toWHERE age = 25Matches rows where age is 25
<>Not equal to (standard)WHERE status <> 'active'Matches rows where status is not 'active'
!=Not equal to (alternative)WHERE id != 10Same as <>, matches if id is not 10
>Greater thanWHERE salary > 50000Matches rows with salary above 50k
<Less thanWHERE created_at < '2024-01-01'Matches dates before Jan 1, 2024
>=Greater than or equalWHERE age >= 18Matches age 18 and above
<=Less than or equalWHERE age <= 65Matches age 65 and below

📋 Other Common Operators:

OperatorMeaningExample
BETWEENWithin a rangeWHERE price BETWEEN 100 AND 200
INMatch any value in a listWHERE country IN ('US', 'CA', 'UK')
NOT INNot in a listWHERE role NOT IN ('admin', 'staff')
IS NULLValue is nullWHERE deleted_at IS NULL
IS NOT NULLValue is not nullWHERE updated_at IS NOT NULL
LIKEPattern match (case-insensitive in some DBs)WHERE name LIKE 'J%'
ILIKECase-insensitive LIKE (PostgreSQL only)WHERE email ILIKE '%@gmail.com'

Now we’ve our products and product_variants schema, let’s re-explore all major SQL JOINs using these two related tables.

####### Products

   Column    |              Type              | Collation | Nullable |               Default
-------------+--------------------------------+-----------+----------+--------------------------------------
 id          | bigint                         |           | not null | nextval('products_id_seq'::regclass)
 description | text                           |           |          |
 category    | character varying              |           |          |
 created_at  | timestamp(6) without time zone |           | not null |
 updated_at  | timestamp(6) without time zone |           | not null |
 name        | character varying              |           | not null |
 rating      | numeric(2,1)                   |           |          | 0.0
 brand       | character varying              |           |          |

######## Product variants

      Column      |              Type              | Collation | Nullable |                   Default
------------------+--------------------------------+-----------+----------+----------------------------------------------
 id               | bigint                         |           | not null | nextval('product_variants_id_seq'::regclass)
 product_id       | bigint                         |           | not null |
 sku              | character varying              |           | not null |
 mrp              | numeric(10,2)                  |           | not null |
 price            | numeric(10,2)                  |           | not null |
 discount_percent | numeric(5,2)                   |           |          |
 size             | character varying              |           |          |
 color            | character varying              |           |          |
 stock_quantity   | integer                        |           |          | 0
 specs            | jsonb                          |           | not null | '{}'::jsonb
 created_at       | timestamp(6) without time zone |           | not null |
 updated_at       | timestamp(6) without time zone |           | not null |

💎 SQL JOINS with products and product_variants

These tables are related through:

product_variants.product_id → products.id

So we can use that for all join examples.

🔸 1. INNER JOIN – Show only products with variants
SELECT 
  p.name, 
  pv.sku, 
  pv.price 
FROM products p
INNER JOIN product_variants pv ON p.id = pv.product_id;

♦️ Only returns products that have at least one variant.

🔸 2. LEFT JOIN – Show all products, with variants if available
SELECT 
  p.name, 
  pv.sku, 
  pv.price 
FROM products p
LEFT JOIN product_variants pv ON p.id = pv.product_id;

♦️ Returns all products, even those with no variants (NULLs in variant columns).

🔸 3. RIGHT JOIN – Show all variants, with product info if available

(Less common, but useful if variants might exist without a product record)

SELECT 
  pv.sku, 
  pv.price, 
  p.name 
FROM products p
RIGHT JOIN product_variants pv ON p.id = pv.product_id;

🔸 4. FULL OUTER JOIN – All records from both tables
SELECT 
  p.name AS product_name, 
  pv.sku AS variant_sku 
FROM products p
FULL OUTER JOIN product_variants pv ON p.id = pv.product_id;

♦️ Shows all products and all variants, even when there’s no match.

🔸 5. SELF JOIN Example (for product_variants comparing similar sizes or prices)

Let’s compare variants of the same product that are different sizes.

SELECT 
  pv1.product_id,
  pv1.size AS size_1,
  pv2.size AS size_2,
  pv1.sku AS sku_1,
  pv2.sku AS sku_2
FROM product_variants pv1
JOIN product_variants pv2 
  ON pv1.product_id = pv2.product_id 
  AND pv1.size <> pv2.size
WHERE pv1.product_id = 101;  -- example product

♦️ Useful to analyze size comparisons or price differences within a product.

🧬 Complex Combined JOIN Example

Show each product with its variants, and include only discounted ones (price < MRP):

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.price,
  pv.mrp,
  (pv.mrp - pv.price) AS discount_value
FROM products p
INNER JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.price < pv.mrp
ORDER BY discount_value DESC;

📑 JOIN Summary with These Tables

JOIN TypeUse Case
INNER JOINOnly products with variants
LEFT JOINAll products, even if they don’t have variants
RIGHT JOINAll variants, even if product is missing
FULL OUTER JOINEverything — useful in data audits
SELF JOINCompare or relate rows within the same table

Let’s now look at JOIN queries with more realistic conditions using products and product_variants.

🦾 Advanced JOIN Queries with Conditions to practice

🔹 1. All products with variants in stock AND discounted

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.size,
  pv.color,
  pv.stock_quantity,
  pv.mrp,
  pv.price,
  (pv.mrp - pv.price) AS discount_amount
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.stock_quantity > 0
  AND pv.price < pv.mrp
ORDER BY discount_amount DESC;

♦️ Shows available discounted variants, ordered by discount.

🔹 2. Products with high rating (4.5+) and at least one low-stock variant (< 10 items)

SELECT 
  p.name AS product_name,
  p.rating,
  pv.sku,
  pv.stock_quantity
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE p.rating >= 4.5
  AND pv.stock_quantity < 10;

🔹 3. LEFT JOIN to find products with no variants or all variants out of stock

SELECT 
  p.name AS product_name,
  pv.id AS variant_id,
  pv.stock_quantity
FROM products p
LEFT JOIN product_variants pv 
  ON p.id = pv.product_id AND pv.stock_quantity > 0
WHERE pv.id IS NULL;

✅ This tells you:

  • Either the product has no variants
  • Or all variants are out of stock

🔹 4. Group and Count Variants per Product

SELECT 
  p.name AS product_name,
  COUNT(pv.id) AS variant_count
FROM products p
LEFT JOIN product_variants pv ON p.id = pv.product_id
GROUP BY p.name
ORDER BY variant_count DESC;

🔹 5. Variants with price-percentage discount more than 30%

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.mrp,
  pv.price,
  ROUND(100.0 * (pv.mrp - pv.price) / pv.mrp, 2) AS discount_percent
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.price < pv.mrp
  AND (100.0 * (pv.mrp - pv.price) / pv.mrp) > 30;

🔹 6. Color-wise stock summary for a product category

SELECT 
  p.category,
  pv.color,
  SUM(pv.stock_quantity) AS total_stock
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE p.category = 'Shoes'
GROUP BY p.category, pv.color
ORDER BY total_stock DESC;

These queries simulate real-world dashboard views: inventory tracking, product health, stock alerts, etc.


Happy SQL Query Writing! 🚀

Rails 8 App: Setup Test DB in PostgreSQL | Write SQL Queries

1. Add the test db and users table: https://railsdrop.com/2025/04/25/rails-8-app-postgresql-heap-vs-mysql-innodb-b-tree-indexing/

2. Add fake data into the table: https://railsdrop.com/2025/04/29/rails-8-app-postgresql-faker-extensions-for-rails/

Start Learn (Premium): https://railsdrop.com/sql-postgresql-queries-bitmap-seq-index-scan-db-clustering/

📌 Summary of all queries

Check: https://railsdrop.com/postgresql-queries-a-summary/


Read – Composite vs Individual indexes (Premium): https://railsdrop.com/sql-postgres-understanding-current-composite-index/

Read Create 1 million sample users: https://railsdrop.com/sql-postgresql-create-1-million-sample-users-data/

👉 1. SELECT – Basic Query

🔹 1. Select all columns:

SELECT * FROM users;

This gives you every row and every column in the users table.

🔹 2. Select specific columns:

SELECT username, email FROM users;

This limits the output to only username and email.

👉 2. ALTER – Modify Table Structure

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

🔹 Example 1: Add a new column

Let’s add a column created_at of type timestamp:

ALTER TABLE users 
  ADD COLUMN created_at timestamp;

🔹 Example 2: Rename a column

Let’s rename phone_number to mobile:

ALTER TABLE users
  RENAME COLUMN phone_number TO mobile;
🔹 Example 3: Drop a column

Let’s say you want to remove the created_at column:

ALTER TABLE users
  DROP COLUMN created_at;
🔹 4. Modify specific columns:
UPDATE users
  SET mobile = '123456'
  WHERE mobile IS NULL;
  • Use UPDATE instead of ALTER when modifying data in a table.
  • ALTER is used for changing the structure/schema of a table (e.g., adding columns), not for updating data.

👉 3. DISTINCT – Remove Duplicate Values

The DISTINCT keyword is used to return only unique (non-duplicate) values in a result set.

🔹 Example 1: Distinct usernames
SELECT DISTINCT username FROM users;

This returns a list of unique usernames, removing any duplicates.

🔹 Example 2: Distinct combinations of username and email
SELECT DISTINCT username, email FROM users;
SELECT DISTINCT username from users WHERE username like '%quin%';
EXPLAIN ANALYSE SELECT DISTINCT username from users WHERE username like '%quin%';

This checks for uniqueness based on both username and email combined.


👉 4. WHERE – Filter Records + Major Combine Types (AND, OR, NOT)

The WHERE clause is used to filter records that meet a certain condition.

Let’s look at basic and combined conditions using our users table.

🔹 Example 1: Simple WHERE
SELECT * FROM users WHERE username = 'john_doe';

🔹 Example 2: AND – Combine multiple conditions (all must be true)
SELECT * FROM users 
WHERE username = 'quinton' AND email LIKE '%@gmail.com';

🔹 Example 3: OR – At least one condition must be true
SELECT * FROM users 
WHERE username = 'quinton' OR username = 'joaquin_hand';

🔹 Example 4: NOT – Negate a condition
SELECT * FROM users 
WHERE NOT email LIKE '%@example.com';

🔹 Example 5: Combine AND, OR, NOT (use parentheses!)
SELECT * FROM users 
WHERE (email like '%example%' OR email like '%test%') 
  AND NOT username = 'admin';


👉 5. ORDER BY – Sort the Results

And we’ll also look at combined queries afterward.

🔹 Example 1: Order by a single column (ascending)
SELECT * FROM users 
ORDER BY username;

🔹 Example 2: Order by a column (descending)
SELECT * FROM users 
ORDER BY email DESC;

🔹 Example 3: Order by multiple columns
SELECT * FROM users 
ORDER BY username ASC, email DESC;


👉 6. Combined Queries (UNION, INTERSECT, EXCEPT)

✅ These allow you to combine results from multiple SELECT statements.

⚠ Requirements:

  • Each query must return the same number of columns.
  • Data types must be compatible.
🔹 UNION – Combine results and remove duplicates
SELECT username FROM users WHERE email LIKE '%@example.com'
UNION
SELECT username FROM users WHERE username LIKE 'ton%';

🔹 UNION ALL – Combine results and keep duplicates
SELECT username FROM users WHERE email LIKE '%@gmail.com'
UNION ALL
SELECT username FROM users WHERE username LIKE 'test%';

🔹 INTERSECT – Return only common results
SELECT username FROM users 
  WHERE email LIKE '%@gmail.com'
INTERSECT
SELECT username FROM users 
  WHERE username LIKE 'test%';

SELECT username FROM users
  WHERE (email like '%example' OR email like '%test')
INTERSECT
SELECT username FROM users
  WHERE username like 'adam';
🔹 EXCEPT – Return results from the first query that are not in the second
SELECT username FROM users 
  WHERE email LIKE '%@example'
EXCEPT
SELECT username FROM users 
  WHERE (username like '%ada%' OR username like '%merlin%');


👉 7. IS NULL and IS NOT NULL – Handling Missing Data

These are used to check if a column contains a NULL value (i.e., no value).

🔹 Example 1: Users with a missing/have an email
# Find users with a missing email
SELECT * FROM users 
WHERE email IS NULL;

# Find 
SELECT * FROM users 
WHERE email IS NOT NULL;

🔹 Example 2: Users with no email and no mobile
SELECT * FROM users 
WHERE email IS NULL AND phone_number IS NULL;

🔹 Example 3: Users with either email or mobile missing
SELECT * FROM users 
WHERE email IS NULL OR phone_number IS NULL;

🔹 Example 4: Users who have an email and username starts with ‘adam’
SELECT * FROM users 
WHERE email IS NOT NULL AND username LIKE 'adam%';

🔹 Example 5: Users with email missing but username is not empty
SELECT * FROM users 
WHERE email IS NULL AND username IS NOT NULL;

🔹 Example 6: Users where email or mobile is null, but not both (exclusive or)
SELECT * FROM users 
WHERE (email IS NULL AND mobile IS NOT NULL)
   OR (email IS NOT NULL AND mobile IS NULL);


👉 8. LIMIT, SELECT TOP, SELECT TOP PERCENT (PostgreSQL-style)

In PostgreSQL, we use LIMIT instead of SELECT TOP.
(PostgreSQL doesn’t support TOP directly like SQL Server.)

🔹 Example 1: Limit number of results (first 10 rows)
SELECT * FROM users 
LIMIT 10;

🔹 Example 2: Combined with ORDER BY (top 5 newest usernames)
SELECT username FROM users 
  WHERE username IS NOT NULL
ORDER BY id DESC
LIMIT 5;

🔹 Example 3: Paginate (e.g., 11th to 20th row)
SELECT * FROM users 
ORDER BY id 
OFFSET 10 LIMIT 10;

🔔 Simulating SELECT TOP and SELECT TOP PERCENT in PostgreSQL

🔹 Example 4: Simulate SELECT TOP 1
SELECT * FROM users 
ORDER BY id 
LIMIT 1;

🔹 Example 5: Simulate SELECT TOP 10 PERCENT

To get the top 10% of users by id, you can use a subquery:

SELECT * FROM users
ORDER BY id
LIMIT (SELECT CEIL(COUNT(*) * 0.10) FROM users);

🔹 Example 6: Users with Gmail or Yahoo emails, ordered by ID, limit 5
SELECT id, username, email FROM users
WHERE email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com'
AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;

Note: Without parentheses, AND has higher precedence than OR.

🔹 Better version with correct logic:
SELECT id, username, email FROM users
WHERE (email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com')
  AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;


👉 9. Aggregation Functions: MIN, MAX, COUNT, AVG, SUM

These functions help you perform calculations on column values.

🔹 1. COUNT – Number of rows
SELECT COUNT(*) FROM users;

✔️ Total number of users.

SELECT COUNT(email) FROM users WHERE email IS NOT NULL;

✔️ Count of users who have an email.

🔹 2. MIN and MAX – Smallest and largest values
SELECT MIN(id) AS first_user, MAX(id) AS last_user FROM users;

🔹 3. AVG – Average (only on numeric fields)

Assuming id is somewhat sequential, we can do:

SELECT AVG(id) AS avg_id FROM users;

🔹 4. SUM – Total (again, only on numeric fields)
SELECT SUM(id) AS total_ids FROM users WHERE id < 1000;

Combined Queries with Aggregates

🔹 Example 1: Count users without email and with usernames starting with ‘test’
SELECT COUNT(*) FROM users 
WHERE email IS NULL AND username LIKE 'test%';

🔹 Example 2: Get min, max, avg ID of users with Gmail addresses
SELECT 
  MIN(id) AS min_id,
  MAX(id) AS max_id,
  AVG(id) AS avg_id
FROM users 
WHERE email LIKE '%@gmail.com';

🔹 Example 3: Count how many users per email domain
SELECT 
  SPLIT_PART(email, '@', 2) AS domain,
  COUNT(*) AS total_users
FROM users
WHERE email IS NOT NULL
GROUP BY domain
ORDER BY total_users DESC
LIMIT 5;

♦️ This query breaks email at the @ to group by domain like gmail.com, yahoo.com.

GROUP BY Course

Here’s the SQL query to get the maximum mark, minimum mark, and the email (or emails) of users grouped by each course:

Option 1: Basic GROUP BY with aggregate functions (only max/min mark per course, not emails)
SELECT
  course,
  MAX(mark) AS max_mark,
  MIN(mark) AS min_mark
FROM users
GROUP BY course;

Option 2: Include emails of users who have the max or min mark per course

(PostgreSQL-specific using subqueries and JOIN)

SELECT u.course, u.email, u.mark
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

♦️ This second query shows all users who have the highest or lowest mark in their course, including ties.

Here’s the updated query that includes:

  • Course name
  • Emails of users with the maximum or minimum marks
  • Their marks
  • Average mark per course
SELECT
  u.course,
  u.email,
  u.mark,
  stats.max_mark,
  stats.min_mark,
  stats.avg_mark
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

Notes:
  • ROUND(AVG(mark), 2) gives the average mark rounded to two decimal places.
  • Users with the same max or min mark are all included.

Here’s the full query including:

  • Course
  • Email
  • Mark
  • Max/Min mark
  • Average mark
  • User count per course
SELECT
  u.course,
  u.email,
  u.mark,
  stats.max_mark,
  stats.min_mark,
  stats.avg_mark,
  stats.user_count
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

♦️ This query gives you a full breakdown of top/bottom performers per course along with stats per group.

Here’s a version that adds the rank of each user within their course based on their mark (highest mark = rank 1), along with:

  • Course
  • Email
  • Mark
  • Rank (within course)
  • Max mark, Min mark, Average mark, User count per course
WITH ranked_users AS (
  SELECT
    u.*,
    RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
  FROM users u
),
course_stats AS (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
)
SELECT
  r.course,
  r.email,
  r.mark,
  r.rank_in_course,
  cs.max_mark,
  cs.min_mark,
  cs.avg_mark,
  cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
ORDER BY r.course, r.rank_in_course;

Key features:
  • Users are ranked per course using RANK() (supports ties).
  • The output includes all users, not just those with max/min marks.

NOTE: Here we can see output like:

    course    |                   email                   | mark | rank_in_course | max_mark | min_mark | avg_mark | user_count
--------------+-------------------------------------------+------+----------------+----------+----------+----------+------------
 IT           | lisandra.schoen@borer-effertz.test        | 1000 |              1 |     1000 |      100 |   543.04 |        796
 IT           | leona@jaskolski-jaskolski.test            | 1000 |              1 |     1000 |      100 |   543.04 |        796
 IT           | angle@ankunding-sauer.example             |  999 |              3 |     1000 |      100 |   543.04 |        796
 IT           | drucilla_okeefe@monahan.test              |  999 |              3 |     1000 |      100 |   543.04 |        796
 algebra      | natashia.langosh@luettgen.test            | 1000 |              1 |     1000 |      100 |   541.52 |        779
 algebra      | tiffany.tremblay@bergnaum.example         | 1000 |              1 |     1000 |      100 |   541.52 |        779
 algebra      | kristeen.nikolaus@crist.example           |  999 |              3 |     1000 |      100 |   541.52 |        779
 algebra      | domenic@predovic-dare.example             |  999 |              3 |     1000 |      100 |   541.52 |        779
 algebra      | kit@oconner.example                       |  999 |              3 |     1000 |      100 |   541.52 |        779
 architecture | tierra_reilly@botsford-okuneva.test       |  997 |              1 |      997 |      100 |   549.24 |        776
 architecture | celestine_reilly@bayer.example            |  996 |              2 |      997 |      100 |   549.24 |        776
 architecture | carson@kulas.example                      |  995 |              3 |      997 |      100 |   549.24 |        776
 botany       | hassan@towne.test                         | 1000 |              1 |     1000 |      103 |   554.07 |        760
 botany       | shaunna@hudson.test                       | 1000 |              1 |     1000 |      103 |   554.07 |        760
 botany       | sanford_jacobs@johnston.example           |  999 |              3 |     1000 |      103 |   554.07 |        760
 botany       | arnulfo_cremin@ernser.example             |  999 |              3 |     1000 |      103 |   554.07 |        760

The Ranks are not consistent. To avoid this we can use DENSE_RANK().

Here’s the updated query using DENSE_RANK() instead of RANK() — this avoids gaps in rank numbering when there are ties:

WITH ranked_users AS (
  SELECT
    u.*,
    DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
  FROM users u
),
course_stats AS (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
)
SELECT
  r.course,
  r.email,
  r.mark,
  r.rank_in_course,
  cs.max_mark,
  cs.min_mark,
  cs.avg_mark,
  cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
WHERE r.rank_in_course <= 3
ORDER BY r.course, r.rank_in_course;

DENSE_RANK difference:
  • If 2 users tie for 1st place, the next gets rank 2 (not 3 like with RANK).
  • Ensures consistent top-N output when ties are frequent.

🔥 Boom, Bonus: To export the query result as a CSV file in PostgreSQL, you can use the \copy command in psql (PostgreSQL’s CLI), like this:

🧾 Export Top 3 Students per Course to CSV

\copy (
  WITH ranked_users AS (
    SELECT
      u.*,
      DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
    FROM users u
  ),
  course_stats AS (
    SELECT
      course,
      MAX(mark) AS max_mark,
      MIN(mark) AS min_mark,
      ROUND(AVG(mark), 2) AS avg_mark,
      COUNT(*) AS user_count
    FROM users
    GROUP BY course
  )
  SELECT
    r.course,
    r.email,
    r.mark,
    r.rank_in_course,
    cs.max_mark,
    cs.min_mark,
    cs.avg_mark,
    cs.user_count
  FROM ranked_users r
  JOIN course_stats cs ON r.course = cs.course
  WHERE r.rank_in_course <= 3
  ORDER BY r.course, r.rank_in_course
) TO 'top_students_per_course.csv' WITH CSV HEADER;

✅ Requirements:
  • Run this in the psql shell.
  • The file top_students_per_course.csv will be saved in your local working directory (where psql was started).
  • Make sure PostgreSQL has write permissions to that directory.


👉 10. LIKE, %, _ – Pattern Matching in SQL

These are used to filter text using wildcards:

  • % = matches any sequence of characters (0 or more)
  • _ = matches exactly one character

🔹 Basic LIKE Queries

Example 1: Usernames starting with “admin”
SELECT * FROM users 
WHERE username LIKE 'admin%';

Example 2: Usernames ending with “bot”
SELECT * FROM users 
WHERE username LIKE '%bot';

Example 3: Usernames containing “test”
SELECT * FROM users 
WHERE username LIKE '%test%';

🔹 _ Single-character Wildcard

Example 4: 5-character usernames
SELECT * FROM users 
WHERE username LIKE '_____';

(Each _ stands for one character.)

Example 5: Emails starting with any single letter + “ohn” (e.g., “john”, “kohn”)
SELECT * FROM users 
WHERE username LIKE '_ohn';

Combined Queries with LIKE, %, _

🔹 Example 6: Users whose username contains “test” and email ends with “gmail.com”
SELECT * FROM users 
WHERE username LIKE '%test%' AND email LIKE '%@gmail.com';

🔹 Example 7: Users with 3-character usernames and missing email
SELECT * FROM users 
WHERE username LIKE '___' AND email IS NULL;

🔹 Example 8: Users with usernames that start with “a” or end with “x” and have a mobile number
SELECT * FROM users 
WHERE (username LIKE 'a%' OR username LIKE '%x') AND mobile IS NOT NULL;


👉 11. IN, NOT IN, BETWEEN – Set & Range Filters

These are used to filter based on a list of values (IN) or a range (BETWEEN).

🔹 1. IN – Match any of the listed values

SELECT * FROM users 
WHERE username IN ('admin', 'test_user', 'john_doe');

🔹 2. NOT IN – Exclude listed values

SELECT * FROM users 
WHERE username NOT IN ('admin', 'test_user');

🔹 3. BETWEEN – Match within a range (inclusive)

SELECT * FROM users 
WHERE id BETWEEN 100 AND 200;

♦️ Equivalent to: id >= 100 AND id <= 200

Combined Queries

🔹 Example 1: Users with username in a list and id between 1 and 500
SELECT * FROM users 
WHERE username IN ('alice', 'bob', 'carol') 
  AND id BETWEEN 1 AND 500;

🔹 Example 2: Exclude system users and select a range of IDs
SELECT id, username FROM users 
WHERE username NOT IN ('admin', 'system') 
  AND id BETWEEN 1000 AND 2000;

🔹 Example 3: Top 5 users whose email domains are in a specific list
SELECT * FROM users 
WHERE SPLIT_PART(email, '@', 2) IN ('gmail.com', 'yahoo.com', 'hotmail.com')
ORDER BY id
LIMIT 5;


👉 12. SQL Aliases – Renaming Columns or Tables Temporarily

Aliases help improve readability, especially in joins or when using functions.

🔹 1. Column Aliases

Use AS (optional keyword) to rename a column in the result.

Example 1: Rename username to user_name
SELECT username AS user_name, email AS user_email 
FROM users;

You can also omit AS:

SELECT username user_name, email user_email 
FROM users;

🔹 2. Table Aliases

Assign a short name to a table (very useful in joins).

Example 2: Simple alias for table
SELECT u.username, u.email 
FROM users u 
WHERE u.email LIKE '%@gmail.com';

🔹 3. Alias with functions

SELECT COUNT(*) AS total_users, MAX(id) AS latest_id 
FROM users;

Combined Query with Aliases

🔹 Example 4: Count Gmail users, alias result and filter
SELECT 
  COUNT(*) AS gmail_users 
FROM users u 
WHERE u.email LIKE '%@gmail.com';

🔹 Example 5: List usernames with shortened table name and domain extracted
SELECT 
  u.username AS name, 
  SPLIT_PART(u.email, '@', 2) AS domain 
FROM users u 
WHERE u.email IS NOT NULL 
ORDER BY u.username
LIMIT 10;


Rails 8 App: Setup Test DB in PostgreSQL | Faker | Extensions for Rails app, VSCode

Let’s try to add some sample data first to our database.

Step 1: Install pgxnclient

On macOS (with Homebrew):

brew install pgxnclient

On Ubuntu/Debian:

sudo apt install pgxnclient

Step 2: Install the faker extension via PGXN

pgxn install faker

I get issue with installing faker via pgxn:

~ pgxn install faker
INFO: best version: faker 0.5.3
ERROR: resource not found: 'https://api.pgxn.org/dist/PostgreSQL_Faker/0.5.3/META.json'

⚠️ Note: faker extension we’re trying to install via pgxn is not available or improperly published on the PGXN network. Unfortunately, the faker extension is somewhat unofficial and not actively maintained or reliably hosted.

🚨 You can SKIP STEP 3,4,5 and opt Option 2

Step 3: Build and install the extension into PostgreSQL

cd /path/to/pg_faker  # PGXN will print this after install
make
sudo make install

Step 4: Enable it in your database

Inside psql :

CREATE EXTENSION faker;

Step 5: Insert 10,000 fake users

INSERT INTO users (user_id, username, email, phone_number)
SELECT
  gs AS user_id,
  faker_username(),
  faker_email(),
  faker_phone_number()
FROM generate_series(1, 10000) AS gs;
Option 2: Use Ruby + Faker gem (if you’re using Rails or Ruby)

If you’re building your app in Rails, use the faker gem directly:

In Ruby:
require 'faker'
require 'pg'

conn = PG.connect(dbname: 'test_db')

(1..10_000).each do |i|
  conn.exec_params(
    "INSERT INTO users (user_id, username, email, phone_number) VALUES ($1, $2, $3, $4)",
    [i, Faker::Internet.username, Faker::Internet.email, Faker::PhoneNumber.phone_number]
  )
end

In Rails (for test_db), Create the Rake Task:

Create a file at:

lib/tasks/seed_fake_users.rake
# lib/tasks/seed_fake_users.rake

namespace :db do
  desc "Seed 10,000 fake users into the users table"
  task seed_fake_users: :environment do
    require "faker"
    require "pg"

    conn = PG.connect(dbname: "test_db")

    # If user_id is a serial and you want to reset the sequence after deletion, run:
    # conn.exec_params("TRUNCATE TABLE users RESTART IDENTITY")
    # delete existing users to load fake users
    conn.exec_params("DELETE FROM users")
    

    puts "Seeding 10,000 fake users ...."
    (1..10_000).each do |i|
      conn.exec_params(
        "INSERT INTO users (user_id, username, email, phone_number) VALUES ($1, $2, $3, $4)",
        [ i, Faker::Internet.username, Faker::Internet.email, Faker::PhoneNumber.phone_number ]
      )
    end
    puts "Seeded 10,000 fake users into the users table"
    conn.close
  end
end
# run the task
bin/rails db:seed_fake_users
For Normal Rails Rake Task:
# lib/tasks/seed_fake_users.rake

namespace :db do
  desc "Seed 10,000 fake users into the users table"
  task seed_fake_users: :environment do
    require 'faker'

    puts "🌱 Seeding 10,000 fake users..."

    users = []

    # delete existing users
    User.destroy_all

    10_000.times do |i|
      users << {
        user_id: i + 1,
        username: Faker::Internet.unique.username,
        email: Faker::Internet.unique.email,
        phone_number: Faker::PhoneNumber.phone_number
      }
    end

    # Use insert_all for performance
    User.insert_all(users)

    puts "✅ Done. Inserted 10,000 users."
  end
end
# run the task
bin/rails db:seed_fake_users

Now we will discuss about PostgreSQL Extensions and it’s usage.

PostgreSQL extensions are add-ons or plug-ins that extend the core functionality of PostgreSQL. They provide additional capabilities such as new data types, functions, operators, index types, or full features like full-text search, spatial data handling, or fake data generation.

🔧 What Extensions Can Do

Extensions can:

  • Add functions (e.g. gen_random_bytes() from pgcrypto)
  • Provide data types (e.g. hstore, uuid, jsonb)
  • Enable indexing techniques (e.g. btree_gin, pg_trgm)
  • Provide tools for testing and development (e.g. faker, pg_stat_statements)
  • Enhance performance monitoring, security, or language support

📦 Common PostgreSQL Extensions

ExtensionPurpose
pgcryptoCryptographic functions (e.g., hashing, random byte generation)
uuid-osspFunctions to generate UUIDs
postgisSpatial and geographic data support
hstoreKey-value store in a single PostgreSQL column
pg_trgmTrigram-based text search and indexing
citextCase-insensitive text type
pg_stat_statementsSQL query statistics collection
fakerGenerates fake but realistic data (for testing)

📥 Installing and Enabling Extensions

1. Install (if not built-in)

Via package manager or PGXN (PostgreSQL Extension Network), or compile from source.

2. Enable in a database

CREATE EXTENSION extension_name;

Example:

CREATE EXTENSION pgcrypto;

Enabling an extension makes its functionality available to the current database only.

🤔 Why Use Extensions?

  • Productivity: Quickly add capabilities without writing custom code.
  • Performance: Access to advanced indexing, statistics, and optimization tools.
  • Development: Generate test data (faker), test encryption (pgcrypto), etc.
  • Modularity: PostgreSQL stays lightweight while letting you add only what you need.

Here’s a categorized list (with a simple visual-style layout) of PostgreSQL extensions that are safe and useful for Rails apps in both development and production environments.

🔌 PostgreSQL Extensions for Rails Apps

# connect psql
psql -U username -d database_name

# list all available extensions
SELECT * FROM pg_available_extensions;

# eg. to install the hstore extension run
CREATE EXTENSION hstore;

# verify the installation
SELECT * FROM pg_extension;
SELECT * FROM pg_extension WHERE extname = 'hstore';

🔐 Security & UUIDs

ExtensionUse CaseSafe for Prod
pgcryptoSecure random bytes, hashes, UUIDs
uuid-osspUUID generation (v1, v4, etc.)

💡 Tip: Use uuid-ossp or pgcrypto to generate UUID primary keys (id: :uuid) in Rails.

📘 PostgreSQL Procedures and Triggers — Explained with Importance and Examples

PostgreSQL is a powerful, open-source relational database that supports advanced features like stored procedures and triggers, which are essential for encapsulating business logic inside the database.

🔹 What are Stored Procedures in PostgreSQL?

A stored procedure is a pre-compiled set of SQL and control-flow statements stored in the database and executed by calling it explicitly.

Purpose: Encapsulate business logic, reuse complex operations, improve performance, and reduce network overhead.

✅ Benefits of Stored Procedures:
  • Faster execution (compiled and stored in DB)
  • Centralized logic
  • Reduced client-server round trips
  • Language support: SQL, PL/pgSQL, Python, etc.
🧪 Example: Create a Procedure to Add a New User
CREATE OR REPLACE PROCEDURE add_user(name TEXT, email TEXT)
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO users (name, email) VALUES (name, email);
END;
$$;

▶️ Call the procedure:
CALL add_user('John Doe', 'john@example.com');


🔹 What are Triggers in PostgreSQL?

A trigger is a special function that is automatically executed in response to certain events on a table (like INSERT, UPDATE, DELETE).

Purpose: Enforce rules, maintain audit logs, auto-update columns, enforce integrity, etc.

✅ Benefits of Triggers:
  • Automate tasks on data changes
  • Enforce business rules and constraints
  • Keep logs or audit trails
  • Maintain derived data or counters

🧪 Example: Trigger to Log Inserted Users

1. Create the audit table:

CREATE TABLE user_audit (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    name TEXT,
    email TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

2. Create the trigger function:

CREATE OR REPLACE FUNCTION log_user_insert()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO user_audit (user_id, name, email)
    VALUES (NEW.id, NEW.name, NEW.email);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

3. Create the trigger on users table:

CREATE TRIGGER after_user_insert
AFTER INSERT ON users
FOR EACH ROW
EXECUTE FUNCTION log_user_insert();

Now, every time a user is inserted, the trigger logs it in the user_audit table automatically.

📌 Difference: Procedures vs. Triggers

FeatureStored ProceduresTriggers
When executedCalled explicitly with CALLAutomatically executed on events
PurposeBatch processing, encapsulate logicReact to data changes automatically
ControlFull control by developerFire based on database event (Insert, Update, Delete)
ReturnsNo return or OUT parametersMust return NEW or OLD row in most cases

🎯 Why Are Procedures and Triggers Important?

✅ Use Cases for Stored Procedures:
  • Bulk processing (e.g. daily billing)
  • Data import/export
  • Account setup workflows
  • Multi-step business logic
✅ Use Cases for Triggers:
  • Auto update updated_at column
  • Enforce soft-deletes
  • Maintain counters or summaries (e.g., post comment count)
  • Audit logs / change history
  • Cascading updates or cleanups

🚀 Real-World Example: Soft Delete Trigger

Instead of deleting records, mark them as deleted = true.

CREATE OR REPLACE FUNCTION soft_delete_user()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE users SET deleted = TRUE WHERE id = OLD.id;
  RETURN NULL; -- cancel the delete
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER before_user_delete
BEFORE DELETE ON users
FOR EACH ROW
EXECUTE FUNCTION soft_delete_user();

Now any DELETE FROM users WHERE id = 1; will just update the deleted column.

🛠️ Tools to Manage Procedures & Triggers

  • pgAdmin (GUI)
  • psql (CLI)
  • Code-based migrations (via tools like ActiveRecord or pg gem)

🧠 Summary

FeatureStored ProcedureTrigger
Manual/AutoManual (CALL)Auto (event-based)
FlexibilityComplex logic, loops, variablesQuick logic, row-based or statement-based
LanguagesPL/pgSQL, SQL, Python, etc.PL/pgSQL, SQL
Best forMulti-step workflowsAudit, logging, validation

Use Postgres RANDOM()

By using RANDOM() in PostgreSQL. If the application uses PostgreSQL’s built-in RANDOM() function to efficiently retrieve a random user from the database. Here’s why this is important:

  1. Efficiency: PostgreSQL’s RANDOM() is more efficient than loading all records into memory and selecting one randomly in Ruby. This is especially important when dealing with large datasets (like if we have 10000 users).
  2. Database-level Operation: The randomization happens at the database level rather than the application level, which:
  • Reduces memory usage (we don’t need to load unnecessary records)
  • Reduces network traffic (only one record is transferred)
  • Takes advantage of PostgreSQL’s optimized random number generation
  1. Single Query: Using RANDOM() allows us to fetch a random record in a single SQL query, typically something like:sqlApply to
SELECT * FROM users ORDER BY RANDOM() LIMIT 1

This is in contrast to less efficient methods like:

  • Loading all users and using Ruby’s sample method (User.all.sample)
  • Getting a random ID and then querying for it (which would require two queries)
  • Using offset with count (which can be slow on large tables)

🔍 Full Text Search & Similarity

ExtensionUse CaseSafe for Prod
pg_trgmTrigram-based fuzzy search (great with ILIKE & similarity)
unaccentRemove accents for better search results
fuzzystrmatchSoundex, Levenshtein distance✅ (heavy use = test!)

💡 Combine pg_trgm + unaccent for powerful search in Rails models using ILIKE.

📊 Performance Monitoring & Dev Insights

ExtensionUse CaseSafe for Prod
pg_stat_statementsMonitor slow queries, frequency
auto_explainLog plans for slow queries
hypopgSimulate hypothetical indexes✅ (dev only)

🧪 Dev Tools & Data Generation

ExtensionUse CaseSafe for Prod
fakerFake data generation for testing❌ Dev only
pgfakerCommunity alternative to faker❌ Dev only

📦 Storage & Structure

ExtensionUse CaseSafe for Prod
hstoreKey-value storage in a column
citextCase-insensitive text

💡 citext is very handy for case-insensitive email columns in Rails.

🗺️ Geospatial (Advanced)

ExtensionUse CaseSafe for Prod
postgisGIS/spatial data support✅ (big apps)

🎨 Visual Summary

+-------------------+-----------------------------+-----------------+
| Category          | Extension                   | Safe for Prod?  |
+-------------------+-----------------------------+-----------------+
| Security/UUIDs    | pgcrypto, uuid-ossp         | ✅              |
| Search/Fuzziness  | pg_trgm, unaccent, fuzzystr | ✅              |
| Monitoring        | pg_stat_statements          | ✅              |
| Dev Tools         | faker, pgfaker              | ❌ (Dev only)   |
| Text/Storage      | citext, hstore              | ✅              |
| Geo               | postgis                     | ✅              |
+-------------------+-----------------------------+-----------------+

PostgreSQL Extension for VSCode

# 1. open the Command Palette (Cmd + Shift + P)
# 2. Type 'PostgreSQL: Add Connection'
# 3. Enter the hostname of the database authentication details
# 4. Open Command Palette, type: 'PostgreSQL: New Query'

Enjoy PostgreSQL  🚀


Rails 8 App: Setup Test DB in PostgreSQL | Query Performance Using EXPLAIN ANALYZE

Now we’ll go full-on query performance pro mode using EXPLAIN ANALYZE and real plans. We’ll learn how PostgreSQL makes decisions, how to catch slow queries, and how your indexes make them 10x faster.

💎 Part 1: What is EXPLAIN ANALYZE?

EXPLAIN shows how PostgreSQL plans to execute your query.

ANALYZE runs the query and adds actual time, rows, loops, etc.

Syntax:

EXPLAIN ANALYZE
SELECT * FROM users WHERE username = 'bob';

✏️ Example 1: Without Index

SELECT * FROM users WHERE username = 'bob';

If username has no index, plan shows:

Seq Scan on users
  Filter: (username = 'bob')
  Rows Removed by Filter: 9999

❌ PostgreSQL scans all rows = Sequential Scan = slow!

➕ Add Index:

CREATE INDEX idx_users_username ON users (username);

Now rerun:

EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'bob';

You’ll see:

Index Scan using idx_users_username on users
  Index Cond: (username = 'bob')

✅ PostgreSQL uses B-tree index
🚀 Massive speed-up!

🔥 Want even faster?

SELECT username FROM users WHERE username = 'bob';

If PostgreSQL shows:

Index Only Scan using idx_users_username on users
  Index Cond: (username = 'bob')

🎉 Index Only Scan! = covering index success!
No heap fetch = lightning-fast.

⚠️ Note: Index-only scan only works if:

  • Index covers all selected columns
  • Table is vacuumed (PostgreSQL uses visibility map)

If you still get Seq scan output like:

test_db=# EXPLAIN ANALYSE SELECT * FROM users where username = 'aman_chetri';
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.11 rows=1 width=838) (actual time=0.031..0.034 rows=1 loops=1)
   Filter: ((username)::text = 'aman_chetri'::text)
   Rows Removed by Filter: 2
 Planning Time: 0.242 ms
 Execution Time: 0.077 ms
(5 rows)

even after adding an index, because PostgreSQL is saying:

  • 🤔 “The table is so small (cost = 1.11), scanning the whole thing is cheaper than using the index.”
  • Also: Your query uses only SELECT username, which could be eligible for Index Only Scan, but heap fetch might still be needed due to visibility map.

🔧 Step-by-step Fix:

✅ 1. Add Data for Bigger Table

If the table is small (few rows), PostgreSQL will prefer Seq Scan no matter what.

Try adding ~10,000 rows:

INSERT INTO users (username, email, phone_number)
SELECT 'user_' || i, 'user_' || i || '@mail.com', '1234567890'
FROM generate_series(1, 10000) i;

Then VACUUM ANALYZE users; again and retry EXPLAIN.

✅ 2. Confirm Index Exists

First, check your index exists and is recognized:

\d users

You should see something like:

Indexes:
    "idx_users_username" btree (username)

If not, add:

CREATE INDEX idx_users_username ON users(username);

✅ 3. Run ANALYZE (Update Stats)
ANALYZE users;

This updates statistics — PostgreSQL might not be using the index if it thinks only one row matches or the table is tiny.

✅ 4. Vacuum for Index-Only Scan

Index-only scans require the visibility map to be set.

Run:

VACUUM ANALYZE users;

This marks pages in the table as “all-visible,” enabling PostgreSQL to avoid reading the heap.

✅ 5. Force PostgreSQL to Consider Index

You can turn off sequential scan temporarily (for testing):

SET enable_seqscan = OFF;

EXPLAIN SELECT username FROM users WHERE username = 'bob';

You should now see:

Index Scan using idx_users_username on users ...

⚠️ Use this only for testing/debugging — not in production.

💡 Extra Tip (optional): Use EXPLAIN (ANALYZE, BUFFERS)
EXPLAIN (ANALYZE, BUFFERS)
SELECT username FROM users WHERE username = 'bob';

This will show:

  • Whether heap was accessed
  • Buffer hits
  • Actual rows
📋 Summary
StepCommand
Check Index\d users
Analyze tableANALYZE users;
Vacuum for visibilityVACUUM ANALYZE users;
Disable seq scan for testSET enable_seqscan = OFF;
Add more rows (optional)INSERT INTO ...

🚨 How to catch bad index usage?

Always look for:

  • “Seq Scan” instead of “Index Scan” ➔ missing index
  • “Heap Fetch” ➔ not a covering index
  • “Rows Removed by Filter” ➔ inefficient filtering
  • “Loops: 1000+” ➔ possible N+1 issue

Common Pattern Optimizations

PatternFix
WHERE column = ?B-tree index on column
WHERE column LIKE 'prefix%'B-tree works (with text_ops)
SELECT col1 WHERE col2 = ?Covering index: (col2, col1) or (col2) INCLUDE (col1)
WHERE col BETWEEN ?Composite index with range second: (status, created_at)
WHERE col IN (?, ?, ?)Index still helps
ORDER BY col LIMIT 10Index on col helps sort fast

⚡ Tip: Use pg_stat_statements to Find Slow Queries

Enable it in postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'

Then run:

SELECT query, total_exec_time, calls
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

🎯 Find your worst queries & optimize them with new indexes!

🧪 Try It Yourself

Want a little lab setup to practice?

CREATE TABLE users (
  user_id serial PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

-- Insert 100K fake rows
INSERT INTO users (username, email, phone_number)
SELECT
  'user_' || i,
  'user_' || i || '@example.com',
  '999-000-' || i
FROM generate_series(1, 100000) i;

Then test:

  1. EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'user_5000';
  2. Add INDEX ON username
  3. Re-run, compare speed!

🎯 Extra Pro Tools for Query Performance

  • EXPLAIN ANALYZE → Always first tool
  • pg_stat_statements → Find slow queries in real apps
  • auto_explain → Log slow plans automatically
  • pgBadger or pgHero → Visual query monitoring
💥 Now We Know:

✅ How to read query plans
✅ When you’re doing full scans vs index scans
✅ How to achieve index-only scans
✅ How to catch bad performance early
✅ How to test and fix in real world

Happy Performance Fixing.. 🚀

Rails 8 App: Setup Test DB in PostgreSQL | Covering Index | BRIN Indexes | Hash Indexes | Create super fast indexes

Let’s look into some of the features of sql data indexing. This will be super helpful while developing our Rails 8 Application.

💎 Part 1: What is a Covering Index?

Normally when you query:

SELECT * FROM users WHERE username = 'bob';

  • Database searches username index (secondary).
  • Finds a pointer (TID or PK).
  • Then fetches full row from table (heap or clustered B-tree).

Problem:

  • Heap fetch = extra disk read.
  • Clustered B-tree fetch = extra traversal.

📜 Covering Index idea:

✅ If the index already contains all the columns you need,
✅ Then the database does not need to fetch the full row!

It can answer the query purely by scanning the index! ⚡

Boom — one disk read, no extra hop!

✏️ Example in PostgreSQL:

Suppose your query is:

SELECT username FROM users WHERE username = 'bob';

  • You only need username.
  • But by default, PostgreSQL indexes only store the index column (here, username) + TID.

✅ So in this case — already covering!

No heap fetch needed!

✏️ Example in MySQL InnoDB:

Suppose your query is:

SELECT username FROM users WHERE username = 'bob';

  • Secondary index (username) contains:
    • username (indexed column)
    • user_id (because secondary indexes in InnoDB always store PK)

♦️ So again, already covering!
No need to jump to the clustered index!

🎯 Key point:

If your query only asks for columns already inside the index,
then only the index is touched ➔ no second lookup ➔ super fast!

💎 Part 2: Real SQL Examples

✨ PostgreSQL

Create a covering index for common query:

CREATE INDEX idx_users_username_email ON users (username, email);

Now if you run:

SELECT email FROM users WHERE username = 'bob';

Postgres can:

  • Search index on username
  • Already have email in index
  • ✅ No heap fetch!

(And Postgres is smart: it checks index-only scan automatically.)

✨ MySQL InnoDB

Create a covering index:

CREATE INDEX idx_users_username_email ON users (username, email);

✅ Now query:

SELECT email FROM users WHERE username = 'bob';

Same behavior:

  • Only secondary index read.
  • No need to touch primary clustered B-tree.

💎 Part 3: Tips to design smart Covering Indexes

✅ If your query uses WHERE on col1 and SELECT col2,
✅ Best to create index: (col1, col2).

✅ Keep indexes small — don’t add 10 columns unless needed.
✅ Avoid huge TEXT or BLOB columns in covering indexes — they make indexes heavy.

Composite indexes are powerful:

CREATE INDEX idx_users_username_email ON users (username, email);

→ Can be used for:

  • WHERE username = ?
  • WHERE username = ? AND email = ?
  • etc.

✅ Monitor index usage:

  • PostgreSQL: EXPLAIN ANALYZE
  • MySQL: EXPLAIN

✅ Always check if Index Only Scan or Using Index appears in EXPLAIN plan!

📚 Quick Summary Table

DatabaseNormal QueryWith Covering Index
PostgreSQLB-tree ➔ Heap fetch (unless TID optimization)B-tree scan only
MySQL InnoDBSecondary B-tree ➔ Primary B-treeSecondary B-tree only
Result2 steps1 step
SpeedSlowerFaster

🏆 Great! — Now We Know:

🧊 How heap fetch works!
🧊 How block lookup is O(1)!
🧊 How covering indexes skip heap fetch!
🧊 How to create super fast indexes for PostgreSQL and MySQL!


🦾 Advanced Indexing Tricks (Real Production Tips)

Now it’s time to look into super heavy functionalities that Postgres supports for making our sql data search/fetch super fast and efficient.

1. 🎯 Partial Indexes (PostgreSQL ONLY)

✅ Instead of indexing the whole table,
✅ You can index only the rows you care about!

Example:

Suppose 95% of users have status = 'inactive', but you only search active users:

SELECT * FROM users WHERE status = 'active' AND email = 'bob@example.com';

👉 Instead of indexing the whole table:

CREATE INDEX idx_active_users_email ON users (email) WHERE status = 'active';

♦️ PostgreSQL will only store rows with status = 'active' in this index!

Advantages:

  • Smaller index = Faster scans
  • Less space on disk
  • Faster index maintenance (less updates/inserts)

Important:

  • MySQL (InnoDB) does NOT support partial indexes 😔 — only PostgreSQL has this superpower.

2. 🎯 INCLUDE Indexes (PostgreSQL 11+)

✅ Normally, a composite index uses all columns for sorting/searching.
✅ With INCLUDE, extra columns are just stored in index, not used for ordering.

Example:

CREATE INDEX idx_username_include_email ON users (username) INCLUDE (email);

Meaning:

  • username is indexed and ordered.
  • email is only stored alongside.

Now query:

SELECT email FROM users WHERE username = 'bob';

Index-only scan — no heap fetch.

Advantages:

  • Smaller & faster than normal composite indexes.
  • Helps to create very efficient covering indexes.

Important:

  • MySQL 8.0 added something similar with INVISIBLE columns but it’s still different.

3. 🎯 Composite Index Optimization

✅ Always order columns inside index smartly based on query pattern.

Golden Rules:

⚜️ Equality columns first (WHERE col = ?)
⚜️ Range columns second (WHERE col BETWEEN ?)
⚜️ SELECT columns last (for covering)

Example:

If query is:

SELECT email FROM users WHERE status = 'active' AND created_at > '2024-01-01';

Best index:

CREATE INDEX idx_users_status_created_at ON users (status, created_at) INCLUDE (email);

♦️ status first (equality match)
♦️ created_at second (range)
♦️ email included (covering)

Bad Index: (wrong order)

CREATE INDEX idx_created_at_status ON users (created_at, status);

→ Will not be efficient!

4. 🎯 BRIN Indexes (PostgreSQL ONLY, super special!)

✅ When your table is very huge (millions/billions of rows),
✅ And rows are naturally ordered (like timestamp, id increasing),
✅ You can create a BRIN (Block Range Index).

Example:

CREATE INDEX idx_users_created_at_brin ON users USING BRIN (created_at);

♦️ BRIN stores summaries of large ranges of pages (e.g., min/max timestamp per 128 pages).

♦️ Ultra small index size.

♦️ Very fast for large range queries like:

SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-04-01';

Important:

  • BRIN ≠ B-tree
  • BRIN is approximate, B-tree is precise.
  • Only useful if data is naturally correlated with physical storage order.

MySQL?

  • MySQL does not have BRIN natively. PostgreSQL has a big advantage here.

5. 🎯 Hash Indexes (special case)

✅ If your query is always exact equality (not range),
✅ You can use hash indexes.

Example:

CREATE INDEX idx_users_username_hash ON users USING HASH (username);

Useful for:

  • Simple WHERE username = 'bob'
  • Never ranges (BETWEEN, LIKE, etc.)

⚠️ Warning:

  • Hash indexes used to be “lossy” before Postgres 10.
  • Now they are safe, but usually B-tree is still better unless you have very heavy point lookups.

😎 PRO-TIP: Which Index Type to Use?

Use caseIndex type
Search small ranges or equalityB-tree
Search on huge tables with natural order (timestamps, IDs)BRIN
Only exact match, super heavy lookupHash
Search only small part of table (active users, special conditions)Partial index
Need to skip heap fetchINCLUDE / Covering Index

🗺️ Quick Visual Mindmap:

Your Query
│
├── Need Equality + Range? ➔ B-tree
│
├── Need Huge Time Range Query? ➔ BRIN
│
├── Exact equality only? ➔ Hash
│
├── Want Smaller Index (filtered)? ➔ Partial Index
│
├── Want to avoid Heap Fetch? ➔ INCLUDE columns (Postgres) or Covering Index

🏆 Now we Know:

🧊 Partial Indexes
🧊 INCLUDE Indexes
🧊 Composite Index order tricks
🧊 BRIN Indexes
🧊 Hash Indexes
🧊 How to choose best Index

This is serious pro-level database knowledge.


Enjoy SQL! 🚀