Understanding Enums: Why They Exist, How They Work, and How Rails Implements Them

Enums are one of those features developers use frequently – especially in frameworks like Rails – but many developers never fully understand why enums exist, what problem they solve, or how they are implemented internally. In Rails, enums appear deceptively simple:

enum status: { pending: 0, paid: 1, failed: 2 }

But behind this tiny line lies an important software design concept used across programming languages, databases, compilers, APIs, operating systems, and application architecture.

This article explains the complete picture of enums:

Why enums exist
How they differ from other data structures
How Rails maps enums to integers internally
Whether enums are tied to SQL/databases
How ActiveRecord::Enum works under the hood
Real-world benefits and tradeoffs developers should know

What Is an Enum?

An Enum (Enumeration) is a restricted set of named values representing a finite group of states or options.

Example:

status = :pending

Possible statuses may be:

			
:pending
:processing
:completed
:failed

Instead of allowing any arbitrary value, enums constrain the system to a known set of valid states.

Why Do Enums Exist?

Enums solve several important problems in software systems.

1. Prevent Invalid States

Without enums:

order.status = "asdfgh"

This may accidentally enter the database and corrupt business logic.

Enums restrict allowed values:

			
enum status: {
  pending: 0,
  processing: 1,
  completed: 2
}

		

Now Rails only allows known states.
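As a rough illustration of the idea in plain Ruby (not Rails' actual implementation), a setter can reject anything outside a known set. The `Order` class and `STATUSES` constant below are hypothetical names chosen for this sketch:

```ruby
# Plain-Ruby sketch of "only known states are allowed".
# Order and STATUSES are illustrative; this is not how Rails implements enums.
class Order
  STATUSES = { "pending" => 0, "processing" => 1, "completed" => 2 }.freeze

  attr_reader :status

  def status=(value)
    name = value.to_s
    unless STATUSES.key?(name)
      raise ArgumentError, "'#{name}' is not a valid status"
    end
    @status = name
  end
end

order = Order.new
order.status = :processing
puts order.status

begin
  order.status = "asdfgh"
rescue ArgumentError => e
  puts "Rejected: #{e.message}"
end
```

The invalid value never reaches the object's state, which is the property the enum is meant to guarantee.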

2. Improve Readability

Compare:

if order.status == 2

if order.completed?

Enums convert meaningless numbers into expressive business language.

3. Save Storage Space

Integers usually take less space than strings and are cheaper to compare and index.

Instead of storing:

"processing"

the DB stores:

1

This improves:

indexing
query performance
storage efficiency

4. Standardize State Management

Enums centralize valid states:

Order.statuses

returns:

			
{
  "pending" => 0,
  "processing" => 1,
  "completed" => 2
}

		

This becomes a single source of truth.

5. Enable Better APIs & DSLs

Rails automatically generates methods:

			
order.pending?
order.completed!
Order.processing

Enums create expressive domain APIs.

How Enums Differ From Other Data Structures

Enums are NOT collections like arrays or hashes.

They represent a finite state system.

🔹 Enum vs Array

Array:

statuses = ["pending", "paid", "failed"]

Problem:

no constraints
no semantic meaning
no mapping behavior
no helper methods

🔹 Enum vs Hash

Hash:

			
STATUSES = {
  pending: 0,
  paid: 1
}

Closer, but still missing:

validations
query scopes
state predicates
DSL methods

Rails enums internally use hashes, but add behavior around them.

🔹 Enum vs Constants

Constants:

			
PENDING = 0
PAID = 1

Problem:

scattered
harder to manage
no grouped state semantics

Enums organize states cohesively.

🌍 Are Enums Related Only to SQL or Databases?

❌ Absolutely not.

Enums exist in:

C
Java
Rust
Swift
TypeScript
GraphQL
Operating systems
Compilers
APIs
State machines

Enums are a general programming concept, not a database feature.

Example: TypeScript Enum

			
enum Status {
  Pending,
  Processing,
  Completed
}

		

Example: Java Enum

			
enum Status {
  PENDING,
  PROCESSING,
  COMPLETED
}

		

Example: PostgreSQL Native Enum

			
CREATE TYPE status AS ENUM (
  'pending',
  'processing',
  'completed'
);

		

This is database-level enum support.

🏗️ How Rails Implements Enums

Rails provides:

ActiveRecord::Enum

located in:

activerecord/lib/active_record/enum.rb

When you write:

			
class Order < ApplicationRecord
  enum status: {
    pending: 0,
    processing: 1,
    completed: 2
  }
end

		

Rails dynamically generates:

1️⃣ Attribute Mapping

			
order.status
# => "pending"

Internally stored as:

0

in the database.

2️⃣ Predicate Methods

			
order.pending?
order.completed?

3️⃣ Bang Methods

order.completed!

Equivalent to:

order.update!(status: :completed)

4️⃣ Query Scopes

			
Order.pending
Order.completed

Generated automatically.

5️⃣ Mapping Helpers

Order.statuses

Returns:

			
{
  "pending" => 0,
  "processing" => 1,
  "completed" => 2
}

		

How Rails Maps Enum Values to Integers

Internally Rails stores:

			
{
  pending: 0,
  processing: 1,
  completed: 2
}

		

When assigning:

order.status = :processing

Rails converts:

:processing -> 1

before writing to DB.

When reading:

1 -> "processing"

This conversion is handled through ActiveRecord attribute type casting.
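The two-way conversion can be sketched in plain Ruby. `EnumCaster` is a hypothetical name, and Rails' real type object differs in detail; the point is only that one mapping drives both directions:

```ruby
# Sketch of the two-way conversion an enum-backed attribute performs.
# EnumCaster is a hypothetical name; Rails' real type object differs in detail.
class EnumCaster
  def initialize(mapping)
    @name_to_value = mapping.transform_keys(&:to_s).freeze
    @value_to_name = @name_to_value.invert.freeze
  end

  # Ruby value -> database value (used when writing)
  def serialize(name)
    @name_to_value.fetch(name.to_s)
  end

  # Database value -> Ruby value (used when reading)
  def deserialize(value)
    @value_to_name.fetch(value)
  end
end

caster = EnumCaster.new({ pending: 0, processing: 1, completed: 2 })
puts caster.serialize(:processing)
puts caster.deserialize(1)
```

Using `fetch` means an unknown name or an unexpected stored integer raises a `KeyError` instead of silently returning `nil`.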

Database Example

Ruby:

			
order.status
# => "completed"

Actual DB value:

status = 2

Why Integers Are Commonly Used

Integers:

are compact
index efficiently
compare faster
are DB-friendly

This is why Rails originally used integer-backed enums.

Important Enum Pitfall: Order Matters

This is VERY important.

Dangerous

enum status: [:pending, :processing, :completed]

Rails maps automatically:

			
pending    -> 0
processing -> 1
completed  -> 2

If you later insert:

[:pending, :draft, :processing, :completed]

Everything shifts:

processing becomes 2
completed becomes 3

💥 Existing DB data breaks.
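The shift is easy to reproduce without a database. In this sketch, `positions_for` is a hypothetical helper that mimics index-based mapping of an array:

```ruby
# Demonstrates how array-based enum values shift when a new state is inserted.
# positions_for is a hypothetical helper that mimics index-based mapping.
def positions_for(names)
  names.each_with_index.to_h { |name, index| [name.to_s, index] }
end

before = positions_for([:pending, :processing, :completed])
after  = positions_for([:pending, :draft, :processing, :completed])

stored_value = before.fetch("processing") # what existing rows contain
puts "Stored integer: #{stored_value}"
puts "Old meaning:    #{before.key(stored_value)}"
puts "New meaning:    #{after.key(stored_value)}"
```

The stored integer never changes; only its interpretation does, which is why the damage is silent.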

Correct (recommended)

Always use explicit mapping:

			
enum status: {
  pending: 0,
  processing: 1,
  completed: 2
}

		

String-Based Enums in Rails

Rails also supports string-backed enums:

			
enum status: {
  pending: "pending",
  completed: "completed"
}

Benefits:

human-readable DB values
safer migrations
easier debugging

Tradeoff:

slightly larger storage
slightly slower indexing

🧪 Real SQL Generated by Rails Enum Queries

Order.completed

Generates:

			
SELECT *
FROM orders
WHERE status = 2;

Even though Ruby code uses names, SQL uses integers.

🔬 Internals: How ActiveRecord::Enum Works

Internally Rails:

stores mappings in a class hash
defines methods dynamically using metaprogramming
hooks into ActiveRecord attribute casting
builds scopes automatically

Rails essentially does something conceptually like:

			
define_method("completed?") do
  status == "completed"
end

and:

scope :completed, -> { where(status: 2) }

This is why enums feel “magical.”
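To make the metaprogramming less mysterious, here is a toy enum macro in plain Ruby with no Rails and no database. `MiniEnum`, `mini_enum`, and `status_mapping` are hypothetical names for this sketch; Rails pluralizes the attribute name (`Order.statuses`), which this version avoids so it does not need an inflector:

```ruby
# A toy version of an enum macro in plain Ruby (no Rails, no database).
# MiniEnum and mini_enum are hypothetical names for this sketch.
module MiniEnum
  def mini_enum(attribute, mapping)
    mapping = mapping.transform_keys(&:to_s).freeze

    # Class-level lookup, e.g. Order.status_mapping
    define_singleton_method("#{attribute}_mapping") { mapping }

    attr_reader attribute

    define_method("#{attribute}=") do |value|
      name = value.to_s
      raise ArgumentError, "'#{name}' is not a valid #{attribute}" unless mapping.key?(name)
      instance_variable_set("@#{attribute}", name)
    end

    mapping.each_key do |name|
      # Predicate, e.g. order.completed?
      define_method("#{name}?") { public_send(attribute) == name }
      # Bang method, e.g. order.completed! (here it only assigns in memory)
      define_method("#{name}!") { public_send("#{attribute}=", name) }
    end
  end
end

class Order
  extend MiniEnum
  mini_enum :status, { pending: 0, processing: 1, completed: 2 }
end

order = Order.new
order.status = :pending
order.completed!
puts order.completed?
puts Order.status_mapping.inspect
```

One macro call produces the setter, the predicates, the bang methods, and the class-level mapping, which is the same shape of API the real `enum` exposes.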

🚨 Limitations of Rails Enums

Enums are useful, but not perfect.

1. Hard to evolve complex workflows

If states become complicated:

pending -> approved -> shipped -> refunded -> disputed

you may need:

state machines
workflow engines

Examples:

aasm
state_machines

2. Integer values can become opaque

DB shows:

status = 2

Harder to debug directly.

3. No DB-level validation by default

Rails validates at app layer, but DB still accepts:

status = 999

unless constrained.

🛡️ Best Practices for Rails Enums

Use explicit mappings

			
enum status: {
  pending: 0,
  processing: 1,
  completed: 2
}

		

Add DB constraints if critical

Example PostgreSQL constraint:

CHECK (status IN (0,1,2))
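One way to keep the constraint and the enum from drifting apart is to derive the clause from the same mapping. `check_clause_for` is a hypothetical helper, not a Rails API, and this sketch only handles integer-backed enums:

```ruby
# Builds a CHECK clause from an enum mapping so the two cannot drift apart.
# check_clause_for is a hypothetical helper, not a Rails API.
def check_clause_for(column, mapping)
  values = mapping.values
  unless values.all? { |v| v.is_a?(Integer) }
    raise ArgumentError, "this sketch only handles integer-backed enums"
  end
  "CHECK (#{column} IN (#{values.sort.join(', ')}))"
end

statuses = { "pending" => 0, "processing" => 1, "completed" => 2 }
puts check_clause_for("status", statuses)
```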

Keep enums focused

Good:

			
status
payment_state
visibility

Bad:

everything_state

Prefer string enums when readability matters

Especially in:

analytics-heavy apps
debugging-heavy systems
APIs

Consider state machines for complex transitions

Enums represent states.
State machines represent transitions.

Very different concepts.
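The difference shows up clearly in code. The enum is the list of states; a state machine adds a table of which moves are legal. The transitions below are made up for illustration, and `can_transition?` is a hypothetical helper:

```ruby
# An enum lists the states; a transition table says which moves are legal.
# TRANSITIONS and can_transition? are hypothetical names for this sketch,
# and the allowed moves are invented for illustration.
STATES = %w[pending approved shipped refunded disputed].freeze

TRANSITIONS = {
  "pending"  => %w[approved],
  "approved" => %w[shipped refunded],
  "shipped"  => %w[refunded disputed],
  "refunded" => [],
  "disputed" => %w[refunded]
}.freeze

def can_transition?(from, to)
  TRANSITIONS.fetch(from, []).include?(to)
end

puts can_transition?("pending", "approved")
puts can_transition?("pending", "shipped")
```

An enum alone would accept a jump from pending straight to shipped, because both are valid states. Only the transition table can reject it.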

Mental Model Every Developer Should Remember

Think of enums as:

“A controlled vocabulary for state.”

Enums are:

not collections
not just DB mappings
not Rails-specific

They are a way to model finite, meaningful states safely and expressively.

Final Takeaway

Enums exist because software systems constantly need to represent a limited set of valid states in a way that is:

efficient
readable
maintainable
safe

Rails’ ActiveRecord::Enum builds a powerful abstraction on top of simple integer (or string) mappings, generating expressive APIs, query scopes, and validations automatically through Ruby metaprogramming.

Understanding enums deeply helps developers:

design better domain models
avoid fragile state systems
write safer queries
reason about application workflows more clearly

Enums may look small, but they are one of the foundational building blocks of robust application design.

Happy Implementing! 🚀

GCP Cloud SQL Disaster Recovery: A Practical Guide for Developers

When a production database goes down – whether from a bad migration, an accidental DROP TABLE, or a rogue script – the clock starts ticking. Every minute of downtime is lost revenue, broken trust, and a very stressful Slack channel.

This post walks through how Google Cloud SQL’s backup and recovery features work, common disaster scenarios, and the recovery playbook a developer should follow for each. The examples use a typical SaaS application backed by PostgreSQL on Cloud SQL, but the principles apply broadly.

Cloud SQL Backup Fundamentals

Before anything goes wrong, you need to understand what Cloud SQL gives you out of the box and what you need to configure yourself.

Automated Backups

Cloud SQL can take daily automated backups of your instance. Each backup covers the entire instance (Cloud SQL stores them incrementally), and a configurable number of backups is retained (default 7, max 365).

# gcloud: verify automated backups are enabled
gcloud sql instances describe my-instance \
  --format="value(settings.backupConfiguration)"

Key settings to configure:

Setting	Recommendation	Why
`backupConfiguration.enabled`	`true`	Non-negotiable for production
`backupConfiguration.startTime`	Off-peak hours (e.g. `04:00` UTC)	Minimizes performance impact
`backupConfiguration.backupRetentionSettings.retainedBackups`	14-30	Gives you a wider recovery window
`backupConfiguration.pointInTimeRecoveryEnabled`	`true`	Enables PITR (see below)
`backupConfiguration.transactionLogRetentionDays`	7	How far back PITR can reach

Point-in-Time Recovery (PITR)

Automated backups give you daily snapshots. PITR fills the gaps by continuously archiving write-ahead logs (WAL for PostgreSQL, binary logs for MySQL). This lets you restore to any second within the retention window — not just to the time of the last backup.

# Enable PITR on an existing instance
gcloud sql instances patch my-instance \
  --enable-point-in-time-recovery \
  --retained-transaction-log-days=7

PITR is the single most important setting for disaster recovery. Without it, you lose every write between your last automated backup and the incident.

On-Demand Backups

You can trigger a backup manually before risky operations:

gcloud sql backups create --instance=my-instance \
  --description="pre-migration-backup-2026-04-08"

Rule of thumb: always take an on-demand backup before running migrations, bulk data operations, or any ad-hoc SQL against production.

Disaster Scenarios and Recovery Playbooks

Scenario 1: Accidental Table Drop or Data Deletion

What happened: A developer ran a DROP TABLE or DELETE FROM without a WHERE clause against production. Maybe it was a script meant for staging. Maybe an AI-generated SQL statement was executed without review.

Impact: One or more tables are gone or empty. The application is throwing 500s.

Recovery options:

Option A: PITR (best if available)

Restore to the moment just before the destructive command. You’ll need the approximate timestamp.

# Restore to a clone instance first — never restore directly over production
gcloud sql instances clone my-instance my-instance-recovery \
  --point-in-time="2026-04-08T10:59:00Z"

This creates a new instance with the database state at that exact second. You can then:

Verify the data on the clone
Export the affected tables from the clone
Import them back into the production instance

# Export a specific table from the recovery clone
gcloud sql export sql my-instance-recovery gs://my-bucket/recovery/users-table.sql \
  --database=myapp_production \
  --table=users

# Import into production
gcloud sql import sql my-instance gs://my-bucket/recovery/users-table.sql \
  --database=myapp_production

Option B: Restore from automated backup

If PITR is not enabled, restore the most recent automated backup that predates the incident.

# List available backups
gcloud sql backups list --instance=my-instance

# Restore a specific backup (this overwrites the instance)
gcloud sql backups restore BACKUP_ID --restore-instance=my-instance

Warning: Restoring a backup directly onto your production instance overwrites everything. All writes since that backup are lost. Prefer cloning to a recovery instance first.

The data gap problem:

When you restore from a backup taken at, say, 4:00 AM, but the incident happened at 11:00 AM, you lose 7 hours of data. This is the gap you’ll need to address manually. Common strategies:

Application-level event logs: If your app publishes events to a message queue (Kafka, Pub/Sub), you can replay them.
Analytics replicas: If you replicate data to BigQuery, Snowflake, or another analytics store, you can query the missing records from there and re-import them.
Audit tables: If your application logs changes to an audit table in a separate database, those records survive.

-- Example: querying BigQuery for records created during the gap window
SELECT *
FROM `project.dataset.user_actions`
WHERE created_at BETWEEN TIMESTAMP('2026-04-08 04:00:00', 'America/Vancouver')
  AND TIMESTAMP('2026-04-08 11:00:00', 'America/Vancouver')
  AND action_type = 'account_status_change'

You then re-ingest these records into production, typically via a script run in your application’s console or through a migration task.
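Before querying any secondary source, it helps to pin down the gap window precisely. This is a minimal sketch using the illustrative timestamps from this post; `data_gap` is a hypothetical helper:

```ruby
require "time"

# Computes the window of writes lost when restoring from a snapshot.
# Timestamps are the illustrative ones used in this post.
def data_gap(backup_at:, incident_at:)
  seconds = incident_at - backup_at
  raise ArgumentError, "backup must predate the incident" if seconds.negative?
  { from: backup_at, to: incident_at, hours: (seconds / 3600.0).round(2) }
end

gap = data_gap(
  backup_at:   Time.iso8601("2026-04-08T04:00:00Z"),
  incident_at: Time.iso8601("2026-04-08T11:00:00Z")
)
puts "Gap: #{gap[:hours]} hours (#{gap[:from].iso8601} to #{gap[:to].iso8601})"
```

Working in UTC here and converting to the analytics store's time zone only at query time avoids off-by-hours mistakes in the window boundaries.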

Scenario 2: Interrupted Background Job

What happened: A critical scheduled job — say, one that generates weekly records for all active users — was running when the incident occurred. The database was restored from backup, but the job was killed mid-execution. Some users got their records; others didn’t.

Impact: No application errors (the data that exists is valid), but there’s a silent gap. Some users are missing records they should have.

Recovery playbook:

Step 1 — Quantify the gap

Before doing anything, measure what’s missing:

# Find users who should have a record but don't
target_date = Date.parse('2026-05-30')
users_missing = User.where(status: ['active', 'subscribed'])
  .where.not(id: WeeklyRecord.where(week_date: target_date).select(:user_id))
users_missing.count

Record the count. You’ll need it for verification later.

Step 2 – Understand the generation logic

Before re-running anything, understand what the job does:

Does it check for existing records before creating? (idempotent?)
Does it behave differently based on user status? (e.g., suspended users get a different treatment)
Does it trigger side effects? (emails, webhooks, billing)

If the job is idempotent — meaning running it twice for the same user produces the same result without duplicates — you can safely re-run it for all users, not just the ones missing records. This is simpler and safer than trying to target only the gap.
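Idempotency is easiest to see with a small in-memory model. `RecordStore` below is a hypothetical stand-in for the WeeklyRecord table, keyed on user and week:

```ruby
require "set"

# In-memory sketch of an idempotent generator: running it twice for the
# same user and week creates one record, not two.
# RecordStore is a hypothetical stand-in for the WeeklyRecord table.
class RecordStore
  def initialize
    @keys = Set.new
  end

  # Returns true if a record was created, false if it already existed.
  def generate(user_id, week_date)
    !@keys.add?([user_id, week_date]).nil?
  end

  def count
    @keys.size
  end
end

store = RecordStore.new
first  = store.generate(42, "2026-05-30")
second = store.generate(42, "2026-05-30")
puts "created on first run:  #{first}"
puts "created on second run: #{second}"
puts "total records: #{store.count}"
```

In a real database, the same guarantee usually comes from a unique index on the user and week columns combined with a find-or-create style write.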

Step 3 – Re-run with guardrails

Write a targeted script rather than re-triggering the entire job:

			
target_date = Date.parse('2026-05-30')
# Pre-check
baseline_count = WeeklyRecord.where(week_date: target_date).count
puts "Records before: #{baseline_count}"
# Find and process missing users
users_missing = User.where(status: ['active', 'subscribed'])
  .where.not(id: WeeklyRecord.where(week_date: target_date).select(:user_id))
puts "Users missing records: #{users_missing.count}"
users_missing.find_each do |user|
  WeeklyRecordGenerator.new(user).generate(target_date)
rescue => e
  puts "Failed for User ##{user.id}: #{e.message}"
end
# Post-check
new_count = WeeklyRecord.where(week_date: target_date).count
puts "Records after: #{new_count}"
puts "Delta: #{new_count - baseline_count}"

		

Step 4 – Verify

Check that:

The record count increased by the expected amount
No duplicates were created
No users are still missing records
Any status-dependent logic was applied correctly (e.g., suspended users got the right treatment)
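The duplicate and missing-record checks can be sketched on plain data. In the application these would be queries, and the records and user IDs below are made-up sample data:

```ruby
# Post-recovery checks on plain data; in the app these would be queries.
# The records and expected user IDs below are made-up sample data.
records = [
  { user_id: 1, week_date: "2026-05-30" },
  { user_id: 2, week_date: "2026-05-30" },
  { user_id: 2, week_date: "2026-05-30" } # duplicate on purpose
]
expected_user_ids = [1, 2, 3]

counts     = records.map { |r| r[:user_id] }.tally
duplicates = counts.select { |_id, n| n > 1 }.keys
missing    = expected_user_ids - counts.keys

puts "Duplicates: #{duplicates.inspect}"
puts "Missing:    #{missing.inspect}"
```

Both lists should be empty before the recovery is declared complete.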

Scenario 3: Corrupted Data from a Bad Migration

What happened: A migration altered a column type, dropped a constraint, or backfilled data incorrectly. The application is running but producing wrong results.

Impact: Data is present but incorrect. This is often harder to detect than missing data.

Recovery playbook:

Don’t panic-restore. If the app is functional (just producing wrong data), you have time to assess.
Clone to a recovery instance from a point in time predating the migration:

gcloud sql instances clone my-instance pre-migration-clone \
  --point-in-time="2026-04-07T23:00:00Z"

Diff the data between production and the clone to understand exactly what changed. The queries below assume the clone's orders table has been loaded into a backup_clone schema next to the production schema (for example via export and import), because PostgreSQL cannot join across separate instances directly:

-- Compare row counts
SELECT 'production' AS source, count(*) FROM production.orders
UNION ALL
SELECT 'backup' AS source, count(*) FROM backup_clone.orders;

-- Find rows that differ (IS DISTINCT FROM also catches NULL changes)
SELECT p.id, p.amount AS prod_amount, b.amount AS backup_amount
FROM production.orders p
JOIN backup_clone.orders b ON p.id = b.id
WHERE p.amount IS DISTINCT FROM b.amount;
Write a targeted fix rather than a full restore (which would lose post-migration legitimate writes).
Write a rollback migration if the schema change itself was the problem.

Scenario 4: Full Instance Failure

What happened: The Cloud SQL instance is unreachable – maybe a zone outage, maybe accidental instance deletion.

Recovery options:

If the instance still exists (zone outage):

Cloud SQL instances configured for high availability automatically fail over to a standby in another zone. HA has to be in place before the outage happens, so if you don't have it enabled, turn it on now:

			
# Enable HA (requires instance restart)
gcloud sql instances patch my-instance --availability-type=REGIONAL

If the instance was deleted:

A deleted instance may only be recoverable within a limited window, so the first line of defence is making accidental deletion hard in the first place:

			
# Enable deletion protection
gcloud sql instances patch my-instance --deletion-protection

If truly gone, restore from the most recent backup to a new instance:

			
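# Create an empty instance (use the source instance's database version), then restore into it
gcloud sql instances create my-instance-restored \
  --database-version=POSTGRES_16 \
  --tier=db-custom-4-16384 \
  --region=us-west1
gcloud sql backups restore BACKUP_ID \
  --restore-instance=my-instance-restored \
  --backup-instance=my-instance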

Then update your application’s database connection string to point to the new instance.

Prevention Checklist

The best disaster recovery is the one you never need. Here’s what to set up before things go wrong:

Cloud SQL Configuration

			
# The production-ready configuration checklist
gcloud sql instances patch my-instance \
  --backup-start-time=04:00 \
  --enable-point-in-time-recovery \
  --retained-transaction-log-days=7 \
  --retained-backups-count=30 \
  --deletion-protection \
  --availability-type=REGIONAL

		

Operational Practices

1. Never run ad-hoc SQL directly against production

Use a read replica for investigative queries. If you must write, use a transaction with a manual ROLLBACK checkpoint:

BEGIN;

-- Your change here
UPDATE users SET status = 'inactive' WHERE last_login < '2025-01-01';

-- Verify before committing
SELECT count(*) FROM users WHERE status = 'inactive';

-- Only if the count looks right:
COMMIT;
-- Otherwise:
ROLLBACK;

2. Take on-demand backups before risky operations

			
gcloud sql backups create --instance=my-instance \
  --description="pre-bulk-update-$(date +%Y%m%d-%H%M%S)"

3. Review AI-generated SQL before executing

AI tools are excellent at generating SQL, but they don’t understand your data invariants. A syntactically correct DROP TABLE or DELETE without a WHERE clause is still catastrophic. Always:

Read the generated SQL line by line
Run it on staging first
Wrap destructive operations in a transaction
Have a second pair of eyes for DDL changes
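A lightweight pre-flight check can catch the most obvious cases before a human even reads the statement. This is a naive heuristic sketch only: `risky_sql_reasons` is a hypothetical helper, splitting on semicolons ignores string literals, and regexes are no substitute for review or a real SQL parser:

```ruby
# Naive pre-flight check for obviously destructive SQL.
# A heuristic sketch only: regexes are no substitute for human review
# or a real SQL parser. risky_sql_reasons is a hypothetical helper.
def risky_sql_reasons(sql)
  reasons = []
  statements = sql.split(";").map(&:strip).reject(&:empty?)
  statements.each do |stmt|
    reasons << "DROP statement: #{stmt}" if stmt.match?(/\Adrop\s+/i)
    reasons << "TRUNCATE statement: #{stmt}" if stmt.match?(/\Atruncate\s+/i)
    if stmt.match?(/\A(delete\s+from|update)\s+/i) && !stmt.match?(/\bwhere\b/i)
      reasons << "No WHERE clause: #{stmt}"
    end
  end
  reasons
end

puts risky_sql_reasons("DELETE FROM users; UPDATE users SET status = 'x' WHERE id = 1;")
```

A check like this belongs in front of the review step, not in place of it: an empty result means "nothing obviously destructive", not "safe".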

4. Maintain an analytics replica

Replicate critical tables to BigQuery or another analytics store. This serves as both an analytics platform and a recovery source. If your primary database loses data, you can query the replica for the gap window and re-ingest.

			
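# Schedule an hourly BigQuery federated query that copies a table from Cloud SQL
# (assumes a Cloud SQL connection named my-project.us.my-connection already exists)
bq mk --transfer_config \
  --target_dataset=sql_replica \
  --display_name="Production SQL Replica" \
  --data_source=scheduled_query \
  --schedule="every 1 hours" \
  --params='{"query":"SELECT * FROM EXTERNAL_QUERY(\"my-project.us.my-connection\", \"SELECT * FROM users\")","destination_table_name_template":"users","write_disposition":"WRITE_TRUNCATE"}'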

		

5. Use IAM to restrict destructive operations

Not every developer needs cloudsql.instances.delete or direct SQL access to production:

			
# Grant most developers read-only access to Cloud SQL resources
gcloud projects add-iam-policy-binding my-project \
  --member="group:developers@company.com" \
  --role="roles/cloudsql.viewer"
# Grant full administrative access only to the ops team
gcloud projects add-iam-policy-binding my-project \
  --member="group:database-ops@company.com" \
  --role="roles/cloudsql.admin"

		

The Recovery Timeline: What Happens in Practice

Here’s what a real recovery typically looks like, end to end:

			
T+0min    Incident detected (alerts fire, app errors spike)
T+5min    Confirm the issue — is it a code bug or data loss?
T+10min   Identify the last good backup / PITR target
T+15min   Clone instance from backup (takes 5-30 min depending on size)
T+45min   Verify restored data on the clone
T+60min   Restore production from clone or selectively import tables
T+90min   Identify the data gap (writes between backup and incident)
T+120min  Query analytics replica / event logs for gap data
T+150min  Re-ingest gap data, verify counts
T+180min  Re-run interrupted jobs with verification
T+210min  Final validation — all counts match, no duplicates, app healthy
T+240min  Post-incident review

		

The total time depends on database size, gap complexity, and whether you had PITR enabled. With PITR, the gap is seconds. Without it, you could be looking at hours of manual data reconciliation.

Key Takeaways

Enable PITR. It’s the difference between losing seconds of data and losing hours.
Always clone to a recovery instance first. Never restore directly over production unless you have no other option.
Maintain an analytics replica. It’s your insurance policy for the data gap.
Quantify before you fix. Record counts before and after every recovery step. You can’t verify what you didn’t measure.
Understand your jobs’ idempotency. If a background job was interrupted, knowing whether it’s safe to re-run is the difference between a smooth recovery and creating a bigger mess.
Take on-demand backups before risky operations. The 30 seconds it takes could save you 4 hours of recovery.
Review all SQL before execution. Especially AI-generated SQL. Trust, but verify.

Production incidents are stressful, but with the right configuration and a clear playbook, they don’t have to be catastrophic. Set up your backups today — future you will be grateful.

Happy fixing!

Rails 8 App: Create an Academic software app using SQL without using ActiveRecord - Part 2 | students | courses | schools

Design: Our Students Table -> course -> school

We need a UNIQUE constraint on user_id because:

✅ One student per user (user_id should be unique)
✅ Multiple students per course (course_id can be repeated)

Check Migration Files:

Students: Github: 20250709072153_create_students.rb
Courses: Github: 20250709155145_create_courses.rb
Schools: Github: 20250709155152_create_schools.rb

Key Changes:

✅ Added UNIQUE constraint: CONSTRAINT uk_students_user_id UNIQUE (user_id)
🔧 Fixed typos:

TIMSTAMP → TIMESTAMP
stidents → students

📈 Optimized indexes: No need for user_id index since UNIQUE creates one automatically

Business Logic Validation:

user_id: One student per user ✅
course_id: Multiple students per course ✅
school_id: Multiple students per school ✅

This ensures referential integrity and business rules are enforced at the database level!

📁 Schema Storage Options:

Rails allows you to store the schema in SQL format instead of the default Ruby format. Here are the options and why you'd choose each:

1. Ruby Format (Default)

# db/schema.rb
ActiveRecord::Schema[8.0].define(version: 2025_07_09_074552) do
  enable_extension "pg_catalog.plpgsql"

  create_table "users", force: :cascade do |t|
    t.string "first_name", limit: 100, null: false
    t.string "email", limit: 150, null: false
    t.datetime "created_at", null: false
    t.index ["email"], name: "idx_users_email"
  end
end

2. SQL Format

-- db/structure.sql
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;

CREATE TYPE gender_enum AS ENUM ('male', 'female', 'not-specified');

CREATE TABLE users (
    id bigserial PRIMARY KEY,
    first_name varchar(100) NOT NULL,
    email varchar(150) NOT NULL,
    created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE UNIQUE INDEX idx_users_email ON users(email);

Check structure.sql File: Github: db/structure.sql

🔧 How to Switch to SQL Format:

Add this to config/application.rb:

module AcademicSqlSoftware
  class Application < Rails::Application
    # ... existing config ...

    # Use SQL format for schema
    config.active_record.schema_format = :sql
  end
end

Then regenerate the schema:

# Generate SQL schema file
rails db:schema:dump

This will create db/structure.sql instead of db/schema.rb.

Comparison Table:

Feature	Ruby Format (`schema.rb`)	SQL Format (`structure.sql`)
Default	✅ Yes	❌ No
Database Agnostic	✅ Yes	❌ No (DB-specific)
Custom SQL Features	❌ Limited	✅ Full support
Views	❌ No	✅ Yes
Stored Procedures	❌ No	✅ Yes
Triggers	❌ No	✅ Yes
Custom Types/ENUMs	❌ Limited	✅ Full support
Partial Indexes	❌ Limited	✅ Yes
Check Constraints	❌ Limited	✅ Yes
Version Control	✅ Easy to diff	⚠️ Harder to diff
Readability	✅ Ruby-like	⚠️ Raw SQL

When to Use Each:

Choose SQL Format When:

✅ Using PostgreSQL-specific features (ENUMs, arrays, JSONB operators)
✅ Need database views, triggers, or stored procedures
✅ Using complex constraints (CHECK, partial indexes)
✅ Working with custom data types
✅ Team is SQL-proficient

Choose Ruby Format When:

✅ Want database portability (switching between PostgreSQL/MySQL/SQLite)
✅ Team prefers Ruby syntax
✅ Using standard Rails features only
✅ Need easier version control diffs

For Our Project:

SQL format is the better fit for this project because we're using:

✅ PostgreSQL ENUMs (gender_enum, category_enum) – Rails 8 supports this
✅ Custom constraints (CHECK (price > 0)) – Rails 8 supports this
✅ Complex foreign key relationships
✅ Raw SQL migrations

Our config/application.rb should include:

config.active_record.schema_format = :sql

Then run:

rails db:schema:dump

This will give us full control over PostgreSQL features while maintaining schema consistency! 🎯

🌱 Seed Data

To keep the seed data maintainable, we refactor the monolithic seed file into modular components: a separate seeder class for each model, dedicated seeders for the dependent models (orders and students), a slimmed-down main seeds.rb that orchestrates them, and a README documenting the new structure.

Here's what the refactored, modular architecture looks like:

🎯 Modular Seed System Overview

📁 File Structure:

db/
├── seeds.rb                    # Main orchestrator (now 20 lines vs 246!)
└── seeds/
    ├── README.md              # Complete documentation
    ├── base_seeder.rb         # Shared functionality & config
    ├── users_seeder.rb        # User creation logic
    ├── products_seeder.rb     # Product creation logic  
    ├── courses_seeder.rb      # Course creation logic
    ├── schools_seeder.rb      # School creation logic
    ├── orders_seeder.rb       # Order creation (with dependencies)
    └── students_seeder.rb     # Student creation (with constraints)

Key Benefits:

✅ Separation of Concerns:

Each model has its own dedicated seeder class
Clear responsibility boundaries
Easy to modify individual model generation

✅ Maintainability:

Single file changes vs monolithic seed file
Inheritance from BaseSeeder for shared functionality
Comprehensive documentation

✅ Flexibility:

Easy to add new models
Environment-specific configuration
Configurable record counts

✅ Robustness:

Proper dependency handling
Constraint compliance (stock, unique keys)
Error handling and validation

📋 Usage Examples:

# Basic seeding
rails db:seed

# Clear data first
CLEAR_DATA=true rails db:seed

# Scale up for testing
# Edit USERS_COUNT in base_seeder.rb then:
rails db:seed

Check my Github Seed File: Seed File – Academic SQL App

🛠️ Commit the Refactored System:

# Stage all the new modular files
git add db/seeds.rb \
        db/seeds/

# Commit with comprehensive message
git commit -m "Refactor seed system into modular architecture" -m "

Modular Structure:
- Split monolithic seeds.rb into focused seeder classes
- Create BaseSeeder with shared functionality and configuration
- Separate concerns: each model has dedicated seeder
- Add comprehensive documentation in seeds/README.md

Benefits:
- Maintainability: Easy to modify individual model generation
- Flexibility: Simple to add new models or adjust data
- Robustness: Proper dependency handling and constraint compliance
- Testability: Individual seeders can be tested in isolation

Features:
- Environment-specific behavior (auto-clear in development)
- Smart constraint handling (stock validation, unique keys)
- Progress indicators and comprehensive reporting
- Configurable record counts via constants

Files:
- db/seeds.rb: Main orchestrator (20 lines vs 246)
- db/seeds/base_seeder.rb: Shared functionality & config
- db/seeds/*_seeder.rb: Individual model seeders
- db/seeds/README.md: Complete documentation

The system is now ready for production use and easy to extend."

🎉 What We Can Do Now:

Scale Easily:

# In base_seeder.rb - change these and run rails db:seed
USERS_COUNT = 10_000
PRODUCTS_COUNT = 1_000  
ORDERS_COUNT = 50_000

Add New Models: Create new_model_seeder.rb, inherit from BaseSeeder, add to orchestration.

Customize Data: Edit individual seeder constants and attribute methods.

Test Components:

users = UsersSeeder.create_users
products = ProductsSeeder.create_products

Our seed system now has a clean architecture, comprehensive documentation, and record counts that scale by editing a few constants! 🚀
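The "inherit from BaseSeeder" pattern can be sketched as follows. The real BaseSeeder is not shown in this post, so everything here is an assumed shape: a default count, a shared `run` template method, and one small subclass per model. `TeachersSeeder` is a hypothetical new model, and the sketch builds plain hashes instead of writing to a database:

```ruby
# Sketch of the seeder pattern described above. BaseSeeder's real contents
# are not shown in this post, so everything here is an assumed shape:
# a shared count constant, a template `run`, and one subclass per model.
class BaseSeeder
  DEFAULT_COUNT = 3

  def self.run(count: DEFAULT_COUNT)
    records = Array.new(count) { |i| build(i) }
    puts "#{name}: built #{records.size} records"
    records
  end

  def self.build(_index)
    raise NotImplementedError, "#{name} must implement .build"
  end
end

# Adding a new model means adding one small subclass.
class TeachersSeeder < BaseSeeder
  def self.build(index)
    { name: "Teacher #{index + 1}" }
  end
end

teachers = TeachersSeeder.run(count: 2)
puts teachers.inspect
```

The base class owns the loop and the reporting; each subclass only says how to build one record, which is what keeps new seeders short.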

Possible Refactoring

Now let's move the helper methods to a dedicated helpers folder for better organization, creating a helpers directory under db/seeds and moving the shared utility methods there:

Create a directory:
db/seeds/helpers/
Move all shared/random data methods (like random_gender, random_category, etc.) into a module, e.g. db/seeds/helpers/data_generators.rb:

# db/seeds/helpers/data_generators.rb
module DataGenerators
  def random_gender
    ['male', 'female', 'not-specified'].sample
  end

  def random_category
    ['men', 'women', 'kids', 'infants'].sample
  end

  def random_date_of_birth
    Date.today - rand(18..80).years - rand(365).days
  end

  def random_phone
    "+1#{rand(100..999)}-#{rand(100..999)}-#{rand(1000..9999)}"
  end

  def random_price
    [9.99, 19.99, 29.99, 49.99, 99.99, 199.99, 299.99, 499.99].sample
  end

  def random_stock
    [0, 5, 10, 25, 50, 100, 500].sample
  end

  def random_past_date(days_ago_max = 365)
    Date.today - rand(1..days_ago_max).days
  end
end

In our seeders and base seeder, require the helper file and extend the module:

# At the top of each seeder or in base_seeder.rb
require_relative 'helpers/data_generators'

class UsersSeeder < BaseSeeder
  extend DataGenerators
  # ... now you can call random_gender, random_category, etc.
end

Remove the helper methods from BaseSeeder and use the module instead.

Summary:

Move all shared/random data methods to db/seeds/helpers/data_generators.rb as a module.
require_relative and extend or include the module in your seeders.
This keeps your seed logic clean and DRY!

At this point BaseSeeder still contains all the random-data and progress helper methods, which now duplicate what lives in db/seeds/helpers/data_generators.rb and db/seeds/helpers/ui_helpers.rb.

Here’s what we should do:

Remove all the methods at the bottom of BaseSeeder that are now in the helpers.
At the top of base_seeder.rb, require the helper modules.
If you want to use the helpers as class methods, use extend for each module.
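The difference between `extend` and `include` is worth seeing side by side, since it decides whether the helpers are callable on the seeder class or on its instances. The two seeder class names below are illustrative only:

```ruby
# extend adds a module's methods to the class itself (class methods);
# include adds them to instances. The class names here are illustrative.
module DataGenerators
  def random_stock
    [0, 5, 10, 25, 50, 100, 500].sample
  end
end

class ExtendedSeeder
  extend DataGenerators
end

class IncludedSeeder
  include DataGenerators
end

puts ExtendedSeeder.respond_to?(:random_stock)      # class-level call works
puts ExtendedSeeder.new.respond_to?(:random_stock)  # instance-level does not
puts IncludedSeeder.new.respond_to?(:random_stock)  # instance-level works
puts IncludedSeeder.respond_to?(:random_stock)      # class-level does not
```

Because the seeders in this post are invoked through class methods, `extend` is the variant that makes `random_stock` and friends directly callable inside them.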

Here’s the refactor plan:

Remove:

random_gender
random_category
random_date_of_birth
random_phone
random_price
random_stock
random_past_date
progress_indicator

Add at the top:

   require_relative 'helpers/data_generators'
   require_relative 'helpers/ui_helpers'
   extend DataGenerators
   extend UiHelpers

Changes Made:

In UiHelpers:
- Removed the hardcoded print_summary and print_verification methods.
- Ensured print_summary_box and print_verification_stats are ready to accept data.
In BaseSeeder:
- Now dynamically creates a summary_data hash with the final record counts.
- Creates a verification_stats hash with key metrics.
- Calls UiHelpers.print_summary_box and UiHelpers.print_verification_stats, passing in the data.

Benefits of this Refactor:

More Flexible: You can now easily add or remove items from the summary and verification sections just by changing the hashes in BaseSeeder.
Decoupled: The UiHelpers module is now purely for presentation and has no knowledge of your specific models.
Cleaner Code: The logic is now clearly separated—BaseSeeder gathers the data, and UiHelpers displays it.
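
The split between gathering and displaying can be sketched in plain Ruby. The real print_summary_box is not shown in this post, so the method body below, the format_summary_box helper, and the sample counts are assumptions for illustration only:

```ruby
module UiHelpers
  # Builds the box as a string so it can be checked without capturing stdout.
  def self.format_summary_box(data)
    width = data.keys.map { |key| key.to_s.length }.max
    lines = data.map { |label, count| "#{label.to_s.ljust(width)} : #{count}" }
    border = '-' * lines.map(&:length).max
    [border, *lines, border].join("\n")
  end

  # Presentation only: knows nothing about models, prints what it is given.
  def self.print_summary_box(data)
    puts format_summary_box(data)
  end
end

# In BaseSeeder this hash would be built from real counts such as User.count.
# Literal numbers are used here so the sketch runs without a database.
SUMMARY_DATA = { users: 10_000, products: 1_000, orders: 5_000 }.freeze

UiHelpers.print_summary_box(SUMMARY_DATA)
```

Adding a new line to the summary then means adding one key to the hash; the printing code stays untouched.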

Now the BaseSeeder is clean and only contains orchestration logic and configuration. All random data and UI/progress helpers are now properly delegated to the new helper modules.

We can now use these helpers in any seeder by calling random_gender, progress_indicator, etc., thanks to the extend statements. This makes our seed system much more modular, maintainable, and DRY!

That’s a great suggestion for further refactoring. It makes sense to keep all data-related logic, including aggregation for summaries, within the DataGenerators helper.

Let’s proceed. I’ll move the hash construction into new methods inside db/seeds/helpers/data_generators.rb and then update BaseSeeder to call them.
Now that the data generation has been moved, I’ll update the BaseSeeder to call the new methods.

Issues Fixed:

UiHelpers method calls: All seeder files were calling progress_indicator directly, but since the methods in UiHelpers are defined as class methods, each call needed to be prefixed with the module name, as in UiHelpers.progress_indicator. Fixed in:

UsersSeeder
ProductsSeeder
CoursesSeeder
SchoolsSeeder
OrdersSeeder
StudentsSeeder

DataGenerators references: The BaseSeeder was trying to call DataGenerators.summary_data and DataGenerators.verification_stats, but that file was deleted. I updated it to use the local summary_data and verification_stats methods that were already defined in the BaseSeeder class.
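
The first issue follows from how Ruby mixes modules in: extend and include only share a module's instance methods, so a method defined with def self. stays on the module itself. A minimal sketch, where the method body and the DemoSeeder class are hypothetical:

```ruby
module UiHelpers
  # Defined on the module itself, so extend/include do NOT mix it in.
  def self.progress_indicator(current, total)
    "#{current}/#{total} (#{(100.0 * current / total).round}%)"
  end
end

class DemoSeeder
  extend UiHelpers

  def self.bare_call
    progress_indicator(5, 10) # raises NoMethodError
  end

  def self.prefixed_call
    UiHelpers.progress_indicator(5, 10)
  end
end

begin
  DemoSeeder.bare_call
rescue NoMethodError => e
  puts "bare call failed: #{e.class}"
end

puts DemoSeeder.prefixed_call # 5/10 (50%)
```

Defining the helpers as plain instance methods (without self.) would be the alternative that lets extend work without the prefix.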

Seeding Results:

✅ 10,000 users created
✅ 1,000 products created
✅ 50 courses created
✅ 25 schools created
✅ 5,000 orders created
✅ 5,000 students created

The verification stats show proper data relationships:

4,004 users have orders
647 products have been ordered
756 products are active
200 students per school (average)
100 students per course (average)

✅ Final Architecture:

DataGenerators: Is now responsible for all data-related tasks, including generating random primitive data (random_phone) and creating aggregated summary data (summary_data, verification_stats).
UiHelpers: Is responsible for all presentation logic, taking data as input and printing it to the console in a formatted way.
Individual Seeders (UsersSeeder, etc.): Responsible for the business logic of creating a specific type of record, using helpers for data and UI.
BaseSeeder: The main orchestrator. It knows the correct order to call the individual seeders and delegates all data and UI tasks to the appropriate helpers.
seeds.rb: The single entry point that kicks off the entire process.

to be continued … 🚀

Rails 8 App: Create an Academic software app using SQL without using ActiveRecord – Part 1 | users | products | orders

Let’s create a Rails 8 app that queries the database with raw SQL instead of ActiveRecord’s ORM. We’ll use the full Rails environment with ActiveRecord for infrastructure, but bypass its ORM features and write pure SQL. Let me guide you through this step by step:

Step 1: Create the Rails App with ActiveRecord and PostgreSQL (skipping unnecessary components)

rails new academic-sql-software --database=postgresql --skip-action-cable --skip-jbuilder --skip-solid --skip-kamal

What we’re skipping and why:

--skip-action-cable: No WebSocket functionality needed
--skip-jbuilder: No JSON API views needed for our SQL practice app
--skip-solid: Skips Solid Cache and Solid Queue (we don’t need caching or background jobs)
--skip-kamal: No deployment configuration needed

What we’re keeping:

ActiveRecord: For database connection management and ActiveRecord::Base.connection.execute()
ActionController: For creating web interfaces to display our SQL query results
ActionView: For creating simple HTML pages to showcase our SQL learning exercises
PostgreSQL: Our database for practicing advanced SQL features

Why this setup is perfect for an app built on raw SQL:

Minimal Rails app focused on database interactions
Full Rails environment for development conveniences
ActiveRecord infrastructure without ORM usage
Clean setup without unnecessary overhead

=> Open config/application.rb and comment out the following for now:

# require "active_job/railtie"
...
# require "active_storage/engine"
...
# require "action_mailer/railtie"
# require "action_mailbox/engine"
...
# require "action_cable/engine"

=> Open config/environments/development.rb, config/environments/production.rb, and config/environments/test.rb, and comment out the action_mailer settings.

🤔 Why I am using ActiveRecord (even though I don’t want the ORM):

Database Connection Management: ActiveRecord provides robust connection pooling, reconnection handling, and connection management
Rails Integration: Seamless integration with Rails console, database tasks (rails db:create, rails db:migrate), and development tools
Raw SQL Execution: We get ActiveRecord::Base.connection.execute() which is perfect for our raw SQL writing.
Migration System: Easy table creation and schema management with migrations (even though we’ll query with raw SQL)
Database Configuration: Rails handles database.yml configuration, environment switching, and connection setup
Development Tools: Access to Rails console for testing queries, database tasks, and debugging

Our Learning Strategy: We’ll use ActiveRecord’s infrastructure but completely bypass its ORM methods. Instead of Student.where(), we’ll use ActiveRecord::Base.connection.execute("SELECT * FROM students WHERE...")

Step 2: Navigate to the project directory

cd academic-sql-software

Step 3: Verify PostgreSQL setup

# Check if PostgreSQL is running
brew services list | grep postgresql
# or
pg_ctl status

Database Foundation: PostgreSQL gives us advanced SQL features:

Complex JOINs (INNER, LEFT, RIGHT, FULL OUTER)
Window functions (ROW_NUMBER, RANK, LAG, LEAD)
Common Table Expressions (CTEs)
Advanced aggregations and subqueries

Step 4: Install dependencies

bundle install

What this gives us:

pg gem: The PostgreSQL adapter (already included with --database=postgresql)
ActiveRecord: For connection management only
Rails infrastructure: Console, generators, rake tasks

Step 5: Create the PostgreSQL databases

rails db:create
Created database 'academic_sql_software_development'
Created database 'academic_sql_software_test'

Our Development Environment:

Creates academic_sql_software_development and academic_sql_software_test
Sets up connection pooling and management
Enables us to use Rails console for testing queries: rails console then ActiveRecord::Base.connection.execute("SELECT 1")

Our Raw SQL Approach:

# We'll use this pattern throughout our app:
connection = ActiveRecord::Base.connection
result = connection.execute("SELECT s.name, t.subject FROM students s INNER JOIN teachers t ON s.teacher_id = t.id")

Why not pure pg gem:

Would require manual connection management
No Rails integration (no console, no rake tasks)
More boilerplate code for connection handling
Loss of Rails development conveniences

Why not pure ActiveRecord ORM:

We want to do SQL query writing, not ActiveRecord methods.
Need to understand database performance implications.
Want to practice complex queries that might be harder to express in ActiveRecord.

Step 6: Create Users table

mkdir -p db/migrate

class CreateUsers < ActiveRecord::Migration[8.0]
  def up
    # create users table
    execute <<~SQL
      CREATE TABLE users (
        id INT,
        username VARCHAR(200),
        email VARCHAR(150),
        phone_number VARCHAR(20)
      );
    SQL
  end

  def down
    execute <<~SQL
      DROP TABLE users;
    SQL
  end
end

class CreateOrders < ActiveRecord::Migration[8.0]
  def up
    # create table orders
    execute <<~SQL
    SQL
  end

  def down
    execute <<~SQL
    SQL
  end
end

execute <<~SQL is a Rails migration method that allows you to run raw SQL statements. Let me break it down:

Components:

execute – A Rails migration method that executes raw SQL directly against the database
<<~SQL – Ruby’s “squiggly heredoc” syntax for multi-line strings that automatically strips leading whitespace (read: https://www.rubyguides.com/2018/11/ruby-heredoc/)

Usage:

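class SomeMigration < ActiveRecord::Migration[8.0]
  # CREATE INDEX CONCURRENTLY cannot run inside a transaction,
  # so turn off the transaction Rails wraps around the migration.
  disable_ddl_transaction!

  def up
    execute <<~SQL
      CREATE INDEX CONCURRENTLY idx_users_email_lower
      ON users (LOWER(email));
    SQL
  end

  # Raw execute is not automatically reversible, so define down explicitly.
  def down
    execute "DROP INDEX CONCURRENTLY IF EXISTS idx_users_email_lower;"
  end
end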

Why use it?

Database-specific features: When you need PostgreSQL-specific syntax, MySQL features, etc.
Complex SQL: For operations that Rails DSL doesn’t support easily
Performance: Direct SQL can sometimes be more efficient
Raw SQL requirements: When you specifically need to write SQL (as mentioned in our requirement)

Example for creating a users table with raw SQL:

class CreateUsers < ActiveRecord::Migration[8.0]
  def up
    execute <<~SQL
      CREATE TABLE users (
        id BIGSERIAL PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        email VARCHAR(255) NOT NULL,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
      );

      CREATE UNIQUE INDEX idx_users_email ON users(email);
    SQL
  end

  def down
    execute <<~SQL
      DROP TABLE IF EXISTS users;
    SQL
  end
end

The <<~SQL heredoc ends at the line containing the closing SQL terminator, and the ~ strips the common leading indentation so your SQL can be indented neatly within the Ruby code.
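
To see what the ~ actually changes, compare it with the older <<- form in plain Ruby. The two method names are made up for this comparison:

```ruby
# <<~ strips the common leading indentation from the body.
def squiggly_sql
  <<~SQL
    DROP TABLE IF EXISTS users;
  SQL
end

# <<- only allows the terminator to be indented; the body keeps its spaces.
def dash_sql
  <<-SQL
    DROP TABLE IF EXISTS users;
  SQL
end

p squiggly_sql # "DROP TABLE IF EXISTS users;\n"
p dash_sql     # same statement, with the leading spaces kept
```

The database accepts either string, but the squiggly form keeps logs and error messages free of stray indentation.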

Q) 🤔 Is it a best practice to repeat the database constraints in model validations in Rails? DB level vs. app level, or both?

Best Practice: Use BOTH 🎯

Why both levels are recommended:

Database Level (Last Line of Defense)

-- Database constraints
price DECIMAL(10,2) NOT NULL CHECK (price > 0),
category product_category NOT NULL,
title VARCHAR(250) NOT NULL

Pros:

✅ Data integrity guarantee – No bad data can enter, ever
✅ Performance – Database enforces constraints efficiently
✅ Bulletproof – Protects against bulk operations, direct SQL, data imports
✅ Concurrency safe – Prevents race conditions
✅ Multi-application protection – If other apps access same DB

Application Level (User Experience)

validates :price, presence: true, numericality: { greater_than: 0 }
validates :category, inclusion: { in: %w[men women kids infants] }

Pros:

✅ User-friendly errors – “Price must be greater than 0” vs “CHECK constraint violated”
✅ Better UX – Validation runs before the record is saved, so errors can be shown on the form
✅ Easier testing – Can unit test validations
✅ Conditional logic – Complex business rules
✅ Framework features – Callbacks, custom validators

Real-world scenarios where each matters:

Database saves you when:

# Bulk operations bypass Rails validations
Product.update_all(price: -10)  # DB constraint prevents this

# Direct SQL injection attempts
# DB constraints are your last line of defense

App validations save you when:

# User gets friendly error instead of:
# PG::CheckViolation: ERROR: new row violates check constraint
@product = Product.new(price: -5)
@product.valid? # => false
@product.errors.full_messages # => ["Price must be greater than 0"]

Practical Implementation:

class Product < ApplicationRecord
  # App-level validations for UX
  validates :title, presence: true, length: { maximum: 250 }
  validates :price, presence: true, numericality: { greater_than: 0 }
  validates :category, inclusion: { in: %w[men women kids infants] }

  # Don't duplicate precision validation if DB handles it
  # The DECIMAL(10,2) constraint is sufficient at DB level
end

-- DB-level constraints for data integrity
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  title VARCHAR(250) NOT NULL,
  price DECIMAL(10,2) NOT NULL CHECK (price > 0),
  category product_category NOT NULL
  -- DB handles precision automatically with DECIMAL(10,2)
);

What NOT to duplicate:

❌ Precision constraints – DECIMAL(10,2) handles this perfectly
❌ Data type validation – DB enforces INTEGER, BOOLEAN, etc.
❌ Complex regex patterns – Better handled in app layer

Conclusion:

Use both, but strategically:

Database: Core data integrity, type constraints, foreign keys
Application: User experience, business logic, conditional rules
Don’t over-duplicate simple type/precision constraints that DB handles well

This approach gives you belt and suspenders protection with optimal user experience.

to be continued … 🚀

Rails 8 App: Setup Test DB in PostgreSQL | Write SQL Queries | Operators | Joins

Here’s a list of commonly used SQL comparison operators with brief explanations and examples:

📋 Basic Comparison Operators:

Operator	Meaning	Example	Result
`=`	Equal to	`WHERE age = 25`	Matches rows where `age` is 25
`<>`	Not equal to (standard)	`WHERE status <> 'active'`	Matches rows where status is not `'active'`
`!=`	Not equal to (alternative)	`WHERE id != 10`	Same as `<>`, matches if id is not 10
`>`	Greater than	`WHERE salary > 50000`	Matches rows with salary above 50k
`<`	Less than	`WHERE created_at < '2024-01-01'`	Matches dates before Jan 1, 2024
`>=`	Greater than or equal	`WHERE age >= 18`	Matches age 18 and above
`<=`	Less than or equal	`WHERE age <= 65`	Matches age 65 and below

📋 Other Common Operators:

Operator	Meaning	Example
`BETWEEN`	Within a range	`WHERE price BETWEEN 100 AND 200`
`IN`	Match any value in a list	`WHERE country IN ('US', 'CA', 'UK')`
`NOT IN`	Not in a list	`WHERE role NOT IN ('admin', 'staff')`
`IS NULL`	Value is null	`WHERE deleted_at IS NULL`
`IS NOT NULL`	Value is not null	`WHERE updated_at IS NOT NULL`
`LIKE`	Pattern match (case-insensitive in some DBs)	`WHERE name LIKE 'J%'`
`ILIKE`	Case-insensitive `LIKE` (PostgreSQL only)	`WHERE email ILIKE '%@gmail.com'`

Now that we have our products and product_variants schema, let’s re-explore all major SQL JOINs using these two related tables.

####### Products

   Column    |              Type              | Collation | Nullable |               Default
-------------+--------------------------------+-----------+----------+--------------------------------------
 id          | bigint                         |           | not null | nextval('products_id_seq'::regclass)
 description | text                           |           |          |
 category    | character varying              |           |          |
 created_at  | timestamp(6) without time zone |           | not null |
 updated_at  | timestamp(6) without time zone |           | not null |
 name        | character varying              |           | not null |
 rating      | numeric(2,1)                   |           |          | 0.0
 brand       | character varying              |           |          |

######## Product variants

      Column      |              Type              | Collation | Nullable |                   Default
------------------+--------------------------------+-----------+----------+----------------------------------------------
 id               | bigint                         |           | not null | nextval('product_variants_id_seq'::regclass)
 product_id       | bigint                         |           | not null |
 sku              | character varying              |           | not null |
 mrp              | numeric(10,2)                  |           | not null |
 price            | numeric(10,2)                  |           | not null |
 discount_percent | numeric(5,2)                   |           |          |
 size             | character varying              |           |          |
 color            | character varying              |           |          |
 stock_quantity   | integer                        |           |          | 0
 specs            | jsonb                          |           | not null | '{}'::jsonb
 created_at       | timestamp(6) without time zone |           | not null |
 updated_at       | timestamp(6) without time zone |           | not null |

💎 SQL JOINS with `products` and `product_variants`

These tables are related through:

product_variants.product_id → products.id

So we can use that for all join examples.

🔸 1. INNER JOIN – Show only products with variants

SELECT 
  p.name, 
  pv.sku, 
  pv.price 
FROM products p
INNER JOIN product_variants pv ON p.id = pv.product_id;

♦️ Only returns products that have at least one variant.

🔸 2. LEFT JOIN – Show all products, with variants if available

SELECT 
  p.name, 
  pv.sku, 
  pv.price 
FROM products p
LEFT JOIN product_variants pv ON p.id = pv.product_id;

♦️ Returns all products, even those with no variants (NULLs in variant columns).

🔸 3. RIGHT JOIN – Show all variants, with product info if available

(Less common, but useful if variants might exist without a product record)

SELECT 
  pv.sku, 
  pv.price, 
  p.name 
FROM products p
RIGHT JOIN product_variants pv ON p.id = pv.product_id;

🔸 4. FULL OUTER JOIN – All records from both tables

SELECT 
  p.name AS product_name, 
  pv.sku AS variant_sku 
FROM products p
FULL OUTER JOIN product_variants pv ON p.id = pv.product_id;

♦️ Shows all products and all variants, even when there’s no match.

🔸 5. SELF JOIN Example (for product_variants comparing similar sizes or prices)

Let’s compare variants of the same product that are different sizes.

SELECT 
  pv1.product_id,
  pv1.size AS size_1,
  pv2.size AS size_2,
  pv1.sku AS sku_1,
  pv2.sku AS sku_2
FROM product_variants pv1
JOIN product_variants pv2 
  ON pv1.product_id = pv2.product_id 
  AND pv1.size <> pv2.size
WHERE pv1.product_id = 101;  -- example product

♦️ Useful to analyze size comparisons or price differences within a product.

🧬 Complex Combined JOIN Example

Show each product with its variants, and include only discounted ones (price < MRP):

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.price,
  pv.mrp,
  (pv.mrp - pv.price) AS discount_value
FROM products p
INNER JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.price < pv.mrp
ORDER BY discount_value DESC;

📑 JOIN Summary with These Tables

JOIN Type	Use Case
`INNER JOIN`	Only products with variants
`LEFT JOIN`	All products, even if they don’t have variants
`RIGHT JOIN`	All variants, even if product is missing
`FULL OUTER JOIN`	Everything — useful in data audits
`SELF JOIN`	Compare or relate rows within the same table
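
To see which rows each JOIN type keeps, the same logic can be mimicked in plain Ruby over in-memory rows. This is only a model of the semantics with made-up sample data, not a description of how PostgreSQL executes joins:

```ruby
PRODUCTS = [
  { id: 1, name: 'Sneaker' },
  { id: 2, name: 'Sandal' } # has no variants
].freeze

VARIANTS = [
  { product_id: 1, sku: 'SNK-8' },
  { product_id: 1, sku: 'SNK-9' }
].freeze

# INNER JOIN: emit a row only when a matching variant exists.
def inner_join(products, variants)
  products.flat_map do |product|
    variants.select { |variant| variant[:product_id] == product[:id] }
            .map { |variant| { name: product[:name], sku: variant[:sku] } }
  end
end

# LEFT JOIN: keep every product; unmatched ones get nil (SQL NULL) for sku.
def left_join(products, variants)
  products.flat_map do |product|
    matches = variants.select { |variant| variant[:product_id] == product[:id] }
    matches = [{ sku: nil }] if matches.empty?
    matches.map { |variant| { name: product[:name], sku: variant[:sku] } }
  end
end

puts inner_join(PRODUCTS, VARIANTS).length # 2
puts left_join(PRODUCTS, VARIANTS).length  # 3
```

The extra row in the LEFT JOIN result is the Sandal product with a nil sku, which is exactly the NULL-filled row SQL would return.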

Let’s now look at JOIN queries with more realistic conditions using products and product_variants.

🦾 Advanced JOIN Queries with Conditions to practice

🔹 1. All products with variants in stock AND discounted

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.size,
  pv.color,
  pv.stock_quantity,
  pv.mrp,
  pv.price,
  (pv.mrp - pv.price) AS discount_amount
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.stock_quantity > 0
  AND pv.price < pv.mrp
ORDER BY discount_amount DESC;

♦️ Shows available discounted variants, ordered by discount.

🔹 2. Products with high rating (4.5+) and at least one low-stock variant (< 10 items)

SELECT 
  p.name AS product_name,
  p.rating,
  pv.sku,
  pv.stock_quantity
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE p.rating >= 4.5
  AND pv.stock_quantity < 10;

🔹 3. LEFT JOIN to find products with no variants or all variants out of stock

SELECT 
  p.name AS product_name,
  pv.id AS variant_id,
  pv.stock_quantity
FROM products p
LEFT JOIN product_variants pv 
  ON p.id = pv.product_id AND pv.stock_quantity > 0
WHERE pv.id IS NULL;

✅ This tells you:

Either the product has no variants
Or all variants are out of stock
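
Query 3 works because the stock condition sits in the ON clause: a product stays unmatched (and gets NULLs) when none of its variants is in stock. A plain-Ruby model of that logic, using invented sample rows:

```ruby
CATALOG = [
  { id: 1, name: 'Sneaker' }, # one variant in stock
  { id: 2, name: 'Sandal' },  # no variants at all
  { id: 3, name: 'Boot' }     # only out-of-stock variants
].freeze

STOCK = [
  { product_id: 1, stock_quantity: 4 },
  { product_id: 1, stock_quantity: 0 },
  { product_id: 3, stock_quantity: 0 }
].freeze

# Same idea as: LEFT JOIN ... ON ids match AND stock > 0 WHERE pv.id IS NULL
def products_without_stock(products, variants)
  products.reject do |product|
    variants.any? do |variant|
      variant[:product_id] == product[:id] && variant[:stock_quantity] > 0
    end
  end
end

p products_without_stock(CATALOG, STOCK).map { |row| row[:name] }
# => ["Sandal", "Boot"]
```

If the stock condition were moved into the WHERE clause next to pv.id IS NULL, the query would return no rows at all, because the NULL-filled rows can never satisfy stock_quantity > 0.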

🔹 4. Group and Count Variants per Product

SELECT 
  p.name AS product_name,
  COUNT(pv.id) AS variant_count
FROM products p
LEFT JOIN product_variants pv ON p.id = pv.product_id
GROUP BY p.id, p.name
ORDER BY variant_count DESC;

🔹 5. Variants with price-percentage discount more than 30%

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.mrp,
  pv.price,
  ROUND(100.0 * (pv.mrp - pv.price) / pv.mrp, 2) AS discount_percent
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.price < pv.mrp
  AND (100.0 * (pv.mrp - pv.price) / pv.mrp) > 30;
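
The percentage arithmetic in query 5 can be checked by hand with a small Ruby helper. This sketch uses Float for brevity, whereas the numeric columns in PostgreSQL are exact decimals; the method names are made up:

```ruby
# Unrounded value, as used in the WHERE clause of the query.
def raw_discount_percent(mrp, price)
  100.0 * (mrp - price) / mrp
end

# Rounded value, as returned by ROUND(..., 2) in the SELECT list.
def discount_percent(mrp, price)
  raw_discount_percent(mrp, price).round(2)
end

# Mirrors: pv.price < pv.mrp AND discount > 30
def deep_discount?(mrp, price, threshold = 30)
  price < mrp && raw_discount_percent(mrp, price) > threshold
end

puts discount_percent(1000, 650) # 35.0
puts deep_discount?(1000, 650)   # true
puts deep_discount?(1000, 700)   # false (exactly 30% is not "more than 30%")
```

The leading 100.0 matters in the Ruby version: with two integers, 100 * (mrp - price) / mrp would use integer division and silently drop the fractional part.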

🔹 6. Color-wise stock summary for a product category

SELECT 
  p.category,
  pv.color,
  SUM(pv.stock_quantity) AS total_stock
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE p.category = 'Shoes'
GROUP BY p.category, pv.color
ORDER BY total_stock DESC;

These queries simulate real-world dashboard views: inventory tracking, product health, stock alerts, etc.

Happy SQL Query Writing! 🚀

Rails 8 App: Setup Test DB | Comprehensive Guide 📖 for PostgreSQL, MySQL Indexing – PostgreSQL Heap ⛰ vs MySQL InnoDB B-Tree 🌿

Enter into psql terminal:

psql postgres
psql (14.17 (Homebrew))
Type "help" for help.

postgres=# \l
                                     List of databases
           Name            |  Owner   | Encoding | Collate | Ctype |   Access privileges
---------------------------+----------+----------+---------+-------+-----------------------
 studio_development | postgres | UTF8     | C       | C     |

Create a new test database
Create a users Table
Check the db and table details

postgres=# create database test_db;
CREATE DATABASE

postgres=# \c test_db

test_db=# CREATE TABLE users (
user_id INT,
username VARCHAR(220),
email VARCHAR(150),
phone_number VARCHAR(20)
);
CREATE TABLE

test_db=# \dt
List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | users | table | abhilash
(1 row)

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           |          |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |

Add a Primary key to users and check the user table.

test_db=# ALTER TABLE users ADD PRIMARY KEY (user_id);
ALTER TABLE

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           | not null |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (user_id)

# OR add primary key when creating the table:
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

You can see that adding a primary key also added a NOT NULL constraint and a unique index (users_pkey).

Why does adding a primary key also add an index?

A primary key must guarantee that each value is unique and fast to find.
Without an index, the database would have to scan the whole table every time you look up a primary key, which would be very slow.
So PostgreSQL automatically creates a unique index on the primary key to make lookups efficient and to enforce uniqueness at the database level.

👉 It needs the index for speed and to enforce the “no duplicates” rule of primary keys.

What is btree?

btree refers to the B-tree data structure, a self-balancing search tree.
It’s the default index type in PostgreSQL.
B-tree indexes organize the data in a tree structure, so that searches, inserts, updates, and deletes are all very efficient — about O(log n) time.
It’s great for looking up exact matches (like WHERE user_id = 123) or range queries (like WHERE user_id BETWEEN 100 AND 200).

👉 So when you see btree, it just means it’s using a very efficient tree structure for your primary key index.

Summary in one line:
Adding a primary key automatically adds a btree index to enforce uniqueness and make lookups super fast.

In MySQL (specifically InnoDB engine, which is default now):

Primary keys always create an index automatically.
The index is a clustered index — this is different from Postgres!
The index uses a B-tree structure too, just like Postgres.

👉 So yes, MySQL also adds an index and uses a B-tree under the hood for primary keys.

But here’s a big difference:

In InnoDB, the table data itself is stored inside the primary key’s B-tree.
- That’s called a clustered index.
- It means the physical storage of the table rows follows the order of the primary key.
In PostgreSQL, the index and the table are stored separately (non-clustered by default).

Example: If you have a table like this in MySQL:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150)
);

user_id will have a B-tree clustered index.
The rows themselves will be stored sorted by user_id.

Short version:

Database	Primary Key Behavior	B-tree?	Clustered?
PostgreSQL	Separate index created for PK	Yes	No (separate by default)
MySQL (InnoDB)	PK index + Table rows stored inside the PK’s B-tree	Yes	Yes (always clustered)

Why Indexing on Unique Columns (like `email`) Improves Lookup 🔍

Use Case

You frequently run queries like:

SELECT * FROM students WHERE email = 'john@example.com';

Without an index, this results in a full table scan — checking each row one-by-one.

With an index, the database can jump directly to the row using a sorted structure, significantly reducing lookup time — especially in large tables.

🌲 How SQL Stores Indexes Internally (PostgreSQL)

📚 PostgreSQL uses B-Tree indexes by default.

When you run:

CREATE UNIQUE INDEX idx_students_on_email ON students(email);

PostgreSQL creates a balanced B-tree like this:

          m@example.com
         /              \
  d@example.com     t@example.com
  /        \           /         \
...      ...        ...         ...

✅ Keys (email values) are sorted lexicographically.
✅ Each leaf node contains a pointer to the actual row in the students table (called a tuple pointer or TID).
✅ Lookup descends from the root to a leaf, binary-searching the sorted keys within each page, giving O(log n) performance.
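
The gap between O(n) and O(log n) is easy to feel with a sorted Ruby array standing in for the index. This models only the search cost: a real B-tree stores many keys per page and is not a binary tree, and the email values here are generated for the demo:

```ruby
# 100,000 sorted keys stand in for the indexed email column.
EMAILS = (0...100_000).map { |i| format('user%06d@example.com', i) }.freeze

# Full table scan: compare row by row until the target is found.
def scan_comparisons(rows, target)
  count = 0
  rows.each do |row|
    count += 1
    break if row == target
  end
  count
end

# Index-style lookup: binary search over the sorted keys.
def index_comparisons(sorted_keys, target)
  count = 0
  sorted_keys.bsearch do |key|
    count += 1
    key >= target
  end
  count
end

target = EMAILS.last
puts scan_comparisons(EMAILS, target)  # 100000
puts index_comparisons(EMAILS, target) # at most 17
```

Seventeen comparisons instead of a hundred thousand is the whole argument for indexing a frequently searched column.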

⚙️ Unique Index = Even Faster

Because all email values are unique, the database:

Can stop searching immediately once a match is found.
Doesn’t need to scan multiple leaf entries (no duplicates).

🧠 Summary

Feature	Value
Index Type	B-tree (default in PostgreSQL)
Lookup Time	O(log n) vs O(n) without index
Optimized for	Equality search (`WHERE email = ...`), sorting, joins
Email is unique?	✅ Yes – index helps even more (no need to check multiple rows)
Table scan avoided?	✅ Yes – PostgreSQL jumps directly via B-tree lookup

What Exactly is a Clustered Index in MySQL (InnoDB)?

🔹 In MySQL InnoDB, the primary key IS the table.

🔹 A Clustered Index means:

The table’s data rows are physically organized in the order of the primary key.
No separate storage for the table – it’s merged into the primary key’s B-tree structure.

In simple words:
👉 “The table itself lives inside the primary key B-tree.”

That’s why:

Every secondary index must store the primary key value (not a row pointer).
InnoDB can only have one clustered index (because you can’t physically order a table in two different ways).

📈 Visual for MySQL Clustered Index

Suppose you have:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(255),
  email VARCHAR(255)
);

The storage looks like:

B-tree by user_id (Clustered)

user_id  | username | email
----------------------------
101      | Alice    | a@x.com
102      | Bob      | b@x.com
103      | Carol    | c@x.com

👉 Table rows stored directly inside the B-tree nodes by user_id!

🔵 PostgreSQL (Primary Key Index = Separate)

Imagine you have a users table:

users table (physical table):

row_id | user_id | username | email
-------------------------------------
  1    |   101   | Alice    | a@example.com
  2    |   102   | Bob      | b@example.com
  3    |   103   | Carol    | c@example.com

And the Primary Key Index looks like:

Primary Key B-Tree (separate structure):

user_id -> row pointer
 101    -> row_id 1
 102    -> row_id 2
 103    -> row_id 3

👉 When you query WHERE user_id = 102, PostgreSQL goes:

Find user_id 102 in the B-tree index,
Then jump to row_id 2 in the actual table.

🔸 Index and Table are separate.
🔸 Extra step: index lookup ➔ then fetch row.

🟠 MySQL InnoDB (Primary Key Index = Clustered)

Same users table, but stored like this:

Primary Key Clustered B-Tree (index + data together):

user_id | username | email
---------------------------------
  101   | Alice    | a@example.com
  102   | Bob      | b@example.com
  103   | Carol    | c@example.com

👉 When you query WHERE user_id = 102, MySQL:

Goes straight to user_id 102 in the B-tree,
Data is already there, no extra lookup.

🔸 Index and Table are merged.
🔸 One step: direct access!

📈 Quick Visual:

PostgreSQL
(Index)    ➔    (Table Row)
    |
    ➔ extra lookup needed

MySQL InnoDB
(Index + Row Together)
    |
    ➔ data found immediately

Summary:

PostgreSQL: primary key index is separate ➔ needs 2 steps (index ➔ table).
MySQL InnoDB: primary key index is clustered ➔ 1 step (index = table).
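
The two storage layouts can be modeled with plain Ruby collections. Hashes stand in for the B-trees here, which is a simplification made only to show the number of hops; all names are made up:

```ruby
ROWS = [
  { user_id: 101, username: 'Alice' },
  { user_id: 102, username: 'Bob' },
  { user_id: 103, username: 'Carol' }
].freeze

# PostgreSQL-style: a heap (array) plus a separate index of user_id -> position.
HEAP = ROWS
PK_INDEX = HEAP.each_with_index.map { |row, pos| [row[:user_id], pos] }.to_h.freeze

def heap_lookup(user_id)
  position = PK_INDEX[user_id] # step 1: index lookup
  position && HEAP[position]   # step 2: heap fetch
end

# InnoDB-style: the index entries hold the rows themselves.
CLUSTERED = ROWS.map { |row| [row[:user_id], row] }.to_h.freeze

def clustered_lookup(user_id)
  CLUSTERED[user_id] # single step: the row lives in the index
end

p heap_lookup(102)      # {user_id: 102, username: "Bob"} via two hops
p clustered_lookup(102) # same row via one hop
```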

📚 How Secondary Indexes Work

Secondary Index = an index on a column that is not the primary key.

Example:

CREATE INDEX idx_username ON users(username);

Now you have an index on username.

🔵 PostgreSQL Secondary Index Behavior

Secondary indexes are separate structures from the table (just like the primary key index).
When you query by username, PostgreSQL:
1. Finds the matching row_id using the secondary B-tree index.
2. Then fetches the full row from the table by row_id.
This is called an Index Scan + Heap Fetch.

📜 Example:

Secondary Index (username -> row_id):

username -> row_id
------------------
Alice    -> 1
Bob      -> 2
Carol    -> 3

(users table is separate)

👉 Flexible, but needs 2 steps: index (row_id) ➔ table.

🟠 MySQL InnoDB Secondary Index Behavior

In InnoDB, secondary indexes don’t store row pointers.
Instead, they store the primary key value!

So:

Find the matching primary key using the secondary index.
Use the primary key to find the actual row inside the clustered primary key B-tree.

📜 Example:

Secondary Index (username -> user_id):

username -> user_id
--------------------
Alice    -> 101
Bob      -> 102
Carol    -> 103

(Then find user_id inside Clustered B-Tree)

✅ Needs 2 steps too: secondary index (primary key) ➔ clustered table.

📈 Quick Visual:

Feature	PostgreSQL	MySQL InnoDB
Secondary Index	username ➔ row pointer (row_id)	username ➔ primary key (user_id)
Fetch Full Row	Use row_id to get table row	Use primary key to find row in clustered index
Steps to Fetch	Index ➔ Table	Index ➔ Primary Key ➔ Table (clustered)

Action	PostgreSQL	MySQL InnoDB
Primary Key Lookup	Index ➔ Row (2 steps)	Clustered Index (1 step)
Secondary Index Lookup	Index (row_id) ➔ Row (2 steps)	Secondary Index (PK) ➔ Row (2 steps)
Storage Model	Separate index and table	Primary key and table merged (clustered)
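
The secondary-index rows of these tables can be sketched the same way in plain Ruby. Hashes stand in for B-trees and every constant and method name is hypothetical; the point is only what each index entry points at:

```ruby
USERS = [
  { user_id: 101, username: 'Alice', email: 'a@example.com' },
  { user_id: 102, username: 'Bob',   email: 'b@example.com' }
].freeze

# PostgreSQL-style: the secondary index maps username -> heap position.
PG_HEAP = USERS
PG_USERNAME_INDEX = PG_HEAP.each_with_index.map { |row, pos| [row[:username], pos] }.to_h.freeze

def pg_find_by_username(name)
  position = PG_USERNAME_INDEX[name] # secondary index -> row pointer
  position && PG_HEAP[position]      # heap fetch
end

# InnoDB-style: the secondary index maps username -> primary key,
# and the primary key leads to the row inside the clustered index.
INNODB_CLUSTERED = USERS.map { |row| [row[:user_id], row] }.to_h.freeze
INNODB_USERNAME_INDEX = USERS.map { |row| [row[:username], row[:user_id]] }.to_h.freeze

def innodb_find_by_username(name)
  primary_key = INNODB_USERNAME_INDEX[name]    # secondary index -> PK
  primary_key && INNODB_CLUSTERED[primary_key] # clustered index -> row
end

p pg_find_by_username('Bob')[:email]     # "b@example.com"
p innodb_find_by_username('Bob')[:email] # "b@example.com"
```

Because every InnoDB secondary index entry carries the primary key value, a wide primary key makes all secondary indexes larger.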

🌐 Now, let’s do some Real SQL Query ⛁ Examples!

1. **Simple `SELECT * FROM users WHERE user_id = 102;`**

PostgreSQL:
Look into PK btree ➔ find row pointer ➔ fetch row separately.
MySQL InnoDB:
Directly find the row inside the PK B-tree (no extra lookup).

✅ MySQL can be a little faster here because the row is read during the same B-tree traversal (1 step).

2. `SELECT username FROM users WHERE user_id = 102;` (Only 1 Column)

PostgreSQL:
Might do an Index Only Scan if all needed data is in the index (very fast).
MySQL:
Clustered index contains all columns already, no special optimization needed.

✅ Both can be very fast, but PostgreSQL shines if the index is “covering” (i.e., contains all needed columns), because a narrow index is smaller than MySQL’s clustered index, which stores every column of every row.

3. `SELECT * FROM users WHERE username = 'Bob';` (Secondary Index Search)

PostgreSQL:
Secondary index on username ➔ row pointer ➔ fetch table row.
MySQL:
Secondary index on username ➔ get primary key ➔ clustered index lookup ➔ fetch data.

✅ Both are 2 steps, but MySQL needs 2 different B-trees: secondary ➔ primary clustered.

Consider the below situation:

SELECT username FROM users WHERE user_id = 102;

user_id is the Primary Key.
You only want username, not full row.

Now:

🔵 PostgreSQL Behavior

👉 In PostgreSQL, by default:

It uses the primary key btree to find the row pointer.
Then fetches the full row from the table (heap fetch).

👉 But PostgreSQL has an optimization called Index-Only Scan.

If all requested columns are already present in the index,
And if the visibility map marks the heap page as all-visible (so no per-row visibility check is needed),
Then Postgres does not fetch the heap.

👉 So in this case:

If the primary key index also stores username internally (or if an extra index is created covering username), Postgres can satisfy the query just from the index.

✅ Result: No table lookup needed ➔ Very fast (almost as fast as InnoDB clustered lookup).

📢 Postgres primary key indexes usually don’t store extra columns, unless you specifically create an index that includes them (INCLUDE (username) syntax in modern Postgres 11+).
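
The two conditions for an Index-Only Scan can be written down as a small predicate. This is a simplified sketch: index_only_scan? and its arguments are hypothetical names, and the real planner and executor weigh more factors than this.

```ruby
# Returns true when a query could be answered from the index alone:
# every requested column is stored in the index, and the heap page is
# marked all-visible so no visibility check against the table is needed.
def index_only_scan?(requested_columns, index_columns, page_all_visible)
  covered = (requested_columns - index_columns).empty?
  covered && page_all_visible
end

# Plain primary key index: username is not in the index, so the heap is needed.
puts index_only_scan?([:username], [:user_id], true)             # prints false
# Covering index (user_id INCLUDE username) on an all-visible page.
puts index_only_scan?([:username], [:user_id, :username], true)  # prints true
# Covering index, but the page still needs a visibility check.
puts index_only_scan?([:username], [:user_id, :username], false) # prints false
```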

🟠 MySQL InnoDB Behavior

In InnoDB:
Since the primary key B-tree already holds all columns (user_id, username, email),
It directly finds the row from the clustered index.
So when you query by PK, even if you only need one column, it has everything inside the same page/block.

✅ One fast lookup.

🔥 Why sometimes Postgres can still be faster?

If PostgreSQL uses Index-Only Scan, and the page is already cached, and no extra visibility check is needed,
Then Postgres may avoid touching the table at all and only scan the tiny index pages.
In this case, for very narrow queries (e.g., only 1 small field), Postgres can outperform even MySQL clustered fetch.

💡 Because an index entry is much smaller than a full row, many more entries fit in a single 8 KB index page, so fewer pages have to be read than when fetching full rows.

🎯 Conclusion:

✅ MySQL’s clustered index makes PK lookups consistently fast.
✅ PostgreSQL can be even faster for small/narrow queries if Index-Only Scan is triggered.

👉 Quick Tip:

In PostgreSQL, you can force an index to include extra columns by using:

CREATE INDEX idx_user_id_username ON users(user_id) INCLUDE (username);

Then index-only scans become more common and predictable! 🚀

Isn’t PostgreSQL also doing 2 B-tree scans? One for secondary index and one for table (row_id)?

When you query with a secondary index, like:

SELECT * FROM users WHERE username = 'Bob';

In MySQL InnoDB, I said:
1. Find in secondary index (username ➔ user_id)
2. Then go to primary clustered index (user_id ➔ full row)

Let’s look at PostgreSQL first:

♦️ Step 1: Search Secondary Index B-tree on username.

It finds the matching TID (tuple ID) or row pointer.
- TID is a pair (block_number, row_offset).
- Not a B-tree! Just a physical pointer.

♦️ Step 2: Use the TID to directly jump into the heap (the table).

The heap (table) is not a B-tree — it’s just a collection of unordered pages (blocks of rows).
PostgreSQL goes directly to the block and offset — like jumping straight into a file.

🔔 Important:

Secondary index ➔ TID ➔ heap fetch.
No second B-tree traversal for the table!

🟠 Meanwhile in MySQL InnoDB:

♦️ Step 1: Search Secondary Index B-tree on username.

It finds the Primary Key value (user_id).

♦️ Step 2: Now, search the Primary Key Clustered B-tree to find the full row.

Need another B-tree traversal based on user_id.

🔔 Important:

Secondary index ➔ Primary Key B-tree ➔ data fetch.
Two full B-tree traversals!

Real-world Summary:

♦️ PostgreSQL

Secondary index gives a direct shortcut to the heap.
One B-tree scan (secondary) ➔ Direct heap fetch.

♦️ MySQL

Secondary index gives PK.
Then another B-tree scan (primary clustered) to find full row.

✅ PostgreSQL does not scan a second B-tree when fetching from the table — just a direct page lookup using TID.

✅ MySQL does scan a second B-tree (primary clustered index) when fetching full row after secondary lookup.

Is heap fetch a searching technique? Why is it faster than B-tree?

📚 Let’s start from the basics:

When PostgreSQL finds a match in a secondary index, what it gets is a TID.

♦️ A TID (Tuple ID) is a physical address made of:

Block Number (page number)
Offset Number (row slot inside the page)

Example:

TID = (block_number = 1583, offset = 7)

🔵 How PostgreSQL uses TID?

It directly calculates the location of the block (disk page) using block_number.
It reads that block (if not already in memory).
Inside that block, it finds the row at offset 7.

♦️ No search, no btree, no extra traversal — just:

Find the page (via simple number addressing)
Find the row slot

📈 Visual Example

Secondary index (username ➔ TID):

username	TID
Alice	(1583, 7)
Bob	(1592, 3)
Carol	(1601, 12)

♦️ When you search for “Bob”:

Find (1592, 3) from secondary index B-tree.
Jump directly to Block 1592, Offset 3.
Done ✅!

Answer:

Heap fetch is NOT a search.
It’s a direct address lookup (fixed number).
Heap = unordered collection of pages.
Pages = fixed-size blocks (usually 8 KB each).
TID gives an exact GPS location inside heap — no searching required.

That’s why heap fetch is faster than another B-tree search:

No binary search, no B-tree traversal needed.
Only a simple disk/memory read + row offset jump.

🌿 B-tree vs 📁 Heap Fetch

Action	B-tree	Heap Fetch
What it does	Binary search inside sorted tree nodes	Direct jump to block and slot
Steps needed	Traverse nodes (root ➔ internal ➔ leaf)	Directly read page and slot
Time complexity	O(log n)	O(1)
Speed	Slower (needs comparisons)	Very fast (direct)

🎯 Final and short answer:

♦️ In PostgreSQL, after finding the TID in the secondary index, the heap fetch is a direct, constant-time (O(1)) access — no B-tree needed!
♦️ This is faster than scanning another B-tree like in MySQL InnoDB.

🧩 Our exact question:

When we say:

Jump directly to Block 1592, Offset 3.

We are thinking:

There are thousands of blocks.
How can we directly jump to block 1592?
Shouldn’t that be O(n) (linear time)?
Shouldn’t there be some traversal?

🔵 Here’s the real truth:

No traversal needed.
No O(n) work.
Accessing Block 1592 is O(1) — constant time.

📚 Why?

Because of how files, pages, and memory work inside a database.

When PostgreSQL stores a table (the “heap”), it saves it in a file on disk.
The file is just a long array of fixed-size pages.

Each page = 8KB (default in Postgres).
Each block = 1 page = fixed 8KB chunk.
Block 0 is the first 8KB.
Block 1 is next 8KB.
Block 2 is next 8KB.
…
Block 1592 = (1592 × 8 KB) offset from the beginning.

✅ So block 1592 is simply located at 1592 × 8192 bytes offset from the start of the file.

✅ Operating systems (and PostgreSQL’s Buffer Manager) know exactly how to seek to that byte position without reading everything before it.

📈 Diagram (imagine the table file):

+-----------+-----------+-----------+-----------+-----------+------+
| Block 0   | Block 1   | Block 2   | Block 3   | Block 4   |  ... |
+-----------+-----------+-----------+-----------+-----------+------+
  (8KB)       (8KB)       (8KB)       (8KB)       (8KB)

Finding Block 1592 ➔
Seek directly to offset 1592 * 8192 bytes ➔
Read 8KB ➔
Find row at Offset 3 inside it.
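
The offset arithmetic can be tried out with an ordinary file. The sketch below is an illustration only: it builds a small temporary file of fixed-size blocks (build_demo_file, read_block and block_offset are hypothetical helpers) and ignores everything a real database adds on top, such as the buffer cache and splitting large tables across several files.

```ruby
require "tempfile"

BLOCK_SIZE = 8192 # PostgreSQL's default page size in bytes

# Byte offset of a block inside a flat file of fixed-size pages.
def block_offset(block_number, block_size = BLOCK_SIZE)
  block_number * block_size
end

# Reads one block by seeking straight to its offset; earlier blocks are never read.
def read_block(io, block_number, block_size = BLOCK_SIZE)
  io.seek(block_offset(block_number, block_size))
  io.read(block_size)
end

# Builds a stand-in "heap file": each block is filled with its own letter (A, B, C, ...).
def build_demo_file(block_count = 16)
  file = Tempfile.new("heap_demo")
  file.binmode
  block_count.times { |n| file.write((65 + n).chr * BLOCK_SIZE) }
  file.flush
  file
end

puts block_offset(1592) # prints 13041664

file = build_demo_file
block = read_block(file, 11)
puts block[0]           # prints L (block 11 is the 12th block)
file.close!
```

The cost of read_block does not depend on how many blocks come before the requested one, which is the point the diagram makes.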

🤔 What happens technically?

If in memory (shared buffers / page cache):

PostgreSQL checks its buffer pool (shared memory).
“Do I already have block 1592 cached?”
- ✅ Yes: immediately access memory address.
- ❌ No: Load block 1592 from disk into memory.

If the block has to be read from disk (a cache miss):

File systems (ext4, xfs, etc) know how to seek to a byte offset in a file without reading previous parts.
Seek to (block_number × 8192) bytes.
Read exactly 8KB into memory.
No need to scan the whole file linearly.

📊 Final Step: Inside the Block

Once the block is loaded:

The block starts with a small array of line pointers (item identifiers), one per tuple.
Each line pointer records where its tuple sits inside the block.
Offset 3 ➔ the third line pointer ➔ the tuple it points to.

♦️ Again, this is just array lookup — no traversal, no O(n).

⚡ So to summarize:

Question	Answer
How does PostgreSQL jump directly to block?	Using the block number × page size calculation (fixed offset math).
Is it O(n)?	❌ No, it’s O(1) constant time
Is there any traversal?	❌ No traversal. Just a seek + memory read.
How fast?	Extremely fast if cached, still fast when a disk read is needed.

🔥 Key concept:

PostgreSQL heap access is O(1) because the heap file is a flat sequence of fixed-size pages, and the TID gives exact coordinates.

🎯 Simple Real World Example:

Imagine you have a giant book (the table file).
Each page of the book is numbered (block number).

If someone says:

👉 “Go to page 1592.”

♦️ You don’t need to read pages 1 to 1591 first.
♦️ You just flip directly to page 1592.

📗 Same idea: no linear traversal, just positional lookup.

🧠 Deep thought:

Because blocks are fixed size and TID is known,
heap fetch is almost as fast as reading a small array.

(Actually faster than searching B-tree because B-tree needs multiple comparisons at each node.)

Enjoy SQL! 🚀

Learn SQL: Day 3 – JOINs (One of the Most Important SQL Topics)

If I had to choose one SQL topic that appears most frequently in Senior Developer interviews, it would be:

JOINs

Most Rails developers know:

User.joins(:orders)

But many cannot explain:

What SQL Rails generates
How PostgreSQL executes it
Why duplicates occur
When to use joins
When to use includes
When JOINs become slow

A senior engineer should be comfortable with all of these.

Today’s Goals

By the end of Day 3, you’ll understand:

INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL OUTER JOIN
CROSS JOIN
Self JOIN
How Rails translates associations into JOINs
N+1 query problem
joins vs includes
Interview questions

Step 1: Create Fresh Tables

Let’s create a simple system.

Users

			
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100)
);

		

Orders

			
CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL,
  amount NUMERIC(10,2),
  CONSTRAINT fk_orders_user
  FOREIGN KEY (user_id)
  REFERENCES users(id)
);

		

Insert Sample Data

Users:

			
INSERT INTO users(name)
VALUES
('John'),
('Mary'),
('Bob'),
('Alice');

		

Orders:

			
INSERT INTO orders(user_id, amount)
VALUES
(1,100),
(1,200),
(2,300),
(2,400),
(2,500);

		

Current data:

users

id	name
1	John
2	Mary
3	Bob
4	Alice

orders

id	user_id	amount
1	1	100
2	1	200
3	2	300
4	2	400
5	2	500

Notice:

			
Bob has no orders
Alice has no orders

This becomes important.

What is a JOIN?

A JOIN combines rows from multiple tables.

Think:

			
users
+
orders
=
business information

		

The database uses a common column:

			
users.id
=
orders.user_id

1. INNER JOIN

Most common JOIN.

Returns only matching rows.

Query

			
SELECT
  users.id,
  users.name,
  orders.amount
FROM users
INNER JOIN orders
  ON users.id = orders.user_id;

		

Result:

id	name	amount
1	John	100
1	John	200
2	Mary	300
2	Mary	400
2	Mary	500

Notice:

			
Bob disappeared
Alice disappeared

Why?

Because they have no matching order.

Visual

			
users         orders
John      <->   100
John      <->   200
Mary      <->   300
Mary      <->   400
Mary      <->   500
Bob       X
Alice     X

		

Only matches survive.
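
For readers who think more easily in Ruby than in SQL, the same matching rule can be sketched over plain arrays. This is only a mental model written as a nested loop (inner_join is a hypothetical helper); PostgreSQL chooses its own join strategy.

```ruby
USERS = [
  { id: 1, name: "John" },
  { id: 2, name: "Mary" },
  { id: 3, name: "Bob" },
  { id: 4, name: "Alice" }
].freeze

ORDERS = [
  { id: 1, user_id: 1, amount: 100 },
  { id: 2, user_id: 1, amount: 200 },
  { id: 3, user_id: 2, amount: 300 },
  { id: 4, user_id: 2, amount: 400 },
  { id: 5, user_id: 2, amount: 500 }
].freeze

# INNER JOIN: one output row per (user, order) pair that satisfies the ON condition.
# Users with no matching order produce no rows at all.
def inner_join(users, orders)
  users.flat_map do |user|
    orders
      .select { |order| order[:user_id] == user[:id] }
      .map { |order| { id: user[:id], name: user[:name], amount: order[:amount] } }
  end
end

inner_join(USERS, ORDERS).each { |row| puts row.values.join("\t") }
```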

Rails Equivalent

User.joins(:orders)

Generated SQL:

			
SELECT users.*
FROM users
INNER JOIN orders
ON orders.user_id = users.id;

Interview Question

What type of JOIN does Rails joins use?

Answer:

INNER JOIN

Many candidates miss this.

2. LEFT JOIN

Returns:

			
All rows from LEFT table
+
matching rows from RIGHT table

Query

			
SELECT
  users.name,
  orders.amount
FROM users
LEFT JOIN orders
ON users.id = orders.user_id;

		

Result:

name	amount
John	100
John	200
Mary	300
Mary	400
Mary	500
Bob	NULL
Alice	NULL

Notice:

			
Bob exists
Alice exists

Even without orders.

Visual

			
LEFT TABLE = users
Keep everything

			
John  -> order
Mary  -> order
Bob   -> NULL
Alice -> NULL

Rails Equivalent

User.left_joins(:orders)

Generated SQL:

LEFT OUTER JOIN

Practical Example

Find users without orders.

			
SELECT users.*
FROM users
LEFT JOIN orders
ON users.id = orders.user_id
WHERE orders.id IS NULL;

		

Result:

			
Bob
Alice

Rails:

			
User.left_joins(:orders)
    .where(orders: { id: nil })
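
The LEFT JOIN + IS NULL pattern can also be modelled over plain arrays. In this sketch (left_join and users_without_orders are hypothetical helpers, not ActiveRecord methods) a user with no orders still yields one row, padded with nil where the order columns would be.

```ruby
USERS = [
  { id: 1, name: "John" },
  { id: 2, name: "Mary" },
  { id: 3, name: "Bob" },
  { id: 4, name: "Alice" }
].freeze

ORDERS = [
  { id: 1, user_id: 1, amount: 100 },
  { id: 2, user_id: 1, amount: 200 },
  { id: 3, user_id: 2, amount: 300 },
  { id: 4, user_id: 2, amount: 400 },
  { id: 5, user_id: 2, amount: 500 }
].freeze

# LEFT JOIN: every user appears at least once; unmatched users get nil order columns.
def left_join(users, orders)
  users.flat_map do |user|
    matches = orders.select { |order| order[:user_id] == user[:id] }
    if matches.empty?
      [{ name: user[:name], order_id: nil, amount: nil }]
    else
      matches.map { |order| { name: user[:name], order_id: order[:id], amount: order[:amount] } }
    end
  end
end

# WHERE orders.id IS NULL keeps only the padded rows.
def users_without_orders(users, orders)
  left_join(users, orders).select { |row| row[:order_id].nil? }.map { |row| row[:name] }
end

puts users_without_orders(USERS, ORDERS).inspect # prints ["Bob", "Alice"]
```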

Common Interview Question

Find customers who never placed an order.

Expected answer:

			
LEFT JOIN
+
IS NULL

3. RIGHT JOIN

Opposite of LEFT JOIN.

Keep all rows from right table.

			
SELECT *
FROM users
RIGHT JOIN orders
ON users.id = orders.user_id;

In real-world Rails projects:

Rarely used

Most engineers rewrite it as LEFT JOIN.

4. FULL OUTER JOIN

Keep everything.

			
SELECT *
FROM users
FULL OUTER JOIN orders
ON users.id = orders.user_id;

Returns:

			
All users
+
All orders

matched where possible.

Used occasionally for:

reporting
analytics
reconciliation

Rare in Rails applications.

5. CROSS JOIN

Creates every possible combination.

Example:

			
CREATE TABLE colors (
  color VARCHAR(20)
);
INSERT INTO colors
VALUES ('Red'),('Blue');

		

Sizes:

			
CREATE TABLE sizes (
  size VARCHAR(20)
);
INSERT INTO sizes
VALUES ('S'),('M');

		

Query:

			
SELECT *
FROM colors
CROSS JOIN sizes;

Result:

			
Red   S
Red   M
Blue  S
Blue  M

Every row paired with every row.

Formula:

RowsA × RowsB
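
Ruby's Array#product builds the same kind of Cartesian product, which makes the row-count formula easy to check. A small sketch, with cross_join as a hypothetical helper name:

```ruby
COLORS = %w[Red Blue].freeze
SIZES = %w[S M].freeze

# CROSS JOIN: pair every row of the first table with every row of the second.
def cross_join(rows_a, rows_b)
  rows_a.product(rows_b)
end

pairs = cross_join(COLORS, SIZES)
pairs.each { |color, size| puts "#{color}\t#{size}" }
puts pairs.size == COLORS.size * SIZES.size # prints true
```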

Interview Question:

10 rows × 100 rows

How many rows?

Answer:

1,000 rows

6. Self JOIN

A table joins itself.

Very common interview topic.

Create employees:

			
CREATE TABLE employees (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  manager_id BIGINT
);

		

Insert:

			
INSERT INTO employees
(name, manager_id)
VALUES
('CEO', NULL),
('Manager1',1),
('Manager2',1),
('Developer1',2),
('Developer2',2);

		

Query:

			
SELECT
  e.name AS employee,
  m.name AS manager
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.id;

		

Result:

employee	manager
CEO	NULL
Manager1	CEO
Manager2	CEO
Developer1	Manager1
Developer2	Manager1
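
The self join can be pictured as looking each employee's manager_id up in the same list. A minimal sketch over plain hashes (employees_with_managers is a hypothetical helper; the lookup hash plays the role of the alias m):

```ruby
EMPLOYEES = [
  { id: 1, name: "CEO",        manager_id: nil },
  { id: 2, name: "Manager1",   manager_id: 1 },
  { id: 3, name: "Manager2",   manager_id: 1 },
  { id: 4, name: "Developer1", manager_id: 2 },
  { id: 5, name: "Developer2", manager_id: 2 }
].freeze

# LEFT JOIN employees m ON e.manager_id = m.id
def employees_with_managers(employees)
  by_id = employees.to_h { |employee| [employee[:id], employee] }
  employees.map do |employee|
    manager = by_id[employee[:manager_id]] # nil when there is no manager
    { employee: employee[:name], manager: manager && manager[:name] }
  end
end

employees_with_managers(EMPLOYEES).each do |row|
  puts "#{row[:employee]}\t#{row[:manager] || 'NULL'}"
end
```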

Rails

			
class Employee < ApplicationRecord
  belongs_to :manager,
             class_name: "Employee",
             optional: true
  has_many :subordinates,
           class_name: "Employee",
           foreign_key: :manager_id
end

		

Why Duplicates Occur

Look at:

			
SELECT *
FROM users
INNER JOIN orders
ON users.id = orders.user_id;

Mary has:

3 orders

Therefore:

Mary appears 3 times

JOINs multiply rows.

This is one of the most misunderstood SQL concepts.

DISTINCT After JOIN

Sometimes we want unique users.

			
SELECT DISTINCT users.*
FROM users
JOIN orders
ON users.id = orders.user_id;

Rails

User.joins(:orders).distinct

The N+1 Query Problem

Every Rails interview asks this.

Suppose:

			
users = User.all
users.each do |user|
  # length loads the association (count would always run a COUNT(*) query instead)
  puts user.orders.length
end

Queries:

SELECT * FROM users;

Then:

			
SELECT * FROM orders WHERE user_id=1;
SELECT * FROM orders WHERE user_id=2;
SELECT * FROM orders WHERE user_id=3;
...

100 users:

101 queries

Called:

N+1 problem

Fix Using includes

User.includes(:orders)

Rails loads:

SELECT * FROM users;

and

			
SELECT * FROM orders
WHERE user_id IN (...);

Only 2 queries.
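
The query counts are easy to verify with a toy model. FakeDb below is a hypothetical in-memory stand-in that only counts how many times it is asked for data; it is not how ActiveRecord is implemented.

```ruby
# In-memory "database" that counts the queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(users, orders)
    @users = users
    @orders = orders
    @query_count = 0
  end

  def all_users # SELECT * FROM users
    @query_count += 1
    @users
  end

  def orders_for(user_id) # SELECT * FROM orders WHERE user_id = ?
    @query_count += 1
    @orders.select { |order| order[:user_id] == user_id }
  end

  def orders_for_many(user_ids) # SELECT * FROM orders WHERE user_id IN (...)
    @query_count += 1
    @orders.select { |order| user_ids.include?(order[:user_id]) }
  end
end

def build_db(user_count = 100)
  users = (1..user_count).map { |id| { id: id } }
  orders = users.map { |user| { user_id: user[:id], amount: 10 } }
  FakeDb.new(users, orders)
end

# One query for the users, then one more per user.
def n_plus_one(db)
  db.all_users.each { |user| db.orders_for(user[:id]) }
  db.query_count
end

# One query for the users, one for all their orders, then grouping in memory.
def eager_load(db)
  users = db.all_users
  orders_by_user = db.orders_for_many(users.map { |u| u[:id] }).group_by { |o| o[:user_id] }
  users.each { |user| orders_by_user.fetch(user[:id], []) }
  db.query_count
end

puts n_plus_one(build_db) # prints 101
puts eager_load(build_db) # prints 2
```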

joins vs includes

This is a favorite interview question.

joins

Used for filtering.

User.joins(:orders)

SQL:

INNER JOIN

Purpose:

Filter data

includes

Used for eager loading.

User.includes(:orders)

Purpose:

Avoid N+1

Example

Find users with orders.

User.joins(:orders)

Display users and orders.

User.includes(:orders)

Interview Question

Which is better?

joins

includes

Answer:

Depends on the problem.

Different purposes.

Real Interview Queries

Find users with orders

			
SELECT DISTINCT users.*
FROM users
JOIN orders
ON users.id = orders.user_id;

Rails:

User.joins(:orders).distinct

Find users without orders

			
SELECT users.*
FROM users
LEFT JOIN orders
ON users.id = orders.user_id
WHERE orders.id IS NULL;

		

Rails:

			
User.left_joins(:orders)
    .where(orders: { id: nil })

Find total orders per user

			
SELECT
  users.name,
  COUNT(orders.id)
FROM users
LEFT JOIN orders
ON users.id = orders.user_id
GROUP BY users.id, users.name;

		

We’ll study GROUP BY in Day 4.

Senior-Level Insights

1. Most Rails JOINs are INNER JOINs

joins

means:

INNER JOIN

2. LEFT JOIN + IS NULL

is the standard pattern for:

Find missing records

Examples:

users without orders
customers without subscriptions
products without inventory

3. JOINs Multiply Rows

One user:

3 orders

becomes:

3 result rows

Always remember this.

4. Understand the Generated SQL

When writing:

User.joins(:orders)

you should mentally see:

			
INNER JOIN orders
ON orders.user_id = users.id

immediately.

Practical Exercises

Exercise 1

Find all users with orders.

Exercise 2

Find all users without orders.

Exercise 3

Add 3 more orders for John.

Run:

INNER JOIN

Observe duplicates.

Exercise 4

Use:

DISTINCT

to remove duplicates.

Exercise 5

Create:

			
categories
products

and practice:

			
INNER JOIN
LEFT JOIN

Homework

Create the following schema:

			
authors
books
publishers

Relationships:

			
Author has many Books
Publisher has many Books
Book belongs to Author
Book belongs to Publisher

Insert sample data.

Write queries:

Books with author names
Books with publisher names
Authors without books
Publishers without books
Unique authors who have books

Write both:

SQL
ActiveRecord

Question:

I cannot see any difference in output between LEFT JOIN and FULL OUTER JOIN with our example. Can you create new data that shows the difference between them?

This is actually something that confuses many developers in interviews.

The reason you don’t see any difference is because our current data doesn’t have any “orphan” records in the orders table.

Let’s understand why.

Our Current Data

users

id	name
1	John
2	Mary
3	Bob
4	Alice

orders

id	user_id	amount
1	1	100
2	1	200
3	2	300
4	2	400
5	2	500

Every order belongs to an existing user.

So there are:

Users without orders (Bob, Alice)
No orders without users

That’s why LEFT JOIN and FULL OUTER JOIN appear almost identical.

Let’s Create a Better Example

To see the difference, we need an order that doesn’t match any user.

However…

Our foreign key prevents that.

			
FOREIGN KEY (user_id)
REFERENCES users(id)

This is a good thing because it maintains data integrity.

So for learning purposes, we’ll create another table without a foreign key.

Step 1

			
DROP TABLE IF EXISTS orders_demo;
CREATE TABLE orders_demo (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT,
    amount NUMERIC(10,2)
);

		

Notice:

❌ No foreign key.

Step 2

Insert data

			
INSERT INTO orders_demo(user_id, amount)
VALUES
(1,100),
(1,200),
(2,300),
(999,400);

		

Now we have:

users

id	name
1	John
2	Mary
3	Bob
4	Alice

orders_demo

id	user_id	amount
1	1	100
2	1	200
3	2	300
4	999	400

Notice:

user_id = 999

There is no matching user.

This is our orphan order.

INNER JOIN

			
SELECT
    u.id,
    u.name,
    o.amount
FROM users u
INNER JOIN orders_demo o
ON u.id = o.user_id;

		

Result

id	name	amount
1	John	100
1	John	200
2	Mary	300

The orphan order disappears.

LEFT JOIN

			
SELECT
    u.id,
    u.name,
    o.amount
FROM users u
LEFT JOIN orders_demo o
ON u.id = o.user_id;

		

Result

id	name	amount
1	John	100
1	John	200
2	Mary	300
3	Bob	NULL
4	Alice	NULL

Question:

Where is the orphan order?

It is gone!

Why?

Because LEFT JOIN keeps every row from the left table (users). Since there is no user with id = 999, there is nothing on the left to preserve.

FULL OUTER JOIN

			
SELECT
    u.id,
    u.name,
    o.user_id,
    o.amount
FROM users u
FULL OUTER JOIN orders_demo o
ON u.id = o.user_id;

		

Result

user id	name	order user_id	amount
1	John	1	100
1	John	1	200
2	Mary	2	300
3	Bob	NULL	NULL
4	Alice	NULL	NULL
NULL	NULL	999	400

Now you finally see the difference!

The last row exists only because of FULL OUTER JOIN.
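
The same result can be reproduced over plain arrays, which shows where the extra row comes from. A sketch with a hypothetical full_outer_join helper: it first does what a LEFT JOIN does, then appends the orders that no user matched.

```ruby
USERS = [
  { id: 1, name: "John" },
  { id: 2, name: "Mary" },
  { id: 3, name: "Bob" },
  { id: 4, name: "Alice" }
].freeze

ORDERS_DEMO = [
  { user_id: 1,   amount: 100 },
  { user_id: 1,   amount: 200 },
  { user_id: 2,   amount: 300 },
  { user_id: 999, amount: 400 }
].freeze

def full_outer_join(users, orders)
  matched = [] # indexes of orders that found a user

  # Left side: every user, padded with nil when there is no order.
  rows = users.flat_map do |user|
    hits = orders.each_index.select { |i| orders[i][:user_id] == user[:id] }
    matched.concat(hits)
    if hits.empty?
      [{ name: user[:name], order_user_id: nil, amount: nil }]
    else
      hits.map { |i| { name: user[:name], order_user_id: orders[i][:user_id], amount: orders[i][:amount] } }
    end
  end

  # Right side: orders that matched nobody, padded with nil user columns.
  orphans = orders.each_index.reject { |i| matched.include?(i) }
  rows + orphans.map { |i| { name: nil, order_user_id: orders[i][:user_id], amount: orders[i][:amount] } }
end

full_outer_join(USERS, ORDERS_DEMO).each do |row|
  puts row.values.map { |value| value.nil? ? "NULL" : value }.join("\t")
end
```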

When to use FULL OUTER JOIN?

Check the page: https://railsdrop.com/learn-sql-day-3-when-to-use-full-outer-join/

Quick Quiz

Given these tables:

Table A

id
1
2
3

Table B

id
2
3
4

Without running SQL, what rows do you expect from:

INNER JOIN
LEFT JOIN (A LEFT JOIN B)
RIGHT JOIN (A RIGHT JOIN B)
FULL OUTER JOIN

Try answering these on paper first. If you can do that confidently, you’ve truly understood how the different joins work.

Day 4 Preview

Tomorrow we’ll cover one of the highest-frequency SQL interview topics:

GROUP BY & Aggregations

Including:

COUNT
SUM
AVG
MIN
MAX
GROUP BY
HAVING
Aggregate queries in Rails
Real interview problems such as:
- Top customers
- Revenue calculations
- Most purchased products
- Reporting queries

This is where SQL starts becoming analytical rather than just relational.

Happy Learning!

Learn SQL: Day 2 – SELECT Queries (The Foundation of SQL)

Today we start writing queries.

Today’s Goals

By the end of Day 2 you should understand:

SELECT
WHERE
ORDER BY
LIMIT
OFFSET
DISTINCT
IN
BETWEEN
LIKE
ILIKE
NULL handling
ActiveRecord equivalents
Common interview questions
Common mistakes

Step 1: Create Our Practice Database

Connect:

psql sql_day1

Let’s create a new table.

			
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255),
  age INTEGER,
  city VARCHAR(100),
  salary NUMERIC(10,2),
  active BOOLEAN,
  created_at TIMESTAMP DEFAULT NOW()
);

		

Insert Sample Data

			
INSERT INTO users
(name, email, age, city, salary, active)
VALUES
('John',  'john@test.com', 30, 'New York', 70000, true),
('Mary',  'mary@test.com', 25, 'Chicago', 60000, true),
('Bob',   'bob@test.com', 35, 'Chicago', 90000, false),
('Alice', 'alice@test.com', 28, 'Boston', 75000, true),
('Tom',   'tom@test.com', 40, 'New York', 120000, false),
('Sara',  'sara@test.com', 32, 'Boston', 85000, true),
('Mike',  'mike@test.com', NULL, 'Chicago', NULL, true);

		

View data:

SELECT * FROM users;

1. SELECT

The most basic query.

SELECT * FROM users;

Meaning:

Give me all columns from users

Output:

id | name | email | age | city | salary | active | created_at

Select Specific Columns

Instead of everything:

SELECT name, email FROM users;

Output:

			
name   | email
--------+---------------
John   | john@test.com
Mary   | mary@test.com

Rails Equivalent

User.select(:name, :email)

Senior Insight

Avoid:

SELECT *

in production systems unless needed.

Why?

Because fetching unnecessary columns:

uses more memory
transfers more data
slows queries

Good:

SELECT id, name FROM users;

2. WHERE Clause

Filters rows.

Example

Only active users.

			
SELECT * FROM users
WHERE active = true;

Rails:

User.where(active: true)

Age Greater Than 30

			
SELECT * FROM users
WHERE age > 30;

Rails:

User.where("age > ?", 30)

Multiple Conditions

			
SELECT * FROM users
WHERE city = 'Chicago'
AND active = true;

Rails:

User.where(city: "Chicago", active: true)

OR

			
SELECT * FROM users
WHERE city = 'Chicago'
OR city = 'Boston';

Rails:

User.where(city: ["Chicago", "Boston"])

Interview Question

Which runs first?

WHERE A OR B AND C

Answer:

AND

before

OR

So A OR B AND C is read as A OR (B AND C). Use parentheses when you mean something else:

			
WHERE (A OR B)
AND C
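
Ruby follows the same rule (&& binds tighter than ||), so the effect is easy to try outside the database. In this small sketch the method names are hypothetical:

```ruby
# Parsed as a || (b && c), just like SQL's A OR B AND C.
def without_parens(a, b, c)
  a || b && c
end

# Explicit grouping changes the meaning.
def with_parens(a, b, c)
  (a || b) && c
end

puts without_parens(true, false, false) # prints true
puts with_parens(true, false, false)    # prints false
```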

3. ORDER BY

Sort results.

Ascending

			
SELECT * FROM users
ORDER BY age ASC;

Smallest age first.

Descending

			
SELECT * FROM users
ORDER BY salary DESC;

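Highest salary first. Note that PostgreSQL places NULLs first in descending order by default, so Mike (NULL salary) comes out on top unless you add NULLS LAST.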

Rails:

User.order(salary: :desc)

Multiple Columns

			
SELECT * FROM users
ORDER BY city ASC, salary DESC;

Meaning:

			
Sort by city first
Inside each city
sort by salary

Common Interview Question

What happens if you omit ASC/DESC?

ORDER BY age

Default:

ASC

4. LIMIT

Return only N rows.

			
SELECT * FROM users
LIMIT 3;

Rails:

User.limit(3)

Why LIMIT Matters

Imagine:

10 million rows

Fetching all:

			
slow
memory-heavy
unnecessary

LIMIT reduces work.

5. OFFSET

Skip rows.

			
SELECT * FROM users
LIMIT 3
OFFSET 3;

Meaning:

			
Skip first 3
Return next 3

Rails

User.limit(3).offset(3)

Pagination Example

Page 1

LIMIT 10 OFFSET 0

Page 2

LIMIT 10 OFFSET 10

Page 3

LIMIT 10 OFFSET 20
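
The pattern behind these three pages is OFFSET = (page - 1) × page size. A short sketch of that arithmetic, where page_offset and paginate are hypothetical helpers working on an in-memory array:

```ruby
# OFFSET for a 1-based page number.
def page_offset(page, per_page)
  raise ArgumentError, "page must be >= 1" if page < 1

  (page - 1) * per_page
end

# Equivalent of LIMIT per_page OFFSET page_offset(page, per_page).
def paginate(rows, page, per_page)
  rows.drop(page_offset(page, per_page)).first(per_page)
end

puts page_offset(1, 10) # prints 0
puts page_offset(2, 10) # prints 10
puts page_offset(3, 10) # prints 20
puts paginate((1..25).to_a, 3, 10).inspect # prints [21, 22, 23, 24, 25]
```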

Senior Insight

Large OFFSET values become expensive.

Example:

OFFSET 500000

PostgreSQL still scans through those rows.

Later we’ll learn:

Keyset Pagination

which is much faster.

6. DISTINCT

Remove duplicates.

Example:

SELECT city FROM users;

Result:

			
Chicago
Chicago
Chicago
Boston
Boston
New York
New York

		

Distinct:

SELECT DISTINCT city FROM users;

Result:

			
Chicago
Boston
New York

Rails

User.select(:city).distinct

Multiple Columns

SELECT DISTINCT city, active FROM users;

Distinct applies to the combination.

7. IN

Cleaner alternative to multiple OR conditions.

Instead of:

			
WHERE city='Boston'
OR city='Chicago'
OR city='New York'

Use:

			
WHERE city IN
('Boston','Chicago','New York');

Rails:

User.where(city: ["Boston", "Chicago", "New York"])

8. BETWEEN

Range filtering.

Age between 25 and 35.

			
SELECT * FROM users
WHERE age BETWEEN 25 AND 35;

Equivalent:

			
age >= 25
AND
age <= 35

Rails

User.where(age: 25..35)

Salary Range

			
SELECT * FROM users
WHERE salary BETWEEN 60000 AND 90000;

Interview Question

Is BETWEEN inclusive?

Answer:

YES

Both boundaries included.

9. LIKE

Pattern matching.

Find names beginning with M.

			
SELECT * FROM users
WHERE name LIKE 'M%';

Result:

			
Mary
Mike

Ends With

WHERE email LIKE '%test.com'

Contains

WHERE name LIKE '%ar%'

Matches:

			
Mary
Sara

Wildcards

Symbol	Meaning
%	Any number of chars
_	Exactly one char

Example

WHERE name LIKE '_o%'

Matches:

			
John
Bob
Tom

10. ILIKE

PostgreSQL-specific.

Case-insensitive LIKE.

			
SELECT * FROM users
WHERE name ILIKE 'john';

Matches:

			
John
JOHN
john
JoHn

Rails

User.where("name ILIKE ?", "john")

Senior Interview Insight

In PostgreSQL:

LIKE

is case-sensitive.

ILIKE

is case-insensitive.

Many developers don’t know this.
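
One way to internalize the two wildcards is to translate a LIKE pattern into a regular expression. The sketch below is an approximation for experimenting in Ruby: like_to_regexp and sql_like are hypothetical helpers, and LIKE escape characters are deliberately not handled.

```ruby
NAMES = %w[John Mary Bob Alice Tom Sara Mike].freeze

# % becomes "any number of characters", _ becomes "exactly one character".
# Everything else is matched literally; the pattern must cover the whole string.
def like_to_regexp(pattern, case_insensitive: false)
  body = pattern.each_char.map do |char|
    case char
    when "%" then ".*"
    when "_" then "."
    else Regexp.escape(char)
    end
  end.join
  Regexp.new("\\A#{body}\\z", case_insensitive ? Regexp::IGNORECASE : 0)
end

def sql_like(values, pattern, case_insensitive: false)
  regexp = like_to_regexp(pattern, case_insensitive: case_insensitive)
  values.select { |value| regexp.match?(value) }
end

puts sql_like(NAMES, "M%").inspect                           # prints ["Mary", "Mike"]
puts sql_like(NAMES, "_o%").inspect                          # prints ["John", "Bob", "Tom"]
puts sql_like(NAMES, "john").inspect                         # prints [] (LIKE is case-sensitive)
puts sql_like(NAMES, "john", case_insensitive: true).inspect # prints ["John"] (ILIKE)
```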

11. NULL Handling

This is a favorite interview topic.

Let’s inspect:

SELECT * FROM users;

Mike has:

			
age = NULL
salary = NULL

Wrong

WHERE age = NULL

Returns:

Nothing

Correct

WHERE age IS NULL

Find users with no salary:

			
SELECT * FROM users
WHERE salary IS NULL;

Rails

User.where(age: nil)

Generates:

IS NULL

NOT NULL

			
SELECT * FROM users
WHERE salary IS NOT NULL;

Why NULL Is Special

SQL uses:

			
TRUE
FALSE
UNKNOWN

not just:

			
TRUE
FALSE

This is called:

Three-Valued Logic

Interviewers love asking this.
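
Three-valued logic can be imitated in Ruby by letting nil play the role of UNKNOWN. This is a sketch of the rules, not of PostgreSQL's implementation; sql_eq, sql_and and where are hypothetical helpers.

```ruby
AGES = [
  { name: "John", age: 30 },
  { name: "Mike", age: nil }
].freeze

# Comparing anything with NULL gives UNKNOWN (nil), never true or false.
def sql_eq(left, right)
  return nil if left.nil? || right.nil?

  left == right
end

# AND: false wins over UNKNOWN, UNKNOWN wins over true.
def sql_and(left, right)
  return false if left == false || right == false
  return nil if left.nil? || right.nil?

  true
end

# WHERE keeps a row only when the condition is true; false and UNKNOWN are dropped.
def where(rows)
  rows.select { |row| yield(row) == true }
end

puts where(AGES) { |row| sql_eq(row[:age], nil) }.size # prints 0 (age = NULL)
puts where(AGES) { |row| row[:age].nil? }.size         # prints 1 (age IS NULL)
```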

Practical Exercises

Exercise 1

Find all active users.

Exercise 2

Find users older than 30.

Exercise 3

Find users from Boston.

Exercise 4

Find top 3 highest-paid users.

Exercise 5

Find unique cities.

Exercise 6

Find users aged between 25 and 35.

Exercise 7

Find names starting with S.

Exercise 8

Find users whose salary is NULL.

Combining Everything

Example:

			
SELECT name, city, salary
FROM users
WHERE active = true
  AND city IN ('Chicago', 'Boston')
  AND salary IS NOT NULL
ORDER BY salary DESC
LIMIT 3;

		

Can you explain what this query does before running it?

That’s exactly the kind of reasoning expected in senior interviews.

ActiveRecord Translation Challenge

Convert this SQL:

			
SELECT *
FROM users
WHERE city = 'Chicago'
AND active = true
ORDER BY salary DESC
LIMIT 2;

		

into ActiveRecord.

Common Mistakes

Mistake 1

WHERE age = NULL

Wrong.

Use:

WHERE age IS NULL

Mistake 2

Using:

SELECT *

everywhere.

Mistake 3

Forgetting ORDER BY when using LIMIT.

LIMIT 5

without ordering can return arbitrary rows.

Mistake 4

Using huge OFFSET values.

Senior-Level Knowledge

Understand that SQL logically executes in this order:

			
FROM
WHERE
SELECT
DISTINCT
ORDER BY
LIMIT

		

Even though we write:

			
SELECT ...
FROM ...
WHERE ...

PostgreSQL conceptually processes the clauses in the above order.
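
The logical order can be mimicked as a chain of array operations, one per clause. The sketch below applies it to the query from "Combining Everything" using an in-memory copy of the sample users; run_query is a hypothetical helper, and the real planner is free to reorder work as long as the result is the same.

```ruby
USERS = [
  { name: "John",  city: "New York", salary: 70_000,  active: true },
  { name: "Mary",  city: "Chicago",  salary: 60_000,  active: true },
  { name: "Bob",   city: "Chicago",  salary: 90_000,  active: false },
  { name: "Alice", city: "Boston",   salary: 75_000,  active: true },
  { name: "Tom",   city: "New York", salary: 120_000, active: false },
  { name: "Sara",  city: "Boston",   salary: 85_000,  active: true },
  { name: "Mike",  city: "Chicago",  salary: nil,     active: true }
].freeze

def run_query(users)
  rows = users                                            # FROM users
  rows = rows.select do |u|                               # WHERE ...
    u[:active] && %w[Chicago Boston].include?(u[:city]) && !u[:salary].nil?
  end
  rows = rows.map { |u| u.slice(:name, :city, :salary) }  # SELECT name, city, salary
  # (DISTINCT would be applied here if the query used it)
  rows = rows.sort_by { |u| -u[:salary] }                 # ORDER BY salary DESC
  rows.first(3)                                           # LIMIT 3
end

run_query(USERS).each { |row| puts row.values.join("\t") }
```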

This understanding becomes extremely important when we move to:

JOINs
GROUP BY
HAVING
Query Optimization
EXPLAIN ANALYZE

Homework

Create a new table:

			
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  category VARCHAR(100),
  price NUMERIC(10,2),
  stock_quantity INTEGER
);

		

Insert at least 10 records.

Practice:

SELECT specific columns
WHERE with multiple conditions
ORDER BY price DESC
LIMIT 5
DISTINCT categories
BETWEEN on price
LIKE searches
Products with stock_quantity IS NULL

Day 3 Preview

Next we’ll cover one of the most important interview topics:

JOINs

Including:

INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
CROSS JOIN
Self Join
ActiveRecord joins
includes vs joins vs preload vs eager_load
Real Rails interview questions

Day 3 is where SQL starts becoming truly powerful.

Happy Learning! 🚀

Learn SQL: Day 1 – Relational Database Fundamentals

We’re going to learn these topics at three levels simultaneously:

Database Level (PostgreSQL)
SQL Level
Rails / ActiveRecord Level

For every topic, ask yourself:

“How does PostgreSQL store this?”

“How do I query this with SQL?”

“How does Rails represent this?”

This will help you to prepare for interviews. We use PostgreSQL here.

Today’s Goal

By the end of Day 1, you should fully understand:

Database
Table
Row
Column
Primary Key
Foreign Key
Constraints
One-to-One
One-to-Many
Many-to-Many
Rails associations behind them

Part 1: What is a Database?

Imagine you are building an e-commerce application.

You need to store:

Users
Products
Orders
Payments

A database is simply a structured place to store and retrieve that information.

In PostgreSQL, open psql against the default postgres database:

psql postgres

Inside psql, create the database:

CREATE DATABASE shop_app;

Connect:

\c shop_app

Verify:

SELECT current_database();

Output:

 shop_app

Part 2: What is a Table?

A table is similar to an Excel sheet.

Example:

Users table

id	name	email
1	John	john@test.com
2	Mary	mary@test.com

Create it:

			
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255)
);

		

Verify:

\d users

Interview Question:

Why not store everything in a single giant table?

Answer:

Because:

duplication increases
maintenance becomes difficult
relationships become unclear
updates become expensive

This concept is called normalization (we’ll study later).

Part 3: Rows

A row represents one record.

Insert data:

			
INSERT INTO users (name, email)
VALUES
('John', 'john@test.com'),
('Mary', 'mary@test.com');

View:

SELECT * FROM users;

Output:

 id | name | email
----+------+----------------
 1  | John | john@test.com
 2  | Mary | mary@test.com

Each row = one user.

Part 4: Columns

Columns describe attributes.

In users table:

			
id
name
email

View columns:

\d users

Senior Insight:

A database table models an entity.

Examples:

Entity	Table
User	users
Product	products
Order	orders

Columns represent attributes of that entity.

Part 5: Primary Keys

Every row needs a unique identifier.

Example:

id BIGSERIAL PRIMARY KEY

Meaning:

No duplicates.

No NULLs.

Try:

			
INSERT INTO users (id, name)
VALUES (1, 'Bob');

You should get:

duplicate key value violates unique constraint

Why Primary Keys Exist

Without a primary key:

			
John
John
John

Which John?

Nobody knows.

Primary key solves identity.

Rails Equivalent

Migration:

			
create_table :users do |t|
  t.string :name
  t.string :email
end

Rails automatically adds:

id

as the primary key.

Part 6: Constraints

Constraint = database rule.

Interviewers love this topic.

NOT NULL

Create:

			
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

Try:

			
INSERT INTO products(name)
VALUES(NULL);

Fails.

UNIQUE

			
CREATE TABLE customers (
  id BIGSERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE
);

Duplicate email:

			
INSERT INTO customers(email)
VALUES('test@test.com');
INSERT INTO customers(email)
VALUES('test@test.com');

Fails.

CHECK Constraint

Age must be positive.

			
CREATE TABLE employees (
  id BIGSERIAL PRIMARY KEY,
  age INTEGER CHECK(age > 0)
);

Fails:

			
INSERT INTO employees(age)
VALUES(-5);

Why Constraints Matter

Junior developer:

validates :email, uniqueness: true

Senior developer:

			
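# Model: gives the user a friendly error message
validates :email, uniqueness: true
# Migration: the database itself enforces UNIQUE(email)
# (customers is the table from the UNIQUE example above)
add_index :customers, :email, unique: true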

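Because application validations can be bypassed: update_column, insert_all, and raw SQL all skip them, and two concurrent requests can both pass the uniqueness check before either one inserts.

Database constraints apply to every write, whatever its source.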
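The race between two requests can be sketched in plain Ruby. FakeTable is a hypothetical in-memory stand-in for a table, not ActiveRecord or PostgreSQL; its taken? method plays the role of the uniqueness validation (a SELECT before the INSERT), and the unique_on option plays the role of a UNIQUE constraint.

```ruby
# Hypothetical in-memory stand-in for a table; not ActiveRecord.
class FakeTable
  class UniqueViolation < StandardError; end

  attr_reader :rows

  def initialize(unique_on: nil)
    @rows = []
    @unique_on = unique_on
  end

  # Application-level check, like a uniqueness validation.
  def taken?(column, value)
    @rows.any? { |row| row[column] == value }
  end

  # The insert itself. Only a table built with unique_on rejects duplicates here.
  def insert(row)
    if @unique_on && taken?(@unique_on, row[@unique_on])
      raise UniqueViolation, "duplicate value for #{@unique_on}"
    end
    @rows << row
  end
end

# Two requests validate before either one inserts, then both insert.
# Returns the number of rows that end up stored.
def race(table)
  email = "test@test.com"
  a_valid = !table.taken?(:email, email)
  b_valid = !table.taken?(:email, email)
  table.insert({ email: email }) if a_valid
  table.insert({ email: email }) if b_valid
  table.rows.size
rescue FakeTable::UniqueViolation
  table.rows.size
end

puts "validation only: #{race(FakeTable.new)} rows"
puts "with constraint: #{race(FakeTable.new(unique_on: :email))} rows"
```

Both requests pass validation because neither has inserted yet. Only the check performed at insert time stops the second row.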

Part 7: Foreign Keys

Now let’s model:

User has many orders.

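Create users (if the users table from Part 2 still exists, run DROP TABLE users; first, otherwise CREATE TABLE fails because the table already exists):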

			
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100)
);

Create orders:

			
CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT,
  total NUMERIC(10,2)
);

		

Foreign key:

			
ALTER TABLE orders
ADD CONSTRAINT fk_orders_user
FOREIGN KEY (user_id)
REFERENCES users(id);

Insert user:

			
INSERT INTO users(name)
VALUES('John');

Insert order:

			
INSERT INTO orders(user_id,total)
VALUES(1,100);

Works.

Try:

			
INSERT INTO orders(user_id,total)
VALUES(999,100);

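Fails with:

ERROR:  insert or update on table "orders" violates foreign key constraint "fk_orders_user"

because user 999 doesn’t exist.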

Why Foreign Keys Exist

Without them:

Order belongs to user 999

But user 999 doesn’t exist.

The data becomes inconsistent: the order is orphaned, and any join against users silently leaves it out.
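A rough sketch of what an orphaned row does to queries, in plain Ruby. Arrays of hashes stand in for tables, and both helper methods are hypothetical names written for this example. It assumes Ruby 2.7 or newer for filter_map.

```ruby
# Orders whose user_id matches no row in users.
def orphaned_orders(users, orders)
  user_ids = users.map { |user| user[:id] }
  orders.reject { |order| user_ids.include?(order[:user_id]) }
end

# Rough equivalent of an inner join between orders and users.
def orders_with_users(users, orders)
  orders.filter_map do |order|
    user = users.find { |candidate| candidate[:id] == order[:user_id] }
    { order_id: order[:id], user_name: user[:name] } if user
  end
end

users  = [{ id: 1, name: "John" }]
orders = [
  { id: 1, user_id: 1,   total: 100 },
  { id: 2, user_id: 999, total: 100 } # no such user
]

puts orphaned_orders(users, orders).inspect
puts orders_with_users(users, orders).inspect
```

The join returns one row even though two orders exist, so totals computed through the join no longer match totals computed from orders alone.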

Rails Equivalent

			
class User < ApplicationRecord
  has_many :orders
end
class Order < ApplicationRecord
  belongs_to :user
end

		

Migration:

			
t.references :user,
             null: false,
             foreign_key: true

Rails creates:

a user_id column (NOT NULL)
an index on user_id
a FOREIGN KEY constraint referencing users(id)

behind the scenes.

Part 8: One-to-Many Relationship

Most common relationship.

Example:

User -> Orders

One user:

John

Many orders:

			
Order 1
Order 2
Order 3

Diagram:

			
users
-----
id

orders
------
id
user_id  -> users.id

		

Rails:

			
User has_many :orders
Order belongs_to :user

Practical Exercise

Insert:

			
INSERT INTO users(name)
VALUES('Mary');

Orders:

			
INSERT INTO orders(user_id,total)
VALUES
(2,50),
(2,75),
(2,120);

		

Query:

			
SELECT *
FROM orders
WHERE user_id = 2;

Part 9: One-to-One Relationship

Less common.

Example:

			
User
Profile

Each user has at most one profile, and each profile points to one user.

Create profile table:

			
CREATE TABLE profiles (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT UNIQUE,
  bio TEXT,
  FOREIGN KEY(user_id)
  REFERENCES users(id)
);

		

Notice:

UNIQUE(user_id)

This forces:

			
One user
One profile

Rails:

			
class User < ApplicationRecord
  has_one :profile
end
class Profile < ApplicationRecord
  belongs_to :user
end

		

Interview Question:

How does a database enforce one-to-one?

Answer:

			
FOREIGN KEY
+
UNIQUE

on the foreign key column.
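The two rules can be sketched together in plain Ruby. insert_profile and ConstraintError are hypothetical names for this illustration, arrays of hashes stand in for tables, and NULL handling is ignored for simplicity.

```ruby
class ConstraintError < StandardError; end

# Insert a profile only if both rules hold:
#   FOREIGN KEY: the referenced user exists
#   UNIQUE:      no other profile already points at that user
def insert_profile(users, profiles, profile)
  unless users.any? { |user| user[:id] == profile[:user_id] }
    raise ConstraintError, "foreign key: no user #{profile[:user_id]}"
  end
  if profiles.any? { |existing| existing[:user_id] == profile[:user_id] }
    raise ConstraintError, "unique: user #{profile[:user_id]} already has a profile"
  end
  profiles << profile
end

users = [{ id: 1, name: "John" }]
profiles = []
insert_profile(users, profiles, { user_id: 1, bio: "first" })

errors = []
[{ user_id: 1, bio: "second" }, { user_id: 999, bio: "ghost" }].each do |row|
  begin
    insert_profile(users, profiles, row)
  rescue ConstraintError => e
    errors << e.message
  end
end

puts errors
puts "profiles stored: #{profiles.size}"
```

Drop the UNIQUE check and the same code models a one-to-many relationship: the foreign key alone only says that the user must exist.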

Part 10: Many-to-Many Relationship

Classic interview topic.

Example:

			
Students
Courses

A student can enroll in many courses.

A course can have many students.

Create students:

			
CREATE TABLE students (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100)
);

Create courses:

			
CREATE TABLE courses (
  id BIGSERIAL PRIMARY KEY,
  title VARCHAR(100)
);

This requires a join table:

			
CREATE TABLE enrollments (
  student_id BIGINT,
  course_id BIGINT,
  PRIMARY KEY(student_id, course_id),
  FOREIGN KEY(student_id)
    REFERENCES students(id),
  FOREIGN KEY(course_id)
    REFERENCES courses(id)
);

		

Diagram:

			
students
    |
    |
enrollments
    |
    |
courses

		

Rails

			
class Student < ApplicationRecord
  has_many :enrollments
  has_many :courses, through: :enrollments
end
class Course < ApplicationRecord
  has_many :enrollments
  has_many :students, through: :enrollments
end
class Enrollment < ApplicationRecord
  belongs_to :student
  belongs_to :course
end
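
As a rough picture of what has_many ... through: resolves, here is the same lookup in plain Ruby. Arrays of hashes stand in for the three tables; the helper names and the sample students and courses are invented for this sketch, and real ActiveRecord does this with a SQL JOIN rather than in Ruby.

```ruby
# Titles of the courses a student is enrolled in, resolved via the join rows.
def course_titles_for(student_id, enrollments, courses)
  course_ids = enrollments
    .select { |enrollment| enrollment[:student_id] == student_id }
    .map { |enrollment| enrollment[:course_id] }
  courses.select { |course| course_ids.include?(course[:id]) }
         .map { |course| course[:title] }
end

# Names of the students in a course: the same walk in the other direction.
def student_names_for(course_id, enrollments, students)
  student_ids = enrollments
    .select { |enrollment| enrollment[:course_id] == course_id }
    .map { |enrollment| enrollment[:student_id] }
  students.select { |student| student_ids.include?(student[:id]) }
          .map { |student| student[:name] }
end

students = [{ id: 1, name: "Ann" }, { id: 2, name: "Bob" }]
courses  = [{ id: 1, title: "SQL" }, { id: 2, title: "Rails" }]
enrollments = [
  { student_id: 1, course_id: 1 },
  { student_id: 1, course_id: 2 },
  { student_id: 2, course_id: 1 }
]

puts course_titles_for(1, enrollments, courses).inspect
puts student_names_for(1, enrollments, students).inspect
```

Neither students nor courses stores anything about the other. Every link lives in enrollments, which is why one table can serve both directions.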

		

Senior-Level Insight

Most Rails developers stop at:

			
has_many
belongs_to

Strong backend engineers understand:

			
Association
      ↓
Foreign Key
      ↓
Constraint
      ↓
Index
      ↓
Storage

		

That understanding helps you:

debug production issues
optimize queries
design schemas
answer interview questions confidently
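
The "Index" layer of that chain can be pictured with a small plain-Ruby sketch. A Hash keyed by user_id is used here only to convey the idea of jumping straight to matching rows; PostgreSQL's default index type is a B-tree, not a hash, and the helper names below are hypothetical.

```ruby
# Full scan: look at every order to find one user's rows.
def scan_for_user(orders, user_id)
  orders.select { |order| order[:user_id] == user_id }
end

# Build the lookup structure once: user_id => rows for that user.
def build_user_index(orders)
  orders.group_by { |order| order[:user_id] }
end

# Indexed lookup: jump straight to the matching rows.
def lookup_user(index, user_id)
  index.fetch(user_id, [])
end

orders = [
  { id: 1, user_id: 1, total: 100 },
  { id: 2, user_id: 2, total: 50 },
  { id: 3, user_id: 2, total: 75 },
  { id: 4, user_id: 2, total: 120 }
]

index = build_user_index(orders)
puts scan_for_user(orders, 2) == lookup_user(index, 2)
puts lookup_user(index, 2).map { |order| order[:id] }.inspect
```

Both approaches return the same rows. The difference is that the scan touches every order on every query, while the index is built once and then consulted.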

Interview Questions

Try answering without looking.

Q1

Difference between:

PRIMARY KEY

and

UNIQUE

Q2

Can a table have multiple UNIQUE constraints?

Q3

Can a table have multiple PRIMARY KEYS?

Q4

How is a one-to-one relationship implemented in PostgreSQL?

Q5

Why should foreign keys exist even when Rails validations exist?

Q6

What problem does a join table solve?

Practical Lab (Run Everything)

Create a fresh database:

CREATE DATABASE interview_sql_day1;

Connect:

\c interview_sql_day1

Create:

			
users
profiles
orders
students
courses
enrollments

		

Insert sample data.

Then practice:

			
SELECT * FROM users;
SELECT * FROM orders;
SELECT * FROM profiles;
SELECT * FROM enrollments;

Try intentionally violating:

PRIMARY KEY
UNIQUE
NOT NULL
FOREIGN KEY

and observe PostgreSQL’s error messages.

A senior engineer learns a lot from database errors.

Homework

Exercise 1

Create:

			
authors
books

One author → many books

Add proper foreign keys.

Exercise 2

Create:

			
employees
employee_details

One-to-one relationship.

Exercise 3

Create:

			
movies
actors
movie_actors

Many-to-many relationship.

Insert:

3 movies
5 actors

Create relationships.

Exercise 4

For every relationship above, write the equivalent Rails models and associations.

Day 2 Preview

Next we’ll cover the foundation of everything in SQL:

SELECT Queries

Including:

SELECT
WHERE
ORDER BY
LIMIT
OFFSET
DISTINCT
IN
BETWEEN
LIKE
ILIKE
NULL handling

plus PostgreSQL execution behavior and ActiveRecord equivalents.

This is where real querying begins.

Happy Learning! 🚀