Rails 8 App: Setup Test DB | Comprehensive Guide 📖 for PostgreSQL , Mysql Indexing – PostgreSQL Heap ⛰ vs Mysql InnoDB B-Tree 🌿

Enter into psql terminal:

✗ psql postgres
psql (14.17 (Homebrew))
Type "help" for help.

postgres=# \l
                                     List of databases
           Name            |  Owner   | Encoding | Collate | Ctype |   Access privileges
---------------------------+----------+----------+---------+-------+-----------------------
 studio_development | postgres | UTF8     | C       | C     |

Create a new test database
Create a users Table
Check the db and table details

postgres=# create database test_db;
CREATE DATABASE

test_db=# CREATE TABLE users (
user_id INT,
username VARCHAR(220),
email VARCHAR(150),
phone_number VARCHAR(20)
);
CREATE TABLE

test_db=# \dt
List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | users | table | abhilash
(1 row)

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           |          |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |

Add a Primary key to users and check the user table.

test_db=# ALTER TABLE users ADD PRIMARY KEY (user_id);
ALTER TABLE

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           | not null |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (user_id)

# OR add primary key when creating the table:
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

You can a unique constraint and an index added when adding a primary key.

Why does adding a primary key also add an index?

A primary key must guarantee that each value is unique and fast to find.
Without an index, the database would have to scan the whole table every time you look up a primary key, which would be very slow.
So PostgreSQL automatically creates a unique index on the primary key to make lookups efficient and to enforce uniqueness at the database level.

👉 It needs the index for speed and to enforce the “no duplicates” rule of primary keys.

What is btree?

btree stands for Balanced Tree (specifically, a “B-tree” data structure).
It’s the default index type in PostgreSQL.
B-tree indexes organize the data in a tree structure, so that searches, inserts, updates, and deletes are all very efficient — about O(log n) time.
It’s great for looking up exact matches (like WHERE user_id = 123) or range queries (like WHERE user_id BETWEEN 100 AND 200).

👉 So when you see btree, it just means it’s using a very efficient tree structure for your primary key index.

Summary in one line:
Adding a primary key automatically adds a btree index to enforce uniqueness and make lookups super fast.

In MySQL (specifically InnoDB engine, which is default now):

Primary keys always create an index automatically.
The index is a clustered index — this is different from Postgres!
The index uses a B-tree structure too, just like Postgres.

👉 So yes, MySQL also adds an index and uses a B-tree under the hood for primary keys.

But here’s a big difference:

In InnoDB, the table data itself is stored inside the primary key’s B-tree.
- That’s called a clustered index.
- It means the physical storage of the table rows follows the order of the primary key.
In PostgreSQL, the index and the table are stored separately (non-clustered by default).

Example: If you have a table like this in MySQL:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150)
);

user_id will have a B-tree clustered index.
The rows themselves will be stored sorted by user_id.

Short version:

Database	Primary Key Behavior	B-tree?	Clustered?
PostgreSQL	Separate index created for PK	Yes	No (separate by default)
MySQL (InnoDB)	PK index + Table rows stored inside the PK’s B-tree	Yes	Yes (always clustered)

Why Indexing on Unique Columns (like `email`) Improves Lookup 🔍

Use Case

You frequently run queries like:

SELECT * FROM students WHERE email = 'john@example.com';

Without an index, this results in a full table scan — checking each row one-by-one.

With an index, the database can jump directly to the row using a sorted structure, significantly reducing lookup time — especially in large tables.

🌲 How SQL Stores Indexes Internally (PostgreSQL)

📚 PostgreSQL uses B-Tree indexes by default.

When you run:

CREATE UNIQUE INDEX idx_students_on_email ON students(email);

PostgreSQL creates a balanced B-tree like this:

          m@example.com
         /              \
  d@example.com     t@example.com
  /        \           /         \
...      ...        ...         ...

✅ Keys (email values) are sorted lexicographically.
✅ Each leaf node contains a pointer to the actual row in the students table (called a tuple pointer or TID).
✅ Lookup uses binary search, giving O(log n) performance.

⚙️ Unique Index = Even Faster

Because all email values are unique, the database:

Can stop searching immediately once a match is found.
Doesn’t need to scan multiple leaf entries (no duplicates).

🧠 Summary

Feature	Value
Index Type	B-tree (default in PostgreSQL)
Lookup Time	O(log n) vs O(n) without index
Optimized for	Equality search (`WHERE email = ...`), sorting, joins
Email is unique?	✅ Yes – index helps even more (no need to check multiple rows)
Table scan avoided?	✅ Yes – PostgreSQL jumps directly via B-tree lookup

What Exactly is a Clustered Index in MySQL (InnoDB)?

🔹 In MySQL InnoDB, the primary key IS the table.

🔹 A Clustered Index means:

The table’s data rows are physically organized in the order of the primary key.
No separate storage for the table – it’s merged into the primary key’s B-tree structure.

In simple words:
👉 “The table itself lives inside the primary key B-tree.”

That’s why:

Every secondary index must store the primary key value (not a row pointer).
InnoDB can only have one clustered index (because you can’t physically order a table in two different ways).

📈 Visual for MySQL Clustered Index

Suppose you have:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(255),
  email VARCHAR(255)
);

The storage looks like:

B-tree by user_id (Clustered)

user_id  | username | email
----------------------------
101      | Alice    | a@x.com
102      | Bob      | b@x.com
103      | Carol    | c@x.com

👉 Table rows stored directly inside the B-tree nodes by user_id!

🔵 PostgreSQL (Primary Key Index = Separate)

Imagine you have a users table:

users table (physical table):

row_id | user_id | username | email
-------------------------------------
  1    |   101   | Alice    | a@example.com
  2    |   102   | Bob      | b@example.com
  3    |   103   | Carol    | c@example.com

And the Primary Key Index looks like:

Primary Key B-Tree (separate structure):

user_id -> row pointer
 101    -> row_id 1
 102    -> row_id 2
 103    -> row_id 3

👉 When you query WHERE user_id = 102, PostgreSQL goes:

Find user_id 102 in the B-tree index,
Then jump to row_id 2 in the actual table.

🔸 Index and Table are separate.
🔸 Extra step: index lookup ➔ then fetch row.

🟠 MySQL InnoDB (Primary Key Index = Clustered)

Same users table, but stored like this:

Primary Key Clustered B-Tree (index + data together):

user_id | username | email
---------------------------------
  101   | Alice    | a@example.com
  102   | Bob      | b@example.com
  103   | Carol    | c@example.com

👉 When you query WHERE user_id = 102, MySQL:

Goes straight to user_id 102 in the B-tree,
Data is already there, no extra lookup.

🔸 Index and Table are merged.
🔸 One step: direct access!

📈 Quick Visual:

PostgreSQL
(Index)    ➔    (Table Row)
    |
    ➔ extra lookup needed

MySQL InnoDB
(Index + Row Together)
    |
    ➔ data found immediately

Summary:

PostgreSQL: primary key index is separate ➔ needs 2 steps (index ➔ table).
MySQL InnoDB: primary key index is clustered ➔ 1 step (index = table).

📚 How Secondary Indexes Work

Secondary Index = an index on a column that is not the primary key.

Example:

CREATE INDEX idx_username ON users(username);

Now you have an index on username.

🔵 PostgreSQL Secondary Index Behavior

Secondary indexes are separate structures from the table (just like the primary key index).
When you query by username, PostgreSQL:
1. Finds the matching row_id using the secondary B-tree index.
2. Then fetches the full row from the table by row_id.
This is called an Index Scan + Heap Fetch.

📜 Example:

Secondary Index (username -> row_id):

username -> row_id
------------------
Alice    -> 1
Bob      -> 2
Carol    -> 3

(users table is separate)

👉 Flexible, but needs 2 steps: index (row_id) ➔ table.

🟠 MySQL InnoDB Secondary Index Behavior

In InnoDB, secondary indexes don’t store row pointers.
Instead, they store the primary key value!

So:

Find the matching primary key using the secondary index.
Use the primary key to find the actual row inside the clustered primary key B-tree.

📜 Example:

Secondary Index (username -> user_id):

username -> user_id
--------------------
Alice    -> 101
Bob      -> 102
Carol    -> 103

(Then find user_id inside Clustered B-Tree)

✅ Needs 2 steps too: secondary index (primary key) ➔ clustered table.

📈 Quick Visual:

Feature	PostgreSQL	MySQL InnoDB
Secondary Index	username ➔ row pointer (row_id)	username ➔ primary key (user_id)
Fetch Full Row	Use row_id to get table row	Use primary key to find row in clustered index
Steps to Fetch	Index ➔ Table	Index ➔ Primary Key ➔ Table (clustered)

Action	PostgreSQL	MySQL InnoDB
Primary Key Lookup	Index ➔ Row (2 steps)	Clustered Index (1 step)
Secondary Index Lookup	Index (row_id) ➔ Row (2 steps)	Secondary Index (PK) ➔ Row (2 steps)
Storage Model	Separate index and table	Primary key and table merged (clustered)

🌐 Now, let’s do some Real SQL Query ⛁ Examples!

1. **Simple `SELECT * FROM users WHERE user_id = 102;`**

PostgreSQL:
Look into PK btree ➔ find row pointer ➔ fetch row separately.
MySQL InnoDB:
Directly find the row inside the PK B-tree (no extra lookup).

✅ MySQL is a little faster here because it needs only 1 step!

2. `SELECT username FROM users WHERE user_id = 102;` (Only 1 Column)

PostgreSQL:
Might do an Index Only Scan if all needed data is in the index (very fast).
MySQL:
Clustered index contains all columns already, no special optimization needed.

✅ Both can be very fast, but PostgreSQL shines if the index is “covering” (i.e., contains all needed columns). Because index table has less size than clustered index of mysql.

3. **`SELECT * FROM users WHERE username = 'Bob';` (Secondary Index Search)**

PostgreSQL:
Secondary index on username ➔ row pointer ➔ fetch table row.
MySQL:
Secondary index on username ➔ get primary key ➔ clustered index lookup ➔ fetch data.

✅ Both are 2 steps, but MySQL needs 2 different B-trees: secondary ➔ primary clustered.

Consider the below situation:

SELECT username FROM users WHERE user_id = 102;

user_id is the Primary Key.
You only want username, not full row.

Now:

🔵 PostgreSQL Behavior

👉 In PostgreSQL, by default:

It uses the primary key btree to find the row pointer.
Then fetches the full row from the table (heap fetch).

👉 But PostgreSQL has an optimization called Index-Only Scan.

If all requested columns are already present in the index,
And if the table visibility map says the row is still valid (no deleted/updated row needing visibility check),
Then Postgres does not fetch the heap.

👉 So in this case:

If the primary key index also stores username internally (or if an extra index is created covering username), Postgres can satisfy the query just from the index.

✅ Result: No table lookup needed ➔ Very fast (almost as fast as InnoDB clustered lookup).

📢 Postgres primary key indexes usually don’t store extra columns, unless you specifically create an index that includes them (INCLUDE (username) syntax in modern Postgres 11+).

🟠 MySQL InnoDB Behavior

In InnoDB:
Since the primary key B-tree already holds all columns (user_id, username, email),
It directly finds the row from the clustered index.
So when you query by PK, even if you only need one column, it has everything inside the same page/block.

✅ One fast lookup.

🔥 Why sometimes Postgres can still be faster?

If PostgreSQL uses Index-Only Scan, and the page is already cached, and no extra visibility check is needed,
Then Postgres may avoid touching the table at all and only scan the tiny index pages.
In this case, for very narrow queries (e.g., only 1 small field), Postgres can outperform even MySQL clustered fetch.

💡 Because fetching from a small index page (~8KB) is faster than reading bigger table pages.

🎯 Conclusion:

✅ MySQL clustered index is always fast for PK lookups.
✅ PostgreSQL can be even faster for small/narrow queries if Index-Only Scan is triggered.

👉 Quick Tip:

In PostgreSQL, you can force an index to include extra columns by using: CREATE INDEX idx_user_id_username ON users(user_id) INCLUDE (username); Then index-only scans become more common and predictable! 🚀

Isn’t PostgreSQL also doing 2 B-tree scans? One for secondary index and one for table (row_id)?

When you query with a secondary index, like:

SELECT * FROM users WHERE username = 'Bob';

In MySQL InnoDB, I said:
1. Find in secondary index (username ➔ user_id)
2. Then go to primary clustered index (user_id ➔ full row)

Let’s look at PostgreSQL first:

♦️ Step 1: Search Secondary Index B-tree on username.

It finds the matching TID (tuple ID) or row pointer.
- TID is a pair (block_number, row_offset).
- Not a B-tree! Just a physical pointer.

♦️ Step 2: Use the TID to directly jump into the heap (the table).

The heap (table) is not a B-tree — it’s just a collection of unordered pages (blocks of rows).
PostgreSQL goes directly to the block and offset — like jumping straight into a file.

🔔 Important:

Secondary index ➔ TID ➔ heap fetch.
No second B-tree traversal for the table!

🟠 Meanwhile in MySQL InnoDB:

♦️ Step 1: Search Secondary Index B-tree on username.

It finds the Primary Key value (user_id).

♦️ Step 2: Now, search the Primary Key Clustered B-tree to find the full row.

Need another B-tree traversal based on user_id.

🔔 Important:

Secondary index ➔ Primary Key B-tree ➔ data fetch.
Two full B-tree traversals!

Real-world Summary:

♦️ PostgreSQL

Secondary index gives a direct shortcut to the heap.
One B-tree scan (secondary) ➔ Direct heap fetch.

♦️ MySQL

Secondary index gives PK.
Then another B-tree scan (primary clustered) to find full row.

✅ PostgreSQL does not scan a second B-tree when fetching from the table — just a direct page lookup using TID.

✅ MySQL does scan a second B-tree (primary clustered index) when fetching full row after secondary lookup.

Is heap fetch a searching technique? Why is it faster than B-tree?

📚 Let’s start from the basics:

When PostgreSQL finds a match in a secondary index, what it gets is a TID.

♦️ A TID (Tuple ID) is a physical address made of:

Block Number (page number)
Offset Number (row slot inside the page)

Example:

TID = (block_number = 1583, offset = 7)

🔵 How PostgreSQL uses TID?

It directly calculates the location of the block (disk page) using block_number.
It reads that block (if not already in memory).
Inside that block, it finds the row at offset 7.

♦️ No search, no btree, no extra traversal — just:

Find the page (via simple number addressing)
Find the row slot

📈 Visual Example

Secondary index (username ➔ TID):

username	TID
Alice	(1583, 7)
Bob	(1592, 3)
Carol	(1601, 12)

♦️ When you search for “Bob”:

Find (1592, 3) from secondary index B-tree.
Jump directly to Block 1592, Offset 3.
Done ✅!

Answer:

Heap fetch is NOT a search.
It’s a direct address lookup (fixed number).
Heap = unordered collection of pages.
Pages = fixed-size blocks (usually 8 KB each).
TID gives an exact GPS location inside heap — no searching required.

That’s why heap fetch is faster than another B-tree search:

No binary search, no B-tree traversal needed.
Only a simple disk/memory read + row offset jump.

🌿 B-tree vs 📁 Heap Fetch

Action	B-tree	Heap Fetch
What it does	Binary search inside sorted tree nodes	Direct jump to block and slot
Steps needed	Traverse nodes (root ➔ internal ➔ leaf)	Directly read page and slot
Time complexity	O(log n)	O(1)
Speed	Slower (needs comparisons)	Very fast (direct)

🎯 Final and short answer:

♦️ In PostgreSQL, after finding the TID in the secondary index, the heap fetch is a direct, constant-time (O(1)) access — no B-tree needed!
♦️ This is faster than scanning another B-tree like in MySQL InnoDB.

🧩 Our exact question:

When we say:

Jump directly to Block 1592, Offset 3.

We are thinking:

There are thousands of blocks.
How can we directly jump to block 1592?
Shouldn’t that be O(n) (linear time)?
Shouldn’t there be some traversal?

🔵 Here’s the real truth:

No traversal needed.
No O(n) work.
Accessing Block 1592 is O(1) — constant time.

📚 Why?

Because of how files, pages, and memory work inside a database.

When PostgreSQL stores a table (the “heap”), it saves it in a file on disk.
The file is just a long array of fixed-size pages.

Each page = 8KB (default in Postgres).
Each block = 1 page = fixed 8KB chunk.
Block 0 is the first 8KB.
Block 1 is next 8KB.
Block 2 is next 8KB.
…
Block 1592 = (1592 × 8 KB) offset from the beginning.

✅ So block 1592 is simply located at 1592 × 8192 bytes offset from the start of the file.

✅ Operating systems (and PostgreSQL’s Buffer Manager) know exactly how to seek to that byte position without reading everything before it.

📈 Diagram (imagine the table file):

+-----------+-----------+-----------+-----------+-----------+------+
| Block 0   | Block 1   | Block 2   | Block 3   | Block 4   |  ... |
+-----------+-----------+-----------+-----------+-----------+------+
  (8KB)       (8KB)       (8KB)       (8KB)       (8KB)

Finding Block 1592 ➔
Seek directly to offset 1592 * 8192 bytes ➔
Read 8KB ➔
Find row at Offset 3 inside it.

🤔 What happens technically?

If in memory (shared buffers / page cache):

PostgreSQL checks its buffer pool (shared memory).
“Do I already have block 1592 cached?”
- ✅ Yes: immediately access memory address.
- ❌ No: Load block 1592 from disk into memory.

If from disk (rare if cached):

File systems (ext4, xfs, etc) know how to seek to a byte offset in a file without reading previous parts.
Seek to (block_number × 8192) bytes.
Read exactly 8KB into memory.
No need to scan the whole file linearly.

📊 Final Step: Inside the Block

Once the block is loaded:

The block internally is structured like an array of tuples.
Each tuple is placed into an offset slot.
Offset 3 ➔ third tuple inside the block.

♦️ Again, this is just array lookup — no traversal, no O(n).

⚡ So to summarize:

Question	Answer
How does PostgreSQL jump directly to block?	Using the block number × page size calculation (fixed offset math).
Is it O(n)?	❌ No, it’s O(1) constant time
Is there any traversal?	❌ No traversal. Just a seek + memory read.
How fast?	Extremely fast if cached, still fast if disk seeks.

🔥 Key concept:

PostgreSQL heap access is O(1) because the heap file is a flat sequence of fixed-size pages, and the TID gives exact coordinates.

🎯 Simple Real World Example:

Imagine you have a giant book (the table file).
Each page of the book is numbered (block number).

If someone says:

👉 “Go to page 1592.”

♦️ You don’t need to read pages 1 to 1591 first.
♦️ You just flip directly to page 1592.

📗 Same idea: no linear traversal, just positional lookup.

🧠 Deep thought:

Because blocks are fixed size and TID is known,
heap fetch is almost as fast as reading a small array.

(Actually faster than searching B-tree because B-tree needs multiple comparisons at each node.)

Enjoy SQL! 🚀

Setup 🛠 Rails 8 App – Part 13: Composite keys & Candidate keys in Rails DB

🔑 What Is a Composite Key?

A composite key is a primary key made up of two or more columns that together uniquely identify a row in a table.

Use a composite key when no single column is unique on its own, but the combination is.

👉 Example: Composite Key in Action

Let’s say we’re building a table to track which students are enrolled in which courses.

Without Composite Key:

-- This table might allow duplicates
CREATE TABLE Enrollments (
  student_id INT,
  course_id INT
);

Nothing stops the same student from enrolling in the same course multiple times!

With Composite Key:

CREATE TABLE Enrollments (
  student_id INT,
  course_id INT,
  PRIMARY KEY (student_id, course_id)
);

Now:

student_id alone is not unique
course_id alone is not unique
But together → each (student_id, course_id) pair is unique

📌 Why Use Composite Keys?

When to Use	Why
Tracking many-to-many relationships	Ensures unique pairs
Bridging/junction tables	e.g., students-courses, authors-books
No natural single-column key	But the combination is unique

⚠️ Things to Keep in Mind

Composite keys enforce uniqueness across multiple columns.
They can also be used as foreign keys in other tables.
Some developers prefer to add an auto-increment id as the primary key instead—but that’s a design choice.

🔎 What Is a Candidate Key?

A candidate key is any column (or combination of columns) in a table that can uniquely identify each row.

Every table can have multiple candidate keys
One of them is chosen to be the primary key
The rest are called alternate keys

🔑 Think of candidate keys as “potential primary keys”

👉 Example: Users Table

CREATE TABLE Users (
  user_id INT,
  username VARCHAR(80),
  email VARCHAR(150),
  phone_number VARCHAR(30)
);

Let’s have some hands own experience in SQL queries by creating a TEST DB. Check https://railsdrop.com/2025/04/25/rails-8-app-part-13-2-test-sql-queries/

Assume:

user_id is unique
username is unique
email is unique

Candidate Keys:

user_id
username
email

You can choose any one of them as the primary key, depending on your design needs.

-- Choosing user_id as the primary key
PRIMARY KEY (user_id)

The rest (username, email) are alternate keys.

📌 Characteristics of Candidate Keys

Property	Description
Uniqueness	Must uniquely identify each row
Non-null	Cannot contain NULL values
Minimality	Must be the smallest set of columns that uniquely identifies a row (no extra columns)
No duplicates	No two rows have the same value(s)

👥 Candidate Key vs Composite Key

Concept	Explanation
Candidate Key	Any unique identifier (single or multiple columns)
Composite Key	A candidate key that uses multiple columns

So: All composite keys are candidate keys, but not all candidate keys are composite.

💡 When Designing a Database

Find all possible candidate keys
Choose one as the primary key
(Optional) Define other candidate keys as unique constraints

CREATE TABLE Users (
  user_id INT PRIMARY KEY,
  username VARCHAR UNIQUE,
  email VARCHAR UNIQUE
);

Let’s walk through a real-world example using a schema we are already working on: a shopping app that sells clothing for women, men, kids, and infants.

We’ll look at how candidate keys apply to real tables like Users, Products, Orders, etc.

🛍️ Example Schema: Shopping App

1. Users Table

CREATE TABLE Users (
  user_id SERIAL PRIMARY KEY,
  email VARCHAR UNIQUE,
  username VARCHAR UNIQUE,
  phone_number VARCHAR
);

Candidate Keys:

user_id ✅
email ✅
username ✅

We chose user_id as the primary key, but both email and username could also uniquely identify a user — so they’re candidate keys.

2. Products Table

CREATE TABLE Products (
  product_id SERIAL PRIMARY KEY,
  sku VARCHAR UNIQUE,
  name VARCHAR,
  category VARCHAR
);

Candidate Keys:

product_id ✅
sku ✅ (Stock Keeping Unit – a unique identifier for each product)

sku is a candidate key. We use product_id as the primary key, but you could use sku if you wanted a natural key instead.

3. Orders Table

CREATE TABLE Orders (
  order_id SERIAL PRIMARY KEY,
  user_id INT REFERENCES Users(user_id),
  order_number VARCHAR UNIQUE,
  created_at TIMESTAMP
);

Candidate Keys:

order_id ✅
order_number ✅

You might use order_number (e.g., "ORD-20250417-0012") for external reference and order_id internally. Both are unique identifiers = candidate keys.

4. OrderItems Table (Join Table)

This table links orders to the specific products and quantities purchased.

CREATE TABLE OrderItems (
  order_id INT,
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id) REFERENCES Orders(order_id),
  FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

Candidate Key:

Composite key: (order_id, product_id) ✅

Here, a combination of order_id and product_id uniquely identifies a row — i.e., what product was ordered in which order — making it a composite candidate key, and we’ve selected it as the primary key.

👀 Summary of Candidate Keys by Table

Table	Candidate Keys	Primary Key Used
Users	`user_id`, `email`, `username`	`user_id`
Products	`product_id`, `sku`	`product_id`
Orders	`order_id`, `order_number`	`order_id`
OrderItems	`(order_id, product_id)`	`(order_id, product_id)`

Let’s explore how to implement candidate keys in both SQL and Rails (Active Record). Since we are working on a shopping app in Rails 8, I’ll show how to enforce uniqueness and data integrity in both layers:

🔹 1. Candidate Keys in SQL (PostgreSQL Example)

Let’s take the Users table with multiple candidate keys (email, username, and user_id).

CREATE TABLE users (
  user_id SERIAL PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE,
  username VARCHAR(100) NOT NULL UNIQUE,
  phone_number VARCHAR(20)
);

user_id: chosen as the primary key
email and username: candidate keys, enforced via UNIQUE constraints

💎 Composite Key Example (OrderItems)

CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  quantity INT NOT NULL,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id) REFERENCES orders(order_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);

This sets (order_id, product_id) as a composite candidate key and primary key.

🔸 2. Candidate Keys in Rails (ActiveRecord)

Now let’s do the same with Rails models + migrations + validations.

✅ `users` Migration (with candidate keys)

# db/migrate/xxxxxx_create_users.rb
class CreateUsers < ActiveRecord::Migration[8.0]
  def change
    create_table :users do |t|
      t.string :email, null: false
      t.string :username, null: false
      t.string :phone_number

      t.timestamps
    end

    add_index :users, :email, unique: true
    add_index :users, :username, unique: true
  end
end

✅ `User` Model

class User < ApplicationRecord
  validates :email, presence: true, uniqueness: true
  validates :username, presence: true, uniqueness: true
end

✅ These are candidate keys — email and username could be primary keys, but we are using id instead.

✅ Composite Key with `OrderItem` (Join Table)

ActiveRecord doesn’t support composite primary keys natively, but you can enforce uniqueness via a multi-column index:

Migration:

class CreateOrderItems < ActiveRecord::Migration[8.0]
  def change
    create_table :order_items, id: false do |t|
      t.references :order, null: false, foreign_key: true
      t.references :product, null: false, foreign_key: true
      t.integer :quantity, null: false

      t.timestamps
    end

    add_index :order_items, [:order_id, :product_id], unique: true
  end
end

Model:

class OrderItem < ApplicationRecord
  belongs_to :order
  belongs_to :product

  validates :quantity, presence: true
  validates :order_id, uniqueness: { scope: :product_id }
end

🎯 This simulates a composite key behavior: each product can only appear once per order.

➕ Extra: Use `composite_primary_keys` Gem (Optional)

If you really need true composite primary keys, use:

gem 'composite_primary_keys'

But it’s best to avoid unless your use case demands it — most Rails apps use a surrogate key (id) for simplicity.

to be continued.. 🚀

Setup 🛠 Rails 8 App – Part 12: Modify Product Schema – Apply Normalization

Right now we have following fields in Product Table:

create_table "products", force: :cascade do |t|
    t.string "title", null: false
    t.text "description"
    t.string "category"
    t.string "color"
    t.string "size", limit: 10
    t.decimal "mrp", precision: 7, scale: 2
    t.decimal "discount", precision: 7, scale: 2
    t.decimal "rating", precision: 2, scale: 1
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end

If you examine the above table, there will be repetitive product items if consider for a size of an item there comes so many colours. We need to create different product rows for each size different colours.

So Let’s split the table into two.

1. Product Table

class CreateProducts < ActiveRecord::Migration[8.0]
  def change
    def change
      create_table :products do |t|
        t.string  :name
        t.text    :description
        t.string  :category   # women, men, kids, infants
        t.decimal :rating, precision: 2, scale: 1, default: 0.0

        t.timestamps
      end

      add_index :products, :category
    end
  end
end

2. Product Variant Table

class CreateProductVariants < ActiveRecord::Migration[8.0]
  def change
    create_table :product_variants do |t|
      t.references :product, null: false, foreign_key: true
      t.string  :sku, null: false
      t.decimal :price, precision: 10, scale: 2
      t.string  :size
      t.string  :color
      t.integer :stock_quantity, default: 0
      t.jsonb   :specs, default: {}, null: false

      t.timestamps
    end

    # GIN index for fast JSONB attribute searching
    add_index :product_variants, :specs, using: :gin
    add_index :product_variants, [ :product_id, :size, :color ], unique: true
    add_index :product_variants, :sku, unique: true
  end
end

Data normalization is a core concept in database design that helps organize data efficiently, eliminate redundancy, and ensure data integrity.

🔍 What Is Data Normalization?

Normalization is the process of structuring a relational database in a way that:

Reduces data redundancy (no repeated data)
Prevents anomalies in insert, update, or delete operations
Improves data integrity

It breaks down large, complex tables into smaller, related tables and defines relationships using foreign keys.

🧐 Why Normalize?

Problem Without Normalization	How Normalization Helps
Duplicate data everywhere	Moves repeated data into separate tables
Inconsistent values	Enforces rules and relationships
Hard to update data	Isolates each concept so it’s updated once
Wasted storage	Reduces data repetition

📚 Normal Forms (NF)

Each Normal Form (NF) represents a level of database normalization. The most common are:

🔸 1NF – First Normal Form

Eliminate repeating groups
Ensure each column has atomic (indivisible) values

Bad Example:

CREATE TABLE Orders (
  order_id INT,
  customer_name VARCHAR,
  items TEXT  -- "Shirt, Pants, Hat"
);

Fixed (1NF):

CREATE TABLE OrderItems (
  order_id INT,
  item_name VARCHAR
);

🔸 2NF – Second Normal Form

Must be in 1NF
Remove partial dependencies (when a non-key column depends only on part of a composite key)

Example: If a table has a composite key (student_id, course_id), and student_name depends only on student_id, it should go into a separate table.

🔸 3NF – Third Normal Form

Must be in 2NF
Remove transitive dependencies (non-key columns depending on other non-key columns)

Example:

CREATE TABLE Employees (
  emp_id INT PRIMARY KEY,
  emp_name VARCHAR,
  dept_name VARCHAR,
  dept_location VARCHAR
);

Here, dept_location depends on dept_name, not emp_id — so split it:

Normalized:

CREATE TABLE Departments (
  dept_name VARCHAR PRIMARY KEY,
  dept_location VARCHAR
);

💡 Real-World Example

Let’s say you have this table:

Orders(order_id, customer_name, customer_email, product1, product2, product3)

Problems:

Repeating columns (product1, product2, …)
Redundant customer data in each order

Normalized Version:

Customers(customer_id, name, email)
Orders(order_id, customer_id)
OrderItems(order_id, product_id)
Products(product_id, name, price)

⚖️ Normalization vs. Denormalization

✅ Normalization = Good for consistency, long-term maintenance
⚠️ Denormalization = Good for performance in read-heavy systems (like reporting dashboards)

Use normalization as a default practice, then selectively denormalize if performance requires it.

Delete button example (Rails 7+)

<%= link_to "Delete Product",
              @product,
              data: { turbo_method: :delete, turbo_confirm: "Are you sure you want to delete this product?" },
              class: "inline-block px-4 py-2 bg-red-100 text-red-600 border border-red-300 rounded-md hover:bg-red-600 hover:text-white font-semibold transition duration-300 transform hover:scale-105" %>

💡 What’s Improved:

data: { turbo_confirm: ... } ensures compatibility with Turbo (Rails 7+).
Better button-like appearance (bg, px, py, rounded, etc.).
Hover effects and transitions for a smooth UI experience.

Add Brand to products table

Let’s add brand column to the product table:

✗ rails g migration add_brand_to_products brand:string:
index

class AddBrandToProducts < ActiveRecord::Migration[8.0]
  def change
    # Add 'brand' column
    add_column :products, :brand, :string

    # Add index for brand
    add_index :products, :brand
  end
end

❗️Important Note:

❌ PostgreSQL does not support BEFORE or AFTER when adding a column.

Caused by:
PG::SyntaxError: ERROR:  syntax error at or near "BEFORE" (PG::SyntaxError)
LINE 1: ...LTER TABLE products ADD COLUMN brand VARCHAR(255) BEFORE des...

PostgreSQL (default in Rails) does not support column order (they’re always returned in the order they were created).
If you’re using MySQL, you could use raw SQL for positioning as shown below.

If I USE MySQL, I would like to see the brand name as first column of the table products. You can do that by changing the migration to:

class AddBrandToProducts < ActiveRecord::Migration[8.0]
  def up
    execute "ALTER TABLE products ADD COLUMN brand VARCHAR(255) BEFORE description;"
    add_index :products, :brand
  end

  def down
    remove_index :products, :brand
    remove_column :products, :brand
  end
end

Reverting Previous Migrations

You can use Active Record’s ability to rollback migrations using the revert method:

require_relative "20121212123456_example_migration"

class FixupExampleMigration < ActiveRecord::Migration[8.0]
  def change
    revert ExampleMigration

    create_table(:apples) do |t|
      t.string :variety
    end
  end
end

The revert method also accepts a block of instructions to reverse. This could be useful to revert selected parts of previous migrations.

Reference: https://guides.rubyonrails.org/active_record_migrations.html#reverting-previous-migrations

Product Index Page after applying NF:

Product Show Page after applying NF:

New Product Page after applying NF:

to be continued.. 🚀

Setup 🛠 Rails 8 App – Part 11: Convert 🔄 Rails App from SQLite to PostgreSQL

If you’ve already built a Rails 8 app using the default SQLite setup and now want to switch to PostgreSQL, here’s a clean step-by-step guide to make the transition smooth:

1.🔧 Setup PostgreSQL in macOS

🔷 Step 1: Install PostgreSQL via Homebrew

Run the following:

brew install postgresql

This created a default database cluster for me, check the output. So you can skip the Step 3.

==> Summary
🍺  /opt/homebrew/Cellar/postgresql@14/14.17_1: 3,330 files, 45.9MB

==> Running `brew cleanup postgresql@14`...
==> postgresql@14
This formula has created a default database cluster with:
  initdb --locale=C -E UTF-8 /opt/homebrew/var/postgresql@14

To start postgresql@14 now and restart at login:
  brew services start postgresql@14

Or, if you don't want/need a background service you can just run:
  /opt/homebrew/opt/postgresql@14/bin/postgres -D /opt/homebrew/var/postgresql@14

After installation, check the version:

psql --version
> psql (PostgreSQL) 14.17 (Homebrew)

🔷 Step 2: Start PostgreSQL Service

To start PostgreSQL now and have it start automatically at login:

brew services start postgresql
==> Successfully started `postgresql@14` (label: homebrew.mxcl.postgresql@14)

If you just want to run it in the background without autostart:

# pg_ctl — initialize, start, stop, or control a PostgreSQL server
pg_ctl -D /opt/homebrew/var/postgresql@14 start

https://www.postgresql.org/docs/current/app-pg-ctl.html

You can find the installed version using:

brew list | grep postgres

🔷 Step 3: Initialize the Database (if needed)

Sometimes Homebrew does this automatically. If not:

initdb /opt/homebrew/var/postgresql@<version>

Or a more general version:

initdb /usr/local/var/postgres

Key functions of initdb: Creates a new database cluster, Initializes the database cluster’s default locale and character set encoding, Runs a vacuum command.

In essence, initdb prepares the environment for a PostgreSQL database to be used and provides a foundation for creating and managing databases within that cluster

🔷 Step 4: Create a User and Database

PostgreSQL uses a role-based access control. Create a user with superuser privileges:

# createuser creates a new Postgres user
createuser -s postgres

createuser is a shell script wrapper around the SQL command CREATE USER via the Postgres interactive terminal psql. Thus, there is nothing special about creating users via this or other methods

Then switch to psql:

psql postgres

You can also create a database:

createdb <db_name>

🔷 Step 5: Connect and Use psql

psql -d <db_name>

Inside the psql shell, try:

\l    -- list databases
\dt   -- list tables
\q    -- quit

🔷 Step 6: Use a GUI (Optional)

For a friendly UI, install one of the following:

pgAdmin

Postico

TablePlus

2. Update `Gemfile`

Replace SQLite gem with PostgreSQL:

# Remove or comment this:
# gem "sqlite3", "~> 1.4"

# Add this:
gem "pg", "~> 1.4"

Then run:

bundle install

3. Update `config/database.yml`

Replace the entire contents of config/database.yml with the following:

default: &default
  adapter: postgresql
  encoding: unicode
  username: postgres
  password:
  host: localhost
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

development:
  <<: *default
  database: your_app_name_development

test:
  <<: *default
  database: your_app_name_test

production:
  primary: &primary_production
    <<: *default
    database: your_app_name_production
    username: your_production_username
    password: <%= ENV['YOUR_APP_DATABASE_PASSWORD'] %>
  cache:
    <<: *primary_production
    database: your_app_name_production_cache
    migrations_paths: db/cache_migrate
  queue:
    <<: *primary_production
    database: your_app_name_production_queue
    migrations_paths: db/queue_migrate
  cable:
    <<: *primary_production
    database: your_app_name_production_cable
    migrations_paths: db/cable_migrate

Replace your_app_name with your actual Rails app name.

4. Drop SQLite Database (Optional)

rm storage/development.sqlite3
rm storage/test.sqlite3

5. Create and Setup PostgreSQL Database

rails db:create
rails db:migrate

If you had seed data:

rails db:seed

6. Test It Works

Boot up your server:

bin/dev

Then go to http://localhost:3000 and confirm everything works.

7. Check psql manually (Optional)

psql -d your_app_name_development

Then run:

\dt     -- view tables
\q      -- quit

8. Update `.gitignore`

Note: If not already added /storage/*

Make sure SQLite DBs are not accidentally committed:

/storage/*.sqlite3
/storage/*.sqlite3-journal

After moving into PostgreSQL

I was getting an issue with postgres column, where I have the following data in the migration:

# migration
t.decimal :rating, precision: 1, scale: 1

# log
ActiveRecord::RangeError (PG::NumericValueOutOfRange: ERROR:  numeric field overflow
12:44:36 web.1  | DETAIL:  A field with precision 1, scale 1 must round to an absolute value less than 1.
12:44:36 web.1  | )

Value passed is: 4.3. I was not getting this issue in SqLite DB.

What does `precision: 1, scale: 1` mean?

precision: Total number of digits (both left and right of the decimal).
scale: Number of digits after the decimal point

If you want to store ratings like 4.3, 4.5, etc., a good setup is:

t.decimal :rating, precision: 2, scale: 1

# revert and migrate for products table

✗ rails db:migrate:down VERSION=2025031XXXXX -t
✗ rails db:migrate:up VERSION=2025031XXXXXX -t

Then go to http://localhost:3000 and confirm everything works.

to be continued.. 🚀

Setup 🛠 Rails 8 App – Part 10: PostgreSQL Into The Action

For a Ruby on Rails 8 application, the choice of database depends on your specific needs, but here’s a breakdown of the best options and when to use each:

PostgreSQL (Highly Recommended)

Best overall choice for most Rails apps.

Why:

First-class support in Rails.
Advanced features like full-text search, JSONB support, CTEs, window functions.
Strong consistency and reliability.
Scales well vertically and horizontally (with tools like Citus).

Used by: GitHub, Discourse, Basecamp, Shopify.

Use if:

You’re building a standard Rails web app or API.
You need advanced query features or are handling complex data types (e.g., JSON).

SQLite (For development/testing only)

Lightweight, file-based.
Fast and easy to set up.
But not recommended for production.

Use if:

You’re building a quick prototype or local dev/testing app.
NOT for multi-user production environments.

MySQL / MariaDB

Also supported by Rails.
Can work fine for simpler applications.
Lacks some advanced features (like robust JSON support or full Postgres-style indexing).
Not the default in many modern Rails setups.

Use if:

Your team already has MySQL infrastructure or legacy systems.
You need horizontal scaling with Galera Cluster or similar setups.

Others (NoSQL like MongoDB, Redis, etc.)

Use Redis for caching and background job data (not as primary DB).
Use MongoDB or other NoSQL only if your data model really demands it (e.g., unstructured documents, event sourcing).

Recommendation Summary:

Use Case	Recommended DB
Production web/API app	PostgreSQL
Dev/prototyping/local testing	SQLite
Legacy systems/MySQL infrastructure	MySQL/MariaDB
Background jobs/caching	Redis
Special needs (e.g., documents)	MongoDB (with caution)

If you’re starting fresh or building something scalable and modern with Rails 8, go with PostgreSQL.

Let’s break that down:

💬 What does “robust JSON support” mean?

PostgreSQL supports a special column type: json and jsonb, which lets you store structured JSON data directly in your database — like hashes or objects.

Why it matters:

You can store dynamic data without needing to change your schema.
You can query inside the JSON using SQL (->, ->>, @>, etc.).
You can index parts of the JSON — for speed.

🔧 Example:

You have a products table with a specs column that holds tech specs in JSON:

specs = {
  "color": "black",
  "brand": "Libas",
  "dimensions": {"chest": "34", "waist": "30", "shoulder": "13.5"}
}

You can query like:

SELECT * FROM products WHERE specs->>'color' = 'black';

Or check if the JSON contains a value:

SELECT * FROM products WHERE specs @> '{"brand": "Libas"}';

You can even index specs->>'color' to make these queries fast.

💬 What does “full Postgres-style indexing” mean?

PostgreSQL supports a wide variety of powerful indexing options, which improve query performance and flexibility.

⚙️ Types of Indexes PostgreSQL supports:

Index Type	Use Case
B-Tree	Default; used for most equality and range searches
GIN (Generalized Inverted Index)	Fast indexing for JSON, arrays, full-text search
Partial Indexes	Index only part of the data (e.g., `WHERE active = true`)
Expression Indexes	Index a function or expression (e.g., `LOWER(email)`)
Covering Indexes (INCLUDE)	Fetch data directly from the index, avoiding table reads

B-Tree Indexes: B-tree indexes are more suitable for single-value columns.
When to Use GIN Indexes: When you frequently search for specific elements within arrays, JSON documents, or other composite data types.
Example for GIN Indexes: Imagine you have a table with a JSONB column containing document metadata. A GIN index on this column would allow you to quickly find all documents that have a specific author or belong to a particular category.

Why does this matter for our shopping app?

We can store and filter products with dynamic specs (e.g., kurtas, shorts, pants) without new columns.
Full-text search on product names/descriptions.
Fast filters: color = 'red' AND brand = 'Libas' even if those are stored in JSON.
Index custom expressions like LOWER(email) for case-insensitive login.

💬 What are Common Table Expressions (CTEs)?

CTEs are temporary result sets you can reference within a SQL query — like defining a mini subquery that makes complex SQL easier to read and write.

WITH recent_orders AS (
  SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE total > 100;

Breaking complex queries into readable parts.
Re-using result sets without repeating subqueries.

In Rails (via `with` from gems like `scenic` or `with_cte`):

Order
  .with(recent_orders: Order.where('created_at > ?', 7.days.ago))
  .from('recent_orders')
  .where('total > ?', 100)

💬 What are Window Functions?

Window functions perform calculations across rows related to the current row — unlike aggregate functions, they don’t group results into one row.

🔧 Example: Rank users by their score within each team:

SELECT
  user_id,
  team_id,
  score,
  RANK() OVER (PARTITION BY team_id ORDER BY score DESC) AS rank
FROM users;

Use cases:

Ranking rows (like leaderboards).
Running totals or moving averages.
Calculating differences between rows (e.g. “How much did this order increase from the last?”).

🛤 In Rails:

Window functions are available through raw SQL or Arel. Here’s a basic example:

User
  .select("user_id, team_id, score, RANK() OVER (PARTITION BY team_id ORDER BY score DESC) AS rank")

CTEs and Window functions are fully supported in PostgreSQL, making it the go-to DB for any Rails 8 app that needs advanced querying.

JSONB Support

JSONB stands for “JSON Binary” and is a binary representation of JSON data that allows for efficient storage and retrieval of complex data structures.

This can be useful when you have data that doesn’t fit neatly into traditional relational database tables, such as nested or variable-length data structures.

Absolutely — storing JSON in a relational database (like PostgreSQL) can be super powerful when used wisely. It gives you schema flexibility without abandoning the structure and power of SQL. Here are real-world use cases for using JSON columns in relational databases:

Here are real-world use cases for using JSON columns in relational databases:

🔧 1. Flexible Metadata / Extra Attributes

Let users store arbitrary attributes that don’t require schema changes every time.

Use case: Product variants, custom fields

t.jsonb :metadata

{
  "color": "red",
  "size": "XL",
  "material": "cotton"
}

=> Good when:

You can’t predict all the attributes users will need.
You don’t want to create dozens of nullable columns.

🎛️ 2. Storing Settings or Preferences

User or app settings that vary a lot.

Use case: Notification preferences, UI layout, feature toggles

{
  "email": true,
  "sms": false,
  "theme": "dark"
}

=> Easy to store and retrieve as a blob without complex joins.

🌐 3. API Response Caching

Store external API responses for caching or auditing.

Use case: Storing Stripe, GitHub, or weather API responses.

t.jsonb :api_response

=> Avoids having to map every response field into a column.

📦 4. Storing Logs or Events

Use case: Audit trails, system logs, user events

{
  "action": "login",
  "timestamp": "2025-04-18T10:15:00Z",
  "ip": "123.45.67.89"
}

=> Great for capturing varied data over time without a rigid schema.

📊 6. Embedded Mini-Structures

Use case: A form builder app storing user-created forms and fields.

{
  "fields": [
    { "type": "text", "label": "Name", "required": true },
    { "type": "email", "label": "Email", "required": false }
  ]
}

=> When each row can have nested, structured data — almost like a mini-document.

🕹️ 7. Device or Browser Info (User Agents)

Use case: Analytics, device fingerprinting

{
  "browser": "Safari",
  "os": "macOS",
  "version": "17.3"
}

=> You don’t need to normalize or query this often — perfect for JSON.

JSON vs JSONB in PostgreSQL

Use jsonb over json unless you need to preserve order or whitespace.

jsonb is binary format → faster and indexable
You can do fancy stuff like:

SELECT * FROM users WHERE preferences ->> 'theme' = 'dark';

Or in Rails:

User.where("preferences ->> 'theme' = ?", 'dark')

store and store_accessor

They let you treat JSON or text-based hash columns like structured data, so you can access fields as if they were real database columns.

🔹 `store`

Used to declare a serialized store (usually a jsonb, json, or text column) on your model.
Works best with key/value stores.

👉 Example:

Let’s say your users table has a settings column of type jsonb:

# migration
add_column :users, :settings, :jsonb, default: {}

Now in your model:

class User < ApplicationRecord
  store :settings, accessors: [:theme, :notifications], coder: JSON
end

You can now do this:

user.theme = "dark"
user.notifications = true
user.save

user.settings
# => { "theme" => "dark", "notifications" => true }

🔹 `store_accessor`

A lightweight version that only declares attribute accessors for keys inside a JSON column. Doesn’t include serialization logic — so you usually use it with a json/jsonb/text column that already works as a Hash.

👉 Example:

class User < ApplicationRecord
  store_accessor :settings, :theme, :notifications
end

This gives you:

user.theme, user.theme=
user.notifications, user.notifications=

🤔 When to Use Each?

Feature	When to Use
`store`	When you need both serialization and accessors
`store_accessor`	When your column is already serialized (`jsonb`, etc.)

If you’re using PostgreSQL with jsonb columns — it’s more common to just use store_accessor.

Querying JSON Fields

User.where("settings ->> 'theme' = ?", "dark")

Or if you’re using store_accessor:

User.where(theme: "dark")

💡 But remember: you’ll only be able to query these fields efficiently if you’re using jsonb + proper indexes.

🔥 Conclusion:

PostgreSQL can store, search, and index inside JSON fields natively.
This lets you keep your schema flexible and your queries fast.
Combined with its advanced indexing, it’s ideal for a modern e-commerce app with dynamic product attributes, filtering, and searching.

To install and set up PostgreSQL on macOS, you have a few options. The most common and cleanest method is using Homebrew. Here’s a step-by-step guide:

Writing Perfect Active Record 🗒️ Queries in Ruby on Rails 8

Active Record (AR) is the heart of Ruby on Rails when it comes to database interactions. Writing efficient and readable queries is crucial for application performance and maintainability. This guide will help you master Active Record queries with real-world examples and best practices.

Setting Up a Sample Database

To demonstrate complex Active Record queries, let’s create a Rails app with a sample database structure containing multiple tables.

Generate Models & Migrations

rails new MyApp --database=postgresql
cd MyApp
rails g model User name:string email:string
rails g model Post title:string body:text user:references
rails g model Comment body:text user:references post:references
rails g model Category name:string
rails g model PostCategory post:references category:references
rails g model Like user:references comment:references
rails db:migrate

Database Schema Overview

users: Stores user information.
posts: Stores blog posts written by users.
comments: Stores comments on posts, linked to users and posts.
categories: Stores post categories.
post_categories: Join table for posts and categories.
likes: Stores likes on comments by users.

Basic Active Record Queries

1. Fetching All Records

User.all  # Returns all users (Avoid using it directly on large datasets as it loads everything into memory)

⚠️ User.all can lead to performance issues if the table contains a large number of records. Instead, prefer pagination (User.limit(100).offset(0)) or batch processing (User.find_each).

2. Finding a Specific Record

User.find(1)  # Finds a user by ID
User.find_by(email: 'john@example.com')  # Finds by attribute

3. Filtering with `where` vs `having`

Post.where(user_id: 2)  # Fetch all posts by user with ID 2

Difference between where and having:

where is used for filtering records before grouping.
having is used for filtering after group operations.

Example:

Post.group(:user_id).having('COUNT(id) > ?', 5)  # Users with more than 5 posts

4. Ordering Results

User.order(:name)  # Order users alphabetically
Post.order(created_at: :desc)  # Order posts by newest first

5. Limiting Results

Post.limit(5)  # Get the first 5 posts

6. Selecting Specific Columns

User.select(:id, :name)  # Only fetch ID and name

7. Fetching Users with a Specific Email Domain

User.where("email LIKE ?", "%@gmail.com")

8. Fetching the Most Recent Posts

Post.order(created_at: :desc).limit(5)

9. Using `pluck` for Efficient Data Retrieval

User.pluck(:email)  # Fetch only emails as an array

10. Checking if a Record Exists Efficiently

User.exists?(email: 'john@example.com')

11. Including Associations (`eager loading` to avoid N+1 queries)

Post.includes(:comments).where(comments: { body: 'Great post!' })

Advanced Queries with Joins

1. Joining Tables (`INNER JOIN`)

Post.joins(:user).where(users: { name: 'John' })

2. Self Join Example

A self-join is useful when dealing with hierarchical relationships, such as an employee-manager structure.

Model Setup

class Employee < ApplicationRecord
  belongs_to :manager, class_name: 'Employee', optional: true
  has_many :subordinates, class_name: 'Employee', foreign_key: 'manager_id'
end

Sample Data

id	name	manager_id
1	Alice	NULL
2	Bob	1
3	Carol	1
4	Dave	2

Query: Find Employees Who Report to Alice

Employee.joins(:manager).where(managers_employees: { name: 'Alice' })

Result:

id	name	manager_id
2	Bob	1
3	Carol	1

This query fetches employees who report to Alice (i.e., those where manager_id = 1).

3. Fetching Users with No Posts (`LEFT JOIN` with `NULL` check)

User.left_outer_joins(:posts).where(posts: { id: nil })

4. Counting Posts Per User

User.joins(:posts).group('users.id').count

Complex Queries in Active Record

1. Fetching Posts with the Most Comments

Post.joins(:comments)
    .group('posts.id')
    .order('COUNT(comments.id) DESC')
    .limit(1)

2. Fetching Posts with More than 5 Comments

Post.joins(:comments)
    .group(:id)
    .having('COUNT(comments.id) > ?', 5)

3. Finding Users Who Liked the Most Comments

User.joins(comments: :likes)
    .group('users.id')
    .select('users.id, users.name, COUNT(likes.id) AS likes_count')
    .order('likes_count DESC')
    .limit(1)

4. Fetching Posts Belonging to Multiple Categories

Post.joins(:categories).group('posts.id').having('COUNT(categories.id) > ?', 1)

5. Fetching the Last Comment of Each Post

Comment.select('DISTINCT ON (post_id) *').order('post_id, created_at DESC')

6. Fetching Users Who Haven’t Commented on a Specific Post

User.where.not(id: Comment.where(post_id: 10).select(:user_id))

7. Fetching Users Who Have Commented on Every Post

User.joins(:comments).group(:id).having('COUNT(DISTINCT comments.post_id) = ?', Post.count)

8. Finding Posts With No Comments

Post.left_outer_joins(:comments).where(comments: { id: nil })

9. Fetching the User Who Created the Most Posts

User.joins(:posts)
    .group('users.id')
    .select('users.id, users.name, COUNT(posts.id) AS post_count')
    .order('post_count DESC')
    .limit(1)

10. Fetching the Most Liked Comment

Comment.joins(:likes)
    .group('comments.id')
    .order('COUNT(likes.id) DESC')
    .limit(1)

11. Fetching Comments with More than 3 Likes and Their Associated Posts

Comment.joins(:likes, :post)
    .group('comments.id', 'posts.id')
    .having('COUNT(likes.id) > ?', 3)

12. Finding Users Who Haven’t Liked Any Comments

User.left_outer_joins(:likes).where(likes: { id: nil })

13. Fetching Users, Their Posts, and the Count of Comments on Each Post

User.joins(posts: :comments)
    .group('users.id', 'posts.id')
    .select('users.id, users.name, posts.id AS post_id, COUNT(comments.id) AS comment_count')
    .order('comment_count DESC')

Importance of `inverse_of` in Model Associations

What is `inverse_of`?

The inverse_of option in Active Record associations helps Rails correctly link objects in memory, avoiding unnecessary database queries and ensuring bidirectional association consistency.

Example Usage

class User < ApplicationRecord
  has_many :posts, inverse_of: :user
end

class Post < ApplicationRecord
  belongs_to :user, inverse_of: :posts
end

Why Use `inverse_of`?

Performance Optimization: Prevents extra queries by using already loaded objects.
Ensures Data Consistency: Updates associations without additional database fetches.
Enables Nested Attributes: Helps when using accepts_nested_attributes_for.

Example:

user = User.new(name: 'Alice')
post = user.posts.build(title: 'First Post')
post.user == user  # True without needing an additional query

Best Practices to use in Rails Projects

1. Using Scopes for Readability

class Post < ApplicationRecord
  scope :recent, -> { order(created_at: :desc) }
end

Post.recent.limit(10)  # Fetch recent posts

2. Using `find_each` for Large Datasets

User.find_each(batch_size: 100) do |user|
  puts user.email
end

3. Avoiding `SELECT *` for Performance

User.select(:id, :name).load

4. Avoiding N+1 Queries with `includes`

Post.includes(:comments).each do |post|
  puts post.comments.count
end

Conclusion

Mastering Active Record queries is essential for writing performant and maintainable Rails applications. By using joins, scopes, batch processing, and eager loading, you can write clean and efficient queries that scale well.

Do you have any favorite Active Record query tricks? Share them in the comments!

Learn SQL: Day 2 – SELECT Queries (The Foundation of SQL)

Today we start writing queries.

Today’s Goals

By the end of Day 2 you should understand:

SELECT
WHERE
ORDER BY
LIMIT
OFFSET
DISTINCT
IN
BETWEEN
LIKE
ILIKE
NULL handling
ActiveRecord equivalents
Common interview questions
Common mistakes

Step 1: Create Our Practice Database

Connect:

psql sql_day1

Let’s create a new table.

			
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255),
  age INTEGER,
  city VARCHAR(100),
  salary NUMERIC(10,2),
  active BOOLEAN,
  created_at TIMESTAMP DEFAULT NOW()
);

		

Insert Sample Data

			
INSERT INTO users
(name, email, age, city, salary, active)
VALUES
('John',  'john@test.com', 30, 'New York', 70000, true),
('Mary',  'mary@test.com', 25, 'Chicago', 60000, true),
('Bob',   'bob@test.com', 35, 'Chicago', 90000, false),
('Alice', 'alice@test.com', 28, 'Boston', 75000, true),
('Tom',   'tom@test.com', 40, 'New York', 120000, false),
('Sara',  'sara@test.com', 32, 'Boston', 85000, true),
('Mike',  'mike@test.com', NULL, 'Chicago', NULL, true);

		

View data:

SELECT * FROM users;

1. SELECT

The most basic query.

SELECT * FROM users;

Meaning:

Give me all columns from users

Output:

id | name | email | age | city | salary

Select Specific Columns

Instead of everything:

SELECT name, email FROM users;

Output:

			
name   | email
--------+---------------
John   | john@test.com
Mary   | mary@test.com

Rails Equivalent

User.select(:name, :email)

Senior Insight

Avoid:

SELECT *

in production systems unless needed.

Why?

Because fetching unnecessary columns:

uses more memory
transfers more data
slows queries

Good:

SELECT id, name FROM users;

2. WHERE Clause

Filters rows.

Example

Only active users.

			
SELECT * FROM users
WHERE active = true;

Rails:

User.where(active: true)

Age Greater Than 30

			
SELECT * FROM users
WHERE age > 30;

Rails:

User.where("age > ?", 30)

Multiple Conditions

			
SELECT * FROM users
WHERE city = 'Chicago'
AND active = true;

Rails:

User.where(city: "Chicago", active: true)

OR

			
SELECT * FROM users
WHERE city = 'Chicago'
OR city = 'Boston';

Rails:

User.where(city: ["Chicago", "Boston"])

Interview Question

Which runs first?

WHERE A OR B AND C

Answer:

AND

before

OR

Use parentheses.

			
WHERE (A OR B)
AND C

3. ORDER BY

Sort results.

Ascending

			
SELECT * FROM users
ORDER BY age ASC;

Smallest age first.

Descending

			
SELECT * FROM users
ORDER BY salary DESC;

Highest salary first.

Rails:

User.order(salary: :desc)

Multiple Columns

			
SELECT * FROM users
ORDER BY city ASC, salary DESC;

Meaning:

			
Sort by city first
Inside each city
sort by salary

Common Interview Question

What happens if you omit ASC/DESC?

ORDER BY age

Default:

ASC

4. LIMIT

Return only N rows.

			
SELECT * FROM users
LIMIT 3;

Rails:

User.limit(3)

Why LIMIT Matters

Imagine:

10 million rows

Fetching all:

			
slow
memory-heavy
unnecessary

LIMIT reduces work.

5. OFFSET

Skip rows.

			
SELECT * FROM users
LIMIT 3
OFFSET 3;

Meaning:

			
Skip first 3
Return next 3

Rails

User.limit(3).offset(3)

Pagination Example

Page 1

LIMIT 10 OFFSET 0

Page 2

LIMIT 10 OFFSET 10

Page 3

LIMIT 10 OFFSET 20

Senior Insight

Large OFFSET values become expensive.

Example:

OFFSET 500000

PostgreSQL still scans through those rows.

Later we’ll learn:

Keyset Pagination

which is much faster.

6. DISTINCT

Remove duplicates.

Example:

SELECT city FROM users;

Result:

			
Chicago
Chicago
Chicago
Boston
Boston
New York

		

Distinct:

SELECT DISTINCT city FROM users;

Result:

			
Chicago
Boston
New York

Rails

User.select(:city).distinct

Multiple Columns

SELECT DISTINCT city, active FROM users;

Distinct applies to the combination.

7. IN

Cleaner alternative to multiple OR conditions.

Instead of:

			
WHERE city='Boston'
OR city='Chicago'
OR city='New York'

Use:

			
WHERE city IN
('Boston','Chicago','New York');

Rails:

User.where(city: ["Boston", "Chicago", "New York"])

8. BETWEEN

Range filtering.

Age between 25 and 35.

			
SELECT * FROM users
WHERE age BETWEEN 25 AND 35;

Equivalent:

			
age >= 25
AND
age <= 35

Rails

User.where(age: 25..35)

Salary Range

			
SELECT * FROM users
WHERE salary BETWEEN 60000 AND 90000;

Interview Question

Is BETWEEN inclusive?

Answer:

YES

Both boundaries included.

9. LIKE

Pattern matching.

Find names beginning with M.

			
SELECT * FROM users
WHERE name LIKE 'M%';

Result:

			
Mary
Mike

Ends With

WHERE email LIKE '%test.com'

Contains

WHERE name LIKE '%ar%'

Matches:

			
Mary
Sara

Wildcards

Symbol	Meaning
%	Any number of chars
_	Exactly one char

Example

WHERE name LIKE '_o%'

Matches:

			
Bob
Tom

10. ILIKE

PostgreSQL-specific.

Case-insensitive LIKE.

			
SELECT * FROM users
WHERE name ILIKE 'john';

Matches:

			
John
JOHN
john
JoHn

Rails

User.where("name ILIKE ?", "john")

Senior Interview Insight

In PostgreSQL:

LIKE

is case-sensitive.

ILIKE

is case-insensitive.

Many developers don’t know this.

11. NULL Handling

This is a favorite interview topic.

Let’s inspect:

SELECT * FROM users;

Mike has:

			
age = NULL
salary = NULL

Wrong

WHERE age = NULL

Returns:

Nothing

Correct

WHERE age IS NULL

Find users with no salary:

			
SELECT * FROM users
WHERE salary IS NULL;

Rails

User.where(age: nil)

Generates:

IS NULL

NOT NULL

			
SELECT * FROM users
WHERE salary IS NOT NULL;

Why NULL Is Special

SQL uses:

			
TRUE
FALSE
UNKNOWN

not just:

			
TRUE
FALSE

This is called:

Three-Valued Logic

Interviewers love asking this.

Practical Exercises

Exercise 1

Find all active users.

Exercise 2

Find users older than 30.

Exercise 3

Find users from Boston.

Exercise 4

Find top 3 highest-paid users.

Exercise 5

Find unique cities.

Exercise 6

Find users aged between 25 and 35.

Exercise 7

Find names starting with S.

Exercise 8

Find users whose salary is NULL.

Combining Everything

Example:

			
SELECT name, city, salary
FROM users
WHERE active = true
  AND city IN ('Chicago', 'Boston')
  AND salary IS NOT NULL
ORDER BY salary DESC
LIMIT 3;

		

Can you explain what this query does before running it?

That’s exactly the kind of reasoning expected in senior interviews.

ActiveRecord Translation Challenge

Convert this SQL:

			
SELECT *
FROM users
WHERE city = 'Chicago'
AND active = true
ORDER BY salary DESC
LIMIT 2;

		

into ActiveRecord.

Common Mistakes

Mistake 1

WHERE age = NULL

Wrong.

Use:

WHERE age IS NULL

Mistake 2

Using:

SELECT *

everywhere.

Mistake 3

Forgetting ORDER BY when using LIMIT.

LIMIT 5

without ordering can return arbitrary rows.

Mistake 4

Using huge OFFSET values.

Senior-Level Knowledge

Understand that SQL logically executes in this order:

			
FROM
WHERE
SELECT
DISTINCT
ORDER BY
LIMIT

		

Even though we write:

			
SELECT ...
FROM ...
WHERE ...

PostgreSQL conceptually processes the clauses in the above order.

This understanding becomes extremely important when we move to:

JOINs
GROUP BY
HAVING
Query Optimization
EXPLAIN ANALYZE

Homework

Create a new table:

			
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  category VARCHAR(100),
  price NUMERIC(10,2),
  stock_quantity INTEGER
);

		

Insert at least 10 records.

Practice:

SELECT specific columns
WHERE with multiple conditions
ORDER BY price DESC
LIMIT 5
DISTINCT categories
BETWEEN on price
LIKE searches
Products with stock_quantity IS NULL

Day 3 Preview

Next we’ll cover one of the most important interview topics:

JOINs

Including:

INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
CROSS JOIN
Self Join
ActiveRecord joins
includes vs joins vs preload vs eager_load
Real Rails interview questions

Day 3 is where SQL starts becoming truly powerful.

Happy Learning! 🚀

Learn SQL: Day 1 – Relational Database Fundamentals

We’re going to learn these topics at three levels simultaneously:

Database Level (PostgreSQL)
SQL Level
Rails / ActiveRecord Level

For every topic, ask yourself:

“How does PostgreSQL store this?”

“How do I query this with SQL?”

“How does Rails represent this?”

This will help you to prepare for interviews. We use PostgreSQL here.

Today’s Goal

By the end of Day 1, you should fully understand:

Database
Table
Row
Column
Primary Key
Foreign Key
Constraints
One-to-One
One-to-Many
Many-to-Many
Rails associations behind them

Part 1: What is a Database?

Imagine you are building an e-commerce application.

You need to store:

Users
Products
Orders
Payments

A database is simply a structured place to store and retrieve that information.

In PostgreSQL:

CREATE DATABASE shop_app;

Connect to it:

psql postgres

Inside psql:

CREATE DATABASE shop_app;

Connect:

\c shop_app

Verify:

SELECT current_database();

Output:

 shop_app

Part 2: What is a Table?

A table is similar to an Excel sheet.

Example:

Users table

id	name	email
1	John	john@test.com
2	Mary	mary@test.com

Create it:

			
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255)
);

		

Verify:

\d users

Interview Question:

Why not store everything in a single giant table?

Answer:

Because:

duplication increases
maintenance becomes difficult
relationships become unclear
updates become expensive

This concept is called normalization (we’ll study later).

Part 3: Rows

A row represents one record.

Insert data:

			
INSERT INTO users (name, email)
VALUES
('John', 'john@test.com'),
('Mary', 'mary@test.com');

View:

SELECT * FROM users;

Output:

 id | name | email
----+------+----------------
 1  | John | john@test.com
 2  | Mary | mary@test.com

Each row = one user.

Part 4: Columns

Columns describe attributes.

In users table:

			
id
name
email

View columns:

\d users

Senior Insight:

A database table models an entity.

Examples:

Entity	Table
User	users
Product	products
Order	orders

Columns represent attributes of that entity.

Part 5: Primary Keys

Every row needs a unique identifier.

Example:

id BIGSERIAL PRIMARY KEY

Meaning:

No duplicates.

No NULLs.

Try:

			
INSERT INTO users (id, name)
VALUES (1, 'Bob');

You should get:

duplicate key value violates unique constraint

Why Primary Keys Exist

Without a primary key:

			
John
John
John

Which John?

Nobody knows.

Primary key solves identity.

Rails Equivalent

Migration:

			
create_table :users do |t|
  t.string :name
  t.string :email
end

Rails automatically adds:

id

as the primary key.

Part 6: Constraints

Constraint = database rule.

Interviewers love this topic.

NOT NULL

Create:

			
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

Try:

			
INSERT INTO products(name)
VALUES(NULL);

Fails.

UNIQUE

			
CREATE TABLE customers (
  id BIGSERIAL PRIMARY KEY,
  email VARCHAR(255) UNIQUE
);

Duplicate email:

			
INSERT INTO customers(email)
VALUES('test@test.com');
INSERT INTO customers(email)
VALUES('test@test.com');

Fails.

CHECK Constraint

Age must be positive.

			
CREATE TABLE employees (
  id BIGSERIAL PRIMARY KEY,
  age INTEGER CHECK(age > 0)
);

Fails:

			
INSERT INTO employees(age)
VALUES(-5);

Why Constraints Matter

Junior developer:

validates :email, uniqueness: true

Senior developer:

			
validates :email, uniqueness: true
+
UNIQUE(email)

Because application validations can be bypassed.

Database constraints cannot.

Part 7: Foreign Keys

Now let’s model:

User has many orders.

Create users:

			
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100)
);

Create orders:

			
CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT,
  total NUMERIC(10,2)
);

		

Foreign key:

			
ALTER TABLE orders
ADD CONSTRAINT fk_orders_user
FOREIGN KEY (user_id)
REFERENCES users(id);

Insert user:

			
INSERT INTO users(name)
VALUES('John');

Insert order:

			
INSERT INTO orders(user_id,total)
VALUES(1,100);

Works.

Try:

			
INSERT INTO orders(user_id,total)
VALUES(999,100);

Fails.

Because user doesn’t exist.

Why Foreign Keys Exist

Without them:

Order belongs to user 999

But user 999 doesn’t exist.

Database becomes corrupted.

Rails Equivalent

			
class User < ApplicationRecord
  has_many :orders
end
class Order < ApplicationRecord
  belongs_to :user
end

		

Migration:

			
t.references :user,
             null: false,
             foreign_key: true

Rails creates:

			
user_id
FOREIGN KEY

behind the scenes.

Part 8: One-to-Many Relationship

Most common relationship.

Example:

User -> Orders

One user:

John

Many orders:

			
Order 1
Order 2
Order 3

Diagram:

			
users
-----
id
orders
------
id
user_id

		

Rails:

			
User has_many :orders
Order belongs_to :user

Practical Exercise

Insert:

			
INSERT INTO users(name)
VALUES('Mary');

Orders:

			
INSERT INTO orders(user_id,total)
VALUES
(2,50),
(2,75),
(2,120);

		

Query:

			
SELECT *
FROM orders
WHERE user_id = 2;

Part 9: One-to-One Relationship

Less common.

Example:

			
User
Profile

Each user has exactly one profile.

Create profile table:

			
CREATE TABLE profiles (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT UNIQUE,
  bio TEXT,
  FOREIGN KEY(user_id)
  REFERENCES users(id)
);

		

Notice:

UNIQUE(user_id)

This forces:

			
One user
One profile

Rails:

			
class User < ApplicationRecord
  has_one :profile
end
class Profile < ApplicationRecord
  belongs_to :user
end

		

Interview Question:

How does a database enforce one-to-one?

Answer:

			
FOREIGN KEY
+
UNIQUE

on the foreign key column.

Part 10: Many-to-Many Relationship

Classic interview topic.

Example:

			
Students
Courses

Student can enroll in many courses.

Course can have many students.

Create students:

			
CREATE TABLE students (
  id BIGSERIAL PRIMARY KEY,
  name VARCHAR(100)
);

Create courses:

			
CREATE TABLE courses (
  id BIGSERIAL PRIMARY KEY,
  title VARCHAR(100)
);

Need a join table:

			
CREATE TABLE enrollments (
  student_id BIGINT,
  course_id BIGINT,
  PRIMARY KEY(student_id, course_id),
  FOREIGN KEY(student_id)
    REFERENCES students(id),
  FOREIGN KEY(course_id)
    REFERENCES courses(id)
);

		

Diagram:

			
students
    |
    |
enrollments
    |
    |
courses

		

Rails

			
class Student < ApplicationRecord
  has_many :enrollments
  has_many :courses, through: :enrollments
end
class Course < ApplicationRecord
  has_many :enrollments
  has_many :students, through: :enrollments
end
class Enrollment < ApplicationRecord
  belongs_to :student
  belongs_to :course
end

		

Senior-Level Insight

Most Rails developers stop at:

			
has_many
belongs_to

Strong backend engineers understand:

			
Association
      ↓
Foreign Key
      ↓
Constraint
      ↓
Index
      ↓
Storage

		

That understanding helps you:

debug production issues
optimize queries
design schemas
answer interview questions confidently

Interview Questions

Try answering without looking.

Q1

Difference between:

PRIMARY KEY

and

UNIQUE

Q2

Can a table have multiple UNIQUE constraints?

Q3

Can a table have multiple PRIMARY KEYS?

Q4

How is a one-to-one relationship implemented in PostgreSQL?

Q5

Why should foreign keys exist even when Rails validations exist?

Q6

What problem does a join table solve?

Practical Lab (Run Everything)

Create a fresh database:

CREATE DATABASE interview_sql_day1;

Connect:

\c interview_sql_day1

Create:

			
users
profiles
orders
students
courses
enrollments

		

Insert sample data.

Then practice:

			
SELECT * FROM users;
SELECT * FROM orders;
SELECT * FROM profiles;
SELECT * FROM enrollments;

Try intentionally violating:

PRIMARY KEY
UNIQUE
NOT NULL
FOREIGN KEY

and observe PostgreSQL’s error messages.

A senior engineer learns a lot from database errors.

Homework

Exercise 1

Create:

			
authors
books

One author → many books

Add proper foreign keys.

Exercise 2

Create:

			
employees
employee_details

One-to-one relationship.

Exercise 3

Create:

			
movies
actors
movie_actors

Many-to-many relationship.

Insert:

3 movies
5 actors

Create relationships.

Exercise 4

For every relationship above, write the equivalent Rails models and associations.

Day 2 Preview

Next we’ll cover the foundation of everything in SQL:

SELECT Queries

Including:

SELECT
WHERE
ORDER BY
LIMIT
OFFSET
DISTINCT
IN
BETWEEN
LIKE
ILIKE
NULL handling

plus PostgreSQL execution behavior and ActiveRecord equivalents.

This is where real querying begins.

Happy Learning! 🚀

🧬 Extracting and Joining on Ancestry Values in PostgreSQL: A Complete Guide

I am working on a project where we face issues in an ancestral path data in PostgreSql DB. Working with hierarchical data in PostgreSQL often involves dealing with ancestry paths stored as delimited strings. This comprehensive guide explores how to extract specific values from ancestry columns and utilize them effectively in join operations, complete with practical examples, troubleshooting tips and how I fixed the issues.

🎯 Introduction

PostgreSQL’s robust string manipulation capabilities make it ideal for handling complex hierarchical data structures. When working with ancestry values stored in text columns, you often need to extract specific parts of the hierarchy for data analysis, reporting, or joining operations.

This article demonstrates how to:

✨ Extract values from ancestry strings using regular expressions
🔗 Perform efficient joins on extracted ancestry data
🛡️ Handle edge cases and avoid common pitfalls
⚡ Optimize queries for better performance

❓ Problem Statement

📊 Scenario

Consider a projects table with an ancestry column containing hierarchical paths like:

-- Sample ancestry values
"6/4/5/3"     -- Parent chain: 6 → 4 → 5 → 3
"1/2"         -- Parent chain: 1 → 2
"9"           -- Single parent: 9
NULL          -- Root level project

🎯 Goal

We need to:

Extract the last integer value from the ancestry path
Use this value in a JOIN operation to fetch parent project data
Handle edge cases like NULL values and malformed strings

🏗️ Understanding the Data Structure

📁 Table Structure

CREATE TABLE projects (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    ancestry TEXT,  -- Stores parent hierarchy as "id1/id2/id3"
    created_at TIMESTAMP DEFAULT NOW()
);

-- Sample data
INSERT INTO projects (id, name, ancestry) VALUES
    (1, 'Root Project', NULL),
    (2, 'Department A', '1'),
    (3, 'Team Alpha', '1/2'),
    (4, 'Task 1', '1/2/3'),
    (5, 'Subtask 1A', '1/2/3/4');

🧭 Ancestry Path Breakdown

Project ID	Name	Ancestry	Immediate Parent
1	Root Project	`NULL`	None (root)
2	Department A	`1`	1
3	Team Alpha	`1/2`	2
4	Task 1	`1/2/3`	3
5	Subtask 1A	`1/2/3/4`	4

🔧 Solution Overview

🎯 Core Approach

🔍 Pattern Matching: Use regex to identify the last number in the ancestry string
✂️ Value Extraction: Extract the matched value using regexp_replace()
🔄 Type Conversion: Cast the extracted string to the appropriate numeric type
🔗 Join Operation: Use the converted value in JOIN conditions

📝 Basic Query Structure

SELECT projects.*
FROM projects
LEFT OUTER JOIN projects AS parent_project 
    ON CAST(
        regexp_replace(projects.ancestry, '.*\/(\d+)$', '\1')
        AS BIGINT
    ) = parent_project.id
WHERE projects.ancestry IS NOT NULL;

📝 Regular Expression Deep Dive

🎯 **Pattern Breakdown: `.*\/(\d+)$`**

Let’s dissect this regex pattern:

.*      -- Match any characters (greedy)
\/      -- Match literal forward slash
(\d+)   -- Capture group: one or more digits
$       -- End of string anchor

📊 Pattern Matching Examples

Ancestry String	Regex Match	Captured Group	Result
`"6/4/5/3"`	`5/3`	`3`	✅ 3
`"1/2"`	`1/2`	`2`	✅ 2
`"9"`	No match	–	❌ Original string
`"abc/def"`	No match	–	❌ Original string

🔧 Alternative Regex Patterns

-- For single-level ancestry (no slashes)
regexp_replace(ancestry, '^(\d+)$', '\1')

-- For extracting first parent instead of last
regexp_replace(ancestry, '^(\d+)\/.*', '\1')

-- For handling mixed delimiters (/ or -)
regexp_replace(ancestry, '.*[\/\-](\d+)$', '\1')

💻 Implementation Examples

🔧 Example 1: Basic Parent Lookup

-- Find each project with its immediate parent information
SELECT 
    p.id,
    p.name AS project_name,
    p.ancestry,
    parent.id AS parent_id,
    parent.name AS parent_name
FROM projects p
LEFT OUTER JOIN projects parent 
    ON CAST(
        regexp_replace(p.ancestry, '.*\/(\d+)$', '\1')
        AS BIGINT
    ) = parent.id
WHERE p.ancestry IS NOT NULL
ORDER BY p.id;

Expected Output:

 id | project_name | ancestry | parent_id | parent_name
----+--------------+----------+-----------+-------------
  2 | Department A | 1        |         1 | Root Project
  3 | Team Alpha   | 1/2      |         2 | Department A
  4 | Task 1       | 1/2/3    |         3 | Team Alpha
  5 | Subtask 1A   | 1/2/3/4  |         4 | Task 1

🎯 Example 2: Handling Edge Cases

-- Robust query that handles all edge cases
SELECT 
    p.id,
    p.name AS project_name,
    p.ancestry,
    CASE 
        WHEN p.ancestry IS NULL THEN 'Root Level'
        WHEN p.ancestry !~ '.*\/(\d+)$' THEN 'Single Parent'
        ELSE 'Multi-level'
    END AS hierarchy_type,
    parent.name AS parent_name
FROM projects p
LEFT OUTER JOIN projects parent ON 
    CASE 
        -- Handle multi-level ancestry
        WHEN p.ancestry ~ '.*\/(\d+)$' THEN
            CAST(regexp_replace(p.ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
        -- Handle single-level ancestry
        WHEN p.ancestry ~ '^\d+$' THEN
            CAST(p.ancestry AS BIGINT)
        ELSE NULL
    END = parent.id
ORDER BY p.id;

📈 Example 3: Aggregating Child Counts

-- Count children for each project
WITH parent_child_mapping AS (
    SELECT 
        p.id AS child_id,
        CASE 
            WHEN p.ancestry ~ '.*\/(\d+)$' THEN
                CAST(regexp_replace(p.ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
            WHEN p.ancestry ~ '^\d+$' THEN
                CAST(p.ancestry AS BIGINT)
            ELSE NULL
        END AS parent_id
    FROM projects p
    WHERE p.ancestry IS NOT NULL
)
SELECT 
    p.id,
    p.name,
    COUNT(pcm.child_id) AS direct_children_count
FROM projects p
LEFT JOIN parent_child_mapping pcm ON p.id = pcm.parent_id
GROUP BY p.id, p.name
ORDER BY direct_children_count DESC;

🚨 Common Errors and Solutions

❌ Error 1: “invalid input syntax for type bigint”

Problem:

-- ❌ Incorrect: Casting entire ancestry string
CAST(projects.ancestry AS BIGINT) = parent.id

Solution:

-- ✅ Correct: Cast only the extracted value
CAST(
    regexp_replace(projects.ancestry, '.*\/(\d+)$', '\1') 
    AS BIGINT
) = parent.id

❌ Error 2: Unexpected Results with Single-Level Ancestry

Problem: Single values like "9" don’t match the pattern .*\/(\d+)$

Solution:

-- ✅ Handle both multi-level and single-level ancestry
CASE 
    WHEN ancestry ~ '.*\/(\d+)$' THEN
        CAST(regexp_replace(ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
    WHEN ancestry ~ '^\d+$' THEN
        CAST(ancestry AS BIGINT)
    ELSE NULL
END

❌ Error 3: NULL Ancestry Values Causing Issues

Problem: NULL values can cause unexpected behaviour in joins

Solution:

-- ✅ Explicitly handle NULL values
WHERE ancestry IS NOT NULL 
AND ancestry != ''

🛡️ Complete Error-Resistant Query

SELECT 
    p.id,
    p.name AS project_name,
    p.ancestry,
    parent.id AS parent_id,
    parent.name AS parent_name
FROM projects p
LEFT OUTER JOIN projects parent ON 
    CASE 
        WHEN p.ancestry IS NULL OR p.ancestry = '' THEN NULL
        WHEN p.ancestry ~ '.*\/(\d+)$' THEN
            CAST(regexp_replace(p.ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
        WHEN p.ancestry ~ '^\d+$' THEN
            CAST(p.ancestry AS BIGINT)
        ELSE NULL
    END = parent.id
ORDER BY p.id;

⚡ Performance Considerations

📊 Indexing Strategies

-- Create index on ancestry for faster pattern matching
CREATE INDEX idx_projects_ancestry ON projects (ancestry);

-- Create partial index for non-null ancestry values
CREATE INDEX idx_projects_ancestry_not_null 
ON projects (ancestry) 
WHERE ancestry IS NOT NULL;

-- Create functional index for extracted parent IDs
CREATE INDEX idx_projects_parent_id ON projects (
    CASE 
        WHEN ancestry ~ '.*\/(\d+)$' THEN
            CAST(regexp_replace(ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
        WHEN ancestry ~ '^\d+$' THEN
            CAST(ancestry AS BIGINT)
        ELSE NULL
    END
) WHERE ancestry IS NOT NULL;

🔄 Query Optimization Tips

🎯 Use CTEs for Complex Logic

WITH parent_lookup AS (
    SELECT 
        id,
        CASE 
            WHEN ancestry ~ '.*\/(\d+)$' THEN
                CAST(regexp_replace(ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
            WHEN ancestry ~ '^\d+$' THEN
                CAST(ancestry AS BIGINT)
        END AS parent_id
    FROM projects
    WHERE ancestry IS NOT NULL
)
SELECT p.*, parent.name AS parent_name
FROM parent_lookup p
JOIN projects parent ON p.parent_id = parent.id;

⚡ Consider Materialized Views for Frequent Queries

CREATE MATERIALIZED VIEW project_hierarchy AS
SELECT 
    p.id,
    p.name,
    p.ancestry,
    CASE 
        WHEN p.ancestry ~ '.*\/(\d+)$' THEN
            CAST(regexp_replace(p.ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
        WHEN p.ancestry ~ '^\d+$' THEN
            CAST(p.ancestry AS BIGINT)
    END AS parent_id
FROM projects p;

-- Refresh when data changes
REFRESH MATERIALIZED VIEW project_hierarchy;

🛠️ Advanced Techniques

🔍 Extracting Multiple Ancestry Levels

-- Extract all ancestry levels as an array
SELECT 
    id,
    name,
    ancestry,
    string_to_array(ancestry, '/') AS ancestry_array,
    -- Get specific levels
    split_part(ancestry, '/', 1) AS level_1,
    split_part(ancestry, '/', 2) AS level_2,
    split_part(ancestry, '/', -1) AS last_level
FROM projects
WHERE ancestry IS NOT NULL;

🧮 Calculating Hierarchy Depth

-- Calculate the depth of each project in the hierarchy
SELECT 
    id,
    name,
    ancestry,
    CASE 
        WHEN ancestry IS NULL THEN 0
        ELSE array_length(string_to_array(ancestry, '/'), 1)
    END AS hierarchy_depth
FROM projects
ORDER BY hierarchy_depth, id;

🌳 Building Complete Hierarchy Paths

-- Recursive CTE to build full hierarchy paths
WITH RECURSIVE hierarchy_path AS (
    -- Base case: root projects
    SELECT 
        id,
        name,
        ancestry,
        name AS full_path,
        0 AS level
    FROM projects 
    WHERE ancestry IS NULL

    UNION ALL

    -- Recursive case: child projects
    SELECT 
        p.id,
        p.name,
        p.ancestry,
        hp.full_path || ' → ' || p.name AS full_path,
        hp.level + 1 AS level
    FROM projects p
    JOIN hierarchy_path hp ON 
        CASE 
            WHEN p.ancestry ~ '.*\/(\d+)$' THEN
                CAST(regexp_replace(p.ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
            WHEN p.ancestry ~ '^\d+$' THEN
                CAST(p.ancestry AS BIGINT)
        END = hp.id
)
SELECT * FROM hierarchy_path
ORDER BY level, id;

✅ Best Practices

🎯 Data Validation

✅ Validate Ancestry Format on Insert/Update

-- Add constraint to ensure valid ancestry format
ALTER TABLE projects 
ADD CONSTRAINT check_ancestry_format 
CHECK (
    ancestry IS NULL 
    OR ancestry ~ '^(\d+)(\/\d+)*$'
);

🔍 Regular Data Integrity Checks

-- Find orphaned projects (ancestry points to non-existent parent)
SELECT p.id, p.name, p.ancestry
FROM projects p
WHERE p.ancestry IS NOT NULL
AND NOT EXISTS (
    SELECT 1 FROM projects parent
    WHERE parent.id = CASE 
        WHEN p.ancestry ~ '.*\/(\d+)$' THEN
            CAST(regexp_replace(p.ancestry, '.*\/(\d+)$', '\1') AS BIGINT)
        WHEN p.ancestry ~ '^\d+$' THEN
            CAST(p.ancestry AS BIGINT)
    END
);

🛡️ Error Handling

-- Function to safely extract parent ID
CREATE OR REPLACE FUNCTION extract_parent_id(ancestry_text TEXT)
RETURNS BIGINT AS $$
BEGIN
    IF ancestry_text IS NULL OR ancestry_text = '' THEN
        RETURN NULL;
    END IF;

    IF ancestry_text ~ '.*\/(\d+)$' THEN
        RETURN CAST(regexp_replace(ancestry_text, '.*\/(\d+)$', '\1') AS BIGINT);
    ELSIF ancestry_text ~ '^\d+$' THEN
        RETURN CAST(ancestry_text AS BIGINT);
    ELSE
        RETURN NULL;
    END IF;
EXCEPTION 
    WHEN OTHERS THEN
        RETURN NULL;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Usage
SELECT p.*, parent.name AS parent_name
FROM projects p
LEFT JOIN projects parent ON extract_parent_id(p.ancestry) = parent.id;

📊 Monitoring and Maintenance

-- Query to analyze ancestry data quality
SELECT 
    'Total Projects' AS metric,
    COUNT(*) AS count
FROM projects

UNION ALL

SELECT 
    'Projects with Ancestry' AS metric,
    COUNT(*) AS count
FROM projects 
WHERE ancestry IS NOT NULL

UNION ALL

SELECT 
    'Valid Ancestry Format' AS metric,
    COUNT(*) AS count
FROM projects 
WHERE ancestry ~ '^(\d+)(\/\d+)*$'

UNION ALL

SELECT 
    'Orphaned Projects' AS metric,
    COUNT(*) AS count
FROM projects p
WHERE p.ancestry IS NOT NULL
AND extract_parent_id(p.ancestry) NOT IN (SELECT id FROM projects);

📝 Conclusion

Working with ancestry data in PostgreSQL requires careful handling of string manipulation, type conversion, and edge cases. By following the techniques outlined in this guide, you can:

🎯 Key Takeaways

🔍 Use robust regex patterns to handle different ancestry formats
🛡️ Always handle edge cases like NULL values and malformed strings
⚡ Consider performance implications and use appropriate indexing
✅ Implement data validation to maintain ancestry integrity
🔧 Create reusable functions for complex extraction logic

💡 Final Recommendations

🎯 Test thoroughly with various ancestry formats
📊 Monitor query performance and optimize as needed
🔄 Consider alternative approaches like ltree for complex hierarchies
📚 Document your ancestry format for team members
🛠️ Implement proper error handling in production code

The techniques demonstrated here provide a solid foundation for working with hierarchical data in PostgreSQL. Whether you’re building organizational charts, category trees, or project hierarchies, these patterns will help you extract and manipulate ancestry data effectively and reliably! 🚀

📖 Additional Resources

PostgreSQL String Functions Documentation
PostgreSQL ltree Extension (for advanced hierarchical data)
Regular Expressions in PostgreSQL
PostgreSQL Performance Tips

PostgreSQL commands to remember

List of commands to remember using postgres DB managment system.

# login to psql client
psql postgres # OR
psql -U postgres
create database mydb; # create db
create user abhilash with SUPERUSER CREATEDB CREATEROLE encrypted password 'abhilashPass!'; 
grant all privileges on database mydb to myuser; # add privileges

Connect to DB, List tables and users, functions, views, schema

\l # lists all the databases
\c dbname # connect to db
\dt # show tables
\d table_name # Describe a table
\dn # List available schema
\df #  List available functions
\dS [your_table_name] # List triggers
\dv # List available views
\du # lists all user accounts and roles 
\du+ # is the extended version which shows even more information.

Show history, save to file, edit using editor, execution time, help

SELECT version(); # version of psql
\g  # Execute the previous command
\s # Command history
\s filename # save Command history to a file
\i filename # Execute psql commands from a file
\? # help on psql commands
\h ALTER TABLE # To get help on specific PostgreSQL statement
\timing #  Turn on/off query execution time
\e # Edit command in your own editor
\e [function_name] # It is more useful when you edit a function in the editor. Do \df for functions
\o [file_name] # send all next query results to file
    \o out.txt
    \dt 
    \o # switch
    \dt

Change output, Quit psql

# Switch output options
\a command switches from aligned to non-aligned column output.
\H command formats the output to HTML format.
\q # quit psql

Reference: https://www.postgresqltutorial.com/postgresql-administration/psql-commands/

PostgreSQL Cheat Sheet

CREATE DATABASE

CREATE DATABASE dbName;

CREATE TABLE (with auto numbering integer id)

CREATE TABLE tableName (
 id serial PRIMARY KEY,
 name varchar(50) UNIQUE NOT NULL,
 dateCreated timestamp DEFAULT current_timestamp
);

Add a primary key

ALTER TABLE tableName ADD PRIMARY KEY (id);

Create an INDEX

CREATE UNIQUE INDEX indexName ON tableName (columnNames);

Backup a database (command line)

pg_dump dbName > dbName.sql

Backup all databases (command line)

pg_dumpall > pgbackup.sql

Run a SQL script (command line)

psql -f script.sql databaseName

Search using a regular expression

SELECT column FROM table WHERE column ~ 'foo.*';

The first N records

SELECT columns FROM table LIMIT 10;

Pagination

SELECT cols FROM table LIMIT 10 OFFSET 30;

Prepared Statements

PREPARE preparedInsert (int, varchar) AS
  INSERT INTO tableName (intColumn, charColumn) VALUES ($1, $2);
EXECUTE preparedInsert (1,'a');
EXECUTE preparedInsert (2,'b');
DEALLOCATE preparedInsert;

Create a Function

CREATE OR REPLACE FUNCTION month (timestamp) RETURNS integer 
 AS 'SELECT date_part(''month'', $1)::integer;'
LANGUAGE 'sql';

Table Maintenance

VACUUM ANALYZE table;

Reindex a database, table or index

REINDEX DATABASE dbName;

Show query plan

EXPLAIN SELECT * FROM table;

Import from a file

COPY destTable FROM '/tmp/somefile';

Show all runtime parameters

SHOW ALL;

Grant all permissions to a user

GRANT ALL PRIVILEGES ON table TO username;

Perform a transaction

BEGIN TRANSACTION 
 UPDATE accounts SET balance += 50 WHERE id = 1;
COMMIT;

Basic SQL

Get all columns and rows from a table

SELECT * FROM table;

Add a new row

INSERT INTO table (column1,column2)
VALUES (1, 'one');

Update a row

UPDATE table SET foo = 'bar' WHERE id = 1;

Delete a row

DELETE FROM table WHERE id = 1;

From: https://www.petefreitag.com/cheatsheets/postgresql/

Why Indexing on Unique Columns (like email) Improves Lookup 🔍

Use Case

🌲 How SQL Stores Indexes Internally (PostgreSQL)

📚 PostgreSQL uses B-Tree indexes by default.

⚙️ Unique Index = Even Faster

🧠 Summary

What Exactly is a Clustered Index in MySQL (InnoDB)?

📈 Visual for MySQL Clustered Index

📈 Quick Visual:

📚 How Secondary Indexes Work

🔵 PostgreSQL Secondary Index Behavior

🟠 MySQL InnoDB Secondary Index Behavior

📈 Quick Visual:

🌐 Now, let’s do some Real SQL Query ⛁ Examples!

1. Simple SELECT * FROM users WHERE user_id = 102;

2. SELECT username FROM users WHERE user_id = 102; (Only 1 Column)

3. SELECT * FROM users WHERE username = 'Bob'; (Secondary Index Search)

Consider the below situation:

🔵 PostgreSQL Behavior

🟠 MySQL InnoDB Behavior

🔥 Why sometimes Postgres can still be faster?

🎯 Conclusion:

Isn’t PostgreSQL also doing 2 B-tree scans? One for secondary index and one for table (row_id)?

Let’s look at PostgreSQL first:

🟠 Meanwhile in MySQL InnoDB:

Real-world Summary:

Is heap fetch a searching technique? Why is it faster than B-tree?

📚 Let’s start from the basics:

🔵 How PostgreSQL uses TID?

📈 Visual Example

Answer:

🌿 B-tree vs 📁 Heap Fetch

🎯 Final and short answer:

🧩 Our exact question:

🔵 Here’s the real truth:

📚 Why?

📈 Diagram (imagine the table file):

🤔 What happens technically?

If in memory (shared buffers / page cache):

If from disk (rare if cached):

📊 Final Step: Inside the Block

⚡ So to summarize:

🔥 Key concept:

🎯 Simple Real World Example:

🧠 Deep thought:

🔑 What Is a Composite Key?

👉 Example: Composite Key in Action

Without Composite Key:

With Composite Key:

📌 Why Use Composite Keys?

⚠️ Things to Keep in Mind

🔎 What Is a Candidate Key?

👉 Example: Users Table

Candidate Keys:

📌 Characteristics of Candidate Keys

👥 Candidate Key vs Composite Key

💡 When Designing a Database

🛍️ Example Schema: Shopping App

1. Users Table

Candidate Keys:

2. Products Table

Candidate Keys:

3. Orders Table

Candidate Keys:

4. OrderItems Table (Join Table)

Candidate Key:

👀 Summary of Candidate Keys by Table

🔹 1. Candidate Keys in SQL (PostgreSQL Example)

💎 Composite Key Example (OrderItems)

🔸 2. Candidate Keys in Rails (ActiveRecord)

✅ users Migration (with candidate keys)

✅ User Model

✅ Composite Key with OrderItem (Join Table)

Migration:

Model:

➕ Extra: Use composite_primary_keys Gem (Optional)

1. Product Table

2. Product Variant Table

🔍 What Is Data Normalization?

🧐 Why Normalize?

Why Indexing on Unique Columns (like `email`) Improves Lookup 🔍

1. **Simple `SELECT * FROM users WHERE user_id = 102;`**

2. `SELECT username FROM users WHERE user_id = 102;` (Only 1 Column)

3. **`SELECT * FROM users WHERE username = 'Bob';` (Secondary Index Search)**

✅ `users` Migration (with candidate keys)

✅ `User` Model

✅ Composite Key with `OrderItem` (Join Table)

➕ Extra: Use `composite_primary_keys` Gem (Optional)

2. Update `Gemfile`

3. Update `config/database.yml`

8. Update `.gitignore`

What does `precision: 1, scale: 1` mean?

In Rails (via `with` from gems like `scenic` or `with_cte`):

🔹 `store`

🔹 `store_accessor`