Mysql – The Rails Drop

Rails 8 App: Setup Test DB in PostgreSQL | Query Performance Using EXPLAIN ANALYZE

Now we’ll go full-on query performance pro mode using EXPLAIN ANALYZE and real plans. We’ll learn how PostgreSQL makes decisions, how to catch slow queries, and how your indexes make them 10x faster.

💎 Part 1: What is `EXPLAIN ANALYZE`?

EXPLAIN shows how PostgreSQL plans to execute your query.

ANALYZE runs the query and adds actual time, rows, loops, etc.

Syntax:

EXPLAIN ANALYZE
SELECT * FROM users WHERE username = 'bob';

✏️ Example 1: Without Index

SELECT * FROM users WHERE username = 'bob';

If username has no index, plan shows:

Seq Scan on users
  Filter: (username = 'bob')
  Rows Removed by Filter: 9999

❌ PostgreSQL scans all rows = Sequential Scan = slow!

➕ Add Index:

CREATE INDEX idx_users_username ON users (username);

Now rerun:

EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'bob';

You’ll see:

Index Scan using idx_users_username on users
  Index Cond: (username = 'bob')

✅ PostgreSQL uses B-tree index
🚀 Massive speed-up!

🔥 Want even faster?

SELECT username FROM users WHERE username = 'bob';

If PostgreSQL shows:

Index Only Scan using idx_users_username on users
  Index Cond: (username = 'bob')

🎉 Index Only Scan! = covering index success!
No heap fetch = lightning-fast.

⚠️ Note: Index-only scan only works if:

Index covers all selected columns
Table is vacuumed (PostgreSQL uses visibility map)

If you still get Seq scan output like:

test_db=# EXPLAIN ANALYSE SELECT * FROM users where username = 'aman_chetri';
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.11 rows=1 width=838) (actual time=0.031..0.034 rows=1 loops=1)
   Filter: ((username)::text = 'aman_chetri'::text)
   Rows Removed by Filter: 2
 Planning Time: 0.242 ms
 Execution Time: 0.077 ms
(5 rows)

even after adding an index, because PostgreSQL is saying:

🤔 “The table is so small (cost = 1.11), scanning the whole thing is cheaper than using the index.”
Also: Your query uses only SELECT username, which could be eligible for Index Only Scan, but heap fetch might still be needed due to visibility map.

🔧 Step-by-step Fix:

✅ 1. Add Data for Bigger Table

If the table is small (few rows), PostgreSQL will prefer Seq Scan no matter what.

Try adding ~10,000 rows:

INSERT INTO users (username, email, phone_number)
SELECT 'user_' || i, 'user_' || i || '@mail.com', '1234567890'
FROM generate_series(1, 10000) i;

Then VACUUM ANALYZE users; again and retry EXPLAIN.

✅ 2. Confirm Index Exists

First, check your index exists and is recognized:

\d users

You should see something like:

Indexes:
    "idx_users_username" btree (username)

If not, add:

CREATE INDEX idx_users_username ON users(username);

✅ 3. Run `ANALYZE` (Update Stats)

ANALYZE users;

This updates statistics — PostgreSQL might not be using the index if it thinks only one row matches or the table is tiny.

✅ 4. Vacuum for Index-Only Scan

Index-only scans require the visibility map to be set.

Run:

VACUUM ANALYZE users;

This marks pages in the table as “all-visible,” enabling PostgreSQL to avoid reading the heap.

✅ 5. Force PostgreSQL to Consider Index

You can turn off sequential scan temporarily (for testing):

SET enable_seqscan = OFF;

EXPLAIN SELECT username FROM users WHERE username = 'bob';

You should now see:

Index Scan using idx_users_username on users ...

⚠️ Use this only for testing/debugging — not in production.

💡 Extra Tip (optional): Use `EXPLAIN (ANALYZE, BUFFERS)`

EXPLAIN (ANALYZE, BUFFERS)
SELECT username FROM users WHERE username = 'bob';

This will show:

Whether heap was accessed
Buffer hits
Actual rows

📋 Summary

Step	Command
Check Index	`\d users`
Analyze table	`ANALYZE users;`
Vacuum for visibility	`VACUUM ANALYZE users;`
Disable seq scan for test	`SET enable_seqscan = OFF;`
Add more rows (optional)	`INSERT INTO ...`

🚨 How to catch bad index usage?

Always look for:

“Seq Scan” instead of “Index Scan” ➔ missing index
“Heap Fetch” ➔ not a covering index
“Rows Removed by Filter” ➔ inefficient filtering
“Loops: 1000+” ➔ possible N+1 issue

Common Pattern Optimizations

Pattern	Fix
`WHERE column = ?`	B-tree index on `column`
`WHERE column LIKE 'prefix%'`	B-tree works (with text_ops)
`SELECT col1 WHERE col2 = ?`	Covering index: `(col2, col1)` or `(col2) INCLUDE (col1)`
`WHERE col BETWEEN ?`	Composite index with range second: `(status, created_at)`
`WHERE col IN (?, ?, ?)`	Index still helps
`ORDER BY col LIMIT 10`	Index on `col` helps sort fast

⚡ Tip: Use `pg_stat_statements` to Find Slow Queries

Enable it in postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'

Then run:

SELECT query, total_exec_time, calls
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

🎯 Find your worst queries & optimize them with new indexes!

🧪 Try It Yourself

Want a little lab setup to practice?

CREATE TABLE users (
  user_id serial PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

-- Insert 100K fake rows
INSERT INTO users (username, email, phone_number)
SELECT
  'user_' || i,
  'user_' || i || '@example.com',
  '999-000-' || i
FROM generate_series(1, 100000) i;

Then test:

EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'user_5000';
Add INDEX ON username
Re-run, compare speed!

🎯 Extra Pro Tools for Query Performance

EXPLAIN ANALYZE → Always first tool
pg_stat_statements → Find slow queries in real apps
auto_explain → Log slow plans automatically
pgBadger or pgHero → Visual query monitoring

💥 Now We Know:

✅ How to read query plans
✅ When you’re doing full scans vs index scans
✅ How to achieve index-only scans
✅ How to catch bad performance early
✅ How to test and fix in real world

Happy Performance Fixing.. 🚀

Rails 8 App: Setup Test DB | Comprehensive Guide 📖 for PostgreSQL , Mysql Indexing – PostgreSQL Heap ⛰ vs Mysql InnoDB B-Tree 🌿

Enter into psql terminal:

✗ psql postgres
psql (14.17 (Homebrew))
Type "help" for help.

postgres=# \l
                                     List of databases
           Name            |  Owner   | Encoding | Collate | Ctype |   Access privileges
---------------------------+----------+----------+---------+-------+-----------------------
 studio_development | postgres | UTF8     | C       | C     |

Create a new test database
Create a users Table
Check the db and table details

postgres=# create database test_db;
CREATE DATABASE

test_db=# CREATE TABLE users (
user_id INT,
username VARCHAR(220),
email VARCHAR(150),
phone_number VARCHAR(20)
);
CREATE TABLE

test_db=# \dt
List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | users | table | abhilash
(1 row)

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           |          |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |

Add a Primary key to users and check the user table.

test_db=# ALTER TABLE users ADD PRIMARY KEY (user_id);
ALTER TABLE

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           | not null |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (user_id)

# OR add primary key when creating the table:
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

You can a unique constraint and an index added when adding a primary key.

Why does adding a primary key also add an index?

A primary key must guarantee that each value is unique and fast to find.
Without an index, the database would have to scan the whole table every time you look up a primary key, which would be very slow.
So PostgreSQL automatically creates a unique index on the primary key to make lookups efficient and to enforce uniqueness at the database level.

👉 It needs the index for speed and to enforce the “no duplicates” rule of primary keys.

What is btree?

btree stands for Balanced Tree (specifically, a “B-tree” data structure).
It’s the default index type in PostgreSQL.
B-tree indexes organize the data in a tree structure, so that searches, inserts, updates, and deletes are all very efficient — about O(log n) time.
It’s great for looking up exact matches (like WHERE user_id = 123) or range queries (like WHERE user_id BETWEEN 100 AND 200).

👉 So when you see btree, it just means it’s using a very efficient tree structure for your primary key index.

Summary in one line:
Adding a primary key automatically adds a btree index to enforce uniqueness and make lookups super fast.

In MySQL (specifically InnoDB engine, which is default now):

Primary keys always create an index automatically.
The index is a clustered index — this is different from Postgres!
The index uses a B-tree structure too, just like Postgres.

👉 So yes, MySQL also adds an index and uses a B-tree under the hood for primary keys.

But here’s a big difference:

In InnoDB, the table data itself is stored inside the primary key’s B-tree.
- That’s called a clustered index.
- It means the physical storage of the table rows follows the order of the primary key.
In PostgreSQL, the index and the table are stored separately (non-clustered by default).

Example: If you have a table like this in MySQL:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150)
);

user_id will have a B-tree clustered index.
The rows themselves will be stored sorted by user_id.

Short version:

Database	Primary Key Behavior	B-tree?	Clustered?
PostgreSQL	Separate index created for PK	Yes	No (separate by default)
MySQL (InnoDB)	PK index + Table rows stored inside the PK’s B-tree	Yes	Yes (always clustered)

Why Indexing on Unique Columns (like `email`) Improves Lookup 🔍

Use Case

You frequently run queries like:

SELECT * FROM students WHERE email = 'john@example.com';

Without an index, this results in a full table scan — checking each row one-by-one.

With an index, the database can jump directly to the row using a sorted structure, significantly reducing lookup time — especially in large tables.

🌲 How SQL Stores Indexes Internally (PostgreSQL)

📚 PostgreSQL uses B-Tree indexes by default.

When you run:

CREATE UNIQUE INDEX idx_students_on_email ON students(email);

PostgreSQL creates a balanced B-tree like this:

          m@example.com
         /              \
  d@example.com     t@example.com
  /        \           /         \
...      ...        ...         ...

✅ Keys (email values) are sorted lexicographically.
✅ Each leaf node contains a pointer to the actual row in the students table (called a tuple pointer or TID).
✅ Lookup uses binary search, giving O(log n) performance.

⚙️ Unique Index = Even Faster

Because all email values are unique, the database:

Can stop searching immediately once a match is found.
Doesn’t need to scan multiple leaf entries (no duplicates).

🧠 Summary

Feature	Value
Index Type	B-tree (default in PostgreSQL)
Lookup Time	O(log n) vs O(n) without index
Optimized for	Equality search (`WHERE email = ...`), sorting, joins
Email is unique?	✅ Yes – index helps even more (no need to check multiple rows)
Table scan avoided?	✅ Yes – PostgreSQL jumps directly via B-tree lookup

What Exactly is a Clustered Index in MySQL (InnoDB)?

🔹 In MySQL InnoDB, the primary key IS the table.

🔹 A Clustered Index means:

The table’s data rows are physically organized in the order of the primary key.
No separate storage for the table – it’s merged into the primary key’s B-tree structure.

In simple words:
👉 “The table itself lives inside the primary key B-tree.”

That’s why:

Every secondary index must store the primary key value (not a row pointer).
InnoDB can only have one clustered index (because you can’t physically order a table in two different ways).

📈 Visual for MySQL Clustered Index

Suppose you have:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(255),
  email VARCHAR(255)
);

The storage looks like:

B-tree by user_id (Clustered)

user_id  | username | email
----------------------------
101      | Alice    | a@x.com
102      | Bob      | b@x.com
103      | Carol    | c@x.com

👉 Table rows stored directly inside the B-tree nodes by user_id!

🔵 PostgreSQL (Primary Key Index = Separate)

Imagine you have a users table:

users table (physical table):

row_id | user_id | username | email
-------------------------------------
  1    |   101   | Alice    | a@example.com
  2    |   102   | Bob      | b@example.com
  3    |   103   | Carol    | c@example.com

And the Primary Key Index looks like:

Primary Key B-Tree (separate structure):

user_id -> row pointer
 101    -> row_id 1
 102    -> row_id 2
 103    -> row_id 3

👉 When you query WHERE user_id = 102, PostgreSQL goes:

Find user_id 102 in the B-tree index,
Then jump to row_id 2 in the actual table.

🔸 Index and Table are separate.
🔸 Extra step: index lookup ➔ then fetch row.

🟠 MySQL InnoDB (Primary Key Index = Clustered)

Same users table, but stored like this:

Primary Key Clustered B-Tree (index + data together):

user_id | username | email
---------------------------------
  101   | Alice    | a@example.com
  102   | Bob      | b@example.com
  103   | Carol    | c@example.com

👉 When you query WHERE user_id = 102, MySQL:

Goes straight to user_id 102 in the B-tree,
Data is already there, no extra lookup.

🔸 Index and Table are merged.
🔸 One step: direct access!

📈 Quick Visual:

PostgreSQL
(Index)    ➔    (Table Row)
    |
    ➔ extra lookup needed

MySQL InnoDB
(Index + Row Together)
    |
    ➔ data found immediately

Summary:

PostgreSQL: primary key index is separate ➔ needs 2 steps (index ➔ table).
MySQL InnoDB: primary key index is clustered ➔ 1 step (index = table).

📚 How Secondary Indexes Work

Secondary Index = an index on a column that is not the primary key.

Example:

CREATE INDEX idx_username ON users(username);

Now you have an index on username.

🔵 PostgreSQL Secondary Index Behavior

Secondary indexes are separate structures from the table (just like the primary key index).
When you query by username, PostgreSQL:
1. Finds the matching row_id using the secondary B-tree index.
2. Then fetches the full row from the table by row_id.
This is called an Index Scan + Heap Fetch.

📜 Example:

Secondary Index (username -> row_id):

username -> row_id
------------------
Alice    -> 1
Bob      -> 2
Carol    -> 3

(users table is separate)

👉 Flexible, but needs 2 steps: index (row_id) ➔ table.

🟠 MySQL InnoDB Secondary Index Behavior

In InnoDB, secondary indexes don’t store row pointers.
Instead, they store the primary key value!

So:

Find the matching primary key using the secondary index.
Use the primary key to find the actual row inside the clustered primary key B-tree.

📜 Example:

Secondary Index (username -> user_id):

username -> user_id
--------------------
Alice    -> 101
Bob      -> 102
Carol    -> 103

(Then find user_id inside Clustered B-Tree)

✅ Needs 2 steps too: secondary index (primary key) ➔ clustered table.

📈 Quick Visual:

Feature	PostgreSQL	MySQL InnoDB
Secondary Index	username ➔ row pointer (row_id)	username ➔ primary key (user_id)
Fetch Full Row	Use row_id to get table row	Use primary key to find row in clustered index
Steps to Fetch	Index ➔ Table	Index ➔ Primary Key ➔ Table (clustered)

Action	PostgreSQL	MySQL InnoDB
Primary Key Lookup	Index ➔ Row (2 steps)	Clustered Index (1 step)
Secondary Index Lookup	Index (row_id) ➔ Row (2 steps)	Secondary Index (PK) ➔ Row (2 steps)
Storage Model	Separate index and table	Primary key and table merged (clustered)

🌐 Now, let’s do some Real SQL Query ⛁ Examples!

1. **Simple `SELECT * FROM users WHERE user_id = 102;`**

PostgreSQL:
Look into PK btree ➔ find row pointer ➔ fetch row separately.
MySQL InnoDB:
Directly find the row inside the PK B-tree (no extra lookup).

✅ MySQL is a little faster here because it needs only 1 step!

2. `SELECT username FROM users WHERE user_id = 102;` (Only 1 Column)

PostgreSQL:
Might do an Index Only Scan if all needed data is in the index (very fast).
MySQL:
Clustered index contains all columns already, no special optimization needed.

✅ Both can be very fast, but PostgreSQL shines if the index is “covering” (i.e., contains all needed columns). Because index table has less size than clustered index of mysql.

3. **`SELECT * FROM users WHERE username = 'Bob';` (Secondary Index Search)**

PostgreSQL:
Secondary index on username ➔ row pointer ➔ fetch table row.
MySQL:
Secondary index on username ➔ get primary key ➔ clustered index lookup ➔ fetch data.

✅ Both are 2 steps, but MySQL needs 2 different B-trees: secondary ➔ primary clustered.

Consider the below situation:

SELECT username FROM users WHERE user_id = 102;

user_id is the Primary Key.
You only want username, not full row.

Now:

🔵 PostgreSQL Behavior

👉 In PostgreSQL, by default:

It uses the primary key btree to find the row pointer.
Then fetches the full row from the table (heap fetch).

👉 But PostgreSQL has an optimization called Index-Only Scan.

If all requested columns are already present in the index,
And if the table visibility map says the row is still valid (no deleted/updated row needing visibility check),
Then Postgres does not fetch the heap.

👉 So in this case:

If the primary key index also stores username internally (or if an extra index is created covering username), Postgres can satisfy the query just from the index.

✅ Result: No table lookup needed ➔ Very fast (almost as fast as InnoDB clustered lookup).

📢 Postgres primary key indexes usually don’t store extra columns, unless you specifically create an index that includes them (INCLUDE (username) syntax in modern Postgres 11+).

🟠 MySQL InnoDB Behavior

In InnoDB:
Since the primary key B-tree already holds all columns (user_id, username, email),
It directly finds the row from the clustered index.
So when you query by PK, even if you only need one column, it has everything inside the same page/block.

✅ One fast lookup.

🔥 Why sometimes Postgres can still be faster?

If PostgreSQL uses Index-Only Scan, and the page is already cached, and no extra visibility check is needed,
Then Postgres may avoid touching the table at all and only scan the tiny index pages.
In this case, for very narrow queries (e.g., only 1 small field), Postgres can outperform even MySQL clustered fetch.

💡 Because fetching from a small index page (~8KB) is faster than reading bigger table pages.

🎯 Conclusion:

✅ MySQL clustered index is always fast for PK lookups.
✅ PostgreSQL can be even faster for small/narrow queries if Index-Only Scan is triggered.

👉 Quick Tip:

In PostgreSQL, you can force an index to include extra columns by using: CREATE INDEX idx_user_id_username ON users(user_id) INCLUDE (username); Then index-only scans become more common and predictable! 🚀

Isn’t PostgreSQL also doing 2 B-tree scans? One for secondary index and one for table (row_id)?

When you query with a secondary index, like:

SELECT * FROM users WHERE username = 'Bob';

In MySQL InnoDB, I said:
1. Find in secondary index (username ➔ user_id)
2. Then go to primary clustered index (user_id ➔ full row)

Let’s look at PostgreSQL first:

♦️ Step 1: Search Secondary Index B-tree on username.

It finds the matching TID (tuple ID) or row pointer.
- TID is a pair (block_number, row_offset).
- Not a B-tree! Just a physical pointer.

♦️ Step 2: Use the TID to directly jump into the heap (the table).

The heap (table) is not a B-tree — it’s just a collection of unordered pages (blocks of rows).
PostgreSQL goes directly to the block and offset — like jumping straight into a file.

🔔 Important:

Secondary index ➔ TID ➔ heap fetch.
No second B-tree traversal for the table!

🟠 Meanwhile in MySQL InnoDB:

♦️ Step 1: Search Secondary Index B-tree on username.

It finds the Primary Key value (user_id).

♦️ Step 2: Now, search the Primary Key Clustered B-tree to find the full row.

Need another B-tree traversal based on user_id.

🔔 Important:

Secondary index ➔ Primary Key B-tree ➔ data fetch.
Two full B-tree traversals!

Real-world Summary:

♦️ PostgreSQL

Secondary index gives a direct shortcut to the heap.
One B-tree scan (secondary) ➔ Direct heap fetch.

♦️ MySQL

Secondary index gives PK.
Then another B-tree scan (primary clustered) to find full row.

✅ PostgreSQL does not scan a second B-tree when fetching from the table — just a direct page lookup using TID.

✅ MySQL does scan a second B-tree (primary clustered index) when fetching full row after secondary lookup.

Is heap fetch a searching technique? Why is it faster than B-tree?

📚 Let’s start from the basics:

When PostgreSQL finds a match in a secondary index, what it gets is a TID.

♦️ A TID (Tuple ID) is a physical address made of:

Block Number (page number)
Offset Number (row slot inside the page)

Example:

TID = (block_number = 1583, offset = 7)

🔵 How PostgreSQL uses TID?

It directly calculates the location of the block (disk page) using block_number.
It reads that block (if not already in memory).
Inside that block, it finds the row at offset 7.

♦️ No search, no btree, no extra traversal — just:

Find the page (via simple number addressing)
Find the row slot

📈 Visual Example

Secondary index (username ➔ TID):

username	TID
Alice	(1583, 7)
Bob	(1592, 3)
Carol	(1601, 12)

♦️ When you search for “Bob”:

Find (1592, 3) from secondary index B-tree.
Jump directly to Block 1592, Offset 3.
Done ✅!

Answer:

Heap fetch is NOT a search.
It’s a direct address lookup (fixed number).
Heap = unordered collection of pages.
Pages = fixed-size blocks (usually 8 KB each).
TID gives an exact GPS location inside heap — no searching required.

That’s why heap fetch is faster than another B-tree search:

No binary search, no B-tree traversal needed.
Only a simple disk/memory read + row offset jump.

🌿 B-tree vs 📁 Heap Fetch

Action	B-tree	Heap Fetch
What it does	Binary search inside sorted tree nodes	Direct jump to block and slot
Steps needed	Traverse nodes (root ➔ internal ➔ leaf)	Directly read page and slot
Time complexity	O(log n)	O(1)
Speed	Slower (needs comparisons)	Very fast (direct)

🎯 Final and short answer:

♦️ In PostgreSQL, after finding the TID in the secondary index, the heap fetch is a direct, constant-time (O(1)) access — no B-tree needed!
♦️ This is faster than scanning another B-tree like in MySQL InnoDB.

🧩 Our exact question:

When we say:

Jump directly to Block 1592, Offset 3.

We are thinking:

There are thousands of blocks.
How can we directly jump to block 1592?
Shouldn’t that be O(n) (linear time)?
Shouldn’t there be some traversal?

🔵 Here’s the real truth:

No traversal needed.
No O(n) work.
Accessing Block 1592 is O(1) — constant time.

📚 Why?

Because of how files, pages, and memory work inside a database.

When PostgreSQL stores a table (the “heap”), it saves it in a file on disk.
The file is just a long array of fixed-size pages.

Each page = 8KB (default in Postgres).
Each block = 1 page = fixed 8KB chunk.
Block 0 is the first 8KB.
Block 1 is next 8KB.
Block 2 is next 8KB.
…
Block 1592 = (1592 × 8 KB) offset from the beginning.

✅ So block 1592 is simply located at 1592 × 8192 bytes offset from the start of the file.

✅ Operating systems (and PostgreSQL’s Buffer Manager) know exactly how to seek to that byte position without reading everything before it.

📈 Diagram (imagine the table file):

+-----------+-----------+-----------+-----------+-----------+------+
| Block 0   | Block 1   | Block 2   | Block 3   | Block 4   |  ... |
+-----------+-----------+-----------+-----------+-----------+------+
  (8KB)       (8KB)       (8KB)       (8KB)       (8KB)

Finding Block 1592 ➔
Seek directly to offset 1592 * 8192 bytes ➔
Read 8KB ➔
Find row at Offset 3 inside it.

🤔 What happens technically?

If in memory (shared buffers / page cache):

PostgreSQL checks its buffer pool (shared memory).
“Do I already have block 1592 cached?”
- ✅ Yes: immediately access memory address.
- ❌ No: Load block 1592 from disk into memory.

If from disk (rare if cached):

File systems (ext4, xfs, etc) know how to seek to a byte offset in a file without reading previous parts.
Seek to (block_number × 8192) bytes.
Read exactly 8KB into memory.
No need to scan the whole file linearly.

📊 Final Step: Inside the Block

Once the block is loaded:

The block internally is structured like an array of tuples.
Each tuple is placed into an offset slot.
Offset 3 ➔ third tuple inside the block.

♦️ Again, this is just array lookup — no traversal, no O(n).

⚡ So to summarize:

Question	Answer
How does PostgreSQL jump directly to block?	Using the block number × page size calculation (fixed offset math).
Is it O(n)?	❌ No, it’s O(1) constant time
Is there any traversal?	❌ No traversal. Just a seek + memory read.
How fast?	Extremely fast if cached, still fast if disk seeks.

🔥 Key concept:

PostgreSQL heap access is O(1) because the heap file is a flat sequence of fixed-size pages, and the TID gives exact coordinates.

🎯 Simple Real World Example:

Imagine you have a giant book (the table file).
Each page of the book is numbered (block number).

If someone says:

👉 “Go to page 1592.”

♦️ You don’t need to read pages 1 to 1591 first.
♦️ You just flip directly to page 1592.

📗 Same idea: no linear traversal, just positional lookup.

🧠 Deep thought:

Because blocks are fixed size and TID is known,
heap fetch is almost as fast as reading a small array.

(Actually faster than searching B-tree because B-tree needs multiple comparisons at each node.)

Enjoy SQL! 🚀

Basic Software installation| Moving micro-services into AWS EC2 instance – Part 1

As I mentioned in the previous post, I have decided to move away from micro-services. To achieve this, I am taking an AWS EC2 instance and configuring each micro-service on this instance. For this setup, I am using an Ubuntu 16.04 machine because my application setup is a bit old. However, if you have newer versions of Rails, Ruby, etc., you may want to choose Ubuntu 20.04.

Our setup includes Ruby on Rails (5.2.1) micro-services (5-10 in number), a NodeJS application, a Sinatra Application, and an Angular 9.1 Front-End Application.

To begin, go to the AWS EC2 home page and select an Ubuntu 16.04 machine with default configurations and SSH enabled.

https://ap-south-1.console.aws.amazon.com/ec2/v2/home

Now login to this new instance and install all the packages we needed for our setup.

Software Installation

Update the package list.

sudo apt-get update

Install Ruby dependencies.

sudo apt-get install ruby-dev
sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
sudo apt-get install graphviz

Install NodeJS

curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
sudo apt-get install -y nodejs
node -v

Install yarn and other dependencies.

curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt-get update

sudo apt-get install git-core zlib1g-dev build-essential libssl-dev libreadline-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt1-dev libcurl4-openssl-dev software-properties-common libffi-dev nodejs yarn

Install Mysql 5.7 (Remember this is for Ubuntu 16.04, 18.04 versions)

sudo apt-get install mysql-server-5.7 mysql-client-core-5.7 libmysqlclient-dev
sudo service mysql status # or
systemctl status mysql
username: <your-username>, password: <your-password>

You can also try
mysql_secure_installation, if you use other mysql version.

Note that if you are setting up Ubuntu 20.04, there is a significant change in MySQL, as the version of MySQL is now 8.0 instead of 5.7. If you have applications running in MySQL 5.7, it is recommended that you set up and use Ubuntu 16.04 or 18.04.

We will continue the installation process in our next post.

Liferay 7.3: Connect with MYSQL database

Go inside the downloaded liferay portal folder. And start the tomcat server

cd ~/liferay-ce-portal-tomcat-7.3.0-ga1-20200127150653953/liferay-ce-portal-7.3.0-ga1/tomcat-9.0.17/bin

sh startup.sh

Goto http://localhost:8080/ and start setting up liferay configurations as follows:

*Do the basic configuration and change database to mysql*

Mysql database is recommended fot staging and production environment.

*JDBC URL:*
*jdbc:mysql://localhost/lportal?characterEncoding=UTF-8&dontTrackOpenResources=true&holdResultsOpenOverStatementClose=true&serverTimezone*
*=GMT&useFastDateParsing=false&useUnicode=true*
*JDBC Driver Class Name: com.mysql.cj.jdbc.Driver*

If you get the error like this, some of your provided mysql configuration went wrong. Check your mysql username and password.

If the given mysql configuration is correct, then you have to check the Liferay DB exists in your Mysql database. Else create the DB `lportal` as follows:

Now click on the Configure button, you will be ready to go

Open ‘~/liferay-ce-portal-tomcat-7.3.0-ga1-20200127150653953/liferay-ce-portal-7.3.0-ga1/portal-setup-wizard.properties’ file and you can see the above configurations added by liferay.

You can see the above server log in Dev studio. Here mentioned about which property files loads during the start and picking MYSQL DB, creating tables for first time in `lportal` database, inserting data to the table

You can see the following tables populated in the database lportal

mysql> show tables;
+--------------------------------+
| Tables_in_lportal              |
+--------------------------------+
| AMImageEntry                   |
| AccountEntry                   |
| AccountEntryOrganizationRel    |
| AccountEntryUserRel            |
| AccountRole                    |
| Account_                       |
| Address                        |
| AnnouncementsDelivery          |
| AnnouncementsEntry             |
| AnnouncementsFlag              |
| AppBuilderApp                  |
| AppBuilderAppDeployment        |
| AssetAutoTaggerEntry           |
| AssetCategory                  |
| AssetCategoryProperty          |
| AssetDisplayPageEntry          |
| AssetEntries_AssetCategories   |
| AssetEntries_AssetTags         |
| AssetEntry                     |
| AssetEntryAssetCategoryRel     |
| AssetEntryUsage                |
| AssetLink                      |
| AssetListEntry                 |
| AssetListEntryAssetEntryRel    |
| AssetListEntrySegmentsEntryRel |
| AssetListEntryUsage            |
| AssetTag                       |
| AssetVocabulary                |
| Audit_AuditEvent               |
| BackgroundTask                 |
| BatchEngineExportTask          |
| BatchEngineImportTask          |
| BlogsEntry                     |
| BlogsStatsUser                 |
| BrowserTracker                 |
| CTCollection                   |
| CTEntry                        |
| CTMessage                      |
| CTPreferences                  |
| CTProcess                      |
| CTSContent                     |
| Calendar                       |
| CalendarBooking                |
| CalendarNotificationTemplate   |
| CalendarResource               |
| ChangesetCollection            |
| ChangesetEntry                 |
| ClassName_                     |
| Company                        |
| CompanyInfo                    |
| Configuration_                 |
| Contact_                       |
| Contacts_Entry                 |
| Counter                        |
| Country                        |
| DDLRecord                      |
| DDLRecordSet                   |
| DDLRecordSetVersion            |
| DDLRecordVersion               |
| DDMContent                     |
| DDMDataProviderInstance        |
| DDMDataProviderInstanceLink    |
| DDMFormInstance                |
| DDMFormInstanceRecord          |
| DDMFormInstanceRecordVersion   |
| DDMFormInstanceVersion         |
| DDMStorageLink                 |
| DDMStructure                   |
| DDMStructureLayout             |
| DDMStructureLink               |
| DDMStructureVersion            |
| DDMTemplate                    |
| DDMTemplateLink                |
| DDMTemplateVersion             |
| DEDataDefinitionFieldLink      |
| DEDataListView                 |
| DLContent                      |
| DLFileEntry                    |
| DLFileEntryMetadata            |
| DLFileEntryType                |
| DLFileEntryTypes_DLFolders     |
| DLFileRank                     |
| DLFileShortcut                 |
| DLFileVersion                  |
| DLFileVersionPreview           |
| DLFolder                       |
| DLOpenerFileEntryReference     |
| DLSyncEvent                    |
| EmailAddress                   |
| ExpandoColumn                  |
| ExpandoRow                     |
| ExpandoTable                   |
| ExpandoValue                   |
| ExportImportConfiguration      |
| FragmentCollection             |
| FragmentEntry                  |
| FragmentEntryLink              |
| FriendlyURLEntry               |
| FriendlyURLEntryLocalization   |
| FriendlyURLEntryMapping        |
| Group_                         |
| Groups_Orgs                    |
| Groups_Roles                   |
| Groups_UserGroups              |
| HtmlPreviewEntry               |
| IM_MemberRequest               |
| Image                          |
| JournalArticle                 |
| JournalArticleLocalization     |
| JournalArticleResource         |
| JournalContentSearch           |
| JournalFeed                    |
| JournalFolder                  |
| KBArticle                      |
| KBComment                      |
| KBFolder                       |
| KBTemplate                     |
| KaleoAction                    |
| KaleoCondition                 |
| KaleoDefinition                |
| KaleoDefinitionVersion         |
| KaleoInstance                  |
| KaleoInstanceToken             |
| KaleoLog                       |
| KaleoNode                      |
| KaleoNotification              |
| KaleoNotificationRecipient     |
| KaleoTask                      |
| KaleoTaskAssignment            |
| KaleoTaskAssignmentInstance    |
| KaleoTaskForm                  |
| KaleoTaskFormInstance          |
| KaleoTaskInstanceToken         |
| KaleoTimer                     |
| KaleoTimerInstanceToken        |
| KaleoTransition                |
| Layout                         |
| LayoutBranch                   |
| LayoutClassedModelUsage        |
| LayoutFriendlyURL              |
| LayoutPageTemplateCollection   |
| LayoutPageTemplateEntry        |
| LayoutPageTemplateStructure    |
| LayoutPageTemplateStructureRel |
| LayoutPrototype                |
| LayoutRevision                 |
| LayoutSEOEntry                 |
| LayoutSEOSite                  |
| LayoutSet                      |
| LayoutSetBranch                |
| LayoutSetPrototype             |
| ListType                       |
| Lock_                          |
| MBBan                          |
| MBCategory                     |
| MBDiscussion                   |
| MBMailingList                  |
| MBMessage                      |
| MBStatsUser                    |
| MBThread                       |
| MBThreadFlag                   |
| MDRAction                      |
| MDRRule                        |
| MDRRuleGroup                   |
| MDRRuleGroupInstance           |
| Marketplace_App                |
| Marketplace_Module             |
| MembershipRequest              |
| OA2Auths_OA2ScopeGrants        |
| OAuth2Application              |
| OAuth2ApplicationScopeAliases  |
| OAuth2Authorization            |
| OAuth2ScopeGrant               |
| OrgGroupRole                   |
| OrgLabor                       |
| Organization_                  |
| PasswordPolicy                 |
| PasswordPolicyRel              |
| PasswordTracker                |
| Phone                          |
| PluginSetting                  |
| PollsChoice                    |
| PollsQuestion                  |
| PollsVote                      |
| PortalPreferences              |
| Portlet                        |
| PortletItem                    |
| PortletPreferences             |
| QUARTZ_BLOB_TRIGGERS           |
| QUARTZ_CALENDARS               |
| QUARTZ_CRON_TRIGGERS           |
| QUARTZ_FIRED_TRIGGERS          |
| QUARTZ_JOB_DETAILS             |
| QUARTZ_LOCKS                   |
| QUARTZ_PAUSED_TRIGGER_GRPS     |
| QUARTZ_SCHEDULER_STATE         |
| QUARTZ_SIMPLE_TRIGGERS         |
| QUARTZ_SIMPROP_TRIGGERS        |
| QUARTZ_TRIGGERS                |
| RatingsEntry                   |
| RatingsStats                   |
| ReadingTimeEntry               |
| RecentLayoutBranch             |
| RecentLayoutRevision           |
| RecentLayoutSetBranch          |
| Region                         |
| Release_                       |
| Repository                     |
| RepositoryEntry                |
| ResourceAction                 |
| ResourcePermission             |
| Role_                          |
| SAPEntry                       |
| SegmentsEntry                  |
| SegmentsEntryRel               |
| SegmentsEntryRole              |
| SegmentsExperience             |
| SegmentsExperiment             |
| SegmentsExperimentRel          |
| ServiceComponent               |
| SharingEntry                   |
| SiteFriendlyURL                |
| SiteNavigationMenu             |
| SiteNavigationMenuItem         |
| SocialActivity                 |
| SocialActivityAchievement      |
| SocialActivityCounter          |
| SocialActivityLimit            |
| SocialActivitySet              |
| SocialActivitySetting          |
| SocialRelation                 |
| SocialRequest                  |
| Subscription                   |
| SystemEvent                    |
| Team                           |
| Ticket                         |
| TrashEntry                     |
| TrashVersion                   |
| UserGroup                      |
| UserGroupGroupRole             |
| UserGroupRole                  |
| UserGroups_Teams               |
| UserIdMapper                   |
| UserNotificationDelivery       |
| UserNotificationEvent          |
| UserTracker                    |
| UserTrackerPath                |
| User_                          |
| Users_Groups                   |
| Users_Orgs                     |
| Users_Roles                    |
| Users_Teams                    |
| Users_UserGroups               |
| ViewCountEntry                 |
| VirtualHost                    |
| WebDAVProps                    |
| Website                        |
| WikiNode                       |
| WikiPage                       |
| WikiPageResource               |
| WorkflowDefinitionLink         |
| WorkflowInstanceLink           |
+--------------------------------+
262 rows in set (0.00 sec)

mysql>

Here is the list of Users created. We are selected `Add Sample data` option at the time configuration. That is why we can see lot of users with my domain is listed

Email: provide the email given at the time of register

Password: test (default password for version 7.3)

If you are using Liferay 7.2 version, then this default password may not work. In this version liferay generates a random password, that you can check in the log file as below:

cd ~/Downloads/liferay-ce-portal-tomcat-7.2.1-ga2-20191111141448326/liferay-ce-portal-7.2.1-ga2/tomcat-9.0.17/logs/ && tail -f catalina.out

Search word password. You can get it.

After login you can see this terms of use page

*DONE! Congrats, you can see the admin home page with all controls*

changing to new db from old db

Add the new db name ‘liferay721’ to portal-setup-wizard.properties

.........
jdbc.default.url=jdbc:mysql://localhost/liferay721?characterEncoding=UTF-8&dontTrackOpenResources=true&holdResultsOpenOverStatementClose=true&serverTimezone=GMT&useFastDateParsing=false&useUnicode=true
.........

Restart the server. Server Log:

Starting Liferay Community Edition Portal 7.2.1 CE GA2 (Mueller / Build 7201 / November 12, 2019)

2020-05-08 14:45:53.921 INFO  [main][StartupHelper:75] There are no patches installed
2020-05-08 14:45:54.000 WARN  [main][ReleaseLocalServiceImpl:238] Table 'liferay721.Release_' doesn't exist
2020-05-08 14:45:54.003 INFO  [main][ReleaseLocalServiceImpl:129] Create tables and populate with default data
...................
...................

Issues you may face during the setup

Once I created a DB names liferay_7_3_1 and the setup wizard finishes successfully and message shown to restart the server for get into the portal.

When I restart it I received the following issue:

An unexpected system error occurred.

java.lang.NullPointerException

Database is not setup properly. Check the DB name. Use characters as DB name if possible and start the server. If proper DB setup is done, then you can see 250 – 280 tables in database

and the following tables were created in the database named liferay_7_3_1

Only 155 rows

+--------------------------------+
| Tables_in_liferay_7_3_1        |
+--------------------------------+
| AccountEntry                   |
| AccountEntryOrganizationRel    |
| AccountEntryUserRel            |
.........
|
| Website                        |
| WikiNode                       |
| WikiPage                       |
| WikiPageResource               |
| WorkflowDefinitionLink         |
| WorkflowInstanceLink           |
+--------------------------------+
155 rows in set (0.00 sec)

Then I renamed the DB to liferay731. Then it works! So beware of underscore characters in your DB name.

Another error is shown when the server is not able to connect to your database, or database is not created that is specified as in your portal.ext.properties or portal-setup-wizard.properties

💎 Part 1: What is EXPLAIN ANALYZE?

✏️ Example 1: Without Index

➕ Add Index:

🔥 Want even faster?

🔧 Step-by-step Fix:

✅ 1. Add Data for Bigger Table

✅ 2. Confirm Index Exists

✅ 3. Run ANALYZE (Update Stats)

✅ 4. Vacuum for Index-Only Scan

✅ 5. Force PostgreSQL to Consider Index

💡 Extra Tip (optional): Use EXPLAIN (ANALYZE, BUFFERS)

📋 Summary

🚨 How to catch bad index usage?

Common Pattern Optimizations

⚡ Tip: Use pg_stat_statements to Find Slow Queries

🧪 Try It Yourself

🎯 Extra Pro Tools for Query Performance

💥 Now We Know:

Why Indexing on Unique Columns (like email) Improves Lookup 🔍

Use Case

🌲 How SQL Stores Indexes Internally (PostgreSQL)

📚 PostgreSQL uses B-Tree indexes by default.

⚙️ Unique Index = Even Faster

🧠 Summary

What Exactly is a Clustered Index in MySQL (InnoDB)?

📈 Visual for MySQL Clustered Index

📈 Quick Visual:

📚 How Secondary Indexes Work

🔵 PostgreSQL Secondary Index Behavior

🟠 MySQL InnoDB Secondary Index Behavior

📈 Quick Visual:

🌐 Now, let’s do some Real SQL Query ⛁ Examples!

1. Simple SELECT * FROM users WHERE user_id = 102;

2. SELECT username FROM users WHERE user_id = 102; (Only 1 Column)

3. SELECT * FROM users WHERE username = 'Bob'; (Secondary Index Search)

Consider the below situation:

🔵 PostgreSQL Behavior

🟠 MySQL InnoDB Behavior

🔥 Why sometimes Postgres can still be faster?

🎯 Conclusion:

Isn’t PostgreSQL also doing 2 B-tree scans? One for secondary index and one for table (row_id)?

Let’s look at PostgreSQL first:

🟠 Meanwhile in MySQL InnoDB:

Real-world Summary:

Is heap fetch a searching technique? Why is it faster than B-tree?

📚 Let’s start from the basics:

🔵 How PostgreSQL uses TID?

📈 Visual Example

Answer:

🌿 B-tree vs 📁 Heap Fetch

🎯 Final and short answer:

🧩 Our exact question:

🔵 Here’s the real truth:

📚 Why?

📈 Diagram (imagine the table file):

🤔 What happens technically?

If in memory (shared buffers / page cache):

If from disk (rare if cached):

📊 Final Step: Inside the Block

⚡ So to summarize:

🔥 Key concept:

🎯 Simple Real World Example:

🧠 Deep thought:

changing to new db from old db

Issues you may face during the setup

💎 Part 1: What is `EXPLAIN ANALYZE`?

✅ 3. Run `ANALYZE` (Update Stats)

💡 Extra Tip (optional): Use `EXPLAIN (ANALYZE, BUFFERS)`

⚡ Tip: Use `pg_stat_statements` to Find Slow Queries

Why Indexing on Unique Columns (like `email`) Improves Lookup 🔍

1. **Simple `SELECT * FROM users WHERE user_id = 102;`**

2. `SELECT username FROM users WHERE user_id = 102;` (Only 1 Column)

3. **`SELECT * FROM users WHERE username = 'Bob';` (Secondary Index Search)**