Rails 8 App: Setup Test DB in PostgreSQL | Query Performance Using EXPLAIN ANALYZE

Now we’ll go full-on query performance pro mode using EXPLAIN ANALYZE and real plans. We’ll learn how PostgreSQL makes decisions, how to catch slow queries, and how your indexes make them 10x faster.

๐Ÿ’Ž Part 1: What is EXPLAIN ANALYZE?

EXPLAIN shows how PostgreSQL plans to execute your query.

ANALYZE runs the query and adds actual time, rows, loops, etc.

Syntax:

EXPLAIN ANALYZE
SELECT * FROM users WHERE username = 'bob';

โœ๏ธ Example 1: Without Index

SELECT * FROM users WHERE username = 'bob';

If username has no index, plan shows:

Seq Scan on users
  Filter: (username = 'bob')
  Rows Removed by Filter: 9999

โŒ PostgreSQL scans all rows = Sequential Scan = slow!

โž• Add Index:

CREATE INDEX idx_users_username ON users (username);

Now rerun:

EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'bob';

You’ll see:

Index Scan using idx_users_username on users
  Index Cond: (username = 'bob')

โœ… PostgreSQL uses B-tree index
๐Ÿš€ Massive speed-up!

๐Ÿ”ฅ Want even faster?

SELECT username FROM users WHERE username = 'bob';

If PostgreSQL shows:

Index Only Scan using idx_users_username on users
  Index Cond: (username = 'bob')

๐ŸŽ‰ Index Only Scan! = covering index success!
No heap fetch = lightning-fast.

โš ๏ธ Note: Index-only scan only works if:

  • Index covers all selected columns
  • Table is vacuumed (PostgreSQL uses visibility map)

If you still get Seq scan output like:

test_db=# EXPLAIN ANALYSE SELECT * FROM users where username = 'aman_chetri';
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.11 rows=1 width=838) (actual time=0.031..0.034 rows=1 loops=1)
   Filter: ((username)::text = 'aman_chetri'::text)
   Rows Removed by Filter: 2
 Planning Time: 0.242 ms
 Execution Time: 0.077 ms
(5 rows)

even after adding an index, because PostgreSQL is saying:

  • ๐Ÿค” “The table is so small (cost = 1.11), scanning the whole thing is cheaper than using the index.”
  • Also: Your query uses only SELECT username, which could be eligible for Index Only Scan, but heap fetch might still be needed due to visibility map.

๐Ÿ”ง Step-by-step Fix:

โœ… 1. Add Data for Bigger Table

If the table is small (few rows), PostgreSQL will prefer Seq Scan no matter what.

Try adding ~10,000 rows:

INSERT INTO users (username, email, phone_number)
SELECT 'user_' || i, 'user_' || i || '@mail.com', '1234567890'
FROM generate_series(1, 10000) i;

Then VACUUM ANALYZE users; again and retry EXPLAIN.

โœ… 2. Confirm Index Exists

First, check your index exists and is recognized:

\d users

You should see something like:

Indexes:
    "idx_users_username" btree (username)

If not, add:

CREATE INDEX idx_users_username ON users(username);

โœ… 3. Run ANALYZE (Update Stats)
ANALYZE users;

This updates statistics โ€” PostgreSQL might not be using the index if it thinks only one row matches or the table is tiny.

โœ… 4. Vacuum for Index-Only Scan

Index-only scans require the visibility map to be set.

Run:

VACUUM ANALYZE users;

This marks pages in the table as “all-visible,” enabling PostgreSQL to avoid reading the heap.

โœ… 5. Force PostgreSQL to Consider Index

You can turn off sequential scan temporarily (for testing):

SET enable_seqscan = OFF;

EXPLAIN SELECT username FROM users WHERE username = 'bob';

You should now see:

Index Scan using idx_users_username on users ...

โš ๏ธ Use this only for testing/debugging โ€” not in production.

๐Ÿ’ก Extra Tip (optional): Use EXPLAIN (ANALYZE, BUFFERS)
EXPLAIN (ANALYZE, BUFFERS)
SELECT username FROM users WHERE username = 'bob';

This will show:

  • Whether heap was accessed
  • Buffer hits
  • Actual rows
๐Ÿ“‹ Summary
StepCommand
Check Index\d users
Analyze tableANALYZE users;
Vacuum for visibilityVACUUM ANALYZE users;
Disable seq scan for testSET enable_seqscan = OFF;
Add more rows (optional)INSERT INTO ...

๐Ÿšจ How to catch bad index usage?

Always look for:

  • “Seq Scan” instead of “Index Scan” โž” missing index
  • “Heap Fetch” โž” not a covering index
  • “Rows Removed by Filter” โž” inefficient filtering
  • “Loops: 1000+” โž” possible N+1 issue

Common Pattern Optimizations

PatternFix
WHERE column = ?B-tree index on column
WHERE column LIKE 'prefix%'B-tree works (with text_ops)
SELECT col1 WHERE col2 = ?Covering index: (col2, col1) or (col2) INCLUDE (col1)
WHERE col BETWEEN ?Composite index with range second: (status, created_at)
WHERE col IN (?, ?, ?)Index still helps
ORDER BY col LIMIT 10Index on col helps sort fast

โšก Tip: Use pg_stat_statements to Find Slow Queries

Enable it in postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'

Then run:

SELECT query, total_exec_time, calls
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;

๐ŸŽฏ Find your worst queries & optimize them with new indexes!

๐Ÿงช Try It Yourself

Want a little lab setup to practice?

CREATE TABLE users (
  user_id serial PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

-- Insert 100K fake rows
INSERT INTO users (username, email, phone_number)
SELECT
  'user_' || i,
  'user_' || i || '@example.com',
  '999-000-' || i
FROM generate_series(1, 100000) i;

Then test:

  1. EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'user_5000';
  2. Add INDEX ON username
  3. Re-run, compare speed!

๐ŸŽฏ Extra Pro Tools for Query Performance

  • EXPLAIN ANALYZE โ†’ Always first tool
  • pg_stat_statements โ†’ Find slow queries in real apps
  • auto_explain โ†’ Log slow plans automatically
  • pgBadger or pgHero โ†’ Visual query monitoring
๐Ÿ’ฅ Now We Know:

โœ… How to read query plans
โœ… When you’re doing full scans vs index scans
โœ… How to achieve index-only scans
โœ… How to catch bad performance early
โœ… How to test and fix in real world

Happy Performance Fixing..ย ๐Ÿš€

Rails 8 App: Setup Test DB | Comprehensive Guide ๐Ÿ“– for PostgreSQL , Mysql Indexing – PostgreSQL Heap โ›ฐ vs Mysql InnoDB B-Tree ๐ŸŒฟ

Enter into psql terminal:

โœ— psql postgres
psql (14.17 (Homebrew))
Type "help" for help.

postgres=# \l
                                     List of databases
           Name            |  Owner   | Encoding | Collate | Ctype |   Access privileges
---------------------------+----------+----------+---------+-------+-----------------------
 studio_development | postgres | UTF8     | C       | C     |
  • Create a new test database
  • Create a users Table
  • Check the db and table details
postgres=# create database test_db;
CREATE DATABASE

test_db=# CREATE TABLE users (
user_id INT,
username VARCHAR(220),
email VARCHAR(150),
phone_number VARCHAR(20)
);
CREATE TABLE

test_db=# \dt
List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | users | table | abhilash
(1 row)

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           |          |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |

Add a Primary key to users and check the user table.

test_db=# ALTER TABLE users ADD PRIMARY KEY (user_id);
ALTER TABLE

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           | not null |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (user_id)

# OR add primary key when creating the table:
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

You can a unique constraint and an index added when adding a primary key.

Why does adding a primary key also add an index?

  • A primary key must guarantee that each value is unique and fast to find.
  • Without an index, the database would have to scan the whole table every time you look up a primary key, which would be very slow.
  • So PostgreSQL automatically creates a unique index on the primary key to make lookups efficient and to enforce uniqueness at the database level.

๐Ÿ‘‰ It needs the index for speed and to enforce the “no duplicates” rule of primary keys.

What is btree?

  • btree stands for Balanced Tree (specifically, a “B-tree” data structure).
  • It’s the default index type in PostgreSQL.
  • B-tree indexes organize the data in a tree structure, so that searches, inserts, updates, and deletes are all very efficient โ€” about O(log n) time.
  • It’s great for looking up exact matches (like WHERE user_id = 123) or range queries (like WHERE user_id BETWEEN 100 AND 200).

๐Ÿ‘‰ So when you see btree, it just means it’s using a very efficient tree structure for your primary key index.

Summary in one line:
Adding a primary key automatically adds a btree index to enforce uniqueness and make lookups super fast.


In MySQL (specifically InnoDB engine, which is default now):

  • Primary keys always create an index automatically.
  • The index is a clustered index โ€” this is different from Postgres!
  • The index uses a B-tree structure too, just like Postgres.

๐Ÿ‘‰ So yes, MySQL also adds an index and uses a B-tree under the hood for primary keys.

But here’s a big difference:

  • In InnoDB, the table data itself is stored inside the primary key’s B-tree.
    • Thatโ€™s called a clustered index.
    • It means the physical storage of the table rows follows the order of the primary key.
  • In PostgreSQL, the index and the table are stored separately (non-clustered by default).

Example: If you have a table like this in MySQL:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150)
);
  • user_id will have a B-tree clustered index.
  • The rows themselves will be stored sorted by user_id.

Short version:

DatabasePrimary Key BehaviorB-tree?Clustered?
PostgreSQLSeparate index created for PKYesNo (separate by default)
MySQL (InnoDB)PK index + Table rows stored inside the PK’s B-treeYesYes (always clustered)

Why Indexing on Unique Columns (like email) Improves Lookup ๐Ÿ”

Use Case

You frequently run queries like:

SELECT * FROM students WHERE email = 'john@example.com';

Without an index, this results in a full table scan โ€” checking each row one-by-one.

With an index, the database can jump directly to the row using a sorted structure, significantly reducing lookup time โ€” especially in large tables.


๐ŸŒฒ How SQL Stores Indexes Internally (PostgreSQL)

๐Ÿ“š PostgreSQL uses B-Tree indexes by default.

When you run:

CREATE UNIQUE INDEX idx_students_on_email ON students(email);

PostgreSQL creates a balanced B-tree like this:

          m@example.com
         /              \
  d@example.com     t@example.com
  /        \           /         \
...      ...        ...         ...

  • โœ… Keys (email values) are sorted lexicographically.
  • โœ… Each leaf node contains a pointer to the actual row in the students table (called a tuple pointer or TID).
  • โœ… Lookup uses binary search, giving O(log n) performance.

โš™๏ธ Unique Index = Even Faster

Because all email values are unique, the database:

  • Can stop searching immediately once a match is found.
  • Doesnโ€™t need to scan multiple leaf entries (no duplicates).

๐Ÿง  Summary

FeatureValue
Index TypeB-tree (default in PostgreSQL)
Lookup TimeO(log n) vs O(n) without index
Optimized forEquality search (WHERE email = ...), sorting, joins
Email is unique?โœ… Yes โ€“ index helps even more (no need to check multiple rows)
Table scan avoided?โœ… Yes โ€“ PostgreSQL jumps directly via B-tree lookup

What Exactly is a Clustered Index in MySQL (InnoDB)?

๐Ÿ”น In MySQL InnoDB, the primary key IS the table.

๐Ÿ”น A Clustered Index means:

  • The table’s data rows are physically organized in the order of the primary key.
  • No separate storage for the table – it’s merged into the primary keyโ€™s B-tree structure.

In simple words:
๐Ÿ‘‰ “The table itself lives inside the primary key B-tree.”

Thatโ€™s why:

  • Every secondary index must store the primary key value (not a row pointer).
  • InnoDB can only have one clustered index (because you can’t physically order a table in two different ways).
๐Ÿ“ˆ Visual for MySQL Clustered Index

Suppose you have:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(255),
  email VARCHAR(255)
);

The storage looks like:

B-tree by user_id (Clustered)

user_id  | username | email
----------------------------
101      | Alice    | a@x.com
102      | Bob      | b@x.com
103      | Carol    | c@x.com

๐Ÿ‘‰ Table rows stored directly inside the B-tree nodes by user_id!


๐Ÿ”ต PostgreSQL (Primary Key Index = Separate)

Imagine you have a users table:

users table (physical table):

row_id | user_id | username | email
-------------------------------------
  1    |   101   | Alice    | a@example.com
  2    |   102   | Bob      | b@example.com
  3    |   103   | Carol    | c@example.com

And the Primary Key Index looks like:

Primary Key B-Tree (separate structure):

user_id -> row pointer
 101    -> row_id 1
 102    -> row_id 2
 103    -> row_id 3

๐Ÿ‘‰ When you query WHERE user_id = 102, PostgreSQL goes:

  • Find user_id 102 in the B-tree index,
  • Then jump to row_id 2 in the actual table.

๐Ÿ”ธ Index and Table are separate.
๐Ÿ”ธ Extra step: index lookup โž” then fetch row.

๐ŸŸ  MySQL InnoDB (Primary Key Index = Clustered)

Same users table, but stored like this:

Primary Key Clustered B-Tree (index + data together):

user_id | username | email
---------------------------------
  101   | Alice    | a@example.com
  102   | Bob      | b@example.com
  103   | Carol    | c@example.com

๐Ÿ‘‰ When you query WHERE user_id = 102, MySQL:

  • Goes straight to user_id 102 in the B-tree,
  • Data is already there, no extra lookup.

๐Ÿ”ธ Index and Table are merged.
๐Ÿ”ธ One step: direct access!

๐Ÿ“ˆ Quick Visual:

PostgreSQL
(Index)    โž”    (Table Row)
    |
    โž” extra lookup needed

MySQL InnoDB
(Index + Row Together)
    |
    โž” data found immediately

Summary:

  • PostgreSQL: primary key index is separate โž” needs 2 steps (index โž” table).
  • MySQL InnoDB: primary key index is clustered โž” 1 step (index = table).

๐Ÿ“š How Secondary Indexes Work

Secondary Index = an index on a column that is not the primary key.

Example:

CREATE INDEX idx_username ON users(username);

Now you have an index on username.

๐Ÿ”ต PostgreSQL Secondary Index Behavior

  • Secondary indexes are separate structures from the table (just like the primary key index).
  • When you query by username, PostgreSQL:
    1. Finds the matching row_id using the secondary B-tree index.
    2. Then fetches the full row from the table by row_id.
  • This is called an Index Scan + Heap Fetch.

๐Ÿ“œ Example:

Secondary Index (username -> row_id):

username -> row_id
------------------
Alice    -> 1
Bob      -> 2
Carol    -> 3

(users table is separate)

๐Ÿ‘‰ Flexible, but needs 2 steps: index (row_id) โž” table.

๐ŸŸ  MySQL InnoDB Secondary Index Behavior

  • In InnoDB, secondary indexes don’t store row pointers.
  • Instead, they store the primary key value!

So:

  1. Find the matching primary key using the secondary index.
  2. Use the primary key to find the actual row inside the clustered primary key B-tree.

๐Ÿ“œ Example:

Secondary Index (username -> user_id):

username -> user_id
--------------------
Alice    -> 101
Bob      -> 102
Carol    -> 103

(Then find user_id inside Clustered B-Tree)

โœ… Needs 2 steps too: secondary index (primary key) โž” clustered table.

๐Ÿ“ˆ Quick Visual:

FeaturePostgreSQLMySQL InnoDB
Secondary Indexusername โž” row pointer (row_id)username โž” primary key (user_id)
Fetch Full RowUse row_id to get table rowUse primary key to find row in clustered index
Steps to FetchIndex โž” TableIndex โž” Primary Key โž” Table (clustered)
ActionPostgreSQLMySQL InnoDB
Primary Key LookupIndex โž” Row (2 steps)Clustered Index (1 step)
Secondary Index LookupIndex (row_id) โž” Row (2 steps)Secondary Index (PK) โž” Row (2 steps)
Storage ModelSeparate index and tablePrimary key and table merged (clustered)

๐ŸŒ Now, let’s do some Real SQL Query โ› Examples!

1. Simple SELECT * FROM users WHERE user_id = 102;
  • PostgreSQL:
    Look into PK btree โž” find row pointer โž” fetch row separately.
  • MySQL InnoDB:
    Directly find the row inside the PK B-tree (no extra lookup).

โœ… MySQL is a little faster here because it needs only 1 step!

2. SELECT username FROM users WHERE user_id = 102; (Only 1 Column)
  • PostgreSQL:
    Might do an Index Only Scan if all needed data is in the index (very fast).
  • MySQL:
    Clustered index contains all columns already, no special optimization needed.

โœ… Both can be very fast, but PostgreSQL shines if the index is “covering” (i.e., contains all needed columns). Because index table has less size than clustered index of mysql.

3. SELECT * FROM users WHERE username = 'Bob'; (Secondary Index Search)
  • PostgreSQL:
    Secondary index on username โž” row pointer โž” fetch table row.
  • MySQL:
    Secondary index on username โž” get primary key โž” clustered index lookup โž” fetch data.

โœ… Both are 2 steps, but MySQL needs 2 different B-trees: secondary โž” primary clustered.

Consider the below situation:

SELECT username FROM users WHERE user_id = 102;
  • user_id is the Primary Key.
  • You only want username, not full row.

Now:

๐Ÿ”ต PostgreSQL Behavior

๐Ÿ‘‰ In PostgreSQL, by default:

  • It uses the primary key btree to find the row pointer.
  • Then fetches the full row from the table (heap fetch).

๐Ÿ‘‰ But PostgreSQL has an optimization called Index-Only Scan.

  • If all requested columns are already present in the index,
  • And if the table visibility map says the row is still valid (no deleted/updated row needing visibility check),
  • Then Postgres does not fetch the heap.

๐Ÿ‘‰ So in this case:

  • If the primary key index also stores username internally (or if an extra index is created covering username), Postgres can satisfy the query just from the index.

โœ… Result: No table lookup needed โž” Very fast (almost as fast as InnoDB clustered lookup).

๐Ÿ“ข Postgres primary key indexes usually don’t store extra columns, unless you specifically create an index that includes them (INCLUDE (username) syntax in modern Postgres 11+).

๐ŸŸ  MySQL InnoDB Behavior
  • In InnoDB:
    Since the primary key B-tree already holds all columns (user_id, username, email),
    It directly finds the row from the clustered index.
  • So when you query by PK, even if you only need one column, it has everything inside the same page/block.

โœ… One fast lookup.

๐Ÿ”ฅ Why sometimes Postgres can still be faster?
  • If PostgreSQL uses Index-Only Scan, and the page is already cached, and no extra visibility check is needed,
    Then Postgres may avoid touching the table at all and only scan the tiny index pages.
  • In this case, for very narrow queries (e.g., only 1 small field), Postgres can outperform even MySQL clustered fetch.

๐Ÿ’ก Because fetching from a small index page (~8KB) is faster than reading bigger table pages.

๐ŸŽฏ Conclusion:

โœ… MySQL clustered index is always fast for PK lookups.
โœ… PostgreSQL can be even faster for small/narrow queries if Index-Only Scan is triggered.

๐Ÿ‘‰ Quick Tip:

  • In PostgreSQL, you can force an index to include extra columns by using: CREATE INDEX idx_user_id_username ON users(user_id) INCLUDE (username); Then index-only scans become more common and predictable! ๐Ÿš€

Isn’t PostgreSQL also doing 2 B-tree scans? One for secondary index and one for table (row_id)?

When you query with a secondary index, like:

SELECT * FROM users WHERE username = 'Bob';
  • In MySQL InnoDB, I said:
    1. Find in secondary index (username โž” user_id)
    2. Then go to primary clustered index (user_id โž” full row)
Let’s look at PostgreSQL first:

โ™ฆ๏ธ Step 1: Search Secondary Index B-tree on username.

  • It finds the matching TID (tuple ID) or row pointer.
    • TID is a pair (block_number, row_offset).
    • Not a B-tree! Just a physical pointer.

โ™ฆ๏ธ Step 2: Use the TID to directly jump into the heap (the table).

  • The heap (table) is not a B-tree โ€” itโ€™s just a collection of unordered pages (blocks of rows).
  • PostgreSQL goes directly to the block and offset โ€” like jumping straight into a file.

๐Ÿ”” Important:

  • Secondary index โž” TID โž” heap fetch.
  • No second B-tree traversal for the table!
๐ŸŸ  Meanwhile in MySQL InnoDB:

โ™ฆ๏ธ Step 1: Search Secondary Index B-tree on username.

  • It finds the Primary Key value (user_id).

โ™ฆ๏ธ Step 2: Now, search the Primary Key Clustered B-tree to find the full row.

  • Need another B-tree traversal based on user_id.

๐Ÿ”” Important:

  • Secondary index โž” Primary Key B-tree โž” data fetch.
  • Two full B-tree traversals!
Real-world Summary:

โ™ฆ๏ธ PostgreSQL

  • Secondary index gives a direct shortcut to the heap.
  • One B-tree scan (secondary) โž” Direct heap fetch.

โ™ฆ๏ธ MySQL

  • Secondary index gives PK.
  • Then another B-tree scan (primary clustered) to find full row.

โœ… PostgreSQL does not scan a second B-tree when fetching from the table โ€” just a direct page lookup using TID.

โœ… MySQL does scan a second B-tree (primary clustered index) when fetching full row after secondary lookup.

Is heap fetch a searching technique? Why is it faster than B-tree?

๐Ÿ“š Let’s start from the basics:

When PostgreSQL finds a match in a secondary index, what it gets is a TID.

โ™ฆ๏ธ A TID (Tuple ID) is a physical address made of:

  • Block Number (page number)
  • Offset Number (row slot inside the page)

Example:

TID = (block_number = 1583, offset = 7)

๐Ÿ”ต How PostgreSQL uses TID?

  1. It directly calculates the location of the block (disk page) using block_number.
  2. It reads that block (if not already in memory).
  3. Inside that block, it finds the row at offset 7.

โ™ฆ๏ธ No search, no btree, no extra traversal โ€” just:

  • Find the page (via simple number addressing)
  • Find the row slot

๐Ÿ“ˆ Visual Example

Secondary index (username โž” TID):

usernameTID
Alice(1583, 7)
Bob(1592, 3)
Carol(1601, 12)

โ™ฆ๏ธ When you search for “Bob”:

  • Find (1592, 3) from secondary index B-tree.
  • Jump directly to Block 1592, Offset 3.
  • Done โœ…!

Answer:

  • Heap fetch is NOT a search.
  • It’s a direct address lookup (fixed number).
  • Heap = unordered collection of pages.
  • Pages = fixed-size blocks (usually 8 KB each).
  • TID gives an exact GPS location inside heap โ€” no searching required.

That’s why heap fetch is faster than another B-tree search:

  • No binary search, no B-tree traversal needed.
  • Only a simple disk/memory read + row offset jump.

๐ŸŒฟ B-tree vs ๐Ÿ“ Heap Fetch

ActionB-treeHeap Fetch
What it doesBinary search inside sorted tree nodesDirect jump to block and slot
Steps neededTraverse nodes (root โž” internal โž” leaf)Directly read page and slot
Time complexityO(log n)O(1)
SpeedSlower (needs comparisons)Very fast (direct)

๐ŸŽฏ Final and short answer:

โ™ฆ๏ธ In PostgreSQL, after finding the TID in the secondary index, the heap fetch is a direct, constant-time (O(1)) access โ€” no B-tree needed!
โ™ฆ๏ธ This is faster than scanning another B-tree like in MySQL InnoDB.


๐Ÿงฉ Our exact question:

When we say:

Jump directly to Block 1592, Offset 3.

We are thinking:

  • There are thousands of blocks.
  • How can we directly jump to block 1592?
  • Shouldn’t that be O(n) (linear time)?
  • Shouldn’t there be some traversal?

๐Ÿ”ต Here’s the real truth:

  • No traversal needed.
  • No O(n) work.
  • Accessing Block 1592 is O(1) โ€” constant time.

๐Ÿ“š Why?

Because of how files, pages, and memory work inside a database.

When PostgreSQL stores a table (the “heap”), it saves it in a file on disk.
The file is just a long array of fixed-size pages.

  • Each page = 8KB (default in Postgres).
  • Each block = 1 page = fixed 8KB chunk.
  • Block 0 is the first 8KB.
  • Block 1 is next 8KB.
  • Block 2 is next 8KB.
  • Block 1592 = (1592 ร— 8 KB) offset from the beginning.

โœ… So block 1592 is simply located at 1592 ร— 8192 bytes offset from the start of the file.

โœ… Operating systems (and PostgreSQL’s Buffer Manager) know exactly how to seek to that byte position without reading everything before it.

๐Ÿ“ˆ Diagram (imagine the table file):
+-----------+-----------+-----------+-----------+-----------+------+
| Block 0   | Block 1   | Block 2   | Block 3   | Block 4   |  ... |
+-----------+-----------+-----------+-----------+-----------+------+
  (8KB)       (8KB)       (8KB)       (8KB)       (8KB)

Finding Block 1592 โž”
Seek directly to offset 1592 * 8192 bytes โž”
Read 8KB โž”
Find row at Offset 3 inside it.

๐Ÿค” What happens technically?

If in memory (shared buffers / page cache):
  • PostgreSQL checks its buffer pool (shared memory).
  • “Do I already have block 1592 cached?”
    • โœ… Yes: immediately access memory address.
    • โŒ No: Load block 1592 from disk into memory.
If from disk (rare if cached):
  • File systems (ext4, xfs, etc) know how to seek to a byte offset in a file without reading previous parts.
  • Seek to (block_number ร— 8192) bytes.
  • Read exactly 8KB into memory.
  • No need to scan the whole file linearly.

๐Ÿ“Š Final Step: Inside the Block

Once the block is loaded:

  • The block internally is structured like an array of tuples.
  • Each tuple is placed into an offset slot.
  • Offset 3 โž” third tuple inside the block.

โ™ฆ๏ธ Again, this is just array lookup โ€” no traversal, no O(n).

โšก So to summarize:
QuestionAnswer
How does PostgreSQL jump directly to block?Using the block number ร— page size calculation (fixed offset math).
Is it O(n)?โŒ No, it’s O(1) constant time
Is there any traversal?โŒ No traversal. Just a seek + memory read.
How fast?Extremely fast if cached, still fast if disk seeks.
๐Ÿ”ฅ Key concept:

PostgreSQL heap access is O(1) because the heap file is a flat sequence of fixed-size pages, and the TID gives exact coordinates.

๐ŸŽฏ Simple Real World Example:

Imagine you have a giant book (the table file).
Each page of the book is numbered (block number).

If someone says:

๐Ÿ‘‰ “Go to page 1592.”

โ™ฆ๏ธ You don’t need to read pages 1 to 1591 first.
โ™ฆ๏ธ You just flip directly to page 1592.

๐Ÿ“— Same idea: no linear traversal, just positional lookup.

๐Ÿง  Deep thought:

Because blocks are fixed size and TID is known,
heap fetch is almost as fast as reading a small array.

(Actually faster than searching B-tree because B-tree needs multiple comparisons at each node.)

Enjoy SQL! ๐Ÿš€

Basic Software installation| Moving micro-services into AWS EC2 instance – Part 1

As I mentioned in the previous post, I have decided to move away from micro-services. To achieve this, I am taking an AWS EC2 instance and configuring each micro-service on this instance. For this setup, I am using an Ubuntu 16.04 machine because my application setup is a bit old. However, if you have newer versions of Rails, Ruby, etc., you may want to choose Ubuntu 20.04.

Our setup includes Ruby on Rails (5.2.1) micro-services (5-10 in number), a NodeJS application, a Sinatra Application, and an Angular 9.1 Front-End Application.

To begin, go to the AWS EC2 home page and select an Ubuntu 16.04 machine with default configurations and SSH enabled.

https://ap-south-1.console.aws.amazon.com/ec2/v2/home

Now login to this new instance and install all the packages we needed for our setup.

Software Installation

Update the package list.

sudo apt-get update

Install Ruby dependencies.

sudo apt-get install ruby-dev
sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
sudo apt-get install graphviz

Install NodeJS

curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
sudo apt-get install -y nodejs
node -v

Install yarn and other dependencies.

curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt-get update
sudo apt-get install git-core zlib1g-dev build-essential libssl-dev libreadline-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt1-dev libcurl4-openssl-dev software-properties-common libffi-dev nodejs yarn

Install Mysql 5.7 (Remember this is for Ubuntu 16.04, 18.04 versions)

sudo apt-get install mysql-server-5.7 mysql-client-core-5.7 libmysqlclient-dev
sudo service mysql status # or
systemctl status mysql
username: <your-username>, password: <your-password>

You can also try
mysql_secure_installation, if you use other mysql version.

Note that if you are setting up Ubuntu 20.04, there is a significant change in MySQL, as the version of MySQL is now 8.0 instead of 5.7. If you have applications running in MySQL 5.7, it is recommended that you set up and use Ubuntu 16.04 or 18.04.

We will continue the installation process in our next post.

Liferay 7.3: Connect with MYSQL database

Go inside the downloaded liferay portal folder. And start the tomcat server

cd ~/liferay-ce-portal-tomcat-7.3.0-ga1-20200127150653953/liferay-ce-portal-7.3.0-ga1/tomcat-9.0.17/bin
sh startup.sh


Goto http://localhost:8080/ and start setting up liferay configurations as follows:

Do the basic configuration and change database to mysql

Mysql database is recommended fot staging and production environment.

JDBC URL:
jdbc:mysql://localhost/lportal?characterEncoding=UTF-8&dontTrackOpenResources=true&holdResultsOpenOverStatementClose=true&serverTimezone
=GMT&useFastDateParsing=false&useUnicode=true
JDBC Driver Class Name: com.mysql.cj.jdbc.Driver

If you get the error like this, some of your provided mysql configuration went wrong. Check your mysql username and password.

If the given mysql configuration is correct, then you have to check the Liferay DB exists in your Mysql database. Else create the DB `lportal` as follows:

Create database ‘lportal’

Now click on the Configure button, you will be ready to go

Open โ€˜~/liferay-ce-portal-tomcat-7.3.0-ga1-20200127150653953/liferay-ce-portal-7.3.0-ga1/portal-setup-wizard.propertiesโ€™ file and you can see the above configurations added by liferay.

You can see the above server log in Dev studio. Here mentioned about which property files loads during the start and picking MYSQL DB, creating tables for first time in `lportal` database, inserting data to the table

You can see the following tables populated in the database lportal

mysql> show tables;
+--------------------------------+
| Tables_in_lportal              |
+--------------------------------+
| AMImageEntry                   |
| AccountEntry                   |
| AccountEntryOrganizationRel    |
| AccountEntryUserRel            |
| AccountRole                    |
| Account_                       |
| Address                        |
| AnnouncementsDelivery          |
| AnnouncementsEntry             |
| AnnouncementsFlag              |
| AppBuilderApp                  |
| AppBuilderAppDeployment        |
| AssetAutoTaggerEntry           |
| AssetCategory                  |
| AssetCategoryProperty          |
| AssetDisplayPageEntry          |
| AssetEntries_AssetCategories   |
| AssetEntries_AssetTags         |
| AssetEntry                     |
| AssetEntryAssetCategoryRel     |
| AssetEntryUsage                |
| AssetLink                      |
| AssetListEntry                 |
| AssetListEntryAssetEntryRel    |
| AssetListEntrySegmentsEntryRel |
| AssetListEntryUsage            |
| AssetTag                       |
| AssetVocabulary                |
| Audit_AuditEvent               |
| BackgroundTask                 |
| BatchEngineExportTask          |
| BatchEngineImportTask          |
| BlogsEntry                     |
| BlogsStatsUser                 |
| BrowserTracker                 |
| CTCollection                   |
| CTEntry                        |
| CTMessage                      |
| CTPreferences                  |
| CTProcess                      |
| CTSContent                     |
| Calendar                       |
| CalendarBooking                |
| CalendarNotificationTemplate   |
| CalendarResource               |
| ChangesetCollection            |
| ChangesetEntry                 |
| ClassName_                     |
| Company                        |
| CompanyInfo                    |
| Configuration_                 |
| Contact_                       |
| Contacts_Entry                 |
| Counter                        |
| Country                        |
| DDLRecord                      |
| DDLRecordSet                   |
| DDLRecordSetVersion            |
| DDLRecordVersion               |
| DDMContent                     |
| DDMDataProviderInstance        |
| DDMDataProviderInstanceLink    |
| DDMFormInstance                |
| DDMFormInstanceRecord          |
| DDMFormInstanceRecordVersion   |
| DDMFormInstanceVersion         |
| DDMStorageLink                 |
| DDMStructure                   |
| DDMStructureLayout             |
| DDMStructureLink               |
| DDMStructureVersion            |
| DDMTemplate                    |
| DDMTemplateLink                |
| DDMTemplateVersion             |
| DEDataDefinitionFieldLink      |
| DEDataListView                 |
| DLContent                      |
| DLFileEntry                    |
| DLFileEntryMetadata            |
| DLFileEntryType                |
| DLFileEntryTypes_DLFolders     |
| DLFileRank                     |
| DLFileShortcut                 |
| DLFileVersion                  |
| DLFileVersionPreview           |
| DLFolder                       |
| DLOpenerFileEntryReference     |
| DLSyncEvent                    |
| EmailAddress                   |
| ExpandoColumn                  |
| ExpandoRow                     |
| ExpandoTable                   |
| ExpandoValue                   |
| ExportImportConfiguration      |
| FragmentCollection             |
| FragmentEntry                  |
| FragmentEntryLink              |
| FriendlyURLEntry               |
| FriendlyURLEntryLocalization   |
| FriendlyURLEntryMapping        |
| Group_                         |
| Groups_Orgs                    |
| Groups_Roles                   |
| Groups_UserGroups              |
| HtmlPreviewEntry               |
| IM_MemberRequest               |
| Image                          |
| JournalArticle                 |
| JournalArticleLocalization     |
| JournalArticleResource         |
| JournalContentSearch           |
| JournalFeed                    |
| JournalFolder                  |
| KBArticle                      |
| KBComment                      |
| KBFolder                       |
| KBTemplate                     |
| KaleoAction                    |
| KaleoCondition                 |
| KaleoDefinition                |
| KaleoDefinitionVersion         |
| KaleoInstance                  |
| KaleoInstanceToken             |
| KaleoLog                       |
| KaleoNode                      |
| KaleoNotification              |
| KaleoNotificationRecipient     |
| KaleoTask                      |
| KaleoTaskAssignment            |
| KaleoTaskAssignmentInstance    |
| KaleoTaskForm                  |
| KaleoTaskFormInstance          |
| KaleoTaskInstanceToken         |
| KaleoTimer                     |
| KaleoTimerInstanceToken        |
| KaleoTransition                |
| Layout                         |
| LayoutBranch                   |
| LayoutClassedModelUsage        |
| LayoutFriendlyURL              |
| LayoutPageTemplateCollection   |
| LayoutPageTemplateEntry        |
| LayoutPageTemplateStructure    |
| LayoutPageTemplateStructureRel |
| LayoutPrototype                |
| LayoutRevision                 |
| LayoutSEOEntry                 |
| LayoutSEOSite                  |
| LayoutSet                      |
| LayoutSetBranch                |
| LayoutSetPrototype             |
| ListType                       |
| Lock_                          |
| MBBan                          |
| MBCategory                     |
| MBDiscussion                   |
| MBMailingList                  |
| MBMessage                      |
| MBStatsUser                    |
| MBThread                       |
| MBThreadFlag                   |
| MDRAction                      |
| MDRRule                        |
| MDRRuleGroup                   |
| MDRRuleGroupInstance           |
| Marketplace_App                |
| Marketplace_Module             |
| MembershipRequest              |
| OA2Auths_OA2ScopeGrants        |
| OAuth2Application              |
| OAuth2ApplicationScopeAliases  |
| OAuth2Authorization            |
| OAuth2ScopeGrant               |
| OrgGroupRole                   |
| OrgLabor                       |
| Organization_                  |
| PasswordPolicy                 |
| PasswordPolicyRel              |
| PasswordTracker                |
| Phone                          |
| PluginSetting                  |
| PollsChoice                    |
| PollsQuestion                  |
| PollsVote                      |
| PortalPreferences              |
| Portlet                        |
| PortletItem                    |
| PortletPreferences             |
| QUARTZ_BLOB_TRIGGERS           |
| QUARTZ_CALENDARS               |
| QUARTZ_CRON_TRIGGERS           |
| QUARTZ_FIRED_TRIGGERS          |
| QUARTZ_JOB_DETAILS             |
| QUARTZ_LOCKS                   |
| QUARTZ_PAUSED_TRIGGER_GRPS     |
| QUARTZ_SCHEDULER_STATE         |
| QUARTZ_SIMPLE_TRIGGERS         |
| QUARTZ_SIMPROP_TRIGGERS        |
| QUARTZ_TRIGGERS                |
| RatingsEntry                   |
| RatingsStats                   |
| ReadingTimeEntry               |
| RecentLayoutBranch             |
| RecentLayoutRevision           |
| RecentLayoutSetBranch          |
| Region                         |
| Release_                       |
| Repository                     |
| RepositoryEntry                |
| ResourceAction                 |
| ResourcePermission             |
| Role_                          |
| SAPEntry                       |
| SegmentsEntry                  |
| SegmentsEntryRel               |
| SegmentsEntryRole              |
| SegmentsExperience             |
| SegmentsExperiment             |
| SegmentsExperimentRel          |
| ServiceComponent               |
| SharingEntry                   |
| SiteFriendlyURL                |
| SiteNavigationMenu             |
| SiteNavigationMenuItem         |
| SocialActivity                 |
| SocialActivityAchievement      |
| SocialActivityCounter          |
| SocialActivityLimit            |
| SocialActivitySet              |
| SocialActivitySetting          |
| SocialRelation                 |
| SocialRequest                  |
| Subscription                   |
| SystemEvent                    |
| Team                           |
| Ticket                         |
| TrashEntry                     |
| TrashVersion                   |
| UserGroup                      |
| UserGroupGroupRole             |
| UserGroupRole                  |
| UserGroups_Teams               |
| UserIdMapper                   |
| UserNotificationDelivery       |
| UserNotificationEvent          |
| UserTracker                    |
| UserTrackerPath                |
| User_                          |
| Users_Groups                   |
| Users_Orgs                     |
| Users_Roles                    |
| Users_Teams                    |
| Users_UserGroups               |
| ViewCountEntry                 |
| VirtualHost                    |
| WebDAVProps                    |
| Website                        |
| WikiNode                       |
| WikiPage                       |
| WikiPageResource               |
| WorkflowDefinitionLink         |
| WorkflowInstanceLink           |
+--------------------------------+
262 rows in set (0.00 sec)

mysql> 

Here is the list of Users created. We are selected `Add Sample data` option at the time configuration. That is why we can see lot of users with my domain is listed

Login with

Email: provide the email given at the time of register

Password: test (default password for version 7.3)

If you are using Liferay 7.2 version, then this default password may not work. In this version liferay generates a random password, that you can check in the log file as below:

cd ~/Downloads/liferay-ce-portal-tomcat-7.2.1-ga2-20191111141448326/liferay-ce-portal-7.2.1-ga2/tomcat-9.0.17/logs/ && tail -f catalina.out


Search word password. You can get it.

After login you can see this terms of use page

Then a Password remainder page

DONE! Congrats, you can see the admin home page with all controls

changing to new db from old db

Add the new db name ‘liferay721’ to portal-setup-wizard.properties

.........
jdbc.default.url=jdbc:mysql://localhost/liferay721?characterEncoding=UTF-8&dontTrackOpenResources=true&holdResultsOpenOverStatementClose=true&serverTimezone=GMT&useFastDateParsing=false&useUnicode=true
.........

Restart the server. Server Log:

Starting Liferay Community Edition Portal 7.2.1 CE GA2 (Mueller / Build 7201 / November 12, 2019)

2020-05-08 14:45:53.921 INFO  [main][StartupHelper:75] There are no patches installed
2020-05-08 14:45:54.000 WARN  [main][ReleaseLocalServiceImpl:238] Table 'liferay721.Release_' doesn't exist
2020-05-08 14:45:54.003 INFO  [main][ReleaseLocalServiceImpl:129] Create tables and populate with default data
...................
...................

Issues you may face during the setup

Once I created a DB names liferay_7_3_1 and the setup wizard finishes successfully and message shown to restart the server for get into the portal.

When I restart it I received the following issue:

An unexpected system error occurred.

java.lang.NullPointerException

Database is not setup properly. Check the DB name. Use characters as DB name if possible and start the server. If proper DB setup is done, then you can see 250 – 280 tables in database

and the following tables were created in the database named liferay_7_3_1

Only 155 rows

+--------------------------------+
| Tables_in_liferay_7_3_1        |
+--------------------------------+
| AccountEntry                   |
| AccountEntryOrganizationRel    |
| AccountEntryUserRel            |
.........
|
| Website                        |
| WikiNode                       |
| WikiPage                       |
| WikiPageResource               |
| WorkflowDefinitionLink         |
| WorkflowInstanceLink           |
+--------------------------------+
155 rows in set (0.00 sec)

Then I renamed the DB to liferay731. Then it works! So beware of underscore characters in your DB name.

Another error is shown when the server is not able to connect to your database, or database is not created that is specified as in your portal.ext.properties or portal-setup-wizard.properties

Create a database specified in properties file to get rid of the issue. Else check given DB permissions are correct