Rails 8 App: Setup Test DB in PostgreSQL | Write SQL Queries | Operators | Joins

Here’s a list of commonly used SQL comparison operators with brief explanations and examples:

📋 Basic Comparison Operators:

Operator | Meaning                    | Example                         | Result
---------+----------------------------+---------------------------------+------------------------------------------
=        | Equal to                   | WHERE age = 25                  | Matches rows where age is 25
<>       | Not equal to (standard)    | WHERE status <> 'active'        | Matches rows where status is not 'active'
!=       | Not equal to (alternative) | WHERE id != 10                  | Same as <>, matches if id is not 10
>        | Greater than               | WHERE salary > 50000            | Matches rows with salary above 50k
<        | Less than                  | WHERE created_at < '2024-01-01' | Matches dates before Jan 1, 2024
>=       | Greater than or equal      | WHERE age >= 18                 | Matches age 18 and above
<=       | Less than or equal         | WHERE age <= 65                 | Matches age 65 and below

📋 Other Common Operators:

Operator    | Meaning                                      | Example
------------+----------------------------------------------+--------------------------------------
BETWEEN     | Within a range (inclusive)                   | WHERE price BETWEEN 100 AND 200
IN          | Match any value in a list                    | WHERE country IN ('US', 'CA', 'UK')
NOT IN      | Not in a list                                | WHERE role NOT IN ('admin', 'staff')
IS NULL     | Value is null                                | WHERE deleted_at IS NULL
IS NOT NULL | Value is not null                            | WHERE updated_at IS NOT NULL
LIKE        | Pattern match (case-insensitive in some DBs) | WHERE name LIKE 'J%'
ILIKE       | Case-insensitive LIKE (PostgreSQL only)      | WHERE email ILIKE '%@gmail.com'
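
Several of these operators are often combined in a single WHERE clause. A quick sketch (the customers table and its columns are hypothetical):

SELECT id, name, email
FROM customers
WHERE age BETWEEN 18 AND 65          -- inclusive range
  AND country IN ('US', 'CA', 'UK')  -- list membership
  AND deleted_at IS NULL             -- exclude soft-deleted rows
  AND email ILIKE '%@gmail.com';     -- case-insensitive match (PostgreSQL)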

Now that we have our products and product_variants schemas, let’s re-explore all the major SQL JOINs using these two related tables.

Products:

   Column    |              Type              | Collation | Nullable |               Default
-------------+--------------------------------+-----------+----------+--------------------------------------
 id          | bigint                         |           | not null | nextval('products_id_seq'::regclass)
 description | text                           |           |          |
 category    | character varying              |           |          |
 created_at  | timestamp(6) without time zone |           | not null |
 updated_at  | timestamp(6) without time zone |           | not null |
 name        | character varying              |           | not null |
 rating      | numeric(2,1)                   |           |          | 0.0
 brand       | character varying              |           |          |

Product variants:

      Column      |              Type              | Collation | Nullable |                   Default
------------------+--------------------------------+-----------+----------+----------------------------------------------
 id               | bigint                         |           | not null | nextval('product_variants_id_seq'::regclass)
 product_id       | bigint                         |           | not null |
 sku              | character varying              |           | not null |
 mrp              | numeric(10,2)                  |           | not null |
 price            | numeric(10,2)                  |           | not null |
 discount_percent | numeric(5,2)                   |           |          |
 size             | character varying              |           |          |
 color            | character varying              |           |          |
 stock_quantity   | integer                        |           |          | 0
 specs            | jsonb                          |           | not null | '{}'::jsonb
 created_at       | timestamp(6) without time zone |           | not null |
 updated_at       | timestamp(6) without time zone |           | not null |

💎 SQL JOINS with products and product_variants

These tables are related through:

product_variants.product_id → products.id

So we can use that for all join examples.
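
In SQL DDL terms, this relationship would be declared roughly as follows (the constraint name is illustrative; Rails creates it for you when you use t.references with foreign_key: true):

ALTER TABLE product_variants
  ADD CONSTRAINT fk_product_variants_product
  FOREIGN KEY (product_id) REFERENCES products (id);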

🔸 1. INNER JOIN – Show only products with variants
SELECT 
  p.name, 
  pv.sku, 
  pv.price 
FROM products p
INNER JOIN product_variants pv ON p.id = pv.product_id;

♦️ Only returns products that have at least one variant.

🔸 2. LEFT JOIN – Show all products, with variants if available
SELECT 
  p.name, 
  pv.sku, 
  pv.price 
FROM products p
LEFT JOIN product_variants pv ON p.id = pv.product_id;

♦️ Returns all products, even those with no variants (NULLs in variant columns).

🔸 3. RIGHT JOIN – Show all variants, with product info if available

(Less common, but useful if variants might exist without a product record)

SELECT 
  pv.sku, 
  pv.price, 
  p.name 
FROM products p
RIGHT JOIN product_variants pv ON p.id = pv.product_id;

🔸 4. FULL OUTER JOIN – All records from both tables
SELECT 
  p.name AS product_name, 
  pv.sku AS variant_sku 
FROM products p
FULL OUTER JOIN product_variants pv ON p.id = pv.product_id;

♦️ Shows all products and all variants, even when there’s no match.

🔸 5. SELF JOIN Example (for product_variants comparing similar sizes or prices)

Let’s compare variants of the same product that are different sizes.

SELECT 
  pv1.product_id,
  pv1.size AS size_1,
  pv2.size AS size_2,
  pv1.sku AS sku_1,
  pv2.sku AS sku_2
FROM product_variants pv1
JOIN product_variants pv2 
  ON pv1.product_id = pv2.product_id 
  AND pv1.size <> pv2.size
WHERE pv1.product_id = 101;  -- example product

♦️ Useful for analyzing size or price differences within a product. Note that pv1.size <> pv2.size returns each pair twice (once in each order); use pv1.size < pv2.size if you want each pair only once.

🧬 Complex Combined JOIN Example

Show each product with its variants, and include only discounted ones (price < MRP):

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.price,
  pv.mrp,
  (pv.mrp - pv.price) AS discount_value
FROM products p
INNER JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.price < pv.mrp
ORDER BY discount_value DESC;

📑 JOIN Summary with These Tables

JOIN Type       | Use Case
----------------+------------------------------------------------
INNER JOIN      | Only products with variants
LEFT JOIN       | All products, even if they don’t have variants
RIGHT JOIN      | All variants, even if product is missing
FULL OUTER JOIN | Everything; useful in data audits
SELF JOIN       | Compare or relate rows within the same table

Let’s now look at JOIN queries with more realistic conditions using products and product_variants.

🦾 Advanced JOIN Queries with Conditions to practice

🔹 1. All products with variants in stock AND discounted

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.size,
  pv.color,
  pv.stock_quantity,
  pv.mrp,
  pv.price,
  (pv.mrp - pv.price) AS discount_amount
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.stock_quantity > 0
  AND pv.price < pv.mrp
ORDER BY discount_amount DESC;

♦️ Shows available discounted variants, ordered by discount.

🔹 2. Products with high rating (4.5+) and at least one low-stock variant (< 10 items)

SELECT 
  p.name AS product_name,
  p.rating,
  pv.sku,
  pv.stock_quantity
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE p.rating >= 4.5
  AND pv.stock_quantity < 10;

🔹 3. LEFT JOIN to find products with no variants or all variants out of stock

SELECT 
  p.name AS product_name,
  pv.id AS variant_id,
  pv.stock_quantity
FROM products p
LEFT JOIN product_variants pv 
  ON p.id = pv.product_id AND pv.stock_quantity > 0
WHERE pv.id IS NULL;

✅ This tells you:

  • Either the product has no variants
  • Or all variants are out of stock
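
The same report can be written with NOT EXISTS, which some people find easier to read; a sketch equivalent to the LEFT JOIN version above:

SELECT p.name AS product_name
FROM products p
WHERE NOT EXISTS (
  SELECT 1
  FROM product_variants pv
  WHERE pv.product_id = p.id
    AND pv.stock_quantity > 0
);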

🔹 4. Group and Count Variants per Product

SELECT 
  p.id,
  p.name AS product_name,
  COUNT(pv.id) AS variant_count
FROM products p
LEFT JOIN product_variants pv ON p.id = pv.product_id
GROUP BY p.id, p.name
ORDER BY variant_count DESC;

(Grouping by p.id as well prevents two distinct products that happen to share a name from being merged into one row.)

🔹 5. Variants with price-percentage discount more than 30%

SELECT 
  p.name AS product_name,
  pv.sku,
  pv.mrp,
  pv.price,
  ROUND(100.0 * (pv.mrp - pv.price) / pv.mrp, 2) AS discount_percent
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE pv.price < pv.mrp
  AND (100.0 * (pv.mrp - pv.price) / pv.mrp) > 30;

🔹 6. Color-wise stock summary for a product category

SELECT 
  p.category,
  pv.color,
  SUM(pv.stock_quantity) AS total_stock
FROM products p
JOIN product_variants pv ON p.id = pv.product_id
WHERE p.category = 'Shoes'
GROUP BY p.category, pv.color
ORDER BY total_stock DESC;

These queries simulate real-world dashboard views: inventory tracking, product health, stock alerts, etc.


Happy SQL Query Writing! 🚀

Rails 8 App: Setup Test DB in PostgreSQL | Query Performance Using EXPLAIN ANALYZE

Now we’ll go full-on query performance pro mode using EXPLAIN ANALYZE and real plans. We’ll learn how PostgreSQL makes decisions, how to catch slow queries, and how your indexes make them 10x faster.

💎 Part 1: What is EXPLAIN ANALYZE?

EXPLAIN shows how PostgreSQL plans to execute your query.

ANALYZE runs the query and adds actual time, rows, loops, etc.

Syntax:

EXPLAIN ANALYZE
SELECT * FROM users WHERE username = 'bob';

✏️ Example 1: Without Index

SELECT * FROM users WHERE username = 'bob';

If username has no index, plan shows:

Seq Scan on users
  Filter: (username = 'bob')
  Rows Removed by Filter: 9999

❌ PostgreSQL scans all rows = Sequential Scan = slow!

➕ Add Index:

CREATE INDEX idx_users_username ON users (username);

Now rerun:

EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'bob';

You’ll see:

Index Scan using idx_users_username on users
  Index Cond: (username = 'bob')

✅ PostgreSQL uses B-tree index
🚀 Massive speed-up!

🔥 Want even faster?

SELECT username FROM users WHERE username = 'bob';

If PostgreSQL shows:

Index Only Scan using idx_users_username on users
  Index Cond: (username = 'bob')

🎉 Index Only Scan! = covering index success!
No heap fetch = lightning-fast.

⚠️ Note: Index-only scan only works if:

  • Index covers all selected columns
  • Table is vacuumed (PostgreSQL uses visibility map)

If you still get Seq scan output like:

test_db=# EXPLAIN ANALYSE SELECT * FROM users where username = 'aman_chetri';
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..1.11 rows=1 width=838) (actual time=0.031..0.034 rows=1 loops=1)
   Filter: ((username)::text = 'aman_chetri'::text)
   Rows Removed by Filter: 2
 Planning Time: 0.242 ms
 Execution Time: 0.077 ms
(5 rows)

even after adding an index, because PostgreSQL is saying:

  • 🤔 “The table is so small (cost = 1.11), scanning the whole thing is cheaper than using the index.”
  • Also: if your query selects only username, it could be eligible for an Index Only Scan, but a heap fetch might still be needed because of the visibility map.

🔧 Step-by-step Fix:

✅ 1. Add Data for Bigger Table

If the table is small (few rows), PostgreSQL will prefer Seq Scan no matter what.

Try adding ~10,000 rows:

INSERT INTO users (username, email, phone_number)
SELECT 'user_' || i, 'user_' || i || '@mail.com', '1234567890'
FROM generate_series(1, 10000) i;

Then VACUUM ANALYZE users; again and retry EXPLAIN.

✅ 2. Confirm Index Exists

First, check your index exists and is recognized:

\d users

You should see something like:

Indexes:
    "idx_users_username" btree (username)

If not, add:

CREATE INDEX idx_users_username ON users(username);

✅ 3. Run ANALYZE (Update Stats)
ANALYZE users;

This updates statistics — PostgreSQL might not be using the index if it thinks only one row matches or the table is tiny.

✅ 4. Vacuum for Index-Only Scan

Index-only scans require the visibility map to be set.

Run:

VACUUM ANALYZE users;

This marks pages in the table as “all-visible,” enabling PostgreSQL to avoid reading the heap.

✅ 5. Force PostgreSQL to Consider Index

You can turn off sequential scan temporarily (for testing):

SET enable_seqscan = OFF;

EXPLAIN SELECT username FROM users WHERE username = 'bob';

You should now see:

Index Scan using idx_users_username on users ...

⚠️ Use this only for testing/debugging — not in production.
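
When you are done testing, restore the planner setting for your session:

RESET enable_seqscan;
-- or equivalently: SET enable_seqscan = ON;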

💡 Extra Tip (optional): Use EXPLAIN (ANALYZE, BUFFERS)
EXPLAIN (ANALYZE, BUFFERS)
SELECT username FROM users WHERE username = 'bob';

This will show:

  • Whether heap was accessed
  • Buffer hits
  • Actual rows

📋 Summary

Step                      | Command
--------------------------+---------------------------
Check index               | \d users
Analyze table             | ANALYZE users;
Vacuum for visibility     | VACUUM ANALYZE users;
Disable seq scan for test | SET enable_seqscan = OFF;
Add more rows (optional)  | INSERT INTO ...

🚨 How to catch bad index usage?

Always look for:

  • “Seq Scan” instead of “Index Scan” ➔ missing index
  • “Heap Fetch” ➔ not a covering index
  • “Rows Removed by Filter” ➔ inefficient filtering
  • “Loops: 1000+” ➔ possible N+1 issue

Common Pattern Optimizations

Pattern                     | Fix
----------------------------+------------------------------------------------------------------
WHERE column = ?            | B-tree index on column
WHERE column LIKE 'prefix%' | B-tree works (with text_ops)
SELECT col1 WHERE col2 = ?  | Covering index: (col2, col1) or (col2) INCLUDE (col1)
WHERE col BETWEEN ? AND ?   | Composite index with the range column second: (status, created_at)
WHERE col IN (?, ?, ?)      | Index still helps
ORDER BY col LIMIT 10       | Index on col helps sort fast
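
For example, the covering-index and composite-index rows in the table above translate to DDL like this (index names and the orders table are illustrative; INCLUDE needs PostgreSQL 11+):

-- Covering index: SELECT username ... WHERE email = ? can be answered from the index alone
CREATE INDEX idx_users_email_username ON users (email) INCLUDE (username);

-- Composite index: equality column first, range column second
CREATE INDEX idx_orders_status_created_at ON orders (status, created_at);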

⚡ Tip: Use pg_stat_statements to Find Slow Queries

Enable it in postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'

Then run:

SELECT query, total_exec_time, calls
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
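
Note that shared_preload_libraries takes effect only after a server restart, and the extension must be created in your database before the view can be queried. The reset function is handy between measurements:

-- after restarting PostgreSQL:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- clear collected stats before a fresh measurement
SELECT pg_stat_statements_reset();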

🎯 Find your worst queries & optimize them with new indexes!

🧪 Try It Yourself

Want a little lab setup to practice?

CREATE TABLE users (
  user_id serial PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

-- Insert 100K fake rows
INSERT INTO users (username, email, phone_number)
SELECT
  'user_' || i,
  'user_' || i || '@example.com',
  '999-000-' || i
FROM generate_series(1, 100000) i;

Then test:

  1. EXPLAIN ANALYZE SELECT * FROM users WHERE username = 'user_5000';
  2. Add an index: CREATE INDEX idx_users_username ON users (username);
  3. Re-run, and compare the speed!

🎯 Extra Pro Tools for Query Performance

  • EXPLAIN ANALYZE → Always first tool
  • pg_stat_statements → Find slow queries in real apps
  • auto_explain → Log slow plans automatically
  • pgBadger or pgHero → Visual query monitoring

💥 Now We Know:

✅ How to read query plans
✅ When you’re doing full scans vs index scans
✅ How to achieve index-only scans
✅ How to catch bad performance early
✅ How to test and fix in real world

Happy Performance Fixing.. 🚀

Rails 8 App: Setup Test DB | Comprehensive Guide 📖 for PostgreSQL, MySQL Indexing – PostgreSQL Heap ⛰ vs MySQL InnoDB B-Tree 🌿

Enter into psql terminal:

✗ psql postgres
psql (14.17 (Homebrew))
Type "help" for help.

postgres=# \l
                                     List of databases
           Name            |  Owner   | Encoding | Collate | Ctype |   Access privileges
---------------------------+----------+----------+---------+-------+-----------------------
 studio_development        | postgres | UTF8     | C       | C     |

  • Create a new test database
  • Create a users table
  • Check the db and table details
postgres=# create database test_db;
CREATE DATABASE

test_db=# CREATE TABLE users (
user_id INT,
username VARCHAR(220),
email VARCHAR(150),
phone_number VARCHAR(20)
);
CREATE TABLE

test_db=# \dt
List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | users | table | abhilash
(1 row)

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           |          |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |

Add a primary key to users and check the users table.

test_db=# ALTER TABLE users ADD PRIMARY KEY (user_id);
ALTER TABLE

test_db=# \d users;
                          Table "public.users"
    Column    |          Type          | Collation | Nullable | Default
--------------+------------------------+-----------+----------+---------
 user_id      | integer                |           | not null |
 username     | character varying(220) |           |          |
 email        | character varying(150) |           |          |
 phone_number | character varying(20)  |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (user_id)

# OR add primary key when creating the table:
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150),
  phone_number VARCHAR(20)
);

You can see a unique constraint and an index added when adding a primary key.

Why does adding a primary key also add an index?

  • A primary key must guarantee that each value is unique and fast to find.
  • Without an index, the database would have to scan the whole table every time you look up a primary key, which would be very slow.
  • So PostgreSQL automatically creates a unique index on the primary key to make lookups efficient and to enforce uniqueness at the database level.

👉 It needs the index for speed and to enforce the “no duplicates” rule of primary keys.

What is btree?

  • btree stands for Balanced Tree (specifically, a “B-tree” data structure).
  • It’s the default index type in PostgreSQL.
  • B-tree indexes organize the data in a tree structure, so that searches, inserts, updates, and deletes are all very efficient — about O(log n) time.
  • It’s great for looking up exact matches (like WHERE user_id = 123) or range queries (like WHERE user_id BETWEEN 100 AND 200).

👉 So when you see btree, it just means it’s using a very efficient tree structure for your primary key index.

Summary in one line:
Adding a primary key automatically adds a btree index to enforce uniqueness and make lookups super fast.


In MySQL (specifically InnoDB engine, which is default now):

  • Primary keys always create an index automatically.
  • The index is a clustered index — this is different from Postgres!
  • The index uses a B-tree structure too, just like Postgres.

👉 So yes, MySQL also adds an index and uses a B-tree under the hood for primary keys.

But here’s a big difference:

  • In InnoDB, the table data itself is stored inside the primary key’s B-tree.
    • That’s called a clustered index.
    • It means the physical storage of the table rows follows the order of the primary key.
  • In PostgreSQL, the index and the table are stored separately (non-clustered by default).

Example: If you have a table like this in MySQL:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(220),
  email VARCHAR(150)
);

  • user_id will have a B-tree clustered index.
  • The rows themselves will be stored sorted by user_id.

Short version:

Database       | Primary Key Behavior                              | B-tree? | Clustered?
---------------+---------------------------------------------------+---------+--------------------------
PostgreSQL     | Separate index created for PK                     | Yes     | No (separate by default)
MySQL (InnoDB) | PK index + table rows stored inside the PK B-tree | Yes     | Yes (always clustered)

Why Indexing on Unique Columns (like email) Improves Lookup 🔍

Use Case

You frequently run queries like:

SELECT * FROM students WHERE email = 'john@example.com';

Without an index, this results in a full table scan — checking each row one-by-one.

With an index, the database can jump directly to the row using a sorted structure, significantly reducing lookup time — especially in large tables.


🌲 How SQL Stores Indexes Internally (PostgreSQL)

📚 PostgreSQL uses B-Tree indexes by default.

When you run:

CREATE UNIQUE INDEX idx_students_on_email ON students(email);

PostgreSQL creates a balanced B-tree like this:

          m@example.com
         /              \
  d@example.com     t@example.com
  /        \           /         \
...      ...        ...         ...

  • ✅ Keys (email values) are sorted lexicographically.
  • ✅ Each leaf node contains a pointer to the actual row in the students table (called a tuple pointer or TID).
  • ✅ Lookup uses binary search, giving O(log n) performance.

⚙️ Unique Index = Even Faster

Because all email values are unique, the database:

  • Can stop searching immediately once a match is found.
  • Doesn’t need to scan multiple leaf entries (no duplicates).

🧠 Summary

Feature             | Value
--------------------+----------------------------------------------------------------
Index Type          | B-tree (default in PostgreSQL)
Lookup Time         | O(log n) vs O(n) without index
Optimized for       | Equality search (WHERE email = ...), sorting, joins
Email is unique?    | ✅ Yes – index helps even more (no need to check multiple rows)
Table scan avoided? | ✅ Yes – PostgreSQL jumps directly via B-tree lookup

What Exactly is a Clustered Index in MySQL (InnoDB)?

🔹 In MySQL InnoDB, the primary key IS the table.

🔹 A Clustered Index means:

  • The table’s data rows are physically organized in the order of the primary key.
  • No separate storage for the table – it’s merged into the primary key’s B-tree structure.

In simple words:
👉 “The table itself lives inside the primary key B-tree.”

That’s why:

  • Every secondary index must store the primary key value (not a row pointer).
  • InnoDB can only have one clustered index (because you can’t physically order a table in two different ways).

📈 Visual for MySQL Clustered Index

Suppose you have:

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(255),
  email VARCHAR(255)
);

The storage looks like:

B-tree by user_id (Clustered)

user_id  | username | email
----------------------------
101      | Alice    | a@x.com
102      | Bob      | b@x.com
103      | Carol    | c@x.com

👉 Table rows stored directly inside the B-tree nodes by user_id!


🔵 PostgreSQL (Primary Key Index = Separate)

Imagine you have a users table:

users table (physical table):

row_id | user_id | username | email
-------------------------------------
  1    |   101   | Alice    | a@example.com
  2    |   102   | Bob      | b@example.com
  3    |   103   | Carol    | c@example.com

And the Primary Key Index looks like:

Primary Key B-Tree (separate structure):

user_id -> row pointer
 101    -> row_id 1
 102    -> row_id 2
 103    -> row_id 3

👉 When you query WHERE user_id = 102, PostgreSQL goes:

  • Find user_id 102 in the B-tree index,
  • Then jump to row_id 2 in the actual table.

🔸 Index and Table are separate.
🔸 Extra step: index lookup ➔ then fetch row.

🟠 MySQL InnoDB (Primary Key Index = Clustered)

Same users table, but stored like this:

Primary Key Clustered B-Tree (index + data together):

user_id | username | email
---------------------------------
  101   | Alice    | a@example.com
  102   | Bob      | b@example.com
  103   | Carol    | c@example.com

👉 When you query WHERE user_id = 102, MySQL:

  • Goes straight to user_id 102 in the B-tree,
  • Data is already there, no extra lookup.

🔸 Index and Table are merged.
🔸 One step: direct access!

📈 Quick Visual:

PostgreSQL
(Index)    ➔    (Table Row)
    |
    ➔ extra lookup needed

MySQL InnoDB
(Index + Row Together)
    |
    ➔ data found immediately

Summary:

  • PostgreSQL: primary key index is separate ➔ needs 2 steps (index ➔ table).
  • MySQL InnoDB: primary key index is clustered ➔ 1 step (index = table).

📚 How Secondary Indexes Work

Secondary Index = an index on a column that is not the primary key.

Example:

CREATE INDEX idx_username ON users(username);

Now you have an index on username.

🔵 PostgreSQL Secondary Index Behavior

  • Secondary indexes are separate structures from the table (just like the primary key index).
  • When you query by username, PostgreSQL:
    1. Finds the matching row_id using the secondary B-tree index.
    2. Then fetches the full row from the table by row_id.
  • This is called an Index Scan + Heap Fetch.

📜 Example:

Secondary Index (username -> row_id):

username -> row_id
------------------
Alice    -> 1
Bob      -> 2
Carol    -> 3

(users table is separate)

👉 Flexible, but needs 2 steps: index (row_id) ➔ table.

🟠 MySQL InnoDB Secondary Index Behavior

  • In InnoDB, secondary indexes don’t store row pointers.
  • Instead, they store the primary key value!

So:

  1. Find the matching primary key using the secondary index.
  2. Use the primary key to find the actual row inside the clustered primary key B-tree.

📜 Example:

Secondary Index (username -> user_id):

username -> user_id
--------------------
Alice    -> 101
Bob      -> 102
Carol    -> 103

(Then find user_id inside Clustered B-Tree)

✅ Needs 2 steps too: secondary index (primary key) ➔ clustered table.

📈 Quick Visual:

Feature         | PostgreSQL                      | MySQL InnoDB
----------------+---------------------------------+------------------------------------------------
Secondary Index | username ➔ row pointer (row_id) | username ➔ primary key (user_id)
Fetch Full Row  | Use row_id to get table row     | Use primary key to find row in clustered index
Steps to Fetch  | Index ➔ Table                   | Index ➔ Primary Key ➔ Table (clustered)

Action                 | PostgreSQL                     | MySQL InnoDB
-----------------------+--------------------------------+-----------------------------------------
Primary Key Lookup     | Index ➔ Row (2 steps)          | Clustered Index (1 step)
Secondary Index Lookup | Index (row_id) ➔ Row (2 steps) | Secondary Index (PK) ➔ Row (2 steps)
Storage Model          | Separate index and table       | Primary key and table merged (clustered)

🌐 Now, let’s do some Real SQL Query ⛁ Examples!

1. Simple SELECT * FROM users WHERE user_id = 102;
  • PostgreSQL:
    Look into PK btree ➔ find row pointer ➔ fetch row separately.
  • MySQL InnoDB:
    Directly find the row inside the PK B-tree (no extra lookup).

MySQL is a little faster here because it needs only 1 step!

2. SELECT username FROM users WHERE user_id = 102; (Only 1 Column)
  • PostgreSQL:
    Might do an Index Only Scan if all needed data is in the index (very fast).
  • MySQL:
    Clustered index contains all columns already, no special optimization needed.

Both can be very fast, but PostgreSQL shines if the index is “covering” (i.e., contains all needed columns), because the index is smaller than MySQL’s clustered index, so fewer pages have to be read.

3. SELECT * FROM users WHERE username = 'Bob'; (Secondary Index Search)
  • PostgreSQL:
    Secondary index on username ➔ row pointer ➔ fetch table row.
  • MySQL:
    Secondary index on username ➔ get primary key ➔ clustered index lookup ➔ fetch data.

Both are 2 steps, but MySQL needs 2 different B-trees: secondary ➔ primary clustered.

Consider the below situation:

SELECT username FROM users WHERE user_id = 102;
  • user_id is the Primary Key.
  • You only want username, not full row.

Now:

🔵 PostgreSQL Behavior

👉 In PostgreSQL, by default:

  • It uses the primary key btree to find the row pointer.
  • Then fetches the full row from the table (heap fetch).

👉 But PostgreSQL has an optimization called Index-Only Scan.

  • If all requested columns are already present in the index,
  • And if the table visibility map says the row is still valid (no deleted/updated row needing visibility check),
  • Then Postgres does not fetch the heap.

👉 So in this case:

  • If the primary key index also stores username internally (or if an extra index is created covering username), Postgres can satisfy the query just from the index.

✅ Result: No table lookup needed ➔ Very fast (almost as fast as InnoDB clustered lookup).

📢 Postgres primary key indexes usually don’t store extra columns, unless you specifically create an index that includes them (INCLUDE (username) syntax in modern Postgres 11+).

🟠 MySQL InnoDB Behavior
  • In InnoDB:
    Since the primary key B-tree already holds all columns (user_id, username, email),
    It directly finds the row from the clustered index.
  • So when you query by PK, even if you only need one column, it has everything inside the same page/block.

One fast lookup.

🔥 Why sometimes Postgres can still be faster?
  • If PostgreSQL uses Index-Only Scan, and the page is already cached, and no extra visibility check is needed,
    Then Postgres may avoid touching the table at all and only scan the tiny index pages.
  • In this case, for very narrow queries (e.g., only 1 small field), Postgres can outperform even MySQL clustered fetch.

💡 Because fetching from a small index page (~8KB) is faster than reading bigger table pages.

🎯 Conclusion:

✅ MySQL clustered index is always fast for PK lookups.
✅ PostgreSQL can be even faster for small/narrow queries if Index-Only Scan is triggered.

👉 Quick Tip:

  • In PostgreSQL, you can force an index to include extra columns by using: CREATE INDEX idx_user_id_username ON users(user_id) INCLUDE (username); Then index-only scans become more common and predictable! 🚀

Isn’t PostgreSQL also doing 2 B-tree scans? One for secondary index and one for table (row_id)?

When you query with a secondary index, like:

SELECT * FROM users WHERE username = 'Bob';
  • In MySQL InnoDB, I said:
    1. Find in secondary index (username ➔ user_id)
    2. Then go to primary clustered index (user_id ➔ full row)

Let’s look at PostgreSQL first:

♦️ Step 1: Search Secondary Index B-tree on username.

  • It finds the matching TID (tuple ID) or row pointer.
    • TID is a pair (block_number, row_offset).
    • Not a B-tree! Just a physical pointer.

♦️ Step 2: Use the TID to directly jump into the heap (the table).

  • The heap (table) is not a B-tree — it’s just a collection of unordered pages (blocks of rows).
  • PostgreSQL goes directly to the block and offset — like jumping straight into a file.

🔔 Important:

  • Secondary index ➔ TID ➔ heap fetch.
  • No second B-tree traversal for the table!

🟠 Meanwhile in MySQL InnoDB:

♦️ Step 1: Search Secondary Index B-tree on username.

  • It finds the Primary Key value (user_id).

♦️ Step 2: Now, search the Primary Key Clustered B-tree to find the full row.

  • Need another B-tree traversal based on user_id.

🔔 Important:

  • Secondary index ➔ Primary Key B-tree ➔ data fetch.
  • Two full B-tree traversals!

Real-world Summary:

♦️ PostgreSQL

  • Secondary index gives a direct shortcut to the heap.
  • One B-tree scan (secondary) ➔ Direct heap fetch.

♦️ MySQL

  • Secondary index gives PK.
  • Then another B-tree scan (primary clustered) to find full row.

PostgreSQL does not scan a second B-tree when fetching from the table — just a direct page lookup using TID.

MySQL does scan a second B-tree (primary clustered index) when fetching full row after secondary lookup.

Is heap fetch a searching technique? Why is it faster than B-tree?

📚 Let’s start from the basics:

When PostgreSQL finds a match in a secondary index, what it gets is a TID.

♦️ A TID (Tuple ID) is a physical address made of:

  • Block Number (page number)
  • Offset Number (row slot inside the page)

Example:

TID = (block_number = 1583, offset = 7)

🔵 How PostgreSQL uses TID?

  1. It directly calculates the location of the block (disk page) using block_number.
  2. It reads that block (if not already in memory).
  3. Inside that block, it finds the row at offset 7.

♦️ No search, no btree, no extra traversal — just:

  • Find the page (via simple number addressing)
  • Find the row slot

📈 Visual Example

Secondary index (username ➔ TID):

username | TID
---------+-----------
Alice    | (1583, 7)
Bob      | (1592, 3)
Carol    | (1601, 12)

♦️ When you search for “Bob”:

  • Find (1592, 3) from secondary index B-tree.
  • Jump directly to Block 1592, Offset 3.
  • Done ✅!

Answer:

  • Heap fetch is NOT a search.
  • It’s a direct address lookup (fixed number).
  • Heap = unordered collection of pages.
  • Pages = fixed-size blocks (usually 8 KB each).
  • TID gives an exact GPS location inside heap — no searching required.

That’s why heap fetch is faster than another B-tree search:

  • No binary search, no B-tree traversal needed.
  • Only a simple disk/memory read + row offset jump.

🌿 B-tree vs 📁 Heap Fetch

Action          | B-tree                                  | Heap Fetch
----------------+-----------------------------------------+-------------------------------
What it does    | Binary search inside sorted tree nodes  | Direct jump to block and slot
Steps needed    | Traverse nodes (root ➔ internal ➔ leaf) | Directly read page and slot
Time complexity | O(log n)                                | O(1)
Speed           | Slower (needs comparisons)              | Very fast (direct)

🎯 Final and short answer:

♦️ In PostgreSQL, after finding the TID in the secondary index, the heap fetch is a direct, constant-time (O(1)) accessno B-tree needed!
♦️ This is faster than scanning another B-tree like in MySQL InnoDB.


🧩 Our exact question:

When we say:

Jump directly to Block 1592, Offset 3.

We are thinking:

  • There are thousands of blocks.
  • How can we directly jump to block 1592?
  • Shouldn’t that be O(n) (linear time)?
  • Shouldn’t there be some traversal?

🔵 Here’s the real truth:

  • No traversal needed.
  • No O(n) work.
  • Accessing Block 1592 is O(1) — constant time.

📚 Why?

Because of how files, pages, and memory work inside a database.

When PostgreSQL stores a table (the “heap”), it saves it in a file on disk.
The file is just a long array of fixed-size pages.

  • Each page = 8KB (default in Postgres).
  • Each block = 1 page = fixed 8KB chunk.
  • Block 0 is the first 8KB.
  • Block 1 is next 8KB.
  • Block 2 is next 8KB.
  • Block 1592 = (1592 × 8 KB) offset from the beginning.

✅ So block 1592 is simply located at 1592 × 8192 bytes offset from the start of the file.

✅ Operating systems (and PostgreSQL’s Buffer Manager) know exactly how to seek to that byte position without reading everything before it.
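
Concretely, with the default 8 KB page size:

SELECT 1592 * 8192;  -- 13041664: the exact byte offset of block 1592 in the heap file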

📈 Diagram (imagine the table file):
+-----------+-----------+-----------+-----------+-----------+------+
| Block 0   | Block 1   | Block 2   | Block 3   | Block 4   |  ... |
+-----------+-----------+-----------+-----------+-----------+------+
  (8KB)       (8KB)       (8KB)       (8KB)       (8KB)

Finding Block 1592 ➔
Seek directly to offset 1592 * 8192 bytes ➔
Read 8KB ➔
Find row at Offset 3 inside it.

🤔 What happens technically?

If in memory (shared buffers / page cache):
  • PostgreSQL checks its buffer pool (shared memory).
  • “Do I already have block 1592 cached?”
    • ✅ Yes: immediately access memory address.
    • ❌ No: Load block 1592 from disk into memory.
If from disk (rare if cached):
  • File systems (ext4, xfs, etc) know how to seek to a byte offset in a file without reading previous parts.
  • Seek to (block_number × 8192) bytes.
  • Read exactly 8KB into memory.
  • No need to scan the whole file linearly.

📊 Final Step: Inside the Block

Once the block is loaded:

  • The block internally is structured like an array of tuples.
  • Each tuple is placed into an offset slot.
  • Offset 3 ➔ third tuple inside the block.

♦️ Again, this is just array lookup — no traversal, no O(n).

⚡ So to summarize:
Question                                    | Answer
--------------------------------------------+---------------------------------------------------------------------
How does PostgreSQL jump directly to block? | Using the block number × page size calculation (fixed offset math).
Is it O(n)?                                 | ❌ No, it’s O(1) constant time.
Is there any traversal?                     | ❌ No traversal. Just a seek + memory read.
How fast?                                   | Extremely fast if cached, still fast with disk seeks.

🔥 Key concept:

PostgreSQL heap access is O(1) because the heap file is a flat sequence of fixed-size pages, and the TID gives exact coordinates.

🎯 Simple Real World Example:

Imagine you have a giant book (the table file).
Each page of the book is numbered (block number).

If someone says:

👉 “Go to page 1592.”

♦️ You don’t need to read pages 1 to 1591 first.
♦️ You just flip directly to page 1592.

📗 Same idea: no linear traversal, just positional lookup.

🧠 Deep thought:

Because blocks are fixed size and TID is known,
heap fetch is almost as fast as reading a small array.

(Actually faster than searching B-tree because B-tree needs multiple comparisons at each node.)

Enjoy SQL! 🚀

Setup 🛠 Rails 8 App – Part 13: Composite keys & Candidate keys in Rails DB

🔑 What Is a Composite Key?

A composite key is a primary key made up of two or more columns that together uniquely identify a row in a table.

Use a composite key when no single column is unique on its own, but the combination is.

👉 Example: Composite Key in Action

Let’s say we’re building a table to track which students are enrolled in which courses.

Without Composite Key:
-- This table might allow duplicates
CREATE TABLE Enrollments (
  student_id INT,
  course_id INT
);

Nothing stops the same student from enrolling in the same course multiple times!

With Composite Key:
CREATE TABLE Enrollments (
  student_id INT,
  course_id INT,
  PRIMARY KEY (student_id, course_id)
);

Now:

  • student_id alone is not unique
  • course_id alone is not unique
  • But together → each (student_id, course_id) pair is unique
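
A minimal sketch of the constraint in action (the values are illustrative):

INSERT INTO Enrollments (student_id, course_id) VALUES (1, 42);  -- OK
INSERT INTO Enrollments (student_id, course_id) VALUES (1, 99);  -- OK: same student, different course
INSERT INTO Enrollments (student_id, course_id) VALUES (1, 42);  -- ERROR: duplicate key violates the primary key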

📌 Why Use Composite Keys?

When to Use                         | Why
------------------------------------+---------------------------------------
Tracking many-to-many relationships | Ensures unique pairs
Bridging/junction tables            | e.g., students-courses, authors-books
No natural single-column key        | But the combination is unique

⚠️ Things to Keep in Mind

  • Composite keys enforce uniqueness across multiple columns.
  • They can also be used as foreign keys in other tables.
  • Some developers prefer to add an auto-increment id as the primary key instead—but that’s a design choice.
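
For the foreign-key point above, a referencing table repeats both columns; a sketch (the Grades table is hypothetical):

CREATE TABLE Grades (
  student_id INT,
  course_id INT,
  grade VARCHAR(2),
  FOREIGN KEY (student_id, course_id)
    REFERENCES Enrollments (student_id, course_id)
);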

🔎 What Is a Candidate Key?

A candidate key is any column (or combination of columns) in a table that can uniquely identify each row.

  • Every table can have multiple candidate keys
  • One of them is chosen to be the primary key
  • The rest are called alternate keys

🔑 Think of candidate keys as “potential primary keys”

👉 Example: Users Table

CREATE TABLE Users (
  user_id INT,
  username VARCHAR(80),
  email VARCHAR(150),
  phone_number VARCHAR(30)
);

Let’s get some hands-on experience with SQL queries by creating a TEST DB. Check https://railsdrop.com/2025/04/25/rails-8-app-part-13-2-test-sql-queries/

Assume:

  • user_id is unique
  • username is unique
  • email is unique

Candidate Keys:
  • user_id
  • username
  • email

You can choose any one of them as the primary key, depending on your design needs.

-- Choosing user_id as the primary key
PRIMARY KEY (user_id)

The rest (username, email) are alternate keys.

📌 Characteristics of Candidate Keys

Property      | Description
--------------+-------------------------------------------------------------------------------
Uniqueness    | Must uniquely identify each row
Non-null      | Cannot contain NULL values
Minimality    | Must be the smallest set of columns that uniquely identifies a row (no extras)
No duplicates | No two rows have the same value(s)

👥 Candidate Key vs Composite Key

Concept       | Explanation
--------------+----------------------------------------------------
Candidate Key | Any unique identifier (single or multiple columns)
Composite Key | A candidate key that uses multiple columns

So: All composite keys are candidate keys, but not all candidate keys are composite.

💡 When Designing a Database

  • Find all possible candidate keys
  • Choose one as the primary key
  • (Optional) Define other candidate keys as unique constraints

CREATE TABLE Users (
  user_id INT PRIMARY KEY,
  username VARCHAR UNIQUE,
  email VARCHAR UNIQUE
);


Let’s walk through a real-world example using a schema we are already working on: a shopping app that sells clothing for women, men, kids, and infants.

We’ll look at how candidate keys apply to real tables like Users, Products, Orders, etc.

🛍️ Example Schema: Shopping App

1. Users Table

CREATE TABLE Users (
  user_id SERIAL PRIMARY KEY,
  email VARCHAR UNIQUE,
  username VARCHAR UNIQUE,
  phone_number VARCHAR
);

Candidate Keys:

  • user_id
  • email
  • username

We chose user_id as the primary key, but both email and username could also uniquely identify a user — so they’re candidate keys.


2. Products Table

CREATE TABLE Products (
  product_id SERIAL PRIMARY KEY,
  sku VARCHAR UNIQUE,
  name VARCHAR,
  category VARCHAR
);

Candidate Keys:

  • product_id
  • sku ✅ (Stock Keeping Unit – a unique identifier for each product)

sku is a candidate key. We use product_id as the primary key, but you could use sku if you wanted a natural key instead.

3. Orders Table

CREATE TABLE Orders (
  order_id SERIAL PRIMARY KEY,
  user_id INT REFERENCES Users(user_id),
  order_number VARCHAR UNIQUE,
  created_at TIMESTAMP
);

Candidate Keys:

  • order_id
  • order_number

You might use order_number (e.g., "ORD-20250417-0012") for external reference and order_id internally. Both are unique identifiers = candidate keys.

4. OrderItems Table (Join Table)

This table links orders to the specific products and quantities purchased.

CREATE TABLE OrderItems (
  order_id INT,
  product_id INT,
  quantity INT,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id) REFERENCES Orders(order_id),
  FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

Candidate Key:

  • Composite key: (order_id, product_id)

Here, a combination of order_id and product_id uniquely identifies a row — i.e., what product was ordered in which order — making it a composite candidate key, and we’ve selected it as the primary key.

👀 Summary of Candidate Keys by Table

Table      | Candidate Keys           | Primary Key Used
-----------+--------------------------+------------------------
Users      | user_id, email, username | user_id
Products   | product_id, sku          | product_id
Orders     | order_id, order_number   | order_id
OrderItems | (order_id, product_id)   | (order_id, product_id)

Let’s explore how to implement candidate keys in both SQL and Rails (Active Record). Since we are working on a shopping app in Rails 8, I’ll show how to enforce uniqueness and data integrity in both layers:

🔹 1. Candidate Keys in SQL (PostgreSQL Example)

Let’s take the Users table with multiple candidate keys (email, username, and user_id).

CREATE TABLE users (
  user_id SERIAL PRIMARY KEY,
  email VARCHAR(255) NOT NULL UNIQUE,
  username VARCHAR(100) NOT NULL UNIQUE,
  phone_number VARCHAR(20)
);

  • user_id: chosen as the primary key
  • email and username: candidate keys, enforced via UNIQUE constraints

💎 Composite Key Example (OrderItems)

CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  quantity INT NOT NULL,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id) REFERENCES orders(order_id),
  FOREIGN KEY (product_id) REFERENCES products(product_id)
);

This sets (order_id, product_id) as a composite candidate key and primary key.

🔸 2. Candidate Keys in Rails (ActiveRecord)

Now let’s do the same with Rails models + migrations + validations.

users Migration (with candidate keys)

# db/migrate/xxxxxx_create_users.rb
class CreateUsers < ActiveRecord::Migration[8.0]
  def change
    create_table :users do |t|
      t.string :email, null: false
      t.string :username, null: false
      t.string :phone_number

      t.timestamps
    end

    add_index :users, :email, unique: true
    add_index :users, :username, unique: true
  end
end

User Model

class User < ApplicationRecord
  validates :email, presence: true, uniqueness: true
  validates :username, presence: true, uniqueness: true
end

✅ These are candidate keys: email and username could each serve as the primary key, but we are using id instead.

✅ Composite Key with OrderItem (Join Table)

ActiveRecord doesn’t support composite primary keys natively, but you can enforce uniqueness via a multi-column index:

Migration:

class CreateOrderItems < ActiveRecord::Migration[8.0]
  def change
    create_table :order_items, id: false do |t|
      t.references :order, null: false, foreign_key: true
      t.references :product, null: false, foreign_key: true
      t.integer :quantity, null: false

      t.timestamps
    end

    add_index :order_items, [:order_id, :product_id], unique: true
  end
end

Model:

class OrderItem < ApplicationRecord
  belongs_to :order
  belongs_to :product

  validates :quantity, presence: true
  validates :order_id, uniqueness: { scope: :product_id }
end

🎯 This simulates a composite key behavior: each product can only appear once per order.
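
At the database level it is the unique index that does the enforcing, whatever code path writes the row. A sketch of what it rejects (the values are illustrative):

INSERT INTO order_items (order_id, product_id, quantity, created_at, updated_at)
VALUES (1, 7, 2, now(), now());  -- OK

INSERT INTO order_items (order_id, product_id, quantity, created_at, updated_at)
VALUES (1, 7, 5, now(), now());  -- ERROR: duplicate key value violates unique constraint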

➕ Extra: Use composite_primary_keys Gem (Optional)

If you really need true composite primary keys, use:

gem 'composite_primary_keys'

But it’s best to avoid unless your use case demands it — most Rails apps use a surrogate key (id) for simplicity.


to be continued.. 🚀

Writing Perfect Active Record 🗒️ Queries in Ruby on Rails 8

Active Record (AR) is the heart of Ruby on Rails when it comes to database interactions. Writing efficient and readable queries is crucial for application performance and maintainability. This guide will help you master Active Record queries with real-world examples and best practices.


Setting Up a Sample Database

To demonstrate complex Active Record queries, let’s create a Rails app with a sample database structure containing multiple tables.

Generate Models & Migrations

rails new MyApp --database=postgresql
cd MyApp
rails g model User name:string email:string
rails g model Post title:string body:text user:references
rails g model Comment body:text user:references post:references
rails g model Category name:string
rails g model PostCategory post:references category:references
rails g model Like user:references comment:references
rails db:migrate

Database Schema Overview

  • users: Stores user information.
  • posts: Stores blog posts written by users.
  • comments: Stores comments on posts, linked to users and posts.
  • categories: Stores post categories.
  • post_categories: Join table for posts and categories.
  • likes: Stores likes on comments by users.

Basic Active Record Queries

1. Fetching All Records

User.all  # Returns all users (Avoid using it directly on large datasets as it loads everything into memory)

⚠️ User.all can lead to performance issues if the table contains a large number of records. Instead, prefer pagination (User.limit(100).offset(0)) or batch processing (User.find_each).

2. Finding a Specific Record

User.find(1)  # Finds a user by ID
User.find_by(email: 'john@example.com')  # Finds by attribute

3. Filtering with where vs having

Post.where(user_id: 2)  # Fetch all posts by user with ID 2

Difference between where and having:

  • where is used for filtering records before grouping.
  • having is used for filtering after group operations.

Example:

Post.group(:user_id).having('COUNT(id) > ?', 5).count  # Users with more than 5 posts
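
The .count (or an explicit select) matters here: selecting posts.* while grouping only by user_id is invalid in PostgreSQL. Roughly the SQL this generates (simplified):

SELECT COUNT(*) AS count_all, posts.user_id
FROM posts
GROUP BY posts.user_id
HAVING (COUNT(id) > 5);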

4. Ordering Results

User.order(:name)  # Order users alphabetically
Post.order(created_at: :desc)  # Order posts by newest first

5. Limiting Results

Post.limit(5)  # Get the first 5 posts

6. Selecting Specific Columns

User.select(:id, :name)  # Only fetch ID and name

7. Fetching Users with a Specific Email Domain

User.where("email LIKE ?", "%@gmail.com")

8. Fetching the Most Recent Posts

Post.order(created_at: :desc).limit(5)

9. Using pluck for Efficient Data Retrieval

User.pluck(:email)  # Fetch only emails as an array

10. Checking if a Record Exists Efficiently

User.exists?(email: 'john@example.com')

11. Including Associations (eager loading to avoid N+1 queries)

Post.includes(:comments).where(comments: { body: 'Great post!' })


Advanced Queries with Joins

1. Joining Tables (INNER JOIN)

Post.joins(:user).where(users: { name: 'John' })

2. Self Join Example

A self-join is useful when dealing with hierarchical relationships, such as an employee-manager structure.

Model Setup

class Employee < ApplicationRecord
  belongs_to :manager, class_name: 'Employee', optional: true
  has_many :subordinates, class_name: 'Employee', foreign_key: 'manager_id'
end

Sample Data

id | name  | manager_id
---+-------+-----------
1  | Alice | NULL
2  | Bob   | 1
3  | Carol | 1
4  | Dave  | 2

Query: Find Employees Who Report to Alice

Employee.joins(:manager).where(managers_employees: { name: 'Alice' })

Result:

id | name  | manager_id
---+-------+-----------
2  | Bob   | 1
3  | Carol | 1

This query fetches employees who report to Alice (i.e., those where manager_id = 1).
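
Roughly the SQL this produces (Rails aliases the self-joined table as managers_employees, which is why that name appears in the where clause):

SELECT employees.*
FROM employees
INNER JOIN employees managers_employees
  ON managers_employees.id = employees.manager_id
WHERE managers_employees.name = 'Alice';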

3. Fetching Users with No Posts (LEFT JOIN with NULL check)

User.left_outer_joins(:posts).where(posts: { id: nil })

4. Counting Posts Per User

User.joins(:posts).group('users.id').count

Complex Queries in Active Record

1. Fetching Posts with the Most Comments

Post.joins(:comments)
    .group('posts.id')
    .order('COUNT(comments.id) DESC')
    .limit(1)

2. Fetching Posts with More than 5 Comments

Post.joins(:comments)
    .group(:id)
    .having('COUNT(comments.id) > ?', 5)

3. Finding Users Who Liked the Most Comments

User.joins(comments: :likes)
    .group('users.id')
    .select('users.id, users.name, COUNT(likes.id) AS likes_count')
    .order('likes_count DESC')
    .limit(1)

4. Fetching Posts Belonging to Multiple Categories

Post.joins(:categories).group('posts.id').having('COUNT(categories.id) > ?', 1)

5. Fetching the Last Comment of Each Post

Comment.select('DISTINCT ON (post_id) *').order('post_id, created_at DESC')

6. Fetching Users Who Haven’t Commented on a Specific Post

User.where.not(id: Comment.where(post_id: 10).select(:user_id))

7. Fetching Users Who Have Commented on Every Post

User.joins(:comments).group(:id).having('COUNT(DISTINCT comments.post_id) = ?', Post.count)

8. Finding Posts With No Comments

Post.left_outer_joins(:comments).where(comments: { id: nil })

9. Fetching the User Who Created the Most Posts

User.joins(:posts)
    .group('users.id')
    .select('users.id, users.name, COUNT(posts.id) AS post_count')
    .order('post_count DESC')
    .limit(1)

10. Fetching the Most Liked Comment

Comment.joins(:likes)
    .group('comments.id')
    .order('COUNT(likes.id) DESC')
    .limit(1)

11. Fetching Comments with More than 3 Likes and Their Associated Posts

Comment.joins(:likes, :post)
    .group('comments.id', 'posts.id')
    .having('COUNT(likes.id) > ?', 3)

12. Finding Users Who Haven’t Liked Any Comments

User.left_outer_joins(:likes).where(likes: { id: nil })

13. Fetching Users, Their Posts, and the Count of Comments on Each Post

User.joins(posts: :comments)
    .group('users.id', 'posts.id')
    .select('users.id, users.name, posts.id AS post_id, COUNT(comments.id) AS comment_count')
    .order('comment_count DESC')

Importance of inverse_of in Model Associations

What is inverse_of?

The inverse_of option in Active Record associations helps Rails correctly link objects in memory, avoiding unnecessary database queries and ensuring bidirectional association consistency.

Example Usage

class User < ApplicationRecord
  has_many :posts, inverse_of: :user
end

class Post < ApplicationRecord
  belongs_to :user, inverse_of: :posts
end

Why Use inverse_of?

  • Performance Optimization: Prevents extra queries by using already loaded objects.
  • Ensures Data Consistency: Updates associations without additional database fetches.
  • Enables Nested Attributes: Helps when using accepts_nested_attributes_for.

Example:

user = User.new(name: 'Alice')
post = user.posts.build(title: 'First Post')
post.user == user  # True without needing an additional query

Best Practices to use in Rails Projects

1. Using Scopes for Readability

class Post < ApplicationRecord
  scope :recent, -> { order(created_at: :desc) }
end

Post.recent.limit(10)  # Fetch recent posts

2. Using find_each for Large Datasets

User.find_each(batch_size: 100) do |user|
  puts user.email
end

3. Avoiding SELECT * for Performance

User.select(:id, :name).load

4. Avoiding N+1 Queries with includes

Post.includes(:comments).each do |post|
  puts post.comments.count
end


Conclusion

Mastering Active Record queries is essential for writing performant and maintainable Rails applications. By using joins, scopes, batch processing, and eager loading, you can write clean and efficient queries that scale well.

Do you have any favorite Active Record query tricks? Share them in the comments!

Liferay 7.3: Add service builder to the portlet


In the past, I made the decision to create the portlet and service builder directly within the Eclipse workspace, rather than creating a Liferay workspace project within the Eclipse workspace. However, this approach has caused some challenges when attempting to add the service builder to my portlet, as both of them are located within the Eclipse workspace.

Could not run phased build action using Gradle distribution 'https://services.gradle.org/distributions/gradle-5.6.4-bin.zip'.
Build file '/home/abhilash/eclipse-workspace/register-emailbox/build.gradle' line: 32
A problem occurred evaluating root project 'register-emailbox'.
Project with path ':sitesService:sitesService-api' could not be found in root project 'register-emailbox'.

Several individuals have encountered this particular issue, and you can find detailed guidance on resolving it in the Liferay developer article focused on creating a service builder.

https://liferay.dev/blogs/-/blogs/creating-service-builder-mvc-portlet-in-liferay-7-with-liferay-ide-3-

Several readers reacted with the same problem, but no solution was mentioned there.

Through extensive research, I discovered that the solution to this issue requires creating both a portlet and a service builder within the Liferay workspace, rather than the Eclipse workspace. Specifically, it is essential to create a Liferay workspace project inside the Eclipse workspace to address this problem effectively.

Let’s do that this time.

Click File -> New -> Liferay Workspace Project

Provide a Project Name and click on Finish

Next right click on the da-workspace, New -> Liferay Module Project

Provide the Project Name, then it automatically changes the Location

Provide the class name and project name

Deploy this service by clicking on the gradle section of IDE and double click on deploy

Deployed successfully

You can see the module is created inside our new Liferay workspace: da-workspace

Jar file created

Copy this jar file and paste into the liferay server folder path given below:

~/liferay-ce-portal-tomcat-7.3.0-ga1-20200127150653953/liferay-ce-portal-7.3.0-ga1/deploy

You can see the server log like this:

2020-04-14 09:16:09.299 INFO  [fileinstall-/home/abhilash/liferay-ce-portal-tomcat-7.3.0-ga1-20200127150653953/liferay-ce-portal-7.3.0-ga1/osgi/modules][BundleStartStopLogger:39] STARTED com.emailbox_1.0.0 [1117]

Status -> STARTED

Now delete our old services. Go to the Gogo shell and uninstall the bundles:

Now go to Liferay and check our newly created portlet.

Now let’s repeat the steps for creating the service builder from the previous article. But this time, create it from da-workspace.

File -> New -> Liferay Module Project

Services are created – For details check the previous article

Folder structure for the portlet and the service builder

Add the details as shown in the below screenshots (If any doubt check the previous article).

Run buildService and deploy.

Copy these jar files one by one to the server’s deploy folder: first *api.jar, then *service.jar.

Server logs:

liferay-ce-portal-tomcat-7.3.0-ga1-20200127150653953/liferay-ce-portal-7.3.0-ga1/osgi/modules][BundleStartStopLogger:39] STARTED com.siteservice.api_1.0.0 [1118]

liferay-ce-portal-7.3.0-ga1/osgi/modules][BundleStartStopLogger:39] STARTED com.siteservice.service_1.0.0 [1119]


Check the database, you can see the Site_ Table and columns are created.

Now add the service builder dependency to the portlet.

Add these two lines to the build.gradle file:
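
The two lines are Gradle project dependencies on the service builder modules; based on the module path seen in the earlier error message, they look roughly like this (adjust the paths to match your workspace):

compileOnly project(":sitesService:sitesService-api")
compileOnly project(":sitesService:sitesService-service")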

Right click on the showEmailBox portlet and select Gradle -> Refresh Gradle Project.

DONE! You have successfully bound the service builder to your portlet.

Now add the following to your portlet class file, above the doView function:

@Reference
private SiteLocalService _siteLocalService;

Now you can use the following default functions provided by Liferay on the service:

_siteLocalService.fetchSite(23);
_siteLocalService.createSite(2344);
_siteLocalService.deleteSite(2233);
_siteLocalService.getSitesCount();
_siteLocalService.updateSite(site);

But what if we need to fetch, say, sites that have a particular site_id, or all sites registered after a certain time?

For these custom queries to the MySQL DB, we need to create custom finder methods. So let’s create one.

Open service.xml of `siteService-service`

Click on Finders and add Name and Type

Click on Finder column and add the db column to find

Click on Source, you can see the finder is added

Double click on the buildService to build the service

Now we can add the custom finder findBySiteId to this service.

Open SiteLocalServiceImpl.java:

package com.siteservice.service.impl;

import com.liferay.portal.aop.AopService;
import com.siteservice.model.Site;
import com.siteservice.service.base.SiteLocalServiceBaseImpl;

import java.util.List;

import org.osgi.service.component.annotations.Component;


@Component(
	property = "model.class.name=com.siteservice.model.Site",
	service = AopService.class
)
public class SiteLocalServiceImpl extends SiteLocalServiceBaseImpl {

	public List<Site> findBySiteId(long site_id) {
		return sitePersistence.findBySiteId(site_id);
	}
}

Now run buildService for siteService. Then do Gradle -> Refresh and deploy the service. Copy these jar files one by one to the server’s deploy folder: first *api.jar, then *service.jar.

Refresh Gradle project for the portlet – showEmailBox

Add the following to the doView function of the portlet

Site site = _siteLocalService.findBySiteId(2233).get(0);

System.out.println("We got the site: ---------");
System.out.println(site);

And don’t forget to create a site entry in the database with id 2233:

mysql> insert into Site_ (id_, site_id, name, register_from_date, register_to_date, created_at, updated_at) values (1, 2233, 'Site 2020', '2020-01-01', '2020-06-19', CURDATE(), CURDATE());

Deploy the portlet and check that you are getting the site in the server console.

That’s it for now. See you in the next article.

Liferay 7.3: Configure IDE

Step 1. Download the IDE

Download from here:
https://liferay.dev/project/-/asset_publisher/TyF2HQPLV1b5/content/ide-installation-instructions

Extract it to your /home directory

Step 2. Run the installer

./LiferayProjectSDKwithDevStudioCommunityEdition-201910152009-linux-x64-installer.run

You can see the steps involving the process below.

Select java runtime

After installation, open it. Double click on the following file:

~/LiferayProjectSDKwithDevStudioCommunityEdition/liferay-developer-studio/DeveloperStudio

You can see the welcome page. Close it.



Click on Workbench to start.

This is how the Liferay Studio workspace home page looks.

You can create Liferay plugins, projects, etc. from here.

Step 3. Click the first button in the right corner and open the perspective.

select Liferay plugins

Open Perspective

Step 4. In the bottom left corner there is a ‘Servers’ tab.

Right click on it and select ‘New’ -> ‘Server’.

Select ‘Liferay inc’ -> Liferay 7.x and click next

Add the Tomcat server to your SDK to control the server from the IDE. This lets you see the server logs and other live status in the IDE.

Select the tomcat server path from your downloaded liferay portal

Make sure you are using the timestamp folder in the server path, or it is not going to work. It will show you an error like this:

If you start the Tomcat server and try to access it without the timestamp path:

You can add existing resources, such as themes, to the server if available. If you don’t have any resources, don’t worry; we will cover this in the next chapter.

Click on the server name in the bottom left corner and you can see the server’s configuration.

Server Started. You can see the server logs in the console

ThinkingSphinx::SphinxError (undefined method `next_result’ for Mysql2 (rails 3)

I got this error after updating Thinking Sphinx to the newest version, ‘3.0.5’. The error appeared at the exact line of code Model.search in my Rails controller. I updated my mysql2 version to ‘0.3.13’, and the issue was solved.
So in your Gemfile, update the mysql2 version:

gem 'mysql2', '0.3.13'

And run:

$ bundle install