Union – The Rails Drop

1. Add the test db and users table: https://railsdrop.com/2025/04/25/rails-8-app-postgresql-heap-vs-mysql-innodb-b-tree-indexing/

2. Add fake data into the table: https://railsdrop.com/2025/04/29/rails-8-app-postgresql-faker-extensions-for-rails/

Start Learn (Premium): https://railsdrop.com/sql-postgresql-queries-bitmap-seq-index-scan-db-clustering/

📌 Summary of all queries

Check: https://railsdrop.com/postgresql-queries-a-summary/

Read – Composite vs Individual indexes (Premium): https://railsdrop.com/sql-postgres-understanding-current-composite-index/

Read Create 1 million sample users: https://railsdrop.com/sql-postgresql-create-1-million-sample-users-data/

👉 1. `SELECT` – Basic Query

🔹 1. Select all columns:

SELECT * FROM users;

This gives you every row and every column in the users table.

🔹 2. Select specific columns:

SELECT username, email FROM users;

This limits the output to only username and email.

👉 2. `ALTER` – Modify Table Structure

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

🔹 Example 1: Add a new column

Let’s add a column created_at of type timestamp:

ALTER TABLE users 
  ADD COLUMN created_at timestamp;

🔹 Example 2: Rename a column

Let’s rename phone_number to mobile:

ALTER TABLE users
  RENAME COLUMN phone_number TO mobile;

🔹 Example 3: Drop a column

Let’s say you want to remove the created_at column:

ALTER TABLE users
  DROP COLUMN created_at;

🔹 4. Modify specific columns:

UPDATE users
  SET mobile = '123456'
  WHERE mobile IS NULL;

Use UPDATE instead of ALTER when modifying data in a table.
ALTER is used for changing the structure/schema of a table (e.g., adding columns), not for updating data.

👉 3. `DISTINCT` – Remove Duplicate Values

The DISTINCT keyword is used to return only unique (non-duplicate) values in a result set.

🔹 Example 1: Distinct usernames

SELECT DISTINCT username FROM users;

This returns a list of unique usernames, removing any duplicates.

🔹 Example 2: Distinct combinations of username and email

SELECT DISTINCT username, email FROM users;
SELECT DISTINCT username from users WHERE username like '%quin%';
EXPLAIN ANALYSE SELECT DISTINCT username from users WHERE username like '%quin%';

This checks for uniqueness based on both username and email combined.

👉 4. `WHERE` – Filter Records + Major Combine Types (`AND`, `OR`, `NOT`)

The WHERE clause is used to filter records that meet a certain condition.

Let’s look at basic and combined conditions using our users table.

🔹 Example 1: Simple WHERE

SELECT * FROM users WHERE username = 'john_doe';

🔹 Example 2: `AND` – Combine multiple conditions (all must be true)

SELECT * FROM users 
WHERE username = 'quinton' AND email LIKE '%@gmail.com';

🔹 Example 3: `OR` – At least one condition must be true

SELECT * FROM users 
WHERE username = 'quinton' OR username = 'joaquin_hand';

🔹 Example 4: `NOT` – Negate a condition

SELECT * FROM users 
WHERE NOT email LIKE '%@example.com';

🔹 Example 5: Combine `AND`, `OR`, `NOT` (use parentheses!)

SELECT * FROM users 
WHERE (email like '%example%' OR email like '%test%') 
  AND NOT username = 'admin';

👉 5. `ORDER BY` – Sort the Results

And we’ll also look at combined queries afterward.

🔹 Example 1: Order by a single column (ascending)

SELECT * FROM users 
ORDER BY username;

🔹 Example 2: Order by a column (descending)

SELECT * FROM users 
ORDER BY email DESC;

🔹 Example 3: Order by multiple columns

SELECT * FROM users 
ORDER BY username ASC, email DESC;

👉 6. Combined Queries (UNION, INTERSECT, EXCEPT)

✅ These allow you to combine results from multiple SELECT statements.

⚠ Requirements:

Each query must return the same number of columns.
Data types must be compatible.

🔹 `UNION` – Combine results and remove duplicates

SELECT username FROM users WHERE email LIKE '%@example.com'
UNION
SELECT username FROM users WHERE username LIKE 'ton%';

🔹 `UNION ALL` – Combine results and keep duplicates

SELECT username FROM users WHERE email LIKE '%@gmail.com'
UNION ALL
SELECT username FROM users WHERE username LIKE 'test%';

🔹 `INTERSECT` – Return only common results

SELECT username FROM users 
  WHERE email LIKE '%@gmail.com'
INTERSECT
SELECT username FROM users 
  WHERE username LIKE 'test%';

SELECT username FROM users
  WHERE (email like '%example' OR email like '%test')
INTERSECT
SELECT username FROM users
  WHERE username like 'adam';

🔹 `EXCEPT` – Return results from the first query that are not in the second

SELECT username FROM users 
  WHERE email LIKE '%@example'
EXCEPT
SELECT username FROM users 
  WHERE (username like '%ada%' OR username like '%merlin%');

👉 7. `IS NULL` and `IS NOT NULL` – Handling Missing Data

These are used to check if a column contains a NULL value (i.e., no value).

🔹 Example 1: Users with a missing/have an email

# Find users with a missing email
SELECT * FROM users 
WHERE email IS NULL;

# Find 
SELECT * FROM users 
WHERE email IS NOT NULL;

🔹 Example 2: Users with no email and no mobile

SELECT * FROM users 
WHERE email IS NULL AND phone_number IS NULL;

🔹 Example 3: Users with either email or mobile missing

SELECT * FROM users 
WHERE email IS NULL OR phone_number IS NULL;

🔹 Example 4: Users who have an email and username starts with ‘adam’

SELECT * FROM users 
WHERE email IS NOT NULL AND username LIKE 'adam%';

🔹 Example 5: Users with email missing but username is not empty

SELECT * FROM users 
WHERE email IS NULL AND username IS NOT NULL;

🔹 Example 6: Users where email or mobile is null, but not both (exclusive or)

SELECT * FROM users 
WHERE (email IS NULL AND mobile IS NOT NULL)
   OR (email IS NOT NULL AND mobile IS NULL);

👉 8. `LIMIT`, `SELECT TOP`, `SELECT TOP PERCENT` (PostgreSQL-style)

In PostgreSQL, we use LIMIT instead of SELECT TOP.
(PostgreSQL doesn’t support TOP directly like SQL Server.)

🔹 Example 1: Limit number of results (first 10 rows)

SELECT * FROM users 
LIMIT 10;

🔹 Example 2: Combined with `ORDER BY` (top 5 newest usernames)

SELECT username FROM users 
  WHERE username IS NOT NULL
ORDER BY id DESC
LIMIT 5;

🔹 Example 3: Paginate (e.g., 11th to 20th row)

SELECT * FROM users 
ORDER BY id 
OFFSET 10 LIMIT 10;

🔔 Simulating `SELECT TOP` and `SELECT TOP PERCENT` in PostgreSQL

🔹 Example 4: Simulate `SELECT TOP 1`

SELECT * FROM users 
ORDER BY id 
LIMIT 1;

🔹 Example 5: Simulate `SELECT TOP 10 PERCENT`

To get the top 10% of users by id, you can use a subquery:

SELECT * FROM users
ORDER BY id
LIMIT (SELECT CEIL(COUNT(*) * 0.10) FROM users);

🔹 Example 6: Users with Gmail or Yahoo emails, ordered by ID, limit 5

SELECT id, username, email FROM users
WHERE email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com'
AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;

Note: Without parentheses, AND has higher precedence than OR.

🔹 Better version with correct logic:

SELECT id, username, email FROM users
WHERE (email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com')
  AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;

👉 9. Aggregation Functions: `MIN`, `MAX`, `COUNT`, `AVG`, `SUM`

These functions help you perform calculations on column values.

🔹 1. `COUNT` – Number of rows

SELECT COUNT(*) FROM users;

✔️ Total number of users.

SELECT COUNT(email) FROM users WHERE email IS NOT NULL;

✔️ Count of users who have an email.

🔹 2. `MIN` and `MAX` – Smallest and largest values

SELECT MIN(id) AS first_user, MAX(id) AS last_user FROM users;

🔹 3. `AVG` – Average (only on numeric fields)

Assuming id is somewhat sequential, we can do:

SELECT AVG(id) AS avg_id FROM users;

🔹 4. `SUM` – Total (again, only on numeric fields)

SELECT SUM(id) AS total_ids FROM users WHERE id < 1000;

Combined Queries with Aggregates

🔹 Example 1: Count users without email and with usernames starting with ‘test’

SELECT COUNT(*) FROM users 
WHERE email IS NULL AND username LIKE 'test%';

🔹 Example 2: Get min, max, avg ID of users with Gmail addresses

SELECT 
  MIN(id) AS min_id,
  MAX(id) AS max_id,
  AVG(id) AS avg_id
FROM users 
WHERE email LIKE '%@gmail.com';

🔹 Example 3: Count how many users per email domain

SELECT 
  SPLIT_PART(email, '@', 2) AS domain,
  COUNT(*) AS total_users
FROM users
WHERE email IS NOT NULL
GROUP BY domain
ORDER BY total_users DESC
LIMIT 5;

♦️ This query breaks email at the @ to group by domain like gmail.com, yahoo.com.

`GROUP BY` Course

Here’s the SQL query to get the maximum mark, minimum mark, and the email (or emails) of users grouped by each course:

Option 1: Basic `GROUP BY` with aggregate functions (only max/min mark per course, not emails)

SELECT
  course,
  MAX(mark) AS max_mark,
  MIN(mark) AS min_mark
FROM users
GROUP BY course;

Option 2: Include emails of users who have the max or min mark per course

(PostgreSQL-specific using subqueries and JOIN)

SELECT u.course, u.email, u.mark
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

♦️ This second query shows all users who have the highest or lowest mark in their course, including ties.

Here’s the updated query that includes:

Course name
Emails of users with the maximum or minimum marks
Their marks
Average mark per course

SELECT
  u.course,
  u.email,
  u.mark,
  stats.max_mark,
  stats.min_mark,
  stats.avg_mark
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

Notes:

ROUND(AVG(mark), 2) gives the average mark rounded to two decimal places.
Users with the same max or min mark are all included.

Here’s the full query including:

Course
Email
Mark
Max/Min mark
Average mark
User count per course

SELECT
  u.course,
  u.email,
  u.mark,
  stats.max_mark,
  stats.min_mark,
  stats.avg_mark,
  stats.user_count
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

♦️ This query gives you a full breakdown of top/bottom performers per course along with stats per group.

Here’s a version that adds the rank of each user within their course based on their mark (highest mark = rank 1), along with:

Course
Email
Mark
Rank (within course)
Max mark, Min mark, Average mark, User count per course

WITH ranked_users AS (
  SELECT
    u.*,
    RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
  FROM users u
),
course_stats AS (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
)
SELECT
  r.course,
  r.email,
  r.mark,
  r.rank_in_course,
  cs.max_mark,
  cs.min_mark,
  cs.avg_mark,
  cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
ORDER BY r.course, r.rank_in_course;

Key features:

Users are ranked per course using RANK() (supports ties).
The output includes all users, not just those with max/min marks.

NOTE: Here we can see output like:

    course    |                   email                   | mark | rank_in_course | max_mark | min_mark | avg_mark | user_count
--------------+-------------------------------------------+------+----------------+----------+----------+----------+------------
 IT           | lisandra.schoen@borer-effertz.test        | 1000 |              1 |     1000 |      100 |   543.04 |        796
 IT           | leona@jaskolski-jaskolski.test            | 1000 |              1 |     1000 |      100 |   543.04 |        796
 IT           | angle@ankunding-sauer.example             |  999 |              3 |     1000 |      100 |   543.04 |        796
 IT           | drucilla_okeefe@monahan.test              |  999 |              3 |     1000 |      100 |   543.04 |        796
 algebra      | natashia.langosh@luettgen.test            | 1000 |              1 |     1000 |      100 |   541.52 |        779
 algebra      | tiffany.tremblay@bergnaum.example         | 1000 |              1 |     1000 |      100 |   541.52 |        779
 algebra      | kristeen.nikolaus@crist.example           |  999 |              3 |     1000 |      100 |   541.52 |        779
 algebra      | domenic@predovic-dare.example             |  999 |              3 |     1000 |      100 |   541.52 |        779
 algebra      | kit@oconner.example                       |  999 |              3 |     1000 |      100 |   541.52 |        779
 architecture | tierra_reilly@botsford-okuneva.test       |  997 |              1 |      997 |      100 |   549.24 |        776
 architecture | celestine_reilly@bayer.example            |  996 |              2 |      997 |      100 |   549.24 |        776
 architecture | carson@kulas.example                      |  995 |              3 |      997 |      100 |   549.24 |        776
 botany       | hassan@towne.test                         | 1000 |              1 |     1000 |      103 |   554.07 |        760
 botany       | shaunna@hudson.test                       | 1000 |              1 |     1000 |      103 |   554.07 |        760
 botany       | sanford_jacobs@johnston.example           |  999 |              3 |     1000 |      103 |   554.07 |        760
 botany       | arnulfo_cremin@ernser.example             |  999 |              3 |     1000 |      103 |   554.07 |        760

The Ranks are not consistent. To avoid this we can use DENSE_RANK().

Here’s the updated query using DENSE_RANK() instead of RANK() — this avoids gaps in rank numbering when there are ties:

WITH ranked_users AS (
  SELECT
    u.*,
    DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
  FROM users u
),
course_stats AS (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
)
SELECT
  r.course,
  r.email,
  r.mark,
  r.rank_in_course,
  cs.max_mark,
  cs.min_mark,
  cs.avg_mark,
  cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
WHERE r.rank_in_course <= 3
ORDER BY r.course, r.rank_in_course;

`DENSE_RANK` difference:

If 2 users tie for 1st place, the next gets rank 2 (not 3 like with RANK).
Ensures consistent top-N output when ties are frequent.

🔥 Boom, Bonus: To export the query result as a CSV file in PostgreSQL, you can use the \copy command in psql (PostgreSQL’s CLI), like this:

🧾 Export Top 3 Students per Course to CSV

\copy (
  WITH ranked_users AS (
    SELECT
      u.*,
      DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
    FROM users u
  ),
  course_stats AS (
    SELECT
      course,
      MAX(mark) AS max_mark,
      MIN(mark) AS min_mark,
      ROUND(AVG(mark), 2) AS avg_mark,
      COUNT(*) AS user_count
    FROM users
    GROUP BY course
  )
  SELECT
    r.course,
    r.email,
    r.mark,
    r.rank_in_course,
    cs.max_mark,
    cs.min_mark,
    cs.avg_mark,
    cs.user_count
  FROM ranked_users r
  JOIN course_stats cs ON r.course = cs.course
  WHERE r.rank_in_course <= 3
  ORDER BY r.course, r.rank_in_course
) TO 'top_students_per_course.csv' WITH CSV HEADER;

✅ Requirements:

Run this in the psql shell.
The file top_students_per_course.csv will be saved in your local working directory (where psql was started).
Make sure PostgreSQL has write permissions to that directory.

👉 10. `LIKE`, `%`, `_` – Pattern Matching in SQL

These are used to filter text using wildcards:

% = matches any sequence of characters (0 or more)
_ = matches exactly one character

🔹 Basic `LIKE` Queries

Example 1: Usernames starting with “admin”

SELECT * FROM users 
WHERE username LIKE 'admin%';

Example 2: Usernames ending with “bot”

SELECT * FROM users 
WHERE username LIKE '%bot';

Example 3: Usernames containing “test”

SELECT * FROM users 
WHERE username LIKE '%test%';

🔹 `_` Single-character Wildcard

Example 4: 5-character usernames

SELECT * FROM users 
WHERE username LIKE '_____';

(Each _ stands for one character.)

Example 5: Emails starting with any single letter + “ohn” (e.g., “john”, “kohn”)

SELECT * FROM users 
WHERE username LIKE '_ohn';

Combined Queries with `LIKE`, `%`, `_`

🔹 Example 6: Users whose username contains “test” and email ends with “gmail.com”

SELECT * FROM users 
WHERE username LIKE '%test%' AND email LIKE '%@gmail.com';

🔹 Example 7: Users with 3-character usernames and missing email

SELECT * FROM users 
WHERE username LIKE '___' AND email IS NULL;

🔹 Example 8: Users with usernames that start with “a” or end with “x” and have a mobile number

SELECT * FROM users 
WHERE (username LIKE 'a%' OR username LIKE '%x') AND mobile IS NOT NULL;

👉 11. `IN`, `NOT IN`, `BETWEEN` – Set & Range Filters

These are used to filter based on a list of values (IN) or a range (BETWEEN).

🔹 1. `IN` – Match any of the listed values

SELECT * FROM users 
WHERE username IN ('admin', 'test_user', 'john_doe');

🔹 2. `NOT IN` – Exclude listed values

SELECT * FROM users 
WHERE username NOT IN ('admin', 'test_user');

🔹 3. `BETWEEN` – Match within a range (inclusive)

SELECT * FROM users 
WHERE id BETWEEN 100 AND 200;

♦️ Equivalent to: id >= 100 AND id <= 200

Combined Queries

🔹 Example 1: Users with username in a list and `id` between 1 and 500

SELECT * FROM users 
WHERE username IN ('alice', 'bob', 'carol') 
  AND id BETWEEN 1 AND 500;

🔹 Example 2: Exclude system users and select a range of IDs

SELECT id, username FROM users 
WHERE username NOT IN ('admin', 'system') 
  AND id BETWEEN 1000 AND 2000;

🔹 Example 3: Top 5 users whose email domains are in a specific list

SELECT * FROM users 
WHERE SPLIT_PART(email, '@', 2) IN ('gmail.com', 'yahoo.com', 'hotmail.com')
ORDER BY id
LIMIT 5;

👉 12. SQL Aliases – Renaming Columns or Tables Temporarily

Aliases help improve readability, especially in joins or when using functions.

🔹 1. Column Aliases

Use AS (optional keyword) to rename a column in the result.

Example 1: Rename `username` to `user_name`

SELECT username AS user_name, email AS user_email 
FROM users;

You can also omit AS:

SELECT username user_name, email user_email 
FROM users;

🔹 2. Table Aliases

Assign a short name to a table (very useful in joins).

Example 2: Simple alias for table

SELECT u.username, u.email 
FROM users u 
WHERE u.email LIKE '%@gmail.com';

🔹 3. Alias with functions

SELECT COUNT(*) AS total_users, MAX(id) AS latest_id 
FROM users;

Combined Query with Aliases

🔹 Example 4: Count Gmail users, alias result and filter

SELECT 
  COUNT(*) AS gmail_users 
FROM users u 
WHERE u.email LIKE '%@gmail.com';

🔹 Example 5: List usernames with shortened table name and domain extracted

SELECT 
  u.username AS name, 
  SPLIT_PART(u.email, '@', 2) AS domain 
FROM users u 
WHERE u.email IS NOT NULL 
ORDER BY u.username
LIMIT 10;

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28

📌 Summary of all queries

👉 1. SELECT – Basic Query

👉 2. ALTER – Modify Table Structure

🔹 Example 1: Add a new column

🔹 Example 2: Rename a column

🔹 Example 3: Drop a column

🔹 4. Modify specific columns:

👉 3. DISTINCT – Remove Duplicate Values

🔹 Example 1: Distinct usernames

🔹 Example 2: Distinct combinations of username and email

👉 4. WHERE – Filter Records + Major Combine Types (AND, OR, NOT)

🔹 Example 1: Simple WHERE

🔹 Example 2: AND – Combine multiple conditions (all must be true)

🔹 Example 3: OR – At least one condition must be true

🔹 Example 4: NOT – Negate a condition

🔹 Example 5: Combine AND, OR, NOT (use parentheses!)

👉 5. ORDER BY – Sort the Results