Rails 8 App: Setup Test DB in PostgreSQL | Write SQL Queries

1. Add the test db and users table: https://railsdrop.com/2025/04/25/rails-8-app-postgresql-heap-vs-mysql-innodb-b-tree-indexing/

2. Add fake data into the table: https://railsdrop.com/2025/04/29/rails-8-app-postgresql-faker-extensions-for-rails/

Start Learn (Premium): https://railsdrop.com/sql-postgresql-queries-bitmap-seq-index-scan-db-clustering/

๐Ÿ“Œ Summary of all queries

Check: https://railsdrop.com/postgresql-queries-a-summary/


Read – Composite vs Individual indexes (Premium): https://railsdrop.com/sql-postgres-understanding-current-composite-index/

Read Create 1 million sample users: https://railsdrop.com/sql-postgresql-create-1-million-sample-users-data/

๐Ÿ‘‰ 1. SELECT โ€“ Basic Query

๐Ÿ”น 1. Select all columns:

SELECT * FROM users;

This gives you every row and every column in the users table.

๐Ÿ”น 2. Select specific columns:

SELECT username, email FROM users;

This limits the output to only username and email.

๐Ÿ‘‰ 2. ALTER โ€“ Modify Table Structure

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

๐Ÿ”น Example 1: Add a new column

Letโ€™s add a column created_at of type timestamp:

ALTER TABLE users 
  ADD COLUMN created_at timestamp;

๐Ÿ”น Example 2: Rename a column

Letโ€™s rename phone_number to mobile:

ALTER TABLE users
  RENAME COLUMN phone_number TO mobile;
๐Ÿ”น Example 3: Drop a column

Letโ€™s say you want to remove the created_at column:

ALTER TABLE users
  DROP COLUMN created_at;
๐Ÿ”น 4. Modify specific columns:
UPDATE users
  SET mobile = '123456'
  WHERE mobile IS NULL;
  • Use UPDATE instead of ALTER when modifying data in a table.
  • ALTER is used for changing the structure/schema of a table (e.g., adding columns), not for updating data.

๐Ÿ‘‰ 3. DISTINCT โ€“ Remove Duplicate Values

The DISTINCT keyword is used to return only unique (non-duplicate) values in a result set.

๐Ÿ”น Example 1: Distinct usernames
SELECT DISTINCT username FROM users;

This returns a list of unique usernames, removing any duplicates.

๐Ÿ”น Example 2: Distinct combinations of username and email
SELECT DISTINCT username, email FROM users;
SELECT DISTINCT username from users WHERE username like '%quin%';
EXPLAIN ANALYSE SELECT DISTINCT username from users WHERE username like '%quin%';

This checks for uniqueness based on both username and email combined.


๐Ÿ‘‰ 4. WHERE โ€“ Filter Records + Major Combine Types (AND, OR, NOT)

The WHERE clause is used to filter records that meet a certain condition.

Letโ€™s look at basic and combined conditions using our users table.

๐Ÿ”น Example 1: Simple WHERE
SELECT * FROM users WHERE username = 'john_doe';

๐Ÿ”น Example 2: AND โ€“ Combine multiple conditions (all must be true)
SELECT * FROM users 
WHERE username = 'quinton' AND email LIKE '%@gmail.com';

๐Ÿ”น Example 3: OR โ€“ At least one condition must be true
SELECT * FROM users 
WHERE username = 'quinton' OR username = 'joaquin_hand';

๐Ÿ”น Example 4: NOT โ€“ Negate a condition
SELECT * FROM users 
WHERE NOT email LIKE '%@example.com';

๐Ÿ”น Example 5: Combine AND, OR, NOT (use parentheses!)
SELECT * FROM users 
WHERE (email like '%example%' OR email like '%test%') 
  AND NOT username = 'admin';


๐Ÿ‘‰ 5. ORDER BY โ€“ Sort the Results

And weโ€™ll also look at combined queries afterward.

๐Ÿ”น Example 1: Order by a single column (ascending)
SELECT * FROM users 
ORDER BY username;

๐Ÿ”น Example 2: Order by a column (descending)
SELECT * FROM users 
ORDER BY email DESC;

๐Ÿ”น Example 3: Order by multiple columns
SELECT * FROM users 
ORDER BY username ASC, email DESC;


๐Ÿ‘‰ 6. Combined Queries (UNION, INTERSECT, EXCEPT)

โœ… These allow you to combine results from multiple SELECT statements.

โš  Requirements:

  • Each query must return the same number of columns.
  • Data types must be compatible.
๐Ÿ”น UNION โ€“ Combine results and remove duplicates
SELECT username FROM users WHERE email LIKE '%@example.com'
UNION
SELECT username FROM users WHERE username LIKE 'ton%';

๐Ÿ”น UNION ALL โ€“ Combine results and keep duplicates
SELECT username FROM users WHERE email LIKE '%@gmail.com'
UNION ALL
SELECT username FROM users WHERE username LIKE 'test%';

๐Ÿ”น INTERSECT โ€“ Return only common results
SELECT username FROM users 
  WHERE email LIKE '%@gmail.com'
INTERSECT
SELECT username FROM users 
  WHERE username LIKE 'test%';

SELECT username FROM users
  WHERE (email like '%example' OR email like '%test')
INTERSECT
SELECT username FROM users
  WHERE username like 'adam';
๐Ÿ”น EXCEPT โ€“ Return results from the first query that are not in the second
SELECT username FROM users 
  WHERE email LIKE '%@example'
EXCEPT
SELECT username FROM users 
  WHERE (username like '%ada%' OR username like '%merlin%');


๐Ÿ‘‰ 7. IS NULL and IS NOT NULL โ€“ Handling Missing Data

These are used to check if a column contains a NULL value (i.e., no value).

๐Ÿ”น Example 1: Users with a missing/have an email
# Find users with a missing email
SELECT * FROM users 
WHERE email IS NULL;

# Find 
SELECT * FROM users 
WHERE email IS NOT NULL;

๐Ÿ”น Example 2: Users with no email and no mobile
SELECT * FROM users 
WHERE email IS NULL AND phone_number IS NULL;

๐Ÿ”น Example 3: Users with either email or mobile missing
SELECT * FROM users 
WHERE email IS NULL OR phone_number IS NULL;

๐Ÿ”น Example 4: Users who have an email and username starts with ‘adam’
SELECT * FROM users 
WHERE email IS NOT NULL AND username LIKE 'adam%';

๐Ÿ”น Example 5: Users with email missing but username is not empty
SELECT * FROM users 
WHERE email IS NULL AND username IS NOT NULL;

๐Ÿ”น Example 6: Users where email or mobile is null, but not both (exclusive or)
SELECT * FROM users 
WHERE (email IS NULL AND mobile IS NOT NULL)
   OR (email IS NOT NULL AND mobile IS NULL);


๐Ÿ‘‰ 8. LIMIT, SELECT TOP, SELECT TOP PERCENT (PostgreSQL-style)

In PostgreSQL, we use LIMIT instead of SELECT TOP.
(PostgreSQL doesn’t support TOP directly like SQL Server.)

๐Ÿ”น Example 1: Limit number of results (first 10 rows)
SELECT * FROM users 
LIMIT 10;

๐Ÿ”น Example 2: Combined with ORDER BY (top 5 newest usernames)
SELECT username FROM users 
  WHERE username IS NOT NULL
ORDER BY id DESC
LIMIT 5;

๐Ÿ”น Example 3: Paginate (e.g., 11th to 20th row)
SELECT * FROM users 
ORDER BY id 
OFFSET 10 LIMIT 10;

๐Ÿ”” Simulating SELECT TOP and SELECT TOP PERCENT in PostgreSQL

๐Ÿ”น Example 4: Simulate SELECT TOP 1
SELECT * FROM users 
ORDER BY id 
LIMIT 1;

๐Ÿ”น Example 5: Simulate SELECT TOP 10 PERCENT

To get the top 10% of users by id, you can use a subquery:

SELECT * FROM users
ORDER BY id
LIMIT (SELECT CEIL(COUNT(*) * 0.10) FROM users);

๐Ÿ”น Example 6: Users with Gmail or Yahoo emails, ordered by ID, limit 5
SELECT id, username, email FROM users
WHERE email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com'
AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;

Note: Without parentheses, AND has higher precedence than OR.

๐Ÿ”น Better version with correct logic:
SELECT id, username, email FROM users
WHERE (email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com')
  AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;


๐Ÿ‘‰ 9. Aggregation Functions: MIN, MAX, COUNT, AVG, SUM

These functions help you perform calculations on column values.

๐Ÿ”น 1. COUNT โ€“ Number of rows
SELECT COUNT(*) FROM users;

โœ”๏ธ Total number of users.

SELECT COUNT(email) FROM users WHERE email IS NOT NULL;

โœ”๏ธ Count of users who have an email.

๐Ÿ”น 2. MIN and MAX โ€“ Smallest and largest values
SELECT MIN(id) AS first_user, MAX(id) AS last_user FROM users;

๐Ÿ”น 3. AVG โ€“ Average (only on numeric fields)

Assuming id is somewhat sequential, we can do:

SELECT AVG(id) AS avg_id FROM users;

๐Ÿ”น 4. SUM โ€“ Total (again, only on numeric fields)
SELECT SUM(id) AS total_ids FROM users WHERE id < 1000;

Combined Queries with Aggregates

๐Ÿ”น Example 1: Count users without email and with usernames starting with ‘test’
SELECT COUNT(*) FROM users 
WHERE email IS NULL AND username LIKE 'test%';

๐Ÿ”น Example 2: Get min, max, avg ID of users with Gmail addresses
SELECT 
  MIN(id) AS min_id,
  MAX(id) AS max_id,
  AVG(id) AS avg_id
FROM users 
WHERE email LIKE '%@gmail.com';

๐Ÿ”น Example 3: Count how many users per email domain
SELECT 
  SPLIT_PART(email, '@', 2) AS domain,
  COUNT(*) AS total_users
FROM users
WHERE email IS NOT NULL
GROUP BY domain
ORDER BY total_users DESC
LIMIT 5;

โ™ฆ๏ธ This query breaks email at the @ to group by domain like gmail.com, yahoo.com.

GROUP BY Course

Hereโ€™s the SQL query to get the maximum mark, minimum mark, and the email (or emails) of users grouped by each course:

Option 1: Basic GROUP BY with aggregate functions (only max/min mark per course, not emails)
SELECT
  course,
  MAX(mark) AS max_mark,
  MIN(mark) AS min_mark
FROM users
GROUP BY course;

Option 2: Include emails of users who have the max or min mark per course

(PostgreSQL-specific using subqueries and JOIN)

SELECT u.course, u.email, u.mark
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

โ™ฆ๏ธ This second query shows all users who have the highest or lowest mark in their course, including ties.

Here’s the updated query that includes:

  • Course name
  • Emails of users with the maximum or minimum marks
  • Their marks
  • Average mark per course
SELECT
  u.course,
  u.email,
  u.mark,
  stats.max_mark,
  stats.min_mark,
  stats.avg_mark
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

Notes:
  • ROUND(AVG(mark), 2) gives the average mark rounded to two decimal places.
  • Users with the same max or min mark are all included.

Here’s the full query including:

  • Course
  • Email
  • Mark
  • Max/Min mark
  • Average mark
  • User count per course
SELECT
  u.course,
  u.email,
  u.mark,
  stats.max_mark,
  stats.min_mark,
  stats.avg_mark,
  stats.user_count
FROM users u
JOIN (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;

โ™ฆ๏ธ This query gives you a full breakdown of top/bottom performers per course along with stats per group.

Hereโ€™s a version that adds the rank of each user within their course based on their mark (highest mark = rank 1), along with:

  • Course
  • Email
  • Mark
  • Rank (within course)
  • Max mark, Min mark, Average mark, User count per course
WITH ranked_users AS (
  SELECT
    u.*,
    RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
  FROM users u
),
course_stats AS (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
)
SELECT
  r.course,
  r.email,
  r.mark,
  r.rank_in_course,
  cs.max_mark,
  cs.min_mark,
  cs.avg_mark,
  cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
ORDER BY r.course, r.rank_in_course;

Key features:
  • Users are ranked per course using RANK() (supports ties).
  • The output includes all users, not just those with max/min marks.

NOTE: Here we can see output like:

    course    |                   email                   | mark | rank_in_course | max_mark | min_mark | avg_mark | user_count
--------------+-------------------------------------------+------+----------------+----------+----------+----------+------------
 IT           | lisandra.schoen@borer-effertz.test        | 1000 |              1 |     1000 |      100 |   543.04 |        796
 IT           | leona@jaskolski-jaskolski.test            | 1000 |              1 |     1000 |      100 |   543.04 |        796
 IT           | angle@ankunding-sauer.example             |  999 |              3 |     1000 |      100 |   543.04 |        796
 IT           | drucilla_okeefe@monahan.test              |  999 |              3 |     1000 |      100 |   543.04 |        796
 algebra      | natashia.langosh@luettgen.test            | 1000 |              1 |     1000 |      100 |   541.52 |        779
 algebra      | tiffany.tremblay@bergnaum.example         | 1000 |              1 |     1000 |      100 |   541.52 |        779
 algebra      | kristeen.nikolaus@crist.example           |  999 |              3 |     1000 |      100 |   541.52 |        779
 algebra      | domenic@predovic-dare.example             |  999 |              3 |     1000 |      100 |   541.52 |        779
 algebra      | kit@oconner.example                       |  999 |              3 |     1000 |      100 |   541.52 |        779
 architecture | tierra_reilly@botsford-okuneva.test       |  997 |              1 |      997 |      100 |   549.24 |        776
 architecture | celestine_reilly@bayer.example            |  996 |              2 |      997 |      100 |   549.24 |        776
 architecture | carson@kulas.example                      |  995 |              3 |      997 |      100 |   549.24 |        776
 botany       | hassan@towne.test                         | 1000 |              1 |     1000 |      103 |   554.07 |        760
 botany       | shaunna@hudson.test                       | 1000 |              1 |     1000 |      103 |   554.07 |        760
 botany       | sanford_jacobs@johnston.example           |  999 |              3 |     1000 |      103 |   554.07 |        760
 botany       | arnulfo_cremin@ernser.example             |  999 |              3 |     1000 |      103 |   554.07 |        760

The Ranks are not consistent. To avoid this we can use DENSE_RANK().

Hereโ€™s the updated query using DENSE_RANK() instead of RANK() โ€” this avoids gaps in rank numbering when there are ties:

WITH ranked_users AS (
  SELECT
    u.*,
    DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
  FROM users u
),
course_stats AS (
  SELECT
    course,
    MAX(mark) AS max_mark,
    MIN(mark) AS min_mark,
    ROUND(AVG(mark), 2) AS avg_mark,
    COUNT(*) AS user_count
  FROM users
  GROUP BY course
)
SELECT
  r.course,
  r.email,
  r.mark,
  r.rank_in_course,
  cs.max_mark,
  cs.min_mark,
  cs.avg_mark,
  cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
WHERE r.rank_in_course <= 3
ORDER BY r.course, r.rank_in_course;

DENSE_RANK difference:
  • If 2 users tie for 1st place, the next gets rank 2 (not 3 like with RANK).
  • Ensures consistent top-N output when ties are frequent.

๐Ÿ”ฅ Boom, Bonus: To export the query result as a CSV file in PostgreSQL, you can use the \copy command in psql (PostgreSQLโ€™s CLI), like this:

๐Ÿงพ Export Top 3 Students per Course to CSV

\copy (
  WITH ranked_users AS (
    SELECT
      u.*,
      DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
    FROM users u
  ),
  course_stats AS (
    SELECT
      course,
      MAX(mark) AS max_mark,
      MIN(mark) AS min_mark,
      ROUND(AVG(mark), 2) AS avg_mark,
      COUNT(*) AS user_count
    FROM users
    GROUP BY course
  )
  SELECT
    r.course,
    r.email,
    r.mark,
    r.rank_in_course,
    cs.max_mark,
    cs.min_mark,
    cs.avg_mark,
    cs.user_count
  FROM ranked_users r
  JOIN course_stats cs ON r.course = cs.course
  WHERE r.rank_in_course <= 3
  ORDER BY r.course, r.rank_in_course
) TO 'top_students_per_course.csv' WITH CSV HEADER;

โœ… Requirements:
  • Run this in the psql shell.
  • The file top_students_per_course.csv will be saved in your local working directory (where psql was started).
  • Make sure PostgreSQL has write permissions to that directory.


๐Ÿ‘‰ 10. LIKE, %, _ โ€“ Pattern Matching in SQL

These are used to filter text using wildcards:

  • % = matches any sequence of characters (0 or more)
  • _ = matches exactly one character

๐Ÿ”น Basic LIKE Queries

Example 1: Usernames starting with “admin”
SELECT * FROM users 
WHERE username LIKE 'admin%';

Example 2: Usernames ending with “bot”
SELECT * FROM users 
WHERE username LIKE '%bot';

Example 3: Usernames containing “test”
SELECT * FROM users 
WHERE username LIKE '%test%';

๐Ÿ”น _ Single-character Wildcard

Example 4: 5-character usernames
SELECT * FROM users 
WHERE username LIKE '_____';

(Each _ stands for one character.)

Example 5: Emails starting with any single letter + “ohn” (e.g., “john”, “kohn”)
SELECT * FROM users 
WHERE username LIKE '_ohn';

Combined Queries with LIKE, %, _

๐Ÿ”น Example 6: Users whose username contains “test” and email ends with “gmail.com”
SELECT * FROM users 
WHERE username LIKE '%test%' AND email LIKE '%@gmail.com';

๐Ÿ”น Example 7: Users with 3-character usernames and missing email
SELECT * FROM users 
WHERE username LIKE '___' AND email IS NULL;

๐Ÿ”น Example 8: Users with usernames that start with “a” or end with “x” and have a mobile number
SELECT * FROM users 
WHERE (username LIKE 'a%' OR username LIKE '%x') AND mobile IS NOT NULL;


๐Ÿ‘‰ 11. IN, NOT IN, BETWEEN โ€“ Set & Range Filters

These are used to filter based on a list of values (IN) or a range (BETWEEN).

๐Ÿ”น 1. IN โ€“ Match any of the listed values

SELECT * FROM users 
WHERE username IN ('admin', 'test_user', 'john_doe');

๐Ÿ”น 2. NOT IN โ€“ Exclude listed values

SELECT * FROM users 
WHERE username NOT IN ('admin', 'test_user');

๐Ÿ”น 3. BETWEEN โ€“ Match within a range (inclusive)

SELECT * FROM users 
WHERE id BETWEEN 100 AND 200;

โ™ฆ๏ธ Equivalent to: id >= 100 AND id <= 200

Combined Queries

๐Ÿ”น Example 1: Users with username in a list and id between 1 and 500
SELECT * FROM users 
WHERE username IN ('alice', 'bob', 'carol') 
  AND id BETWEEN 1 AND 500;

๐Ÿ”น Example 2: Exclude system users and select a range of IDs
SELECT id, username FROM users 
WHERE username NOT IN ('admin', 'system') 
  AND id BETWEEN 1000 AND 2000;

๐Ÿ”น Example 3: Top 5 users whose email domains are in a specific list
SELECT * FROM users 
WHERE SPLIT_PART(email, '@', 2) IN ('gmail.com', 'yahoo.com', 'hotmail.com')
ORDER BY id
LIMIT 5;


๐Ÿ‘‰ 12. SQL Aliases โ€“ Renaming Columns or Tables Temporarily

Aliases help improve readability, especially in joins or when using functions.

๐Ÿ”น 1. Column Aliases

Use AS (optional keyword) to rename a column in the result.

Example 1: Rename username to user_name
SELECT username AS user_name, email AS user_email 
FROM users;

You can also omit AS:

SELECT username user_name, email user_email 
FROM users;

๐Ÿ”น 2. Table Aliases

Assign a short name to a table (very useful in joins).

Example 2: Simple alias for table
SELECT u.username, u.email 
FROM users u 
WHERE u.email LIKE '%@gmail.com';

๐Ÿ”น 3. Alias with functions

SELECT COUNT(*) AS total_users, MAX(id) AS latest_id 
FROM users;

Combined Query with Aliases

๐Ÿ”น Example 4: Count Gmail users, alias result and filter
SELECT 
  COUNT(*) AS gmail_users 
FROM users u 
WHERE u.email LIKE '%@gmail.com';

๐Ÿ”น Example 5: List usernames with shortened table name and domain extracted
SELECT 
  u.username AS name, 
  SPLIT_PART(u.email, '@', 2) AS domain 
FROM users u 
WHERE u.email IS NOT NULL 
ORDER BY u.username
LIMIT 10;