1. Add the test db and users table: https://railsdrop.com/2025/04/25/rails-8-app-postgresql-heap-vs-mysql-innodb-b-tree-indexing/
2. Add fake data into the table: https://railsdrop.com/2025/04/29/rails-8-app-postgresql-faker-extensions-for-rails/
Start Learn (Premium): https://railsdrop.com/sql-postgresql-queries-bitmap-seq-index-scan-db-clustering/
๐ Summary of all queries
Check: https://railsdrop.com/postgresql-queries-a-summary/
Read – Composite vs Individual indexes (Premium): https://railsdrop.com/sql-postgres-understanding-current-composite-index/
Read Create 1 million sample users: https://railsdrop.com/sql-postgresql-create-1-million-sample-users-data/
๐ 1. SELECT โ Basic Query
๐น 1. Select all columns:
SELECT * FROM users;
This gives you every row and every column in the users table.
๐น 2. Select specific columns:
SELECT username, email FROM users;
This limits the output to only username and email.
๐ 2. ALTER โ Modify Table Structure
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
๐น Example 1: Add a new column
Letโs add a column created_at of type timestamp:
ALTER TABLE users
ADD COLUMN created_at timestamp;
๐น Example 2: Rename a column
Letโs rename phone_number to mobile:
ALTER TABLE users
RENAME COLUMN phone_number TO mobile;
๐น Example 3: Drop a column
Letโs say you want to remove the created_at column:
ALTER TABLE users
DROP COLUMN created_at;
๐น 4. Modify specific columns:
UPDATE users
SET mobile = '123456'
WHERE mobile IS NULL;
- Use
UPDATEinstead ofALTERwhen modifying data in a table. ALTERis used for changing the structure/schema of a table (e.g., adding columns), not for updating data.
๐ 3. DISTINCT โ Remove Duplicate Values
The DISTINCT keyword is used to return only unique (non-duplicate) values in a result set.
๐น Example 1: Distinct usernames
SELECT DISTINCT username FROM users;
This returns a list of unique usernames, removing any duplicates.
๐น Example 2: Distinct combinations of username and email
SELECT DISTINCT username, email FROM users;
SELECT DISTINCT username from users WHERE username like '%quin%';
EXPLAIN ANALYSE SELECT DISTINCT username from users WHERE username like '%quin%';
This checks for uniqueness based on both username and email combined.
๐ 4. WHERE โ Filter Records + Major Combine Types (AND, OR, NOT)
The WHERE clause is used to filter records that meet a certain condition.
Letโs look at basic and combined conditions using our users table.
๐น Example 1: Simple WHERE
SELECT * FROM users WHERE username = 'john_doe';
๐น Example 2: AND โ Combine multiple conditions (all must be true)
SELECT * FROM users
WHERE username = 'quinton' AND email LIKE '%@gmail.com';
๐น Example 3: OR โ At least one condition must be true
SELECT * FROM users
WHERE username = 'quinton' OR username = 'joaquin_hand';
๐น Example 4: NOT โ Negate a condition
SELECT * FROM users
WHERE NOT email LIKE '%@example.com';
๐น Example 5: Combine AND, OR, NOT (use parentheses!)
SELECT * FROM users
WHERE (email like '%example%' OR email like '%test%')
AND NOT username = 'admin';
๐ 5. ORDER BY โ Sort the Results
And weโll also look at combined queries afterward.
๐น Example 1: Order by a single column (ascending)
SELECT * FROM users
ORDER BY username;
๐น Example 2: Order by a column (descending)
SELECT * FROM users
ORDER BY email DESC;
๐น Example 3: Order by multiple columns
SELECT * FROM users
ORDER BY username ASC, email DESC;
๐ 6. Combined Queries (UNION, INTERSECT, EXCEPT)
โ
These allow you to combine results from multiple SELECT statements.
โ Requirements:
- Each query must return the same number of columns.
- Data types must be compatible.
๐น UNION โ Combine results and remove duplicates
SELECT username FROM users WHERE email LIKE '%@example.com'
UNION
SELECT username FROM users WHERE username LIKE 'ton%';
๐น UNION ALL โ Combine results and keep duplicates
SELECT username FROM users WHERE email LIKE '%@gmail.com'
UNION ALL
SELECT username FROM users WHERE username LIKE 'test%';
๐น INTERSECT โ Return only common results
SELECT username FROM users
WHERE email LIKE '%@gmail.com'
INTERSECT
SELECT username FROM users
WHERE username LIKE 'test%';
SELECT username FROM users
WHERE (email like '%example' OR email like '%test')
INTERSECT
SELECT username FROM users
WHERE username like 'adam';
๐น EXCEPT โ Return results from the first query that are not in the second
SELECT username FROM users
WHERE email LIKE '%@example'
EXCEPT
SELECT username FROM users
WHERE (username like '%ada%' OR username like '%merlin%');
๐ 7. IS NULL and IS NOT NULL โ Handling Missing Data
These are used to check if a column contains a NULL value (i.e., no value).
๐น Example 1: Users with a missing/have an email
# Find users with a missing email
SELECT * FROM users
WHERE email IS NULL;
# Find
SELECT * FROM users
WHERE email IS NOT NULL;
๐น Example 2: Users with no email and no mobile
SELECT * FROM users
WHERE email IS NULL AND phone_number IS NULL;
๐น Example 3: Users with either email or mobile missing
SELECT * FROM users
WHERE email IS NULL OR phone_number IS NULL;
๐น Example 4: Users who have an email and username starts with ‘adam’
SELECT * FROM users
WHERE email IS NOT NULL AND username LIKE 'adam%';
๐น Example 5: Users with email missing but username is not empty
SELECT * FROM users
WHERE email IS NULL AND username IS NOT NULL;
๐น Example 6: Users where email or mobile is null, but not both (exclusive or)
SELECT * FROM users
WHERE (email IS NULL AND mobile IS NOT NULL)
OR (email IS NOT NULL AND mobile IS NULL);
๐ 8. LIMIT, SELECT TOP, SELECT TOP PERCENT (PostgreSQL-style)
In PostgreSQL, we use LIMIT instead of SELECT TOP.
(PostgreSQL doesn’t support TOP directly like SQL Server.)
๐น Example 1: Limit number of results (first 10 rows)
SELECT * FROM users
LIMIT 10;
๐น Example 2: Combined with ORDER BY (top 5 newest usernames)
SELECT username FROM users
WHERE username IS NOT NULL
ORDER BY id DESC
LIMIT 5;
๐น Example 3: Paginate (e.g., 11th to 20th row)
SELECT * FROM users
ORDER BY id
OFFSET 10 LIMIT 10;
๐ Simulating SELECT TOP and SELECT TOP PERCENT in PostgreSQL
๐น Example 4: Simulate SELECT TOP 1
SELECT * FROM users
ORDER BY id
LIMIT 1;
๐น Example 5: Simulate SELECT TOP 10 PERCENT
To get the top 10% of users by id, you can use a subquery:
SELECT * FROM users
ORDER BY id
LIMIT (SELECT CEIL(COUNT(*) * 0.10) FROM users);
๐น Example 6: Users with Gmail or Yahoo emails, ordered by ID, limit 5
SELECT id, username, email FROM users
WHERE email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com'
AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;
Note: Without parentheses, AND has higher precedence than OR.
๐น Better version with correct logic:
SELECT id, username, email FROM users
WHERE (email LIKE '%@gmail.com' OR email LIKE '%@yahoo.com')
AND username IS NOT NULL
ORDER BY id ASC
LIMIT 5;
๐ 9. Aggregation Functions: MIN, MAX, COUNT, AVG, SUM
These functions help you perform calculations on column values.
๐น 1. COUNT โ Number of rows
SELECT COUNT(*) FROM users;
โ๏ธ Total number of users.
SELECT COUNT(email) FROM users WHERE email IS NOT NULL;
โ๏ธ Count of users who have an email.
๐น 2. MIN and MAX โ Smallest and largest values
SELECT MIN(id) AS first_user, MAX(id) AS last_user FROM users;
๐น 3. AVG โ Average (only on numeric fields)
Assuming id is somewhat sequential, we can do:
SELECT AVG(id) AS avg_id FROM users;
๐น 4. SUM โ Total (again, only on numeric fields)
SELECT SUM(id) AS total_ids FROM users WHERE id < 1000;
Combined Queries with Aggregates
๐น Example 1: Count users without email and with usernames starting with ‘test’
SELECT COUNT(*) FROM users
WHERE email IS NULL AND username LIKE 'test%';
๐น Example 2: Get min, max, avg ID of users with Gmail addresses
SELECT
MIN(id) AS min_id,
MAX(id) AS max_id,
AVG(id) AS avg_id
FROM users
WHERE email LIKE '%@gmail.com';
๐น Example 3: Count how many users per email domain
SELECT
SPLIT_PART(email, '@', 2) AS domain,
COUNT(*) AS total_users
FROM users
WHERE email IS NOT NULL
GROUP BY domain
ORDER BY total_users DESC
LIMIT 5;
โฆ๏ธ This query breaks email at the @ to group by domain like gmail.com, yahoo.com.
GROUP BY Course
Hereโs the SQL query to get the maximum mark, minimum mark, and the email (or emails) of users grouped by each course:
Option 1: Basic GROUP BY with aggregate functions (only max/min mark per course, not emails)
SELECT
course,
MAX(mark) AS max_mark,
MIN(mark) AS min_mark
FROM users
GROUP BY course;
Option 2: Include emails of users who have the max or min mark per course
(PostgreSQL-specific using subqueries and JOIN)
SELECT u.course, u.email, u.mark
FROM users u
JOIN (
SELECT
course,
MAX(mark) AS max_mark,
MIN(mark) AS min_mark
FROM users
GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;
โฆ๏ธ This second query shows all users who have the highest or lowest mark in their course, including ties.
Here’s the updated query that includes:
- Course name
- Emails of users with the maximum or minimum marks
- Their marks
- Average mark per course
SELECT
u.course,
u.email,
u.mark,
stats.max_mark,
stats.min_mark,
stats.avg_mark
FROM users u
JOIN (
SELECT
course,
MAX(mark) AS max_mark,
MIN(mark) AS min_mark,
ROUND(AVG(mark), 2) AS avg_mark
FROM users
GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;
Notes:
ROUND(AVG(mark), 2)gives the average mark rounded to two decimal places.- Users with the same max or min mark are all included.
Here’s the full query including:
- Course
- Mark
- Max/Min mark
- Average mark
- User count per course
SELECT
u.course,
u.email,
u.mark,
stats.max_mark,
stats.min_mark,
stats.avg_mark,
stats.user_count
FROM users u
JOIN (
SELECT
course,
MAX(mark) AS max_mark,
MIN(mark) AS min_mark,
ROUND(AVG(mark), 2) AS avg_mark,
COUNT(*) AS user_count
FROM users
GROUP BY course
) stats ON u.course = stats.course AND (u.mark = stats.max_mark OR u.mark = stats.min_mark)
ORDER BY u.course, u.mark DESC;
โฆ๏ธ This query gives you a full breakdown of top/bottom performers per course along with stats per group.
Hereโs a version that adds the rank of each user within their course based on their mark (highest mark = rank 1), along with:
- Course
- Mark
- Rank (within course)
- Max mark, Min mark, Average mark, User count per course
WITH ranked_users AS (
SELECT
u.*,
RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
FROM users u
),
course_stats AS (
SELECT
course,
MAX(mark) AS max_mark,
MIN(mark) AS min_mark,
ROUND(AVG(mark), 2) AS avg_mark,
COUNT(*) AS user_count
FROM users
GROUP BY course
)
SELECT
r.course,
r.email,
r.mark,
r.rank_in_course,
cs.max_mark,
cs.min_mark,
cs.avg_mark,
cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
ORDER BY r.course, r.rank_in_course;
Key features:
- Users are ranked per course using
RANK()(supports ties). - The output includes all users, not just those with max/min marks.
NOTE: Here we can see output like:
course | email | mark | rank_in_course | max_mark | min_mark | avg_mark | user_count
--------------+-------------------------------------------+------+----------------+----------+----------+----------+------------
IT | lisandra.schoen@borer-effertz.test | 1000 | 1 | 1000 | 100 | 543.04 | 796
IT | leona@jaskolski-jaskolski.test | 1000 | 1 | 1000 | 100 | 543.04 | 796
IT | angle@ankunding-sauer.example | 999 | 3 | 1000 | 100 | 543.04 | 796
IT | drucilla_okeefe@monahan.test | 999 | 3 | 1000 | 100 | 543.04 | 796
algebra | natashia.langosh@luettgen.test | 1000 | 1 | 1000 | 100 | 541.52 | 779
algebra | tiffany.tremblay@bergnaum.example | 1000 | 1 | 1000 | 100 | 541.52 | 779
algebra | kristeen.nikolaus@crist.example | 999 | 3 | 1000 | 100 | 541.52 | 779
algebra | domenic@predovic-dare.example | 999 | 3 | 1000 | 100 | 541.52 | 779
algebra | kit@oconner.example | 999 | 3 | 1000 | 100 | 541.52 | 779
architecture | tierra_reilly@botsford-okuneva.test | 997 | 1 | 997 | 100 | 549.24 | 776
architecture | celestine_reilly@bayer.example | 996 | 2 | 997 | 100 | 549.24 | 776
architecture | carson@kulas.example | 995 | 3 | 997 | 100 | 549.24 | 776
botany | hassan@towne.test | 1000 | 1 | 1000 | 103 | 554.07 | 760
botany | shaunna@hudson.test | 1000 | 1 | 1000 | 103 | 554.07 | 760
botany | sanford_jacobs@johnston.example | 999 | 3 | 1000 | 103 | 554.07 | 760
botany | arnulfo_cremin@ernser.example | 999 | 3 | 1000 | 103 | 554.07 | 760
The Ranks are not consistent. To avoid this we can use DENSE_RANK().
Hereโs the updated query using DENSE_RANK() instead of RANK() โ this avoids gaps in rank numbering when there are ties:
WITH ranked_users AS (
SELECT
u.*,
DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
FROM users u
),
course_stats AS (
SELECT
course,
MAX(mark) AS max_mark,
MIN(mark) AS min_mark,
ROUND(AVG(mark), 2) AS avg_mark,
COUNT(*) AS user_count
FROM users
GROUP BY course
)
SELECT
r.course,
r.email,
r.mark,
r.rank_in_course,
cs.max_mark,
cs.min_mark,
cs.avg_mark,
cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
WHERE r.rank_in_course <= 3
ORDER BY r.course, r.rank_in_course;
DENSE_RANK difference:
- If 2 users tie for 1st place, the next gets rank 2 (not 3 like with
RANK). - Ensures consistent top-N output when ties are frequent.
๐ฅ Boom, Bonus: To export the query result as a CSV file in PostgreSQL, you can use the \copy command in psql (PostgreSQLโs CLI), like this:
๐งพ Export Top 3 Students per Course to CSV
\copy (
WITH ranked_users AS (
SELECT
u.*,
DENSE_RANK() OVER (PARTITION BY course ORDER BY mark DESC) AS rank_in_course
FROM users u
),
course_stats AS (
SELECT
course,
MAX(mark) AS max_mark,
MIN(mark) AS min_mark,
ROUND(AVG(mark), 2) AS avg_mark,
COUNT(*) AS user_count
FROM users
GROUP BY course
)
SELECT
r.course,
r.email,
r.mark,
r.rank_in_course,
cs.max_mark,
cs.min_mark,
cs.avg_mark,
cs.user_count
FROM ranked_users r
JOIN course_stats cs ON r.course = cs.course
WHERE r.rank_in_course <= 3
ORDER BY r.course, r.rank_in_course
) TO 'top_students_per_course.csv' WITH CSV HEADER;
โ Requirements:
- Run this in the
psqlshell. - The file
top_students_per_course.csvwill be saved in your local working directory (wherepsqlwas started). - Make sure PostgreSQL has write permissions to that directory.
๐ 10. LIKE, %, _ โ Pattern Matching in SQL
These are used to filter text using wildcards:
%= matches any sequence of characters (0 or more)_= matches exactly one character
๐น Basic LIKE Queries
Example 1: Usernames starting with “admin”
SELECT * FROM users
WHERE username LIKE 'admin%';
Example 2: Usernames ending with “bot”
SELECT * FROM users
WHERE username LIKE '%bot';
Example 3: Usernames containing “test”
SELECT * FROM users
WHERE username LIKE '%test%';
๐น _ Single-character Wildcard
Example 4: 5-character usernames
SELECT * FROM users
WHERE username LIKE '_____';
(Each _ stands for one character.)
Example 5: Emails starting with any single letter + “ohn” (e.g., “john”, “kohn”)
SELECT * FROM users
WHERE username LIKE '_ohn';
Combined Queries with LIKE, %, _
๐น Example 6: Users whose username contains “test” and email ends with “gmail.com”
SELECT * FROM users
WHERE username LIKE '%test%' AND email LIKE '%@gmail.com';
๐น Example 7: Users with 3-character usernames and missing email
SELECT * FROM users
WHERE username LIKE '___' AND email IS NULL;
๐น Example 8: Users with usernames that start with “a” or end with “x” and have a mobile number
SELECT * FROM users
WHERE (username LIKE 'a%' OR username LIKE '%x') AND mobile IS NOT NULL;
๐ 11. IN, NOT IN, BETWEEN โ Set & Range Filters
These are used to filter based on a list of values (IN) or a range (BETWEEN).
๐น 1. IN โ Match any of the listed values
SELECT * FROM users
WHERE username IN ('admin', 'test_user', 'john_doe');
๐น 2. NOT IN โ Exclude listed values
SELECT * FROM users
WHERE username NOT IN ('admin', 'test_user');
๐น 3. BETWEEN โ Match within a range (inclusive)
SELECT * FROM users
WHERE id BETWEEN 100 AND 200;
โฆ๏ธ Equivalent to: id >= 100 AND id <= 200
Combined Queries
๐น Example 1: Users with username in a list and id between 1 and 500
SELECT * FROM users
WHERE username IN ('alice', 'bob', 'carol')
AND id BETWEEN 1 AND 500;
๐น Example 2: Exclude system users and select a range of IDs
SELECT id, username FROM users
WHERE username NOT IN ('admin', 'system')
AND id BETWEEN 1000 AND 2000;
๐น Example 3: Top 5 users whose email domains are in a specific list
SELECT * FROM users
WHERE SPLIT_PART(email, '@', 2) IN ('gmail.com', 'yahoo.com', 'hotmail.com')
ORDER BY id
LIMIT 5;
๐ 12. SQL Aliases โ Renaming Columns or Tables Temporarily
Aliases help improve readability, especially in joins or when using functions.
๐น 1. Column Aliases
Use AS (optional keyword) to rename a column in the result.
Example 1: Rename username to user_name
SELECT username AS user_name, email AS user_email
FROM users;
You can also omit AS:
SELECT username user_name, email user_email
FROM users;
๐น 2. Table Aliases
Assign a short name to a table (very useful in joins).
Example 2: Simple alias for table
SELECT u.username, u.email
FROM users u
WHERE u.email LIKE '%@gmail.com';
๐น 3. Alias with functions
SELECT COUNT(*) AS total_users, MAX(id) AS latest_id
FROM users;
Combined Query with Aliases
๐น Example 4: Count Gmail users, alias result and filter
SELECT
COUNT(*) AS gmail_users
FROM users u
WHERE u.email LIKE '%@gmail.com';
๐น Example 5: List usernames with shortened table name and domain extracted
SELECT
u.username AS name,
SPLIT_PART(u.email, '@', 2) AS domain
FROM users u
WHERE u.email IS NOT NULL
ORDER BY u.username
LIMIT 10;
One thought on “Rails 8 App: Setup Test DB in PostgreSQL | Write SQL Queries”