Understanding the Array Aggregation Function in PostgreSQL (array_agg)

PostgreSQL, also known as Postgres, is a powerful and feature-rich relational database management system. One of its notable features is the array aggregation function, array_agg, which allows you to aggregate values from multiple rows into a single array. In this blog post, we’ll explore how array_agg works, its applications, and considerations for performance.

How Does array_agg Work?

The array_agg function takes an expression as an argument and returns an array containing the values of that expression for all the rows that match the query. Let’s illustrate this with an example.

Consider a table called employees with columns id, name, and department. Suppose we want to aggregate all the names of employees belonging to the “Sales” department into an array. We can achieve this using the following query:

SELECT array_agg(name) AS sales_employees
FROM employees
WHERE department = 'Sales';

The result of this query will be a single row with a column named sales_employees, which contains an array of all the names of employees in the “Sales” department.

Usage of array_agg with Subqueries

The ability to get an array as the output opens up various possibilities, especially when used in subqueries. You can leverage this feature to aggregate data from related tables or filter results based on complex conditions.

For instance, imagine you have two tables, orders and order_items, where each order can have multiple items. You want to retrieve a list of orders along with an array of item names for each order. The following query achieves this:

SELECT o.order_id, (
  SELECT array_agg(oi.item_name)
  FROM order_items oi
  WHERE oi.order_id = o.order_id
) AS item_names
FROM orders o;

In this example, the subquery within the main query’s select list utilizes array_agg to aggregate item names from the order_items table, specific to each order.

Complex Query Example Using array_agg

To demonstrate a more complex scenario, let’s consider a database that stores books and their authors. We have three tables: books, authors, and book_authors (a join table that associates books with their respective authors).

Suppose we want to retrieve a list of books along with an array of author names for each book by alphabetical order. We can achieve this using a query that involves joins and array_agg:

SELECT b.title, array_agg(a.author_name ORDER BY a.author_name ASC) AS authors
FROM books b
JOIN book_authors ba ON b.book_id = ba.book_id
JOIN authors a ON ba.author_id = a.author_id
GROUP BY b.book_id;

In this query, we join the tables based on their relationships and use array_agg to aggregate author names into an array for each book. The GROUP BY clause ensures that each book’s array of author names is grouped correctly.

Performance Considerations

While array_agg is a powerful function, it’s essential to consider its performance implications, especially when working with large datasets. Aggregating values into arrays can be computationally intensive, and the resulting array can consume significant memory.

If you anticipate working with large result sets or complex queries involving array_agg, it’s worth optimizing your database schema, indexing relevant columns, and analyzing query performance using PostgreSQL’s built-in tools.

Additionally, consider whether array_agg is the most efficient solution for your specific use case. Sometimes, alternative approaches, such as using temporary tables or custom aggregate functions, might offer better performance.

Conclusion

The array_agg function in PostgreSQL provides a powerful mechanism for aggregating values into arrays. It offers flexibility and opens up opportunities for various applications, including subqueries and complex data manipulations. However, when working with large datasets, it’s crucial to be mindful of potential performance implications and explore optimization strategies accordingly.

PostgreSQL commands to remember

List of commands to remember using postgres DB managment system.

Login, Create user and password

# login to psql client
psql postgres # OR
psql -U postgres
create database mydb; # create db
create user abhilash with SUPERUSER CREATEDB CREATEROLE encrypted password 'abhilashPass!'; 
grant all privileges on database mydb to myuser; # add privileges

Connect to DB, List tables and users, functions, views, schema

\l # lists all the databases
\c dbname # connect to db
\dt # show tables
\d table_name # Describe a table
\dn # List available schema
\df #  List available functions
\dS [your_table_name] # List triggers
\dv # List available views
\du # lists all user accounts and roles 
\du+ # is the extended version which shows even more information.

Show history, save to file, edit using editor, execution time, help

SELECT version(); # version of psql
\g  # Execute the previous command
\s # Command history
\s filename # save Command history to a file
\i filename # Execute psql commands from a file
\? # help on psql commands
\h ALTER TABLE # To get help on specific PostgreSQL statement
\timing #  Turn on/off query execution time
\e # Edit command in your own editor
\e [function_name] # It is more useful when you edit a function in the editor. Do \df for functions
\o [file_name] # send all next query results to file
    \o out.txt
    \dt 
    \o # switch
    \dt

Change output, Quit psql

# Switch output options
\a command switches from aligned to non-aligned column output.
\H command formats the output to HTML format.
\q # quit psql

Reference: https://www.postgresqltutorial.com/postgresql-administration/psql-commands/

PostgreSQL Cheat Sheet

CREATE DATABASE

CREATE DATABASE dbName;

CREATE TABLE (with auto numbering integer id)

CREATE TABLE tableName (
 id serial PRIMARY KEY,
 name varchar(50) UNIQUE NOT NULL,
 dateCreated timestamp DEFAULT current_timestamp
);

Add a primary key

ALTER TABLE tableName ADD PRIMARY KEY (id);

Create an INDEX

CREATE UNIQUE INDEX indexName ON tableName (columnNames);

Backup a database (command line)

pg_dump dbName > dbName.sql

Backup all databases (command line)

pg_dumpall > pgbackup.sql

Run a SQL script (command line)

psql -f script.sql databaseName

Search using a regular expression

SELECT column FROM table WHERE column ~ 'foo.*';

The first N records

SELECT columns FROM table LIMIT 10;

Pagination

SELECT cols FROM table LIMIT 10 OFFSET 30;

Prepared Statements

PREPARE preparedInsert (int, varchar) AS
  INSERT INTO tableName (intColumn, charColumn) VALUES ($1, $2);
EXECUTE preparedInsert (1,'a');
EXECUTE preparedInsert (2,'b');
DEALLOCATE preparedInsert;

Create a Function

CREATE OR REPLACE FUNCTION month (timestamp) RETURNS integer 
 AS 'SELECT date_part(''month'', $1)::integer;'
LANGUAGE 'sql';

Table Maintenance

VACUUM ANALYZE table;

Reindex a database, table or index

REINDEX DATABASE dbName;

Show query plan

EXPLAIN SELECT * FROM table;

Import from a file

COPY destTable FROM '/tmp/somefile';

Show all runtime parameters

SHOW ALL;

Grant all permissions to a user

GRANT ALL PRIVILEGES ON table TO username;

Perform a transaction

BEGIN TRANSACTION 
 UPDATE accounts SET balance += 50 WHERE id = 1;
COMMIT;

Basic SQL

Get all columns and rows from a table

SELECT * FROM table;

Add a new row

INSERT INTO table (column1,column2)
VALUES (1, 'one');

Update a row

UPDATE table SET foo = 'bar' WHERE id = 1;

Delete a row

DELETE FROM table WHERE id = 1;

From: https://www.petefreitag.com/cheatsheets/postgresql/


PostgreSQL 9.3 : Installation on ubuntu 14.04

Hi guys, I just started installing postgres on my ubuntu VM. I referred some docs, and followed this one: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-14-04

Its pretty much explained in this page. But just explaining here the important things.

You can install postgres by ubuntu’s own apt packaging system. Update local apt repository.

$ sudo apt-get update
$ sudo apt-get install postgresql postgresql-contrib

Postgres uses role based access for the unix users. After the installation a default role called ‘postgres’ will be created. You can login to postgres account and start using or creating new roles with Postgres.

Sign in as postgres user

$ sudo -i -u postgres

Access the postgres console by

$ psql

But i cannot enter into the console and I got the following error:

postgres@8930a29k5d05:/home/rails/my_project$ psql
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

What could be the reason for this error?

So just gone through Postgres doc (http://www.postgresql.org/docs/9.3/static/server-start.html). You can see the same error under the section 17.3.2. Client Connection Problems. But the solution is not mentioned.

Original Reason: PostgreSQL Server was not running after the installation.

I tried rebooting the system and via init script the server should run automatically. But the server is not running again. I understood that something prevents postgres from running the server. What is it?

Just check your postgres server is running or not

$ sudo -aux | grep post
postgres@8930a29k5d05:/home/rails/my_project$ ps -aux | grep postgres
root       136  0.0  0.2  47124  3056 ?        S    06:10   0:00 sudo -u postgres -s
postgres   137  0.0  0.3  18164  3220 ?        S    06:10   0:00 /bin/bash
postgres   140  0.0  0.2  15572  2192 ?        R+   06:10   0:00 ps -aux
postgres   141  0.0  0.0   4892   336 ?        R+   06:10   0:00 grep post

The server is not running.

Run the server manually by

root@8930a29k5d05:/home/rails/my_project#  /etc/init.d/postgresql start
 * Starting PostgreSQL 9.3 database server
                                                                                                                                                         [ OK ] 
root@8930a29k5d05:/home/rails/my_project# ps aux | grep postgres
postgres   158  0.1  2.0 244928 20752 ?        S    06:28   0:00 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
postgres   160  0.0  0.3 244928  3272 ?        Ss   06:28   0:00 postgres: checkpointer process

postgres   161  0.0  0.4 244928  4176 ?        Ss   06:28   0:00 postgres: writer process

postgres   162  0.0  0.3 244928  3272 ?        Ss   06:28   0:00 postgres: wal writer process

postgres   163  0.0  0.5 245652  6000 ?        Ss   06:28   0:00 postgres: autovacuum launcher process

postgres   164  0.0  0.3 100604  3336 ?        Ss   06:28   0:00 postgres: stats collector process

root       178  0.0  0.0   8868   884 ?        S+   06:28   0:00 grep --color=auto post
root@8930a29k5d05:/home/rails/my_project#

Now the server starts running. If still not works, then try to reconfigure your locales as mentioned here

$ dpkg-reconfigure locales

It is strange that, after installing such a popular database software, it doesn’t provide any information regarding the failure of its own server. It should give the developers some clue so that they can save their precious time.

The reason of this failure, what I concluded is
1. After installation we have to run the server manually
OR
2. I tried resetting the locales (So if no locales set in the machine may prevented the postgres from starting automatically?)