Joins and Aggregation of Records

·

4 min read

Joins and Aggregation of Records

Hi folks, welcome back to my PostgreSQL notebook.

Joins

  • Produces values by merging rows from different related tables together

  • Use a JOIN when you’re asked to find data involving multiple resources.

Aggregation

  • Looks at many rows and calculates a single value

  • Words like ‘Most’, ‘Average’ and ‘Least’ are a sign that you need to use aggregation.

Joining Data From Different Tables

Get back to the social media database.

Look at these 2 example tables

Comments

idcontentsuser_idphoto_id
1et sit abc43
2sdaf123312354

Users

idusername
4Gerad Morgan
5Fred Hoge

Let’s say, now I want to query all the comment contents associated with the user.

So we need data from both users comments tables.

Specifically, the username column from Users table, and the contents column from the Comments table.

That’s where JOIN comes into play.

Let’s write this query

SELECT contents, username
FROM comments
JOIN users 
ON users.id = comments.user_id;

Here we want to select two columns, which are contents and username

From the data sources comments and users tables.

From the comments table, it has a reference foreign key back to the record of the user who created this particular comment, it is the user_id

We JOIN them together on the condition users.id = comments.user_id

Quick Notes on JOIN

  • Table order between FROM and JOIN frequently makes a difference.

  • We MUST give context if column names collide

  • Tables can be renamed using the AS keyword

  • There are a few kinds of joins!

4 Kinds Of Joins

Inner Join

Left Outer Join

Right Outer Join

Full Join

Image Source: Stephen Grider


Grouping

  • Reduces many rows down to fewer rows.

  • Done by using GROUP BY keyword

  • Visualizing the result is the key to use

Aggregates

  • Reduces many values down to one

  • Done by using aggregate functions

Understanding Group By

For instance, we have this query

SELECT user_id 
FROM comments
GROUP BY user_id;

What is the problem here?

We can have multiple comments created by a single user.

So why does each user appear only once in the query result?

Here is what GROUP BY user_id did

  • Find the set of all UNIQUE user_id’s

  • Take each row and assign it to a group based on its user_id

  • One caveat: We can only select the grouped by column, If we attempt to select any other column, we will get an error.

Aggregate Functions

  • COUNT() → Returns the number of values in a group of values

  • SUM() → Find the sum of a group of numbers

  • AVG() → Find the average of a group of numbers

  • MIN() → Returns the minimum value from a group of numbers

  • MAX() → Returns the maximum value from a group of numbers

Let’s put grouping and aggregating together

Let’s say we want to calculate the total number of comments a user has created.

SELECT user_id, COUNT(id) AS num_comments_created
FROM comments
GROUP BY user_id

Gotcha with COUNT

Whenever you do a COUNT on a column, null values are not counted.

In case you want to count a total number of rows (records) and not care about is it has a null value or not.

Just simply do

SELECT COUNT(*) FROM photos;

Filtering Methods

  • FROMSpecifies starting set of rows to work with.

  • JOIN → Merges in data from additional tables

  • WHERE → Filters the set of ROWS

  • GROUP BY → Groups rows by a UNIQUE set of values

  • HAVING → Filters the set of GROUPS

Reference sources: Stephen Grider.