Hi folks, welcome back to my PostgreSQL notebook.
Joins
Produces values by merging rows from different related tables together
Use a JOIN when you’re asked to find data involving multiple resources.
Aggregation
Looks at many rows and calculates a single value
Words like ‘Most’, ‘Average’ and ‘Least’ are a sign that you need to use aggregation.
Joining Data From Different Tables
Get back to the social media database.
Look at these 2 example tables
Comments
id | contents | user_id | photo_id |
1 | et sit abc | 4 | 3 |
2 | sdaf1233123 | 5 | 4 |
Users
id | username |
4 | Gerad Morgan |
5 | Fred Hoge |
Let’s say, now I want to query all the comment contents associated with the user.
So we need data from both users
comments
tables.
Specifically, the username
column from Users
table, and the contents
column from the Comments
table.
That’s where JOIN comes into play.
Let’s write this query
SELECT contents, username
FROM comments
JOIN users
ON users.id = comments.user_id;
Here we want to select two columns, which are contents
and username
From the data sources comments
and users
tables.
From the comments
table, it has a reference foreign key
back to the record of the user who created this particular comment, it is the user_id
We JOIN
them together on the condition users.id = comments.user_id
Quick Notes on JOIN
Table order between
FROM
andJOIN
frequently makes a difference.We MUST give context if column names collide
Tables can be renamed using the
AS
keywordThere are a few kinds of joins!
4 Kinds Of Joins
Inner Join
Left Outer Join
Right Outer Join
Full Join
Image Source: Stephen Grider
Grouping
Reduces many rows down to fewer rows.
Done by using
GROUP BY
keywordVisualizing the result is the key to use
Aggregates
Reduces many values down to one
Done by using
aggregate
functions
Understanding Group By
For instance, we have this query
SELECT user_id
FROM comments
GROUP BY user_id;
What is the problem here?
We can have multiple comments created by a single user.
So why does each user appear only once in the query result?
Here is what GROUP BY user_id
did
Find the set of all UNIQUE user_id’s
Take each row and assign it to a group based on its user_id
One caveat: We can only select the grouped by column, If we attempt to select any other column, we will get an error.
Aggregate Functions
COUNT() → Returns the number of values in a group of values
SUM() → Find the sum of a group of numbers
AVG() → Find the average of a group of numbers
MIN() → Returns the minimum value from a group of numbers
MAX() → Returns the maximum value from a group of numbers
Let’s put grouping and aggregating together
Let’s say we want to calculate the total number of comments a user has created.
SELECT user_id, COUNT(id) AS num_comments_created
FROM comments
GROUP BY user_id
Gotcha with COUNT
Whenever you do a COUNT on a column, null
values are not counted.
In case you want to count a total number of rows (records) and not care about is it has a null value or not.
Just simply do
SELECT COUNT(*) FROM photos;
Filtering Methods
FROM → Specifies starting set of rows to work with.
JOIN → Merges in data from additional tables
WHERE → Filters the set of ROWS
GROUP BY → Groups rows by a UNIQUE set of values
HAVING → Filters the set of GROUPS
Reference sources: Stephen Grider.