Grouping Data


SQL is a very flexible language and what of the things it allows you to do is grouping data. To retrieve the total of all Population column values you can use this SQL expression:

SELECT SUM(Population)
FROM Countries

The SUM is an aggregate function which returns the sum of all values for the column argument. The SUM function works only with numeric columns. What would you do if you want to select the total Population for each Continent in our Countries table? Would something like this work:

SELECT Continent, SUM(Population)
FROM Countries

The answer is no, you will get an error. The SQL engine will return an error saying that you are missing a “GROUP BY” for the Continent column. In order to fix this problem we will add a GROUP BY at the end of the statement above:

SELECT Continent, SUM(Population)
FROM Countries
GROUP BY Continent

Now you will get the following result:

Continent Population
Europe 118,742,662
North America 438,992,672

By adding the GROUP BY clause at the end of our statement, we instructed the SQL engine to apply any aggregate functions (in our case the SUM) for each Continent, and we are getting one record per Continent with the sum of the Population for only this Continent.

There are other aggregate functions we can use with GROUP BY like AVG, MIN, MAX and COUNT.

The AVG function gives you the average value for a numeric column:

SELECT Continent, AVG(Population)
FROM Countries
GROUP BY Continent

The MIN function gives you the minimum value for a numeric column:

SELECT Continent, MIN(Population)
FROM Countries
GROUP BY Continent

The MAX function gives you the maximum value for a numeric column:

SELECT Continent, MAX(Population)
FROM Countries
GROUP BY Continent

The COUNT function returns the number of items in a group. For example to count how many countries do we have for each continent in our Countries table, we can do the following:

SELECT Continent, COUNT(*)
FROM Countries
GROUP BY Continent