groupwise max

February 15, 2007

… or “How to solve the same problem in 10 different ways”.

One of the common problems to solve in SQL is “Get the row with the group-wise maximum”. Getting just the maximum for each group is simple; getting the full row that the maximum belongs to is the interesting step.

SELECT MAX(population), continent
  FROM Country
 GROUP BY continent;

+-----------------+---------------+
| MAX(population) | continent     |
+-----------------+---------------+
|      1277558000 | Asia          |
|       146934000 | Europe        |
|       278357000 | North America |
|       111506000 | Africa        |
|        18886000 | Oceania       |
|               0 | Antarctica    |
|       170115000 | South America |
+-----------------+---------------+

We use the ‘world’ database from the MySQL manual for the examples.

The next step is to find the countries that have both the population and the continent from the data we just gathered. For Asia, for example:

SELECT continent, name, population
  FROM Country
 WHERE population = 1277558000
   AND continent = 'Asia';

+-----------+-------+------------+
| continent | name  | population |
+-----------+-------+------------+
| Asia      | China | 1277558000 |
+-----------+-------+------------+

Instead of doing this row by row, we just JOIN the two result sets, using a temporary table:

CREATE TEMPORARY TABLE co2
SELECT continent, MAX(population) AS maxpop
  FROM Country
 GROUP BY continent;

SELECT co1.continent, co1.name, co1.population
  FROM Country AS co1, co2
 WHERE co2.continent = co1.continent
   AND co1.population = co2.maxpop;
+---------------+----------------------------------------------+------------+
| continent     | name                                         | population |
+---------------+----------------------------------------------+------------+
| Oceania       | Australia                                    |   18886000 |
| South America | Brazil                                       |  170115000 |
| Asia          | China                                        | 1277558000 |
| Africa        | Nigeria                                      |  111506000 |
| Europe        | Russian Federation                           |  146934000 |
| North America | United States                                |  278357000 |
| Antarctica    | Antarctica                                   |          0 |
| Antarctica    | Bouvet Island                                |          0 |
| Antarctica    | South Georgia and the South Sandwich Islands |          0 |
| Antarctica    | Heard Island and McDonald Islands            |          0 |
| Antarctica    | French Southern territories                  |          0 |
+---------------+----------------------------------------------+------------+

DROP TEMPORARY TABLE co2;

Instead of using a temporary table as an intermediate step, we can write the same thing as a simple sub-query in the FROM clause, which creates a temporary table internally.

SELECT co1.continent, co1.name, co1.population
  FROM Country AS co1,
       (SELECT continent, MAX(population) AS maxpop
          FROM Country
         GROUP BY continent) AS co2
 WHERE co2.continent = co1.continent
   AND co1.population = co2.maxpop;
+---------------+----------------------------------------------+------------+
| continent     | name                                         | population |
+---------------+----------------------------------------------+------------+
| Oceania       | Australia                                    |   18886000 |
| South America | Brazil                                       |  170115000 |
| Asia          | China                                        | 1277558000 |
| Africa        | Nigeria                                      |  111506000 |
| Europe        | Russian Federation                           |  146934000 |
| North America | United States                                |  278357000 |
| Antarctica    | Antarctica                                   |          0 |
| Antarctica    | Bouvet Island                                |          0 |
| Antarctica    | South Georgia and the South Sandwich Islands |          0 |
| Antarctica    | Heard Island and McDonald Islands            |          0 |
| Antarctica    | French Southern territories                  |          0 |
+---------------+----------------------------------------------+------------+

The sub-query is executed in the exact same way as the temporary table we created by hand. Instead of JOINing against the temporary table we JOIN against the result of the sub-query.

Hmm, was this too simple? Let’s take a look at the alternatives:

SELECT co1.continent, co1.name, co1.population
  FROM Country AS co1
 WHERE co1.population =
       (SELECT MAX(population) AS maxpop
          FROM Country AS co2
         WHERE co2.continent = co1.continent);

To be read as: ‘Get the countries whose population equals the maximum population of their own continent’. Such a sub-query makes the statement more readable. BUT … it is ‘DEPENDENT’, because the inner query refers to a field of the outer query. This means that the inner query is executed once for each row of the outer query.
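To see the dependency for yourself, you can ask the server for the execution plan. The following is a minimal sketch against the same ‘world’ database; the index name `idx_continent_population` is made up for illustration, and whether the index actually helps depends on your server version and data.

```sql
-- Show the plan instead of running the query. In the select_type column
-- the inner SELECT is expected to be labeled DEPENDENT SUBQUERY.
EXPLAIN
SELECT co1.continent, co1.name, co1.population
  FROM Country AS co1
 WHERE co1.population =
       (SELECT MAX(population)
          FROM Country AS co2
         WHERE co2.continent = co1.continent);

-- Hypothetical index (the name is an assumption, not part of the world
-- schema). It covers the inner query's filter column and the aggregated
-- column, so each per-row lookup may be answered from the index alone.
CREATE INDEX idx_continent_population
    ON Country (Continent, Population);
```

Run the EXPLAIN again after creating the index to compare the two plans.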

The same query can be written in two other ways:

SELECT continent, name, population
  FROM Country
 WHERE ROW(population, continent) IN (
       SELECT MAX(population), continent
         FROM Country
        GROUP BY continent);

SELECT co1.continent, co1.name, co1.population
  FROM Country AS co1
 WHERE co1.population >= ALL
       (SELECT co2.population
          FROM Country AS co2
         WHERE co2.continent = co1.continent);

If you don’t want to use sub-queries and prefer pure JOINs, perhaps these are for you:

SELECT co1.continent, co1.name, co1.population
  FROM Country AS co1 LEFT JOIN Country AS co2
       ON co1.population < co2.population AND
          co1.continent = co2.continent
 WHERE co2.population IS NULL;

SELECT co1.Continent, co1.Name
  FROM Country AS co1 JOIN Country AS co2
       ON co2.Continent = co1.Continent AND
          co1.Population <= co2.Population
 GROUP BY co1.Continent, co1.Name
HAVING COUNT(*) = 1;

## added 2005-05-28 as no. 11, sent in by rudy@r937.com
SELECT co1.continent, co1.name
  FROM Country AS co1 JOIN Country AS co2
       ON co1.continent = co2.continent
 GROUP BY co1.continent, co1.name, co1.population
HAVING co1.population = MAX(co2.population);
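The pure-JOIN variants do not all treat ties the same way. The LEFT JOIN query keeps every row that shares the maximum, while the COUNT(*) = 1 query only returns a row when the maximum is unique within its group. Here is a small sketch that shows the difference; the table `TieDemo` and its rows are invented for this illustration and are not part of the ‘world’ database.

```sql
-- Hypothetical miniature table. A regular table is used because MySQL
-- cannot refer to a TEMPORARY table twice in the same query.
CREATE TABLE TieDemo (
  continent  VARCHAR(20) NOT NULL,
  name       VARCHAR(20) NOT NULL,
  population INT         NOT NULL
);

INSERT INTO TieDemo VALUES
  ('A', 'a1', 10), ('A', 'a2', 5),   -- unique maximum in group A
  ('B', 'b1', 0),  ('B', 'b2', 0);   -- tied maximum in group B

-- LEFT JOIN variant: returns a1, b1 and b2 (ties are kept).
SELECT t1.continent, t1.name
  FROM TieDemo AS t1 LEFT JOIN TieDemo AS t2
       ON t1.population < t2.population AND
          t1.continent = t2.continent
 WHERE t2.population IS NULL;

-- COUNT(*) = 1 variant: returns only a1 (group B disappears, because
-- each of its rows matches two rows with population >= its own).
SELECT t1.continent, t1.name
  FROM TieDemo AS t1 JOIN TieDemo AS t2
       ON t2.continent = t1.continent AND
          t1.population <= t2.population
 GROUP BY t1.continent, t1.name
HAVING COUNT(*) = 1;

DROP TABLE TieDemo;
```

In the ‘world’ data this matters for Antarctica, where five entries share the maximum population of 0.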

Now you already know 8 ways. The last two are only meant to give you some more ideas. First, a way that doesn’t work (yet):

SELECT co2.continent, MAX(co2.population) AS maxpop,
       (SELECT name
          FROM Country
         WHERE population = maxpop AND
               continent = co2.continent)
  FROM Country AS co2
 GROUP BY co2.continent;
ERROR 1247 (42S22): Reference 'maxpop' not supported (reference to group function)
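One possible way around the error, as a sketch and not something taken from the manual, is to compute the aggregate in a derived table first. In the outer query `maxpop` is then an ordinary column, so the sub-query in the select list may refer to it. The alias `m` and the `LIMIT 1` are my additions; the limit is there because a scalar sub-query must return a single row, and Antarctica has several entries with the same maximum.

```sql
-- The derived table m computes the aggregate first, so m.maxpop is a
-- plain column by the time the select-list sub-query sees it.
SELECT m.continent, m.maxpop,
       (SELECT name
          FROM Country
         WHERE population = m.maxpop AND
               continent = m.continent
         LIMIT 1) AS name      -- assumption: keep one name when ties exist
  FROM (SELECT continent, MAX(population) AS maxpop
          FROM Country
         GROUP BY continent) AS m;
```

Without an ORDER BY next to the LIMIT, which of the tied names is returned is not defined.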

And last, the modified max-concat example from the manual:

SELECT continent,
       SUBSTRING( MAX( CONCAT(LPAD(population,10,'0'),name) ), 10+1) AS name,
       MAX( population ) AS population
  FROM Country
 GROUP BY continent;

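The result is slightly different: this query returns exactly one row per continent, so the five tied Antarctica entries collapse into a single row. But it is more about the idea.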
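For readers on a newer server there is one more idea, which is not among the variants above: window functions. This sketch assumes MySQL 8.0 or later, where window functions are available; the aliases `ranked` and `pop_rank` are made up for the example.

```sql
-- Assumes MySQL 8.0+ and the world database.
-- RANK() numbers the rows of each continent by descending population;
-- rows that share the same population get the same rank.
SELECT continent, name, population
  FROM (SELECT continent, name, population,
               RANK() OVER (PARTITION BY continent
                                ORDER BY population DESC) AS pop_rank
          FROM Country) AS ranked
 WHERE pop_rank = 1;
```

Because RANK() gives tied rows the same rank, all rows sharing the maximum are returned. Replacing RANK() with ROW_NUMBER() would instead keep exactly one row per continent.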
