Date Tags mysql

When you ask someone how to optimize SQL queries you will always get the answer

  • enable the slow query log
  • set the long-query-time to 1-2 seconds
  • enable the logging of query which aren't using an index
  • run EXPLAIN on the queries that are shown and optimize them with proper indexes

But how to understand what the EXPLAIN is telling you ?

I picked a query from the ORDER BY RAND article as they became interesting enough to show the different aspects of of a query. They contain:

  • Sub-Queries
  • Unions
  • Special cases
  • JOINs
  • ORDER BY + LIMIT

If you want to understand how this query was developed, check the original article about ORDER BY RAND .

> EXPLAIN EXTENDED
(<b>SELECT r1.name FROM random AS r1
   JOIN (SELECT (RAND() *
                (SELECT MAX(id) FROM random)) AS id
        ) AS r2
  WHERE r1.id >= r2.id
  ORDER BY r1.id ASC
  LIMIT 1</b>)
UNION ALL
(<b>SELECT r1.name FROM random AS r1
   JOIN (SELECT (RAND() *
                (SELECT MAX(id) FROM random)) AS id
        ) AS r2
  WHERE r1.id >= r2.id
  ORDER BY r1.id ASC
  LIMIT 1</b>);

+----+--------------+------------+--------+---------------+---------+---------+------+---------+----------+------------------------------+
| id | select_type  | table      | type   | possible_keys | key     | key_len | ref  | rows    | filtered | Extra                        |
+----+--------------+------------+--------+---------------+---------+---------+------+---------+----------+------------------------------+
|  1 | PRIMARY      | <derived2> | system | NULL          | NULL    | NULL    | NULL |       1 |   100.00 | Using filesort               |
|  1 | PRIMARY      | r1         | ALL    | PRIMARY       | NULL    | NULL    | NULL | 1000000 |   100.00 | Using where                  |
|  2 | DERIVED      | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |    NULL |     NULL | No tables used               |
|  3 | SUBQUERY     | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |    NULL |     NULL | Select tables optimized away |
|  4 | UNION        | <derived5> | system | NULL          | NULL    | NULL    | NULL |       1 |   100.00 |                              |
|  4 | UNION        | r1         | range  | PRIMARY       | PRIMARY | 4       | NULL |  153726 |   100.00 | Using where                  |
|  5 | DERIVED      | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |    NULL |     NULL | No tables used               |
|  6 | SUBQUERY     | NULL       | NULL   | NULL          | NULL    | NULL    | NULL |    NULL |     NULL | Select tables optimized away |
| NULL | UNION RESULT | <union1,4> | ALL  | NULL          | NULL    | NULL    | NULL |    NULL |     NULL |                              |
+----+--------------+------------+--------+---------------+---------+---------+------+---------+----------+------------------------------+

The query is made up of two SELECTs in a UNION. The UNION RESULT is made up of the union of id 1 and 4 (union1,4) as you see in the last line.

id 1 is the JOIN between the temp-table derived2 and random. As long as the id stays the same the query is usually part of a JOIN. derived2 is the result of id 2 (DERIVED) with its SUBQUERY of id 3.

We used some nice tricks to get the optimizer to the idea that he doesn't need to read any data from the disk to answer some parts of the query:

  • MAX(id) can be optimized away as it is the last value in the index. That's a single lookup and needs no sorting, grouping, ... (id 3 and 6)
  • we splitted the calculation of RAND() * MAX(id) into a sub-query to make sure that the MAX() optimization really can take place (id 2 and 5)
  • we used ORDER BY + LIMIT to get the optimizer into a INDEX READ and stop after seeing one row. (id 1 and 4)

the costs

With a check on the SHOW STATUS counters we can prove our theory:

> FLUSH STATUS;
> SELECT ...;
> SHOW SESSION STATUS;

The FLUSH STATUS is resetting the counters for the current session, the SHOW SESSION STATUS gives us some counters:

| Select_full_join                  | 0         |
| Select_full_range_join            | 0         |
| Select_range                      | 2         |
| Select_range_check                | 0         |
| Select_scan                       | 1         |

We have 2 range scans (the ORDER BY + LIMIT), one for each side of our UNION and 1 table scan to read the result of the UNION and send it up to the client.

To prove that there is no sorting involved we check the Sort-counters:

| Sort_merge_passes                 | 0         |
| Sort_range                        | 0         |
| Sort_rows                         | 0         |
| Sort_scan                         | 0         |

The table we test on has 1mio rows, but we only read a few lines from it:

## our temp tables
| Created_tmp_tables                | 3         | 2 derived tables + the UNION RESULT
## we wrote only to the temp-tables here
| Handler_write                     | 4         | (1 rows + 1 rows) for the derived tables + 2 rows for the UNION result
## the MAX(id) did a Index-lookup for the _last_ id
| Handler_read_first                | 2         | 
## ... and we looked up one row by the random-id in the JOIN
| Handler_read_key                  | 2         |
| Handler_read_next                 | 0         |
| Handler_read_prev                 | 0         |
| Handler_read_rnd                  | 0         |
## all the reads from the temp-tables where not indexed
| Handler_read_rnd_next             | 5         |
## the virtual cost is ... foobars
| Last_query_cost                   | 10.499000 |

using the old ORDER BY RAND

Just for fun we look at the classic query again:

> FLUSH STATUS;
> SELECT name FROM random ORDER BY RAND() LIMIT 1;
> SHOW SESSION STATUS;
> EXPLAIN SELECT name FROM random ORDER BY RAND() LIMIT 1;
+----+-------------+--------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows    | Extra                           |
+----+-------------+--------+------+---------------+------+---------+------+---------+---------------------------------+
|  1 | SIMPLE      | random | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | Using temporary; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+---------+---------------------------------+

The query is executed as. ALL rows are sorted in a temp-table and ONE row is returned.

## the cost is a bit higher :)
| Last_query_cost                   | 210744.186500 |
## we only have one temp-table which got some more rows
| Created_tmp_tables                | 1             |
| Handler_write                     | 1349207       |
## we executed one BIG table-scan to fill the temp-table
| Select_scan                       | 1             |
| Handler_read_rnd_next             | 2349209       |
## we really had to sort
| Sort_merge_passes                 | 19            |
| Sort_range                        | 0             |
| Sort_rows                         | 2             |
| Sort_scan                         | 1             |

Comments

Enable javascript to load comments.