In one of the last commits I added a SQL tokenizer which understands the basic tokens of (My)SQL:

  • Strings
  • Literals
  • Numbers
  • Comments
  • Operators

With this basic understanding we can normalize queries and build statistics over similar ones.

The idea is simple and already implemented in mysqldumpslow:

  /* login.php:37 */SELECT * FROM tbl WHERE id = 1243 AND name = "jan"

is turned into

  SELECT * FROM `tbl` WHERE `id` = ? AND `name` = ?

The queries now look like prepared statements and can be used to characterize queries of the same kind:

  • comments are removed
  • whitespaces are stripped
  • literals are quoted
  • constants are replaced with ?
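The four steps above can be sketched in a few lines of Python. This is a simplified, regex-based illustration of the idea, not the proxy's actual tokenizer; the function name and keyword list are my own:

```python
import re

# One alternative per token class; order matters: comments and strings
# must win before the generic word/number rules get a chance.
_token_re = re.compile(r"""
    /\*.*?\*/              # C-style comment
  | --[^\n]*               # SQL line comment
  | "(?:[^"\\]|\\.)*"      # double-quoted string constant
  | '(?:[^'\\]|\\.)*'      # single-quoted string constant
  | `[^`]*`                # already-quoted literal
  | \b\d+(?:\.\d+)?\b      # numeric constant
  | \w+                    # bare word: keyword or literal
  | \S                     # anything else: operators, punctuation
""", re.VERBOSE | re.DOTALL)

# Illustrative subset; a real normalizer needs the full keyword list.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "SHOW", "SET",
            "DELETE", "INSERT", "UPDATE", "GLOBAL"}

def normalize(query: str) -> str:
    out = []
    for tok in _token_re.findall(query):
        if tok.startswith("/*") or tok.startswith("--"):
            continue                          # comments are removed
        if tok[0] in "\"'" or tok.replace(".", "", 1).isdigit():
            out.append("?")                   # constants are replaced with ?
        elif tok.upper() in KEYWORDS or not tok[0].isalnum() and tok[0] != "`":
            out.append(tok.upper() if tok.isalpha() else tok)
        elif tok.startswith("`"):
            out.append(tok)                   # keep existing quoting
        else:
            out.append("`%s`" % tok)          # literals are quoted
    return " ".join(out)                      # whitespace is stripped
```

Running it over the example from above turns the commented, constant-laden query into the canonical prepared-statement form.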

Taking the famous world-db and executing some simple queries like:

root@127.0.0.1:4040 [world]> select * from City where Id = 1;
+----+-------+-------------+----------+------------+
| ID | Name  | CountryCode | District | Population |
+----+-------+-------------+----------+------------+
|  1 | Kabul | AFG         | Kabol    |    1780000 |
+----+-------+-------------+----------+------------+

Now you can fetch the stats from the proxy:

root@127.0.0.1:4040 [world]> select * from histogram.tables;
+------------+-------+--------+
| table      | reads | writes |
+------------+-------+--------+
| world.City |     5 |      0 |
+------------+-------+--------+
1 row in set (0.00 sec)

root@127.0.0.1:4040 [world]> select * from histogram.queries;
+--------------------------------------+-------+----------------+----------------+
| query                                | count | max_query_time | avg_query_time |
+--------------------------------------+-------+----------------+----------------+
| SELECT DATABASE ( )                  |     1 |            432 |            432 |
| SHOW `tables`                        |     1 |            589 |            589 |
| SELECT * FROM `City` WHERE `Id` = ?  |     5 |         147410 |        30063.6 |
+--------------------------------------+-------+----------------+----------------+

max_query_time and avg_query_time are both in microseconds.

With this data you can now start analyzing the queries which took the most time, or look for tables which would make sense to cache because they have a high read/write ratio.
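As a quick sketch of that last idea, here is a small helper (my own, not part of the proxy) that picks cache candidates from histogram.tables rows; the threshold of 2.0 is an arbitrary example value:

```python
def cache_candidates(rows, min_ratio=2.0):
    """rows: (table, reads, writes) tuples as in histogram.tables."""
    candidates = []
    for table, reads, writes in rows:
        # writes + 1 avoids division by zero for read-only tables
        if reads / (writes + 1) >= min_ratio:
            candidates.append(table)
    return candidates

# The single row from the histogram.tables output above:
rows = [("world.City", 5, 0)]
```

With the sample row, world.City (5 reads, 0 writes) qualifies as a caching candidate.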

Internal SQL

There are some more statements already implemented to maintain those stat-counters:

  • delete from histogram.tables to reset the stats
  • set GLOBAL histogram.tables = 0 to disable the tracing

All these statements are intercepted by the proxy and answered by it directly; they are never forwarded to the backend server.
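The interception itself boils down to matching the normalized query text and answering locally. A hedged sketch in Python (the function and the stats structure are illustrative, not the actual mysql-proxy code, which uses Lua hooks):

```python
def handle_internal(query, stats):
    """Return a local answer for internal statements, None to pass through."""
    q = " ".join(query.split()).lower()   # collapse whitespace, lowercase
    if q == "delete from histogram.tables":
        stats["tables"].clear()           # reset the per-table counters
        return "OK"                       # answered by the proxy itself
    if q == "set global histogram.tables = 0":
        stats["enabled"] = False          # disable the tracing
        return "OK"
    return None                           # everything else goes to the server
```

Returning None signals that the statement is a normal query and should be forwarded unchanged.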

Next steps

As a side effect of the query stats I started to write some Lua libraries which will make handling the packets a lot easier. But that's for another article.

