In one of the last commits I added a SQL Tokenizer which understands the basic tokens of (My)SQL:
- Strings
- Literals
- Numbers
- Comments
- Operators
With this basic understanding we can normalize Queries and build statistics over similar queries.
The idea is simple and already implemented in mysqldumpslow
:
/* login.php:37 */SELECT * FROM tbl WHERE id = 1243 AND name = "jan"
is turned into
SELECT * FROM `tbl` WHERE `id` = ? AND `name` = ?
The queries look like prepared statements now and can be used the characterize queries of the same kind.
- comments are removed
- whitespaces are stripped
- literals are quoted
- constants are replaced with ?
Taking the famous world-db and executing some simple queries like:
root@127.0.0.1:4040 [world]> select * from City where Id = 1;
+----+-------+-------------+----------+------------+
| ID | Name | CountryCode | District | Population |
+----+-------+-------------+----------+------------+
| 1 | Kabul | AFG | Kabol | 1780000 |
+----+-------+-------------+----------+------------+
Now you can fetch the stats from the proxy:
root@127.0.0.1:4040 [world]> select * from histogram.tables;
+------------+-------+--------+
| table | reads | writes |
+------------+-------+--------+
| world.City | 5 | 0 |
+------------+-------+--------+
1 row in set (0.00 sec)
root@127.0.0.1:4040 [world]> select * from histogram.queries;
+--------------------------------------+-------+----------------+----------------+
| query | count | max_query_time | avg_query_time |
+--------------------------------------+-------+----------------+----------------+
| SELECT DATABASE ( ) | 1 | 432 | 432 |
| SHOW `tables` | 1 | 589 | 589 |
| SELECT * FROM `City` WHERE `Id` = ? | 5 | 147410 | 30063.6 |
+--------------------------------------+-------+----------------+----------------+
max_query_time
and avg_query_time
are both in micro-seconds.
With this data you can now start analyzing the Queries which took most of the time or try to find the tables which would make sense to cache, because they have a high read/write ratio.
Internal SQL
There are some more statements already implemented to maintain those stat-counters:
delete from histogram.tables
to clean the statsset GLOBAL histogram.tables = 0
to disable the tracing
All those statements are passed to the proxy and answered by it directly.
Next steps
As side-effect of this query-stats I started to write some lua-libraries which will make handling the packets a lot easier. But that's for another article.
Comments