Over the last few weeks I wrote mysql-proxy, which changes the way you can operate with the MySQL Server.
A proxy can operate as a man in the middle and pass network packets through to the MySQL Server, but it can also change the packets when needed. This opens up several possibilities:
- pseudo server (done)
- injection proxy (done)
- load-balancing proxy (done)
- connection pool
- XA Transaction Manager
- replication client
- replication filter
- replication off-loader
Some of them are already implemented, some are only ideas for the future.
- http://forge.mysql.com/wiki/MySQL_Proxy has the source and the binaries
- http://forums.mysql.com/list.php?146 is the open forum on the proxy
Admin Server
The admin-server is the most basic implementation of the MySQL server protocol and can respond to some basic queries. It implements:
- socket handling
- the life-cycle of a connection
- mysql wire-protocol
- len-encoding of some fields (see the sketch after this list)
- field-types
- result-sets
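As an illustration of the wire-protocol work, here is a sketch of decoding the length-encoded integers that prefix most variable-sized fields. The helper name lenenc_int is made up; the byte layout follows the internals document:

-- a minimal decoder for MySQL length-encoded integers:
--   first byte < 251  -> the value itself
--   0xFB              -> NULL (in result-set rows)
--   0xFC, 0xFD, 0xFE  -> 2-, 3- and 8-byte little-endian values follow
function lenenc_int(s, pos)
    local first = s:byte(pos)
    if first < 251 then
        return first, pos + 1
    elseif first == 0xFB then
        return nil, pos + 1
    elseif first == 0xFC then
        return s:byte(pos + 1) + s:byte(pos + 2) * 256, pos + 3
    elseif first == 0xFD then
        return s:byte(pos + 1) + s:byte(pos + 2) * 256
             + s:byte(pos + 3) * 65536, pos + 4
    else -- 0xFE: 8-byte integer
        local value = 0
        for i = 8, 1, -1 do
            value = value * 256 + s:byte(pos + i)
        end
        return value, pos + 9
    end
end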
While the design is based on the ideas of lighttpd, in that it uses non-blocking network I/O, the network protocol is based on the information available in the internals document on dev.mysql.com.
The admin-server implements two basic queries which are issued by the mysql command-line client:
select @@version_comment LIMIT 1;
select USER();
Using the admin server you can expose such functionality in a way that every MySQL client (PHP, JDBC, ODBC, Perl, ...) can execute it.
We use it to export the current config and to track the open connections:
> select * from proxy_connections;
+------+--------+-------+-------+
| id | type | state | db |
+------+--------+-------+-------+
| 2 | proxy | 8 | world |
| 3 | server | 8 | |
+------+--------+-------+-------+
and the config:
> select * from proxy_config;
+---------------------------------+----------------+
| option | value |
+---------------------------------+----------------+
| admin.address | :4041 |
| proxy.address | :4040 |
| proxy.backend_addresses[0] | 127.0.0.1:3306 |
| proxy.backend_addresses[1] | 127.0.0.1:3307 |
| proxy.fix_bug_25371 | 0 |
| repclient.master_address | |
+---------------------------------+----------------+
Transparent Proxy
Based on the simple admin server a transparent proxy was implemented.
It allows a whole range of new applications:
- load balancer
- fail-over
- query injection (see Detecting slow Queries below)
- query rewriting
- on-the-fly bug-fixing (bug #25371)
The implementation was tested with:
- the test-suite of ext/mysqlnd (using PHP5)
- the test-suite of Connector/J
As the proxy sits in the connection between app and server it can:
- check the query time with millisecond resolution
- check status counters
- check SHOW PROFILE
- check for used indexes
If the user sends
SELECT user FROM mysql.user;
we enhance the query a bit:
SHOW SESSION STATUS;
SELECT user FROM mysql.user;
SHOW SESSION STATUS;
and only send the result-set for the SELECT back to the user. The diff'ed STATUS counters are written to a log-file:
$ mysql --port=4040 --host=127.0.0.1 --user=...
$ mysql-proxy --proxy-lua-script=balance.lua --proxy-profiling
{ "command" : "Query", "timestamp" : "2007-05-22T07:39:00.141568", "query" : "SELECT MAX(population), continent FROM Country GROUP BY continent", "thread_id" : "378069", "exec_time" : "33.266", "tags" : [ "query-cost" ], "status" : { "Table_locks_immediate" : "1", "Bytes_received" : "111", "Com_select" : "1", "Key_reads" : "4", "Handler_read_first" : "1", "Last_query_cost" : "17.199000", "Questions" : "1", "Select_range" : "1", "Key_read_requests" : "32", "Bytes_sent" : "283", "Handler_read_key" : "14", "Key_blocks_unused" : "-4" } }
EXPLAIN for index usage statistics
But we can use EXPLAIN for something else ... index usage.
Up to now MySQL doesn't expose any statistics on index usage. Knowing which indexes are used shows which of them are maintained, but never used; dropping those would increase the INSERT/UPDATE/DELETE speed.
root@localhost [world]> EXPLAIN SELECT * FROM City WHERE id = 1;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | City | const | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
The key column tells us the index that was used. By maintaining a list of used indexes and mapping it against the available indexes you can see which of them are used, and how often.
root@127.0.0.1 [(none)]> select * from index_usage;
+-------------------------+------+------------------+------------------+
| name | used | max_used_key_len | avg_used_key_len |
+-------------------------+------+------------------+------------------+
| world.Country.PRIMARY | 1 | 3 | 3.0000 |
| world.City.PRIMARY | 1 | 4 | 4.0000 |
| world.Country.Continent | 1 | 1 | 1.0000 |
+-------------------------+------+------------------+------------------+
Or to show a real application using indexes:
root@127.0.0.1 [(none)]> select * from index_usage;
+------------------------------------------------------------------+------+------------------+------------------+
| name | used | max_used_key_len | avg_used_key_len |
+------------------------------------------------------------------+------+------------------+------------------+
| world.Country.PRIMARY | 1 | 3 | 3.0000 |
| merlin.inv_agents.PRIMARY | 16 | 4 | 4.0000 |
| merlin.monitor_history_times.adh_scan_time | 2 | 8 | 8.0000 |
| merlin.dc_schedule.dc_schedule_fk1 | 14 | 4 | 4.0000 |
| merlin.monitors.PRIMARY | 8 | 4 | 4.0000 |
| merlin.monitor_history.PRIMARY | 6 | 4 | 4.0000 |
| world.City.PRIMARY | 1 | 4 | 4.0000 |
| merlin.graph_variables.graph_variables_fk1 | 45 | 4 | 4.0000 |
| merlin.graph_designs.graph_designs_idx | 7 | 767 | 767.0000 |
| merlin.monitor_history_times_variables.history_id_alarm_time_idx | 26 | 8 | 8.0000 |
| merlin.monitor_alarm_level.PRIMARY | 4 | 4 | 4.0000 |
| merlin.monitor_schedule.PRIMARY | 2 | 4 | 4.0000 |
| merlin.monitor_history.monitor_history_idx | 4 | 9 | 9.0000 |
| world.Country.Continent | 1 | 1 | 1.0000 |
| merlin.inv_servers.PRIMARY | 18 | 4 | 4.0000 |
| merlin.monitor_category.PRIMARY | 2 | 4 | 4.0000 |
| merlin.monitor_history.monitor_history_monitor_history_fk1 | 12 | 4 | 4.0000 |
| merlin.dc_schedule.dc_schedule_fk1,dc_schedule_fk3 | 6 | 4 | 4.0000 |
| merlin.monitor_history_times.PRIMARY | 18 | 4 | 4.0000 |
| merlin.dc_items.PRIMARY | 14 | 4 | 4.0000 |
| merlin.inv_servers.server_name | 2 | 386 | 386.0000 |
| merlin.graph_series.graph_series_fk | 31 | 4 | 4.0000 |
| merlin.monitor_schedule.monitor_schedule_idx | 15 | 8 | 8.0000 |
+------------------------------------------------------------------+------+------------------+------------------+
UPDATE and DELETE statements can usually be rewritten into SELECT statements to make them EXPLAINable; for example, DELETE FROM City WHERE id = 1 can be explained as SELECT * FROM City WHERE id = 1.
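The index_usage output above could be collected with a script along these lines: inject an EXPLAIN in front of every SELECT and count its key column. A sketch with the same assumed Lua hooks; the script-local index_usage table is made up, the column positions follow the EXPLAIN output above:

local index_usage = {}   -- script-local usage counters

function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end
    local query = packet:sub(2)
    if not query:upper():match("^%s*SELECT") then return end

    proxy.queries:append(1, string.char(proxy.COM_QUERY) .. "EXPLAIN " .. query)
    proxy.queries:append(2, packet)
    return proxy.PROXY_SEND_QUERY
end

function read_query_result(inj)
    if inj.id == 1 then
        for row in inj.resultset.rows do
            local key, key_len = row[6], row[7]   -- the "key" and "key_len" columns
            if key then
                index_usage[key] = (index_usage[key] or 0) + 1
            end
        end
        return proxy.PROXY_IGNORE_RESULT   -- the client never sees the EXPLAIN
    end
end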
Load Balancing
Load Balancing selects one backend out of a set of backends to be used as the MySQL server. We use SQF (shortest queue first) to distribute the load across the backends: each new connection goes to the backend with the fewest open connections, which over time gives each backend about the same number of connections.
$ ./mysql-proxy \
--proxy-backend-addresses=127.0.0.1:3306 \
--proxy-backend-addresses=127.0.0.2:3306 \
--proxy-backend-addresses=127.0.0.3:3306
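The selection itself could be scripted. A sketch, assuming a connect_server() hook and a backend list with a connected_clients counter as in the proxy's Lua interface (field names vary between versions):

function connect_server()
    local min_ndx, min_clients = 1, nil
    for i = 1, #proxy.global.backends do
        local b = proxy.global.backends[i]
        if b.state ~= proxy.BACKEND_STATE_DOWN and
           (min_clients == nil or b.connected_clients < min_clients) then
            min_ndx, min_clients = i, b.connected_clients
        end
    end
    proxy.connection.backend_ndx = min_ndx   -- shortest queue wins
end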
Fail Over
Load Balancing and Fail Over are very similar. Both have to detect whether a host is reachable and both have to forward connections to one of the backends. Using a different node-selection strategy you get Fail-Over:
- pick the first available host from a list
As long as the first host is up, it will be used for all queries.
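In the same assumed connect_server() hook, fail-over is just a different selection loop:

function connect_server()
    for i = 1, #proxy.global.backends do
        if proxy.global.backends[i].state ~= proxy.BACKEND_STATE_DOWN then
            proxy.connection.backend_ndx = i   -- first host that is up
            return
        end
    end
end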
You can put the HA proxy next to the application and let it forward the requests to the real load balancers.
Latency
If you want to put an application like the proxy in the middle of your setup, you want to be sure that:
- it is stable
- it adds as little latency as possible
To measure the latency a small C-API based app was written which tests:
- prepared statements
- plain SELECTs
- large Queries
via the proxy and directly against the database. To get comparable results we used simple queries for the latency checks and verified them against larger queries. The added latency should be nearly constant:
int rounds;

/* warmup run to fill the caches */
test_server("127.0.0.1", 3306);

/* real benchmark: direct connection vs. going through the proxy */
for (rounds = 0; rounds < 3; rounds++) {
    test_server("127.0.0.1", 3306); /* direct */
    test_server("127.0.0.1", 4040); /* proxy */
}
Each test run wrote an INSERT statement with the test name and its exec-time into a text file. Spikes were removed to get useful averages:
+----------------+-----------------------------+--------+--------+-----+
| benchname | testname | max | avg | run |
+----------------+-----------------------------+--------+--------+-----+
| 127.0.0.1:3306 | mysql_execute() | 4.082 | 0.123 | 150 |
| 127.0.0.1:3306 | mysql_real_query(large) | 19.195 | 7.73 | 147 |
| 127.0.0.1:3306 | mysql_real_query(small) | 4.087 | 0.1 | 150 |
| 127.0.0.1:3306 | mysql_stmt_bind_result() | 0.002 | 0.001 | 150 |
| 127.0.0.1:3306 | mysql_stmt_fetch() | 0.003 | 0.001 | 150 |
| 127.0.0.1:3306 | mysql_stmt_prepare() | 0.176 | 0.128 | 30 |
| 127.0.0.1:3306 | mysql_store_result(large) | 0.02 | 0.01 | 150 |
| 127.0.0.1:3306 | mysql_store_result(small) | 0.003 | 0.001 | 150 |
| 127.0.0.1:3306 | mysql_real_connect() | 16.22 | 1.038 | 30 |
| 127.0.0.1:3306 | test_prep_stmt() | 4.638 | 0.936 | 30 |
| 127.0.0.1:3306 | test_select() | 4.461 | 0.622 | 30 |
| 127.0.0.1:3306 | test_select_merlin() | 53.883 | 38.561 | 27 |
| 127.0.0.1:3306 | benchmark done | 56.824 | 41.361 | 27 |
| 127.0.0.1:4040 | mysql_execute() | 4.232 | 0.15 | 150 |
| 127.0.0.1:4040 | mysql_real_query(large) | 15.182 | 7.626 | 147 |
| 127.0.0.1:4040 | mysql_real_query(small) | 4.191 | 0.184 | 150 |
| 127.0.0.1:4040 | mysql_stmt_bind_result() | 0.002 | 0.001 | 150 |
| 127.0.0.1:4040 | mysql_stmt_fetch() | 0.003 | 0.001 | 150 |
| 127.0.0.1:4040 | mysql_stmt_prepare() | 0.275 | 0.185 | 30 |
| 127.0.0.1:4040 | mysql_store_result(large) | 0.017 | 0.01 | 150 |
| 127.0.0.1:4040 | mysql_store_result(small) | 0.003 | 0.001 | 150 |
| 127.0.0.1:4040 | mysql_real_connect() | 4.549 | 0.769 | 30 |
| 127.0.0.1:4040 | test_prep_stmt() | 5.053 | 1.131 | 30 |
| 127.0.0.1:4040 | test_select() | 4.769 | 1.042 | 30 |
| 127.0.0.1:4040 | test_select_merlin() | 57.647 | 39.239 | 29 |
| 127.0.0.1:4040 | benchmark done | 59.558 | 42.265 | 29 |
+----------------+-----------------------------+--------+--------+-----+
If you compare the benchmark results:
test_prep_stmt() 0.936 vs. 1.131 ms (+ .2ms)
test_select() 0.622 vs. 1.042 ms (+ .4ms)
test_select_merlin() 38.561 vs. 39.239 ms (+ .7ms)
benchmark done: 41.361 vs. 42.265 ms (+ .9ms)
This was measured with all profiling options in the proxy disabled. This is the setup you would use for load balancing, fail-over, and so on.
Delay Adaptive Balancing
Partially implemented. The state of the slaves is checked every few seconds to track their availability and replication delay. If a slave is unreachable or too far behind, it is taken out of the proxy-pool.
There are two ways to check the delay of the slave:
- check SHOW SLAVE STATUS for Seconds_behind_master
- send a ping into the replication tree
The ping is an UPDATE against a MEMORY table on the master and a SELECT on the slave to see if the ID is known:
master> CREATE TABLE ping ( id INT ) ENGINE=MEMORY;
master> INSERT INTO ping VALUES ( 0 );
master> UPDATE ping SET id = @id := id + 1;
master> SELECT @id;
As we use connection-scoped user variables we can do this without running into a race condition. On the slave we check for:
slave> SELECT id FROM ping WHERE id >= ?;
This is a simple check whether the slave is up to date or not. If the UPDATE is executed at the end of each transaction we can track the progress of the replication at the transaction level, instead of at the byte level (master_bin_log_pos) or the time level (seconds_behind_master).
Thanks to the statement injection we can inject this into the normal stream of operations, transparently for the application. We try to keep it free of side-effects.
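A sketch of that injection with the same assumed hooks: the ping-UPDATE is slipped in front of each COMMIT so it is committed as part of the same transaction, and its result never reaches the client:

function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end

    if packet:sub(2):upper():match("^%s*COMMIT") then
        proxy.queries:append(1, string.char(proxy.COM_QUERY) ..
            "UPDATE ping SET id = @id := id + 1")
    end
    proxy.queries:append(2, packet)
    return proxy.PROXY_SEND_QUERY
end

function read_query_result(inj)
    if inj.id == 1 then
        return proxy.PROXY_IGNORE_RESULT   -- the ping stays invisible
    end
end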
LUA
As a small side note: all the above features are implemented through an embedded Lua engine.
Connection Pool
Not implemented yet. Connection pooling will allow all applications to utilize a connection pool which is maintained by the mysql-proxy. It will reduce the setup time for connections a bit.
The normal connection setup is:
- client connects to server
- server sends hand-shake
- client sends auth packet
- servers sends OK or ERR
with the connection pooling the connections between proxy and server are kept alive and are only reset when a client disconnects.
The proxy keeps a small pool of connections open and the initial hand-shake is answered without talking to the server. Depending on various factors we might be able to shadow the cost of a connection-setup.
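Since none of this exists yet, only a purely hypothetical sketch of the idea; none of these hook or helper names are real:

-- purely hypothetical: keep idle server connections in a free-list and
-- hand them out on client connect, skipping hand-shake and auth
local pool = {}

function on_client_connect(client)            -- hypothetical hook
    client.server = table.remove(pool)        -- reuse an idle connection ...
        or server_connect()                   -- ... or open a fresh one
end

function on_client_disconnect(client)
    reset_state(client.server)                -- ROLLBACK, drop temp state, ...
    table.insert(pool, client.server)         -- back into the free-list
end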
Statement Load Balancing
Based on the connection pool we can multiplex multiple client connections over a single MySQL connection. As long as the client only uses stateless SQL statements, we can send the queries across any of the connections we have active in the pool. This has the neat benefit that we can maintain a lot more connections to the clients than we use to the server.
As soon as the client uses statements like:
SET @var = 1;
SHOW SESSION STATUS
FLUSH ...
LOCK ...
BEGIN
CREATE|DROP TEMPORARY TABLE ...
... and similar, the connection gets a state and we can't use it for balancing any more. As automatic detection of such statements is tricky and will fail anyway, we will add a command-line option that lets the user explicitly allow multiplexing of client connections.
We will maintain transactions and LOCKs, but all other statements should just not be used in multiplexing mode.
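The detection could start out as plain pattern matching against the statements named above. A sketch (is_pinned and the pattern list are script-local assumptions):

local stateful_patterns = {
    "^SET%s", "^SHOW%s+SESSION", "^FLUSH%s", "^LOCK%s", "^BEGIN",
    "^CREATE%s+TEMPORARY", "^DROP%s+TEMPORARY",
}

local is_pinned = false   -- set once this connection acquired state

function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end
    local query = packet:sub(2):upper()
    for _, pat in ipairs(stateful_patterns) do
        if query:match(pat) then
            is_pinned = true   -- stop multiplexing, stick to this backend
            break
        end
    end
end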
XA Transaction Manager
Not implemented yet. The proxy could change the way transactions are committed against a server. For HA reasons you might set up two masters which both get the same statements at the same time, without using the MySQL-internal replication.
By replacing the BEGIN with a XA START "trx_name" and the COMMIT with a XA END ..., XA PREPARE ... and XA COMMIT, we can use the two-phase-commit support of InnoDB to commit a transaction against two or more databases.
This gives the application synchronous replication for free.
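A sketch of the rewriting with the same assumed hooks; generating unique trx-names, fanning the statements out to the second master and error handling are all left out:

local trx_id = 0

function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end
    local query = packet:sub(2):upper()

    if query:match("^%s*BEGIN") then
        trx_id = trx_id + 1
        proxy.queries:append(1, string.char(proxy.COM_QUERY) ..
            ("XA START 'trx_%d'"):format(trx_id))
        return proxy.PROXY_SEND_QUERY
    elseif query:match("^%s*COMMIT") then
        for i, stmt in ipairs({ "XA END", "XA PREPARE", "XA COMMIT" }) do
            proxy.queries:append(i + 1, string.char(proxy.COM_QUERY) ..
                ("%s 'trx_%d'"):format(stmt, trx_id))
        end
        return proxy.PROXY_SEND_QUERY
    end
end

function read_query_result(inj)
    -- forward the OK of XA START (for BEGIN) and XA COMMIT (for COMMIT),
    -- hide the intermediate XA END / XA PREPARE results
    if inj.id == 2 or inj.id == 3 then
        return proxy.PROXY_IGNORE_RESULT
    end
end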
Change Management
The proxy can also operate in a reverse mode where it receives the statements as a replication client. This allows the proxy to track all statements which cause a change to the DB objects in the server.
For now it is filtering:
- CREATE|ALTER|DROP TABLE
- CREATE|DROP INDEX
- CREATE|ALTER|DROP DATABASE
including time-stamp and user-name (if available). Based on this information you can track changes to your DB objects, version the schema automatically and build a DDL-diff.
$ mysql-proxy --repclient-port=3306
For now you need an account:
CREATE USER repl@127.0.0.1;
GRANT SUPER ON *.* TO repl@127.0.0.1;
Username repl, no password. The password scrambling isn't implemented yet.
$ ./agent/src/network-mysqld --mode=reverse
reading binlog from: binlog-file: hostname-bin.000009, binlog-pos: 366
(DDL) thread-id = 2, exec_time = 0, error-code = 0000
QUERY: drop table t1
(DDL) thread-id = 2, exec_time = 0, error-code = 0000
QUERY: create table t1 ( id int not null )
(DDL) thread-id = 2, exec_time = 0, error-code = 0000
QUERY: alter table t1 add primary key (id)
Replication Off-Loading
Not implemented yet. Similar to a MySQL server with blackhole as the default storage engine, a proxy can forward the binlog to multiple MySQL slaves with a lot less load than a full MySQL server.