Async MySQL Queries with C-API

September 9th, 2008

I was wondering for quite a while if MySQLs C-API (libmysqlclient) can really do async queries and where it is lacking support.

Scanning through the source (sql/client.c) you see:

int mysql_real_query(MYSQL *mysql, const char *query, ulong length)
{

  if (mysql_send_query(mysql,query,length))
    return 1;

  return((int) (*mysql->methods->read_query_result)(mysql));
}

Digging a bit deeper you discover that mysql_send_query() and mysql_read_query_result() are public, but undocumented. Well, but that doesn't stop us.

When I talk about async queries I mean:

  • being able to issue several queries in parallel without using threads

That is usually what you need in event-driven applications and allows you to create parallel application without having to use too many system resources (memory, threads, ...).

As long as you use mysql_real_query() you can't get there. It waits until the resultset is received and you can not do anything in this application until the result-set is received. The usual way is "create a thread, do the work in the other thread". In the event-driven model you send your query to the server and ask the kernel (with select(), poll(), ...) to tell you when the server has something for you. In the meantime to you do something with the application (like send more queries) and check from time to time if something was returned from the server.

As an example I would expect a output like below:

async-clib.c:53: connected m->net.fd = 9
async-clib.c:53: connected m->net.fd = 10
async-clib.c:53: connected m->net.fd = 11
async-clib.c:53: connected m->net.fd = 14
async-clib.c:73: sending 'SELECT SLEEP(0.1 * 4)' on m->net.fd = 9
async-clib.c:73: sending 'SELECT SLEEP(0.1 * 3)' on m->net.fd = 10
async-clib.c:73: sending 'SELECT SLEEP(0.1 * 2)' on m->net.fd = 11
async-clib.c:73: sending 'SELECT SLEEP(0.1 * 1)' on m->net.fd = 14
async-clib.c:15: a result for m->net.fd = 14
async-clib.c:15: a result for m->net.fd = 11
async-clib.c:15: a result for m->net.fd = 10
async-clib.c:15: a result for m->net.fd = 9
  • 4 connections are opened
  • each sends a query with a different cost
  • we read the result back as it returns

The Code

 #include <string.h>
 #include <mysql.h>
 #include <glib.h>
 #include <event.h>

We want to use glib2 for some data structures (matter of taste) and libevent for the event-handling to make it portable (also a matter of taste).

To get everything going, we init libevent and setup some structures.

int main(int argc, char **argv) {
    GPtrArray *connections;
    GPtrArray *events;
    int i;
    struct event_base *evbase;

    evbase = event_base_new();

    connections = g_ptr_array_new();
    events = g_ptr_array_new();

... and start up the connections

    /* init all the connections */
    for (i = 0; i < 4; i++) {
            MYSQL *m;

            m = mysql_init(NULL);
            m->free_me = 1;

            if (!mysql_real_connect(m, "127.0.0.1", "root", "password", NULL, 3306, NULL, 0)) {
                    g_error("%s: %s (%d)",
                                    G_STRLOC,
                                    mysql_error(m),
                                    mysql_errno(m));
            }

            g_print("%s: connected m->net.fd = %d\n", G_STRLOC, m->net.fd);
            g_ptr_array_add(connections, m);
    }

... all are connected now and we can actually run queries on them:

    /* send a query to all connections
     *
     * the proof that it works, we give each of the query decreasing cost:
     * - 1st connection waits 4ms
     * - 2nd connection waits 3ms ...
     *
     * we should see that the queries return in reverse order
     */
    for (i = 0; i < connections->len; i++) {
            MYSQL *m = connections->pdata[i];
            struct event *ev;
            gchar *q = g_strdup_printf("SELECT SLEEP(0.1 * %d)", (connections->len - i));

            mysql_send_query(m, q, strlen(q));

            /* register the fd for this fd */
            ev = g_new0(struct event, 1);
            event_set(ev, m->net.fd, EV_READ, read_result, m);
            event_base_set(evbase, ev);

            event_add(ev, NULL); /* poke us when the server sends us a resultset */
            g_ptr_array_add(events, ev);

            g_print("%s: sending '%s' on m->net.fd = %d\n",
                            G_STRLOC,
                            q,
                            m->net.fd);

            g_free(q);
    }

... we call mysql_send_query() to send the query to the server and register the event in the event-pool. In case we can read something from the mysql-socket we want libevent to call our hook.

Everything is setup nicely. Let's wait until the server sends something back:

    /* the event-loop */
    while (0 == event_base_loop(evbase, 0)); /* event_base_loop() returns 1 no events are registered */

If all queries are handled, we just leave the loop, cleanup and exit:

    /* cleanup */
    for (i = 0; i < connections->len; i++) mysql_close(g_ptr_array_index(connections, i));
    g_ptr_array_free(connections, TRUE);

    for (i = 0; i < events->len; i++) g_free(g_ptr_array_index(events, i));
    g_ptr_array_free(events, TRUE);

    event_base_free(evbase);

    return 0;
}

Our callback now only has to read the query result and do something with it. As a dummy we store it and free it right away.

/**
 * handle the result-sets we receive from the server
 */
static void read_result(int fd, short event, void *_userdata) {
    MYSQL *m = _userdata;

    if (0 == mysql_read_query_result(m)) {
            MYSQL_RES *res = mysql_store_result(m);

            g_print("%s: a result for m->net.fd = %d\n",
                            G_STRLOC,
                            m->net.fd);

            mysql_free_result(res);
    } else {
            g_print("%s: mysql_read_query_result() failed for fd = %d: %s (%d)\n",
                            G_STRLOC,
                            m->net.fd,
                            mysql_error(m),
                            mysql_errno(m));
    }
}

That's it. All the pieces are there to create a nice little benchmarking tool that can create several thousand connections in parallel without eating too many resources.

For 10 connections the RSS is 1949k, for 100 connections it is 3212k.

Conclusion

Async Queries are possible with the mysqlclient lib and they are very efficient to write something like a load-generator.

But the API isn't complete at all. Several pieces are missing to make it fully async:

  • mysql_connect() is blocking only
  • mysql_send_query() doesn't seem to handle EAGAIN (send buffer is full)
  • I don't know if mysql_use_result() can be return EAGAIN easily

Even without the above available in the full non-blocking way, you can also get pretty far with mysql_send_query().

6 Responses to “Async MySQL Queries with C-API”

  1. Eric Day Says:
    Hi Jan! I've looked into hacking async I/O into the client as well, but decided to start from scratch for a few different reasons (unnecessary layers like vio, poor buffer management, the ability to use callbacks, ...). The new library is actually part of the drizzle project, but will be fully backwards compatible with recent MySQL versions as well. If the protocol ends up changing for drizzle, the library will still support old versions as well. I'm hoping to have something to test with in a couple weeks!
  2. Roland Bouman Says:
    heh, great post! now, the obvious question is: what are the chances you are going to hack this into mysql proxy? :p Roland
  3. Jesse Farmer Says:
    This is exactly how how MySQLPlus works: http://github.com/oldmoe/mysqlplus/tree/master See http://github.com/oldmoe/mysqlplus/tree/master/ext/mysql.c specifically.
  4. Jan Kneschke Says:
    Roland, the proxy is fully async already. What we could do is putting a API wrapper on top of the code and expose it as another libmysqlclient.
  5. Anonymous Coward Says:
    Let's not forget the fact that mysql_read_query_result might block while reading the response. The only advantage here is that it's not necessary to wait for a response from the server when the query is sent.
  6. roger Says:
    Yeah the benefit this gives is that you don't have to 'block' while waiting for the response to return. It still might block [as anonymous mentioned] while the results are being read in. Any thoughts on how to avoid this? Currently the only thing I do [in Ruby anyway] is set query_with_result to false and that then allows me to parse the results as they are read in off the wire, one at a time [the default is store_result, which reads them all in THEN allows you to parse them]. The parsing and conversion to Ruby objects takes longer than the IO wait, anyway, so you end up not wasting any cpu time, but...it's still not evented. Thoughts? -=R

Sorry, comments are closed for this article.