A binlog storage engine was talked about many times already. Being able to JOIN against the binlogs, the just SELECT it, search for entries … is not possible with current SQL commands in the MySQL Server.
As you have read the previous articles here, I wrote a binlog encoder/decoder for MySQL Proxy and wrapped with with a Lua interface. Some years ago I also wrote a scriptable storage engine which can call lua scripts to return data for SELECTs. Putting the two together gives us: a Binlog Storage Engine.
In the datadir we just create
$ ls -la test/binlog_*.*
-rw-r--r-- 1 jan staff 1342 4 Mai 12:30 test/binlog_000016.lua
mysql> select * from test.binlog_000016;
Query OK, 0 rows affected (0.00 sec)
+----------------------------------+------------+----------------+
| type | event_size | next_event_pos |
+----------------------------------+------------+----------------+
| FORMAT_DESCRIPTION_EVENT | 102 | 106 |
| QUERY_EVENT | 71 | 177 |
| QUERY_EVENT | 71 | 248 |
That binlog_000016 table is a .lua script that the scriptable storage engine can load:
local binlog = assert(require("mysql.binlog"))
local binlog_table = { }
-- Lua does OOP and and inheritance ...
function binlog_table:new(o)
o = o or {}
setmetatable(o, self)
self.__index = self
return o
end
As we don’t want to mess with CREATE TABLE statements, we just create the .frm files on the fly by providing a discover() function.
-- return a the table structure
function binlog_table:discover()
return {
{ name = "type", type = 254, length = 32 },
{ name = "event_size", type = 3, length = 32 },
{ name = "next_event_pos", type = 3, length = 32 }
}
end
After SELECTing from the table the first time the mysql-server will create the .frm files for us automaticly:
$ ls -la test/binlog_*.*
-rw-rw---- 1 jan staff 8652 4 Mai 12:29 test/binlog_000016.frm
-rw-r--r-- 1 jan staff 1342 4 Mai 12:30 test/binlog_000016.lua
The most basic command is get row for a table-scan. Our :next() method sets up the iterator and returns the iterator-function that is called until it returns nil.
-- returns a iterator
function binlog_table:next()
-- set the iterator
self.f:seek(4)
self.f_iter = self.f:next()
if not self.f_iter then return nil end
-- return the iterator function
return function ()
self.event = self.f_iter()
if not self.event then return nil end
return { self.event.type, self.event.event_size, self.event.log_pos }
end
end
To create our binlog table we just inherit from that binlog_table class and return it:
-- this table is a binlog-table
local tbl = binlog_table:new({ })
function tbl:open()
self.f = binlog.open("/Users/jan/sandboxes/msb_5_1_35/data/msandbox.000016")
return (self.f ~= nil)
end
return tbl
The last piece is the storage engine plugin:
$ mysql <<EOF
FLUSH TABLES;
UNINSTALL PLUGIN lua;
EOF
$ make install
$ mysql << EOF
INSTALL PLUGIN lua SONAME 'libwormhole_engine.so';
EOF
If everything works out nicely you should see:
$ mysql
mysql> show plugins;
+------------+--------+----------------+-----------------------+---------+
| Name | Status | Type | Library | License |
+------------+--------+----------------+-----------------------+---------+
...
| LUA | ACTIVE | STORAGE ENGINE | libwormhole_engine.so | GPL |
...
Update
Properly normalized you can now decode the different events by just JOINing the ..._query table based on the event_pos
> select * from test.test_query AS q join test.test AS e using (event_pos) where e.type = "QUERY_EVENT" and event_pos = 106;
+-----------+-------------+-----------+--------------+-----------+-------------+------------+-----------------+------------+
| event_pos | type | thread_id | query | server_id | type | event_size | next_event_posH | timestamp |
+-----------+-------------+-----------+--------------+-----------+-------------+------------+-----------------+------------+
| 106 | QUERY_EVENT | 2 | FLUSH TABLES | 1 | QUERY_EVENT | 71 | 177 | 1241432054 |
+-----------+-------------+-----------+--------------+-----------+-------------+------------+-----------------+------------+
And that is supposed to be fast as event_pos is our PRIMARY KEY:
> explain select * from test.test_query AS q join test.test AS e using (event_pos) where e.type = "QUERY_EVENT" and event_pos = 106;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | q | ref | PRIMARY | PRIMARY | 5 | const | 10 | Using where |
| 1 | SIMPLE | e | ref | PRIMARY | PRIMARY | 5 | const | 10 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
On the implementation side the ..._query table just inherits from the base-table and overwrites the discover (for the new table-structure) and the get_row (to return the corresponding information) methods. Everything else stays the same.
local binlog_table = assert(require("wormhole.binlog_table"))
-- this table is a binlog-table
local tbl = binlog_table:new()
function tbl:discover()
return {
{ name = "event_pos", type = 3, is_nullable = false, is_primary = true },
{ name = "type", type = 254, length = 64 * 3, is_nullable = false },
{ name = "thread_id", type = 3, is_nullable = false },
{ name = "query", type = 254, length = 256, is_nullable = false },
}
end
function tbl:get_row()
local e = self.event
if e.type == "QUERY_EVENT" then
return { e.log_pos - e.event_size,
e.type,
e.query.thread_id,
e.query.query,
}
else
return { e.log_pos - e.event_size,
e.type,
nil,
nil
}
end
end
function tbl:open()
self.f = binlog.open("/Users/jan/sandboxes/msb_5_1_35/data/msandbox.000016")
return (self.f ~= nil)
end
return tbl