A binlog storage engine has been talked about many times already. Being able to JOIN against the binlogs, or just SELECT from them to search for entries, is not possible with the current SQL commands in the MySQL Server.
As you have read in the previous articles here, I wrote a binlog encoder/decoder for MySQL Proxy and wrapped it with a Lua interface. Some years ago I also wrote a scriptable storage engine which can call Lua scripts to return data for SELECTs. Putting the two together gives us: a Binlog Storage Engine.
In the datadir we just create a .lua script:
$ ls -la test/binlog_*.*
-rw-r--r-- 1 jan staff 1342 4 Mai 12:30 test/binlog_000016.lua
mysql> select * from test.binlog_000016;
+----------------------------------+------------+----------------+
| type                             | event_size | next_event_pos |
+----------------------------------+------------+----------------+
| FORMAT_DESCRIPTION_EVENT         |        102 |            106 |
| QUERY_EVENT                      |         71 |            177 |
| QUERY_EVENT                      |         71 |            248 |
+----------------------------------+------------+----------------+
That binlog_000016 table is a .lua script that the scriptable storage engine can load:
local binlog = assert(require("mysql.binlog"))

local binlog_table = { }

-- Lua does OOP and inheritance ...
function binlog_table:new(o)
    o = o or {}
    setmetatable(o, self)
    self.__index = self
    return o
end
As we don't want to mess with CREATE TABLE statements, we just create the .frm files on the fly by providing a discover() function.
-- return the table structure
function binlog_table:discover()
    return {
        { name = "type",           type = 254, length = 32 },
        { name = "event_size",     type = 3,   length = 32 },
        { name = "next_event_pos", type = 3,   length = 32 }
    }
end
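The numeric type values are MySQL's field-type codes from enum_field_types (mysql_com.h): 3 is MYSQL_TYPE_LONG and 254 is MYSQL_TYPE_STRING. A sketch with named constants (the constant names are mine, the values are MySQL's) makes the intent clearer:

```lua
-- MySQL field-type codes from enum_field_types (mysql_com.h);
-- the constant names are mine, the values are MySQL's.
local MYSQL_TYPE_LONG   = 3    -- 32-bit INT
local MYSQL_TYPE_STRING = 254  -- CHAR-style string column

local function discover()
    return {
        { name = "type",           type = MYSQL_TYPE_STRING, length = 32 },
        { name = "event_size",     type = MYSQL_TYPE_LONG,   length = 32 },
        { name = "next_event_pos", type = MYSQL_TYPE_LONG,   length = 32 }
    }
end

local cols = discover()  -- same structure as binlog_table:discover() above
```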
After SELECTing from the table for the first time, the mysql-server will create the .frm files for us automatically:
$ ls -la test/binlog_*.*
-rw-rw---- 1 jan staff 8652 4 Mai 12:29 test/binlog_000016.frm
-rw-r--r-- 1 jan staff 1342 4 Mai 12:30 test/binlog_000016.lua
The most basic command is get row for a table-scan. Our :next() method sets up the iterator and returns the iterator-function that is called until it returns nil.
-- returns an iterator
function binlog_table:next()
    -- set up the iterator
    self.f:seek(4)
    self.f_iter = self.f:next()
    if not self.f_iter then return nil end

    -- return the iterator function
    return function ()
        self.event = self.f_iter()
        if not self.event then return nil end

        return { self.event.type, self.event.event_size, self.event.log_pos }
    end
end
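To see the table-scan contract in action without a real binlog, here is a sketch with a mocked file handle; the :seek()/:next() interface and the sample events are assumptions standing in for what mysql.binlog provides:

```lua
-- Mocked binlog file handle; the :seek()/:next() interface is assumed
-- to mirror mysql.binlog, and the events are made-up sample data.
local events = {
    { type = "FORMAT_DESCRIPTION_EVENT", event_size = 102, log_pos = 106 },
    { type = "QUERY_EVENT",              event_size =  71, log_pos = 177 },
}

local mock_f = {
    seek = function (self, pos) self.pos = pos end,
    next = function (self)
        local i = 0
        return function ()
            i = i + 1
            return events[i]
        end
    end,
}

-- same shape as binlog_table:next() above
local scan = { f = mock_f }
function scan:next()
    self.f:seek(4)
    self.f_iter = self.f:next()
    if not self.f_iter then return nil end
    return function ()
        self.event = self.f_iter()
        if not self.event then return nil end
        return { self.event.type, self.event.event_size, self.event.log_pos }
    end
end

-- the table scan: call the iterator function until it returns nil
local rows = {}
for row in scan:next() do rows[#rows + 1] = row end
```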
To create our binlog table we just inherit from that binlog_table class and return it:
-- this table is a binlog-table
local tbl = binlog_table:new({ })

function tbl:open()
    self.f = binlog.open("/Users/jan/sandboxes/msb_5_1_35/data/msandbox.000016")
    return (self.f ~= nil)
end

return tbl
The last piece is installing the storage engine plugin:
$ mysql <<EOF
FLUSH TABLES;
UNINSTALL PLUGIN lua;
EOF
$ make install
$ mysql << EOF
INSTALL PLUGIN lua SONAME 'libwormhole_engine.so';
EOF
If everything works out nicely you should see:
$ mysql
mysql> show plugins;
+------------+--------+----------------+-----------------------+---------+
| Name | Status | Type | Library | License |
+------------+--------+----------------+-----------------------+---------+
...
| LUA | ACTIVE | STORAGE ENGINE | libwormhole_engine.so | GPL |
...
Update
Properly normalized, you can now decode the different events by just JOINing the ..._query table based on the event_pos:
> select * from test.test_query AS q join test.test AS e using (event_pos) where e.type = "QUERY_EVENT" and event_pos = 106;
+-----------+-------------+-----------+--------------+-----------+-------------+------------+-----------------+------------+
| event_pos | type        | thread_id | query        | server_id | type        | event_size | next_event_pos  | timestamp  |
+-----------+-------------+-----------+--------------+-----------+-------------+------------+-----------------+------------+
|       106 | QUERY_EVENT |         2 | FLUSH TABLES |         1 | QUERY_EVENT |         71 |             177 | 1241432054 |
+-----------+-------------+-----------+--------------+-----------+-------------+------------+-----------------+------------+
And that is supposed to be fast, as event_pos is our PRIMARY KEY:
> explain select * from test.test_query AS q join test.test AS e using (event_pos) where e.type = "QUERY_EVENT" and event_pos = 106;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | q | ref | PRIMARY | PRIMARY | 5 | const | 10 | Using where |
| 1 | SIMPLE | e | ref | PRIMARY | PRIMARY | 5 | const | 10 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
On the implementation side, the ..._query table just inherits from the base-table and overrides the discover (for the new table-structure) and the get_row (to return the corresponding information) methods. Everything else stays the same.
local binlog_table = assert(require("wormhole.binlog_table"))
local binlog       = assert(require("mysql.binlog"))

-- this table is a binlog-table
local tbl = binlog_table:new()

function tbl:discover()
    return {
        { name = "event_pos", type = 3,   is_nullable = false, is_primary = true },
        { name = "type",      type = 254, length = 64 * 3, is_nullable = false },
        { name = "thread_id", type = 3,   is_nullable = false },
        { name = "query",     type = 254, length = 256, is_nullable = false },
    }
end

function tbl:get_row()
    local e = self.event

    if e.type == "QUERY_EVENT" then
        return { e.log_pos - e.event_size,
            e.type,
            e.query.thread_id,
            e.query.query,
        }
    else
        return { e.log_pos - e.event_size,
            e.type,
            nil,
            nil
        }
    end
end

function tbl:open()
    self.f = binlog.open("/Users/jan/sandboxes/msb_5_1_35/data/msandbox.000016")
    return (self.f ~= nil)
end

return tbl
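One detail worth spelling out: a binlog event header carries log_pos, the position of the next event, so get_row() derives the event's own start position by subtracting its size. A quick worked check against the QUERY_EVENT from the output above:

```lua
-- log_pos points at the *next* event; subtracting event_size yields
-- the start of the current one (sample values from the output above).
local e = { type = "QUERY_EVENT", event_size = 71, log_pos = 177 }
local event_pos = e.log_pos - e.event_size  -- 177 - 71 = 106
```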