In moving a relatively large table from MySQL to Redis,
you may find that extracting, transforming and loading a row at a time
can be excruciatingly slow. Here’s a quick trick you can use that pipes the output
of the mysql command directly to redis-cli, bypassing middleware and
allowing both data stores to operate at their peak speed.
Using this technique I was able to reduce my load time of ~8 million rows from 90 minutes down to two. Shabby? I think not.
Redis Protocol Output from MySQL
redis-cli has a mass-insert mode, specially
designed to execute bulk commands as quickly as possible. Input is expected to
be formatted in raw redis protocol,
which is simple enough to be generated from MySQL SELECT queries. Here we go!
My stats table:
CREATE TABLE events_all_time (
id int(11) unsigned NOT NULL AUTO_INCREMENT,
action varchar(255) NOT NULL,
count int(11) NOT NULL DEFAULT 0,
PRIMARY KEY (id),
UNIQUE KEY uniq_action (action)
);
Desired Redis command to be run for each row:
HSET events_all_time [action] [count]
SQL statement to produce Redis-protocol-formatted output:
-- events_to_redis.sql
SELECT CONCAT(
"*4\r\n",
'$', LENGTH(redis_cmd), '\r\n',
redis_cmd, '\r\n',
'$', LENGTH(redis_key), '\r\n',
redis_key, '\r\n',
'$', LENGTH(hkey), '\r\n',
hkey, '\r\n',
'$', LENGTH(hval), '\r\n',
hval, '\r'
)
FROM (
SELECT
'HSET' as redis_cmd,
'events_all_time' AS redis_key,
action AS hkey,
count AS hval
FROM events_all_time
) AS t
Tying it all together:
mysql stats_db --skip-column-names --raw < events_to_redis.sql | redis-cli --pipe
It’s important to use the --raw switch so that MySQL doesn’t rewrite the
crucial newline characters, and --skip-column-names keeps the formatted output
clear of column headers.