mk-parallel-restore - Load files into MySQL in parallel.
mk-parallel-restore /path/to/files mk-parallel-restore --tab /path/to/files
Do not rely on mk-parallel-restore for your backups unless you have tested it. You have been warned.
mk-parallel-restore is a way to load SQL or delimited-file dumps into MySQL
in parallel at high speed. It is especially designed for restoring files
dumped by mk-parallel-dump. It automatically
detects whether a file contains SQL or delimited data from the filename
extension, and either shells out to mysql or executes LOAD DATA INFILE
with the file. On UNIX-like systems, it will even make a FIFO to decompress
gzipped files for LOAD DATA INFILE.
By default it discovers all files in the directory you specify on the command line. It uses the file's parent directory as the database name and the file's name (up to the first dot) as the table name. It can deal with files named like the following:
dir/tbl.sql dir/tbl.txt dir/tbl.csv dir/tbl.trg
It is also happy with files that look like this, where EXT is one of the
extensions just listed.
dir/tbl.EXT.000 dir/tbl.EXT.000.gz
By default, it loads SQL files first, if they exist, then loads CSV or
TXT files next, in order of the numbers in the filename extension as just
shown. This makes it easy for you to reload a table's definition followed by
its data, in case you dumped them into separate files (as happens with
mysqldump's --tab option). It loads TRG files, which create
triggers, last. Creating the triggers before loading data might keep the data
from being restored correctly. Files that are named 00_views.sql are loaded
even later, after all the parallel restores are finished; dependencies among
views and tables make them hard to restore one at a time. See
mk-parallel-dump for details on how data is dumped.
Exit status is 0 if everything went well, 1 if any files failed, and any other value indicates an internal error.
Output depends on verbosity. When --dry-run is given, output includes commands that would be executed.
When --verbose is 0, there is normally no output unless there's an error.
When --verbose is 1, there is one line of output for the entire job, showing how many tables were processed, how many files were loaded with what status, how much time elapsed, and how much time the parallel load jobs added up to. If any files were skipped, the filenames are printed to the output.
When --verbose is 2, there's one line of output per table, showing extra data such as how many threads were running when each table finished loading:
DATABASE TABLE FILES TIME STATUS THREADS sakila language 2 0.07 0 2 sakila film_actor 2 0.07 0 2 sakila actor 2 0.06 0 2 sakila payment 2 0.07 0 2 sakila transport_backup 2 0.05 0 2 sakila country 2 0.08 0 2 sakila film 2 0.05 0 2 sakila rental 2 0.07 0 2
User-contributed benchmarks are welcome. See http://www.paragon-cs.com/wordpress/?p=52 for one user's experiences.
You can download Maatkit from Google Code at http://code.google.com/p/maatkit/, or you can get any of the tools easily with a command like the following:
wget http://www.maatkit.org/get/toolname or wget http://www.maatkit.org/trunk/toolname
Where toolname can be replaced with the name (or fragment of a name) of any
of the Maatkit tools. Once downloaded, they're ready to run; no installation is
needed. The first URL gets the latest released version of the tool, and the
second gets the latest trunk code from Subversion.
default: yes
Treat chunks as atomic when resuming restore.
By default mk-parallel-restore resumes restoration from the first chunk that
is missing all its rows. For dumps of transactionally-safe tables (InnoDB),
it cannot happen that a chunk is only partially restored. Therefore, restoring
from the first missing chunk is safe.
However, for dumps of non-transactionally safe tables, it is possible that a
chunk can be only partially restored. In such cases, the chunk will wrongly
appear to be fully restored. Therefore, you must specify --no-atomic-resume
so that the partially restored chunk is fully restored.
type: string
Directory where FIFO files will be created.
default: yes
Restore the biggest tables first for highest concurrency.
default: yes
Enable binary logging (SET SQL_LOG_BIN=1).
Restore operations are replicated by default (SQL_LOG_BIN=1) except for --tab restores which are not replicated by default (SQL_LOG_BIN=0). This prevents large loads from being logged to the server's binary log.
The value given on the command line overrides the defaults. Therefore,
specifying --bin-log with --tab will allow the --tab restore
to replicate.
type: int
Set bulk_insert_buffer_size before each LOAD DATA INFILE.
Has no effect without --tab.
short form: -A; type: string; default: BINARY
Sets the connection, database, and LOAD DATA INFILE character set.
The default is BINARY, which is the safest value to use for LOAD DATA
INFILE. Has no effect without --tab.
LOAD DATA INFILE.
type: Array
Read this comma-separated list of config files; if specified, this must be the first option on the command line.
Files are in CSV format (implies --tab).
Changes --tab options so the following LOAD DATA INFILE statement is used:
LOAD DATA INFILE <filename> INTO TABLE <table> FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
short form: -D; type: string
Load all files into this database.
Overrides the database which is normally specified by the directory in which the files live. Does not specify a default database for the connection.
short form: -d; type: hash
Restore only this comma-separated list of databases.
type: string
Restore only databases whose names match this regex.
type: string; default: gzip -d -c
Command used to decompress and print .gz files to STDOUT (like zcat).
short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.
ALTER TABLE DISABLE KEYS before each table.
default: yes
Stream files into a FIFO for --tab.
Load compressed tab-separated files by piping them into a FIFO and using the
FIFO with LOAD DATA INFILE, instead of by decompressing the files on disk.
Sets --umask to 0.
default: yes
Set FOREIGN_KEY_CHECKS=1 before LOAD DATA INFILE.
short form: -h; type: string
Connect to host.
IGNORE modifier to LOAD DATA INFILE.
type: Hash
Ignore this comma-separated list of databases.
type: Hash
Ignore this comma-separated list of table names.
Table names may be qualified with the database name.
Uses the LOCAL option to LOAD DATA INFILE.
If you enable this option, the files are read locally by the client library, not by the server.
LOAD DATA INFILE.
NO_AUTO_VALUE_ON_ZERO before LOAD DATA INFILE.
short form: -p; type: string
Password to use when connecting.
short form: -P; type: int
Port number to use for connection.
Display progress messages.
Progress is displayed each time a table finishes loading. Progress is calculated by measuring the size of each file to be loaded, and assuming all bytes are created equal. The output is the completed and total size, the percent completed, estimated time remaining, and estimated completion time.
short form: -q
Sets --verbose to 0.
REPLACE modifier to LOAD DATA INFILE.
default: yes
Resume the restore from a previously incomplete restore.
By default, mk-parallel-restore checks each table's chunks for existing
rows and restores only from the point where a previous restore stopped.
Specify --no-resume to disable restore resumption and fully restores every
table.
Restore resumption does not work with tab-separated files or dumps that were not chunked.
type: string; default: wait_timeout=10000
Set these MySQL variables. Immediately after connecting to MySQL, this string will be appended to SET and executed.
short form: -S; type: string
Socket file to use for connection.
Load tab-separated files with LOAD DATA INFILE.
This is similar to what mysqlimport does, but more flexible.
The following options are enabled unless they are specifically disabled on the command line:
L<--commit> L<--[no]disable-keys> L<--[no]no-auto-value-on-0>
And the following options are disabled (--no-bin-log, etc.) unless they
are specifically enabled on the command line:
L<--[no]bin-log> L<--[no]unique-checks> L<--[no]foreign-key-checks>
short form: -t; type: hash
Restore only this comma-separated list of table names.
Table names may be qualified with the database name.
type: string
Restore only tables whose names match this regex.
type: int; default: 2
Specifies the number of parallel processes to run.
The default is 2 (this is mk-parallel-restore after all -- 1 is not parallel). On GNU/Linux machines, the default is the number of times 'processor' appears in /proc/cpuinfo. On Windows, the default is read from the environment. In any case, the default is at least 2, even when there's only a single processor.
Run TRUNCATE TABLE before LOAD DATA INFILE.
This will delete all rows from a table before loading the first tab-delimited file into it.
type: string
Set the program's umask to this octal value.
This is useful when you want created files (such as FIFO files) to be readable or writable by other users (for example, the MySQL server itself).
default: yes
Set UNIQUE_CHECKS=1 before LOAD DATA INFILE.
short form: -u; type: string
User for login if not current user.
short form: -v; cumulative: yes; default: 1
Verbosity; can specify multiple times.
Repeatedly specifying it increments the verbosity. Default is 1 if not specified. See "OUTPUT".
short form: -w; type: time; default: 5m
Wait limit when server is down.
If the MySQL server crashes during loading, waits until the server comes back
and then continues with the rest of the files. mk-parallel-restore will
check the server every second until this time is exhausted, at which point it
will give up and exit.
The environment variable MKDEBUG enables verbose debugging output in all of
the Maatkit tools:
MKDEBUG=1 mk-....
You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.
Please use Google Code Issues and Groups to report bugs or request support: http://code.google.com/p/maatkit/. You can also join #maatkit on Freenode to discuss Maatkit.
Please include the complete command-line used to reproduce the problem you are
seeing, the version of all MySQL servers involved, the complete output of the
tool when run with --version, and if possible, debugging output produced by
running with the MKDEBUG=1 environment variable.
This program is copyright 2007-2009 Baron Schwartz. Feedback and improvements are welcome.
THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; OR the Perl Artistic License. On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic' to read these licenses.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
Baron Schwartz.
See also mk-parallel-dump.
This manual page documents Ver 1.0.15 Distrib 4047 $Revision: 4045 $.