[摘要] 因为机器突然断电,mysql innodb crash 了。数据文件损坏导致mysql进程起不来,几经折腾,终于把服务起来了,但是解决过程并不完美,下面还是把处理经过记录一下。
Mysql版本:5.6.36,
操作系统:CentOS release 6.8
先贴出来mysql.log 日志的报错情况:
information that should help you find out what is causing the crash. 2018-01-05 08:19:44 13811 [Note] Plugin 'FEDERATED' is disabled. 2018-01-05 08:19:44 13811 [Note] InnoDB: Using atomics to ref count buffer pool pages 2018-01-05 08:19:44 13811 [Note] InnoDB: The InnoDB memory heap is disabled 2018-01-05 08:19:44 13811 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-05 08:19:44 13811 [Note] InnoDB: Memory barrier is not used 2018-01-05 08:19:44 13811 [Note] InnoDB: Compressed tables use zlib 1.2.3 2018-01-05 08:19:44 13811 [Note] InnoDB: Using Linux native AIO 2018-01-05 08:19:44 13811 [Note] InnoDB: Using CPU crc32 instructions 2018-01-05 08:19:44 13811 [Note] InnoDB: Initializing buffer pool, size = 8.0G 2018-01-05 08:19:44 13811 [Note] InnoDB: Completed initialization of buffer pool 2018-01-05 08:19:44 13811 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-05 08:19:44 13811 [Note] InnoDB: Log scan progressed past the checkpoint lsn 229983134766 2018-01-05 08:19:44 13811 [Note] InnoDB: Database was not shutdown normally! 2018-01-05 08:19:44 13811 [Note] InnoDB: Starting crash recovery. 2018-01-05 08:19:44 13811 [Note] InnoDB: Reading tablespace information from the .ibd files... 2018-01-05 08:19:44 13811 [Note] InnoDB: Restoring possible half-written data pages 2018-01-05 08:19:44 13811 [Note] InnoDB: from the doublewrite buffer... InnoDB: Doing recovery: scanned up to log sequence number 229983251745 InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 691. InnoDB: You may have to recover from a backup. 2018-01-05 08:19:45 7f3dd3ed1720 InnoDB: Page dump in ascii and hex (16384 bytes): InnoDB: End of page dump 2018-01-05 08:19:45 7f3dd3ed1720 InnoDB: uncompressed page, stored checksum in field1 1575536212, calculated checksums for field1: crc32 1100691290, innodb 1575536212, none 3735928559, stored checksum in field2 2238042318, calculated checksums for field2: crc32 1100691290, innodb 3402961953, none 3735928559, page LSN 53 2349971314, low 4 bytes of LSN at page end 2349884494, page number (if stored to page already) 691, space id (if created with >= MySQL-4.1.1 and stored already) 0 InnoDB: Page may be an update undo log page InnoDB: Database page corruption on disk or a failed InnoDB: file read of page 691. InnoDB: You may have to recover from a backup. InnoDB: It is also possible that your operating InnoDB: system has corrupted its own file cache InnoDB: and rebooting your computer removes the InnoDB: error. InnoDB: If the corrupt page is an index page InnoDB: you can also try to fix the corruption InnoDB: by dumping, dropping, and reimporting InnoDB: the corrupt table. You can use CHECK InnoDB: TABLE to scan your table for corruption. InnoDB: See also http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. InnoDB: Ending processing because of a corrupt database page. 2018-01-05 08:19:45 7f3dd3ed1720 InnoDB: Assertion failure in thread 139903525263136 in file buf0buf.cc line 4295 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 00:19:45 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388608 read_buffer_size=131072 max_used_connections=0 max_threads=700 thread_count=0 connection_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 285949 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. 8388608 + 131072 * 700 8388608 + 91750400 = 97792 Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0 thread_stack 0x40000 /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8e5d85] /usr/sbin/mysqld(handle_fatal_signal+0x494)[0x6692b4] /lib64/libpthread.so.0[0x35ea00f790] /lib64/libc.so.6(gsignal+0x35)[0x35e9c32625] /lib64/libc.so.6(abort+0x175)[0x35e9c33e05] /usr/sbin/mysqld[0xa4fc37] /usr/sbin/mysqld[0xa65454] /usr/sbin/mysqld[0xa6589b] /usr/sbin/mysqld[0xa53851] /usr/sbin/mysqld[0xa1dadb] /usr/sbin/mysqld[0xa12a45] /usr/sbin/mysqld[0xa12dc6] /usr/sbin/mysqld[0xa1527e] /usr/sbin/mysqld[0x9ff480] /usr/sbin/mysqld[0x942b5d] /usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x5ac828] /usr/sbin/mysqld[0x6f3171] /usr/sbin/mysqld(_Z11plugin_initPiPPci+0xbb6)[0x6f6fd6] /usr/sbin/mysqld[0x59ea28] /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x42d)[0x5a3cdd] /lib64/libc.so.6(__libc_start_main+0xfd)[0x35e9c1ed5d] /usr/sbin/mysqld[0x595349] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.
根据http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html的提示,需要修复数据库和表。
由于mysql现在仍无法启动,所以需要增加如下:
[mysqld]
innodb_force_recovery = 4
关于innodb_force_recovery的几个值介绍:
As a safety measure, InnoDB prevents INSERT, UPDATE, or DELETE operations when innodb_force_recovery is greater than 0. 1 (SRV_FORCE_IGNORE_CORRUPT) Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM tbl_name jump over corrupt index records and pages, which helps in dumping tables. 2 (SRV_FORCE_NO_BACKGROUND) Prevents the master thread and any purge threads from running. If a crash would occur during the purge operation, this recovery value prevents it. 3 (SRV_FORCE_NO_TRX_UNDO) Does not run transaction rollbacks after crash recovery. 4 (SRV_FORCE_NO_IBUF_MERGE) Prevents insert buffer merge operations. If they would cause a crash, does not do them. Does not calculate table statistics. This value can permanently corrupt data files. After using this value, be prepared to drop and recreate all secondary indexes. 5 (SRV_FORCE_NO_UNDO_LOG_SCAN) Does not look at undo logs when starting the database: InnoDB treats even incomplete transactions as committed. This value can permanently corrupt data files. 6 (SRV_FORCE_NO_LOG_REDO) Does not do the redo log roll-forward in connection with recovery. This value can permanently corrupt data files. Leaves database pages in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures.
在my.cnf中添加完 innodb_force_recovery 配置后,mysql可以启动了,然后尝试修复所有的数据库:
mysqlcheck -u root -p --auto-repair --check --all-databases common_schema.metadata OK common_schema.numbers OK mysql.columns_priv OK mysql.db OK mysql.event OK mysql.func OK mysql.general_log OK mysql.help_category OK mysql.help_keyword OK
全部修复完成,去掉innodb_force_recovery,再重启mysql,仍然报错。
使用xtrabackup的操作中发现了这样一个提示:
xtrabackup: Error: failed to read page after 10 retries. File ./ibdata1 seems to be corrupted.
于是尝试在 innodb_force_recovery = 4 的情况下,使用xtrabackup备份一下数据库,然后重建mysql。
这个过程中 innobackupex 备份是成功的,重建mysql也是成功的,重新apply-log和copy-back也是成功,但是恢复之后的数据库仍然无法启动,跟之前的情况类似,已经很无语了。
无奈之下选择使用了innobackupex最近的一次备份进行恢复,完成后重启mysql服务成功!
但是问题是最近的备份距现在已经过去了20个小时,innodb crash 这段时间的更新会丢失,由于在innodb_force_recovery = 4的状态下,mysql是可以启动的,所以在原来的数据基础上启动了新的端口:3307,然后手动比较3307和3306两套数据的变化!
唉,太挫了~