GTID情形中手动修复主从故障生机勃勃例

 

Preface

 

    In my last test of pt-heartbeat,both of master and slave were out of disk.And the mysql client was hang.In order to resolve the issue,I've tryed to fix the replicaiton environment without using mysqldump to reconfigure the slave.Let's see the details.

 

Procedure

 

I dropped test tables in database "sysbench" to release the disk space on master.

 1 [root@zlm2 08:56:51 /data/mysql/mysql3306/logs]
 2 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=1 --table-size=10000000 --mysql-storage-engine=innodb cleanup
 3 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
 4 
 5 Dropping table 'sbtest1'...
 6 
 7 (zlm@192.168.1.101 3306)[(none)]>use sysbench;
 8 Reading table information for completion of table and column names
 9 You can turn off this feature to get a quicker startup with -A
10 
11 Database changed
12 (zlm@192.168.1.101 3306)[sysbench]>show tables;
13  -------------------- 
14 | Tables_in_sysbench |
15  -------------------- 
16 | hb                 |
17 | sbtest2            |
18 | sbtest3            |
19 | sbtest4            |
20 | sbtest5            |
21  -------------------- 
22 5 rows in set (0.00 sec)
23 
24 //Only sbtest1 was deleted.It's not enough.
25 
26 [root@zlm2 08:59:04 ~/sysbench-1.0/src/lua]
27 #sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port=3306 --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables=5 --table-size=10000000 --mysql-storage-engine=innodb cleanup
28 sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
29 
30 Dropping table 'sbtest1'...
31 Dropping table 'sbtest2'...
32 Dropping table 'sbtest3'...
33 Dropping table 'sbtest4'...
34 Dropping table 'sbtest5'...
35 
36 [root@zlm2 08:59:09 ~/sysbench-1.0/src/lua]
37 #df -h
38 Filesystem               Size  Used Avail Use% Mounted on
39 /dev/mapper/centos-root  8.4G  6.9G  1.5G  83% / //I'd got 27% free space.
40 devtmpfs                 488M     0  488M   0% /dev
41 tmpfs                    497M     0  497M   0% /dev/shm
42 tmpfs                    497M  6.6M  491M   2% /run
43 tmpfs                    497M     0  497M   0% /sys/fs/cgroup
44 /dev/sda1                497M  118M  379M  24% /boot
45 none                      87G   80G  6.6G  93% /vagrant
46 
47 (zlm@192.168.1.101 3306)[(none)]>drop database sysbench;
48 Query OK, 1 row affected (0.04 sec)
49 
50 //Further more,I dropped the "sysbench".

 

The slave hung still and disk space was full.

 1 [root@zlm3 08:55:17 ~]
 2 #df -h
 3 Filesystem               Size  Used Avail Use% Mounted on
 4 /dev/mapper/centos-root  8.4G  8.4G   20K 100% /
 5 devtmpfs                 488M     0  488M   0% /dev
 6 tmpfs                    497M     0  497M   0% /dev/shm
 7 tmpfs                    497M  6.5M  491M   2% /run
 8 tmpfs                    497M     0  497M   0% /sys/fs/cgroup
 9 /dev/sda1                497M  118M  379M  24% /boot
10 none                      87G   80G  6.6G  93% /vagrant
11 
12 (zlm@192.168.1.102 3306)[(none)]>show slave statusG
13 ^C^C -- query aborted
14 
15 
16 ^Z
17 [6]   Stopped                 mysql
18 
19 [root@zlm3 08:55:58 ~]
20 #pkill mysqld
21 
22 [root@zlm3 08:56:04 ~]
23 #./mysqld.sh 
24 
25 [root@zlm3 08:56:07 ~]
26 #mysql
27 ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.1.102' (111)
28 
29 [root@zlm3 09:00:35 ~]
30 #cd /data/mysql/mysql3306/data
31 
32 [root@zlm3 09:00:46 /data/mysql/mysql3306/data]
33 #cat error.log |tail -n 30
34 2018-07-19T08:57:02.581937 01:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 17039436409
35 2018-07-19T08:57:02.581958 01:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 17039436418
36 2018-07-19T08:57:02.581963 01:00 0 [Note] InnoDB: Database was not shutdown normally!
37 2018-07-19T08:57:02.581965 01:00 0 [Note] InnoDB: Starting crash recovery.
38 2018-07-19T08:57:02.696292 01:00 0 [Note] InnoDB: Transaction 31700139 was in the XA prepared state.
39 2018-07-19T08:57:02.700688 01:00 0 [Note] InnoDB: Transaction 31700139 was in the XA prepared state.
40 2018-07-19T08:57:02.700814 01:00 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
41 2018-07-19T08:57:02.700821 01:00 0 [Note] InnoDB: Trx id counter is 31700480
42 2018-07-19T08:57:02.701719 01:00 0 [Note] InnoDB: Last MySQL binlog file position 0 99139927, file name mysql-bin.000016
43 2018-07-19T08:57:02.805965 01:00 0 [Note] InnoDB: Ignoring tablespace `zlm`.`sbtest2` because the DISCARD flag is set .
44 2018-07-19T08:57:02.806462 01:00 0 [Note] InnoDB: Creating shared tablespace for temporary tables
45 2018-07-19T08:57:02.807316 01:00 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
46 2018-07-19T08:57:02.807568 01:00 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions
47 2018-07-19T08:57:02.807594 01:00 0 [Note] InnoDB: Rollback of non-prepared transactions completed
48 2018-07-19T08:57:02.871396 01:00 0 [Warning] InnoDB: Retry attempts for writing partial data failed.
49 2018-07-19T08:57:02.871423 01:00 0 [ERROR] InnoDB: Write to file ./ibtmp1failed at offset 11534336, 1048576 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
50 2018-07-19T08:57:02.871441 01:00 0 [ERROR] InnoDB: Error number 28 means 'No space left on device'
51 2018-07-19T08:57:02.871446 01:00 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
52 2018-07-19T08:57:02.871451 01:00 0 [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space
53 2018-07-19T08:57:02.871456 01:00 0 [ERROR] InnoDB: Unable to create the shared innodb_temporary
54 2018-07-19T08:57:02.871459 01:00 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
55 2018-07-19T08:57:03.273011 01:00 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
56 2018-07-19T08:57:03.273029 01:00 0 [ERROR] Plugin 'InnoDB' init function returned error.
57 2018-07-19T08:57:03.273033 01:00 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
58 2018-07-19T08:57:03.273037 01:00 0 [ERROR] Failed to initialize builtin plugins.
59 2018-07-19T08:57:03.273040 01:00 0 [ERROR] Aborting
60 
61 2018-07-19T08:57:03.273046 01:00 0 [Note] Binlog end
62 2018-07-19T08:57:03.273389 01:00 0 [Note] mysqld: Shutdown complete
63 
64 //The mysqld process could not run again because of no free disk space.

 

**I decided to drop all the binlogs on slave to release the disk space.**

 1 [root@zlm3 09:01:34 /data/mysql/mysql3306]
 2 #cd logs
 3 
 4 [root@zlm3 09:01:35 /data/mysql/mysql3306/logs]
 5 #ls -l
 6 total 2691436
 7 -rw-r----- 1 mysql mysql   4333745 Jul 13 11:07 mysql-bin.000001
 8 -rw-r----- 1 mysql mysql       653 Jul 16 09:28 mysql-bin.000002
 9 -rw-r----- 1 mysql mysql 268435663 Jul 16 10:03 mysql-bin.000003
10 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:22 mysql-bin.000004
11 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:32 mysql-bin.000005
12 -rw-r----- 1 mysql mysql 268435798 Jul 16 10:48 mysql-bin.000006
13 -rw-r----- 1 mysql mysql 268435798 Jul 16 11:29 mysql-bin.000007
14 -rw-r----- 1 mysql mysql  35214498 Jul 16 11:59 mysql-bin.000008
15 -rw-r----- 1 mysql mysql       249 Jul 16 11:59 mysql-bin.000009
16 -rw-r----- 1 mysql mysql 199908536 Jul 18 11:27 mysql-bin.000010
17 -rw-r----- 1 mysql mysql       249 Jul 18 11:27 mysql-bin.000011
18 -rw-r----- 1 mysql mysql 268474318 Jul 19 04:20 mysql-bin.000012
19 -rw-r----- 1 mysql mysql 268817026 Jul 19 04:24 mysql-bin.000013
20 -rw-r----- 1 mysql mysql 268802986 Jul 19 04:28 mysql-bin.000014
21 -rw-r----- 1 mysql mysql 268826152 Jul 19 04:32 mysql-bin.000015
22 -rw-r----- 1 mysql mysql  99422208 Jul 19 08:56 mysql-bin.000016
23 -rw-r----- 1 mysql mysql       704 Jul 19 04:32 mysql-bin.index
24 
25 [root@zlm3 09:01:36 /data/mysql/mysql3306/logs]
26 #rm -f *
27 
28 [root@zlm3 09:01:38 /data/mysql/mysql3306/logs]
29 #ls -l
30 total 0
31 
32 [root@zlm3 09:01:48 ~]
33 #df -h
34 Filesystem               Size  Used Avail Use% Mounted on
35 /dev/mapper/centos-root  8.4G  4.5G  4.0G  53% / //The free disk space had been reduced to 47%.
36 devtmpfs                 488M     0  488M   0% /dev
37 tmpfs                    497M     0  497M   0% /dev/shm
38 tmpfs                    497M  6.5M  491M   2% /run
39 tmpfs                    497M     0  497M   0% /sys/fs/cgroup
40 /dev/sda1                497M  118M  379M  24% /boot
41 none                      87G   80G  6.6G  93% /vagrant

 

**Ran the mysqld again and dropped the database "sysbench" on slave.**

 1 [root@zlm3 09:01:42 /data/mysql/mysql3306/logs]
 2 #sh /root/mysqld.sh
 3 
 4 [root@zlm3 09:01:51 /data/mysql/mysql3306/logs]
 5 #ps aux|grep mysqld
 6 mysql     4346  7.0 17.8 1109984 181576 pts/0  Sl   09:01   0:00 mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
 7 root      4380  0.0  0.0 112640   960 pts/0    R    09:01   0:00 grep --color=auto mysqld
 8 
 9 [root@zlm3 09:01:55 /data/mysql/mysql3306/logs]
10 #mysql
11 Welcome to the MySQL monitor.  Commands end with ; or g.
12 Your MySQL connection id is 2
13 Server version: 5.7.21-log MySQL Community Server (GPL)
14 
15 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
16 
17 Oracle is a registered trademark of Oracle Corporation and/or its
18 affiliates. Other names may be trademarks of their respective
19 owners.
20 
21 Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.
22 
23 (zlm@192.168.1.102 3306)[(none)]>drop database sysbench;
24 Query OK, 6 rows affected (0.11 sec)

 

Started the replication threads of slave.**

 1 (zlm@192.168.1.102 3306)[(none)]>start slave;
 2 Query OK, 0 rows affected (0.00 sec)
 3 
 4 (zlm@192.168.1.102 3306)[(none)]>show slave statusG
 5 *************************** 1. row ***************************
 6                Slave_IO_State: Waiting for master to send event
 7                   Master_Host: 192.168.1.101
 8                   Master_User: repl
 9                   Master_Port: 3306
10                 Connect_Retry: 60
11               Master_Log_File: mysql-bin.000019
12           Read_Master_Log_Pos: 155765801
13                Relay_Log_File: relay-bin.000068
14                 Relay_Log_Pos: 398
15         Relay_Master_Log_File: mysql-bin.000019
16              Slave_IO_Running: Yes
17             Slave_SQL_Running: No
18               Replicate_Do_DB: 
19           Replicate_Ignore_DB: 
20            Replicate_Do_Table: 
21        Replicate_Ignore_Table: 
22       Replicate_Wild_Do_Table: 
23   Replicate_Wild_Ignore_Table: 
24                    Last_Errno: 1146
25                    Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
26                  Skip_Counter: 0
27           Exec_Master_Log_Pos: 99148246
28               Relay_Log_Space: 56618146
29               Until_Condition: None
30                Until_Log_File: 
31                 Until_Log_Pos: 0
32            Master_SSL_Allowed: No
33            Master_SSL_CA_File: 
34            Master_SSL_CA_Path: 
35               Master_SSL_Cert: 
36             Master_SSL_Cipher: 
37                Master_SSL_Key: 
38         Seconds_Behind_Master: NULL
39 Master_SSL_Verify_Server_Cert: No
40                 Last_IO_Errno: 0
41                 Last_IO_Error: 
42                Last_SQL_Errno: 1146
43                Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist' //Since the database had been droppted.This error was notable.
44   Replicate_Ignore_Server_Ids: 
45              Master_Server_Id: 1013306
46                   Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
47              Master_Info_File: mysql.slave_master_info
48                     SQL_Delay: 0
49           SQL_Remaining_Delay: NULL
50       Slave_SQL_Running_State: 
51            Master_Retry_Count: 86400
52                   Master_Bind: 
53       Last_IO_Error_Timestamp: 
54      Last_SQL_Error_Timestamp: 180719 09:02:07
55                Master_SSL_Crl: 
56            Master_SSL_Crlpath: 
57            Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021 //It was stuck on transaction 3714549(which contained error).
58             Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3714548,
59 5c77c31b-4add-11e8-81e2-080027de0e0e:1-168
60                 Auto_Position: 1
61          Replicate_Rewrite_DB: 
62                  Channel_Name: 
63            Master_TLS_Version: 
64 1 row in set (0.00 sec)
65 
66 [root@zlm3 09:10:42 ~]
67 #perror 1146
68 MySQL error code 1146 (ER_NO_SUCH_TABLE): Table '%-.192s.%-.192s' doesn't exist
69 
70 //Error 1146 indicated the absence of table "sbtest1" in "sysbench" database.
71 //Obviously,the slave was replaying the operations relevant to this table on master.The table even the database had been dropped.
72 //How could I do next step?Do I have to generate a new mysqldump file and reconfigure the slave again?
73 //There's One thing I'm rather sure that there were no other transactions generated in the whole course except the operations on "sysbench" database.
74 //Since I'd drop "sysbentch" database on both master and slave.Maybe I can fix the issue easily.

 

Checked the Executed_Gtid_Set on master.

1 (zlm@192.168.1.101 3306)[(none)]>show master status;
2  ------------------ ----------- -------------- ------------------ ------------------------------------------------ 
3 | File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                              |
4  ------------------ ----------- -------------- ------------------ ------------------------------------------------ 
5 | mysql-bin.000019 | 155765801 |              |                  | 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021 |
6  ------------------ ----------- -------------- ------------------ ------------------------------------------------ 
7 1 row in set (0.00 sec)
8 
9 //The executed gtid was upto "3730021".

 

Tryed to fix the replica of master.

  1 (zlm@192.168.1.102 3306)[(none)]>show slave statusG
  2 *************************** 1. row ***************************
  3                Slave_IO_State: Waiting for master to send event
  4                   Master_Host: 192.168.1.101
  5                   Master_User: repl
  6                   Master_Port: 3306
  7                 Connect_Retry: 60
  8               Master_Log_File: mysql-bin.000019
  9           Read_Master_Log_Pos: 155765801
 10                Relay_Log_File: relay-bin.000068
 11                 Relay_Log_Pos: 398
 12         Relay_Master_Log_File: mysql-bin.000019
 13              Slave_IO_Running: Yes
 14             Slave_SQL_Running: No
 15               Replicate_Do_DB: 
 16           Replicate_Ignore_DB: 
 17            Replicate_Do_Table: 
 18        Replicate_Ignore_Table: 
 19       Replicate_Wild_Do_Table: 
 20   Replicate_Wild_Ignore_Table: 
 21                    Last_Errno: 1146
 22                    Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
 23                  Skip_Counter: 0
 24           Exec_Master_Log_Pos: 99148246
 25               Relay_Log_Space: 56618627
 26               Until_Condition: None
 27                Until_Log_File: 
 28                 Until_Log_Pos: 0
 29            Master_SSL_Allowed: No
 30            Master_SSL_CA_File: 
 31            Master_SSL_CA_Path: 
 32               Master_SSL_Cert: 
 33             Master_SSL_Cipher: 
 34                Master_SSL_Key: 
 35         Seconds_Behind_Master: NULL
 36 Master_SSL_Verify_Server_Cert: No
 37                 Last_IO_Errno: 0
 38                 Last_IO_Error: 
 39                Last_SQL_Errno: 1146
 40                Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
 41   Replicate_Ignore_Server_Ids: 
 42              Master_Server_Id: 1013306
 43                   Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
 44              Master_Info_File: mysql.slave_master_info
 45                     SQL_Delay: 0
 46           SQL_Remaining_Delay: NULL
 47       Slave_SQL_Running_State: 
 48            Master_Retry_Count: 86400
 49                   Master_Bind: 
 50       Last_IO_Error_Timestamp: 
 51      Last_SQL_Error_Timestamp: 180719 09:02:57
 52                Master_SSL_Crl: 
 53            Master_SSL_Crlpath: 
 54            Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021
 55             Executed_Gtid_Set: 
 56                 Auto_Position: 1
 57          Replicate_Rewrite_DB: 
 58                  Channel_Name: 
 59            Master_TLS_Version: 
 60 1 row in set (0.00 sec)
 61 
 62 (zlm@192.168.1.102 3306)[(none)]>reset master;
 63 Query OK, 0 rows affected (0.02 sec)
 64 
 65 (zlm@192.168.1.102 3306)[(none)]>set @@global.gtid_purged='1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021';
 66 Query OK, 0 rows affected (0.00 sec)
 67 
 68 //On account of surely knowing there were no other transactions at all.I set the "gtid_purged" variable to the value of "gtid_executed" on master.
 69 //It means I guised that all the transactions generated on master had been replayed on slave already.The slave could retrieve new GTID at the moment.
 70 
 71 (zlm@192.168.1.102 3306)[(none)]>start slave sql_thread;
 72 Query OK, 0 rows affected (0.02 sec)
 73 
 74 (zlm@192.168.1.102 3306)[(none)]>show slave statusG
 75 *************************** 1. row ***************************
 76                Slave_IO_State: Waiting for master to send event
 77                   Master_Host: 192.168.1.101
 78                   Master_User: repl
 79                   Master_Port: 3306
 80                 Connect_Retry: 60
 81               Master_Log_File: mysql-bin.000019
 82           Read_Master_Log_Pos: 155765801
 83                Relay_Log_File: relay-bin.000069
 84                 Relay_Log_Pos: 438
 85         Relay_Master_Log_File: mysql-bin.000019
 86              Slave_IO_Running: Yes
 87             Slave_SQL_Running: Yes //The sql_thread became "Yes".
 88               Replicate_Do_DB: 
 89           Replicate_Ignore_DB: 
 90            Replicate_Do_Table: 
 91        Replicate_Ignore_Table: 
 92       Replicate_Wild_Do_Table: 
 93   Replicate_Wild_Ignore_Table: 
 94                    Last_Errno: 0
 95                    Last_Error: 
 96                  Skip_Counter: 0
 97           Exec_Master_Log_Pos: 155765801
 98               Relay_Log_Space: 56618434
 99               Until_Condition: None
100                Until_Log_File: 
101                 Until_Log_Pos: 0
102            Master_SSL_Allowed: No
103            Master_SSL_CA_File: 
104            Master_SSL_CA_Path: 
105               Master_SSL_Cert: 
106             Master_SSL_Cipher: 
107                Master_SSL_Key: 
108         Seconds_Behind_Master: 0
109 Master_SSL_Verify_Server_Cert: No
110                 Last_IO_Errno: 0
111                 Last_IO_Error: 
112                Last_SQL_Errno: 0
113                Last_SQL_Error: 
114   Replicate_Ignore_Server_Ids: 
115              Master_Server_Id: 1013306
116                   Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
117              Master_Info_File: mysql.slave_master_info
118                     SQL_Delay: 0
119           SQL_Remaining_Delay: NULL
120       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
121            Master_Retry_Count: 86400
122                   Master_Bind: 
123       Last_IO_Error_Timestamp: 
124      Last_SQL_Error_Timestamp: 
125                Master_SSL_Crl: 
126            Master_SSL_Crlpath: 
127            Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:3714549-3730021
128             Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021 //The slave had skipped those GTID(which contained error 1146) of master and waited for newer GTID.The replica had been fixed up.
129                 Auto_Position: 1
130          Replicate_Rewrite_DB: 
131                  Channel_Name: 
132            Master_TLS_Version: 
133 1 row in set (0.00 sec)

 

 Summary

  • The variable "gtid_purged" cannot be set if "gtid_executed" is not empty.
  • Caution,"reset master" can only be used on slave.Keep in mind that don't do it on master anytime.
  • This case can be followed only in test environment 'cause you cannot guarantee whether all the transactions are really replayed on slave.

 

本文由星彩网app下载发布于星彩彩票app下载,转载请注明出处:GTID情形中手动修复主从故障生机勃勃例

TAG标签:
Ctrl+D 将本页面保存为书签,全面了解最新资讯,方便快捷。