mysql死锁(deadlock)分析及解决

恰逢这两天线上还有点量,问题就找到了,是deadlock,死锁。

所谓的死锁是指两个或者多个事务在同一资源上相互占用,并请求锁定对方占用的资源,从而导致恶性循环的现象。

问题1: 怎么会产生死锁那?

通过 show engine innodb status,捕获到了这死锁详情

------------------------
LATEST DETECTED DEADLOCK
------------------------
151128 12:39:46
*** (1) TRANSACTION:
TRANSACTION 20F580, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 2
MySQL thread id 3996143, OS thread handle 0x7f1750213700, query id 16311053 10.173.16.74 ruanyunflow update
INSERT INTO channel_sub_order_0 (`suborderid`,`orderid`,`subtransno`) VALUES(53566,57502,'')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 191 page no 427 n bits 184 index `PRIMARY` of table `flow`.`channel_sub_order_0` trx id 20F580 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 34 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
 0: len 4; hex 35333537; asc 5357;;
 1: len 6; hex 00000016db65; asc      e;;
 2: len 7; hex ed000001a22343; asc      #C;;
 3: len 5; hex 3130303133; asc 10013;;
 4: SQL NULL;
 5: SQL NULL;
 6: SQL NULL;
 7: len 13; hex 31353237323630313631372020; asc 15272601617  ;;
 8: len 3; hex 503133; asc P13;;
 9: len 9; hex 800000000000002832; asc        (2;;
 10: len 9; hex 800000000000002532; asc        %2;;
 11: len 4; hex 5555e8c8; asc UU  ;;
 12: len 4; hex 5555e8e5; asc UU  ;;
 13: len 4; hex 5555e93f; asc UU ?;;
 14: SQL NULL;
 15: len 7; hex 73756363657373; asc success;;
 16: len 28; hex ;;
 17: len 2; hex 3030; asc 00;;
 18: len 3; hex 313030; asc 100;;
 19: len 0; hex ; asc ;;
 20: SQL NULL;
 21: len 4; hex 80000000; asc     ;;
 22: len 4; hex 5555e93f; asc UU ?;;

*** (2) TRANSACTION:
TRANSACTION 20F57D, ACTIVE 0 sec fetching rows, thread declared inside InnoDB 417
mysql tables in use 1, locked 1
843 lock struct(s), heap size 96696, 102512 row lock(s), undo log entries 2
MySQL thread id 3996140, OS thread handle 0x7f1750180700, query id 16311048 10.173.16.74 ruanyunflow Updating
UPDATE channel_order_0 SET `ordercode`=4 , `updatetime`='2015-11-28 12:39:46' , `completiontime`='2015-11-28 12:39:46' where orderid=57484
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 191 page no 427 n bits 184 index `PRIMARY` of table `flow`.`channel_sub_order_0` trx id 20F57D lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...
...
...
...
Record lock, heap no 113 PHYSICAL RECORD: n_fields 23; compact format; info bits 0
 0: len 5; hex 3533353438; asc 53548;;
 1: len 6; hex 00000020f57d; asc      };;
 2: len 7; hex 35000001b6140a; asc 5      ;;
 3: len 5; hex 3537343834; asc 57484;;
 4: len 0; hex ; asc ;;
 5: SQL NULL;
 6: SQL NULL;
 7: len 13; hex 31333339333235333539322020; asc 13393253592  ;;
 8: len 3; hex 503337; asc P37;;
 9: len 9; hex 800000000000000732; asc         2;;
 10: len 9; hex 800000000000000a00; asc          ;;
 11: len 4; hex 56592fdc; asc VY/ ;;
 12: len 4; hex 56592fdc; asc VY/ ;;
 13: len 4; hex 56593012; asc VY0 ;;
 14: SQL NULL;
 15: SQL NULL;
 16: SQL NULL;
 17: len 5; hex 3030303030; asc 00000;;
 18: len 3; hex 313030; asc 100;;
 19: len 0; hex ; asc ;;
 20: len 30; hex; (total 68 bytes);
 21: len 4; hex 80000000; asc     ;;
 22: len 4; hex 56593012; asc VY0 ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 201 page no 281 n bits 248 index `PRIMARY` of table `flow`.`channel_order_0` trx id 20F57D lock_mode X waiting
Record lock, heap no 180 PHYSICAL RECORD: n_fields 18; compact format; info bits 0
 0: len 5; hex 3537353032; asc 57502;;
 1: len 6; hex 00000020f580; asc       ;;
 2: len 7; hex b8000001fd0110; asc        ;;
 3: len 5; hex 3533353636; asc 53566;;
 4: len 3; hex 433534; asc C54;;
 5: len 1; hex 00; asc  ;;
 6: len 30; hex(total 32 bytes);
 7: SQL NULL;
 8: len 9; hex 800000000000000132; asc         2;;
 9: len 9; hex 800000000000000200; asc          ;;
 10: len 4; hex 80000000; asc     ;;
 11: len 4; hex 56593012; asc VY0 ;;
 12: len 4; hex 56593012; asc VY0 ;;
 13: len 4; hex 56593012; asc VY0 ;;
 14: len 1; hex 33; asc 3;;
 15: len 9; hex 800000000000000000; asc          ;;
 16: SQL NULL;
 17: len 0; hex ; asc ;;

*** WE ROLL BACK TRANSACTION (1)

注:子订单表:channel_sub_order_0, 子订单号:suborderid,并且是主键
主订单表:channel_order_0,主订单号:orderid,并且是主键

从上面信息中可知,
事务1,在插入子订单的时候,想获得suborderid=53566的锁,而这个锁事务2已经占有了;
事务2,在更新主订单的时候想获得orderid=57484的锁,而这个锁事务1已经占用了。
最后mysql 回滚了事务1,让事务2得以顺利进行。

从而可以得出死锁原因:

事务1和事务2以不同的顺序锁定了资源。比如事务1需要行A和行B的锁,事务2需要行B和行A的锁,事务1占有了行A的锁,事务2占有了行B的锁,这时,事务1请求占有行B的锁(在事务2那里还未释放),事务2请求占用行A的锁(在事务1那里还未释放),这就造成了相互等待的现象(死锁)。

可知原先的程序有问题:

插入订单的事务是这样的:先插入主订单,再插入子订单
更新订单的事务是这样的:先更新子订单,再插入主订单

问题复现:

步骤session1(插入)session2(更新)
1set autocommit = 0;set autocommit = 0;
2INSERT INTO channel_order_0 (orderid,suborderids,) VALUES(58266,54330);
3update channel_sub_order_0 set ordercode=100 where suborderid=54322;
4INSERT INTO channel_sub_order_0 (suborderid,orderid) VALUES(58266,54330);需要suborderid = 58266的锁
5UPDATE channel_order_0 SET ordercode=4 , updatetime='2015-11-28 12:39:46' , completiontime='2015-11-28 12:39:46' where orderid=58258;需要orderid=58258的锁
6ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

解决方法: 调整更新订单事务的执行顺序,改为先更新主订单再更新子订单。

问题2:插入的都是新订单,更新的都是老订单,订单号都不一样,为什么会在更新的时候已经锁定了插入新订单号的锁那?难道Innodb不是仅锁定了一行?

那我来执行一条更新语句,通过show engine innodb status查看锁定的行数:

sql: UPDATE channel_order_0 SET `ordercode`=4 , `updatetime`='2015-11-28 12:39:46' , `completiontime`='2015-11-28 12:39:46' where orderid=58258;
TRANSACTIONS
------------
Trx id counter 7CB502
Purge done for trx's n:o < 7CB35E undo n:o < 0
History list length 1238
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 5, OS thread handle 0x7f97591d4700, query id 105 localhost root
show engine innodb status
---TRANSACTION 7CB501, ACTIVE 9 sec
10055 lock struct(s), heap size 997816, 938445 row lock(s)
MySQL thread id 3, OS thread handle 0x7f9759205700, query id 103 localhost root

竟然锁定了 938445行

突然想到,我们的orderid是varchar类型,索引的应该都是字符,如果我用where orderid=58258 更新,理论上这样是用不到主键索引的,所以他会扫描全表,锁的机制是,会把所有扫描过的行都锁定

所以sql更换成:

UPDATE channel_order_0 SET `ordercode`=4 , `updatetime`='2015-11-28 12:39:46' , `completiontime`='2015-11-28 12:39:46' where orderid=‘58258’;

在看看show engine innodb status

------------
TRANSACTIONS
------------
Trx id counter 7CB501
Purge done for trx's n:o < 7CB35E undo n:o < 0
History list length 1238
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 4, OS thread handle 0x7f97591d4700, query id 99 localhost root
show engine innodb status
---TRANSACTION 7CB500, ACTIVE 8 sec
2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 3, OS thread handle 0x7f9759205700, query id 97 localhost root

看到就只锁定了一行。果然见效。

问题3: 即使没有用到主键索引从而锁定了全表,插入的时候用的是新的订单号,怎么会锁定新的订单号那?

这时想到了 间隙锁(Next-Key锁)

当我们用范围条件而不是相等条件检索数据,并请求共享或排他锁时,InnoDB会给符合条件的已有数据记录的 索引项加锁;对于键值在条件范围内但并不存在的记录,叫做“间隙(GAP)”,InnoDB也会对这个“间隙”加锁,这种锁机制就是所谓的间隙锁 (Next-Key锁)。

举例来说,假如emp表中只有101条记录,其empid的值分别是 1,2,...,100,101,下面的SQL:

select * from  emp where empid > 100 for update;

是一个范围条件的检索,InnoDB不仅会对符合条件的empid值为101的记录加锁,也会对empid大于101(这些记录并不存在)的“间隙”加锁。

InnoDB使用间隙锁的目的,一方面是为了防止幻读,以满足相关隔离级别的要求,对于上面的例子,要是不使用间隙锁,如果其他事务插入了empid大于100的任何记录,那么本事务如果再次执行上述语句,就会发生幻读;另外一方面,是为了满足其恢复和复制的需要。

很显然,在使用范围条件检索并锁定记录时,InnoDB这种加锁机制会阻塞符合条件范围内键值的并发插入,这往往会造成严重的锁等待。因此,在实际应用开发中,尤其是并发插入比较多的应用,我们要尽量优化业务逻辑,尽量使用相等条件来访问更新数据,避免使用范围条件。

还要特别说明的是,InnoDB除了通过范围条件加锁时使用间隙锁外,如果使用相等条件请求给一个不存在的记录加锁,InnoDB也会使用间隙锁!

验证
0: 已知:锁定了 938445 行
1: 查看一下表的总行数:

select count(*) from channel_order_0; //928391行

2: 查看:最大的orderid :

select orderid from channel_order_0 order by orderid+0 desc limit 1; //999387     

从而得知用到了间隙锁

从死锁日志中可知,事务1和事务2放生死锁,最后roll back了事务1

*** WE ROLL BACK TRANSACTION (1)

问题4: innodb如何处理死锁?

Innodb目前处理死锁的方法是:将持有最少行级排他锁的事务进行回滚。

mysql死锁(deadlock)分析及解决

  • qq_43638135
    妲己再美究为妃: 博主没有想过自己接一些私活干吗?我现在还没毕业,但是我也确实听说外挂市场自动化游戏脚本市场挺火热的,并且报酬也很丰厚,但是具体的我也不是很清楚,求解答。 (1个月前 #47楼) 查看回复(2) 举报 回复
    22