[Haifux] Login console freezes: Eli's weekly riddle

Eli Billauer eli at billauer.co.il
Fri Oct 29 00:44:59 MSD 2010


guy keren wrote:

> why not strace sshd and find what it's doing?
That turned out to be a brilliant idea. I was more aggressive in my 
approach than you suggested, since I need to know what happens before I 
can log in, so I injected an strace into the service startup script 
(with a nomination to the world's ugliest hack). Then I started the 
service and opened a session while the system behaved normally, and 
rebooted to try it out during those few minutes of unresponsiveness.

I went for the -ff option, which means that I got a file for each fork 
(and there are oh so many). With a little Perl script to pick up the 
most time consuming commands, I reached the critical snippet, which is 
actually the beginning of one of the processes.

What we can see, is that trying to open /dev/tty (and fail)  took 30 + 
epsilon seconds, and then opening /dev/pts/0 (successfully) took another 
30 + epsilon seconds. Makes me wonder what timeout is set to 30 seconds 
in the kernel. Needless to say, in the proper strace, these operations 
took a fraction of a second, like one would expect. Interestingly, it's 
exactly the same chain of events, only different timing (give or take 
which pts was opened).

Looks like I exchanged one kernel bug for another. Maybe I should report 
this to LKML...? And still, if this really is a kernel bug, how could it 
escape the simplest test before the kernel was released?

   Eli

-----
Trace for freshly booted system follows:

21:35:21.131094 close(12)               = 0 <0.000046>
21:35:21.131177 close(13)               = 0 <0.000052>
21:35:21.131259 close(4)                = 0 <0.000059>
21:35:21.131382 close(8)                = 0 <0.000027>
21:35:21.131436 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such 
device or address) <*30.022532*>
21:35:51.154269 setsid()                = 2333 <0.000155>
21:35:51.154639 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such 
device or address) <0.020142>
21:35:51.174931 ioctl(9, TIOCSCTTY)     = 0 <0.000030>
21:35:51.175049 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0 
<0.000028>
21:35:51.175176 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 
0x7fad91c31740}, NULL, 8) = 0 <0.000028>
21:35:51.175309 vhangup()               = -1 EPERM (Operation not 
permitted) <0.000026>
21:35:51.175407 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER, 
0x7fad91c31740}, 8) = 0 <0.000027>
21:35:51.175524 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 
0x7fad91c31740}, NULL, 8) = 0 <0.000028>
21:35:51.175642 open("/dev/pts/0", O_RDWR) = 4 <*30.063213*>
21:36:21.238952 close(9)                = 0 <0.000033>
21:36:21.239057 open("/dev/tty", O_WRONLY) = 8 <0.021415>
21:36:21.260555 close(8)                = 0 <0.000034>
21:36:21.260660 dup2(4, 0)              = 0 <0.000021>
21:36:21.260786 dup2(4, 1)              = 1 <0.000027>
21:36:21.260882 dup2(4, 2)              = 2 <0.000026>
21:36:21.260973 close(4)                = 0 <0.000023>

(and so on...)

And here's the parallel trace for a warm and cosy system:

21:28:19.163952 close(12)               = 0 <0.000052>
21:28:19.164125 close(13)               = 0 <0.000051>
21:28:19.164295 close(4)                = 0 <0.000043>
21:28:19.164435 close(8)                = 0 <0.000055>
21:28:19.164577 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such 
device or address) <*0.000068*>
21:28:19.164891 setsid()                = 19458 <0.000022>
21:28:19.164990 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such 
device or address) <0.000066>
21:28:19.165132 ioctl(9, TIOCSCTTY)     = 0 <0.000024>
21:28:19.165230 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0 
<0.000043>
21:28:19.165379 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 
0x7f78bf8fe740}, NULL, 8) = 0 <0.000034>
21:28:19.165502 vhangup()               = -1 EPERM (Operation not 
permitted) <0.000033>
21:28:19.165604 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER, 
0x7f78bf8fe740}, 8) = 0 <0.000048>
21:28:19.165828 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 
0x7f78bf8fe740}, NULL, 8) = 0 <0.000038>
21:28:19.165937 open("/dev/pts/2", O_RDWR) = 4 <*0.000027*>
21:28:19.166011 close(9)                = 0 <0.000017>
21:28:19.166065 open("/dev/tty", O_WRONLY) = 8 <0.000021>
21:28:19.166134 close(8)                = 0 <0.000014>
21:28:19.166191 dup2(4, 0)              = 0 <0.000014>
21:28:19.166246 dup2(4, 1)              = 1 <0.000014>
21:28:19.166301 dup2(4, 2)              = 2 <0.000014>
21:28:19.166355 close(4)                = 0 <0.000014>

-- 
Web: http://www.billauer.co.il

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://haifux.org/pipermail/haifux/attachments/20101028/b8b3943b/attachment.html 


More information about the Haifux mailing list