[Haifux] Login console freezes: Eli's weekly riddle
Eli Billauer
eli at billauer.co.il
Fri Oct 29 00:44:59 MSD 2010
guy keren wrote:
> why not strace sshd and find what it's doing?
That turned out to be a brilliant idea. I was more aggressive in my
approach than you suggested, since I need to know what happens before I
can log in, so I injected an strace into the service startup script
(with a nomination to the world's ugliest hack). Then I started the
service and opened a session while the system behaved normally, and
rebooted to try it out during those few minutes of unresponsiveness.
I went for the -ff option, which means that I got a file for each fork
(and there are oh so many). With a little Perl script to pick up the
most time consuming commands, I reached the critical snippet, which is
actually the beginning of one of the processes.
What we can see, is that trying to open /dev/tty (and fail) took 30 +
epsilon seconds, and then opening /dev/pts/0 (successfully) took another
30 + epsilon seconds. Makes me wonder what timeout is set to 30 seconds
in the kernel. Needless to say, in the proper strace, these operations
took a fraction of a second, like one would expect. Interestingly, it's
exactly the same chain of events, only different timing (give or take
which pts was opened).
Looks like I exchanged one kernel bug for another. Maybe I should report
this to LKML...? And still, if this really is a kernel bug, how could it
escape the simplest test before the kernel was released?
Eli
-----
Trace for freshly booted system follows:
21:35:21.131094 close(12) = 0 <0.000046>
21:35:21.131177 close(13) = 0 <0.000052>
21:35:21.131259 close(4) = 0 <0.000059>
21:35:21.131382 close(8) = 0 <0.000027>
21:35:21.131436 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <*30.022532*>
21:35:51.154269 setsid() = 2333 <0.000155>
21:35:51.154639 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <0.020142>
21:35:51.174931 ioctl(9, TIOCSCTTY) = 0 <0.000030>
21:35:51.175049 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0
<0.000028>
21:35:51.175176 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER,
0x7fad91c31740}, NULL, 8) = 0 <0.000028>
21:35:51.175309 vhangup() = -1 EPERM (Operation not
permitted) <0.000026>
21:35:51.175407 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER,
0x7fad91c31740}, 8) = 0 <0.000027>
21:35:51.175524 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER,
0x7fad91c31740}, NULL, 8) = 0 <0.000028>
21:35:51.175642 open("/dev/pts/0", O_RDWR) = 4 <*30.063213*>
21:36:21.238952 close(9) = 0 <0.000033>
21:36:21.239057 open("/dev/tty", O_WRONLY) = 8 <0.021415>
21:36:21.260555 close(8) = 0 <0.000034>
21:36:21.260660 dup2(4, 0) = 0 <0.000021>
21:36:21.260786 dup2(4, 1) = 1 <0.000027>
21:36:21.260882 dup2(4, 2) = 2 <0.000026>
21:36:21.260973 close(4) = 0 <0.000023>
(and so on...)
And here's the parallel trace for a warm and cosy system:
21:28:19.163952 close(12) = 0 <0.000052>
21:28:19.164125 close(13) = 0 <0.000051>
21:28:19.164295 close(4) = 0 <0.000043>
21:28:19.164435 close(8) = 0 <0.000055>
21:28:19.164577 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <*0.000068*>
21:28:19.164891 setsid() = 19458 <0.000022>
21:28:19.164990 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <0.000066>
21:28:19.165132 ioctl(9, TIOCSCTTY) = 0 <0.000024>
21:28:19.165230 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0
<0.000043>
21:28:19.165379 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER,
0x7f78bf8fe740}, NULL, 8) = 0 <0.000034>
21:28:19.165502 vhangup() = -1 EPERM (Operation not
permitted) <0.000033>
21:28:19.165604 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER,
0x7f78bf8fe740}, 8) = 0 <0.000048>
21:28:19.165828 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER,
0x7f78bf8fe740}, NULL, 8) = 0 <0.000038>
21:28:19.165937 open("/dev/pts/2", O_RDWR) = 4 <*0.000027*>
21:28:19.166011 close(9) = 0 <0.000017>
21:28:19.166065 open("/dev/tty", O_WRONLY) = 8 <0.000021>
21:28:19.166134 close(8) = 0 <0.000014>
21:28:19.166191 dup2(4, 0) = 0 <0.000014>
21:28:19.166246 dup2(4, 1) = 1 <0.000014>
21:28:19.166301 dup2(4, 2) = 2 <0.000014>
21:28:19.166355 close(4) = 0 <0.000014>
--
Web: http://www.billauer.co.il
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://haifux.org/pipermail/haifux/attachments/20101028/b8b3943b/attachment.html
More information about the Haifux
mailing list