<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html style="direction: ltr;">
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body style="direction: ltr;" bgcolor="#ffffff" text="#000000">
<p style="margin-bottom: 0cm; margin-top: 0pt;">guy keren wrote:<br>
</p>
<blockquote cite="mid:4CC9A6F7.3040605@actcom.co.il" type="cite">why
not strace sshd and find what it's doing?
<br>
</blockquote>
That turned out to be a brilliant idea. I was more aggressive in my
approach than you suggested, since I need to know what happens before I
can log in, so I injected an strace into the service startup script
(with a nomination to the world's ugliest hack). Then I started the
service and opened a session while the system behaved normally, and
rebooted to try it out during those few minutes of unresponsiveness.<br>
<br>
I went for the -ff option, which means that I got a file for each fork
(and there are oh so many). With a little Perl script to pick up the
most time consuming commands, I reached the critical snippet, which is
actually the beginning of one of the processes.<br>
<br>
What we can see, is that trying to open /dev/tty (and fail) took 30 +
epsilon seconds, and then opening /dev/pts/0 (successfully) took
another 30 + epsilon seconds. Makes me wonder what timeout is set to 30
seconds in the kernel. Needless to say, in the proper strace, these
operations took a fraction of a second, like one would expect.
Interestingly, it's exactly the same chain of events, only different
timing (give or take which pts was opened).<br>
<br>
Looks like I exchanged one kernel bug for another. Maybe I should
report this to LKML...? And still, if this really is a kernel bug, how
could it escape the simplest test before the kernel was released?<br>
<br>
Eli<br>
<br>
-----<br>
Trace for freshly booted system follows:<br>
<br>
21:35:21.131094 close(12) = 0 <0.000046><br>
21:35:21.131177 close(13) = 0 <0.000052><br>
21:35:21.131259 close(4) = 0 <0.000059><br>
21:35:21.131382 close(8) = 0 <0.000027><br>
21:35:21.131436 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <<font color="#ff0000"><b>30.022532</b></font>><br>
21:35:51.154269 setsid() = 2333 <0.000155><br>
21:35:51.154639 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <0.020142><br>
21:35:51.174931 ioctl(9, TIOCSCTTY) = 0 <0.000030><br>
21:35:51.175049 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0
<0.000028><br>
21:35:51.175176 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER,
0x7fad91c31740}, NULL, 8) = 0 <0.000028><br>
21:35:51.175309 vhangup() = -1 EPERM (Operation not
permitted) <0.000026><br>
21:35:51.175407 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER,
0x7fad91c31740}, 8) = 0 <0.000027><br>
21:35:51.175524 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER,
0x7fad91c31740}, NULL, 8) = 0 <0.000028><br>
21:35:51.175642 open("/dev/pts/0", O_RDWR) = 4 <<b><font
color="#ff0000">30.063213</font></b>><br>
21:36:21.238952 close(9) = 0 <0.000033><br>
21:36:21.239057 open("/dev/tty", O_WRONLY) = 8 <0.021415><br>
21:36:21.260555 close(8) = 0 <0.000034><br>
21:36:21.260660 dup2(4, 0) = 0 <0.000021><br>
21:36:21.260786 dup2(4, 1) = 1 <0.000027><br>
21:36:21.260882 dup2(4, 2) = 2 <0.000026><br>
21:36:21.260973 close(4) = 0 <0.000023><br>
<br>
(and so on...)<br>
<br>
And here's the parallel trace for a warm and cosy system:<br>
<br>
21:28:19.163952 close(12) = 0 <0.000052><br>
21:28:19.164125 close(13) = 0 <0.000051><br>
21:28:19.164295 close(4) = 0 <0.000043><br>
21:28:19.164435 close(8) = 0 <0.000055><br>
21:28:19.164577 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <<font color="#ff0000"><b>0.000068</b></font>><br>
21:28:19.164891 setsid() = 19458 <0.000022><br>
21:28:19.164990 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such
device or address) <0.000066><br>
21:28:19.165132 ioctl(9, TIOCSCTTY) = 0 <0.000024><br>
21:28:19.165230 rt_sigaction(SIGHUP, NULL, {SIG_DFL, [], 0}, 8) = 0
<0.000043><br>
21:28:19.165379 rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER,
0x7f78bf8fe740}, NULL, 8) = 0 <0.000034><br>
21:28:19.165502 vhangup() = -1 EPERM (Operation not
permitted) <0.000033><br>
21:28:19.165604 rt_sigaction(SIGHUP, NULL, {SIG_IGN, [], SA_RESTORER,
0x7f78bf8fe740}, 8) = 0 <0.000048><br>
21:28:19.165828 rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER,
0x7f78bf8fe740}, NULL, 8) = 0 <0.000038><br>
21:28:19.165937 open("/dev/pts/2", O_RDWR) = 4 <<b><font
color="#ff0000">0.000027</font></b>><br>
21:28:19.166011 close(9) = 0 <0.000017><br>
21:28:19.166065 open("/dev/tty", O_WRONLY) = 8 <0.000021><br>
21:28:19.166134 close(8) = 0 <0.000014><br>
21:28:19.166191 dup2(4, 0) = 0 <0.000014><br>
21:28:19.166246 dup2(4, 1) = 1 <0.000014><br>
21:28:19.166301 dup2(4, 2) = 2 <0.000014><br>
21:28:19.166355 close(4) = 0 <0.000014><br>
<br>
<pre class="moz-signature" cols="72">--
Web: <a class="moz-txt-link-freetext" href="http://www.billauer.co.il">http://www.billauer.co.il</a>
</pre>
</body>
</html>