[Haifux] Kernel oops, so what?

Oron Peled oron at actcom.co.il
Sun Jan 17 11:11:19 MSK 2010


OK, I'll join in...

On Friday, 15 January 2010 16:27:34 Eli Billauer wrote:
> Now I get an oops warning every now and then, but nothing really 
> happens. And I wonder what is going on? Has the dreaded oops become 
> something one can live with?

Nothing has really changed that much. An Oops is caused by an illegal
operation in kernel space (e.g., null pointer dereference, division by
zero, etc.). It has always had two possible results:
 1. If it happened in interrupt context -- a panic (prints a
    backtrace on the console and freezes the machine).
 2. If it happened in process context (e.g., in a system call), then
    it's a "normal" Oops -- Don't PANIC ;-)
    - The calling process is killed with a segfault (as we cannot continue
      any valid computation).
    - A normal context switch happens.
    - Everything continues normally, with the *hope* that the bug
      did not cause big collateral damage (e.g., memory corruption)
      outside of the failed computation.
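
The two cases above can be modelled with a minimal userspace sketch in C.
This is a model only, not kernel code: the names oops_context,
oops_outcome and decide_oops_outcome are made up for illustration, and
the real decision is taken inside the kernel's own fault-handling code.

```c
#include <assert.h>

/* Hypothetical model of where the faulting code was running. */
enum oops_context { CTX_INTERRUPT, CTX_PROCESS };

/* Hypothetical model of what happens after the Oops. */
enum oops_outcome { OUTCOME_PANIC, OUTCOME_KILL_PROCESS };

/* Map the context of the illegal operation to its outcome. */
enum oops_outcome decide_oops_outcome(enum oops_context ctx)
{
    /* The model only knows the two contexts described in the text. */
    assert(ctx == CTX_INTERRUPT || ctx == CTX_PROCESS);

    if (ctx == CTX_INTERRUPT)
        return OUTCOME_PANIC;        /* backtrace, machine freezes */

    return OUTCOME_KILL_PROCESS;     /* "normal" Oops: caller gets a segfault */
}
```

Calling decide_oops_outcome(CTX_PROCESS) yields OUTCOME_KILL_PROCESS,
which corresponds to the "Don't PANIC" case: only the offending process
dies and the rest of the system keeps running.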

For many years there have been several kernel macros for triggering such
events from the kernel programmer's side:
 - BUG() - unconditionally triggers an Oops (so a panic if it happens
   in interrupt context)
 - BUG_ON(condition) - think of it as a kernel "assert()"
 - WARN() - just the trace, no other ill effects
 - WARN_ON(condition) - you guessed it.
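
For a feel of how the WARN_ON()/BUG_ON() pair differs, here is a rough
userspace imitation in C. It is only a sketch under stated assumptions:
MODEL_WARN_ON, MODEL_BUG_ON, warn_count and name_is_acceptable are
hypothetical names, abort() merely stands in for the kernel's Oops, and
the message format is invented. The one kernel behaviour it leans on is
that WARN_ON() evaluates to the truth value of its condition, so it can
be used inside an if().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int warn_count;  /* how many model warnings have fired so far */

/* Report the problem and keep going; returns the condition's truth value. */
int model_warn_on(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    }
    return cond;
}

/* Report the problem and stop; abort() is a stand-in for the Oops. */
void model_bug_on(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        fprintf(stderr, "BUG: %s at %s:%d\n", expr, file, line);
        abort();
    }
}

#define MODEL_WARN_ON(cond) model_warn_on(!!(cond), #cond, __FILE__, __LINE__)
#define MODEL_BUG_ON(cond)  model_bug_on(!!(cond), #cond, __FILE__, __LINE__)

/* Example caller: complain about an over-long name, but survive it. */
int name_is_acceptable(const char *name, size_t max_len)
{
    assert(name != NULL);  /* plain userspace precondition check */

    if (MODEL_WARN_ON(strlen(name) > max_len))
        return 0;          /* warned, caller can recover */
    return 1;
}
```

With a limit of 10, name_is_acceptable("events", 10) passes silently,
while a longer name prints one warning and returns 0. Replacing
MODEL_WARN_ON with MODEL_BUG_ON in the caller would instead terminate
the program at the first over-long name, which is the trade-off between
the two macro families.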

[For a least-favorite example, try to create a work-queue with a name
 longer than ~10 characters -- you'll get an immediate panic; you can see
 the code somewhere in kernel/workqueue.c]

> And then there's this site which collects 
> oops reports (http://www.kerneloops.org/) which, judging by its sluggish 
> response, is a pretty busy project. Oopses keep flooding in.

1. That's an important project that gives a "heads-up" call for code
   regressions.
2. Fedora-12 now installs it by default, and integrates it into a
   new bug-reporting framework (ABRT, a very buggy piece of software in
   itself ;-) This enables sampling new kernels on a huge number of
   configurations and scenarios, which was not possible before.

> So, should I just take it cool and wait for a new kernel with this 
> fixed, ignoring these messages?

Yes, unless you have joined the "kernel digging" hobby. However, these
collaborative "bug-collection" frameworks increase the chance
of a quicker fix.

Cheers,

-- 
Oron Peled                                 Voice: +972-4-8228492
oron at actcom.co.il                  http://users.actcom.co.il/~oron
   __
  / /  (_)__  __ ____  __
 / /__/ / _ \/ // /\ \/ /  . . .  t h e   c h o i c e  o f   a
/____/_/_//_/\_,_/ /_/\_\              G N U   g e n e r a t i o n . . .


