[Haifux] Kernel oops, so what?

Sun Jan 17 13:22:03 MSK 2010

Thanks, Oron.

You didn't just join in, but finally gave the full answer I was looking for.

I tried to find a simple explanation of what these macros actually stand
for, and couldn't find a single place where they were explained as
simply as you put it.

   Eli

Oron Peled wrote:

> OK, I'll join in...
>
> On Friday, 15 January 2010 16:27:34 Eli Billauer wrote:
>   
>> Now I get an oops warning every now and then, but nothing really 
>> happens. And I wonder what is going on? Has the dreaded oops become 
>> something one can live with?
>>     
>
> Nothing really changed that much. An Oops is caused by an illegal operation
> in kernel space (e.g. null pointer dereference, division by zero, etc.).
> It always had two possible results:
>  1. If it happened in interrupt context -- a panic (prints a
>     backtrace on the console and freezes the machine).
>  2. If it happened in process context (e.g. in a system call), then
>     it's a "normal" Oops -- Don't PANIC ;-)
>     - The calling process is killed with a segfault (as we cannot continue
>       any valid computation).
>     - A normal context switch happens.
>     - Everything continues normally, with the *hope* that the bug
>       did not cause big collateral damage (e.g. memory corruption)
>       outside of the failed computation.
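
The two outcomes listed above can be written down as a tiny userspace model. This is only a sketch: the enum and the function name are made up for illustration and are not kernel APIs, and the real kernel looks at more than this single flag before deciding.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only (not kernel APIs). */
enum oops_outcome {
    OOPS_PANIC,         /* backtrace on the console, machine freezes */
    OOPS_KILL_PROCESS   /* offending process is killed, system carries on */
};

/* Model of the two cases: interrupt context vs. process context. */
static enum oops_outcome oops_outcome_for(bool in_interrupt_context)
{
    if (in_interrupt_context)
        return OOPS_PANIC;          /* case 1: nothing sane to return to */
    return OOPS_KILL_PROCESS;       /* case 2: a "normal" Oops */
}
```

Calling `oops_outcome_for(false)` models a fault inside a system call: only the calling process is lost.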
>
> For many years, there were several kernel macros for triggering such
> events from the kernel programmer side:
>  - BUG() - triggers an Oops unconditionally (a panic in interrupt context)
>  - BUG_ON(condition) - think of it as a kernel "assert()"
>  - WARN() - just the trace, no other ill effects
>  - WARN_ON() - you guessed it.
>
> [for an un-favorite example, try to create a work-queue with a name longer
>  than ~10 characters -- you'll get an immediate panic, you can see the
>  code somewhere in kernel/workqueue.c]
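
To get a feel for how the macros above behave, here is a rough userspace imitation. Everything in it is an assumption made for illustration: the `MY_` macros are stand-ins and not the kernel definitions, `abort()` plays the role of the Oops, `warn_count` exists only so the effect can be observed, and `NAME_LIMIT` is just the "~10" from the work-queue example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins, NOT the kernel definitions. */
static int warn_count;      /* hypothetical counter, makes warnings observable */

/* Like WARN_ON(): report the condition, carry on, and evaluate to 1 or 0. */
#define MY_WARN_ON(cond) \
    ((cond) ? (fprintf(stderr, "WARNING: %s at %s:%d\n", \
                       #cond, __FILE__, __LINE__), ++warn_count, 1) : 0)

/* Like BUG(): report and stop right here (abort() stands in for the Oops). */
#define MY_BUG() \
    do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)

/* Like BUG_ON(): an assert()-style check built on top of MY_BUG(). */
#define MY_BUG_ON(cond) \
    do { if (cond) MY_BUG(); } while (0)

#define NAME_LIMIT 10       /* assumed limit, taken from the "~10" above */

/* Hypothetical helper modelled on the work-queue name example. */
static int check_queue_name(const char *name)
{
    if (MY_WARN_ON(name == NULL))
        return -1;                          /* warned, but we keep running */
    MY_BUG_ON(strlen(name) > NAME_LIMIT);   /* too long: execution ends here */
    return 0;
}
```

A short name such as "events" passes, a NULL name only produces a warning, and a name longer than the limit terminates the program on the spot.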
>
>   
>> And then there's this site which collects 
>> oops reports (http://www.kerneloops.org/) which, judging by its sluggish 
>> response, is a pretty busy project. Oopses keep flooding in.
>>     
>
> 1. That's an important project that gives a "heads-up" call for code
>    regressions.
> 2. Fedora-12 now installs it by default, and integrates it into a
>    new bug-reporting framework (ABRT, a very buggy piece of software in
>    itself ;-). This enables sampling new kernels on a huge number of
>    configurations and scenarios, which was not possible before.
>
>   
>> So, should I just take it cool and wait for a new kernel with this 
>> fixed, ignoring these messages?
>>     
>
> Yes, unless you joined the "kernel digging" hobby. However, these
> collaborative "bug-collection" frameworks increase the chance
> of a quicker fix.
>
> Cheers,
>
>   

-- 
Web: http://www.billauer.co.il
