[Haifux] Kernel oops, so what?
Eli Billauer
eli at billauer.co.il
Sun Jan 17 13:22:03 MSK 2010
Thanks, Oron.
You didn't just join in, but finally gave the full answer I was looking for.
I tried to get a simple explanation of what these macros actually stand
for, and couldn't find a single place where they were explained as
simply as you put it.
Eli
Oron Peled wrote:
> OK, I'll join in...
>
> On Friday, 15 January 2010 16:27:34 Eli Billauer wrote:
>
>> Now I get an oops warning every now and then, but nothing really
>> happens. And I wonder what is going on? Has the dreaded oops become
>> something one can live with?
>>
>
> Nothing really changed that much. An Oops is caused by an illegal operation
> in kernel space (e.g. null pointer dereference, division by zero, etc.)
> It always had two possible results:
> 1. If it happened in interrupt context -- a panic (prints a
> backtrace on the console and freezes the machine).
> 2. If it happened in process context (e.g. in a system call), then
> it's a "normal" Oops -- Don't PANIC ;-)
> - The calling process is killed with a segfault (as we cannot continue
> any valid computation).
> - A normal context switch happens.
> - Everything continues normally, with the *hope* that the bug
> did not cause big collateral damage (e.g. memory corruption)
> outside of the failed computation.
>
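To check that I read the two cases correctly, here is a minimal user-space
sketch of the decision described above. It is a toy model, not kernel code:
oops_outcome_for() and the enum are names I made up, and the panic_on_oops
flag is my addition (it models the kernel.panic_on_oops sysctl, which the
text above implicitly assumes to be off).

```c
#include <stdbool.h>

/* Possible outcomes of an Oops in this simplified model. */
enum oops_outcome {
    OOPS_PANIC,        /* backtrace on the console, machine freezes */
    OOPS_KILL_PROCESS  /* offending process is killed, system carries on */
};

/*
 * Hypothetical helper, not a kernel function.
 * in_interrupt_ctx: the fault happened in interrupt context.
 * panic_on_oops:    models the kernel.panic_on_oops sysctl (assumed
 *                   to be off in the explanation above).
 */
enum oops_outcome oops_outcome_for(bool in_interrupt_ctx, bool panic_on_oops)
{
    if (in_interrupt_ctx)
        return OOPS_PANIC;       /* case 1: no process to kill instead */
    if (panic_on_oops)
        return OOPS_PANIC;       /* admin asked for a panic on any Oops */
    return OOPS_KILL_PROCESS;    /* case 2: the "normal" Oops */
}
```

With both flags false, as on a default desktop setup, the model returns
OOPS_KILL_PROCESS, which matches the "nothing really happens" experience.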
> For many years, there were several kernel macros for triggering such
> events from the kernel programmer side:
> - BUG() - triggers an Oops unconditionally (a panic, if in interrupt context)
> - BUG_ON(condition) - think of it as a kernel "assert()"
> - WARN() - just the trace, no other ill effects
> - WARN_ON() - you guessed it.
>
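For my own notes, a rough user-space analogue of the conditional macros.
This is only a sketch of the semantics as I understand them, not the kernel's
implementation: MY_WARN_ON, MY_BUG_ON, report() and warn_count are
hypothetical names, and abort() merely stands in for the Oops.

```c
#include <stdio.h>
#include <stdlib.h>

static int warn_count;  /* number of warnings reported so far */

/* Report a failed check with its location; hypothetical helper. */
static void report(const char *kind, const char *expr,
                   const char *file, int line)
{
    fprintf(stderr, "%s: (%s) at %s:%d\n", kind, expr, file, line);
}

/* Report if cond is set, keep going, and return cond as 0 or 1. */
static int warn_on_impl(int cond, const char *expr,
                        const char *file, int line)
{
    if (cond) {
        warn_count++;
        report("WARNING", expr, file, line);
    }
    return cond != 0;
}

/* Analogue of WARN_ON(): usable as  if (MY_WARN_ON(x)) return; */
#define MY_WARN_ON(cond) \
    warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Analogue of BUG_ON(): report and stop; abort() stands in for the Oops. */
#define MY_BUG_ON(cond)                                  \
    do {                                                 \
        if (cond) {                                      \
            report("BUG", #cond, __FILE__, __LINE__);    \
            abort();                                     \
        }                                                \
    } while (0)
```

The difference that matters is what happens after the message: the WARN
flavour returns to the caller, the BUG flavour never does.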
> [for an un-favorite example, try to create a work-queue with a name longer
> than ~10 characters -- you'll get an immediate panic, you can see the
> code somewhere in kernel/workqueue.c]
>
>
>> And then there's this site which collects
>> oops reports (http://www.kerneloops.org/) which, judging by its sluggish
>> response, is a pretty busy project. Oopses keep flooding in.
>>
>
> 1. That's an important project that gives a "heads-up" call for code
> regressions.
> 2. Fedora-12 now installs it by default, and integrates it into a
> new bug-reporting framework (ABRT, a very buggy piece of software in itself ;-)
> This enables sampling new kernels on a huge number of configurations
> and scenarios, which was not possible before.
>
>
>> So, should I just take it cool and wait for a new kernel with this
>> fixed, ignoring these messages?
>>
>
> Yes, unless you joined the "kernel digging" hobby. However, these
> collaborative "bug-collection" frameworks increase the chance
> for a quicker fix.
>
> Cheers,
>
>
--
Web: http://www.billauer.co.il