Tuesday, March 27, 2012

Object Initialization in C99

At a get-together with some friends from linuxfb, we chatted about many topics: interesting, boring, useful, and nonsensical. One of them made me think more deeply, and I wanted to write something down about it.

The background: Li Kai (@leekayak) reported a problem in which an allocated memory area was not initialized to zero. Someone told him this was because the kernel returns uninitialized memory through brk(). Coly (@colyli, @淘泊松) corrected this: that is impossible, because ever since the early 0.9 versions of the Linux kernel, memory handed to user space has always been initialized; otherwise there would be security issues. The uninitialized memory more likely comes from the libc memory allocator, which manages chunks of memory on its own. Memory freed by the application is not necessarily returned to the kernel, so when a later request gets the same memory back, it still holds its old contents.
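As an aside, the practical consequence for application code is that malloc() promises nothing about the contents of the block it returns, so a caller who needs zeros has to ask for them. The sketch below shows the two usual routes; alloc_zeroed(), all_zero() and demo_zeroed_allocation() are hypothetical helper names made up for this illustration, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n bytes and clear them explicitly,
 * because malloc() makes no promise about the block's contents. */
void *alloc_zeroed(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}

/* Returns 1 if every byte in buf[0..n) is zero, 0 otherwise. */
int all_zero(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/* Usage sketch: malloc()+memset() and calloc() both give zero-filled memory.
 * Returns 1 on success, 0 if an allocation failed. */
int demo_zeroed_allocation(void)
{
    unsigned char *a = alloc_zeroed(64);
    unsigned char *b = calloc(64, 1);   /* calloc() zero-fills by definition */

    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return 0;
    }
    assert(all_zero(a, 64));
    assert(all_zero(b, 64));
    free(a);
    free(b);
    return 1;
}
```

Reading a plain malloc() block before writing to it is deliberately left out of the sketch, since the values seen would be indeterminate.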

The discussion ended there, but later I started to think about when a programmer should initialize variables. After digging for a while, I found the following:

Storage classes

Specifier    Lifetime                        Scope                      Default initializer
auto         Block (stack)                   Block                      Uninitialized
register     Block (stack or CPU register)   Block                      Uninitialized
static       Program                         Block or compilation unit  Zero
extern       Program                         Block or compilation unit  Zero
(none) [1]   Dynamic (heap)

[1] Allocated and deallocated using the malloc() and free() library functions.
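To make the table concrete, here is a minimal sketch of the static and auto rows. All variable and function names are invented for the example. The objects with static storage duration are deliberately left without initializers, while the automatic variable gets an explicit one because it has no default.

```c
#include <assert.h>
#include <stddef.h>

int file_scope_counter;           /* external linkage, static duration: starts at 0 */
static double file_local_ratio;   /* internal linkage, static duration: starts at 0.0 */
static char *file_local_name;     /* static duration pointer: starts as NULL */

/* Returns 1 if the uninitialized file-scope objects hold their zero defaults. */
int statics_start_at_zero(void)
{
    return file_scope_counter == 0
        && file_local_ratio == 0.0
        && file_local_name == NULL;
}

/* Counts its own calls; the block-scope static starts at 0 and
 * keeps its value between calls. */
int next_call_number(void)
{
    static int calls;
    return ++calls;
}

/* An automatic variable has no default value, so it is initialized explicitly. */
int sum_to(int n)
{
    int total = 0;   /* without "= 0" the starting value would be indeterminate */
    for (int i = 1; i <= n; i++)
        total += i;
    assert(total == n * (n + 1) / 2 || n < 1);
    return total;
}
```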

The C99 standard (section 6.7.8, Initialization) further specifies:

Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization.
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate, every member is initialized (recursively) according to these rules;
  • if it is a union, the first named member is initialized (recursively) according to these rules.
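The recursive rule for aggregates and the first-member rule for unions can be sketched as follows. The types struct point, struct shape and union value are hypothetical and exist only for this illustration; neither static object has an initializer.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x, y; };

struct shape {
    const char  *label;       /* pointer member: becomes a null pointer */
    double       scale;       /* arithmetic member: becomes 0.0 */
    struct point corners[2];  /* nested aggregate: zeroed member by member */
};

union value {
    long  as_long;   /* first named member: the one that is initialized */
    void *as_ptr;
};

static struct shape default_shape;   /* static storage, no initializer */
static union value  default_value;   /* static storage, no initializer */

/* Returns 1 if the implicit initialization reached every level. */
int static_aggregates_are_zeroed(void)
{
    assert(default_shape.label == NULL);
    return default_shape.label == NULL
        && default_shape.scale == 0.0
        && default_shape.corners[0].x == 0
        && default_shape.corners[0].y == 0
        && default_shape.corners[1].x == 0
        && default_shape.corners[1].y == 0
        && default_value.as_long == 0;
}
```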

One interesting consequence of the extra work the compiler does for programmers is that it can leave padding holes in structures, and those holes can leak information. Look at this example:
struct foo {
       int a,b;
} f = {.a=1,};
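The partial initializer above can be checked with a small sketch. It assumes the same struct foo and uses automatic objects, to show that members left out of an initializer list are still zeroed even on the stack; the function names are invented for the example.

```c
#include <assert.h>

struct foo {
    int a, b;
};

/* Only .a is named; the remaining member is initialized as if it
 * had static storage duration, i.e. to zero. */
int b_after_partial_init(void)
{
    struct foo f = { .a = 1 };   /* automatic storage */
    assert(f.a == 1);
    return f.b;
}

/* The same rule applies to array elements left out of the list. */
int sum_of_unlisted_elements(void)
{
    int v[4] = { [0] = 7 };
    return v[1] + v[2] + v[3];
}
```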
The initializer {.a=1,} is commonly written this way because C99 ensures that f.b is initialized to 0. But consider:
struct foo {
       short a;
       int b;
} f = {.a=1,};
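It also initializes f.b to 0, but the compiler typically inserts a two-byte padding hole between foo.a and foo.b so that b is aligned on a four-byte boundary (on 32-bit machines, for example), and the initializer is not required to set those padding bytes. That is usually harmless, but if the code is in the kernel and f is about to be copied to user space, the stale bytes in the hole leak kernel memory, which is a security hole. Therefore, when a structure has padding like this, one needs to use the memset() family to initialize the whole structure.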
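A minimal sketch of the memset() approach, assuming the same struct foo: clear the whole object first, padding included, then assign the members. foo_init() and foo_padding_is_zero() are hypothetical helpers written for this example. One caveat: strictly speaking the standard lets padding bytes take unspecified values whenever a member is stored, so this pattern relies on the common compiler behavior of leaving padding untouched on a member store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo {
    short a;
    int   b;
};

/* Clear every byte of the object, padding included, then set the members. */
void foo_init(struct foo *f, short a)
{
    assert(f != NULL);
    memset(f, 0, sizeof *f);
    f->a = a;
}

/* Returns 1 if all bytes between the end of a and the start of b are zero.
 * If the platform inserts no padding, the loop body never runs. */
int foo_padding_is_zero(const struct foo *f)
{
    const unsigned char *bytes = (const unsigned char *)f;
    for (size_t i = offsetof(struct foo, a) + sizeof f->a;
         i < offsetof(struct foo, b); i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}
```

A caller would run foo_init(&f, 1) before handing f to anything that copies the raw bytes, such as a write to a file or socket.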