Strict aliasing in GCC


A certain angry1 but otherwise well-regarded git once posted a bit of a rant2 about a tool that his project depended heavily on.

This post is not quite a response to that, since (besides being many years late) as far as I know, the tool behaved completely differently then than it does now.3

However, the message does still get quoted, too often.

some context

The C language standard is clear:

If an object has its stored value accessed other than by an lvalue of an allowable type, the behavior is undefined.4

In other words, if you write code like this5:

int f(void) {
    int *ip;
    double d = 3.0;
    ip = (int *)&d;
    return *ip;
}

then your program has no defined behaviour; it is meaningless6. A conformant C implementation may interpret this code however it likes. In fact, since any behaviour is correct behaviour, a clever implementation can act as if this code will never even be run, because if that does happen, anything the implementation does will be correct.
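For contrast, here is a minimal sketch of one well-defined way to look at an object's bytes as another type: copy them with memcpy rather than dereferencing a mismatched pointer. The name float_bits is hypothetical, and the sketch assumes that float is 32 bits wide, which it checks at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: float occupies 32 bits on this implementation. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: return the object representation of a float.
 * memcpy copies bytes, so no object is read through an lvalue of a
 * type that is not allowed to alias it. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The result depends on the platform's floating-point format, but the access itself is defined at every optimization level.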

Was this a good decision on the part of the C standardisation committee? Arguably not, but that’s beside the point. This is C, and if you write C code, then this is a matter that you need to understand, or it will get you sooner or later.7

gcc

One of the most popular implementations of the C language ever made is GCC: the GNU C Compiler (or GNU Compiler Collection)8. GCC is an amazing piece of technology and an absolutely massive body of software depends on it. Not only does it comprehensively and correctly9 implement the C language according to the standard, it does so efficiently and it can even perform optimizations on C code. A trivial optimization would be simplifying expressions, such as 3+4 to 7; more complex optimizations include memory reuse and reordering instructions. All this, and it is made available for free and comes with all the legal protections of a GNU free software license.10

The GCC developers agreed that the above example was a particularly subtle C language trap to avoid, and so they introduced a compiler setting — -fno-strict-aliasing — which instructs the compiler to be gentle and assume that the code might be… mistaken. Its counterpart, -fstrict-aliasing, specifically tells GCC that you are confident that you haven’t written any such bad code, and that it can use that assumption in making optimizations.
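To make "use that assumption in making optimizations" a little more concrete, here is a hedged sketch of the kind of function where the assumption matters. The names are hypothetical, and whether a given GCC version actually performs this particular optimization is not guaranteed; the point is only that it would be allowed to.

```c
#include <assert.h>

/* Hypothetical example: int and float are incompatible types, so a
 * compiler applying the strict aliasing rule may assume that ip and fp
 * point to different objects. */
int store_then_load(int *ip, float *fp) {
    *ip = 1;
    *fp = 0.0f;   /* may be assumed not to modify *ip */
    return *ip;   /* so this load may be folded to the constant 1 */
}

/* Well-defined use: the two pointers really do refer to distinct objects,
 * so the result is the same with or without the optimization. */
void demo_store_then_load(void) {
    int i = 0;
    float f = 1.0f;
    assert(store_then_load(&i, &f) == 1);
    assert(f == 0.0f);
}
```

A caller that passed the same address through both parameters (with a cast) would be writing the broken code described above, and only then could the two settings produce different results.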

default settings

Basically, with -fno-strict-aliasing, you are advising the compiler that you might have written some incorrect code, and to please be defensive with the optimizations it performs regarding aliasing.

Since this broken code has no required behaviour, both aggressively optimizing and cautious non-optimizing are compliant with the C language standard. In other words, GCC compiles C correctly per the standard with or without this option set.

However, code that only works with GCC with -fno-strict-aliasing is not correct C, and will likely be broken with a different correct C implementation.

Many people feel that -fno-strict-aliasing ought to be the default setting when compiling with GCC. I have news for those people: it is.

a typical sequence of events

C is difficult to write correctly. We do our best, but sometimes mistakes creep in. That’s okay: GCC is careful and seems to generally do what we want, even when we fail to express our intent properly. We don’t even notice when our code is incorrect, and quickly we come to depend on GCC’s clairvoyance.

But eventually, our program starts to grow big and clunky. We notice the start-up time. It doesn’t respond instantly to our input. That’s when, in the noble pursuit of faster execution, we enable optimizations with a setting like -O3.

The -fstrict-aliasing option is enabled at levels -O2, -O3, -Os.11

Uh oh — somebody failed to read the manual. The program crashed, the clients are angry, and the server room is on fire.

At this point, it is easy to assign blame to the compiler, especially when the aforementioned angry git’s message can be cited.

-fwrapv

Permit me a little detour, for a moment — I would like to provide another example.

If, in C, I try to store the value of the expression INT_MAX + INT_MAX12 into an object of type int, what should happen?

The C language standard says plainly that overflowing the maximum bound of a signed integer type is undefined behaviour. (Unsigned arithmetic is different: it is defined to wrap around.) A machine that does anything (or nothing!) is therefore compliant with everything the specification demands.

In an obvious case like this, the compiler could statically determine that the result would overflow. It could stop in its tracks and advise me that I’m doing something silly. However, it is not required to do this.

#include <limits.h>

int main(void) {
    return INT_MAX + INT_MAX;
}

GCC doesn’t actually prevent me from doing this, but it does alert me that something is amiss:

overblown.c: In function 'main':
overblown.c:4:20: warning: integer overflow in expression of type 'int' results in '-2' [-Woverflow]
    4 |     return INT_MAX + INT_MAX;
      |                    ^

Another possibility is for the compiler itself to define what will happen. To do this would go above and beyond what the C language standard requires.

GCC, for which “above and beyond” is basically the modus operandi, offers the option to define the semantics of signed integer overflow to wrap around using two's complement. All you need to do is pass -fwrapv.13 Thanks, GCC!
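If you would rather not depend on a compiler flag at all, the overflow can be avoided in portable C by checking before adding. What follows is a minimal sketch, not anything GCC provides: add_checked and its demo function are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: add two ints without ever overflowing.
 * Returns true and stores the sum in *out when the result fits in an int;
 * returns false and leaves *out untouched otherwise. */
bool add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* a + b would overflow */
    }
    *out = a + b;       /* safe: the sum is representable */
    return true;
}

void demo_add_checked(void) {
    int sum = 0;
    assert(add_checked(1, 2, &sum) && sum == 3);
    assert(!add_checked(INT_MAX, INT_MAX, &sum));
    assert(sum == 3);   /* unchanged after the failed addition */
}
```

The comparisons are arranged so that neither INT_MAX - b nor INT_MIN - b can itself overflow, which means this behaves identically with or without -fwrapv.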

the point

GCC is not to be blamed14 for the consequences of C’s strict aliasing rules — it does the correct thing in all cases. It correctly implements C, and, by default, even takes extra care when presented with broken code.

Then users complain that GCC does something unsafe with their broken program, after telling GCC to apply the standard’s aliasing rules in the strictest possible way to produce faster code.

There are two parties that can reasonably be blamed here: the programmer who wrote the incorrect program, and the standardisation committee that decided to make the language unsafe. GCC is not at fault.

Turning on -fno-strict-aliasing is a perfectly reasonable decision, especially if you are not confident that your program is correct.
