There's a common misconception that larger numbers after the -O option automatically produce "better" optimization. First, there is no universal definition of "better": optimization is often a trade-off between speed and code size. See the detailed discussion for which option affects which part of the code generation.

A test case was run on an ATmega128 to judge the effect of compiling the library itself with different optimization levels; the following table lists the results. The test case consisted of around 2 KB of strings to sort. Test #1 used qsort() with the standard library strcmp() as the comparison, while test #2 used a comparison function that sorted the strings by their length (and thus made two calls to strlen() per invocation).
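The FAQ doesn't show the test code itself, so the following is only a minimal sketch of what the two comparison functions might look like. It assumes the strings are held as an array of char pointers; the names cmp_lexical, cmp_length and sort_strings are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Test #1 style: lexical order using the library's strcmp().
 * qsort() passes pointers to the array elements, and each element is
 * itself a pointer to a string, hence the extra level of indirection. */
static int cmp_lexical(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* Test #2 style: order by length, calling strlen() twice per invocation. */
static int cmp_length(const void *a, const void *b)
{
    size_t la = strlen(*(const char *const *)a);
    size_t lb = strlen(*(const char *const *)b);
    /* Return -1, 0 or 1 without subtracting unsigned size_t values. */
    return (la > lb) - (la < lb);
}

/* Hypothetical helper: sort n string pointers with either comparator. */
static void sort_strings(const char **v, size_t n,
                         int (*cmp)(const void *, const void *))
{
    qsort(v, n, sizeof v[0], cmp);
}
```

Note that cmp_length always walks both strings to their terminating NUL, whereas strcmp() can stop at the first differing character.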

When comparing the resulting code sizes, note that a floating-point version of vfprintf() was linked into the binary (in order to print the elapsed time). It is not affected at all by the different optimization levels and adds about 2.5 KB to the code.

Optimization flags      Size of .text (bytes)   Time for test #1   Time for test #2
--------------------    ---------------------   ----------------   ----------------
-O3                     6898                    903 µs             19.7 ms
-O2                     6666                    972 µs             20.1 ms
-Os                     6618                    955 µs             20.1 ms
-Os -mcall-prologues    6474                    972 µs             20.1 ms

(The difference between 955 µs and 972 µs was just a single timer tick, so take it with a grain of salt.)

So generally, -Os -mcall-prologues seems to be the best all-round optimization level: it produced the smallest code while running as fast as -O2 and -Os. Only applications that need the last few percent of speed benefit from -O3, at the cost of a larger .text section.
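To put "the last few percent" in numbers, this sketch recomputes each row of the table relative to -Os -mcall-prologues. It only does arithmetic on the published figures; the struct and function names are made up for illustration.

```c
#include <stdio.h>

/* Rows copied from the results table; times converted to microseconds. */
struct result {
    const char *flags;
    unsigned text_size;  /* size of .text in bytes */
    double t1_us;        /* time for test #1 */
    double t2_us;        /* time for test #2 */
};

static const struct result results[] = {
    { "-O3",                  6898,  903.0, 19700.0 },
    { "-O2",                  6666,  972.0, 20100.0 },
    { "-Os",                  6618,  955.0, 20100.0 },
    { "-Os -mcall-prologues", 6474,  972.0, 20100.0 },
};

#define N_RESULTS (sizeof results / sizeof results[0])

/* Relative difference of `value` against `ref`, in percent
 * (negative means smaller or faster than the reference). */
static double pct_diff(double value, double ref)
{
    return 100.0 * (value - ref) / ref;
}

/* Print every row relative to the row at index `ref`. */
static void print_relative(size_t ref)
{
    const struct result *r0 = &results[ref];
    for (size_t i = 0; i < N_RESULTS; i++) {
        const struct result *r = &results[i];
        printf("%-22s size %+6.2f%%  test#1 %+6.2f%%  test#2 %+6.2f%%\n",
               r->flags,
               pct_diff(r->text_size, r0->text_size),
               pct_diff(r->t1_us, r0->t1_us),
               pct_diff(r->t2_us, r0->t2_us));
    }
}
```

Called as print_relative(3), it shows -O3 running test #1 about 7% faster and test #2 about 2% faster, in exchange for a .text section about 6.5% larger. If one timer tick is roughly 17 µs (the 955/972 µs gap), the test #1 advantage amounts to only about four ticks.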
