Starting point
Recently, a unit-test failed on one of the older build-machines which still uses more or less vanilla gcc 4.9.2.
After unwrapping the unit-test from the test-suite and removing
- templating
- classes
the test-case looks like this:
#include <cstddef> // size_t #include <cstdint> // uint16_t #include <climits> // CHAR_BIT #include <vector> void add_int16(std::vector<uint8_t> &v, uint16_t value, size_t length = sizeof(uint16_t)) { // reserve(size() + length); while(length-- > 0) { v.push_back(static_cast<uint8_t>(value)); value = static_cast<uint16_t>(value >> CHAR_BIT); } } int main() { std::vector<uint8_t> p; // push_back(0x03) // push_back(0x02) add_int16(p, static_cast<uint16_t>(0x0203)); // push_back(0x01) // push_back(0x00) add_int16(p, static_cast<uint16_t>(0x0001)); // return 0 if 'old' behaviour, 1 if 'new' behaviour // for git bisect return p[p.size()-1] == 0x00; }
I want to add a 16bit unsigned integer to a byte-stream and treat it as little endian. I expect:
0x03 0x02 0x01 0x00
But with gcc 4.9.2 and CXXFLAGS being -O2 I get:
0x03 0x02 0x01 0x01 ^^^^ WTF!
What is going on?
Using objdump allows to compare the generate assembly code.
$ objdump --disassemble -disassembler-options=intel --demangle gives us:
... # the loop is unrolled # push_back() calls emplace_back() in the end # <= 0x03 8051182: c6 44 24 12 03 mov BYTE PTR [esp+0x12],0x3 8051187: e8 44 fe ff ff call 8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)> 805118c: 89 74 24 04 mov DWORD PTR [esp+0x4],esi 8051190: 89 1c 24 mov DWORD PTR [esp],ebx # <= 0x02 8051193: c6 44 24 12 02 mov BYTE PTR [esp+0x12],0x2 8051198: e8 33 fe ff ff call 8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)> 805119d: 8d 74 24 13 lea esi,[esp+0x13] 80511a1: 89 74 24 04 mov DWORD PTR [esp+0x4],esi 80511a5: 89 1c 24 mov DWORD PTR [esp],ebx # <= 0x01 80511a8: c6 44 24 13 01 mov BYTE PTR [esp+0x13],0x1 80511ad: e8 1e fe ff ff call 8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)> 80511b2: 89 74 24 04 mov DWORD PTR [esp+0x4],esi 80511b6: 89 1c 24 mov DWORD PTR [esp],ebx # ... where is the zero? 80511b9: e8 12 fe ff ff call 8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)>
What we know about the bug
To narrow down the problem a few attempts have been made. The above code works if:
- other optimization levels are used: -O0, -O1, -O3 are fine
- a later GCC version is used: gcc 4.9.3-13ubuntu2 passes with -O2
- inlining on add_int16 is disabled with __attribute__((optimize("no-inline-small-functions")))
- the last byte to push_back() isn't 0x00
- memory is reserve()-ed before the push_back()
- the byte to push_pack() is assigned to a local variable
At this point it is hard to tell what real problem is and if applying a local fix to the unit-test like __attribute__((optimize(1))) is the right way. It makes the unit-test pass, but as this is a compiler bug it may sneak out in other places where the unit-test isn't as specific.
Narrowing down the fixed revision
There are a few options:
- ban known bad compilers
- apply a workaround for known bad compilers
But what are the known bad compilers?
- It is already known that gcc 4.9.3 with patches from Ubuntu works.
- A quick test with vanilla gcc 4.9.2 and gcc 4.9.3 built from source shows 4.9.2 is bad, 4.9.3 is good. No need to investigate the Ubuntu patches.
Somewhere between 4.9.2 and 4.9.3. Time to pull the sources from git (https://gcc.gnu.org/wiki/GitMirror) and let git bisect do its work.
git bisect
- takes a revision with the 'old' behaviour and a revision that has the 'new' behaviour
- splits that range in half and checks out that revision
- if the resulting build shows the 'old' behaviour, there is no need to check that side of revisions anymore
- rinse and repeat
Setting up git bisect based on the release tags from gcc:
$ cd gcc-bisect-src $ git bisect start $ git bisect old gcc-4_9_2-release $ git bisect new gcc-4_9_3-release Bisecting: 324 revisions left to test after this (roughly 8 steps)
Building gcc is done for each round with a script:
cd ../gcc-bisect-build # build and install gcc ../gcc-bisect-src/configure --prefix=`pwd`/../gcc-bisect-inst/ --disable-multilib --enable-languages=c,c++ make clean # we have 8 cores make -j 10 make install # make a test-build ../gcc-bisect-inst/bin/g++ -g -O2 -std=c++11 ~/pack_int_2.cc -o pack_int_2 # run the test: returns 0 on old behaviour, 1 on new behaviour ./pack_int_2
Giving the script to git bisect run and let it run for a while gives:
53ee2ca7887aa98bf163fc040ca2586cfc3acd9c is the first new commit commit 53ee2ca7887aa98bf163fc040ca2586cfc3acd9c Author: tejohnson <tejohnson@138bc75d-0d04-0410-961f-82ee72b054a4> Date: Thu Nov 13 21:51:11 2014 +0000 2014-11-13 Teresa Johnson <tejohnson@google.com> gcc: PR tree-optimization/63841 * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers. 2014-11-13 Teresa Johnson <tejohnson@google.com> gcc/testsuite: PR tree-optimization/63841 * testsuite/g++.dg/tree-ssa/pr63841.C: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch@217522 138bc75d-0d04-0410-961f-82ee72b054a4 :040000 040000 d7e96ffc45e0807734c8d5d0eaeb4427c5a82ac3 ce703eb89f8e89c48c53c082aaeef74781764456 M gcc bisect run success real 49m28.065s user 179m33.260s sys 7m55.424s
The workaround
The log message from git bisect points us to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63841 which shows a simpler test-case, but it matches the idea of our test sample.
Only
- 4.8.x with x < 4
- 4.9.x with x < 3
are affected.
In both cases -fdisable-tree-strlen disables the small part in the compiler that triggers this behaviour.
That can be used to apply a good workaround to the build-setup with cmake:
IF(CMAKE_CXX_COMPILER_ID EQUAL "GNU" AND ((CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.8" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8.4") OR (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.9" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.9.3"))) # workaround https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63841 SET(CXXFLAGS "${CXXFLAGS} -fdisable-tree-strlen") ENDIF()
Comments