Starting point

Recently, a unit-test failed on one of the older build-machines which still uses more or less vanilla gcc 4.9.2.

After unwrapping the unit-test from the test-suite and removing

  • templating
  • classes

the test-case looks like this:

#include <cstddef>  // size_t
#include <cstdint>  // uint16_t
#include <climits>  // CHAR_BIT

#include <vector>

void add_int16(std::vector<uint8_t> &v, uint16_t value, size_t length = sizeof(uint16_t)) {
  // reserve(size() + length);
  while(length-- > 0) {
    v.push_back(static_cast<uint8_t>(value));
    value = static_cast<uint16_t>(value >> CHAR_BIT);
  }
}


int main() {
  std::vector<uint8_t> p;

  // push_back(0x03)
  // push_back(0x02)
  add_int16(p, static_cast<uint16_t>(0x0203));
  // push_back(0x01)
  // push_back(0x00)
  add_int16(p, static_cast<uint16_t>(0x0001));

  // return 0 if 'old' behaviour, 1 if 'new' behaviour
  // for git bisect
  return p[p.size()-1] == 0x00;
}

I want to add a 16bit unsigned integer to a byte-stream and treat it as little endian. I expect:

0x03 0x02 0x01 0x00

But with gcc 4.9.2 and CXXFLAGS being -O2 I get:

0x03 0x02 0x01 0x01
               ^^^^ WTF!

What is going on?

Using objdump allows to compare the generate assembly code.

$ objdump --disassemble -disassembler-options=intel --demangle gives us:

...
# the loop is unrolled
# push_back() calls emplace_back() in the end
# <= 0x03
8051182:       c6 44 24 12 03          mov    BYTE PTR [esp+0x12],0x3
8051187:       e8 44 fe ff ff          call   8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)>
805118c:       89 74 24 04             mov    DWORD PTR [esp+0x4],esi
8051190:       89 1c 24                mov    DWORD PTR [esp],ebx

# <= 0x02
8051193:       c6 44 24 12 02          mov    BYTE PTR [esp+0x12],0x2
8051198:       e8 33 fe ff ff          call   8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)>
805119d:       8d 74 24 13             lea    esi,[esp+0x13]
80511a1:       89 74 24 04             mov    DWORD PTR [esp+0x4],esi
80511a5:       89 1c 24                mov    DWORD PTR [esp],ebx

# <= 0x01
80511a8:       c6 44 24 13 01          mov    BYTE PTR [esp+0x13],0x1
80511ad:       e8 1e fe ff ff          call   8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)>
80511b2:       89 74 24 04             mov    DWORD PTR [esp+0x4],esi
80511b6:       89 1c 24                mov    DWORD PTR [esp],ebx

# ... where is the zero?
80511b9:       e8 12 fe ff ff          call   8050fd0 <void std::vector<unsigned char, std::allocator<unsigned char> >::emplace_back<unsigned char>(unsigned char&&)>

What we know about the bug

To narrow down the problem a few attempts have been made. The above code works if:

  • other optimization levels are used: -O0, -O1, -O3 are fine
  • a later GCC version is used: gcc 4.9.3-13ubuntu2 passes with -O2
  • inlining on add_int16 is disabled with __attribute__((optimize("no-inline-small-functions")))
  • the last byte to push_back() isn't 0x00
  • memory is reserve()-ed before the push_back()
  • the byte to push_pack() is assigned to a local variable

At this point it is hard to tell what real problem is and if applying a local fix to the unit-test like __attribute__((optimize(1))) is the right way. It makes the unit-test pass, but as this is a compiler bug it may sneak out in other places where the unit-test isn't as specific.

Narrowing down the fixed revision

There are a few options:

  • ban known bad compilers
  • apply a workaround for known bad compilers

But what are the known bad compilers?

  • It is already known that gcc 4.9.3 with patches from Ubuntu works.
  • A quick test with vanilla gcc 4.9.2 and gcc 4.9.3 built from source shows 4.9.2 is bad, 4.9.3 is good. No need to investigate the Ubuntu patches.

Somewhere between 4.9.2 and 4.9.3. Time to pull the sources from git (https://gcc.gnu.org/wiki/GitMirror) and let git bisect do its work.

git bisect

  • takes a revision with the 'old' behaviour and a revision that has the 'new' behaviour
  • splits that range in half and checks out that revision
  • if the resulting build shows the 'old' behaviour, there is no need to check that side of revisions anymore
  • rinse and repeat

Setting up git bisect based on the release tags from gcc:

$ cd gcc-bisect-src
$ git bisect start
$ git bisect old gcc-4_9_2-release
$ git bisect new gcc-4_9_3-release
Bisecting: 324 revisions left to test after this (roughly 8 steps)

Building gcc is done for each round with a script:

cd ../gcc-bisect-build
# build and install gcc
../gcc-bisect-src/configure --prefix=`pwd`/../gcc-bisect-inst/ --disable-multilib --enable-languages=c,c++
make clean
# we have 8 cores
make -j 10
make install
# make a test-build
../gcc-bisect-inst/bin/g++ -g -O2 -std=c++11 ~/pack_int_2.cc -o pack_int_2
# run the test: returns 0 on old behaviour, 1 on new behaviour
./pack_int_2

Giving the script to git bisect run and let it run for a while gives:

53ee2ca7887aa98bf163fc040ca2586cfc3acd9c is the first new commit
commit 53ee2ca7887aa98bf163fc040ca2586cfc3acd9c
Author: tejohnson <tejohnson@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Nov 13 21:51:11 2014 +0000

    2014-11-13  Teresa Johnson  <tejohnson@google.com>

    gcc:
        PR tree-optimization/63841
        * tree-ssa-strlen.c (strlen_optimize_stmt): Ignore clobbers.

    2014-11-13  Teresa Johnson  <tejohnson@google.com>

    gcc/testsuite:
        PR tree-optimization/63841
        * testsuite/g++.dg/tree-ssa/pr63841.C: New test.


    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch@217522 138bc75d-0d04-0410-961f-82ee72b054a4

:040000 040000 d7e96ffc45e0807734c8d5d0eaeb4427c5a82ac3 ce703eb89f8e89c48c53c082aaeef74781764456 M      gcc
bisect run success

real    49m28.065s
user    179m33.260s
sys     7m55.424s

The workaround

The log message from git bisect points us to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63841 which shows a simpler test-case, but it matches the idea of our test sample.

Only

  • 4.8.x with x < 4
  • 4.9.x with x < 3

are affected.

In both cases -fdisable-tree-strlen disables the small part in the compiler that triggers this behaviour.

That can be used to apply a good workaround to the build-setup with cmake:

IF(CMAKE_CXX_COMPILER_ID EQUAL "GNU" AND
   ((CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.8" AND
     CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.8.4") OR
    (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.9" AND
     CMAKE_CXX_COMPILER_VERSION VERSION_LESS "4.9.3")))
  # workaround https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63841
  SET(CXXFLAGS "${CXXFLAGS} -fdisable-tree-strlen")
ENDIF()

Comments

Enable javascript to load comments.