  1. May 29, 2018
    • Introduce support for dynamic binaries · 31151436
      Alessandro Di Federico authored
      This commit introduces support for dynamic programs. The current
      implementation translates the main binary and uses native libraries.
      This works only if the target architecture is the same as the source
      one. Currently we only handle x86-64.
      
      * The `ExternalJumpsHandler` class has been introduced. It extends the
        dispatcher to handle the case in which the program counter holds an
        address outside the range of executable addresses of the input
        program. In this case, a `setjmp` is performed, the CPU state is
        serialized to physical registers, and a jump to the value of the
        program counter is performed.

        When the target code tries to return to the translated program, a
        segmentation fault is triggered, a `longjmp` is performed, and the
        CPU state is deserialized so that execution can resume (from the
        dispatcher).
      
      * `early-linked.c` has been introduced. Its purpose is to provide
        declarations of variables and functions defined in `support.c`. In the
        past, we had to create these declarations manually, a cumbersome and
        error-prone process that we now avoid by letting `clang` compile
        `early-linked.c` and then linking it in.
      
      * The old `support.h` is now known as `commonconstants.h`. `support.h`
        now contains declarations that have to be consumed by
        `early-linked.c`.
      
      * Each architecture now provides additional information:
      
        1. Which registers are part of the ABI and have to be preserved. If
           necessary, the QEMU name can be provided. For each register, it's
           also possible to provide its position within the `mcontext_t`
           structure provided by the signal handler.
        2. Three assembly snippets: one to write a register, one to read it,
           and one to perform an indirect jump.
      
        Some of this information is also exposed in the output module as
        metadata.
      
      * `support.c` now installs a SIGSEGV signal handler. Since pages that
        were originally executable are no longer executable, jumping there
        (typically, from a library) triggers a SIGSEGV that we handle. This
        allows us to properly deserialize the CPU state and resume execution
        of the translated code.
      
      * A dynamic version of each test program is now also translated and
        tested.
      
      * The `merge-dynamic.py` script has been introduced: it takes care of
        rewriting the translated binary so as to tell the linker to perform
        both the relocations of the translated program and those of the
        original program. It does so by rewriting a large portion of the
        sections employed by the dynamic linker, such as `.dynamic`, `.dynsym`
        and so on.
      
      * The `compile-time-constants.py` script has been introduced: it runs a
        user-specified compiler on a source file, producing an object
        file. This object file is inspected and the values of its global
        read-only variables are emitted as a CSV.
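The round trip described in the `ExternalJumpsHandler` and SIGSEGV bullets can be sketched in C. This is a minimal, hypothetical illustration, not the code in `support.c`: it uses `sigsetjmp`/`siglongjmp` (rather than plain `setjmp`/`longjmp`) so that the signal mask is restored after jumping out of the handler, `raise(SIGSEGV)` stands in for native code returning to a page that is no longer executable, the (de)serialization of the CPU state is reduced to comments, and reading `REG_RIP` from `mcontext_t` assumes x86-64 Linux.

```c
#define _GNU_SOURCE /* exposes REG_RIP on glibc/musl */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: none of these identifiers are taken from support.c. */
static sigjmp_buf dispatcher_context;        /* where the dispatcher resumes */
static volatile sig_atomic_t faults_handled; /* number of intercepted faults */
static volatile uintptr_t last_fault_pc;     /* PC saved in mcontext_t       */

/* SIGSEGV handler: record where the fault happened, then unwind to the
   point saved before leaving translated code. */
static void segv_handler(int sig, siginfo_t *info, void *raw_context) {
  (void)sig;
  (void)info;
#if defined(__x86_64__) && defined(REG_RIP)
  ucontext_t *context = raw_context;
  last_fault_pc = (uintptr_t)context->uc_mcontext.gregs[REG_RIP];
#else
  (void)raw_context;
#endif
  faults_handled++;
  siglongjmp(dispatcher_context, 1);
}

static int install_segv_handler(void) {
  struct sigaction action;
  memset(&action, 0, sizeof action);
  action.sa_sigaction = segv_handler;
  action.sa_flags = SA_SIGINFO;
  sigemptyset(&action.sa_mask);
  return sigaction(SIGSEGV, &action, NULL);
}

/* Leaves translated code for native code at `target`. Returns 1 when
   control came back through the SIGSEGV handler, 0 on a plain return. */
static int call_external(void (*target)(void)) {
  assert(target != NULL);
  if (sigsetjmp(dispatcher_context, 1) == 0) {
    /* Real code: serialize the CPU state to physical registers here. */
    target();
    return 0;
  }
  /* Real code: deserialize the CPU state and resume from the dispatcher. */
  return 1;
}

/* Stand-in for library code "returning" into a non-executable page. */
static void fake_library_return(void) { raise(SIGSEGV); }
```

After `install_segv_handler()`, each `call_external(fake_library_return)` returns `1` and increments `faults_handled`; because the signal mask is saved and restored, the handler keeps working across repeated faults.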
    • Introduce `printf` test using `%f` · 9c2bb85f
      Alessandro Di Federico authored
      This test will be useful to verify that calls to external libraries
      with floating-point arguments work as expected.
    • support.c: ensure 256-bit alignment of the stack · 79013ee6
      Alessandro Di Federico authored
      This ensures floating-point memory accesses work properly on x86. In
      the past this worked only in certain cases, depending on the number of
      entries in the auxiliary vector, arguments and environment variables.
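256 bits are 32 bytes, so the fix amounts to rounding the stack pointer down to a multiple of 32 once everything has been reserved on the stack. A minimal sketch of that arithmetic, assuming a downward-growing stack and 8-byte entries; `align_stack_pointer` and `initial_stack_pointer` are hypothetical helpers, not the functions in `support.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256 bits = 32 bytes. */
#define STACK_ALIGNMENT ((uintptr_t)(256 / 8))

/* Rounds a downward-growing stack pointer down to a 32-byte boundary. */
static uintptr_t align_stack_pointer(uintptr_t sp) {
  return sp & ~(STACK_ALIGNMENT - 1);
}

/* Hypothetical helper: reserve `slots` 8-byte entries (auxiliary vector,
   argv, envp, ...) below `stack_top`, then align. The result is 32-byte
   aligned whatever the number of entries; any slack becomes padding. */
static uintptr_t initial_stack_pointer(uintptr_t stack_top, size_t slots) {
  assert(slots <= stack_top / sizeof(uint64_t));
  return align_stack_pointer(stack_top - slots * sizeof(uint64_t));
}
```

Aligning after the reservation, rather than before it, is what makes the result independent of how many auxiliary-vector entries, arguments and environment variables are present.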
    • Alessandro Di Federico
      d169d37c
    • Output a CSV file with the required libraries · aa382551
      Alessandro Di Federico authored
      This commit makes `revamb` produce a new file, `.ll.need.csv`,
      containing a list of all the dynamic libraries required by the input
      program. This list will be transformed by the `csv-to-ld-options`
      script (was: `li-csv-to-ld-options`) into the appropriate linking
      options.
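The exact format of `.ll.need.csv` and the options the script emits are not spelled out here. A minimal sketch, assuming one library name per line and using GNU ld's `-l:<filename>` form (which links a library by its exact file name), could look like this; `csv_to_ld_options` is a hypothetical C rendering, not the actual script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical converter: one library name per line in `csv` becomes one
   "-l:<name>" option. Returns the number of options written to `out`,
   or -1 if `out` is too small. */
static int csv_to_ld_options(const char *csv, char *out, size_t out_size) {
  size_t used = 0;
  int count = 0;
  assert(out_size > 0);
  out[0] = '\0';
  while (*csv != '\0') {
    size_t length = strcspn(csv, "\r\n"); /* length of the current line */
    if (length > 0) {
      int written = snprintf(out + used, out_size - used, "%s-l:%.*s",
                             count > 0 ? " " : "", (int)length, csv);
      if (written < 0 || (size_t)written >= out_size - used)
        return -1;
      used += (size_t)written;
      count++;
    }
    csv += length;
    csv += strspn(csv, "\r\n"); /* skip the line terminator(s) */
  }
  return count;
}
```

For example, the input `"libc.so.6\nlibm.so.6\n"` produces `-l:libc.so.6 -l:libm.so.6`.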
    • Alessandro Di Federico
      0bd4204e
    • Convert `bool` arguments to `int` · 470ca264
      Alessandro Di Federico authored
      The `argparse` library treats boolean arguments as
      integers. Specifically, each time a boolean argument is met, the
      associated variable is incremented. This led to odd behavior, such as
      `2` being converted to `false`. Using `int` as the type solves this
      issue.
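The pattern behind the fix can be illustrated with a small, hypothetical C model of the increment-on-occurrence semantics described above (these are not `argparse` functions): the option is stored in an `int`, and any non-zero value means the option was given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback modelling the behavior described above: every
   occurrence of a boolean option increments the associated int. */
static void on_boolean_option(int *counter) {
  assert(counter != NULL);
  (*counter)++;
}

/* With an int, any non-zero count reliably means "option given". */
static int option_given(int counter) { return counter != 0; }
```

Passing the same flag twice leaves the counter at `2`, which `option_given` still reports as enabled.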
    • Disable warning about virtual destructors · 5f8249ce
      Alessandro Di Federico authored
      This commit disables a warning emitted by recent clang versions that is
      triggered on libstdc++.
  2. Apr 22, 2018
    • Alessandro Di Federico
      3ec2148f
    • Introduce the Function Isolation Pass · cf42e497
      Andrea Gussoni authored
      This commit introduces the Function Isolation Pass. We use the
      information provided by the Function Boundaries Detection Pass to
      split the code that `revamb` places inside the `root` function into
      different LLVM functions. To do this, we need to introduce some changes
      and tricks to handle the execution of the translated program.
      
      The main idea is to have two different realms: one where the isolated
      functions live, and one containing, basically, the old root function.
      We start the execution in the realm of the *non-isolated* functions
      and transfer the execution to the *isolated functions* realm as soon as
      possible. We then have a fallback mechanism to restore the execution at
      the right place in the *non-isolated* functions realm, and so on.
      
      The largest change, besides the re-organization of the code in different
      functions, is the use of the exception handling mechanism provided by
      the LLVM framework in order to be able to manage the switch between the
      two realms.
      
      We also introduce the `support.h` header file, which contains a few
      definitions used by `support.c` that need to be shared with some of
      the components involved in the translation process. We have defined
      some helper functions, directly in C, that we use both for handling the
      exception mechanism and for giving extra debug information when an
      exception is raised.
      
      The `revamb-dump` utility now supports the `-i` option to specify the
      path where the new LLVM module is saved.
      
      The `translate` utility now supports the `-i` option that produces a
      binary in which the function isolation has been applied.
      
      We also introduced some tests that apply the function isolation pass to
      the `Runtime/` tests already present. In this way we can verify that the
      translation and the subsequent function isolation preserve the behavior
      of the program.
      
      When serializing the new LLVM module, we regenerate the metadata used
      for debug purposes. Since we no longer have only the `root` function,
      we have changed some details in the `DebugHelper` class so that it can
      emit the metadata for all the functions of interest in a single shot.
  3. Mar 19, 2018
    • Use `getCallTo` in `getBasicBlockPC` · 171db849
      Andrea Gussoni authored
      We now take advantage of the `getCallTo` helper function in the
      `getBasicBlockPC` function defined in `ir-helpers.h`.
    • Fix `ReturnPC` type · 608b96b5
      Andrea Gussoni authored
      Fixed a bug that always allocated an `i32` to represent the return
      address PC in the `function_call` helper (third parameter).
      
      Now we allocate an `i64` or an `i32` depending on the input architecture
      of the translated binary.
  4. Feb 01, 2018
  5. Jan 28, 2018
  6. Jan 18, 2018
    • Remove useless type `GenericFunctor` · 5d9aa31c
      Pietro Fezzardi authored
      `GenericFunctor` is replaced by the `std::integral_constant`
      template. This also allows us to remove code that requires C++14.
      
      It also removes the now useless CMake tests on the compiler flag
      `-Wno-error=noexcept-type`, which were introduced to make the warnings
      on the type `GenericFunctor` non-fatal. Now that this type has been
      removed, the check is no longer necessary, because the
      `std::integral_constant` template does not trigger the warning.
      So we can go back to enabling fatal warnings.
  7. Jan 17, 2018
    • Remove leading dot from segment variable names · 97ed77d2
      Andrea Gussoni authored
      Removed the leading `.` from the names of the global variables
      representing the segments of the binary, in order to prevent errors due
      to duplicated names when recompiling a binary with `llc` in debug mode.
    • Fix `li-csv-to-ld-options` warning message · 3aed484a
      Alessandro Di Federico authored
      We used to check whether the lowest segment of the input binary is at
      least as high as the value of `/proc/sys/vm/mmap_min_addr`. If this is
      not the case, the linked program will segfault at run time without much
      explanation.
      
      However, we actually also need to be able to map the page before the
      lowest page mapped by the original binary. The main reason for this is
      to have space for the (outer) ELF header.
      
      It turns out that on many distros the default minimum value is
      `0x10000`, which happens to be exactly the address at which ARM
      binaries map their lowest page. This led to no warning, but to a
      segfault at run time.
      
      The AWK script now checks for the correct value and also suggests a
      suitable value for `mmap_min_addr`.
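The corrected check (implemented in AWK in the actual script) boils down to the arithmetic below. This C sketch assumes a power-of-two page size such as 4 KiB, and the function names are hypothetical. With the ARM example above, a lowest segment at `0x10000` and 4 KiB pages require `mmap_min_addr` to be at most `0xf000`, so the common default of `0x10000` now triggers the warning.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Lowest address that must be mappable: the page before the lowest page
   of the input binary, which hosts the outer ELF header. */
static uint64_t required_mmap_min_addr(uint64_t lowest_segment,
                                       uint64_t page_size) {
  uint64_t lowest_page = lowest_segment & ~(page_size - 1);
  assert(lowest_page >= page_size); /* there must be a page below it */
  return lowest_page - page_size;
}

/* Returns 1, and suggests a value, if `mmap_min_addr` is too high. */
static int mmap_min_addr_too_high(uint64_t mmap_min_addr,
                                  uint64_t lowest_segment,
                                  uint64_t page_size) {
  uint64_t required = required_mmap_min_addr(lowest_segment, page_size);
  if (mmap_min_addr <= required)
    return 0;
  fprintf(stderr,
          "warning: /proc/sys/vm/mmap_min_addr should be at most 0x%" PRIx64
          "\n",
          required);
  return 1;
}
```

The old check compared `mmap_min_addr` against the lowest page itself, so `0x10000` against `0x10000` passed silently; comparing against the page below closes that gap.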
      
      In the future, we might want to create a new segment for the outer ELF
      header and position it elsewhere in the address space.
    • Refactored initialization of compilation flags · 4e871c6f
      Andrea Gussoni authored
      We now use a macro to add a series of compilation flags; the macro
      also takes care of checking that the flags are supported by the
      compiler.
      
      This patch has been developed by Alessandro Di Federico.
    • Add checks for `no-pie` flag for cross-compilers · 0735acd2
      Andrea Gussoni authored
      The check to see if a compiler supports the `no-pie` flag was done only
      for the main C compiler, and not for the cross-compilers used for
      creating the executables for the different supported architectures.
      
      This commit introduces the aforementioned missing checks.
      
      In addition, instead of hard-coding the flags to check in the
      CMakeLists file, we now have a list that we pass each time we
      instantiate a project for the cross-compilers, and we check for the
      availability of all the flags.
      
      In order to do this, we need to serialize and deserialize the list so
      that it is not "unpacked" when passed as an argument to the external
      project (CMake lists are implemented as `;`-separated strings).
      
      This commit also implements a fix, suggested in the merge request, for
      a line that mistakenly added the `TEST_CFLAGS` variable to the `NO_PIE`
      variable.
  8. Oct 29, 2017
    • Warn the user if `mmap_min_addr` is too high · abd3154c
      Alessandro Di Federico authored
      Many Linux distributions prevent programs from mapping memory pages at
      low addresses. This can lead the translated program to segfault without
      any additional explanation. This happens in particular with ARM
      binaries, which tend to have the first segment allocated at very low
      addresses.
      
      This behavior can be configured through
      `/proc/sys/vm/mmap_min_addr`. This commit introduces a warning to the
      user in the `li-csv-to-ld-options` script in case a segment with an
      address lower than `mmap_min_addr` is requested.
  9. Oct 28, 2017
  10. Aug 28, 2017
    • Add `-no-pie` to compiler flags · 55eb769a
      Pietro Fezzardi authored
      Add this flag to the flags used for Runtime tests and to the flags used
      in the translate script.
      
      Recent GCC versions (`gcc-7` and later) enable PIE by default, and
      `-fno-pie` alone is not enough to disable it: `-fno-pie` only affects
      code generation, while `-no-pie` is needed to link a non-PIE
      executable.
    • Make `-Wnoexcept-type` non-fatal, if present · 9824a1b6
      Pietro Fezzardi authored
      This warning was introduced with gcc-7 to (quoting the documentation)
      "Warn if the C++1z feature making noexcept part of a function type
      changes the mangled name of a symbol relative to C++14. Enabled by -Wabi
      and -Wc++1z-compat.".
      
      It is triggered by the `GenericFunctor` class template. It can safely
      be made non-fatal, because this class template is not exposed outside
      the project and the whole project is currently compiled with the same
      C++ standard compiler flags.
      
      This commit adds machinery to `CMakeLists.txt` to make the warning
      non-fatal, but only if the compiler supports it: disabling it when it
      is not supported would trigger build errors.
  11. Aug 13, 2017
    • Fix handling of devirtualized calls · 069ae70d
      Alessandro Di Federico authored
      When we have an indirect call (or jump) we are sometimes able to
      identify one or more possible targets, therefore, as an optimization,
      before performing the indirect jump we check if the target is one of the
      expected ones.
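Conceptually, the optimization turns an indirect jump into a chain of comparisons against the expected targets, with the dispatcher as a fallback. `revamb` performs it on LLVM IR; the C below is only a hypothetical illustration of the resulting control flow, with made-up addresses and function names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical translated code for the two targets identified statically. */
static int translated_at_0x1000(void) { return 1; }
static int translated_at_0x2000(void) { return 2; }

/* Stand-in for the generic dispatcher handling any other target. */
static int dispatcher(uint64_t pc) {
  (void)pc;
  return -1;
}

/* Indirect jump whose possible targets were identified as 0x1000 and
   0x2000: compare against them first, fall back to the dispatcher. */
static int indirect_jump(uint64_t pc) {
  if (pc == 0x1000)
    return translated_at_0x1000();
  if (pc == 0x2000)
    return translated_at_0x2000();
  return dispatcher(pc);
}
```

Expected targets are reached through direct calls, while any other value still goes through the dispatcher, so correctness does not depend on the list of targets being complete.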
      
      However, this optimization created two issues in the handling of
      indirect function calls: 1) the call to the `function_call` marker was
      no longer positioned right before the terminator, and 2) the function
      call was no longer identified as an indirect function call but as a
      call to `anyPC`. This commit fixes both issues.
      
      These issues have been identified thanks to a report from Andrea
      Gussoni.
  12. Aug 12, 2017