  1. Nov 14, 2018
  2. Oct 03, 2018
    • Abandon argparse in favor of LLVM's `CommandLine` · cc02713f
      Alessandro Di Federico authored
      This commit drops the `argparse` library (the only non-runtime C
      component of rev.ng) in favor of LLVM's CommandLine library, which
      offers several benefits. Among others, command line arguments can now
      be declared as global variables, decentralizing their management and
      avoiding the long list of arguments in the constructor of singleton
      objects such as `CodeGenerator`.
    • Move and rename all files · 076aeac7
      Alessandro Di Federico authored
      This commit moves around most files. The new directory structure is as
      follows:
      
      * `lib/$LIBRARY/`: contains a library, i.e., a set of `.cpp` files used
        by multiple libraries/tools.
      * `include/revng/$LIBRARY/`: contains the public headers associated with
        the library in `lib/$LIBRARY/`.
      * `tools/$TOOL/`: directory where all the `.cpp` files (and private
        headers) for a tool reside. Currently we have two tools: `revamb` and
        `revamb-dump`.
      
      On top of this, all file names are now in camel case.
  3. Sep 29, 2018
  4. Sep 18, 2018
    • Rewrite the stack and introduce the ABI analyses · 7fe00c08
      Alessandro Di Federico authored
      This is a very large commit importing the reviewed (and heavily
      simplified) stack analysis and the new ABI analysis, which provides
      information on the calling convention of each function and so on.
      
      For an overview of the new analyses please consult OVERVIEW.md.
  5. Aug 31, 2018
    • CMake: one argument per line · fb7ad640
      Alessandro Di Federico authored
      This commit simply reduces the length of lines in CMake by splitting the
      statements over multiple lines. This is particularly useful when listing
      the translation units composing a program/library. In fact, it makes
      merges much easier.
    • Doxygen: parse files from `git ls-files` · 82ab7869
      Alessandro Di Federico authored
      Doxygen doc used to look for source files in the source root directory
      only. We now use `git ls-files` to figure out which files need to be
      part of the documentation.
  6. Aug 18, 2018
    • Introduce new assertion framework · a0f4e0bb
      Alessandro Di Federico authored
      A set of assertion-related functions has been introduced:
      
      * `revng_abort(message)`: aborts, in release builds too.
      * `revng_check(what, message)`: asserts `what`, in release builds
        too. Also emits a `__builtin_assume`, that can lead to additional
        optimizations in clang.
      * `revng_unreachable(message)`: identical to `revng_abort`, but in
        release builds emits `__builtin_unreachable`.
      * `revng_assert(what, message)`: asserts in debug builds, otherwise
        emits `sizeof(what)` (to suppress unused variable warnings) and
        `__builtin_assume`.
      
      The adoption of these functions has the following benefits:
      
      * Nice stack traces.
      * The developer can choose to enforce an `assert` (or an `unreachable`)
        at release-time too by using `check`/`abort`.
      * Most warnings about unused variables in release mode should be gone.
      * When using clang, the `assert`s become `assume`s, which might enable
        additional optimizations (with no run-time costs).
      * The `assert(Condition && "Reason")` trick is no longer needed, we now
        have a proper argument.
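The behaviour described above can be approximated with a handful of macros. This is only a sketch under stated assumptions: the `SKETCH_*` names and `sketchAbort` are hypothetical stand-ins rather than the project's definitions, stack traces are omitted, and `__builtin_assume` is only emitted when compiling with clang.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical names: they mimic the behaviour listed in the commit message
// and are not the project's actual macros.

// Prints the message and aborts, in release builds too.
[[noreturn]] inline void sketchAbort(const char *Message) {
  std::fprintf(stderr, "abort: %s\n", Message);
  std::abort();
}

// `__builtin_assume` is a clang builtin; other compilers get a no-op.
#if defined(__clang__)
#define SKETCH_ASSUME(What) __builtin_assume(What)
#else
#define SKETCH_ASSUME(What) ((void) 0)
#endif

// Always checked, regardless of NDEBUG.
#define SKETCH_CHECK(What, Message) \
  do {                              \
    if (!(What))                    \
      sketchAbort(Message);         \
    SKETCH_ASSUME(What);            \
  } while (0)

#ifdef NDEBUG
// Release: `sizeof` keeps the operands "used" without evaluating them.
#define SKETCH_ASSERT(What, Message) \
  do {                               \
    (void) sizeof((What));           \
    SKETCH_ASSUME(What);             \
  } while (0)
#define SKETCH_UNREACHABLE(Message) __builtin_unreachable()
#else
// Debug: behave like the always-on variants.
#define SKETCH_ASSERT(What, Message) SKETCH_CHECK(What, Message)
#define SKETCH_UNREACHABLE(Message) sketchAbort(Message)
#endif

// Example use: the precondition is enforced in every build type, the
// postcondition only in debug builds.
inline int halve(int Value) {
  SKETCH_CHECK(Value % 2 == 0, "Value must be even");
  int Result = Value / 2;
  SKETCH_ASSERT(Result * 2 == Value, "halving lost information");
  return Result;
}
```

Passing the message as a separate argument is what makes the `Condition && "Reason"` idiom unnecessary.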
  7. Jun 12, 2018
  8. May 31, 2018
    • Add `CPUStateAccessAnalysisPass` · f90f9451
      Pietro Fezzardi authored
      This commit adds a new analysis pass: `CPUStateAccessAnalysisPass`.
      
      This pass currently performs 4 operations.
      
      1. A preliminary analysis of the call graph, to select the functions
         that are reachable from the root function through direct calls.  All
         the other performed operations are executed on this set of reachable
         functions.
      
      2. An interprocedural forward taint analysis, starting from the uses of
         `env`, the global variable pointing to the QEMU struct containing the
         CPU. This analysis taints all the instructions that use the address
         of `env`, until a load or a store is met. If a load or a store uses a
         tainted Value as address it means that it is accessing a CSV at a
         given offset (which at this point is still unknown).
      
      3. An interprocedural offset analysis, which deduces the possible
         offsets used by every tainted load/store to access the CSV. This
         analysis initially works backwards, exploring all the Values that
         contribute to the computation of the addresses used by tainted
         load/stores. Once it finds all the sources, it starts propagating the
         values forward, collecting the offsets computed along the way. It
         does this until it reaches the tainted load/stores again. At that
         point the analysis knows all the possible offsets used by each
         tainted load/store to access the CPU state.
      
      4. The results of the previous steps are used to do 3 things:
      
        * marking all the indirect calls with tainted arguments as illegal;
          this is necessary because those calls may access the CPU State in
          unpredictable ways;
        * attaching metadata to all the call sites to QEMU helpers in the root
          function; this metadata provides information on which parts of the
          CPU State may be accessed from that call site, which is
          potentially useful information for users of libtinycode that we also
          plan to use in other parts of revamb;
        * substituting loads, stores, and memcpys to and from the CPU state
          with accesses to global variables; this operation effectively
          replaces what was previously done by the CorrectCPUStateUsagePass,
          which is now obsolete and was removed in this commit.
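The taint rule from step 2 can be illustrated on a toy instruction list. This is a deliberately reduced sketch: it handles a single straight-line function rather than being interprocedural, and `Opcode`, `Instruction`, `TaintResult` and `propagateTaint` are hypothetical names that do not come from the pass itself.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy IR, for illustration only.
enum class Opcode { EnvAddress, Arithmetic, Load, Store };

struct Instruction {
  Opcode Op;
  // Indices of earlier instructions used as operands. For Load and Store
  // the first operand is the address.
  std::vector<std::size_t> Operands;
};

struct TaintResult {
  std::set<std::size_t> Tainted;       // values derived from the address of env
  std::set<std::size_t> StateAccesses; // loads/stores with a tainted address
};

// Forward propagation over a straight-line body: taint starts at uses of
// the env address and flows through arithmetic until a load or store.
inline TaintResult propagateTaint(const std::vector<Instruction> &Body) {
  TaintResult Result;
  for (std::size_t I = 0; I < Body.size(); ++I) {
    const Instruction &Inst = Body[I];

    if (Inst.Op == Opcode::EnvAddress) {
      Result.Tainted.insert(I);
      continue;
    }

    if (Inst.Op == Opcode::Load || Inst.Op == Opcode::Store) {
      // Taint stops here: the access is recorded, its result is not tainted.
      if (!Inst.Operands.empty() && Result.Tainted.count(Inst.Operands[0]) != 0)
        Result.StateAccesses.insert(I);
      continue;
    }

    // Any other instruction is tainted if one of its operands is.
    for (std::size_t Operand : Inst.Operands) {
      if (Result.Tainted.count(Operand) != 0) {
        Result.Tainted.insert(I);
        break;
      }
    }
  }
  return Result;
}
```

In this sketch a load through a tainted address is reported as a CPU state access, while a value computed from the loaded data is no longer tainted. Finding which offset each access uses is the job of the separate offset analysis in step 3.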
  9. May 30, 2018
    • Add `-Wno-error=unused-local-typedefs` · 83992ba4
      Pietro Fezzardi authored
    • Introduce statistics · 0dc77e10
      Alessandro Di Federico authored
      This commit introduces the `RunningStatistics` class, which computes
      the mean and standard deviation of a set of numbers. These values are
      computed incrementally and can be associated with a name. The values
      computed by `RunningStatistics` can be dumped upon regular program
      termination, `SIGABRT` and `SIGINT`. In practice they are printed at
      the end of the program execution, even in case of asserts and
      `Ctrl + C`. Moreover, `SIGUSR1` is used to trigger printing the
      statistics without crashing the program.
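One common way to compute a mean and standard deviation incrementally is Welford's algorithm. The commit message does not say which method `RunningStatistics` uses, so the sketch below is an assumption: `RunningStatsSketch` is a hypothetical class, and naming and signal handling are left out.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch, not the project's `RunningStatistics` class.
class RunningStatsSketch {
public:
  // Fold one sample into the running mean and the running sum of squared
  // deviations from the mean (Welford's update).
  void push(double X) {
    ++N;
    double Delta = X - Mean;
    Mean += Delta / static_cast<double>(N);
    M2 += Delta * (X - Mean);
  }

  std::size_t size() const { return N; }
  double mean() const { return Mean; }

  // Sample variance (divides by N - 1); zero with fewer than two samples.
  double variance() const {
    return N > 1 ? M2 / static_cast<double>(N - 1) : 0.0;
  }

  double standardDeviation() const { return std::sqrt(variance()); }

private:
  std::size_t N = 0;
  double Mean = 0.0;
  double M2 = 0.0;
};
```

Only three values are stored no matter how many samples are pushed, which is what makes it cheap to keep such counters alive for the whole run and dump them at exit.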
    • Force strict aliasing · 2f5d8b77
      Alessandro Di Federico authored
      `reinterpret_cast`s can lead to undefined behavior, since the compiler
      can assume that each object will be accessed exclusively through
      pointers of a single type.
      
      Enabling strict aliasing and its warnings, even in debug builds, allows
      us to catch this kind of problem earlier.
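A typical offender is reading the bytes of one type through a pointer to another. The sketch below shows the usual well-defined alternative, copying the bytes with `std::memcpy`. The `floatBits` helper is hypothetical and the example assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Returns the bit pattern of a float without violating strict aliasing.
// Dereferencing `reinterpret_cast<const std::uint32_t *>(&Value)` instead
// would access a float object through an unrelated pointer type.
inline std::uint32_t floatBits(float Value) {
  static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
  std::uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits));
  return Bits;
}
```

Compilers generally turn such a fixed-size copy into a plain register move, so the safe form should not cost anything at run time.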
  10. May 29, 2018
    • Introduce support for dynamic binaries · 61cfbdfc
      Alessandro Di Federico authored
      This commit introduces support for dynamic programs. The current
      implementation translates the main binary and uses native libraries. This
      works only if the target architecture is the same as the source
      one. Currently we only handle x86-64.
      
      * The `ExternalJumpsHandler` class has been introduced. It extends the
        dispatcher to handle the case in which the program counter is an
        address outside the range of executable addresses of the input
        program. In this case, a `setjmp` is performed, the CPU state is
        serialized to physical registers and a jump to the value of the
        program counter is performed.
      
        When the target code tries to return to the translated program, a
        segmentation fault is triggered, a `longjmp` is performed and the
        CPU state is deserialized so that the execution can resume (from the
        dispatcher).
      
      * `early-linked.c` has been introduced. Its purpose is to provide
        declarations of variables and functions defined in `support.c`. In the
        past, we had to manually create these definitions, a cumbersome and
        error-prone task we now avoid by letting `clang` compile
        `early-linked.c` and then linking it in.
      
      * The old `support.h` is now known as `commonconstants.h`. `support.h`
        now contains declarations that have to be consumed by
        `early-linked.c`.
      
      * Each architecture now provides additional information:
      
        1. Which registers are part of the ABI and have to be preserved. If
           necessary the QEMU name can be provided. For each register it's
           also possible to provide their position within the `mcontext_t`
           structure, provided by the signal handler.
        2. Three assembly snippets, one to write a register, one to read it
           and one to perform an indirect jump.
      
        Some of this information is also exposed in the output module as
        metadata.
      
      * `support.c` now installs a SIGSEGV signal handler. Since pages that
        were originally executable are no longer executable, jumping there
        (typically, from a library) will trigger a SIGSEGV that we will
        handle. This allows us to properly deserialize the CPU state and
        resume execution of the translated code.
      
      * Now also a dynamic version of each test program is translated and
        tested.
      
      * The `merge-dynamic.py` script has been introduced: it takes care of
        rewriting the translated binary so as to tell the linker to perform
        both the relocations of the translated program and the relocations of
        the original program. It does so by rewriting a large portion of the
        sections employed by the dynamic linker such as `.dynamic`, `.dynsym`
        and so on.
      
      * The `compile-time-constants.py` script has been introduced: it runs a
        user-specified compiler on a source file producing an object
        file. This object file is inspected and the value of global read-only
        variables is produced in a CSV.
    • Output a CSV file with the required libraries · aa382551
      Alessandro Di Federico authored
      This commit makes `revamb` produce a new file `.ll.need.csv` containing
      a list of all the dynamic libraries required by the input program. This
      will be transformed by `csv-to-ld-options` (was:
      `li-csv-to-ld-options`) into the appropriate linking options.
    • Disable warning about virtual destructors · 5f8249ce
      Alessandro Di Federico authored
      This commit disables a warning emitted by recent clang versions that is
      triggered on libstdc++.
  11. Apr 22, 2018
    • Introduce the Function Isolation Pass · cf42e497
      Andrea Gussoni authored
      This commit introduces the Function Isolation Pass. We use the
      information provided by the Function Boundaries Detection Pass to
      organize the code that `revamb` places inside the `root` function in
      different LLVM functions. To do this we obviously need to introduce some
      changes and tricks to handle the execution of the translated program.
      
      The main idea is to have two different realms (one where the isolated
      functions live, one in which we have basically the old root function).
      We start the execution from the realm of the *non isolated* functions,
      and we transfer, as soon as possible, the execution to the *isolated
      functions* realm. We then have a fallback mechanism to restore the
      execution in the right place in the *non isolated* functions realm, and
      so on.
      
      The largest change, besides the re-organization of the code in different
      functions, is the use of the exception handling mechanism provided by
      the LLVM framework in order to be able to manage the switch between the
      two realms.
      
      We also introduce the `support.h` header file, which contains a couple
      of definitions used by `support.c` and that need to be shared with some
      of the components involved in the translation process. We have defined
      some helper functions, directly in C, that we use both for handling the
      exception mechanism and for giving extra debug information when an
      exception is raised.
      
      The `revamb-dump` utility now supports the `-i` option to specify the
      path where the new LLVM module is saved.
      
      The `translate` utility now supports the `-i` option that produces a
      binary in which the function isolation has been applied.
      
      We also introduced some tests that apply the function isolation pass to
      the `Runtime/` tests already present. In this way we can verify that the
      translation and the following function isolation preserve the behavior
      of the program.
      
      When serializing the new LLVM module we regenerate the metadata used for
      debug purposes, and for doing this, since we no longer have only the
      `root` function, we have changed some details in the `DebugHelper` class
      in order to be able to emit the metadata for all the functions of our
      interest in a single shot.
  12. Jan 28, 2018
    • Added i386 support · 2f55d4ba
      Thorbjörn Schulz authored
      Added the necessary information for i386 support and a call to a helper
      function initializing the global descriptor table at runtime.
  13. Jan 18, 2018
    • Remove useless type `GenericFunctor` · 5d9aa31c
      Pietro Fezzardi authored
      `GenericFunctor` is substituted with the `std::integral_constant`
      template. This also allows us to remove code that requires C++14.
      
      It also removes the now useless cmake tests on the compiler flag
      `-Wno-error=noexcept-type` that was introduced to disable fatal warnings
      on the type `GenericFunctor`.
      Now that this type has been removed the check is not necessary anymore,
      because the `std::integral_constant` template used now does not cause
      the warning.
      So we can go back to enabling the fatal warnings.
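The commit message does not show what `GenericFunctor` looked like. Assuming it wrapped a function pointer into a type, for instance to serve as a stateless deleter, the sketch below shows how `std::integral_constant` covers that use in plain C++11. `Resource`, `releaseResource`, `ResourcePtr` and `countReleases` are hypothetical names introduced for illustration.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical resource and release function, used only for illustration.
struct Resource {
  int *ReleaseCounter;
};

inline void releaseResource(Resource *R) {
  ++*R->ReleaseCounter;
  delete R;
}

// The function pointer becomes part of the type. The resulting deleter is
// stateless and converts implicitly to `void (*)(Resource *)` when called.
using ResourceDeleter =
  std::integral_constant<decltype(&releaseResource), &releaseResource>;

using ResourcePtr = std::unique_ptr<Resource, ResourceDeleter>;

// Creates and destroys one resource, returning how many times the release
// function ran.
inline int countReleases() {
  int Counter = 0;
  {
    ResourcePtr P(new Resource{ &Counter });
  }
  return Counter;
}
```

Because the deleter type carries no data, a `unique_ptr` using it is typically the same size as a raw pointer.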
  14. Jan 17, 2018
    • Refactored initialization of compilation flags · 4e871c6f
      Andrea Gussoni authored
      We now use a macro to add a series of compilation flags. The macro also
      takes care of checking that the flags are supported by the compiler.
      
      This patch has been developed by Alessandro Di Federico.
    • Add checks for `no-pie` flag for cross-compilers · 0735acd2
      Andrea Gussoni authored
      The check to see if a compiler supports the `no-pie` flag was done only
      for the main C compiler, and not for the cross-compilers used for
      creating the executables for the different supported architectures.
      
      This commit introduces the aforementioned missing checks.
      
      In addition, instead of hard-coding the flags to check in the CMakeLists
      file, we have a list that we pass each time we instantiate a project for
      the cross-compilers, and we check for the availability of all the flags.
      
      In order to do this we need to apply a sort of serialization and
      deserialization to avoid the "unpack" of the list passed as argument to
      the external project (that is implemented as a `;` separated string).
      
      Also implemented a fix suggested in the merge request for a line that
      mistakenly added the `TEST_CFLAGS` variable to the `NO_PIE` variable.
  15. Oct 28, 2017
  16. Aug 28, 2017
    • Make `-Wnoexcept-type` non-fatal, if present · 9824a1b6
      Pietro Fezzardi authored
      This warning was introduced with gcc-7 to (quoting the documentation)
      "Warn if the C++1z feature making noexcept part of a function type
      changes the mangled name of a symbol relative to C++14. Enabled by -Wabi
      and -Wc++1z-compat.".
      
      It is triggered from the `GenericFunctor` class template. It can be
      safely made not fatal, because this class template is not exposed
      outside and the whole project is currently compiled with the same C++
      standard compiler flags.
      
      This commit adds machinery to `CMakeLists.txt` to make the warning not
      fatal, but only if present. Disabling it when not present would trigger
      build errors.
  17. Aug 12, 2017
  18. Apr 21, 2017
    • Fix GCC 6.3.0 warnings · 24c1df35
      Alessandro Di Federico authored
      This commit fixes some warnings given by GCC 6.3.0.
      
      * Some `assert(false)` are not recognized as `noreturn`ing. They have
        been replaced with `llvm_unreachable`.
      * Added `-Wno-ignored-attributes`: attributes are not part of the function
        name mangling, and therefore they might create some problems when they
        are involved in template arguments. We don't care.
      * Specializations of `readPointer` functions in `binaryfile.h` are now
        `inline`, so they don't appear as "unused" functions.
  19. Mar 31, 2017
    • Install documentation · d4168436
      Alessandro Di Federico authored
      This commit introduces a docs target which translates `.rst` files into
      man pages or HTML documents and installs them in `/usr/share/man/man1`
      or `/usr/share/doc/revamb`.
    • Fix typo in CMakeLists.txt · e540400f
      Alessandro Di Federico authored
      To compare strings, `STREQUAL` should be used, not `EQUAL`. This
      prevented some inaccurate GCC warnings from being treated as non-errors.
  20. Mar 02, 2017
    • Compile `support.c` to LLVM IR · dae2f7e6
      Alessandro Di Federico authored
      `support.c` used to be compiled using the system compiler and then
      linked to the module generated by `revamb` as a separate translation
      unit. This commit introduces a change that lets `clang` compile
      `support.c`. This will allow us to make the CSV static, which should
      enable more aggressive optimizations.
      
      * Change the signature of the `root` function so that it accepts an
        argument: the initial value of the stack pointer, which the main is
        supposed to set up. QEMU now provides us with the offset of the stack
        pointer.
      * Let the build system compile `support.c` for each supported
        architecture, both in normal and "tracing" mode.
      * Remove the `--tracing` option, this is now handled by `support.c`, in
        particular depending on which version of `support.c` you link, you can
        have tracing enabled or not.
      * In `support.c` drop global variables representing the stack pointer,
        we no longer need them.
      * In `support.c` fix some warnings while handling the stack on 32-bit
        architectures.
      * Extend the `translate` script to handle the new way we link the final
        binary and the tracing mechanism.
  21. Dec 08, 2016
    • Introduce the GCBI and FCI passes · 5c619ab0
      Alessandro Di Federico authored
      This commit introduces two new passes:
      
      * `GeneratedCodeBasicInfo`: recovers from the IR some basic information
        like the size of delay slots in the input architecture, the name of
        the program counter and so on. It can also identify the type of a
        basic block (e.g., dispatcher, jump target...).
      * `FunctionCallIdentification`: identifies function calls and injects a
        marker before the associated terminator instruction.
      
      The idea of these two passes is to try to progressively move information
      we used to keep in `JumpTargetManager` into the IR, so that it is more
      easily accessible and passes do not need a reference to `JTM`.
      
      In particular by having markers for function calls available during jump
      target discovery we don't have to have duplicated and suboptimal
      implementation of `isCall`.
      
      This commit also introduces some additional helper functions and a
      helper class to quickly.
  22. Dec 03, 2016
    • Introduce `revamb-dump` · e577f74b
      Alessandro Di Federico authored
      `revamb-dump` is a tool to extract various information from the LLVM IR
      generated by `revamb` and output them in a more human-friendly format,
      typically CSV. The main source of information are the various metadata.
      
      Currently `revamb-dump` can collect the CFG, function boundaries and
      `noreturn` functions.
    • Isolate ELF code and remove architecture parameter · 83ea2caa
      Alessandro Di Federico authored
      This commit removes all the ELF-specific code from the `CodeGenerator`
      class by creating a new class, `BinaryFile` which contains all the
      information about the program that might be needed in an image format
      independent way. However, `BinaryFile` has some fields which are
      specific to ELF, we might want to address this when additional file
      formats are supported.
      
      A key benefit of isolating this code is that we can anticipate the
      parsing of the input file, so that we have its architecture available
      earlier than when `CodeGenerator` is instantiated, therefore we can drop
      the `--architecture` parameter.
  23. Sep 22, 2016
    • Improve installation · d4871549
      Alessandro Di Federico authored
      * Use "$ORIGIN/../lib/" as RPATH when linking the installed binary
      * Install also support material such as "support.c"
      * Import the `translate` script for easy end-to-end translation
    • Make revamb portable · 59c871af
      Alessandro Di Federico authored
      Add different search paths for QEMU components, in particular relative
      to the program's path.
      Also, install `revamb`.
  24. Sep 21, 2016
  25. Sep 20, 2016