- Nov 14, 2018
-
-
Alessandro Di Federico authored
-
- Oct 03, 2018
-
-
Alessandro Di Federico authored
This commit drops the `argparse` library (the only non-runtime C component of rev.ng) in favor of LLVM's CommandLine library, which offers several benefits. Among others, command line arguments can now be easily specified as global variables, decentralizing their management and avoiding the long list of arguments in the constructor of singleton objects such as `CodeGenerator`.
-
Alessandro Di Federico authored
This commit moves around most files. The new directory structure is as follows:
* `lib/$LIBRARY/`: contains a library, i.e., a set of `.cpp` files used by multiple libraries/tools.
* `include/revng/$LIBRARY/`: contains the public headers associated to the library in `lib/$LIBRARY/`.
* `tools/$TOOL/`: directory where all the `.cpp` files (and private headers) for a tool reside. Currently we have two tools: `revamb` and `revamb-dump`.
On top of this, all file names are now in camel case.
-
- Sep 29, 2018
-
-
Alessandro Di Federico authored
This commit checks if valgrind headers are available and, if so, defines a macro. In this way, if the headers are not available, `valgrindhelper.h` can be aware of this fact and become a no-op.
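A minimal sketch of how a header can turn into a no-op when the Valgrind headers are missing. The macro name `HAVE_VALGRIND_CALLGRIND_H` and the wrapper functions are assumptions made for illustration; the commit message does not say which macro is defined or which Valgrind header `valgrindhelper.h` relies on.

```cpp
#include <cassert>

// Hypothetical macro: stands in for whatever the build system defines
// when the Valgrind headers are found.
#ifdef HAVE_VALGRIND_CALLGRIND_H
#include <valgrind/callgrind.h>

inline void startInstrumentation() { CALLGRIND_START_INSTRUMENTATION; }
inline void stopInstrumentation() { CALLGRIND_STOP_INSTRUMENTATION; }
#else
// Headers not available: same interface, but every call does nothing.
inline void startInstrumentation() {}
inline void stopInstrumentation() {}
#endif

// Callers use the wrappers unconditionally, with no #ifdef of their own.
inline long instrumentedSum(const int *Values, int Count) {
  assert(Count == 0 || Values != nullptr);
  startInstrumentation();
  long Sum = 0;
  for (int I = 0; I < Count; ++I)
    Sum += Values[I];
  stopInstrumentation();
  return Sum;
}
```

Keeping the conditional compilation in a single header means the rest of the code base builds the same way whether or not Valgrind is installed.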
-
Alessandro Di Federico authored
Many of the previous commits are aimed at suppressing several compiler warnings. This commit updates the set of warnings we want to have.
-
Alessandro Di Federico authored
We'd switch to C++17, but we currently aim at supporting building revamb with Ubuntu 16.04, whose default version of GCC doesn't support C++17.
-
- Sep 18, 2018
-
-
Alessandro Di Federico authored
This is a very large commit importing the reviewed (and heavily simplified) stack analysis and the new ABI analysis, which provides information on the calling convention of each function and so on. For an overview of the new analyses please consult OVERVIEW.md.
-
- Aug 31, 2018
-
-
Alessandro Di Federico authored
This commit simply reduces the length of lines in CMake by splitting the statements over multiple lines. This is particularly useful when listing the translation units composing a program/library. In fact, it makes merges much easier.
-
Alessandro Di Federico authored
Doxygen doc used to look for source files in the source root directory only. We now use `git ls-files` to figure out which files need to be part of the documentation.
-
- Aug 18, 2018
-
-
Alessandro Di Federico authored
A set of assertion-related functions has been introduced:
* `revng_abort(message)`: aborts, in release builds too.
* `revng_check(what, message)`: asserts `what`, in release builds too. Also emits a `__builtin_assume`, which can lead to additional optimizations in clang.
* `revng_unreachable(message)`: identical to `revng_abort`, but in release builds emits `__builtin_unreachable`.
* `revng_assert(what, message)`: asserts in debug builds, otherwise emits `sizeof(what)` (to suppress unused variable warnings) and `__builtin_assume`.
The adoption of these functions has the following benefits:
* Nice stack traces.
* The developer can choose to enforce an `assert` (or an `unreachable`) at release-time too by using `check`/`abort`.
* Most warnings about unused variables in release mode should be gone.
* When using clang, the `assert`s become `assume`s, which might enable additional optimizations (with no run-time costs).
* The `assert(Condition && "Reason")` trick is no longer needed, we now have a proper argument.
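The behavior described above can be approximated with a handful of macros. This is only a sketch under stated assumptions: the `sketch_` names and the `sketchFail` helper are hypothetical and are not rev.ng's actual definitions, the stack trace printing is omitted, and `__builtin_unreachable` is assumed to be available (GCC or clang).

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: reports the failure and aborts, in every build type.
[[noreturn]] inline void sketchFail(const char *Kind, const char *Message,
                                    const char *File, int Line) {
  std::fprintf(stderr, "%s:%d: %s: %s\n", File, Line, Kind, Message);
  std::abort();
}

// __builtin_assume is clang-specific; elsewhere the hint is dropped.
#if defined(__clang__)
#define SKETCH_ASSUME(What) __builtin_assume(What)
#else
#define SKETCH_ASSUME(What) ((void) 0)
#endif

// Always active, release builds included.
#define sketch_abort(Message) sketchFail("abort", Message, __FILE__, __LINE__)
#define sketch_check(What, Message)                                  \
  do {                                                               \
    if (!(What))                                                     \
      sketchFail("check failed", Message, __FILE__, __LINE__);       \
    SKETCH_ASSUME(What);                                             \
  } while (0)

#ifdef NDEBUG
// Release: no run-time test; sizeof keeps the operands "used".
#define sketch_assert(What, Message) \
  do {                               \
    (void) sizeof(What);             \
    SKETCH_ASSUME(What);             \
  } while (0)
#define sketch_unreachable(Message) __builtin_unreachable()
#else
// Debug: same behavior as the always-on variants.
#define sketch_assert(What, Message) sketch_check(What, Message)
#define sketch_unreachable(Message) sketch_abort(Message)
#endif

enum class Shape { Circle, Square };

inline int cornersOf(Shape S) {
  switch (S) {
  case Shape::Circle:
    return 0;
  case Shape::Square:
    return 4;
  }
  sketch_unreachable("unknown shape");
}

inline int checkedDivide(int Numerator, int Denominator) {
  sketch_check(Denominator != 0, "division by zero");
  return Numerator / Denominator;
}
```

The message is a separate argument, so the `assert(Condition && "Reason")` trick is not needed, and the condition of `sketch_assert` is never evaluated in release builds because it only appears inside `sizeof`.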
-
- Jun 12, 2018
-
-
Pietro Fezzardi authored
-
- May 31, 2018
-
-
Pietro Fezzardi authored
This commit adds a new analysis pass: `CPUStateAccessAnalysisPass`. This pass currently performs 4 operations.
1. A preliminary analysis of the call graph, to select the functions that are reachable from the root function through direct calls. All the other operations are executed on this set of reachable functions.
2. An interprocedural forward taint analysis, starting from the uses of `env`, the global variable pointing to the QEMU struct containing the CPU. This analysis taints all the instructions that use the address of `env`, until a load or a store is met. If a load or a store uses a tainted Value as address it means that it is accessing a CSV at a given offset (which at this point is still unknown).
3. An interprocedural offset analysis, which deduces the possible offsets used by every tainted load/store to access the CSV. This analysis initially works backwards, exploring all the Values that contribute to the computation of the addresses used by tainted load/stores. Once it finds all the sources, it starts propagating the values forward, collecting the offsets computed along the way. It does this until it reaches the tainted load/stores again. At that point the analysis knows all the possible offsets used by each tainted load/store to access the CPU state.
4. The results of the previous steps are used to do 3 things:
   * marking all the indirect calls with tainted arguments as illegal; this is necessary because those calls may access the CPU State in unpredictable ways;
   * attaching metadata to all the call sites to QEMU helpers in the root function; these metadata provide information on which parts of the CPU State may be accessed from that call site, which is potentially useful information for users of libtinycode that we also plan to use in other parts of revamb;
   * substituting loads, stores, and memcpys to and from the CPU state with accesses to global variables; this operation effectively replaces what was previously done by the `CorrectCPUStateUsagePass`, which is now obsolete and was removed in this commit.
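The first step, selecting the functions reachable from the root through direct calls, amounts to a worklist traversal of the call graph. The sketch below illustrates the idea on a plain map of function names; the `CallGraph` alias and the function name are hypothetical, and the actual pass works on LLVM IR rather than on strings.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: caller name -> names of directly called functions.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns the root plus every function reachable from it via direct calls.
inline std::set<std::string>
reachableThroughDirectCalls(const CallGraph &DirectCalls,
                            const std::string &Root) {
  std::set<std::string> Reachable{Root};
  std::queue<std::string> WorkList;
  WorkList.push(Root);

  while (!WorkList.empty()) {
    std::string Current = WorkList.front();
    WorkList.pop();

    auto It = DirectCalls.find(Current);
    if (It == DirectCalls.end())
      continue; // No outgoing direct calls recorded for this function.

    for (const std::string &Callee : It->second)
      if (Reachable.insert(Callee).second) // Enqueue each function only once.
        WorkList.push(Callee);
  }

  return Reachable;
}
```

Restricting the later taint and offset analyses to this set keeps them from visiting helpers that can never run from the root.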
-
- May 30, 2018
-
-
Pietro Fezzardi authored
-
Alessandro Di Federico authored
This commit introduces the `RunningStatistics` class, which computes the mean and standard deviation of a set of numbers. These values are computed incrementally and can be associated with a name. The values computed by `RunningStatistics` can be dumped upon regular program termination, `SIGABRT` and `SIGINT`. In practice they are printed at the end of the program execution, even in case of asserts and `Ctrl + C`. Moreover, `SIGUSR1` is used to trigger printing the statistics without crashing the program.
-
Alessandro Di Federico authored
`reinterpret_cast`s can lead to undefined behavior, since the compiler can assume that each object will be accessed exclusively through pointers of a single type. Enabling strict aliasing and its warnings, even in debug builds, allows us to catch this kind of problem earlier.
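As an illustration of the kind of problem involved, the sketch below reads the bit pattern of a `float`. The commented-out version accesses a `float` object through a `std::uint32_t` lvalue, which violates the aliasing rules; the `std::memcpy` version is well defined. The function name is hypothetical, and the example is not taken from the rev.ng sources. With GCC and clang, the relevant options are `-fstrict-aliasing` and `-Wstrict-aliasing`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Undefined behavior: a float object is read through a uint32_t lvalue.
//
//   std::uint32_t bitsOfUnsafe(float Value) {
//     return *reinterpret_cast<std::uint32_t *>(&Value);
//   }

// Well-defined alternative: copy the object representation byte by byte.
inline std::uint32_t bitsOf(float Value) {
  static_assert(sizeof(std::uint32_t) == sizeof(float),
                "this sketch assumes a 32-bit float");
  std::uint32_t Bits = 0;
  std::memcpy(&Bits, &Value, sizeof(Bits));
  return Bits;
}
```

Compilers typically turn the fixed-size `std::memcpy` into a single register move, so the safe version normally has no run-time cost.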
-
- May 29, 2018
-
-
Alessandro Di Federico authored
This commit introduces support for dynamic programs. The current implementation translates the main binary and uses native libraries. This works only if the target architecture is the same as the source one. Currently we only handle x86-64.
* The `ExternalJumpsHandler` class has been introduced. It basically takes care of extending the dispatcher, handling the case in which the program counter is an address outside the range of executable addresses of the input program. In this case, a `setjmp` is performed, the CPU state is serialized to physical registers and a jump to the value of the program counter is performed. Once the target code tries to return to the translated program, a segmentation fault will be triggered, a `longjmp` is performed and the CPU state is deserialized so that the execution can resume (from the dispatcher).
* `early-linked.c` has been introduced. Its purpose is to provide declarations of variables and functions defined in `support.c`. In the past, we had to manually create these definitions, a cumbersome and error-prone task we now avoid by letting `clang` compile `early-linked.c` and then linking it in.
* The old `support.h` is now known as `commonconstants.h`. `support.h` now contains declarations that have to be consumed by `early-linked.c`.
* Each architecture now provides additional information:
  1. Which registers are part of the ABI and have to be preserved. If necessary the QEMU name can be provided. For each register it's also possible to provide its position within the `mcontext_t` structure, provided by the signal handler.
  2. Three assembly snippets, one to write a register, one to read it and one to perform an indirect jump.
  Some of this information is also exposed in the output module as metadata.
* `support.c` now installs a SIGSEGV signal handler. Since pages that were originally executable are no longer executable, jumping there (typically, from a library) will trigger a SIGSEGV that we will handle. This allows us to properly deserialize the CPU state and resume execution of the translated code.
* Now also a dynamic version of each test program is translated and tested.
* The `merge-dynamic.py` script has been introduced: it takes care of rewriting the translated binary so as to tell the linker to perform both the relocations of the translated program and the relocations of the original program. It does so by rewriting a large portion of the sections employed by the dynamic linker such as `.dynamic`, `.dynsym` and so on.
* The `compile-time-constants.py` script has been introduced: it runs a user-specified compiler on a source file, producing an object file. This object file is inspected and the value of global read-only variables is produced in a CSV.
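The control flow of leaving translated code and coming back through a segmentation fault can be sketched with POSIX `sigsetjmp`/`siglongjmp`. This is a heavily simplified illustration, not the code in `support.c`: the function names are hypothetical, `raise(SIGSEGV)` stands in for the fault triggered by jumping to a non-executable page, and the serialization of the CPU state is omitted.

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>

// Saved context to return to when the fault arrives.
static sigjmp_buf ResumePoint;

// SIGSEGV handler: unwinds back to the point saved by sigsetjmp.
extern "C" void onSegmentationFault(int) {
  siglongjmp(ResumePoint, 1);
}

// Returns true if execution resumed through the signal handler.
inline bool leaveAndComeBack() {
  struct sigaction Action = {};
  Action.sa_handler = onSegmentationFault;
  sigemptyset(&Action.sa_mask);
  if (sigaction(SIGSEGV, &Action, nullptr) != 0)
    return false;

  // A non-zero second argument also saves the signal mask, so SIGSEGV is
  // unblocked again after the siglongjmp.
  if (sigsetjmp(ResumePoint, 1) == 0) {
    // First pass: this is where external code would be entered.
    // Assumption: raise() simulates its attempt to return.
    raise(SIGSEGV);
    return false; // Only reached if the handler did not run.
  }

  // Second pass: reached via siglongjmp; this is where the CPU state
  // would be deserialized before resuming from the dispatcher.
  return true;
}
```

The sketch only shows the round trip; in the design described above the handler would also have access to the saved register values (the message mentions `mcontext_t`) in order to rebuild the CPU state.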
-
Alessandro Di Federico authored
This commit makes `revamb` produce a new file `.ll.need.csv` containing a list of all the dynamic libraries required by the input program. This will be transformed by `csv-to-ld-options` (was: `li-csv-to-ld-options`) into the appropriate linking options.
-
Alessandro Di Federico authored
This commit disables a warning emitted by recent clang versions that is triggered on libstdc++.
-
- Apr 22, 2018
-
-
Andrea Gussoni authored
This commit introduces the Function Isolation Pass. We use the information provided by the Function Boundaries Detection Pass to organize the code that `revamb` places inside the `root` function in different LLVM functions. To do this we obviously need to introduce some changes and tricks to handle the execution of the translated program. The main idea is to have two different realms (one where the isolated functions live, one in which we have basically the old root function). We start the execution from the realm of the *non isolated* functions, and we transfer, as soon as possible, the execution to the *isolated functions* realm. We then have a fallback mechanism to restore the execution in the right place in the *non isolated* functions realm, and so on. The largest change, besides the re-organization of the code in different functions, is the use of the exception handling mechanism provided by the LLVM framework in order to be able to manage the switch between the two realms. We also introduce the `support.h` header file, which contains a couple of definitions used by `support.c` and that need to be shared with some of the components involved in the translation process. We have defined some helper functions, directly in C, that we use both for handling the exception mechanism and for giving extra debug information when an exception is raised. The `revamb-dump` utility now supports the `-i` option to specify the path where to save the new LLVM module. The `translate` utility now supports the `-i` option that produces a binary in which the function isolation has been applied. We also introduced some tests that apply the function isolation pass to the `Runtime/` tests already present. In this way we can verify that the translation and the following function isolation preserve the behavior of the program.
When serializing the new LLVM module we regenerate the metadata used for debug purposes, and for doing this, since we no longer have only the `root` function, we have changed some details in the `DebugHelper` class in order to be able to emit the metadata for all the functions of our interest in a single shot.
-
- Jan 28, 2018
-
-
Thorbjörn Schulz authored
Added the necessary information for i386 support and a call to a helper function initializing the global descriptor table at runtime.
-
- Jan 18, 2018
-
-
Pietro Fezzardi authored
`GenericFunctor` is substituted with the `std::integral_constant` template. This also allows us to remove code that requires C++14. It also removes the now useless cmake tests on the compiler flag `-Wno-error=noexcept-type` that was introduced to disable fatal warnings on the type `GenericFunctor`. Now that this type has been removed the check is not necessary anymore, because the `std::integral_constant` template used now does not cause the warning. So we can go back to enabling the fatal warnings.
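The commit message does not show how `GenericFunctor` was used. Assuming its role was to wrap a function in a stateless type (for instance as a custom deleter), the sketch below shows how `std::integral_constant` can fill that role. `Resource`, `createResource` and `destroyResource` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical C-style API with explicit create/destroy functions.
struct Resource {
  int Id;
};

static int LiveResources = 0;

Resource *createResource(int Id) {
  ++LiveResources;
  return new Resource{Id};
}

void destroyResource(Resource *R) {
  --LiveResources;
  delete R;
}

// The function pointer becomes part of the type. The object converts
// implicitly to that pointer, so `Deleter(Ptr)` calls destroyResource.
using ResourceDeleter =
  std::integral_constant<decltype(&destroyResource), &destroyResource>;

using ResourcePtr = std::unique_ptr<Resource, ResourceDeleter>;
```

The call goes through the implicit conversion to the function pointer, so no hand-written functor template is required.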
-
- Jan 17, 2018
-
-
Andrea Gussoni authored
We now use a macro to add a series of compilation flags; the macro also checks that the flags are supported by the compiler. This patch has been developed by Alessandro Di Federico.
-
Andrea Gussoni authored
The check to see if a compiler supports the `no-pie` flag was done only for the main C compiler, and not for the cross-compilers used for creating the executables for the different supported architectures. This commit introduces the aforementioned missing checks. In addition instead of hard-coding the flags to check in the CMakeLists file we have a list that we pass each time we instantiate a project for the cross-compilers, and we check for the availability of all the flags. In order to do this we need to apply a sort of serialization and deserialization to avoid the "unpack" of the list passed as argument to the external project (that is implemented as a `;` separated string). Also implemented a fix suggested in the merge request for a line that mistakenly added the `TEST_CFLAGS` variable to the `NO_PIE` variable.
-
- Oct 28, 2017
-
-
Alessandro Di Federico authored
A previous commit introduced `-no-pie` to disable PIE in GCC versions higher than 5.2. However, earlier versions don't support such an option. This commit introduces the necessary detection mechanism to enable it or not.
-
- Aug 28, 2017
-
-
Pietro Fezzardi authored
This warning was introduced with gcc-7 to (quoting the documentation) "Warn if the C++1z feature making noexcept part of a function type changes the mangled name of a symbol relative to C++14. Enabled by -Wabi and -Wc++1z-compat.". It is triggered from the `GenericFunctor` class template. It can be safely made not fatal, because this class template is not exposed outside and the whole project is currently compiled with the same C++ standard compiler flags. This commit adds machinery to `CMakeLists.txt` to make the warning not fatal, but only if present. Disabling it when not present would trigger build errors.
-
- Aug 12, 2017
-
-
Alessandro Di Federico authored
The stack analysis is the foundation to obtain accurate information about the body of a function, which registers are callee-saved, arguments, return values and so on. It is implemented as a pass to run in revamb-dump. This commit also introduces analysis tests specific to what we aim to obtain from the analysis and also some basic unit tests for data structures related to the stack analysis.
-
Alessandro Di Federico authored
-
- Apr 21, 2017
-
-
Alessandro Di Federico authored
This commit fixes some warnings given by GCC 6.3.0.
* Some `assert(false)` are not recognized as `noreturn`ing. They have been replaced with `llvm_unreachable`.
* Added `-Wno-ignored-attributes`: attributes are not part of the function name mangling, and therefore they might create some problems when they are involved in template arguments. We don't care.
* Specializations of `readPointer` functions in `binaryfile.h` are now `inline`, so they don't appear as "unused" functions.
-
- Mar 31, 2017
-
-
Alessandro Di Federico authored
This commit introduces a docs target which translates `.rst` files into man pages or HTML documents and installs them in `/usr/share/man/man1` or `/usr/share/doc/revamb`.
-
Alessandro Di Federico authored
To compare strings, `STREQUAL` should be used, not `EQUAL`. The bug prevented some inaccurate GCC warnings from being treated as non-errors.
-
- Mar 02, 2017
-
-
Alessandro Di Federico authored
`support.c` used to be compiled using the system compiler and then linked to the module generated by `revamb` as a separate translation unit. This commit introduces a change that lets `clang` compile `support.c`. This will allow us to make the CSV static, which should enable more aggressive optimizations.
* Change the signature of the `root` function so that it accepts an argument: the initial value of the stack pointer, which the main is supposed to set up. QEMU now provides us with the offset of the stack pointer.
* Let the build system compile `support.c` for each supported architecture, both in normal and "tracing" mode.
* Remove the `--tracing` option; this is now handled by `support.c`. In particular, depending on which version of `support.c` you link, you can have tracing enabled or not.
* In `support.c` drop global variables representing the stack pointer, we no longer need them.
* In `support.c` fix some warnings while handling the stack on 32-bit architectures.
* Extend the `translate` script to handle the new way we link the final binary and the tracing mechanism.
-
- Dec 08, 2016
-
-
Alessandro Di Federico authored
This commit introduces two new passes:
* `GeneratedCodeBasicInfo`: recovers from the IR some basic information like the size of delay slots in the input architecture, the name of the program counter and so on. It can also identify the type of a basic block (e.g., dispatcher, jump target...).
* `FunctionCallIdentification`: identifies function calls and injects a marker before the associated terminator instruction.
The idea of these two passes is to try to progressively move information we used to keep in `JumpTargetManager` into the IR, so that it is more easily accessible and passes do not need a reference to `JTM`. In particular, by having markers for function calls available during jump target discovery we don't have to have a duplicated and suboptimal implementation of `isCall`. This commit also introduces some additional helper functions and a helper class to quickly.
-
- Dec 03, 2016
-
-
Alessandro Di Federico authored
`revamb-dump` is a tool to extract various information from the LLVM IR generated by `revamb` and output them in a more human-friendly format, typically CSV. The main source of information are the various metadata. Currently `revamb-dump` can collect the CFG, function boundaries and `noreturn` functions.
-
Alessandro Di Federico authored
This commit removes all the ELF-specific code from the `CodeGenerator` class by creating a new class, `BinaryFile` which contains all the information about the program that might be needed in an image format independent way. However, `BinaryFile` has some fields which are specific to ELF, we might want to address this when additional file formats are supported. A key benefit of isolating this code is that we can anticipate the parsing of the input file, so that we have its architecture available earlier than when `CodeGenerator` is instantiated, therefore we can drop the `--architecture` parameter.
-
- Sep 22, 2016
-
-
Alessandro Di Federico authored
* Use "$ORIGIN/../lib/" as RPATH when linking the installed binary
* Install also support material such as "support.c"
* Import the `translate` script for easy end-to-end translation
-
Alessandro Di Federico authored
Add different search paths for QEMU components, in particular relative to the program's path. Also, install `revamb`.
-
- Sep 21, 2016
-
-
Alessandro Di Federico authored
-
- Sep 20, 2016
-
-
Alessandro Di Federico authored
-
Alessandro Di Federico authored
-
Alessandro Di Federico authored
* Disable PIE if enabled by default
* Link librt.so to compiled binaries (sometimes the QEMU runtime needs it)
* Replace `strtonum` with `int` in `awk` script
* Specify the compiler, not the triple
-