  1. Aug 12, 2017
  2. Jul 07, 2017
      Handle `.bss`-only data segment · 9db62b28
      Alessandro Di Federico authored
      This commit fixes an assertion triggered when a segment contains
      exclusively zero-initialized data (i.e., its size on file is 0, but its
      memory size is not). In this case LLVM detects that the global variable
      associated with the segment is composed exclusively of 0s and uses a
      `ConstantAggregateZero` as an initializer instead of a
      `ConstantDataArray`.
      
      Currently the solution is to ignore that data; however, in the future it
      might be beneficial to be able to read data from `.bss`, even if we just
      have zeros there.
      
      Thanks to Thorbjoern Schulz for reporting this bug.
  3. Mar 31, 2017
      Detect `try`/`catch` landing pads · d8f13c79
      Alessandro Di Federico authored
      Landing pads are basically the `catch` blocks in C++ `try`/`catch`
      statements. So far we were missing them since they are encoded, in a
      way similar to DWARF debugging information, in the `.eh_frame` and,
      more specifically, in the `.gcc_except_table` sections of ELF programs.
      
      This commit parses these sections so that the basic blocks associated to
      landing pads are correctly identified. Personality functions are
      detected too. A test is also introduced to assess the effectiveness of
      our code.
  4. Mar 23, 2017
  5. Mar 06, 2017
      Purge translated code in post-order · d398213a
      Alessandro Di Federico authored
      This commit changes the way instructions and basic blocks are purged when
      re-translation is necessary. Specifically, the purge is now performed
      through a post-order visit, which should prevent the removal of any
      instruction that still has users.
      
      This commit also introduces the `SubGraph` class, which is useful to be
      able to navigate portions of a graph (e.g., a `Function`) in post-order
      easily.
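
      A minimal sketch of why a post-order visit helps, assuming an acyclic
      graph stored as a plain adjacency list. `Graph` and `postOrder` are
      hypothetical names for illustration, not the `SubGraph` API: every node
      is emitted only after its successors, so code that uses a value is
      purged before the code that defines it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a CFG: G[i] lists the successors of node i.
using Graph = std::vector<std::vector<std::size_t>>;

// Depth-first helper: emit Node only after all its successors were emitted.
static void visit(const Graph &G, std::size_t Node,
                  std::set<std::size_t> &Visited,
                  std::vector<std::size_t> &Order) {
  assert(Node < G.size());
  if (!Visited.insert(Node).second)
    return; // Already handled through another path.
  for (std::size_t Successor : G[Node])
    visit(G, Successor, Visited, Order);
  Order.push_back(Node);
}

// Returns the nodes reachable from Entry in post-order, i.e., the order in
// which they could be purged.
std::vector<std::size_t> postOrder(const Graph &G, std::size_t Entry) {
  std::set<std::size_t> Visited;
  std::vector<std::size_t> Order;
  visit(G, Entry, Visited, Order);
  return Order;
}
```

      With loops in the graph the guarantee is weaker, since a node can be
      both before and after another one; the sketch only covers the acyclic
      case.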
  6. Mar 02, 2017
      When splitting a basic block, retranslate · 04a4591f
      Alessandro Di Federico authored
      This commit should fix some bugs due to the fact that, when splitting a
      basic block, we didn't retranslate the basic block at the split point
      but preserved the existing code. This led to problems, in particular on
      x86-64, where certain QEMU local variables were not available.
      
      Basically, every time we split a basic block in
      `JumpTargetManager::registerJT` we note down that the new basic block
      must be purged, and in `JumpTargetManager::harvest` we perform the
      purge. `harvest` has been chosen since it's a particularly quiet moment,
      i.e., there should be no pending references/iterators to code we have to
      delete.
      Dismiss basic block statistics collection · d6b257dd
      Alessandro Di Federico authored
      If we need this again, we can do it in revamb-dump.
  7. Dec 08, 2016
      Change the way we denote JT basic blocks · 3e8e9c23
      Alessandro Di Federico authored
      Currently we're identifying basic blocks that are a jump target by
      adding metadata on the terminator instruction. This is a problem in many
      cases, therefore we now use the third parameter of `newpc` calls to
      understand if a basic block is a jump target.
      
      The third argument used to be set only at the very end of all our
      analyses, before producing the output. We now set it earlier, before
      each jump target harvesting, so that this information is available
      through `GeneratedCodeBasicInfo`.
      Introduce the GCBI and FCI passes · 5c619ab0
      Alessandro Di Federico authored
      This commit introduces two new passes:
      
      * `GeneratedCodeBasicInfo`: recovers from the IR some basic information
        like the size of delay slots in the input architecture, the name of
        the program counter and so on. It can also identify the type of a
        basic block (e.g., dispatcher, jump target...).
      * `FunctionCallIdentification`: identifies function calls and injects a
        marker before the associated terminator instruction.
      
      The idea of these two passes is to try to progressively move information
      we used to keep in `JumpTargetManager` into the IR, so that it is more
      easily accessible and passes do not need a reference to `JTM`.
      
      In particular, by having markers for function calls available during jump
      target discovery, we no longer need a duplicated and suboptimal
      implementation of `isCall`.
      
      This commit also introduces some additional helper functions and a
      helper class to quickly.
      d6471b99
      Specify endianness when reading from segments · c83559fc
      Alessandro Di Federico authored
      Let functions such as `JumpTargetManager::readRawValue` take a parameter
      specifying if the value should be read from the segment using the
      endianness of the original architecture or of the target architecture.
      
      This commit fixes a bug with big endian architectures (e.g., MIPS): when
      materializing a value on the operation stack of SET, the endianness was
      changed twice, once in `readRawValue` and a second time while applying
      the `bswap` instruction which is registered on the stack.
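
      To illustrate what an explicit endianness parameter buys, here is a
      minimal sketch of a raw read from a byte buffer. `Endianness` and
      `readRaw` are hypothetical names, not the actual signature of
      `JumpTargetManager::readRawValue`; the point is only that the caller
      decides the byte order once, so no second swap is needed afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte-order selector passed by the caller.
enum class Endianness { Little, Big };

// Assemble Size bytes starting at Offset into an unsigned integer,
// interpreting them in the requested byte order.
std::uint64_t readRaw(const std::vector<std::uint8_t> &Segment,
                      std::size_t Offset, unsigned Size, Endianness E) {
  assert(Size >= 1 && Size <= 8 && Offset + Size <= Segment.size());
  std::uint64_t Result = 0;
  for (unsigned I = 0; I < Size; ++I) {
    // Big endian: most significant byte first; little endian: last.
    unsigned Index = (E == Endianness::Big) ? I : Size - 1 - I;
    Result = (Result << 8) | Segment[Offset + Index];
  }
  return Result;
}
```

      Reading the bytes `12 34 56 78` as big endian gives `0x12345678`, while
      the little endian interpretation gives `0x78563412`.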
  8. Dec 03, 2016
      Introduce the `NoFunctionCallsCFG` CFG form · 09e25267
      Alessandro Di Federico authored
      `NoFunctionCallsCFG` is a form of the CFG where all the function call
      edges are replaced with jumps to the return address. This is beneficial
      in certain analyses, which can then pretend to work at function level.
      
      To implement such a form of CFG we now emit, right before the terminator
      of each caller basic block, a call to the "function_call" function,
      passing as the first parameter the callee basic block and as the second
      one the return basic block. Using these function calls, switching to
      `NoFunctionCallsCFG` and back becomes straightforward.
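
      The switch described above can be sketched on a toy CFG, assuming each
      "function_call" marker is modeled as a (caller, callee, return) triple.
      `CFG`, `FunctionCall`, `toNoFunctionCallsCFG` and `restoreFunctionCalls`
      are hypothetical names for illustration, and basic blocks are plain
      integers rather than LLVM objects.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of a "function_call" marker.
struct FunctionCall {
  int Caller; // Basic block performing the call
  int Callee; // Entry basic block of the called function
  int Return; // Basic block at the return address
};

// Toy CFG: each basic block maps to the set of its successors.
using CFG = std::map<int, std::set<int>>;

// Replace every call edge with an edge to the return address.
CFG toNoFunctionCallsCFG(CFG G, const std::vector<FunctionCall> &Calls) {
  for (const FunctionCall &Call : Calls) {
    std::set<int> &Successors = G[Call.Caller];
    assert(Successors.count(Call.Callee) == 1);
    Successors.erase(Call.Callee);
    Successors.insert(Call.Return);
  }
  return G;
}

// Undo the transformation using the same markers.
CFG restoreFunctionCalls(CFG G, const std::vector<FunctionCall> &Calls) {
  for (const FunctionCall &Call : Calls) {
    std::set<int> &Successors = G[Call.Caller];
    assert(Successors.count(Call.Return) == 1);
    Successors.erase(Call.Return);
    Successors.insert(Call.Callee);
  }
  return G;
}
```

      Since the markers carry both the callee and the return block, no extra
      bookkeeping is needed to go back and forth between the two forms.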
      Add support to switch between different CFG forms · c579da60
      Alessandro Di Federico authored
      This commit introduces `JumpTargetManager::setCFGForm`, which allows the
      user to choose which form of CFG is currently wanted. The default and
      final form should be `SemanticPreservingCFG`, which is the most
      conservative one. However, for certain analyses (in particular OSRA and
      SET) it might be beneficial to have a reduced CFG with almost no
      dispatcher.
      
      This new function handles the switching between the two currently
      available forms of CFG by changing the behavior of the `anyPC` and
      `unexpectedPC` basic blocks and rebuilding the dispatcher as
      appropriate.
      Keep the CFG simple: do not jump to the dispatcher · c069700b
      Alessandro Di Federico authored
      Every time we don't know where an indirect jump can go, we used to emit
      a jump to the dispatcher. However, this complicates our analyses; in
      particular, the computed dominator tree provides less useful information
      than it could.
      
      This commit transforms all the jumps to the dispatcher into jumps to an
      "anypc" basic block, which during analysis just contains an unreachable
      instruction; during finalization this instruction is replaced with a
      jump to the dispatcher. A similar (temporary) situation holds for the
      "unexpectedpc" case.
      
      This commit also makes the `visit(Successors|Predecessors)` functions
      more idiomatic by employing a trait for black lists.
      `JTM::readRawValue`: fix endianness bug · 4f60a23d
      Alessandro Di Federico authored
      `JumpTargetManager::readRawValue` used to take into account the
      endianness information from `DataLayout`, i.e., the output endianness,
      while the input endianness should be taken into account.
      
      The commit also checks that during final basic block finalization we
      have no empty basic blocks.
      Isolate ELF code and remove architecture parameter · 83ea2caa
      Alessandro Di Federico authored
      This commit removes all the ELF-specific code from the `CodeGenerator`
      class by creating a new class, `BinaryFile`, which contains all the
      information about the program that might be needed, in an image
      format-independent way. However, `BinaryFile` still has some fields
      which are specific to ELF; we might want to address this when additional
      file formats are supported.
      
      A key benefit of isolating this code is that we can parse the input file
      earlier, so that its architecture is available before `CodeGenerator` is
      instantiated; therefore we can drop the `--architecture` parameter.
  9. Sep 27, 2016
      Use symbols to produce meaningful names · c50dcc5c
      Alessandro Di Federico authored
      This commit introduces the usage of symbols, if they are available. We
      employ them to produce meaningful names for basic blocks.
      
      * Collect the symbols from `.symtab`/`.dynsym`
      * Box the `Segments` into a new data structure (`BinaryInfo`) which also
        handles symbols.
      * `JumpTargetManager::nameForAddress`: produce a meaningful name using
        symbols, if possible.
      * Spread some `const`-ness
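
      As a rough sketch of symbol-based naming, assuming symbols are kept in
      an address-sorted map and that a name has the form `symbol+0xoffset`.
      Both the data structure and the naming scheme are assumptions made for
      this example, not necessarily what `JumpTargetManager::nameForAddress`
      does; symbol sizes are ignored for brevity.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical lookup: find the closest symbol at or before Address and
// express the address relative to it; fall back to the raw address.
std::string nameForAddress(const std::map<std::uint64_t, std::string> &Symbols,
                           std::uint64_t Address) {
  std::ostringstream Name;
  // First symbol strictly after Address; the one before it, if any, is the
  // best candidate.
  auto It = Symbols.upper_bound(Address);
  if (It != Symbols.begin()) {
    --It;
    Name << It->second;
    if (It->first != Address)
      Name << "+0x" << std::hex << (Address - It->first);
    return Name.str();
  }
  // No symbol precedes the address.
  Name << "0x" << std::hex << Address;
  return Name.str();
}
```

      For instance, with a symbol `main` at `0x1000`, the address `0x1010`
      would be named `main+0x10` under this scheme.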
  10. Sep 20, 2016
  11. Sep 17, 2016
  12. Aug 20, 2016
      Give a sensible name to all the basic blocks · c21c1b19
      Alessandro Di Federico authored
      * When generating the code for setting a label or jumping to it, give
        sensible names to the new basic blocks.
      * Keep track of the last seen PC during translation so it can be used to
        obtain a sensible name for the basic block.
      * Let `JumpTargetManager::getBlockAt` set a proper name to the basic
        block before returning, if it doesn't already have one.
      Introduce `forceFallthroughAfterHelper` · 9c868330
      Alessandro Di Federico authored
      `forceFallthroughAfterHelper` handles the situation where there isn't a
      PC-store between a call to a helper and a call to `exitTB`: in this
      case, we force a branch to the fallthrough PC.
      
      This commit also simplifies `InstructionTranslator::translateCall`: it
      removes the jump to the dispatcher after a call to a helper in case the
      PC was saved and it has changed. We don't really need to do this: QEMU
      will generate a call to `exitTB` as necessary, or
      `forceFallthroughAfterHelper` will take care of it.
      Factor out and improve `visitSuccessors` · d3a6442a
      Alessandro Di Federico authored
      * The function now can take a `std::set` of basic blocks to ignore.
      * The visitor function has now several options on how to proceed, and
        can express them through its return value.
      * A serious bug in the implementation was also fixed.
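
      The two features listed above (an ignore set and a visitor steering the
      traversal through its return value) can be sketched as follows. The
      `VisitAction` values, the breadth-first order and the integer-based
      `Graph` are assumptions for illustration; the real `visitSuccessors`
      works on LLVM basic blocks and may differ.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Hypothetical options the visitor can return.
enum class VisitAction {
  Continue,       // Keep exploring from this node
  SkipSuccessors, // Do not explore the successors of this node
  Stop            // Abort the whole visit
};

using Graph = std::map<int, std::vector<int>>;

// Breadth-first visit of everything reachable from Start, skipping the
// nodes in Ignore and obeying the visitor's return value.
void visitSuccessors(const Graph &G, int Start, const std::set<int> &Ignore,
                     const std::function<VisitAction(int)> &Visitor) {
  assert(Visitor != nullptr);
  std::set<int> Seen = Ignore; // Ignored nodes are never enqueued.
  std::queue<int> Worklist;

  auto EnqueueSuccessors = [&](int Node) {
    auto It = G.find(Node);
    if (It == G.end())
      return;
    for (int Successor : It->second)
      if (Seen.insert(Successor).second)
        Worklist.push(Successor);
  };

  EnqueueSuccessors(Start);
  while (!Worklist.empty()) {
    int Node = Worklist.front();
    Worklist.pop();
    VisitAction Action = Visitor(Node);
    if (Action == VisitAction::Stop)
      return;
    if (Action == VisitAction::Continue)
      EnqueueSuccessors(Node);
  }
}
```

      Letting the visitor return an action keeps the traversal logic in one
      place, while each caller decides where to prune or stop.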
      Introduce SimplifyComparisonPass · 430a7261
      Alessandro Di Federico authored
      This pass helps us handle instructions like ARM's `blt`, which compute
      the result of the comparison by bit-fiddling with the sign bit of the
      operands of a subtraction.
      
      The idea is to have a series of known boolean expressions using `a`, `b`
      and `c` as variables (e.g., the boolean expression corresponding to
      "signed greater than") and compare their truth table against the one
      being analyzed. In case of a match, the comparison can be simplified.
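
      The truth-table matching idea can be sketched as below. The sketch
      assumes that `a` and `b` are the sign bits of the two operands and `c`
      the sign bit of their difference, and it uses the standard "negative
      differs from overflow" condition for signed less-than; this mapping,
      the list of known expressions and the names `truthTable` and
      `matchComparison` are assumptions, not taken from the pass itself.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using BoolFn = std::function<bool(bool, bool, bool)>;

// Encode the truth table of F in 8 bits: bit I holds F(a, b, c), where a, b
// and c are bits 2, 1 and 0 of I.
std::uint8_t truthTable(const BoolFn &F) {
  assert(F != nullptr);
  std::uint8_t Table = 0;
  for (unsigned I = 0; I < 8; ++I)
    if (F((I >> 2) & 1, (I >> 1) & 1, I & 1))
      Table |= static_cast<std::uint8_t>(1u << I);
  return Table;
}

// Return the name of the known comparison with the same truth table as
// Candidate, or an empty string if there is none.
std::string matchComparison(const BoolFn &Candidate) {
  // Overflow of a - b: operands have different signs and the sign of the
  // result differs from the sign of a.
  static const std::vector<std::pair<std::string, BoolFn>> Known = {
      {"signed less than",
       [](bool A, bool B, bool C) { return C != ((A != B) && (C != A)); }},
      {"signed greater or equal",
       [](bool A, bool B, bool C) { return C == ((A != B) && (C != A)); }},
  };
  std::uint8_t Table = truthTable(Candidate);
  for (const auto &Entry : Known)
    if (truthTable(Entry.second) == Table)
      return Entry.first;
  return "";
}
```

      With only three boolean variables the whole table fits in a byte, so
      comparing two expressions reduces to an integer comparison.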
      Force execution of `pinJTS` · 56c37f6c
      Alessandro Di Federico authored
      Check for "sum jumps" more often · ba543727
      Alessandro Di Federico authored
      acf7063a
      Temporarily disable assertion · f473731b
      Alessandro Di Federico authored