Commits · edc6fe118c3fccb4d6139cb6936f08ee45970f2a · revng-bar-2019 / revamb

Aug 12, 2017

`JTM::setCFGForm`: ignore indirect calls · edc6fe11
Alessandro Di Federico authored 7 years ago

edc6fe11

Replace `getNext` with `nextNonMarker` · 47104c0f

Alessandro Di Federico authored 7 years ago

Most of the time, when we need to get the next instruction, we actually
want to skip over "marker" function calls (e.g., calls to `newpc` and
`function_call`). `nextNonMarker` does exactly this.

`FunctionCallIdentification::isCall` and `JumpTargetManager::setCFGForm`
have also been extended to correctly handle such situations.
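
As a rough illustration of the idea (not the actual C++ implementation), skipping marker calls can be sketched as follows. The dictionary-based instruction representation and the helper names are hypothetical; only the marker names `newpc` and `function_call` come from the message above.

```python
from typing import Optional

# Marker function names mentioned in the commit message; the real set may differ.
MARKERS = {"newpc", "function_call"}


def is_marker(instruction: dict) -> bool:
    """Return True for calls to one of the marker functions."""
    return instruction.get("opcode") == "call" and instruction.get("callee") in MARKERS


def next_non_marker(block: list, index: int) -> Optional[dict]:
    """Return the first instruction after `index` that is not a marker call."""
    for instruction in block[index + 1:]:
        if not is_marker(instruction):
            return instruction
    return None  # only markers (or nothing) follow `index` in this block


# Hypothetical basic block: a store, two markers, then the terminator.
block = [
    {"opcode": "store", "callee": None},
    {"opcode": "call", "callee": "newpc"},
    {"opcode": "call", "callee": "function_call"},
    {"opcode": "br", "callee": None},
]
```

Here `next_non_marker(block, 0)` skips both marker calls and returns the `br` terminator.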

47104c0f

Make `translateIndirectJumps` private · ce09875c

Alessandro Di Federico authored 7 years ago

`JumpTargetManager::translateIndirectJumps` has been pushed into
`JumpTargetManager::finalizeJumpTargets`. Moreover, a safety check
about the removal of `exitTB` has been introduced.

ce09875c

Tag the default case of the dispatcher · 355b8c8a

Alessandro Di Federico authored 7 years ago

The basic block handling the default case of the dispatcher was not
previously tagged with `revamb.block.type`; now it is.

355b8c8a

Jul 07, 2017

Handle `.bss`-only data segment · 9db62b28

Alessandro Di Federico authored 7 years ago

This commit fixes an assertion triggered when a segment contains
exclusively zero-initialized data (i.e., its size on file is 0 but its
memory size is not). In this case LLVM detects that the global variable
associated with the segment is composed exclusively of 0s and uses a
`ConstantAggregateZero` as an initializer instead of a
`ConstantDataArray`.

Currently the solution is to ignore that data; however, in the future it
might be beneficial to be able to read data from `.bss`, even if we just
have zeros there.
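
A minimal sketch of the behavior described above, assuming a hypothetical `Segment` type whose reads only succeed on file-backed bytes. The real code works on LLVM constants; this only models the "size on file is 0, memory size is not" case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    start: int        # virtual address where the segment is loaded
    file_data: bytes  # bytes actually present in the file (may be empty)
    mem_size: int     # size in memory, at least len(file_data)

    def read(self, address: int, size: int) -> Optional[bytes]:
        """Read `size` bytes at `address`, only from file-backed data."""
        offset = address - self.start
        if offset < 0 or offset + size > len(self.file_data):
            # Zero-initialized tail (or outside the segment): ignored for now.
            return None
        return self.file_data[offset:offset + size]


# A `.bss`-only segment has no bytes on file but a non-zero memory size.
bss_only = Segment(start=0x1000, file_data=b"", mem_size=0x100)
mixed = Segment(start=0x2000, file_data=b"\x01\x02\x03\x04", mem_size=0x10)
```

With this model, every read from `bss_only` yields `None` instead of failing, which mirrors the "ignore that data" choice.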

Thanks to Thorbjoern Schulz for reporting this bug.

9db62b28

Mar 31, 2017

Detect `try`/`catch` landing pads · d8f13c79

Alessandro Di Federico authored 7 years ago

Landing pads are basically the `catch` blocks in C++ `try`/`catch`
statements. So far we were missing them, since they are encoded in a
way similar to DWARF debugging information, in the `.eh_frame` and, more
specifically, in the `.gcc_except_table` sections of ELF programs.

This commit parses these sections so that the basic blocks associated
with landing pads are correctly identified. Personality functions are
detected too. A test is also introduced to assess the effectiveness of
our code.

d8f13c79

Mar 23, 2017

Do not remove predecessors while iterating on them · ab160b03

Alessandro Di Federico authored 7 years ago

ab160b03

Mar 06, 2017

Purge translated code in post-order · d398213a

Alessandro Di Federico authored 8 years ago

This commit changes the way instructions and basic blocks are purged when
re-translation is necessary. Specifically, the purge is now performed
through a post-order visit, which should prevent the removal of any
instructions still holding users.

This commit also introduces the `SubGraph` class, which is useful to be
able to navigate portions of a graph (e.g., a `Function`) in post-order
easily.
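
The post-order visit mentioned above can be sketched as follows. This is an illustration on a plain adjacency dictionary, not the `SubGraph` class itself; the block names are hypothetical.

```python
def post_order(successors: dict, entry: str) -> list:
    """Return the nodes reachable from `entry` in post-order (iterative DFS)."""
    visited = {entry}
    order = []
    stack = [(entry, iter(successors.get(entry, ())))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in visited:
                visited.add(child)
                stack.append((child, iter(successors.get(child, ()))))
                break
        else:
            # All successors have been handled: the node itself can be emitted.
            order.append(node)
            stack.pop()
    return order


# Hypothetical sub-graph rooted at the block that has to be retranslated.
cfg = {"split": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
purge_order = post_order(cfg, "split")
```

In this order a block is only reached after everything reachable from it, so code at the end of the sub-graph is deleted before the code that leads to it.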

d398213a

Mar 02, 2017

When splitting a basic block, retranslate · 04a4591f

Alessandro Di Federico authored 8 years ago

This commit should fix some bugs caused by the fact that, when splitting
a basic block, we didn't retranslate the basic block at the split point
but preserved the existing code. This led to problems, in particular on
x86-64, where certain QEMU local variables were not available.

Basically, every time we split a basic block in
`JumpTargetManager::registerJT` we note down that the new basic block
must be purged, and in `JumpTargetManager::harvest` we perform the
purge. `harvest` has been chosen since it's a particularly quiet moment,
i.e., there should be no pending references/iterators to code we have to
delete.

04a4591f

Dismiss basic block statistics collection · d6b257dd

Alessandro Di Federico authored 8 years ago

If we need this again, we can do it in revamb-dump.

d6b257dd

Dec 08, 2016

Change the way we denote JT basic blocks · 3e8e9c23

Alessandro Di Federico authored 8 years ago

Until now, we identified basic blocks that are jump targets by adding
metadata to the terminator instruction. This is a problem in many
cases, so we now use the third parameter of `newpc` calls to tell
whether a basic block is a jump target.

The third argument used to be set only at the very end of all our
analyses, before producing the output. We now set it before each jump
target harvesting, so that this information is available through
`GeneratedCodeBasicInfo`.

3e8e9c23

Introduce the GCBI and FCI passes · 5c619ab0

Alessandro Di Federico authored 8 years ago

This commit introduces two new passes:

* `GeneratedCodeBasicInfo`: recovers from the IR some basic information
  like the size of delay slots in the input architecture, the name of
  the program counter and so on. It can also identify the type of a
  basic block (e.g., dispatcher, jump target...).
* `FunctionCallIdentification`: identifies function calls and injects a
  marker before the associated terminator instruction.

The idea of these two passes is to try to progressively move information
we used to keep in `JumpTargetManager` into the IR, so that it is more
easily accessible and passes do not need a reference to `JTM`.

In particular, by having markers for function calls available during jump
target discovery, we no longer need a duplicated and suboptimal
implementation of `isCall`.

This commit also introduces some additional helper functions and a
helper class to quickly.

5c619ab0

Remove clone of `getLimitedValue` from JTM · d6471b99
Alessandro Di Federico authored 8 years ago

d6471b99

Specify endianness when reading from segments · c83559fc

Alessandro Di Federico authored 8 years ago

Let functions such as `JumpTargetManager::readRawValue` take a parameter
specifying if the value should be read from the segment using the
endianness of the original architecture or of the target architecture.

This commit fixes a bug with big-endian architectures (e.g., MIPS): when
materializing a value on the operation stack of SET, the endianness
was changed twice, once in `readRawValue` and the second time while
applying the `bswap` instruction which is registered on the stack.
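
The double swap can be reproduced with a small sketch. The function names mirror the ones in the message, but the signatures and the sample bytes are assumptions made for illustration only.

```python
def read_raw_value(segment: bytes, offset: int, size: int, byteorder: str) -> int:
    """Read an integer from `segment`, interpreting it with `byteorder`."""
    return int.from_bytes(segment[offset:offset + size], byteorder)


def bswap(value: int, size: int) -> int:
    """Reverse the byte order of a `size`-byte integer."""
    return int.from_bytes(value.to_bytes(size, "little"), "big")


# A 32-bit pointer as stored by a big-endian program (e.g., MIPS).
segment = bytes([0x00, 0x40, 0x12, 0x34])

# Reading with the input endianness already yields the intended value.
correct = read_raw_value(segment, 0, 4, "big")

# Reading with the opposite endianness needs exactly one `bswap` afterwards.
raw = read_raw_value(segment, 0, 4, "little")
fixed_once = bswap(raw, 4)

# The bug: a value that is already correct gets swapped a second time.
swapped_twice = bswap(correct, 4)
```

Letting the caller state which endianness to use makes it explicit whether a later `bswap` is still expected, so the swap is applied exactly once.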

c83559fc

Dec 03, 2016

Introduce the `NoFunctionCallsCFG` CFG form · 09e25267

Alessandro Di Federico authored 8 years ago

`NoFunctionCallsCFG` is a form of the CFG where all the function call
edges are replaced with jumps to the return address. This is beneficial
in certain analyses, where we want to pretend we're working at the
function level.

To implement such a form of CFG we now emit right before the terminator
of each caller basic block a call to the `function_call` function
passing as the first parameter the callee basic block and as the second
one the return basic block. Using these function calls, switching to
`NoFunctionCallsCFG` and back becomes straightforward.
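
A minimal sketch of how such a marker makes the switch straightforward. The `Block` type and `successors_in` function are hypothetical; the real implementation rewrites LLVM terminators rather than returning lists.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Block:
    successors: list
    # (callee, return_block) as recorded by the `function_call` marker, if any
    function_call: Optional[Tuple[str, str]] = None


def successors_in(form: str, block: Block) -> list:
    """Return the successors of `block` in the requested CFG form."""
    if form == "NoFunctionCallsCFG" and block.function_call is not None:
        callee, return_block = block.function_call
        # Replace the edge to the callee with an edge to the return address.
        return [return_block if s == callee else s for s in block.successors]
    return list(block.successors)


caller = Block(successors=["bb.callee"], function_call=("bb.callee", "bb.after_call"))
plain = Block(successors=["bb.next"])
```

Since the marker keeps both the callee and the return block, the original edges are never lost and switching back only requires asking for the other form.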

09e25267

Add support to switch between different CFG forms · c579da60

Alessandro Di Federico authored 8 years ago

This commit introduces `JumpTargetManager::setCFGForm`, which lets the
user choose which type of CFG is currently in use. The default and final
form should be `SemanticPreservingCFG`, which is the most conservative
one. However, for certain analyses it might be beneficial to have a
reduced CFG with almost no dispatcher (in particular for OSRA and SET).

This new function handles the switching between the two currently
available forms of CFG by changing the behavior of the `anyPC` and
`unexpectedPC` basic blocks and rebuilding the dispatcher as
appropriate.

c579da60

Keep the CFG simple: do not jump to the dispatcher · c069700b

Alessandro Di Federico authored 8 years ago

Every time we don't know where an indirect jump can go, we used to emit
a jump to the dispatcher. However, this complicates our analyses; in
particular, the computed dominator tree provides less useful information
than it could.

This commit transforms all the jumps to the dispatcher into jumps to an
"anypc" basic block which during analysis just contains an unreachable
instruction; during finalization this instruction is replaced with a
jump to the dispatcher. A similar (temporary) arrangement is in place
for the "unexpectedpc" case.

This commit also makes the `visit(Successors|Predecessors)` functions
more idiomatic by employing a trait for black lists.

c069700b

`JTM::readRawValue`: fix endianness bug · 4f60a23d

Alessandro Di Federico authored 8 years ago

`JumpTargetManager::readRawValue` used to take into account the
endianness information from `DataLayout`, i.e., the output endianness,
while the input endianness should be taken into account instead.

The commit also checks that, during the final basic block finalization,
we have no empty basic blocks.

4f60a23d

Isolate ELF code and remove architecture parameter · 83ea2caa

Alessandro Di Federico authored 8 years ago

This commit removes all the ELF-specific code from the `CodeGenerator`
class by creating a new class, `BinaryFile`, which contains all the
information about the program that might be needed, in an
image-format-independent way. However, `BinaryFile` still has some
fields which are specific to ELF; we might want to address this when
additional file formats are supported.

A key benefit of isolating this code is that we can parse the input file
earlier, so that its architecture is available before `CodeGenerator` is
instantiated. Therefore, we can drop the `--architecture` parameter.

83ea2caa

Sep 27, 2016

Use symbols to produce meaningful names · c50dcc5c

Alessandro Di Federico authored 8 years ago

This commit introduces the usage of symbols, if they are available. We
employ them to produce meaningful names for basic blocks.

* Collect the symbols from `.symtab`/`.dynsym`
* Box the `Segments` into a new data structure (`BinaryInfo`) which also
  handles symbols.
* `JumpTargetManager::nameForAddress`: produce a meaningful name using
  symbols, if possible.
* Spread some `const`-ness
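
To illustrate the symbol lookup, here is a sketch of one way to name an address after the symbol that contains it. The symbol table, the `name+offset` format and the `bb.` fallback prefix are all assumptions, not the naming scheme actually used by `nameForAddress`.

```python
import bisect

# Hypothetical symbol table: (start address, size, name), sorted by address.
SYMBOLS = [(0x400000, 0x40, "_start"), (0x400100, 0x80, "main")]
STARTS = [start for start, _, _ in SYMBOLS]


def name_for_address(address: int) -> str:
    """Name a basic block after the symbol containing `address`, if any."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index >= 0:
        start, size, name = SYMBOLS[index]
        if start <= address < start + size:
            offset = address - start
            return name if offset == 0 else f"{name}+{offset:#x}"
    # No symbol covers the address: fall back to the raw address.
    return f"bb.{address:#x}"
```

For example, `name_for_address(0x400104)` falls inside `main` and gives `main+0x4`, while an address covered by no symbol falls back to its hexadecimal form.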

c50dcc5c

Sep 20, 2016

Copyright notices, license and credits · d01ee1f4

Alessandro Di Federico authored 8 years ago

d01ee1f4

Sep 17, 2016

Remove some dead code · 29879c8d
Alessandro Di Federico authored 8 years ago

29879c8d

Introduce `NoreturnAnalysis` · cc87ad60

Alessandro Di Federico authored 8 years ago

This commit introduces the `noreturn` analysis, whose aim is to detect
all the basic blocks that are doomed to lead to a `noreturn` syscall such
as `execve` or `exit`.

* Implement `NoreturnAnalysis`.
* Include and initialize in the `Architecture` data structure all the
  necessary information to detect `noreturn` syscalls. Specifically, the
  name of the QEMU helper for syscalls, the name of the register holding
  the syscall number and the syscall numbers representing `noreturn`
  syscalls.
* `ReachingDefinitionsPass`: make reaching definitions available both in
  reaching definitions mode and reached loads mode. This part needs
  further cleanup. We might also want to implement this with a
  `Boost.Bimap`.
* Use `SET` to collect information useful for the
  `NoreturnAnalysis`. Also restructure how the `OperationsStack` works
  to be more streamlined and keep track of multiple information about
  the instruction currently being tracked.
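
The core idea of "doomed" blocks can be sketched as a fixed-point computation. This assumes that a block is doomed when all of its successors are, which is one reading of the description above. Detecting which blocks actually perform a `noreturn` syscall (the part that relies on SET and reaching definitions) is not modeled, and the CFG is hypothetical.

```python
def doomed_blocks(successors: dict, noreturn_blocks: set) -> set:
    """Blocks from which every path ends in a `noreturn` syscall."""
    doomed = set(noreturn_blocks)
    changed = True
    while changed:
        changed = False
        for block, succs in successors.items():
            if block in doomed or not succs:
                continue
            # A block is doomed when it has no way to avoid doomed blocks.
            if all(s in doomed for s in succs):
                doomed.add(block)
                changed = True
    return doomed


cfg = {
    "entry": ["check"],
    "check": ["fail", "ok"],
    "fail": ["log"],
    "log": ["do_exit"],
    "do_exit": [],  # contains the `exit` syscall
    "ok": [],       # returns normally
}
result = doomed_blocks(cfg, {"do_exit"})
```

Here `fail` and `log` are doomed because they can only reach `do_exit`, while `check` is not, since it can still branch to `ok`.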

cc87ad60

Keep track of memory ranges read by SET · 8dfda3d5
Alessandro Di Federico authored 8 years ago

8dfda3d5

Free memory after analyses · 46fe8622

Alessandro Di Federico authored 8 years ago

* Clear all the data that's not part of the analysis results at the end
  of the `runOnFunction` method
* Clear all the data that's part of the analysis results when the
  `PassManager` tells us so (`Pass::releaseMemory`)
* Do not use the `clear()` method, since it doesn't release memory
* Add some debugging information

46fe8622

Keep track of how jump targets have been met · 7859f9de

Alessandro Di Federico authored 8 years ago

This commit records, as a flag, how each jump target was met. It also
keeps track of which pointers in global data have been involved in
materializations performed by SET: those that have not are of special
interest to us, since they are likely function pointers, and are
therefore marked with a specific flag.

7859f9de

Dismiss `JumpTargetManager::registerBlock` · 4ae7cdad
Alessandro Di Federico authored 8 years ago

4ae7cdad

Draft tracking of reasons for registering JTs · 27b4e465
Alessandro Di Federico authored 8 years ago

27b4e465

Drop the concept of "reliable" jump target · 1d87dced
Alessandro Di Federico authored 8 years ago

1d87dced

Add support for using section information · 6c5c0ad8
Alessandro Di Federico authored 8 years ago

6c5c0ad8

`exitTBCleanup`: don't delete with pending uses · dd7e05d6
Alessandro Di Federico authored 8 years ago

dd7e05d6

SimplifyComparisonsPass: transform in analysis · deae1f84

Alessandro Di Federico authored 8 years ago

* Add an "s" in the name
* Transform the pass into an analysis and let OSRA use it

deae1f84

Aug 20, 2016

Give a sensible name to all the basic blocks · c21c1b19

Alessandro Di Federico authored 8 years ago

* When generating the code for setting a label or jumping to it, give
  sensible names to the new basic blocks.
* Keep track of the last seen PC during translation so it can be used to
  obtain a sensible name for the basic block.
* Let `JumpTargetManager::getBlockAt` set a proper name to the basic
  block before returning, if it doesn't already have one.

c21c1b19

Introduce `forceFallthroughAfterHelper` · 9c868330

Alessandro Di Federico authored 8 years ago

`forceFallthroughAfterHelper` handles the situation where there is no
PC-store between a call to a helper and a call to `exitTB`: in this
case, we force a branch to the fallthrough PC.

This commit also simplifies `InstructionTranslator::translateCall` by
removing the jump to the dispatcher after a call to a helper in case the
PC was saved and it has changed. We don't really need to do this: either
QEMU will generate a call to `exitTB` as necessary or
`forceFallthroughAfterHelper` will take care of it.

9c868330

Factor out and improve `visitSuccessors` · d3a6442a

Alessandro Di Federico authored 8 years ago

* The function now can take a `std::set` of basic blocks to ignore.
* The visitor function has now several options on how to proceed, and
  can express them through its return value.
* A serious bug in the implementation was also fixed.
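
A sketch of what such an interface might look like. The `VisitAction` names, the breadth-first order and the function signature are assumptions for illustration; only the ignore set and the idea of steering the visit through the visitor's return value come from the list above.

```python
from collections import deque
from enum import Enum, auto


class VisitAction(Enum):
    CONTINUE = auto()       # keep exploring the successors of this block
    NO_SUCCESSORS = auto()  # do not go past this block
    STOP = auto()           # end the whole visit


def visit_successors(successors, start, ignore, visitor):
    """Breadth-first visit of the blocks reachable from `start`."""
    visited = set(ignore)  # ignored blocks are treated as already seen
    queue = deque()

    def enqueue_successors(block):
        for successor in successors.get(block, ()):
            if successor not in visited:
                visited.add(successor)
                queue.append(successor)

    enqueue_successors(start)
    while queue:
        block = queue.popleft()
        action = visitor(block)
        if action is VisitAction.STOP:
            return
        if action is VisitAction.CONTINUE:
            enqueue_successors(block)


cfg = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
seen = []


def visitor(block):
    seen.append(block)
    return VisitAction.NO_SUCCESSORS if block == "b" else VisitAction.CONTINUE


visit_successors(cfg, "a", ignore={"e"}, visitor=visitor)
```

In this run `d` is still reached through `c` even though the visit does not go past `b`, and `e` is never visited because it is in the ignore set.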

d3a6442a

Introduce SimplifyComparisonPass · 430a7261

Alessandro Di Federico authored 8 years ago

This pass helps us handle instructions like ARM's `blt`, which compute
the result of the comparison by bit-fiddling with the sign bits of the
operands of a subtraction.

The idea is to have a series of known boolean expressions using `a`, `b`
and `c` as variables (e.g. the boolean expression corresponding to
"signed greater than") and compare their truth table against the one
being analyzed. In case of match, the comparison can be simplified.
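
The truth-table matching can be sketched as follows. This assumes that `a` and `b` are the sign bits of the two operands of the subtraction and `c` is the sign bit of its result; that interpretation, the set of known expressions and the function names are not taken from the pass itself.

```python
from itertools import product

# Assumed meaning: `a`, `b` are the sign bits of the operands of `x - y`,
# and `c` is the sign bit of the result.
KNOWN = {
    "signed less than": lambda a, b, c: c ^ ((a ^ b) & (a ^ c)),
    "signed greater than or equal": lambda a, b, c: 1 ^ c ^ ((a ^ b) & (a ^ c)),
}


def truth_table(expression) -> tuple:
    """Evaluate `expression` on all 8 assignments of (a, b, c)."""
    return tuple(
        int(bool(expression(a, b, c))) for a, b, c in product((0, 1), repeat=3)
    )


def match_comparison(expression):
    """Return the name of the known comparison with the same truth table."""
    table = truth_table(expression)
    for name, known in KNOWN.items():
        if truth_table(known) == table:
            return name
    return None


# The same predicate written differently, as bit-fiddling code might do.
candidate = lambda a, b, c: (a & ~b & 1) | ((1 ^ a ^ b) & c)
```

With only three boolean variables there are just eight rows to compare, so an exhaustive comparison is cheap; `match_comparison(candidate)` identifies the expression as a signed less-than.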

430a7261

Force execution of `pinJTS` · 56c37f6c
Alessandro Di Federico authored 8 years ago

56c37f6c

Check for "sum jumps" more often · ba543727
Alessandro Di Federico authored 8 years ago

ba543727

Fix bugs in `JumpTargetManager::getPC` · acf7063a
Alessandro Di Federico authored 8 years ago

acf7063a

Temporarily disable assertion · f473731b
Alessandro Di Federico authored 8 years ago

f473731b