Roadmap
Where we are, where we want to go
Building a decompiler is a big endeavour. Follow our latest advancements and the plan for the months to come towards the 1.0 release.
Tier 2: Closed beta (part 1)
Adopt new Hub frontend
Dismiss the old Web 1.0 Hub frontend in favor of the new Next.js-based Hub frontend.
Tier 2 Public Relations
Publish the new website, release a blog post, send newsletter, write tutorial for Tier 2 participants.
No timeouts/crashes on selected binaries
Make sure we can run the whole pipeline on a predetermined set of reasonably sized x86-64 Linux binaries without crashes and within a reasonable time frame.
Perform in-depth QA on `hostname`
Make sure the `hostname` binary is decompiled in a sensible way.
Tier 2.5: Closed beta (part 2)
Performance
Decompile GCC in 5 minutes
Clift backend
Implement the new C backend, using our custom Clift MLIR dialect to generate decompiled code.
Adapt variable producers
Migrate all the passes of the old decompilation pipeline to use LLVM alloca/load/store to represent C local variables. This is preliminary work for the new Clift-based C backend.
Clift dialect conversion
Implement LLVM-to-Clift conversion for the new C decompilation backend.
Tiling
Match C control-flow statements in Clift-based C decompilation backend.
Handle all the forms of memcpy
Ensure we emit `memcpy` gracefully for all the various architectures we support.
Comments in function's body
Enable users to attach comments to a specific instruction of the program and show them in both the decompiled code and the disassembly.
Mass testing
Test on a large number of binaries and promote binaries that are decompiled without crashes and within a reasonable time frame to the regression suite.
Implicit conversions
Detect and remove casts that in C would be implicit. This will significantly reduce the number of casts the user sees in the decompiled code.
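As a rough illustration of the kind of cast this item targets, the sketch below shows two equivalent C functions. The function names are hypothetical and the first version is not actual decompiler output; it only shows a cast that C would apply implicitly anyway.

```c
#include <stdint.h>

/* Hypothetical "before": the widening conversion is spelled out. */
int64_t scale_explicit(int32_t x, int64_t factor) {
  return (int64_t)x * factor;
}

/* Hypothetical "after": the usual arithmetic conversions already
   convert x to int64_t before the multiplication, so the cast
   carries no information for the reader. */
int64_t scale_implicit(int32_t x, int64_t factor) {
  return x * factor;
}
```

Both functions compute the same value for every input, which is why the cast can be dropped without changing semantics.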
EFA QA
Perform Quality Assurance on EFA results on a vast number of functions with a diverse set of arguments and return values.
Push variable declarations ALAP
In decompiled C code, make sure we declare local variables as late as possible.
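A minimal sketch of the difference, using hypothetical function names: the first version hoists every declaration to the top of the function, the second declares each variable at the latest point where it is still in scope for all of its uses.

```c
#include <stddef.h>

/* Declarations hoisted to the top of the function. */
int sum_early(const int *values, size_t count) {
  int total;
  size_t i;
  int current;
  total = 0;
  for (i = 0; i < count; i++) {
    current = values[i];
    total += current;
  }
  return total;
}

/* Each variable declared as late as possible. */
int sum_late(const int *values, size_t count) {
  int total = 0;
  for (size_t i = 0; i < count; i++) {
    int current = values[i];
    total += current;
  }
  return total;
}
```

The second form keeps each variable's scope as narrow as its uses allow, which tends to make the intent of the code easier to follow.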
Primitives inlining
Change the model layout so that primitive types (e.g., `uint32_t`) are defined inline, instead of having an entry in `model::Binary::Types`.
EFA4
Implement the 4th version of Early Function Analysis, which will significantly improve detection of register-based arguments and return values.
Invalidation logic
Implement the logic to detect which artifacts need to be recomputed, instead of recomputing everything at every change.
Documentation
Provide public documentation of the model, the CLI and our Python/TypeScript wrappers.
Declutter the UI
Make the necessary changes to VSCode to remove everything that's not strictly necessary for our use case.
Tier 3: Open beta (part 1)
Clift: pre-backend passes
Various optimizations on Clift, aimed at generating better-looking C code: integer literals, implicit casts, parentheses based on operator precedence.
EmitFieldAccesses
Transform integer arithmetic into field access expressions in the new Clift-based C backend.
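To make the idea concrete, here is a sketch in plain C rather than Clift. The struct and function names are hypothetical: the first function reads a field through integer arithmetic on the address, the second expresses the same read as a field access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, used only for this illustration. */
struct packet {
  uint32_t id;
  uint32_t length;
};

/* Field read expressed as integer arithmetic on the address.
   The integer/pointer round trip is implementation-defined in C,
   but behaves as expected on mainstream platforms. */
uint32_t length_via_arithmetic(const struct packet *p) {
  uintptr_t address = (uintptr_t)p + offsetof(struct packet, length);
  return *(const uint32_t *)address;
}

/* The same read expressed as a field access. */
uint32_t length_via_field(const struct packet *p) {
  return p->length;
}
```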
VMA1.5 on Clift
Implement basic forward type propagation in Clift.
Clift canonicalizations
Clift canonicalization: fold &*, fold *&, two's complement arithmetic normalization, remove empty branches of if-statements, match advanced loops (while, do-while), handle noreturn.
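The two folds can be illustrated in plain C (not Clift) with the hypothetical functions below. Each pair is equivalent, so the longer form can be rewritten to the shorter one.

```c
/* &*p yields the same pointer as p: in C, when & is applied to the
   result of a unary *, neither operator is evaluated. */
int *address_of_deref(int *p) { return &*p; }
int *address_of_deref_folded(int *p) { return p; }

/* *&x designates the same object as x. */
int deref_of_address(int x) { return *&x; }
int deref_of_address_folded(int x) { return x; }
```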
CRUD all model parts in UI
In the UI we need to provide a way to create, edit and remove types, functions, segments and so on.
Initial auto-analyses twice
We need to be able to run the analysis pipeline twice without crashing.
Rebase QEMU
Rebase QEMU to the latest version. This will enable us to support additional architectures and start working on proper floating-point support.
Preserve debug info
Review the decompilation pipeline to ensure that debug information, which we use to trace decompiled code back to assembly instructions, is preserved as much as possible. This ensures we don't lose the link between decompiled code and assembly in most situations.
Model verify on the client
Enable the VSCode client to verify the model without making a remote request. This ensures that the user can make interactive changes and immediately get feedback on whether the changes are valid.
Drop kinds
Get rid of kinds from `revng-pipeline`.
Improve C state variables
Rework how we name artificial state variables introduced, e.g., to exit from a loop.
Full backward navigation
Make sure the UI can perform backward navigation even between references that are not available in the call graph. This might require materializing all artifacts in the background.
Model upgrade
Implement infrastructure to automatically upgrade between model versions.
Implement verification passes
Implement passes whose role is to assert certain invariants are preserved in each part of the pipeline.
Propagate types in IR before DLA
Perform type propagation through the IR before DLA is run. This ensures DLA can reconstruct richer types, such as signed and unsigned integers.
Collaboration QA
Ensure collaboration works smoothly.
HexView
Implement a basic hexadecimal view.
Support multiple binaries
Make sure a single project can handle multiple binaries. Also, switch to recording hashes of binaries in the model, instead of asking the user to provide them.
Outlining/inlining/tail calls
The `Inline` attribute of `model::Function` has known limitations. Make sure we can inline any function.
Hub: expose snippets
Implement in Hub a feature to embed decompiled code snippets.
Implement undo/redo
Implement the undo/redo feature.
Reorganize repositories
Merge the revng and revng-c repositories.
revng-pypeline
Implement a more `git`-like CLI for revng and move most of the revng-pipeline logic to Python.
Python client
Implement a dev-friendly Python library to interact with `revng-daemon`'s GraphQL API.
Tier 3.5: Open beta (part 2)
CFG restructuring
Implement a new generation of the control-flow combing algorithm, with heuristics to prevent excessive duplication.
Adopt alias analysis in SwitchToStatements
Inform the Clift-based C decompilation pipeline that the stack frame does not alias other memory, to avoid redundant accesses to it in decompiled C code.
Import C headers
Make sure we can import a C header into the model. Core idea: compile the header with debug info, and then import via DWARF importer.
Tackle stack slot reuse
Devise a way to handle stack slots being used in different ways across the body of a function. Core idea: promote them to SSA values.
Perform QA on various architectures
We need to fix platform-specific issues, bugs and limitations that pop up on architectures that have not gone through QA yet.
Support variadic arguments
Implement support for variadic arguments for the various ABIs we support.
Floating point support
Improve support for floating-point instructions and data types.
Segment with designated initializers
In the C view, show segments as global variables using C's designated initializers.
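For readers unfamiliar with the syntax, the sketch below shows what a global initialized with designated initializers looks like in C99. The struct layout, the variable name and the values are hypothetical, and this is not the format the C view will necessarily emit.

```c
#include <stdint.h>

/* Hypothetical layout of a small data segment. */
struct segment_data {
  uint32_t magic;
  uint8_t padding[4];
  uint64_t counter;
};

/* Hypothetical segment shown as a global variable. Members and
   array elements that are not named are zero-initialized. */
struct segment_data segment_0x1000 = {
  .magic = 0xCAFEF00D,
  .padding = { [3] = 0xFF },
  .counter = 42,
};
```

Naming each member and array index explicitly lets sparse data be shown without listing every zero byte.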
Tidy model.h
Clean up the C headers we emit.
Import stack frames from DWARF
Exploit information in DWARF debug info to automatically populate the stack frame of functions.
Strings view
Implement a simple view to show all the strings we detected in the binary.
DLA: import model + subgraph
Ensure DLA can import existing model information and can correctly run on a portion of the call graph. This will enable us to re-run DLA after the initial analysis.
Detect Strings
Add an analysis to detect string literals in segments.
Tier 4: 1.0 release
ConvertToCABI QA
Perform Quality Assurance on the result of converting `RawFunctionTypes` to `CABIFunctionTypes`.
Background artifacts production
Implement logic to asynchronously produce artifacts in the background.
Patch raw bytes in model
Add support for patching the binary from the model.
Metaview
Implement a view that shows, as a single document, the whole binary.
Shop
Implement an online shop to buy subscriptions/licenses.
Register for the UI closed beta!
Want to try the UI? We're now inviting people on a FIFO basis.