51 research outputs found

    Function extraction

    Get PDF
    AbstractLow-level imperative programming languages typically have complex operational semantics (e.g. derived from an underlying processor architecture). In this paper, we describe an automatic method for extracting recursive functions from such low-level programs. The functions are derived by formal deduction from the semantics of the programming language. For each function extracted, a proof of correspondence to the original program is automatically constructed. Subsequent program verification can then be done without referring to the details of the low-level programming language semantics at all: it suffices to prove properties of the extracted function. The technique is explained for simple while programs and also for the machine code of a widely used processor. We show how heuristics can enhance the output from the function extractor/decompiler and how the technique aids implementation of a trustworthy compiler. Our tools have been implemented in the HOL4 theorem prover

    Synthesizing Structured CAD Models with Equality Saturation and Inverse Transformations

    Full text link
    Recent program synthesis techniques help users customize CAD models(e.g., for 3D printing) by decompiling low-level triangle meshes to Constructive Solid Geometry (CSG) expressions. Without loops or functions, editing CSG can require many coordinated changes, and existing mesh decompilers use heuristics that can obfuscate high-level structure. This paper proposes a second decompilation stage to robustly "shrink" unstructured CSG expressions into more editable programs with map and fold operators. We present Szalinski, a tool that uses Equality Saturation with semantics-preserving CAD rewrites to efficiently search for smaller equivalent programs. Szalinski relies on inverse transformations, a novel way for solvers to speculatively add equivalences to an E-graph. We qualitatively evaluate Szalinski in case studies, show how it composes with an existing mesh decompiler, and demonstrate that Szalinski can shrink large models in seconds.Comment: 14 page

    A Survey of Methods for Converting Unstructured Data to CSG Models

    Full text link
    The goal of this document is to survey existing methods for recovering CSG representations from unstructured data such as 3D point-clouds or polygon meshes. We review and discuss related topics such as the segmentation and fitting of the input data. We cover techniques from solid modeling and CAD for polyhedron to CSG and B-rep to CSG conversion. We look at approaches coming from program synthesis, evolutionary techniques (such as genetic programming or genetic algorithm), and deep learning methods. Finally, we conclude with a discussion of techniques for the generation of computer programs representing solids (not just CSG models) and higher-level representations (such as, for example, the ones based on sketch and extrusion or feature based operations).Comment: 29 page

    Reverse Engineering of Software for Interoperability and Analysis

    Get PDF
    The rapid evolution of computer technology raises difficult questions about the scope of protection the law should afford computer programs. Computer programs are uniquely different from traditional literary works protected by the copyright laws, because they have machine-like properties, are primarily functional in nature, and frequently are distributed in a form that humans cannot read. Despite these differences, however, computer programs have received protection under the copyright paradigm along with literary and artistic works. The United States historically has employed a highly protectionist approach to computer programs, as evidenced by early software infringement decisions in which courts slowly expanded protection by prohibiting copying of not only the literal or tangible aspects of computer programs but also the nonliteral elements. Recently, some courts have made an underlying shift in their interpretation of legal doctrine and policy from a broad standard of infringement that favors software copyright owners to a more narrow standard

    Binary Disassembly Block Coverage by Symbolic Execution vs. Recursive Descent

    Get PDF
    This research determines how appropriate symbolic execution is (given its current implementation) for binary analysis by measuring how much of an executable symbolic execution allows an analyst to reason about. Using the S2E Selective Symbolic Execution Engine with a built-in constraint solver (KLEE), this research measures the effectiveness of S2E on a sample of 27 Debian Linux binaries as compared to a traditional static disassembly tool, IDA Pro. Disassembly code coverage and path exploration is used as a metric for determining success. This research also explores the effectiveness of symbolic execution on packed or obfuscated samples of the same binaries to generate a model-based evaluation of success for techniques commonly employed by malware. Obfuscated results were much higher than expected, which lead to the discovery that S2E was not actually handling the multiple executable memory regions present in unpacker runtime code. Three recommendations are made to address the shortcomings of S2E and allow it to process obfuscated samples correctly

    The State of the Art of Automatic Programming

    Get PDF
    Automaatprogrammeerimine või koodi genereerimine on teatud tüüpi arvutiprogrammide loomisviis, kus kood genereeritakse mõne tööriista abil, mis võimaldab arendajatel koodi kirjutada kõrgemal abstraktsioonitasemel. Selliste programmide rakendamine tarkvaraarenduse protsessis on hea viis programmeerijate produktiivsuse tõstmiseks, võimaldades neil keskenduda pigem käesolevale ülesandele kui implementatsiooni detailidele. Senises teaduskirjanduses on vaadeldud konkreetseid lähenemisi või meetodeid eraldi. Väga vähesed uurimustööd vaatlevad aga kogu valdkonna viimast taset. Käesolevas töös käsitletakse automaatprogrammeerimist olemasoleva kirjanduse süstemaatilise kirjandusülevaate meetodi abil. Töö teeb ülevaate teemaga seonduvatest algoritmidest, probleemidest ning uurmisvaldkonna avatud uurimisküsimustest ning võrdleb valdkonna hetketaset praktika hetketasemega. Vaaldeldud 37 asjakohasest uuringust tegelesid 19 automaatprogrammeerimise üldise määratlemise ja alateemadega. Kolmkümmend uuringut pakkusid välja konkreetse algoritmi või lähenemisviisi. Esitatud tehnikatest rakendati 2 praktikas. Viimasel ajal on automaatprogrammerimise fookus nihkunud programmide sünteesilt induktiivsele programmeerimisele, mille on põhjustanud läbimurded tehisintellekti valdkonnas. Mõistete ja alateemade määratlus on teadlaste vahel ühtne. Õigete spetsifikatsioonide sõnastamine ja piisava teabe andmine automatiseerimiseks on endiselt lahtine uurimisküsimus.Automatic programming or code generation is a type of computer programming where the code is generated using some tools allowing developers to write code at the higher level of abstraction. Implementing these types of programs into the software development process is a good way to boost programmers’ performance by focusing on the task at hand rather than implementation details. Current literature on the subject reviews single approach or method. Very few of them are reviewing state of the art in general. This paper reviews the state of the art of automatic programming by overviewing the existing literature on the topic using systematic literature review method. The paper overviews approaches and algorithms of the topic, examines issues and open questions in the field and compares the state of the art to the state of the practice. Of 37 relevant studies, 19 addressed general definitions and subtopics of automatic programming. 30 presented specific algorithms or approaches. 2 of proposed techniques were implemented in practice. Currently, the focus of automatic programming shifted from program synthesis to inductive programming, caused by a breakthrough in artificial intelligence. Definition of the term and subtopics is consistent between scholars. However, formulating correct specification and providing sufficient information for automation is still an open research question

    Decompilation of Binaries into LLVM IR for Automated Analysis

    Get PDF
    Complexity in malicious software is increasing to avoid detection and mitigation. As such, there is greater interest in using automation for reverse engineering. Current state-of-the-art tools use proprietary intermediate representations (IR) in decompilation and lack open-source development. LLVM IR has emerged as a candidate for a reverse engineering IR as it is already a mature tool for compilation and has a wide set of existing analysis tools. In 2019, the NSA released the Ghidra reverse engineering framework as a free and open-source alternative. In this thesis, we examine the development and application of IRs in Ghidra for lifting to LLVM IR and evaluating the efficacy of that lifting. Of interest was lifting at both the disassembly and decompilation stages of Ghidra. We developed two tools: Ghidra-to-LLVM and Ghidrall. The former uses Ghidra's Low P-Code IR for a disassembling lifter while the latter uses Ghidra's decompilation data structures as a decompiling lifter. Lastly, we test the efficacy of Ghidrall as an input for automated solving and against another lifter. Our results show that Ghidra is effective and has promise as an input for future LLVM-based reverse engineering technologies
    corecore