3,769 research outputs found

    Learning a Static Analyzer from Data

    Full text link
    To be practically useful, modern static analyzers must precisely model the effect of both, statements in the programming language as well as frameworks used by the program under analysis. While important, manually addressing these challenges is difficult for at least two reasons: (i) the effects on the overall analysis can be non-trivial, and (ii) as the size and complexity of modern libraries increase, so is the number of cases the analysis must handle. In this paper we present a new, automated approach for creating static analyzers: instead of manually providing the various inference rules of the analyzer, the key idea is to learn these rules from a dataset of programs. Our method consists of two ingredients: (i) a synthesis algorithm capable of learning a candidate analyzer from a given dataset, and (ii) a counter-example guided learning procedure which generates new programs beyond those in the initial dataset, critical for discovering corner cases and ensuring the learned analysis generalizes to unseen programs. We implemented and instantiated our approach to the task of learning JavaScript static analysis rules for a subset of points-to analysis and for allocation sites analysis. These are challenging yet important problems that have received significant research attention. We show that our approach is effective: our system automatically discovered practical and useful inference rules for many cases that are tricky to manually identify and are missed by state-of-the-art, manually tuned analyzers

    Sawja: Static Analysis Workshop for Java

    Get PDF
    Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. This paper describes the Sawja library: a static analysis framework fully compliant with Java 6 which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including (i) efficient functional data-structures for representing program with implicit sharing and lazy parsing, (ii) an intermediate stack-less representation, and (iii) fast computation and manipulation of complete programs

    Generating Log File Analyzers

    Get PDF
    Software testing is a crucial part of the software development process, because it helps developers ensure that the software works correctly and according to stakehold- ers’ requirements and specifications. Faulty or problematic software can cause huge financial losses. Automation of testing tasks can have a positive impact on software development, by reducing costs and minimizing human error. Software testing can be divided into three tasks: choosing test cases, running test cases on the software under test (SUT) and evaluating the test results. To evaluate test results, testers need to examine the output of the SUT to determine if it performed as expected. Programs often store some of their outputs in files known as log files. The task of evaluating test results can be automated by using a log file analyzer. The main goal of this thesis is to design an approach to generate log file analyzers based on a set of state machine specifications. Our analyzers are generated in C++ and are capable of reading log files from disk or shared memory areas. Regular expressions have been incorporated, so that analyzers can be adapted to different logging policies. We analyze the purpose and benefits of this framework and discuss differences with a previous implementation based on Prolog. In particular, we discuss the results of a series of experiments that we performed in order to compare the performance between Prolog–based analyzers and C++ analyzers. Our results show that C++ analyzers are between 8 and 15 times faster than Prolog–based analyzers

    PowerDrive: Accurate De-Obfuscation and Analysis of PowerShell Malware

    Get PDF
    PowerShell is nowadays a widely-used technology to administrate and manage Windows-based operating systems. However, it is also extensively used by malware vectors to execute payloads or drop additional malicious contents. Similarly to other scripting languages used by malware, PowerShell attacks are challenging to analyze due to the extensive use of multiple obfuscation layers, which make the real malicious code hard to be unveiled. To the best of our knowledge, a comprehensive solution for properly de-obfuscating such attacks is currently missing. In this paper, we present PowerDrive, an open-source, static and dynamic multi-stage de-obfuscator for PowerShell attacks. PowerDrive instruments the PowerShell code to progressively de-obfuscate it by showing the analyst the employed obfuscation steps. We used PowerDrive to successfully analyze thousands of PowerShell attacks extracted from various malware vectors and executables. The attained results show interesting patterns used by attackers to devise their malicious scripts. Moreover, we provide a taxonomy of behavioral models adopted by the analyzed codes and a comprehensive list of the malicious domains contacted during the analysis

    Mechanized semantics

    Get PDF
    The goal of this lecture is to show how modern theorem provers---in this case, the Coq proof assistant---can be used to mechanize the specification of programming languages and their semantics, and to reason over individual programs and over generic program transformations, as typically found in compilers. The topics covered include: operational semantics (small-step, big-step, definitional interpreters); a simple form of denotational semantics; axiomatic semantics and Hoare logic; generation of verification conditions, with application to program proof; compilation to virtual machine code and its proof of correctness; an example of an optimizing program transformation (dead code elimination) and its proof of correctness

    Automatic Derivation of Abstract Semantics From Instruction Set Descriptions

    Get PDF
    Abstracted semantics of instructions of processor-based architectures are an invaluable asset for several formal verification techniques, such as software model checking and static analysis. In the field of model checking, abstract versions of instructions can help counter the state explosion problem, for instance by replacing explicit values by symbolic representations of sets of values. Similar to this, static analyses often operate on an abstract domain in order to reduce complexity, guarantee termination, or both. Hence, for a given microcontroller, the task at hand is to find such abstractions. Due to the large number of available microcontrollers, some of which are even created for specific applications, it is impracticable to rely on human developers to perform this step. Therefore, we propose a technique that starts from imperative descriptions of instructions, which allows to automate most of the process

    A Static Analyzer for Large Safety-Critical Software

    Get PDF
    We show that abstract interpretation-based static program analysis can be made efficient and precise enough to formally verify a class of properties for a family of large programs with few or no false alarms. This is achieved by refinement of a general purpose static analyzer and later adaptation to particular programs of the family by the end-user through parametrization. This is applied to the proof of soundness of data manipulation operations at the machine level for periodic synchronous safety critical embedded software. The main novelties are the design principle of static analyzers by refinement and adaptation through parametrization, the symbolic manipulation of expressions to improve the precision of abstract transfer functions, the octagon, ellipsoid, and decision tree abstract domains, all with sound handling of rounding errors in floating point computations, widening strategies (with thresholds, delayed) and the automatic determination of the parameters (parametrized packing)
    • …
    corecore