44 research outputs found

    Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript

    In static analysis of modern JavaScript libraries, relational analysis at key locations is critical for sound and useful results. Prior work addresses this challenge with various forms of trace partitioning and syntactic patterns, which are fragile and do not scale well, or with complex backwards analysis. In this paper, we propose a new lightweight variant of trace partitioning named value partitioning that refines individual abstract values instead of entire abstract states. We describe how this approach can effectively capture important relational properties involving dynamic property accesses, functions with free variables, and predicate functions. Furthermore, we extend an existing JavaScript analyzer with value partitioning and demonstrate experimentally that it is a simple, precise, and efficient alternative to the existing approaches for analyzing widely used JavaScript libraries.
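
    As a rough illustration of the idea (not the paper's implementation), the following Python sketch contrasts a partitioned abstract value with its joined, unpartitioned counterpart for a dynamic property access obj[p]; the PartitionedValue class and the toy domain of concrete-value sets are assumptions made for this example.

        # Value partitioning in miniature: rather than splitting the whole
        # abstract state per trace, one abstract value is split into
        # partitions keyed by a related value, here the property name in a
        # dynamic property access obj[p].

        class PartitionedValue:
            """Hypothetical abstract value refined per partition key."""
            def __init__(self):
                self.partitions = {}          # key -> set of possible values

            def add(self, key, value):
                self.partitions.setdefault(key, set()).add(value)

            def refine(self, key):
                # Once the analysis learns p == key, only that partition
                # survives, capturing the relation between p and obj[p].
                return self.partitions.get(key, set())

            def join(self):
                # Without partitioning, all partitions collapse together.
                return set().union(*self.partitions.values())

        # Model of v = obj[p] for obj = {get: f, set: g} and p unknown.
        v = PartitionedValue()
        v.add("get", "f")
        v.add("set", "g")
        print(v.refine("get"))   # {'f'}: relational, p == "get" implies v is f
        print(v.join())          # {'f', 'g'}: the imprecise unpartitioned view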

    Probabilistic Naming of Functions in Stripped Binaries

    Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool that combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have the exact same name, even if they implement the exact same functionality. We therefore evaluate punstrip across three levels of name matching: exact; an approach based on natural language processing of name components; and using Symbol2Vec, a new embedding of function names based on random walks of function call graphs. We show that our approach is able to recognize functions compiled across different compilers and optimization levels and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian Linux distribution and compare against the state of the art.
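
    A small sketch of the Symbol2Vec idea under stated assumptions: the call graph, function names, and walk parameters below are invented, and a real pipeline would feed the walks to a skip-gram embedding model and combine the result with punstrip's probabilistic fingerprints, which this fragment does not model.

        import random

        # Random walks over a function call graph produce "sentences" of
        # function names; names appearing in similar calling contexts would
        # receive nearby embedding vectors.
        call_graph = {
            "main": ["parse_args", "run"],
            "run": ["read_input", "process", "write_output"],
            "process": ["checksum", "transform"],
            # leaves point back to main so walks never get stuck
            "parse_args": ["main"], "read_input": ["main"],
            "write_output": ["main"], "checksum": ["main"], "transform": ["main"],
        }

        def random_walk(graph, start, length, rng):
            node, walk = start, [start]
            for _ in range(length - 1):
                node = rng.choice(graph[node])
                walk.append(node)
            return walk

        rng = random.Random(0)
        corpus = [random_walk(call_graph, f, 8, rng)
                  for f in call_graph for _ in range(10)]
        print(corpus[0])   # one training "sentence" for the embedding model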

    Cautiously Optimistic Program Analyses for Secure and Reliable Software

    Modern computer systems still have various security and reliability vulnerabilities. Well-known dynamic analysis solutions can mitigate them using runtime monitors that serve as lifeguards. But the additional work of enforcing these security and safety properties incurs exorbitant performance costs, and such tools are rarely used in practice. Our work addresses this problem with a novel technique: Cautiously Optimistic Program Analysis (COPA). COPA is optimistic: it infers likely program invariants from dynamic observations and assumes them in its static reasoning to precisely identify and elide wasteful runtime monitors. The resulting system is fast, but also ensures soundness by recovering to a conservatively optimized analysis on the rare occasions when a likely invariant fails at runtime. COPA is also cautious: by carefully restricting optimizations to only safe elisions, the recovery is greatly simplified. It avoids unbounded rollbacks upon recovery, thereby enabling analysis for live production software. We demonstrate the effectiveness of Cautiously Optimistic Program Analyses in three areas.
    Information-Flow Tracking (IFT) can help prevent security breaches and information leaks, but it is rarely used in practice due to its high performance overhead (>500% for web/email servers). COPA dramatically reduces this cost by eliding wasteful IFT monitors, making IFT practical (9% overhead, 4x speedup).
    Automatic Garbage Collection (GC) in managed languages (e.g. Java) simplifies programming tasks while ensuring memory safety. However, there is no correct GC for weakly-typed languages (e.g. C/C++), and manual memory management is prone to errors that have been exploited in high-profile attacks. We develop the first sound GC for C/C++ and use COPA to optimize its performance (16% overhead).
    Sequential Consistency (SC) provides intuitive semantics to concurrent programs, which simplifies reasoning about their correctness. However, ensuring SC behavior on commodity hardware remains expensive. We use COPA to ensure SC for Java at the language level efficiently, significantly reducing its cost (from 24% down to 5% on x86).
    COPA provides a way to realize strong software security, reliability, and semantic guarantees at practical costs.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/170027/1/subarno_1.pd
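
    The core optimistic pattern can be sketched in a few lines; everything below (the monitor, the invariant, the function names) is a made-up stand-in rather than COPA's actual machinery, and in a real system the recovery happens inside the analysis runtime rather than via an explicit branch.

        # Cautiously optimistic monitoring: a profile-derived likely
        # invariant is checked cheaply on the fast path; when it holds, the
        # expensive monitor has been statically shown redundant and is
        # elided, and when it fails we recover to the conservative,
        # fully-monitored path with no rollback.

        def expensive_monitor(data):
            # stand-in for, e.g., full information-flow tracking
            assert all(isinstance(x, int) for x in data)

        def process_monitored(data):
            expensive_monitor(data)          # conservative path
            return sum(data)

        def process_copa(data):
            if len(data) < 1024:             # cheap likely-invariant guard
                return sum(data)             # monitor elided (fast path)
            return process_monitored(data)   # recovery: conservative path

        print(process_copa([1, 2, 3]))       # fast path taken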

    Scalability-First Pointer Analysis with Self-Tuning Context-Sensitivity

    Context-sensitivity is important in pointer analysis to ensure high precision, but existing techniques suffer from unpredictable scalability. Many variants of context-sensitivity exist, and it is difficult to choose one that leads to reasonable analysis time and obtains high precision, without running the analysis multiple times. We present the Scaler framework that addresses this problem. Scaler efficiently estimates the amount of points-to information that would be needed to analyze each method with different variants of context-sensitivity. It then selects an appropriate variant for each method so that the total amount of points-to information is bounded, while utilizing the available space to maximize precision. Our experimental results demonstrate that Scaler achieves predictable scalability for all the evaluated programs (e.g., speedups can reach 10x for 2-object-sensitivity), while providing a precision that matches or even exceeds that of the best alternative techniques.
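
    A toy rendition of the selection step, with invented methods, variants, and size estimates: each method starts at the cheapest variant and is greedily upgraded to the most precise variant that still fits the global points-to budget. Scaler's real cost estimation is far more involved than this table lookup.

        # Variants ordered from most to least precise.
        VARIANTS = ["2obj", "2type", "1type", "ci"]

        estimates = {   # method -> variant -> estimated points-to facts
            "m1": {"2obj": 9000, "2type": 2000, "1type": 800, "ci": 100},
            "m2": {"2obj": 500, "2type": 300, "1type": 200, "ci": 50},
            "m3": {"2obj": 80000, "2type": 10000, "1type": 1500, "ci": 200},
        }

        def choose(estimates, budget):
            choice = {m: "ci" for m in estimates}        # cheapest baseline
            total = sum(estimates[m]["ci"] for m in estimates)
            for m in sorted(estimates, key=lambda m: estimates[m]["2obj"]):
                for v in VARIANTS:                       # most precise first
                    delta = estimates[m][v] - estimates[m][choice[m]]
                    if total + delta <= budget:
                        total += delta
                        choice[m] = v
                        break
            return choice, total

        # With a budget of 15000 facts, m1 and m2 get 2-object-sensitivity
        # while the heavyweight m3 is held at 1-type-sensitivity.
        print(choose(estimates, budget=15000))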

    Doctor of Philosophy

    Aggressive random testing tools, or fuzzers, are impressively effective at finding bugs in compilers and programming language runtimes. For example, a single test-case generator has resulted in more than 460 bugs reported for a number of production-quality C compilers. However, fuzzers can be hard to use. The first problem is that failures triggered by random test cases can be difficult to debug because these tests are often large. To report a compiler bug, one must often construct a small test case that triggers the bug. The existing automated test-case reduction technique, delta debugging, is not sufficient to produce small, reportable test cases. A second problem is that fuzzers are indiscriminate: they repeatedly find bugs that may not be severe enough to fix right away. Third, fuzzers tend to generate a large number of test cases that only trigger a few bugs. Some bugs are triggered much more frequently than others, creating needle-in-the-haystack problems. Currently, users rule out undesirable test cases using ad hoc methods such as disallowing problematic features in tests and filtering test results.
    This dissertation investigates approaches to improving the utility of compiler fuzzers. Two components, an aggressive test-case reducer and a tamer, are added to the fuzzing workflow to make the fuzzer more user friendly. We introduce C-Reduce, an aggressive test-case reducer for C/C++ programs, which exploits rich domain-specific knowledge to output test cases nearly as good as those produced by skilled humans. This reducer produces outputs that are, on average, more than 30 times smaller than those produced by the existing reducer most commonly used by compiler engineers. Second, this dissertation formulates and addresses the fuzzer taming problem: given a potentially large number of random test cases that trigger failures, order them such that diverse, interesting test cases are highly ranked. Bug triage can be effectively automated, relying on techniques from machine learning to suppress duplicate bug-triggering test cases and test cases triggering known bugs. An evaluation shows the ability of this tool to solve the fuzzer taming problem for 3,799 test cases triggering 46 bugs in a C compiler.
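
    The reduction loop at the heart of delta debugging, which C-Reduce extends with rich C/C++-specific transformations, fits in a few lines; the chunk schedule and the toy "bug" predicate below are illustrative choices, not C-Reduce's actual algorithm.

        # Greedily delete chunks of the test case, keeping any deletion
        # that still triggers the failure ("interesting"), and halve the
        # chunk size until single elements have been tried.
        def reduce_test(test, interesting):
            chunk = len(test) // 2
            while chunk >= 1:
                i = 0
                while i < len(test):
                    candidate = test[:i] + test[i + chunk:]
                    if candidate and interesting(candidate):
                        test = candidate      # smaller test still fails
                    else:
                        i += chunk            # chunk was needed; move on
                chunk //= 2
            return test

        # Toy bug: triggered whenever both 'a' and 'b' survive.
        bug = lambda t: "a" in t and "b" in t
        print(reduce_test(list("xxaxxyybzz"),
                          lambda t: bug("".join(t))))   # ['a', 'b']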

    UniASM: Binary Code Similarity Detection without Fine-tuning

    Binary code similarity detection (BCSD) is widely used in various binary analysis tasks such as vulnerability search, malware detection, clone detection, and patch analysis. Recent studies have shown that learning-based binary code embedding models perform better than traditional feature-based approaches. In this paper, we propose a novel transformer-based binary code embedding model, named UniASM, to learn representations of binary functions. We design two new training tasks to make the spatial distribution of the generated vectors more uniform, so they can be used directly in BCSD without any fine-tuning. In addition, we propose a new tokenization approach for binary functions that increases the tokens' semantic information while mitigating the out-of-vocabulary (OOV) problem. The experimental results show that UniASM outperforms state-of-the-art (SOTA) approaches on the evaluation dataset: the average recall@1 scores on cross-compiler, cross-optimization-level, and cross-obfuscation tasks are 0.72, 0.63, and 0.77, respectively, higher than existing SOTA baselines. In a real-world task of known-vulnerability searching, UniASM outperforms all the current baselines.
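
    To make the recall@1 numbers concrete, here is how that metric is typically computed for BCSD, with tiny made-up 2-dimensional embeddings standing in for what UniASM's transformer would actually produce for each function.

        import math

        def cosine(u, v):
            dot = sum(a * b for a, b in zip(u, v))
            nu = math.sqrt(sum(a * a for a in u))
            nv = math.sqrt(sum(b * b for b in v))
            return dot / (nu * nv)

        # The same three functions compiled two ways (e.g., -O0 vs. -O2).
        pool_a = {"memcpy": [0.9, 0.1], "sha256": [0.1, 0.95], "qsort": [0.7, 0.7]}
        pool_b = {"memcpy": [0.85, 0.2], "sha256": [0.2, 0.9], "qsort": [0.6, 0.75]}

        # A query scores a hit when its nearest neighbour in the other
        # pool is its own counterpart.
        hits = sum(
            max(pool_b, key=lambda n: cosine(q, pool_b[n])) == name
            for name, q in pool_a.items()
        )
        print("recall@1 =", hits / len(pool_a))   # 1.0 on this toy data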

    Dynamic analysis for concurrent modern C/C++ applications

    Concurrent programs are executed by multiple threads that run simultaneously. While this allows programs to run more efficiently by utilising multiple processors, it brings with it numerous complications. For example, a program may behave unpredictably or erroneously when multiple threads modify the same memory location in an uncoordinated manner. Issues such as this are difficult to avoid, and when introduced, can break the program in unpredictable ways. Programmers therefore often turn towards automated tools to aid in the detection of concurrency bugs. The work presented in this thesis aims to provide methods to aid in the creation of tools for finding and explaining concurrency bugs. In particular, the following studies have been conducted:
    Dynamic race detection for C/C++11. With the introduction of a weak memory model in C++, many tools that provide dynamic race detection have become outdated and are unable to adequately identify data races. This work updates an existing data race detection algorithm so that it can identify data races according to this new definition. A method for allowing programs to explore many of the weak behaviours that this new memory model permits is also provided.
    Record and replay. Much work has gone into record and replay; however, most of it is focussed on whole-system replay, whereby a tool aims to record as much of the program execution as possible. Contrasting this, the work presented here aims to record as little as possible. This sparse approach has many interesting implications: some programs that were previously out of reach for record and replay become tractable, and vice versa. To back this up, controlled scheduling is introduced that is capable of applying different scheduling strategies, which combined with record and replay is beneficial for helping to root out bugs.
    Tool support. Both of the above techniques have been implemented in a tool, tsan11rec, which builds on the tsan dynamic race detection tool. A large experimental evaluation is presented, investigating the effectiveness of the enhanced data race detection algorithm when applied to the Firefox and Chromium web browsers, and of the novel approach to record and replay when applied to a diverse set of concurrent applications.
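
    A minimal flavour of the underlying race check, assuming plain happens-before vector clocks; the C/C++11 work in the thesis additionally models release/acquire synchronisation and weak behaviours, which this fragment deliberately omits, and the access representation here is invented.

        # Two accesses to the same location race if neither happens
        # before the other and at least one of them is a write.
        def happens_before(vc_a, vc_b):
            return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

        def races(access_a, access_b):
            (vc_a, write_a), (vc_b, write_b) = access_a, access_b
            ordered = happens_before(vc_a, vc_b) or happens_before(vc_b, vc_a)
            return not ordered and (write_a or write_b)

        # Unsynchronised writes by two threads: a data race.
        print(races(([1, 0], True), ([0, 1], True)))    # True
        # A read that has synchronised with the write: no race.
        print(races(([1, 0], True), ([2, 1], False)))   # False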

    Mutation Testing Advances: An Analysis and Survey
