12 research outputs found

    The Separate Compilation Assumption

    Get PDF
    Call graphs are an essential requirement for almost all inter-procedural analyses. This motivated the development of many tools and frameworks to generate the call graph of a given program. However, the majority of these tools focus on generating the call graph of the whole program (i.e., both the application and the libraries that the application depends on). A popular compromise to the excessive cost of building a call graph for the whole program is to build an application-only call graph. To achieve this, all the effects of the library code and any calls that the library makes back into the application are usually ignored. This results in potential unsoundness in the generated call graph and therefore in analyses that use it. Additionally, the scope of the application classes to be analyzed by such an algorithm has been often arbitrarily defined. In this thesis, we define the separate compilation assumption, which clearly defines the division between the application and the library based on the fact that the code of the library has to be compiled without access to the code of the application. We then use this assumption to define more specific constraints on how the library code can interact with the application code. These constraints make it possible to generate sound and reasonably precise call graphs without analyzing libraries. We investigate whether the separate compilation assumption can be encoded universally in Java bytecode, such that all existing whole-program analysis frameworks can easily take advantage of it. We present and evaluate Averroes, a tool that generates a placeholder library that over-approximates the possible behaviour of an original library. The placeholder library can be constructed quickly without analyzing the whole program, and is typically in the order of 80 kB of class files (comparatively, the Java standard library is 25 MB). Any existing whole-program call graph construction framework can use the placeholder library as a replacement for the actual libraries to efficiently construct a sound and precise application call graph. Averroes improves the analysis time of whole-program call graph construction by a factor of 3.5x to 8x, and reduces memory requirements by a factor of 8.4x to 12x. In addition, Averroes makes it easier for whole-program frameworks to handle reflection soundly in two ways: it is based on conservative assumptions about all behaviour within the library, including reflection, and it provides analyses and tools to model reflection in the application. We also evaluate the precision of the call graphs built with Averroes in existing whole-program frameworks. Finally, we provide a correctness proof for Averroes based on Featherweight Java

    The Requirements Editor RED

    Get PDF

    Eventually Sound Points-To Analysis with Specifications

    Get PDF
    Static analyses make the increasingly tenuous assumption that all source code is available for analysis; for example, large libraries often call into native code that cannot be analyzed. We propose a points-to analysis that initially makes optimistic assumptions about missing code, and then inserts runtime checks that report counterexamples to these assumptions that occur during execution. Our approach guarantees eventual soundness, which combines two guarantees: (i) the runtime checks are guaranteed to catch the first counterexample that occurs during any execution, in which case execution can be terminated to prevent harm, and (ii) only finitely many counterexamples ever occur, implying that the static analysis eventually becomes statically sound with respect to all remaining executions. We implement Optix, an eventually sound points-to analysis for Android apps, where the Android framework is missing. We show that the runtime checks added by Optix incur low overhead on real programs, and demonstrate how Optix improves a client information flow analysis for detecting Android malware

    Lossless, Persisted Summarization of Static Callgraph, Points-To and Data-Flow Analysis

    Get PDF
    Static analysis is used to automatically detect bugs and security breaches, and aids compiler optimization. Whole-program analysis (WPA) can yield high precision, however causes long analysis times and thus does not match common software-development workflows, making it often impractical to use for large, real-world applications. This paper thus presents the design and implementation of ModAlyzer, a novel static-analysis approach that aims at accelerating whole-program analysis by making the analysis modular and compositional. It shows how to compute lossless, persisted summaries for callgraph, points-to and data-flow information, and it reports under which circumstances this function-level compositional analysis outperforms WPA. We implemented ModAlyzer as an extension to LLVM and PhASAR, and applied it to 12 real-world C and C++ applications. At analysis time, ModAlyzer modularly and losslessly summarizes the analysis effect of the library code those applications share, hence avoiding its repeated re-analysis. The experimental results show that the reuse of these summaries can save, on average, 72% of analysis time over WPA. Moreover, because it is lossless, the module-wise analysis fully retains precision and recall. Surprisingly, as our results show, it sometimes even yields precision superior to WPA. The initial summary generation, on average, takes about 3.67 times as long as WPA

    Call Graphs for Languages with Parametric Polymorphism

    Get PDF
    The performance of contemporary object oriented languages depends on optimizations such as devirtualization, inlining, and specialization, and these in turn depend on precise call graph analysis. Existing call graph analyses do not take advantage of the information provided by the rich type systems of contemporary languages, in particular generic type arguments. Many existing approaches analyze Java bytecode, in which generic types have been erased. This paper shows that this discarded information is actually very useful as the context in a context-sensitive analysis, where it significantly improves precision and keeps the running time small. Specifically, we propose and evaluate call graph construction algorithms in which the contexts of a method are (i) the type arguments passed to its type parameters, and (ii) the static types of the arguments passed to its term parameters. The use of static types from the caller as context is effective because it allows more precise dispatch of call sites inside the callee. Our evaluation indicates that the average number of contexts required per method is small. We implement the analysis in the Dotty compiler for Scala, and evaluate it on programs that use the type-parametric Scala collections library and on the Dotty compiler itself. The context-sensitive analysis runs 1.4x faster than a context-insensitive one and discovers 20\% more monomorphic call sites at the same time. When applied to method specialization, the imprecision in a context-insensitive call graph would require the average method to be cloned 22 times, whereas the context-sensitive call graph indicates a much more practical 1.00 to 1.50 clones per method

    Man-machine partial program analysis for malware detection

    Get PDF
    With the meteoric rise in popularity of the Android platform, there is an urgent need to combat the accompanying proliferation of malware. Existing work addresses the area of consumer malware detection, but cannot detect novel, sophisticated, domain-specific malware that is targeted specifically at one aspect of an organization (eg. ground operations of the US Military). Adversaries can exploit domain knowledge to camoflauge malice within the legitimate behaviors of an app and behind a domain-specific trigger, rendering traditional approaches such as signature-matching, machine learning, and dynamic monitoring ineffective. Manual code inspections are also inadequate, scaling poorly and introducing human error. Yet, there is a dire need to detect this kind of malware before it causes catastrophic loss of life and property. This dissertation presents the Security Toolbox, our novel solution for this challenging new problem posed by DARPA\u27s Automated Program Analysis for Cybersecurity (APAC) program. We employ a human-in-the-loop approach to amplify the natural intelligence of our analysts. Our automation detects interesting program behaviors and exposes them in an analysis Dashboard, allowing the analyst to brainstorm flaw hypotheses and ask new questions, which in turn can be answered by our automated analysis primitives. The Security Toolbox is built on top of Atlas, a novel program analysis platform made by EnSoft. Atlas uses a graph-based mathematical abstraction of software to produce a unified property multigraph, exposes a powerful API for writing analyzers using graph traversals, and provides both automated and interactive capabilities to facilitate program comprehension. The Security Toolbox is also powered by FlowMiner, a novel solution to mine fine-grained, compact data flow summaries of Java libraries. FlowMiner allows the Security Toolbox to complete a scalable and accurate partial program analysis of an application without including all of the libraries that it uses (eg. Android). This dissertation presents the Security Toolbox, Atlas, and FlowMiner. We provide empirical evidence of the effectiveness of the Security Toolbox for detecting novel, sophisticated, domain-specific Android malware, demonstrating that our approach outperforms other cutting-edge research tools and state-of-the-art commercial programs in both time and accuracy metrics. We also evaluate the effectiveness of Atlas as a program analysis platform and FlowMiner as a library summary tool

    Proceedings of the joint track "Tools", "Demos", and "Posters" of ECOOP, ECSA, and ECMFA, 2013

    Get PDF

    Design and implementation of an optimizing type-centric compiler for a high-level language

    Get PDF
    Production compilers for programming languages face multiple requirements. They should be correct, as we rely on them to produce code. They should be fast, in order to provide a good developer experience. They should also be easy to maintain and evolve. This thesis shows how an expressive high level type system can be used to simplify the development of a compiler and demonstrates this on a compiler for Scala. First, it shows how expressive types of high level languages can be used to build internal data structures that provide a statically checked API, ensuring that important properties hold at compile time. Second, we also show how high level language features can be used to abstract the components of a compiler. We demonstrate this by introducing a type-safe layer on top of the bytecode emission phase. This makes it possible to abstract away the implementation details of the compiler frontend and run the same bytecode emission phase in two different Scala compilers. Third, it presents MiniPhases, a novel way to organize transformation passes in a compiler. MiniPhases impose constraints on the organization of passes that are beneficial for maintainability, performance, and testability. We include a detailed performance evaluation of MiniPhases which indicates that their speedup is due to improved cache friendliness and to a lower rate of promotions of objects into the old generations of garbage collectors. Finally, we demonstrate how the expressive type system of the language being compiled can be used for static analysis. We present a novel call graph construction algorithm which uses the typing context for context sensitivity. The resulting algorithm is both substantially faster and more precise than existing alternatives. We demonstrate the applicability of this analysis by extending common subexpression elimination to idempotent expression elimination

    On the Scalability of Static Program Analysis to Detect Vulnerabilities in the Java Platform

    Get PDF
    Java has been a target for many zero-day exploits in the past years. We investigate one category of vulnerabilities used by many of these exploits. Attackers make use of so called unguarded caller-sensitive methods. While these methods provide features that can be dangerous if used in malicious ways, they perform only limited permission checks to restrict access by untrusted code. We derive a taint-analysis problem expressing how vulnerabilities regarding these methods can be detected automatically in the Java Class Library before its code is being released to the public. Unfortunately, while describing the analysis problem is relatively simple, it is challenging to actually implement the analysis. The goal of analyzing a library of the size as the Java Class Library raises scalability problems. Moreover, analyzing a library while assuming attackers can write arbitrary untrusted code results in mostly all parts of the library being accessible. Most existing approaches target the analysis of an application, which is less of a problem, because usually only small parts of the library are used by applications. Besides the fact that existing algorithms run into scalability problems we found that many of them are also not sound when applied to the problem. For example, standard call-graph algorithms produce unsound call graphs when only applied to a library. While the algorithms provide correct results for applications, they are also used when only a library is analyzed---the incompleteness of the results is then usually ignored. The requirements for this work do not allow to ignore that, as otherwise security-critical vulnerabilities may remain undetected. In this work we propose novel algorithms addressing the soundness and scalability problems. We discuss and solve practical challenges: we show a software design for the analysis such that it is still maintainable with growing complexity, and extend an existing algorithm to enrich results with exact data-flow information enabling comprehensible reporting. In experiments we show that designing the analysis to work forward and backward from inner layers to outer layers of the program results in better scalability. We investigate the challenge to track fields in a flow-sensitive and context-sensitive analysis and discuss several threats to scalability arising with field-based and field-sensitive data-flow models. In experiments comparing these against each other and against a novel approach proposed in this work, we show that our new approach successfully solves most of the scalability problems
    corecore