367 research outputs found

    Pruning, Pushdown Exception-Flow Analysis

    Full text link
    Statically reasoning in the presence of exceptions and about the effects of exceptions is challenging: exception-flows are mutually determined by traditional control-flow and points-to analyses. We tackle the challenge of analyzing exception-flows from two angles. First, from the angle of pruning control-flows (both normal and exceptional), we derive a pushdown framework for an object-oriented language with full-featured exceptions. Unlike traditional analyses, it allows precise matching of throwers to catchers. Second, from the angle of pruning points-to information, we generalize abstract garbage collection to object-oriented programs and enhance it with liveness analysis. We then seamlessly weave the techniques into enhanced reachability computation, yielding highly precise exception-flow analysis, without becoming intractable, even for large applications. We evaluate our pruned, pushdown exception-flow analysis, comparing it with an established analysis on large scale standard Java benchmarks. The results show that our analysis significantly improves analysis precision over traditional analysis within a reasonable analysis time.Comment: 14th IEEE International Working Conference on Source Code Analysis and Manipulatio

    Sound and Precise Malware Analysis for Android via Pushdown Reachability and Entry-Point Saturation

    Full text link
    We present Anadroid, a static malware analysis framework for Android apps. Anadroid exploits two techniques to soundly raise precision: (1) it uses a pushdown system to precisely model dynamically dispatched interprocedural and exception-driven control-flow; (2) it uses Entry-Point Saturation (EPS) to soundly approximate all possible interleavings of asynchronous entry points in Android applications. (It also integrates static taint-flow analysis and least permissions analysis to expand the class of malicious behaviors which it can catch.) Anadroid provides rich user interface support for human analysts which must ultimately rule on the "maliciousness" of a behavior. To demonstrate the effectiveness of Anadroid's malware analysis, we had teams of analysts analyze a challenge suite of 52 Android applications released as part of the Auto- mated Program Analysis for Cybersecurity (APAC) DARPA program. The first team analyzed the apps using a ver- sion of Anadroid that uses traditional (finite-state-machine-based) control-flow-analysis found in existing malware analysis tools; the second team analyzed the apps using a version of Anadroid that uses our enhanced pushdown-based control-flow-analysis. We measured machine analysis time, human analyst time, and their accuracy in flagging malicious applications. With pushdown analysis, we found statistically significant (p < 0.05) decreases in time: from 85 minutes per app to 35 minutes per app in human plus machine analysis time; and statistically significant (p < 0.05) increases in accuracy with the pushdown-driven analyzer: from 71% correct identification to 95% correct identification.Comment: Appears in 3rd Annual ACM CCS workshop on Security and Privacy in SmartPhones and Mobile Devices (SPSM'13), Berlin, Germany, 201

    Pushdown automata in statistical machine translation

    Get PDF
    This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT. </jats:p

    Doctor of Philosophy

    Get PDF
    dissertationToday's smartphones house private and confidential data ubiquitously. Mobile apps running on the devices can leak sensitive information by accident or intentionally. To understand application behaviors before running a program, we need to statically analyze it, tracking what data are accessed, where sensitive data ow, and what operations are performed with the data. However, automated identification of malicious behaviors in Android apps is challenging: First, there is a primary challenge in analyzing object-oriented programs precisely, soundly and efficiently, especially in the presence of exceptions. Second, there is an Android-specific challenge|asynchronous execution of multiple entry points. Third, the maliciousness of any given behavior is application-dependent and subject to human judgment. In this work, I develop a generic, highly precise static analysis of object-oriented code with multiple entry points, on which I construct an eective malware identification system with a human in the loop. Specically, I develop a new analysis-pushdown exception-ow analysis, to generalize the analysis of normal control flows and exceptional flows in object-oriented programs. To rene points-to information, I generalize abstract garbage collection to object-oriented programs and enhance it with liveness analysis for even better precision. To tackle Android-specic challenges, I develop multientry point saturation to approximate the eect of arbitrary asynchronous events. To apply the analysis techniques to security, I develop a static taint- ow analysis to track and propagate tainted sensitive data in the push-down exception-flow framework. To accelerate the speed of static analysis, I develop a compact and ecient encoding scheme, called G odel hashes, and integrate it into the analysis framework. All the techniques are realized and evaluated in a system, named AnaDroid. AnaDroid is designed with a human in the loop to specify analysis conguration, properties of interest and then to make the nal judgment and identify where the maliciousness is, based on analysis results. The analysis results include control- ow graphs highlighting suspiciousness, permission and risk-ranking reports. The experiments show that AnaDroid can lead to precise and fast identication of common classes of Android malware

    Doctor of Philosophy in Computer Science

    Get PDF
    dissertationControl-flow analysis of higher-order languages is a difficult problem, yet an important one. It aids in enabling optimizations, improved reliability, and improved security of programs written in these languages. This dissertation explores three techniques to improve the precision and speed of a small-step abstract interpreter: using a priority work list, environment unrolling, and strong function call. In an abstract interpreter, the interpreter is no longer deterministic and choices can be made in how the abstract state space is explored and trade-offs exist. A priority queue is one option. There are also many ways to abstract the concrete interpreter. Environment unrolling gives a slightly different approach than is usually taken, by holding off abstraction in order to gain precision, which can lead to a faster analysis. Strong function call is an approach to clean up some of the imprecision when making a function call that is introduced when abstractly interpreting a program. An alternative approach to building an abstract interpreter to perform static analysis is through the use of constraint solving. Existing techniques to do this have been developed over the last several decades. This dissertation maps these constraints to three different problems, allowing control-flow analysis of higher-order languages to be solved with tools that are already mature and well developed. The control-flow problem is mapped to pointer analysis of first-order languages, SAT, and linear-algebra operations. These mappings allow for fast and parallel implementations of control-flow analysis of higher-order languages. A recent development in the field of static analysis has been pushdown control-flow analysis, which is able to precisely match calls and returns, a weakness in the existing techniques. This dissertation also provides an encoding of pushdown control-flow analysis to linear-algebra operations. In the process, it demonstrates that under certain conditions (monovariance and flow insensitivity) that in terms of precision, a pushdown control-flow analysis is in fact equivalent to a direct style constraint-based formulation

    A sequence-length sensitive approach to learning biological grammars using inductive logic programming.

    Get PDF
    This thesis aims to investigate if the ideas behind compression principles, such as the Minimum Description Length, can help us to improve the process of learning biological grammars from protein sequences using Inductive Logic Programming (ILP). Contrary to most traditional ILP learning problems, biological sequences often have a high variation in their length. This variation in length is an important feature of biological sequences which should not be ignored by ILP systems. However we have identified that some ILP systems do not take into account the length of examples when evaluating their proposed hypotheses. During the learning process, many ILP systems use clause evaluation functions to assign a score to induced hypotheses, estimating their quality and effectively influencing the search. Traditionally, clause evaluation functions do not take into account the length of the examples which are covered by the clause. We propose L-modification, a way of modifying existing clause evaluation functions so that they take into account the length of the examples which they learn from. An empirical study was undertaken to investigate if significant improvements can be achieved by applying L-modification to a standard clause evaluation function. Furthermore, we generally investigated how ILP systems cope with the length of examples in training data. We show that our L-modified clause evaluation function outperforms our benchmark function in every experiment we conducted and thus we prove that L-modification is a useful concept. We also show that the length of the examples in the training data used by ILP systems does have an undeniable impact on the results

    Computer Aided Verification

    Get PDF
    This open access two-volume set LNCS 10980 and 10981 constitutes the refereed proceedings of the 30th International Conference on Computer Aided Verification, CAV 2018, held in Oxford, UK, in July 2018. The 52 full and 13 tool papers presented together with 3 invited papers and 2 tutorials were carefully reviewed and selected from 215 submissions. The papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. They are organized in topical sections on model checking, program analysis using polyhedra, synthesis, learning, runtime verification, hybrid and timed systems, tools, probabilistic systems, static analysis, theory and security, SAT, SMT and decisions procedures, concurrency, and CPS, hardware, industrial applications

    Inductive Program Synthesis via Iterative Forward-Backward Abstract Interpretation

    Full text link
    A key challenge in example-based program synthesis is the gigantic search space of programs. To address this challenge, various work proposed to use abstract interpretation to prune the search space. However, most of existing approaches have focused only on forward abstract interpretation, and thus cannot fully exploit the power of abstract interpretation. In this paper, we propose a novel approach to inductive program synthesis via iterative forward-backward abstract interpretation. The forward abstract interpretation computes possible outputs of a program given inputs, while the backward abstract interpretation computes possible inputs of a program given outputs. By iteratively performing the two abstract interpretations in an alternating fashion, we can effectively determine if any completion of each partial program as a candidate can satisfy the input-output examples. We apply our approach to a standard formulation, syntax-guided synthesis (SyGuS), thereby supporting a wide range of inductive synthesis tasks. We have implemented our approach and evaluated it on a set of benchmarks from the prior work. The experimental results show that our approach significantly outperforms the state-of-the-art approaches thanks to the sophisticated abstract interpretation techniques

    Environment-Sensitive Intrusion Detection

    Get PDF
    Abstract. We perform host-based intrusion detection by constructing a model from a program’s binary code and then restricting the program’s execution by the model. We improve the effectiveness of such model-based intrusion detection systems by incorporating into the model knowledge of the environment in which the program runs, and by increasing the accuracy of our models with a new dataflow analysis algorithm for context-sensitive recovery of static data. The environment—configuration files, command-line parameters, and environment variables—constrains acceptable process execution. Environment dependencies added to a program model update the model to the current environment at every program execution. Our new static data-flow analysis associates a program’s data flows with specific calling contexts that use the data. We use this analysis to differentiate systemcall arguments flowing from distinct call sites in the program. Using a new average reachability measure suitable for evaluation of call-stackbased program models, we demonstrate that our techniques improve the precision of several test programs ’ models from 76 % to 100%
    • …
    corecore