367 research outputs found
Pruning, Pushdown Exception-Flow Analysis
Statically reasoning in the presence of exceptions and about the effects of
exceptions is challenging: exception-flows are mutually determined by
traditional control-flow and points-to analyses. We tackle the challenge of
analyzing exception-flows from two angles. First, from the angle of pruning
control-flows (both normal and exceptional), we derive a pushdown framework for
an object-oriented language with full-featured exceptions. Unlike traditional
analyses, it allows precise matching of throwers to catchers. Second, from the
angle of pruning points-to information, we generalize abstract garbage
collection to object-oriented programs and enhance it with liveness analysis.
We then seamlessly weave the techniques into enhanced reachability computation,
yielding highly precise exception-flow analysis, without becoming intractable,
even for large applications. We evaluate our pruned, pushdown exception-flow
analysis, comparing it with an established analysis on large scale standard
Java benchmarks. The results show that our analysis significantly improves
analysis precision over traditional analysis within a reasonable analysis time.Comment: 14th IEEE International Working Conference on Source Code Analysis
and Manipulatio
Sound and Precise Malware Analysis for Android via Pushdown Reachability and Entry-Point Saturation
We present Anadroid, a static malware analysis framework for Android apps.
Anadroid exploits two techniques to soundly raise precision: (1) it uses a
pushdown system to precisely model dynamically dispatched interprocedural and
exception-driven control-flow; (2) it uses Entry-Point Saturation (EPS) to
soundly approximate all possible interleavings of asynchronous entry points in
Android applications. (It also integrates static taint-flow analysis and least
permissions analysis to expand the class of malicious behaviors which it can
catch.) Anadroid provides rich user interface support for human analysts which
must ultimately rule on the "maliciousness" of a behavior.
To demonstrate the effectiveness of Anadroid's malware analysis, we had teams
of analysts analyze a challenge suite of 52 Android applications released as
part of the Auto- mated Program Analysis for Cybersecurity (APAC) DARPA
program. The first team analyzed the apps using a ver- sion of Anadroid that
uses traditional (finite-state-machine-based) control-flow-analysis found in
existing malware analysis tools; the second team analyzed the apps using a
version of Anadroid that uses our enhanced pushdown-based
control-flow-analysis. We measured machine analysis time, human analyst time,
and their accuracy in flagging malicious applications. With pushdown analysis,
we found statistically significant (p < 0.05) decreases in time: from 85
minutes per app to 35 minutes per app in human plus machine analysis time; and
statistically significant (p < 0.05) increases in accuracy with the
pushdown-driven analyzer: from 71% correct identification to 95% correct
identification.Comment: Appears in 3rd Annual ACM CCS workshop on Security and Privacy in
SmartPhones and Mobile Devices (SPSM'13), Berlin, Germany, 201
Pushdown automata in statistical machine translation
This article describes the use of pushdown automata (PDA) in the context of statistical machine translation and alignment under a synchronous context-free grammar. We use PDAs to compactly represent the space of candidate translations generated by the grammar when applied to an input sentence. General-purpose PDA algorithms for replacement, composition, shortest path, and expansion are presented. We describe HiPDT, a hierarchical phrase-based decoder using the PDA representation and these algorithms. We contrast the complexity of this decoder with a decoder based on a finite state automata representation, showing that PDAs provide a more suitable framework to achieve exact decoding for larger synchronous context-free grammars and smaller language models. We assess this experimentally on a large-scale Chinese-to-English alignment and translation task. In translation, we propose a two-pass decoding strategy involving a weaker language model in the first-pass to address the results of PDA complexity analysis. We study in depth the experimental conditions and tradeoffs in which HiPDT can achieve state-of-the-art performance for large-scale SMT. </jats:p
Doctor of Philosophy
dissertationToday's smartphones house private and confidential data ubiquitously. Mobile apps running on the devices can leak sensitive information by accident or intentionally. To understand application behaviors before running a program, we need to statically analyze it, tracking what data are accessed, where sensitive data ow, and what operations are performed with the data. However, automated identification of malicious behaviors in Android apps is challenging: First, there is a primary challenge in analyzing object-oriented programs precisely, soundly and efficiently, especially in the presence of exceptions. Second, there is an Android-specific challenge|asynchronous execution of multiple entry points. Third, the maliciousness of any given behavior is application-dependent and subject to human judgment. In this work, I develop a generic, highly precise static analysis of object-oriented code with multiple entry points, on which I construct an eective malware identification system with a human in the loop. Specically, I develop a new analysis-pushdown exception-ow analysis, to generalize the analysis of normal control flows and exceptional flows in object-oriented programs. To rene points-to information, I generalize abstract garbage collection to object-oriented programs and enhance it with liveness analysis for even better precision. To tackle Android-specic challenges, I develop multientry point saturation to approximate the eect of arbitrary asynchronous events. To apply the analysis techniques to security, I develop a static taint- ow analysis to track and propagate tainted sensitive data in the push-down exception-flow framework. To accelerate the speed of static analysis, I develop a compact and ecient encoding scheme, called G odel hashes, and integrate it into the analysis framework. All the techniques are realized and evaluated in a system, named AnaDroid. AnaDroid is designed with a human in the loop to specify analysis conguration, properties of interest and then to make the nal judgment and identify where the maliciousness is, based on analysis results. The analysis results include control- ow graphs highlighting suspiciousness, permission and risk-ranking reports. The experiments show that AnaDroid can lead to precise and fast identication of common classes of Android malware
Doctor of Philosophy in Computer Science
dissertationControl-flow analysis of higher-order languages is a difficult problem, yet an important one. It aids in enabling optimizations, improved reliability, and improved security of programs written in these languages. This dissertation explores three techniques to improve the precision and speed of a small-step abstract interpreter: using a priority work list, environment unrolling, and strong function call. In an abstract interpreter, the interpreter is no longer deterministic and choices can be made in how the abstract state space is explored and trade-offs exist. A priority queue is one option. There are also many ways to abstract the concrete interpreter. Environment unrolling gives a slightly different approach than is usually taken, by holding off abstraction in order to gain precision, which can lead to a faster analysis. Strong function call is an approach to clean up some of the imprecision when making a function call that is introduced when abstractly interpreting a program. An alternative approach to building an abstract interpreter to perform static analysis is through the use of constraint solving. Existing techniques to do this have been developed over the last several decades. This dissertation maps these constraints to three different problems, allowing control-flow analysis of higher-order languages to be solved with tools that are already mature and well developed. The control-flow problem is mapped to pointer analysis of first-order languages, SAT, and linear-algebra operations. These mappings allow for fast and parallel implementations of control-flow analysis of higher-order languages. A recent development in the field of static analysis has been pushdown control-flow analysis, which is able to precisely match calls and returns, a weakness in the existing techniques. This dissertation also provides an encoding of pushdown control-flow analysis to linear-algebra operations. In the process, it demonstrates that under certain conditions (monovariance and flow insensitivity) that in terms of precision, a pushdown control-flow analysis is in fact equivalent to a direct style constraint-based formulation
A sequence-length sensitive approach to learning biological grammars using inductive logic programming.
This thesis aims to investigate if the ideas behind compression principles, such as the Minimum Description Length, can help us to improve the process of learning biological grammars from protein sequences using Inductive Logic Programming (ILP). Contrary to most traditional ILP learning problems, biological sequences often have a high variation in their length. This variation in length is an important feature of biological sequences which should not be ignored by ILP systems. However we have identified that some ILP systems do not take into account the length of examples when evaluating their proposed hypotheses. During the learning process, many ILP systems use clause evaluation functions to assign a score to induced hypotheses, estimating their quality and effectively influencing the search. Traditionally, clause evaluation functions do not take into account the length of the examples which are covered by the clause. We propose L-modification, a way of modifying existing clause evaluation functions so that they take into account the length of the examples which they learn from. An empirical study was undertaken to investigate if significant improvements can be achieved by applying L-modification to a standard clause evaluation function. Furthermore, we generally investigated how ILP systems cope with the length of examples in training data. We show that our L-modified clause evaluation function outperforms our benchmark function in every experiment we conducted and thus we prove that L-modification is a useful concept. We also show that the length of the examples in the training data used by ILP systems does have an undeniable impact on the results
Computer Aided Verification
This open access two-volume set LNCS 10980 and 10981 constitutes the refereed proceedings of the 30th International Conference on Computer Aided Verification, CAV 2018, held in Oxford, UK, in July 2018. The 52 full and 13 tool papers presented together with 3 invited papers and 2 tutorials were carefully reviewed and selected from 215 submissions. The papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. They are organized in topical sections on model checking, program analysis using polyhedra, synthesis, learning, runtime verification, hybrid and timed systems, tools, probabilistic systems, static analysis, theory and security, SAT, SMT and decisions procedures, concurrency, and CPS, hardware, industrial applications
Inductive Program Synthesis via Iterative Forward-Backward Abstract Interpretation
A key challenge in example-based program synthesis is the gigantic search
space of programs. To address this challenge, various work proposed to use
abstract interpretation to prune the search space. However, most of existing
approaches have focused only on forward abstract interpretation, and thus
cannot fully exploit the power of abstract interpretation. In this paper, we
propose a novel approach to inductive program synthesis via iterative
forward-backward abstract interpretation. The forward abstract interpretation
computes possible outputs of a program given inputs, while the backward
abstract interpretation computes possible inputs of a program given outputs. By
iteratively performing the two abstract interpretations in an alternating
fashion, we can effectively determine if any completion of each partial program
as a candidate can satisfy the input-output examples. We apply our approach to
a standard formulation, syntax-guided synthesis (SyGuS), thereby supporting a
wide range of inductive synthesis tasks. We have implemented our approach and
evaluated it on a set of benchmarks from the prior work. The experimental
results show that our approach significantly outperforms the state-of-the-art
approaches thanks to the sophisticated abstract interpretation techniques
Environment-Sensitive Intrusion Detection
Abstract. We perform host-based intrusion detection by constructing a model from a program’s binary code and then restricting the program’s execution by the model. We improve the effectiveness of such model-based intrusion detection systems by incorporating into the model knowledge of the environment in which the program runs, and by increasing the accuracy of our models with a new dataflow analysis algorithm for context-sensitive recovery of static data. The environment—configuration files, command-line parameters, and environment variables—constrains acceptable process execution. Environment dependencies added to a program model update the model to the current environment at every program execution. Our new static data-flow analysis associates a program’s data flows with specific calling contexts that use the data. We use this analysis to differentiate systemcall arguments flowing from distinct call sites in the program. Using a new average reachability measure suitable for evaluation of call-stackbased program models, we demonstrate that our techniques improve the precision of several test programs ’ models from 76 % to 100%
- …