197 research outputs found
Recursive tree traversal dependence analysis
While there has been much work done on analyzing and transforming regular programs that operate over linear arrays and dense matrices, comparatively little has been done to try to carry these optimizations over to programs that operate over heap-based data structures using pointers. Previous work has shown that point blocking, a technique similar to loop tiling in regular programs, can help increase the temporal locality of repeated tree traversals. Point blocking, however, has only been shown to work on tree traversals where each traversal is fully independent and would allow parallelization, greatly limiting the types of applications that this transformation could be applied to.^ The purpose of this study is to develop a new framework for analyzing recursive methods that perform traversals over trees, called tree dependence analysis. This analysis translates dependence analysis techniques for regular programs to the irregular space, identifying the structure of dependences within a recursive method that traverses trees. In this study, a dependence test that exploits the dependence structure of such programs is developed, and is shown to be able to prove the legality of several locality— and parallelism-enhancing transformations, including point blocking. In addition, the analysis is extended with a novel path-dependent, conditional analysis to refine the dependence test and prove the legality of transformations for a wider range of algorithms. These analyses are then used to show that several common algorithms that manipulate trees recursively are amenable to several locality— and parallelism-enhancing transformations. This work shows that classical dependence analysis techniques, which have largely been confined to nested loops over array data structures, can be extended and translated to work for complex, recursive programs that operate over pointer-based data structures
Combatting Advanced Persistent Threat via Causality Inference and Program Analysis
Cyber attackers are becoming more and more sophisticated. In particular, Advanced Persistent Threat (APT) is a new class of attack that targets a specifc organization and compromises systems over a long time without being detected. Over the years, we have seen notorious examples of APTs including Stuxnet which disrupted Iranian nuclear centrifuges and data breaches affecting millions of users. Investigating APT is challenging as it occurs over an extended period of time and the attack process is highly sophisticated and stealthy. Also, preventing APTs is diffcult due to ever-expanding attack vectors.
In this dissertation, we present proposals for dealing with challenges in attack investigation. Specifcally, we present LDX which conducts precise counter-factual causality inference to determine dependencies between system calls (e.g., between input and output system calls) and allows investigators to determine the origin of an attack (e.g., receiving a spam email) and the propagation path of the attack, and assess the consequences of the attack. LDX is four times more accurate and two orders of magnitude faster than state-of-the-art taint analysis techniques. Moreover, we then present a practical model-based causality inference system, MCI, which achieves precise and accurate causality inference without requiring any modifcation or instrumentation in end-user systems.
Second, we show a general protection system against a wide spectrum of attack vectors and methods. Specifcally, we present A2C that prevents a wide range of attacks by randomizing inputs such that any malicious payloads contained in the inputs are corrupted. The protection provided by A2C is both general (e.g., against various attack vectors) and practical (7% runtime overhead)
A Survey of Symbolic Execution Techniques
Many security and software testing applications require checking whether
certain properties of a program hold for any possible usage scenario. For
instance, a tool for identifying software vulnerabilities may need to rule out
the existence of any backdoor to bypass a program's authentication. One
approach would be to test the program using different, possibly random inputs.
As the backdoor may only be hit for very specific program workloads, automated
exploration of the space of possible inputs is of the essence. Symbolic
execution provides an elegant solution to the problem, by systematically
exploring many possible execution paths at the same time without necessarily
requiring concrete inputs. Rather than taking on fully specified input values,
the technique abstractly represents them as symbols, resorting to constraint
solvers to construct actual instances that would cause property violations.
Symbolic execution has been incubated in dozens of tools developed over the
last four decades, leading to major practical breakthroughs in a number of
prominent software reliability applications. The goal of this survey is to
provide an overview of the main ideas, challenges, and solutions developed in
the area, distilling them for a broad audience.
The present survey has been accepted for publication at ACM Computing
Surveys. If you are considering citing this survey, we would appreciate if you
could use the following BibTeX entry: http://goo.gl/Hf5FvcComment: This is the authors pre-print copy. If you are considering citing
this survey, we would appreciate if you could use the following BibTeX entry:
http://goo.gl/Hf5Fv
Parallelizing irregular C codes assisted by interprocedural shape analysis
In the new multicore architecture arena, the problem of improving the performance of a code is more in the soft-ware side than in the hardware one. However, optimizing irregular dynamic data structure based codes for such ar-chitectures is not easy, either by hand or compiler assisted. Regarding this last approach, shape analysis is a static tech-nique that achieves abstraction of dynamic memory and can help to disambiguate, quite accurately, memory references in programs that create and traverse recursive data struc-tures. This kind of analysis has promising applicability for accurate data dependence tests in loops or recursive func-tions that traverse dynamic data structures. However, sup-port for interprocedural programs in shape analysis is still a challenge, especially in the presence of recursive func-tions. In this work we present a novel fully context-sensitive interprocedural shape analysis algorithm that supports re-cursion and can be used to uncover parallelism. Our ap-proach is based on three key ideas: i) intraprocedural sup-port based on “Coexistent Links Sets ” to precisely describe the memory configurations during the abstract interpreta-tion of the C code; ii) interprocedural support based on “Recursive Flow Links ” to trace the state of pointers in previous calls; and iii) annotations of the read/written heap locations during the program analysis. We present prelim-inary experiments that reveal that our technique compares favorably with related work, and obtains precise memory abstractions in a variety of recursive programs that create and manipulate dynamic data structures. We have also im-plemented a data dependence test over our interprocedural shape analysis. With this test we have obtained promis-ing results, automatically detecting parallelism in three C codes, which have been successfully parallelized
Dynamic Analysis Techniques for Effective and Efficient Debugging
Debugging is a tedious and time-consuming process for software developers. Therefore, providing effective and efficient debugging tools is essential for improving programmer productivity. Existing tools for debugging suffer from various drawbacks -- general-purpose debuggers provide little guidance for the programmers in locating the bug source while specialized debuggers require knowledge of the type of bug encountered. This dissertation makes several advances in debugging leading to effective, efficient, and extensible framework for interactive debugging of singlethreaded programs and deterministic debugging of multithreaded programs.This dissertation presents the Qzdb debugger for singlethreaded programs that raises the abstraction level of debugging by introducing high-level and powerful state alteration and state inspection capabilities. Case studies on 5 real reported bugs in 5 popular real programs demonstrate its effectiveness. To support integration of specialized debugging algorithms into Qzdb, anew approach for constructing debuggers is developed that employs declarative specification of bug conditions and their root causes, and automatic generation of debugger code. Experiments show that about 3,300 lines of C code are generated automatically from only 8 lines of specification for 6 memory bugs. Thanks to the effective generated bug locators, for the 8 real-worlds bugs we have applied our approach to, users have to examine just 1 to 16 instructions. To reduce the runtime overhead of dynamic analysis used during debugging, relevant input analysis is developed and employed to carry out input simplification and execution simplification which reduce the length of analyzed execution by reducing the input size and limiting the analysis to subset of the execution. Experiments show that relevant input analysis based input simplification algorithm is both efficient and effective -- it only requires 11% to 21% test runs of that needed by standard delta debugging algorithm and generates even smaller inputs.Finally, to demonstrate that the above approach can also be used for debugging multithreaded programs, this dissertation presents DrDebug, a deterministic and cyclic debugging framework. DrDebug allows efficient debugging by tailoring the scope of replay to a buggy execution region and an execution slice of a buggy region. Case studies of real reported concurrency bugs show that the buggy execution region size is less than 1 million instructions and the lengths of buggy execution region and execution slice are less than 15% and 7% of the total execution respectively
Beyond shared memory loop parallelism in the polyhedral model
2013 Spring.Includes bibliographical references.With the introduction of multi-core processors, motivated by power and energy concerns, parallel processing has become main-stream. Parallel programming is much more difficult due to its non-deterministic nature, and because of parallel programming bugs that arise from non-determinacy. One solution is automatic parallelization, where it is entirely up to the compiler to efficiently parallelize sequential programs. However, automatic parallelization is very difficult, and only a handful of successful techniques are available, even after decades of research. Automatic parallelization for distributed memory architectures is even more problematic in that it requires explicit handling of data partitioning and communication. Since data must be partitioned among multiple nodes that do not share memory, the original memory allocation of sequential programs cannot be directly used. One of the main contributions of this dissertation is the development of techniques for generating distributed memory parallel code with parametric tiling. Our approach builds on important contributions to the polyhedral model, a mathematical framework for reasoning about program transformations. We show that many affine control programs can be uniformized only with simple techniques. Being able to assume uniform dependences significantly simplifies distributed memory code generation, and also enables parametric tiling. Our approach implemented in the AlphaZ system, a system for prototyping analyses, transformations, and code generators in the polyhedral model. The key features of AlphaZ are memory re-allocation, and explicit representation of reductions. We evaluate our approach on a collection of polyhedral kernels from the PolyBench suite, and show that our approach scales as well as PLuTo, a state-of-the-art shared memory automatic parallelizer using the polyhedral model. Automatic parallelization is only one approach to dealing with the non-deterministic nature of parallel programming that leaves the difficulty entirely to the compiler. Another approach is to develop novel parallel programming languages. These languages, such as X10, aim to provide highly productive parallel programming environment by including parallelism into the language design. However, even in these languages, parallel bugs remain to be an important issue that hinders programmer productivity. Another contribution of this dissertation is to extend the array dataflow analysis to handle a subset of X10 programs. We apply the result of dataflow analysis to statically guarantee determinism. Providing static guarantees can significantly increase programmer productivity by catching questionable implementations at compile-time, or even while programming
Structured parallelism discovery with hybrid static-dynamic analysis and evaluation technique
Parallel computer architectures have dominated the computing landscape for the
past two decades; a trend that is only expected to continue and intensify, with increasing specialization and heterogeneity. This creates huge pressure across the software
stack to produce programming languages, libraries, frameworks and tools which will
efficiently exploit the capabilities of parallel computers, not only for new software, but
also revitalizing existing sequential code. Automatic parallelization, despite decades of
research, has had limited success in transforming sequential software to take advantage
of efficient parallel execution. This thesis investigates three approaches that use commutativity analysis as the enabler for parallelization. This has the potential to overcome
limitations of traditional techniques.
We introduce the concept of liveness-based commutativity for sequential loops.
We examine the use of a practical analysis utilizing liveness-based commutativity in a
symbolic execution framework. Symbolic execution represents input values as groups
of constraints, consequently deriving the output as a function of the input and enabling
the identification of further program properties. We employ this feature to develop an
analysis and discern commutativity properties between loop iterations. We study the
application of this approach on loops taken from real-world programs in the OLDEN
and NAS Parallel Benchmark (NPB) suites, and identify its limitations and related
overheads.
Informed by these findings, we develop Dynamic Commutativity Analysis (DCA), a
new technique that leverages profiling information from program execution with specific
input sets. Using profiling information, we track liveness information and detect loop
commutativity by examining the code’s live-out values. We evaluate DCA against almost
1400 loops of the NPB suite, discovering 86% of them as parallelizable. Comparing
our results against dependence-based methods, we match the detection efficacy of two
dynamic and outperform three static approaches, respectively. Additionally, DCA is
able to automatically detect parallelism in loops which iterate over Pointer-Linked
Data Structures (PLDSs), taken from wide range of benchmarks used in the literature,
where all other techniques we considered failed. Parallelizing the discovered loops, our
methodology achieves an average speedup of 3.6× across NPB (and up to 55×) and up
to 36.9× for the PLDS-based loops on a 72-core host. We also demonstrate that our
methodology, despite relying on specific input values for profiling each program, is able
to correctly identify parallelism that is valid for all potential input sets.
Lastly, we develop a methodology to utilize liveness-based commutativity, as implemented in DCA, to detect latent loop parallelism in the shape of patterns. Our approach
applies a series of transformations which subsequently enable multiple applications
of DCA over the generated multi-loop code section and match its loop commutativity
outcomes against the expected criteria for each pattern. Applying our methodology on
sets of sequential loops, we are able to identify well-known parallel patterns (i.e., maps,
reduction and scans). This extends the scope of parallelism detection to loops, such
as those performing scan operations, which cannot be determined as parallelizable by
simply evaluating liveness-based commutativity conditions on their original form
Profile-driven parallelisation of sequential programs
Traditional parallelism detection in compilers is performed by means of static analysis
and more specifically data and control dependence analysis. The information that
is available at compile time, however, is inherently limited and therefore restricts the
parallelisation opportunities. Furthermore, applications written in C – which represent
the majority of today’s scientific, embedded and system software – utilise many lowlevel
features and an intricate programming style that forces the compiler to even more
conservative assumptions. Despite the numerous proposals to handle this uncertainty
at compile time using speculative optimisation and parallelisation, the software industry
still lacks any pragmatic approaches that extracts coarse-grain parallelism to exploit
the multiple processing units of modern commodity hardware.
This thesis introduces a novel approach for extracting and exploiting multiple forms
of coarse-grain parallelism from sequential applications written in C. We utilise profiling
information to overcome the limitations of static data and control-flow analysis
enabling more aggressive parallelisation. Profiling is performed using an instrumentation
scheme operating at the Intermediate Representation (Ir) level of the compiler.
In contrast to existing approaches that depend on low-level binary tools and debugging
information, Ir-profiling provides precise and direct correlation of profiling information
back to the Ir structures of the compiler. Additionally, our approach is orthogonal to
existing automatic parallelisation approaches and additional fine-grain parallelism may
be exploited.
We demonstrate the applicability and versatility of the proposed methodology using
two studies that target different forms of parallelism. First, we focus on the exploitation
of loop-level parallelism that is abundant in many scientific and embedded
applications. We evaluate our parallelisation strategy against the Nas and Spec Fp
benchmarks and two different multi-core platforms (a shared-memory Intel Xeon Smp
and a heterogeneous distributed-memory Ibm Cell blade). Empirical evaluation shows
that our approach not only yields significant improvements when compared with state-of-
the-art parallelising compilers, but comes close to and sometimes exceeds the performance
of manually parallelised codes. On average, our methodology achieves 96%
of the performance of the hand-tuned parallel benchmarks on the Intel Xeon platform,
and a significant speedup for the Cell platform. The second study, addresses
the problem of partially sequential loops, typically found in implementations of multimedia
codecs. We develop a more powerful whole-program representation based on the Program Dependence Graph (Pdg) that supports profiling, partitioning and codegeneration
for pipeline parallelism. In addition we demonstrate how this enhances
conventional pipeline parallelisation by incorporating support for multi-level loops and
pipeline stage replication in a uniform and automatic way. Experimental results using a
set of complex multimedia and stream processing benchmarks confirm the effectiveness
of the proposed methodology that yields speedups up to 4.7 on a eight-core Intel Xeon
machine
- …