7 research outputs found

    On Fast Large-Scale Program Analysis in Datalog

    Designing and crafting a static program analysis is challenging due to the complexity of the task at hand. Among the challenges are modelling the semantics of the input language, finding suitable abstractions for the analysis, and handwriting efficient code for the analysis in a traditional imperative language such as C++. Hence, the development of static program analysis tools is costly in terms of development time and resources for real-world languages. To overcome, or at least alleviate, the costs of developing a static program analysis, Datalog has been proposed as a domain-specific language (DSL). With Datalog, a designer expresses a static program analysis in the form of a logical specification. While a domain-specific language approach aids in the ease of development of program analyses, it is commonly accepted that such an approach has worse runtime performance than handcrafted static analysis tools. In this work, we introduce a new program synthesis methodology for Datalog specifications to produce highly efficient monolithic C++ analyzers. The synthesis technique requires the re-interpretation of the semi-naïve evaluation as a scaffolding for translation using partial evaluation. To achieve high performance, we employ staged compilation techniques and specialize the underlying relational data structures for a given Datalog specification. Experimentation on benchmarks for large-scale program analysis validates the superior performance of our approach over available Datalog tools and demonstrates our competitiveness with state-of-the-art handcrafted tools.
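The abstract above builds on semi-naïve evaluation. As a rough illustration of that general idea only (not the synthesized C++ the work describes), the sketch below evaluates the textbook transitive-closure program `path(x, y) :- edge(x, y).` and `path(x, z) :- path(x, y), edge(y, z).` in Python. The function name `transitive_closure` and the relation names are assumptions chosen for this example.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of a two-rule Datalog program (illustrative only):

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    # Index the edge relation by source node so the join on y is a lookup.
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    path = set(edges)   # all path facts derived so far
    delta = set(edges)  # facts that were new in the previous round

    while delta:
        new = set()
        # Join only the NEW path facts with edge; older facts were already
        # joined in earlier rounds, which is the point of semi-naive evaluation.
        for x, y in delta:
            for z in succ.get(y, ()):
                if (x, z) not in path:
                    new.add((x, z))
        path |= new
        delta = new
    return path
```

Calling `transitive_closure({(1, 2), (2, 3), (3, 4)})` derives the three extra facts `(1, 3)`, `(2, 4)` and `(1, 4)` and stops once a round produces nothing new.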

    An Instantaneous Framework For Concurrency Bug Detection

    Concurrency bug detection is important to guarantee the correct behavior of multithreaded programs. However, existing static techniques are expensive and produce false positives, and dynamic analyses cannot expose all potential bugs. This thesis presents an ultra-efficient concurrency analysis framework, D4, that detects concurrency bugs (e.g., data races and deadlocks) “instantly” in the programming phase. As developers add, modify, and remove statements, the changes are sent to D4 to detect concurrency bugs on the fly, which in turn provides immediate feedback to the developer about the new bugs. D4 includes a novel system design and two novel parallel incremental algorithms that embrace both change and parallelization for fundamental static analyses of concurrent programs. Both algorithms react to program changes by memoizing the analysis results and recomputing only the impact of a change, in parallel and without any redundant computation. Our evaluation on an extensive collection of large real-world applications shows that D4 efficiently pinpoints concurrency bugs within 10ms on average after a code change, several orders of magnitude faster than both the exhaustive analysis and the state-of-the-art incremental techniques.

    On the Practice and Application of Context-Free Language Reachability

    The Context-Free Language Reachability (CFL-R) formalism relates to some of the most important computational problems facing researchers and industry practitioners. CFL-R is a generalisation of graph reachability and language recognition, such that pairs in a labelled graph are reachable if and only if there is a path between them whose labels, joined together in the order they were encountered, spell a word in a given context-free language. The formalism finds particular use as a vehicle for phrasing and reasoning about program analysis, since complex relationships within the data, logic or structure of computer programs are easily expressed and discovered in CFL-R. Unfortunately, the potential of CFL-R cannot be met by state-of-the-art solvers. Current algorithms have scalability and expressibility issues that prevent them from being used on large graph instances or complex grammars. This work outlines our efforts in understanding the practical concerns surrounding CFL-R, and applying this knowledge to improve the performance of CFL-R applications. We examine the major difficulties with solving CFL-R-based analyses at scale, via a case study of points-to analysis as a CFL-R problem. Points-to analysis is fundamentally important to many modern research and industry efforts, and is relevant to optimisation, bug-checking and security technologies. Our understanding of the scalability challenge motivates work in developing practical CFL-R techniques. We present improved evaluation algorithms and declarative optimisation techniques for CFL-R, capitalising on the simplicity of CFL-R to create fully automatic methodologies. The culmination of our work is a general-purpose and high-performance tool called Cauliflower, a solver-generator for CFL-R problems. We describe Cauliflower and evaluate its performance experimentally, showing significant improvement over alternative general techniques.
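The CFL-R definition above can be made concrete with a small worklist solver. The sketch below is a generic textbook-style algorithm for grammars restricted to productions of the forms A -> ε, A -> B and A -> B C. It is not Cauliflower and not the evaluation algorithm from the thesis, and the function name `cfl_reach` and its parameters are assumptions made for this illustration.

```python
from collections import deque


def cfl_reach(nodes, edges, eps, unary, binary):
    """Worklist CFL-reachability sketch (illustrative, not Cauliflower).

    nodes  : iterable of graph nodes
    edges  : iterable of (src, label, dst) triples
    eps    : nonterminals A with a production A -> epsilon
    unary  : list of (A, B) for productions A -> B
    binary : list of (A, B, C) for productions A -> B C
    Returns every (src, label, dst) fact, input edges included.
    """
    facts = set()
    work = deque()
    out_by = {}  # (src, label) -> set of dst
    in_by = {}   # (dst, label) -> set of src

    def add(u, lab, v):
        # Record a fact once, index it, and schedule it for processing.
        if (u, lab, v) not in facts:
            facts.add((u, lab, v))
            out_by.setdefault((u, lab), set()).add(v)
            in_by.setdefault((v, lab), set()).add(u)
            work.append((u, lab, v))

    for u, lab, v in edges:
        add(u, lab, v)
    for a in eps:
        for n in nodes:
            add(n, a, n)  # the empty path spells epsilon

    while work:
        u, lab, v = work.popleft()
        for a, b in unary:
            if b == lab:
                add(u, a, v)
        for a, b, c in binary:
            if b == lab:  # this fact is the left half: look for (v, C, w)
                for w in list(out_by.get((v, c), ())):
                    add(u, a, w)
            if c == lab:  # this fact is the right half: look for (t, B, u)
                for t in list(in_by.get((u, b), ())):
                    add(t, a, v)
    return facts
```

As a usage example, take the path 0 -o-> 1 -o-> 2 -c-> 3 -c-> 4 and the balanced-bracket grammar S -> ε | S S | o T, T -> S c (the labels `o` and `c` are made up for this example). Besides the trivial pairs (n, n), the solver reports that S relates exactly (1, 3) and (0, 4), matching the two properly nested bracket pairs.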

    A Hybrid Approach to Logic Evaluation

    In this thesis, we contribute the hybrid approach – a means of combining the practical advantages of feature-rich logic evaluation in the cloud, with the performance benefits of hand-written, optimized, efficient native code. In the first part of our hybrid approach, we introduce a cloud-based distribution for logic programs, which may be deployed as a service, in standard cloud environments, across cheap commodity hardware. Modern systems are in the cloud; while distributed logic solvers exist, these systems are highly specialized, requiring expensive, resource-intensive hardware infrastructures. Our original technique achieves a fully automatic synthesis of cloud infrastructure for logic programs, and includes a range of practical features not present in existing distributed logic solvers. We show that an implementation of the distribution scales effectively within real-world cloud environments, against a distribution over cores of the same machine. We show that our multi-node distribution may be effectively combined with existing multi-threaded techniques to mitigate the network communication cost incurred by distribution. In the second part of our hybrid approach, we introduce extra-logical algorithms, to achieve performance for logic programs that would not be possible within a bottom-up logic evaluation. Modern systems must deliver high performance on big data; however, even the most powerful logic engines, distributed or otherwise, can be beaten by hand-written code on particular problems. We give a novel implementation of a system for the high-impact problem of sink-reachability, designed such that its algorithms may be used in logic programs. A thorough empirical evaluation, across a range of large-scale, real-world datasets, shows our system outperforms the current state of the art for the sink-reachability problem in all cases.
    Our hybrid approach addresses the two major deficiencies of modern logic systems, providing a practical means of evaluating logic in distributed cloud-based environments, while offering performance gains for specific high-impact problems that would not be possible using logic programming alone.

    Empirical studies of structural phenomena using a curated corpus of Java code

    Contrary to 50 years' worth of advice in the instructional literature on software design, long cyclic dependencies are found to be widespread in a sizeable, curated corpus of real Java software. Among their causes may be overuse of static members, underuse of dependency injection and poor tool support for avoiding them.