92 research outputs found

    Program synthesis from polymorphic refinement types

    We present a method for synthesizing recursive functions that provably satisfy a given specification in the form of a polymorphic refinement type. We observe that such specifications are particularly suitable for program synthesis for two reasons. First, they offer a unique combination of expressive power and decidability, which enables automatic verification, and hence synthesis, of nontrivial programs. Second, a type-based specification for a program can often be effectively decomposed into independent specifications for its components, causing the synthesizer to consider fewer component combinations and leading to a combinatorial reduction in the size of the search space. At the core of our synthesis procedure is a new algorithm for refinement type checking, which supports specification decomposition. We have evaluated our prototype implementation on a large set of synthesis problems and found that it exceeds the state of the art in terms of both scalability and usability. The tool was able to synthesize more complex programs than those reported in prior work (several sorting algorithms and operations on balanced search trees), as well as most of the benchmarks tackled by existing synthesizers, often starting from a more concise and intuitive user input.
    National Science Foundation (U.S.) (Grant CCF-1438969); National Science Foundation (U.S.) (Grant CCF-1139056); United States. Defense Advanced Research Projects Agency (Grant FA8750-14-2-0242).
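
    To make the flavor of these specifications concrete, the sketch below restates a sorting specification as executable Python predicates rather than refinement types (the names is_sorted, same_elements, meets_sort_spec, and meets_insert_spec are illustrative and not part of the paper's tool). The point is the decomposition: the top-level sorting specification splits into an independent specification for the insert component, which can be checked on its own.

        # Hedged sketch: a sorting specification expressed as executable predicates.
        # A refinement-type version would state the same properties in the type, e.g.
        #   sort :: xs: List a -> {v: List a | sorted(v) && elems(v) == elems(xs)}
        from collections import Counter

        def is_sorted(xs):
            return all(a <= b for a, b in zip(xs, xs[1:]))

        def same_elements(xs, ys):
            return Counter(xs) == Counter(ys)

        def meets_sort_spec(sort_fn, xs):
            # Top-level specification: output is sorted and a permutation of the input.
            ys = sort_fn(xs)
            return is_sorted(ys) and same_elements(ys, xs)

        def meets_insert_spec(insert_fn, xs, x):
            # Decomposed specification for the insert component: given a sorted list,
            # the result is sorted and contains exactly the old elements plus x.
            assert is_sorted(xs)
            ys = insert_fn(x, xs)
            return is_sorted(ys) and same_elements(ys, xs + [x])

        def insert(x, xs):
            return [y for y in xs if y <= x] + [x] + [y for y in xs if y > x]

        def insertion_sort(xs):
            out = []
            for x in xs:
                out = insert(x, out)
            return out

        assert meets_insert_spec(insert, [1, 2, 4], 3)
        assert meets_sort_spec(insertion_sort, [3, 1, 2, 1])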

    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.), Indiana University, Computer Sciences, 2015. As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms. In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, the traditional synchronization method becomes expensive and severely impacts the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms. Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
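
    A minimal Python sketch of the delta-broadcast idea described above, assuming centroids and points are stored as sparse dictionaries (the function names and the single-centroid setup are illustrative; the dissertation's actual algorithms also handle cluster assignment, decay, and many clusters):

        # Hedged sketch: synchronize replicas of a high-dimensional, sparse centroid
        # by broadcasting only the incremental changes (deltas), not the full vector.
        def apply_delta(centroid, delta):
            # centroid and delta are sparse vectors: {dimension_index: weight}
            for dim, change in delta.items():
                centroid[dim] = centroid.get(dim, 0.0) + change

        def process_batch(points, centroid):
            # Accumulate this worker's local updates for one mini-batch as a delta.
            delta = {}
            for point in points:                 # each point is also a sparse dict
                for dim, value in point.items():
                    delta[dim] = delta.get(dim, 0.0) + value
            apply_delta(centroid, delta)
            return delta                         # broadcast the delta, not the centroid

        # Two workers hold replicas of the same centroid.
        centroid_a, centroid_b = {}, {}
        delta = process_batch([{3: 1.0}, {3: 0.5, 7: 2.0}], centroid_a)
        apply_delta(centroid_b, delta)           # the peer applies the broadcast delta
        assert centroid_a == centroid_b          # replicas remain consistent

    Because the delta touches only the dimensions that actually changed in the mini-batch, its size tracks the sparsity of the stream rather than the full dimensionality, which is what makes the synchronization cheap.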

    Physical Design for Non-relational Data Systems

    Decades of research have gone into the optimization of physical designs, query execution, and related tools for relational databases. These techniques and tools make it possible for non-expert users to make effective use of relational database management systems. However, the drive for flexible data models and increased scalability has spawned a new generation of data management systems which largely eschew the relational model. These include systems such as NoSQL databases and distributed analytics frameworks such as Apache Spark which make use of a diverse set of data models. Optimization techniques and tools developed for relational data do not directly apply in this setting. This leaves developers making use of these systems with the need to become intimately familiar with system details to obtain good performance. We present techniques and tools for physical design for non-relational data systems. We explore two settings: NoSQL database systems and distributed analytics frameworks. While NoSQL databases often avoid explicit schema definitions, many choices on how to structure data remain. These choices can have a significant impact on application performance. The data structuring process normally requires expert knowledge of the underlying database. We present the NoSQL Schema Evaluator (NoSE). Given a target workload, NoSE provides an optimized physical design for NoSQL database applications which compares favourably to schemas designed by expert users. To enable existing applications to benefit from conceptual modeling, we also present an algorithm to recover a logical model from a denormalized database instance. Our second setting is distributed analytics frameworks such as Apache Spark. As is the case for NoSQL databases, expert knowledge of Spark is often required to construct efficient data pipelines. In NoSQL systems, a key challenge is how to structure stored data, while in Spark, a key challenge is how to cache intermediate results. We examine a particularly common scenario in Spark which involves performing iterative analysis on an input dataset. We show that jobs written in an intuitive manner using existing Spark APIs can have poor performance. We propose ReSpark, which automates caching decisions for iterative Spark analyses. Like NoSE, ReSpark makes it possible for non-expert users to obtain good performance from a non-relational data system.
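
    To make the Spark caching decision concrete, the following PySpark sketch (standard Spark APIs, not ReSpark; the input path and thresholds are placeholders) shows an iterative analysis that reuses one parsed dataset every round. Without the cache() call, each action re-evaluates the lineage and re-parses the input; inserting it is the kind of decision ReSpark is described as automating.

        # Hedged sketch of the caching pitfall in iterative Spark analyses.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("caching-sketch").getOrCreate()
        sc = spark.sparkContext

        raw = sc.textFile("hdfs:///data/events.txt")          # placeholder path
        parsed = raw.map(lambda line: len(line.split(",")))   # expensive parse step

        # Without this line, every iteration below re-reads and re-parses the input,
        # because the RDD lineage is recomputed on each action.
        parsed.cache()

        for threshold in [2, 4, 8]:                            # iterative analysis
            count = parsed.filter(lambda n: n >= threshold).count()
            print(threshold, count)

        spark.stop()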

    Computer Aided Verification

    The open access two-volume set LNCS 12224 and 12225 constitutes the refereed proceedings of the 32nd International Conference on Computer Aided Verification, CAV 2020, held in Los Angeles, CA, USA, in July 2020.* The 43 full papers presented together with 18 tool papers and 4 case studies were carefully reviewed and selected from 240 submissions. The papers were organized in the following topical sections: Part I: AI verification; blockchain and security; concurrency; hardware verification and decision procedures; and hybrid and dynamic systems. Part II: model checking; software verification; stochastic systems; and synthesis. *The conference was held virtually due to the COVID-19 pandemic.

    Towards More Expressive and Usable Types for Dynamic Languages

    Many popular programming languages, including Ruby, JavaScript, and Python, feature dynamic type systems, in which types are not known until runtime. Dynamic typing provides the programmer with flexibility and allows for rapid program development. In contrast, static type systems, found in languages like C++ and Java, help catch errors early during development, enforce invariants as programs evolve, and provide useful documentation via type annotations. Many researchers have explored combining these contrasting paradigms, seeking to marry the flexibility of dynamic types with the correctness guarantees and documentation of static types. However, many challenges remain in this pursuit: programmers using dynamic languages may wish to verify more expressive properties than basic type safety; operations for commonly used libraries, such as those for databases and heterogeneous data structures, are difficult to precisely type check; and type inference, the process of automatically deducing the types of methods and variables in a program, often produces type annotations that are complex and verbose, and thus less usable for the programmer. To address these issues, I present four pieces of work that aim to increase the expressiveness and usability of static types for dynamic languages. First, I present RTR, a system that adds refinement types to Ruby: basic types extended with expressive predicates. RTR uses assume-guarantee reasoning and a novel idea called just-in-time verification, in which verification is deferred until runtime, to handle dynamic program features such as mixins and metaprogramming. We found RTR was useful for verifying key methods in six Ruby programs. Second, I present CompRDL, a Ruby type system that allows library method type signatures to include type-level computations (or comp types). Comp types can be used to precisely type check database queries, as well as operations over heterogeneous data structures like arrays and hashes. We used CompRDL to type check methods from six Ruby programs, enabling us to check more expressive properties, with fewer manually inserted type casts, than was possible without comp types. Third, I present InferDL, a Ruby type inference system that aims to produce usable type annotations. Because the types inferred by standard, constraint-based inference are often complex and less useful to the programmer, InferDL complements constraints with configurable heuristics that aim to produce more usable types. We applied InferDL to four Ruby programs with existing type annotations and found that InferDL inferred 22% more types that matched the prior annotations compared to standard inference. Finally, I present SimTyper, a system that builds on InferDL by using a novel machine learning-based technique called type equality prediction. When standard and heuristic inference produce a non-usable type for a position (argument/return/variable), we use a deep similarity network to compare that position to other positions with usable types. If the network predicts that two positions have the same type, we guess the usable type in place of the non-usable one, and check the guess against constraints to ensure soundness. We evaluated SimTyper on eight Ruby programs with prior annotations and found that, compared to standard inference, SimTyper finds 69% more types that match programmer-written annotations. In sum, I claim that RTR, CompRDL, InferDL, and SimTyper represent promising steps towards more expressive and usable types for dynamic languages.
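
    The just-in-time verification idea, deferring checks until the call actually happens, can be sketched outside Ruby as well. The Python decorator below is a hypothetical illustration, not RTR's implementation: it attaches refinement-style predicates to a function and checks them at runtime, in the spirit of basic types extended with expressive predicates.

        # Hedged sketch: refinement-style pre/postconditions checked at call time.
        import functools

        def refine(pre, post):
            # pre is checked on the arguments, post on the result, at every call.
            def wrap(fn):
                @functools.wraps(fn)
                def checked(*args, **kwargs):
                    assert pre(*args, **kwargs), f"precondition failed for {fn.__name__}"
                    result = fn(*args, **kwargs)
                    assert post(result), f"postcondition failed for {fn.__name__}"
                    return result
                return checked
            return wrap

        # Roughly: { n: Int | n >= 0 } -> { v: Int | v >= 1 }
        @refine(pre=lambda n: isinstance(n, int) and n >= 0,
                post=lambda v: isinstance(v, int) and v >= 1)
        def factorial(n):
            return 1 if n == 0 else n * factorial(n - 1)

        print(factorial(5))   # 120; both checks pass at runtime
        # factorial(-1) would fail the precondition instead of recursing forever.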

    Declarative symbolic pure-logic model checking

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 173-181). Model checking, a technique for finding errors in systems, involves building a formal model that describes possible system behaviors and correctness conditions, and using a tool to search for model behaviors violating correctness properties. Existing model checkers are well-suited for analyzing control-intensive algorithms (e.g. network protocols with simple node state). Many important analyses, however, fall outside the capabilities of existing model checkers. Examples include checking algorithms with complex state, distributed algorithms over all network topologies, and highly declarative models. This thesis addresses the problem of building an efficient model checker that overcomes these limitations. The work builds on Alloy, a relational modeling language. Previous work has defined the language and shown that it can be analyzed by translation to SAT. The primary contributions of this thesis include: a modeling paradigm for describing complex structures in Alloy; significant improvements in scalability of the analyzer; and improvements in usability of the analyzer via addition of a debugger for overconstraints. Together, these changes make model checking practical for important new classes of analyses. While the work was done in the context of Alloy, some techniques generalize to other verification tools. By Ilya A. Shlyakhter. S.M.
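
    As a toy illustration of the analysis style involved, bounded relational checking by reduction to boolean search: the Python sketch below encodes a binary relation over a three-atom universe as one boolean per tuple and enumerates assignments directly (a real analyzer, like the one described here, hands the encoded constraints to a SAT solver; nothing below is the Alloy Analyzer itself).

        # Hedged toy sketch: bounded relational analysis by boolean search.
        # A binary relation R over atoms {0, 1, 2} is one boolean per tuple (a, b);
        # we search for an R that is irreflexive, transitive, and non-empty.
        from itertools import product

        ATOMS = range(3)
        TUPLES = [(a, b) for a in ATOMS for b in ATOMS]

        def satisfies(rel):
            irreflexive = all((a, a) not in rel for a in ATOMS)
            transitive = all((a, c) in rel
                             for (a, b) in rel for (b2, c) in rel if b == b2)
            return irreflexive and transitive and len(rel) > 0

        def find_model():
            # Enumerate all 2**9 assignments; a SAT solver searches this space
            # far more cleverly, which is what makes the SAT-based analysis scale.
            for bits in product([False, True], repeat=len(TUPLES)):
                rel = {t for t, bit in zip(TUPLES, bits) if bit}
                if satisfies(rel):
                    return rel
            return None

        print(find_model())   # prints a satisfying relation, e.g. {(2, 1)}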

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. A total of 41 full papers presented in the proceedings were carefully reviewed and selected from 141 submissions. The volume also contains 7 tool papers, 6 tool demo papers, and 9 SV-Comp competition papers. The papers are organized in topical sections as follows: Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers.

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building computer-controlled systems.