196 research outputs found

    Logic Programming Applications: What Are the Abstractions and Implementations?

    Full text link
    This article presents an overview of applications of logic programming, classifying them based on the abstractions and implementations of logic languages that support the applications. The three key abstractions are join, recursion, and constraint. Their essential implementations are for-loops, fixed points, and backtracking, respectively. The corresponding kinds of applications are database queries, inductive analysis, and combinatorial search, respectively. We also discuss language extensions and programming paradigms, summarize example application problems by application areas, and touch on example systems that support variants of the abstractions with different implementations

    On the Practice and Application of Context-Free Language Reachability

    Get PDF
    The Context-Free Language Reachability (CFL-R) formalism relates to some of the most important computational problems facing researchers and industry practitioners. CFL-R is a generalisation of graph reachability and language recognition, such that pairs in a labelled graph are reachable if and only if there is a path between them whose labels, joined together in the order they were encountered, spell a word in a given context-free language. The formalism finds particular use as a vehicle for phrasing and reasoning about program analysis, since complex relationships within the data, logic or structure of computer programs are easily expressed and discovered in CFL-R. Unfortunately, The potential of CFL-R can not be met by state of the art solvers. Current algorithms have scalability and expressibility issues that prevent them from being used on large graph instances or complex grammars. This work outlines our efforts in understanding the practical concerns surrounding CFL-R, and applying this knowledge to improve the performance of CFL-R applications. We examine the major difficulties with solving CFL-R-based analyses at-scale, via a case-study of points-to analysis as a CFL-R problem. Points-to analysis is fundamentally important to many modern research and industry efforts, and is relevant to optimisation, bug-checking and security technologies. Our understanding of the scalability challenge motivates work in developing practical CFL-R techniques. We present improved evaluation algorithms and declarative optimisation techniques for CFL-R, capitalising on the simplicity of CFL-R to creating fully automatic methodologies. The culmination of our work is a general-purpose and high-performance tool called Cauliflower, a solver-generator for CFL-R problems. We describe Cauliflower and evaluate its performance experimentally, showing significant improvement over alternative general techniques

    On Fast Large-Scale Program Analysis in Datalog

    Get PDF
    Designing and crafting a static program analysis is challenging due to the complexity of the task at hand. Among the challenges are modelling the semantics of the input language, finding suitable abstractions for the analysis, and handwriting efficient code for the analysis in a traditional imperative language such as C++. Hence, the development of static program analysis tools is costly in terms of development time and resources for real world languages. To overcome, or at least alleviate the costs of developing a static program analysis, Datalog has been proposed as a domain specific language (DSL).With Datalog, a designer expresses a static program analysis in the form of a logical specification. While a domain specific language approach aids in the ease of development of program analyses, it is commonly accepted that such an approach has worse runtime performance than handcrafted static analysis tools. In this work, we introduce a new program synthesis methodology for Datalog specifications to produce highly efficient monolithic C++ analyzers. The synthesis technique requires the re-interpretation of the semi-naïve evaluation as a scaffolding for translation using partial evaluation. To achieve high-performance, we employ staged compilation techniques and specialize the underlying relational data structures for a given Datalog specification. Experimentation on benchmarks for large-scale program analysis validates the superior performance of our approach over available Datalog tools and demonstrates our competitiveness with state-of-the-art handcrafted tools

    Query evaluation revised: parallel, distributed, via rewritings

    Get PDF
    This is a thesis on query evaluation in parallel and distributed settings, and structurally simple rewritings. It consists of three parts. In the first part, we investigate the efficiency of constant-time parallel evaluation algorithms. That is, the number of required processors or, asymptotically equivalent, the work required to evaluate queries in constant time. It is known that relational algebra queries can be evaluated in constant time. However, work-efficiency has not been a focus, and indeed known evaluation algorithms yield huge (polynomial) work bounds. We establish work-efficient constant-time algorithms for several query classes: (free-connex) acyclic, semi-join algebra, and natural join queries; the latter in the worst-case framework. The second part is about deciding parallel-correctness of distributed evaluation strategies: Given a query and policies specifying how data is distributed and communicated among multiple servers, does the distributed evaluation yield the same result as the classical evaluation, for every database? Ketsman et al. proved that parallel-correctness for Datalog is undecidable; by reduction from the undecidable containment problem for Datalog. We show that parallel-correctness is already undecidable for monadic and frontier-guarded Datalog queries, for which containment is decidable. However, deciding parallel-correctness for frontier-guarded Datalog and constraint-based communication policies satisfying a certain property is 2ExpTime-complete. Furthermore, we obtain the same bounds for the parallel-boundedness problem, which asks whether the number of required communication rounds is bounded, over all databases. The third part is about structurally simple rewritings. The (classical) rewriting problem asks whether, for a given query and a set of views, there is a query, called rewriting, over the views that is equivalent to the given query. We study the variant of this problem for (subclasses of) conjunctive queries and views that asks for a structurally simple rewriting. We prove that, if the given query is acyclic, an acyclic rewriting exists if there is any rewriting at all. Analogous statements hold for free-connex acyclic, hierarchical, and q-hierarchical queries. Furthermore, we prove that the problem is NP-hard, even if the given query and the views are acyclic or hierarchical. It becomes tractable if the views are free-connex acyclic or q-hierarchical (and the arity of the database schema is bounded)

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Get PDF
    Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”

    Static Analysis of Shape in TensorFlow Programs

    Get PDF
    Machine learning has been widely adopted in diverse science and engineering domains, aided by reusable libraries and quick development patterns. The TensorFlow library is probably the best-known representative of this trend and most users employ the Python API to its powerful back-end. TensorFlow programs are susceptible to several systematic errors, especially in the dynamic typing setting of Python. We present Pythia, a static analysis that tracks the shapes of tensors across Python library calls and warns of several possible mismatches. The key technical aspects are a close modeling of library semantics with respect to tensor shape, and an identification of violations and error-prone patterns. Pythia is powerful enough to statically detect (with 84.62% precision) 11 of the 14 shape-related TensorFlow bugs in the recent Zhang et al. empirical study - an independent slice of real-world bugs

    Provenance, Incremental Evaluation, and Debugging in Datalog

    Get PDF
    The Datalog programming language has recently found increasing traction in research and industry. Driven by its clean declarative semantics, along with its conciseness and ease of use, Datalog has been adopted for a wide range of important applications, such as program analysis, graph problems, and networking. To enable this adoption, modern Datalog engines have implemented advanced language features and high-performance evaluation of Datalog programs. Unfortunately, critical infrastructure and tooling to support Datalog users and developers are still missing. For example, there are only limited tools addressing the crucial debugging problem, where developers can spend up to 30% of their time finding and fixing bugs. This thesis addresses Datalog’s tooling gaps, with the ultimate goal of improving the productivity of Datalog programmers. The first contribution is centered around the critical problem of debugging: we develop a new debugging approach that explains the execution steps taken to produce a faulty output. Crucially, our debugging method can be applied for large-scale applications without substantially sacrificing performance. The second contribution addresses the problem of incremental evaluation, which is necessary when program inputs change slightly, and results need to be recomputed. Incremental evaluation allows this recomputation to happen more efficiently, without discarding the previous results and recomputing from scratch. Finally, the last contribution provides a new incremental debugging approach that identifies the root causes of faulty outputs that occur after an incremental evaluation. Incremental debugging focuses on the relationship between input and output and can provide debugging suggestions to amend the inputs so that faults no longer occur. These techniques, in combination, form a corpus of critical infrastructure and tooling developments for Datalog, allowing developers and users to use Datalog more productively

    Logic-based techniques for program analysis and specification synthesis

    Full text link
    La Tesis investiga técnicas ágiles dentro del paradigma declarativo para dar solución a dos problemas: el análisis de programas y la inferencia de especificaciones a partir de programas escritos en lenguajes multiparadigma y en lenguajes imperativos con tipos, objetos, estructuras y punteros. Respecto al estado actual de la tesis, la parte de análisis de programas ya está consolidada, mientras que la parte de inferencia de especificaciones sigue en fase de desarrollo activo. La primera parte da soluciones para la ejecución de análisis de punteros especificados en Datalog. En esta parte se han desarrollado dos técnicas de ejecución de especificaciones en dicho lenguaje Datalog: una de ellas utiliza resolutores de sistemas de ecuaciones booleanas, y la otra utiliza la lógica de reescritura implementada eficientemente en el lenguaje Maude. La segunda parte desarrolla técnicas de inferencia de especificaciones a partir de programas. En esta parte se han desarrollado dos métodos de inferencia de especificaciones. El primer método se desarrolló para el lenguaje lógico-funcional Curry y permite inferir especificaciones ecuacionales mediante interpretación abstracta de los programas. El segundo método está siendo desarrollado para lenguajes imperativos realistas, y se ha aplicado a un subconjunto del lenguaje de programación C. Este método permite inferir especificaciones en forma de reglas que representan las distintas relaciones entre las propiedades que el estado de un programa satisface antes y después de su ejecución. Además, estas propiedades son expresables en términos de las abstracciones funcionales del propio programa, resultando en una especificación de muy alto nivel y, por lo tanto, de más fácil comprensión.Feliú Gabaldón, MA. (2013). Logic-based techniques for program analysis and specification synthesis [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33747TESI
    corecore