224 research outputs found

    A survey of parallel execution strategies for transitive closure and logic programs

    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited to parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas: one focused on the transitive closure of Relational Algebra expressions, the other on the optimization of more general Datalog queries. Though the subareas seem radically different in approach and formalism, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods, and address their parallelization by discussing the various forms it can take. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. Finally, we consider Datalog queries and present general methods for parallel rule execution; we recognize the similarities between these methods and those reviewed previously when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of these methods.
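    The iterative methods the survey distinguishes are variations on semi-naive evaluation: each round joins only the newly derived tuples (the delta) against the base relation, until a fixpoint is reached. A minimal sketch in Python, assuming the relation is given as a set of pairs (the function name and representation are illustrative, not taken from the survey):

```python
from collections import defaultdict

def transitive_closure(edges):
    """Semi-naive transitive closure of a binary relation: each round
    extends only the newly derived pairs (delta), so the work per round
    shrinks as the fixpoint is approached."""
    succ = defaultdict(set)                 # index: node -> direct successors
    for a, b in edges:
        succ[a].add(b)
    closure = set(edges)
    delta = set(edges)
    while delta:
        new = {(a, c) for (a, b) in delta for c in succ[b]}
        delta = new - closure               # keep only genuinely new pairs
        closure |= delta
    return closure

# A chain 1 -> 2 -> 3 -> 4 yields all six reachable pairs.
assert transitive_closure({(1, 2), (2, 3), (3, 4)}) == {
    (1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)}
```

    Hash-based fragmentation, as described in the survey, parallelizes exactly this kind of join by distributing tuples to processors on a hash of the join column, so each round's delta join can proceed independently per fragment.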

    Higher-Order, Data-Parallel Structured Deduction

    State-of-the-art Datalog engines include expressive features such as ADTs (structured heap values), stratified aggregation and negation, various primitive operations, and the opportunity for further extension using FFIs. Current parallelization approaches for state-of-the-art Datalogs target shared-memory locking data structures using conventional multi-threading, or use the map-reduce model for distributed computing. Furthermore, current state-of-the-art approaches cannot scale to formal systems which pervasively manipulate structured data, due to their lack of indexing for structured data stored in the heap. In this paper, we describe a new approach to data-parallel structured deduction that involves a key semantic extension of Datalog to permit first-class facts and higher-order relations via defunctionalization, an implementation approach that enables parallelism uniformly both across sets of disjoint facts and over individual facts with nested structure. We detail a core language, DL_s, whose key invariant (subfact closure) ensures that each subfact is materialized as a top-class fact. We extend DL_s to Slog, a fully-featured language whose forms facilitate leveraging subfact closure to rapidly implement expressive, high-performance formal systems. We demonstrate Slog by systematically building a family of control-flow analyses from abstract machines, along with several implementations of classical type systems (such as STLC and LF). We performed experiments on EC2, Azure, and ALCF's Theta at up to 1000 threads, showing orders-of-magnitude scalability improvements versus competing state-of-the-art systems.
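    The subfact-closure invariant can be pictured as interning every nested subterm as its own top-level fact with a fresh identifier, so that indexing, joins, and parallel partitioning apply uniformly to inner structure. A toy sketch of that idea in Python (the representation and names are hypothetical, not Slog's actual implementation):

```python
from itertools import count

class FactStore:
    """Interns nested facts so that every subfact is materialized as a
    top-level fact with its own id -- a toy picture of the subfact-closure
    invariant, not Slog's real data structures."""
    def __init__(self):
        self.ids = {}                 # (relation, flattened args) -> fact id
        self.fresh = count()

    def intern(self, fact):
        rel, *args = fact
        # Replace each nested subfact by the id of its own top-level entry.
        flat = tuple(self.intern(a) if isinstance(a, tuple) else a
                     for a in args)
        key = (rel, flat)
        if key not in self.ids:
            self.ids[key] = next(self.fresh)
        return self.ids[key]

store = FactStore()
# eval(app(lam("x"), ref("x"))): four facts are materialized, innermost first.
store.intern(("eval", ("app", ("lam", "x"), ("ref", "x"))))
for (rel, args), fid in store.ids.items():
    print(fid, rel, args)
```

    Because every subterm now lives in a flat, id-keyed relation, hash partitioning can distribute inner structure across workers the same way it distributes disjoint facts.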

    Reasoning in description logics using resolution and deductive databases


    Declarative Data Analytics: a Survey

    The area of declarative data analytics explores the application of the declarative paradigm to data science and machine learning. It proposes declarative languages for expressing data analysis tasks and develops systems which optimize programs written in those languages. The execution engine can be either centralized or distributed, as the declarative paradigm advocates independence from particular physical implementations. The survey explores a wide range of declarative data analysis frameworks by examining both the programming model and the optimization techniques used, in order to draw conclusions on the current state of the art in the area and identify open challenges.

    Incremental Processing and Optimization of Update Streams

    Over recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring, which monitor, plan, control, and make decisions over data streams from multiple sources. We are interested in extending traditional stream processing techniques to meet the new challenges of these applications. Generally, in order to support genuine continuous query optimization and processing over data streams, we need to systematically understand how to address incremental optimization and processing of update streams for a rich class of queries commonly used in these applications. Our general thesis is that efficient incremental processing and re-optimization of update streams can be achieved by various incremental view maintenance techniques, if we cast the problems as incremental view maintenance problems over data streams. We focus on two challenges in incremental processing of update streams that are not addressed in existing work on stream query processing: incremental processing of transitive closure queries over data streams, and incremental re-optimization of queries. In addition to addressing these specific challenges, we also develop a working prototype system, Aspen, which serves as an end-to-end stream processing system and has been deployed as the foundation for a case study of our SmartCIS application. We validate our solutions both analytically and empirically on top of Aspen, over a variety of benchmark workloads such as the TPC-H and Linear Road benchmarks.
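    Casting stream processing as view maintenance means that each arriving update triggers recomputation of only the view's delta, not the whole query. A minimal sketch in Python of maintaining a transitive-closure view under edge insertions (the names are illustrative, not from the thesis; deletions, which update streams also carry, are substantially harder and omitted here):

```python
from collections import defaultdict

def insert_edge(closure, succ, pred, a, b):
    """Incrementally maintain a transitive-closure view when edge (a, b)
    arrives on the update stream: every node that reaches `a` now also
    reaches everything reachable from `b`, so only those pairs are derived."""
    sources = pred[a] | {a}          # nodes that can reach a
    targets = succ[b] | {b}          # nodes reachable from b
    new = {(x, y) for x in sources for y in targets} - closure
    closure |= new
    for x, y in new:                 # keep the reachability indexes in sync
        succ[x].add(y)
        pred[y].add(x)
    return new                       # the delta emitted downstream

closure, succ, pred = set(), defaultdict(set), defaultdict(set)
for edge in [(1, 2), (2, 3), (3, 4)]:      # a toy update stream
    print("insert", edge, "->", sorted(insert_edge(closure, succ, pred, *edge)))
```

    Each insertion does work proportional to the new reachable pairs it creates, rather than recomputing the closure from scratch per update.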

    Parallelization of JStar Programs on a Distributed Computer

    In the past, the performance of sequential programs grew exponentially as the performance of CPUs increased with Moore’s Law. Since 2005, however, performance improvements have come in the form of more parallel CPU cores. Writing parallel programs using existing programming languages can be difficult and error-prone. JStar is a new programming language that allows programs to be written in a naturally parallel way. The JStar project aims to produce compilers that can produce executables for a variety of architectures (such as many-core, GPUs, and distributed computers). This thesis proposes a process for compiling these programs into distributed executables, and investigates various trade-offs and techniques for implementing JStar programs on a distributed computer. In this process, a parallel design is first created; this design is then expressed as a separate set of distribute statements that are combined with the original program to create a distributed program. The expressiveness and effectiveness of this approach are investigated for two case-study JStar programs: (1) a prime-number counting program and (2) a version of Conway’s Game of Life. Various designs were hand-translated into distributed Java programs, and benchmarks were run to assess the performance of the different designs. For each of these case studies, parallel designs were found that achieved high levels of speedup.
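    The kind of parallel design evaluated for Game of Life can be pictured as a row-block distribution: each worker owns a strip of the grid and computes its next generation independently. A rough Python sketch of that design (not JStar syntax or the thesis's generated Java; the worker count and grid are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

def next_strip(args):
    """Compute one generation for rows [lo, hi) of a toroidal Game of Life
    grid. Each worker reads the whole current grid but writes only its own
    strip, mirroring a row-block distribution design."""
    grid, lo, hi = args
    rows, cols = len(grid), len(grid[0])
    strip = []
    for r in range(lo, hi):
        row = []
        for c in range(cols):
            live = sum(grid[(r + dr) % rows][(c + dc) % cols]
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0))
            row.append(1 if live == 3 or (grid[r][c] and live == 2) else 0)
        strip.append(row)
    return strip

def step(grid, workers=4):
    rows = len(grid)
    bounds = [(i * rows // workers, (i + 1) * rows // workers)
              for i in range(workers)]
    with ProcessPoolExecutor(workers) as pool:
        strips = list(pool.map(next_strip, [(grid, lo, hi) for lo, hi in bounds]))
    return [row for strip in strips for row in strip]

if __name__ == "__main__":
    glider = [[0] * 8 for _ in range(8)]
    for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
        glider[r][c] = 1
    print(step(glider))
```

    The trade-off such designs expose is communication versus balance: shipping the whole grid each generation is simple but costly, while exchanging only boundary rows between neighbouring strips cuts communication at the price of extra coordination.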

    Computational and human-based methods for knowledge discovery over knowledge graphs

    The modern world has evolved, accompanied by a huge exploitation of data and information. Daily, increasing volumes of data from various sources and formats are stored, making it challenging to manage and integrate them to discover new knowledge. The appropriate use of data in various sectors of society, such as education, healthcare, e-commerce, and industry, provides advantages for decision support in these areas. However, knowledge discovery becomes challenging since data may come from heterogeneous sources with important information hidden. Thus, new approaches that adapt to the new challenges of knowledge discovery in such heterogeneous data environments are required. The semantic web and knowledge graphs (KGs) are becoming increasingly relevant on the road to knowledge discovery. This thesis tackles the problem of knowledge discovery over KGs built from heterogeneous data sources. We provide a neuro-symbolic artificial intelligence system that integrates symbolic and sub-symbolic frameworks to exploit the semantics encoded in a KG and its structure. The symbolic system relies on existing approaches of deductive databases to make explicit the implicit knowledge encoded in a KG. The proposed deductive database, DSDS, can derive new statements for ego networks given an abstract target prediction; DSDS thus minimizes data sparsity in KGs. In addition, the sub-symbolic system relies on knowledge graph embedding (KGE) models. KGE models are commonly applied to the KG completion task, representing the entities of a KG in a low-dimensional vector space. However, KGE models are known to suffer from data sparsity, and the symbolic system assists in overcoming this fact. The proposed approach discovers knowledge given a target prediction in a KG and extracts unknown implicit information related to the target prediction. As a proof of concept, we have implemented the neuro-symbolic system on top of a KG for lung cancer to predict polypharmacy treatment effectiveness. The symbolic system implements a deductive system to deduce pharmacokinetic drug-drug interactions encoded in a set of rules expressed as a Datalog program. Additionally, the sub-symbolic system predicts treatment effectiveness using a KGE model, which preserves the KG structure. An ablation study of the components of our approach is conducted, considering state-of-the-art KGE methods. The observed results provide evidence for the benefits of the neuro-symbolic integration of our approach, where the neuro-symbolic system exhibits improved results for an abstract target prediction. The enhancement occurs because the symbolic system increases the prediction capacity of the sub-symbolic system. Moreover, the proposed neuro-symbolic artificial intelligence system is evaluated in Industry 4.0 (I4.0), demonstrating its effectiveness in determining relatedness among standards and analyzing their properties to detect unknown relations in the I4.0 KG. The results achieved allow us to conclude that the proposed neuro-symbolic approach for an abstract target prediction improves the prediction capability of KGE models by minimizing data sparsity in KGs.
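    The sub-symbolic side of such a system scores candidate triples geometrically over learned embeddings. As a concrete illustration, a minimal TransE-style scorer in Python (TransE is one common KGE model, chosen here for brevity; the abstract does not name the specific models used, and the entities and embeddings below are made up):

```python
import numpy as np

def transe_score(h, r, t):
    """TransE scores a triple (head, relation, tail) by how closely the
    relation vector translates the head embedding onto the tail embedding:
    a smaller ||h + r - t|| means a more plausible triple."""
    return -np.linalg.norm(h + r - t)

rng = np.random.default_rng(0)
dim = 50
# Hypothetical embeddings; in practice these are learned from the KG.
emb = {name: rng.normal(size=dim) for name in
       ("Drug_A", "interactsWith", "Drug_B", "Drug_C")}

# Rank candidate tails for the completion query (Drug_A, interactsWith, ?).
for tail in ("Drug_B", "Drug_C"):
    s = transe_score(emb["Drug_A"], emb["interactsWith"], emb[tail])
    print(tail, round(float(s), 3))
```

    The symbolic component's role in the thesis is to densify the KG with deduced facts before such a model is trained, which is how it mitigates the data sparsity that KGE models suffer from.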