13 research outputs found

    A Paradigm for Learning Queries on Big Data

    Specifying a database query using a formal query language is typically a challenging task for non-expert users. In the context of big data, this problem becomes even harder, as it requires users to deal with database instances that are large and hence difficult to visualize. Such instances usually lack a schema to help users specify their queries, or have an incomplete schema because they come from disparate data sources. In this paper, we propose a novel paradigm for interactive learning of queries on big data, without assuming any knowledge of the database schema. The paradigm can be applied to different database models and to a class of queries adequate to each database model. In particular, we present two instantiations that validate the proposed paradigm: learning relational join queries and learning path queries on graph databases. Finally, we discuss the challenges of employing the paradigm for further data models and for learning cross-model schema mappings.

    Reverse engineering queries in ontology-enriched systems: the case of expressive horn description logic ontologies

    We introduce the query-by-example (QBE) paradigm for query answering in the presence of ontologies. Intuitively, QBE permits non-expert users to explore the data by providing examples of the information they (do not) want, which the system then generalizes into a query. Formally, we study the following question: given a knowledge base and sets of positive and negative examples, is there a query that returns all positive but none of the negative examples? We focus on description logic knowledge bases with ontologies formulated in Horn-ALCI and (unions of) conjunctive queries. Our main contributions are characterizations, algorithms and tight complexity bounds for QBE.

    Regular Path Query Evaluation on Streaming Graphs

    We study persistent query evaluation over streaming graphs, which is becoming increasingly important. We focus on navigational queries that determine if there exists a path between two entities that satisfies a user-specified constraint. We adopt the Regular Path Query (RPQ) model that specifies navigational patterns with labeled constraints. We propose deterministic algorithms to efficiently evaluate persistent RPQs under both arbitrary and simple path semantics in a uniform manner. Experimental analysis on real and synthetic streaming graphs shows that the proposed algorithms can process up to tens of thousands of edges per second and efficiently answer RPQs that are commonly used in real-world workloads. Comment: A shorter version of this paper has been accepted for publication in the 2020 International Conference on Management of Data (SIGMOD 2020).
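
    To make the RPQ model concrete, the following is a minimal sketch of evaluating a regular path query on a static, edge-labeled graph under arbitrary path semantics. It uses the standard product construction (graph vertex paired with automaton state) and breadth-first search; it is not the streaming algorithm of the paper. The function name `evaluate_rpq`, the example graph `EDGES`, and the automaton `DFA` (encoding the hypothetical query `follows+ mentions`) are all illustrative assumptions.

```python
from collections import deque

# Hypothetical example graph: (source, label, target) triples.
EDGES = [
    ("a", "follows", "b"),
    ("b", "follows", "c"),
    ("c", "mentions", "d"),
    ("a", "mentions", "e"),
]

# Hypothetical DFA for the regular expression `follows+ mentions`.
# Keys are (state, label); values are the next state. State 2 accepts.
DFA = {
    (0, "follows"): 1,
    (1, "follows"): 1,
    (1, "mentions"): 2,
}
START_STATE = 0
ACCEPTING = {2}


def evaluate_rpq(edges, dfa, start_state, accepting, source):
    """Return the vertices reachable from `source` along some path whose
    sequence of edge labels is accepted by the DFA (arbitrary path semantics)."""
    # Build an adjacency list: vertex -> [(label, target), ...].
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))

    # Breadth-first search over (vertex, automaton state) pairs.
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        vertex, state = queue.popleft()
        if state in accepting:
            answers.add(vertex)
        for label, target in adjacency.get(vertex, []):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # the automaton has no transition on this label
            pair = (target, next_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return answers


if __name__ == "__main__":
    print(evaluate_rpq(EDGES, DFA, START_STATE, ACCEPTING, "a"))
```

    Because each (vertex, state) pair is visited at most once, the search terminates even on cyclic graphs. Simple path semantics, where a vertex may not repeat along a path, would need a different strategy and is not covered by this sketch.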

    Using Knowledge Anchors to Facilitate User Exploration of Data Graphs

    This paper investigates how to facilitate users’ exploration through data graphs for knowledge expansion. Our work focuses on knowledge utility – increasing users’ domain knowledge while exploring a data graph. We introduce a novel exploration support mechanism underpinned by the subsumption theory of meaningful learning, which postulates that new knowledge is grasped by starting from familiar concepts in the graph which serve as knowledge anchors from where links to new knowledge are made. A core algorithmic component for operationalising the subsumption theory for meaningful learning to generate exploration paths for knowledge expansion is the automatic identification of knowledge anchors in a data graph (KADG). We present several metrics for identifying KADG which are evaluated against familiar concepts in human cognitive structures. A subsumption algorithm that utilises KADG for generating exploration paths for knowledge expansion is presented, and applied in the context of a Semantic data browser in a music domain. The resultant exploration paths are evaluated in a task-driven experimental user study compared to free data graph exploration. The findings show that exploration paths, based on subsumption and using knowledge anchors, lead to a significantly higher increase in the users’ conceptual knowledge and better usability than free exploration of data graphs. The work opens a new avenue in semantic data exploration which investigates the link between learning and knowledge exploration. This extends the value of exploration and enables broader applications of data graphs in systems where the end users are not experts in the specific domain.

    Learning Join Queries from User Examples

    We investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label them as positive or negative examples, depending on whether or not she would like the tuples as part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across an arbitrary number m of relations while keeping the number of user interactions as small as possible. We assume no prior knowledge of the integrity constraints across the involved relations. Inferring the join predicate across multiple relations when the referential constraints are unknown may occur in several applications, such as data integration, reverse engineering of database queries, and schema inference. In such scenarios, the number of tuples involved in the join is typically large. We introduce a set of strategies that let us inspect the search space and aggressively prune what we call uninformative tuples, and we directly present to the user the informative ones, that is, those that allow the user to quickly find the goal query she has in mind. In this article, we focus on the inference of joins with equality predicates and also allow disjunctive join predicates and projection in the queries. We precisely characterize the frontier between tractability and intractability for the following problems of interest in these settings: consistency checking, learnability, and deciding the informativeness of a tuple. Next, we propose several strategies for presenting tuples to the user in an order that minimizes the number of interactions. We show the efficiency of our approach through an experimental study on both benchmark and synthetic datasets.
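
    As a rough illustration of consistency checking, the sketch below handles only a restricted case: a conjunction of equality predicates between two relations, with no disjunction and no projection. It computes the most specific predicate satisfied by every positive example and then checks that no negative example satisfies it. The names `most_specific_predicate`, `satisfies`, and `is_consistent`, and the sample data, are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product

# Hypothetical labeled examples: each is a pair (r, s) of tuples drawn
# from two relations R and S.
POSITIVES = [
    ((1, "x"), (1, "y")),
    ((2, "x"), (2, "z")),
]


def most_specific_predicate(positives):
    """Return every attribute pair (i, j) such that r[i] == s[j] holds in
    all positive examples. Assumes at least one positive example."""
    if not positives:
        raise ValueError("at least one positive example is required")
    first_r, first_s = positives[0]
    candidates = set(product(range(len(first_r)), range(len(first_s))))
    for r, s in positives:
        candidates = {(i, j) for (i, j) in candidates if r[i] == s[j]}
    return candidates


def satisfies(pair, predicate):
    """True if the tuple pair meets every equality in the predicate."""
    r, s = pair
    return all(r[i] == s[j] for i, j in predicate)


def is_consistent(positives, negatives):
    """True if some conjunction of equalities selects all positives and
    no negatives. Any such conjunction is a subset of the most specific
    predicate, so it suffices to test that one predicate."""
    predicate = most_specific_predicate(positives)
    return not any(satisfies(neg, predicate) for neg in negatives)


if __name__ == "__main__":
    print(most_specific_predicate(POSITIVES))
    print(is_consistent(POSITIVES, [((1, "x"), (2, "y"))]))
    print(is_consistent(POSITIVES, [((3, "q"), (3, "r"))]))
```

    In this restricted setting the check is polynomial in the number of examples and attributes. The abstract states that the full setting, with disjunction and projection, has both tractable and intractable cases, which this sketch does not address.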

    Generalizing spreadsheet computation for evolving spreadsheets at scale

    Spreadsheets are one of the most ubiquitous ad-hoc data analysis and manipulation tools. Their strength over traditional relational database management systems lies in their ability to allow users to manipulate data interactively through an intuitive interface. However, the capabilities of current spreadsheet systems to handle datasets that evolve over time are limited in several dimensions: (a) limited power: it is difficult to perform relational-style queries, which are often needed for large data analysis, while keeping the convenience of formula-like automatic recalculation, (b) limited introspection: the ability to reason about the source of changes between versions at a higher level is often unsupported, (c) limited interactivity: the computation in spreadsheets at scale can make the system unresponsive, rendering the strength of spreadsheets moot, and (d) limited structure utilization: the computation in spreadsheets often fails to utilize the semi-structured nature of real-world spreadsheets. The dissertation discusses developments that overcome these hurdles. First, we discuss an extension to spreadsheet formulae that allows for relational-style queries in a manner that is consistent with typical formula computation engines. Second, we develop the theory of "diffing", representing data updates in a concise manner. Third, we introduce Asynchronous Formula Computation, a technique that improves spreadsheet interactivity when dealing with formula computation, while guaranteeing consistency of the results. Finally, we improve formula computation by utilizing structures of real-world spreadsheets and building a more concise representation.

    Intelligent Support for Exploration of Data Graphs

    This research investigates how to support a user’s exploration through data graphs generated from semantic databases in a way that leads to expanding the user’s domain knowledge. To be effective, approaches to facilitate exploration of data graphs should take into account the utility from a user’s point of view. Our work focuses on knowledge utility – how useful exploration paths through a data graph are for expanding the user’s knowledge. The main goal of this research is to design an intelligent support mechanism to direct the user to ‘good’ exploration paths through big data graphs for knowledge expansion. We propose a new exploration support mechanism underpinned by the subsumption theory for meaningful learning, which postulates that new knowledge is grasped by starting from familiar concepts in the graph which serve as knowledge anchors from where links to new knowledge are made. A core algorithmic component for adapting the subsumption theory for generating exploration paths is the automatic identification of Knowledge Anchors in a Data Graph (KADG). Several metrics for identifying KADG and the corresponding algorithms for implementation have been developed and evaluated against human cognitive structures. A subsumption algorithm which utilises KADG for generating exploration paths for knowledge expansion is presented and evaluated in the context of a semantic data browser in a musical instrument domain. The resultant exploration paths are evaluated in a controlled user study to examine whether they increase the users’ knowledge as compared to free exploration. The findings show that exploration paths using knowledge anchors and subsumption lead to a significantly higher increase in the users’ conceptual knowledge. The approach can be adopted in applications providing data graph exploration to facilitate learning and sensemaking for lay users who are not fully familiar with the domain presented in the data graph.