1,714 research outputs found

    Answering FO+MOD Queries Under Updates on Bounded Degree Databases

    Get PDF
    We investigate the query evaluation problem for fixed queries over fully dynamic databases, where tuples can be inserted or deleted. The task is to design a dynamic algorithm that immediately reports the new result of a fixed query after every database update. We consider queries in first-order logic (FO) and its extension with modulo-counting quantifiers (FO+MOD), and show that they can be efficiently evaluated under updates, provided that the dynamic database does not exceed a certain degree bound. In particular, we construct a data structure that allows to answer a Boolean FO+MOD query and to compute the size of the query result within constant time after every database update. Furthermore, after every update we are able to immediately enumerate the new query result with constant delay between the output tuples. The time needed to build the data structure is linear in the size of the database. Our results extend earlier work on the evaluation of first-order queries on static databases of bounded degree and rely on an effective Hanf normal form for FO+MOD recently obtained by [Heimberg, Kuske, and Schweikardt, LICS, 2016]

    Performance and scalability of indexed subgraph query processing methods

    Get PDF
    Graph data management systems have become very popular as graphs are the natural data model for many applications. One of the main problems addressed by these systems is subgraph query processing; i.e., given a query graph, return all graphs that contain the query. The naive method for processing such queries is to perform a subgraph isomorphism test against each graph in the dataset. This obviously does not scale, as subgraph isomorphism is NP-Complete. Thus, many indexing methods have been proposed to reduce the number of candidate graphs that have to underpass the subgraph isomorphism test. In this paper, we identify a set of key factors-parameters, that influence the performance of related methods: namely, the number of nodes per graph, the graph density, the number of distinct labels, the number of graphs in the dataset, and the query graph size. We then conduct comprehensive and systematic experiments that analyze the sensitivity of the various methods on the values of the key parameters. Our aims are twofold: first to derive conclusions about the algorithms’ relative performance, and, second, to stress-test all algorithms, deriving insights as to their scalability, and highlight how both performance and scalability depend on the above factors. We choose six wellestablished indexing methods, namely Grapes, CT-Index, GraphGrepSX, gIndex, Tree+∆, and gCode, as representative approaches of the overall design space, including the most recent and best performing methods. We report on their index construction time and index size, and on query processing performance in terms of time and false positive ratio. We employ both real and synthetic datasets. Specifi- cally, four real datasets of different characteristics are used: AIDS, PDBS, PCM, and PPI. In addition, we generate a large number of synthetic graph datasets, empowering us to systematically study the algorithms’ performance and scalability versus the aforementioned key parameters

    A Practically Efficient Algorithm for Generating Answers to Keyword Search over Data Graphs

    Get PDF
    In keyword search over a data graph, an answer is a non-redundant subtree that contains all the keywords of the query. A naive approach to producing all the answers by increasing height is to generalize Dijkstra's algorithm to enumerating all acyclic paths by increasing weight. The idea of freezing is introduced so that (most) non-shortest paths are generated only if they are actually needed for producing answers. The resulting algorithm for generating subtrees, called GTF, is subtle and its proof of correctness is intricate. Extensive experiments show that GTF outperforms existing systems, even ones that for efficiency's sake are incomplete (i.e., cannot produce all the answers). In particular, GTF is scalable and performs well even on large data graphs and when many answers are needed.Comment: Full version of ICDT'16 pape

    Enumeration Complexity of Conjunctive Queries with Functional Dependencies

    Get PDF
    We study the complexity of enumerating the answers of Conjunctive Queries (CQs) in the presence of Functional Dependencies (FDs). Our focus is on the ability to list output tuples with a constant delay in between, following a linear-time preprocessing. A known dichotomy classifies the acyclic self-join-free CQs into those that admit such enumeration, and those that do not. However, this classification no longer holds in the common case where the database exhibits dependencies among attributes. That is, some queries that are classified as hard are in fact tractable if dependencies are accounted for. We establish a generalization of the dichotomy to accommodate FDs; hence, our classification determines which combination of a CQ and a set of FDs admits constant-delay enumeration with a linear-time preprocessing. In addition, we generalize a hardness result for cyclic CQs to accommodate a common type of FDs. Further conclusions of our development include a dichotomy for enumeration with linear delay, and a dichotomy for CQs with disequalities. Finally, we show that all our results apply to the known class of "cardinality dependencies" that generalize FDs (e.g., by stating an upper bound on the number of genres per movies, or friends per person)

    First-order queries on classes of structures with bounded expansion

    Full text link
    We consider the evaluation of first-order queries over classes of databases with bounded expansion. The notion of bounded expansion is fairly broad and generalizes bounded degree, bounded treewidth and exclusion of at least one minor. It was known that over a class of databases with bounded expansion, first-order sentences could be evaluated in time linear in the size of the database. We give a different proof of this result. Moreover, we show that answers to first-order queries can be enumerated with constant delay after a linear time preprocessing. We also show that counting the number of answers to a query can be done in time linear in the size of the database
    • 

    corecore