208 research outputs found

    Context-Free Path Queries on RDF Graphs

    Full text link
    Navigational graph queries are an important class of queries that canextract implicit binary relations over the nodes of input graphs. Most of the navigational query languages used in the RDF community, e.g. property paths in W3C SPARQL 1.1 and nested regular expressions in nSPARQL, are based on the regular expressions. It is known that regular expressions have limited expressivity; for instance, some natural queries, like same generation-queries, are not expressible with regular expressions. To overcome this limitation, in this paper, we present cfSPARQL, an extension of SPARQL query language equipped with context-free grammars. The cfSPARQL language is strictly more expressive than property paths and nested expressions. The additional expressivity can be used for modelling graph similarities, graph summarization and ontology alignment. Despite the increasing expressivity, we show that cfSPARQL still enjoys a low computational complexity and can be evaluated efficiently.Comment: 25 page

    Transformers over Directed Acyclic Graphs

    Full text link
    Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency

    On Quasi-Interpretations, Blind Abstractions and Implicit Complexity

    Full text link
    Quasi-interpretations are a technique to guarantee complexity bounds on first-order functional programs: with termination orderings they give in particular a sufficient condition for a program to be executable in polynomial time, called here the P-criterion. We study properties of the programs satisfying the P-criterion, in order to better understand its intensional expressive power. Given a program on binary lists, its blind abstraction is the nondeterministic program obtained by replacing lists by their lengths (natural numbers). A program is blindly polynomial if its blind abstraction terminates in polynomial time. We show that all programs satisfying a variant of the P-criterion are in fact blindly polynomial. Then we give two extensions of the P-criterion: one by relaxing the termination ordering condition, and the other one (the bounded value property) giving a necessary and sufficient condition for a program to be polynomial time executable, with memoisation.Comment: 18 page

    Unification and Matching on Compressed Terms

    Full text link
    Term unification plays an important role in many areas of computer science, especially in those related to logic. The universal mechanism of grammar-based compression for terms, in particular the so-called Singleton Tree Grammars (STG), have recently drawn considerable attention. Using STGs, terms of exponential size and height can be represented in linear space. Furthermore, the term representation by directed acyclic graphs (dags) can be efficiently simulated. The present paper is the result of an investigation on term unification and matching when the terms given as input are represented using different compression mechanisms for terms such as dags and Singleton Tree Grammars. We describe a polynomial time algorithm for context matching with dags, when the number of different context variables is fixed for the problem. For the same problem, NP-completeness is obtained when the terms are represented using the more general formalism of Singleton Tree Grammars. For first-order unification and matching polynomial time algorithms are presented, each of them improving previous results for those problems.Comment: This paper is posted at the Computing Research Repository (CoRR) as part of the process of submission to the journal ACM Transactions on Computational Logic (TOCL)

    Probabilistic Constraint Logic Programming

    Full text link
    This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied with suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.Comment: 35 pages, uses sfbart.cl

    An annotation database for multimodal scientific data

    Full text link
    Cristina Bogdanschi, Simone Santini, "An annotation database for multimodal scientific data", Proc. SPIE 7255, Multimedia Content Access: Algorithms and Systems III, 72550G (2009). Copyright 2009 Society of Photo‑Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.In many collaborative research environments novel tools and techniques allow researchers to generate data from experiments and observations at a staggering rate. Researchers in these areas are now facing the strong need for querying, sharing and exchanging these data in a uniform and transparent fashion. However, due to the nature of the various types of heterogeneous data and lack of local and global database structures, standard data integration approaches fail or are not applicable. A viable solution to this problem is the extensive use of metadata. In this paper we present the model of an annotation management system suitable for such research environments, and discuss some aspects of its implementation. Annotations provide rich linkage structure between data and between themselves that translates in a complex graph structure of which annotations and data are the nodes. We show how annotations are managed and used for data retrieval and outline some of the query techniques used in the system

    Learning Scheduling Algorithms for Data Processing Clusters

    Full text link
    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load

    CAPRI: efficient inference of cancer progression models from cross-sectional data

    Get PDF
    We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms the state-of-the-art algorithms addressing similar problems. Motivation: Several cancer-related genomic data have become available (e.g., The Cancer Genome Atlas, TCGA) typically involving hundreds of patients. At present, most of these data are aggregated in a cross-sectional fashion providing all measurements at the time of diagnosis.Our goal is to infer cancer progression models from such data. These models are represented as directed acyclic graphs (DAGs) of collections of selectivity relations, where a mutation in a gene A selects for a later mutation in a gene B. Gaining insight into the structure of such progressions has the potential to improve both the stratification of patients and personalized therapy choices. Results: The CAPRI algorithm relies on a scoring method based on a probabilistic theory developed by Suppes, coupled with bootstrap and maximum likelihood inference. The resulting algorithm is efficient, achieves high accuracy, and has good complexity, also, in terms of convergence properties. CAPRI performs especially well in the presence of noise in the data, and with limited sample sizes. Moreover CAPRI, in contrast to other approaches, robustly reconstructs different types of confluent trajectories despite irregularities in the data.We also report on an ongoing investigation using CAPRI to study atypical Chronic Myeloid Leukemia, in which we uncovered non trivial selectivity relations and exclusivity patterns among key genomic events
    • 

    corecore