208 research outputs found
Context-Free Path Queries on RDF Graphs
Navigational graph queries are an important class of queries that can extract
implicit binary relations over the nodes of input graphs. Most of the
navigational query languages used in the RDF community, e.g. property paths in
W3C SPARQL 1.1 and nested regular expressions in nSPARQL, are based on
regular expressions. It is known that regular expressions have limited
expressivity; for instance, some natural queries, like same-generation queries,
are not expressible with regular expressions. To overcome this limitation, in
this paper, we present cfSPARQL, an extension of SPARQL query language equipped
with context-free grammars. The cfSPARQL language is strictly more expressive
than property paths and nested expressions. The additional expressivity can be
used for modelling graph similarities, graph summarization and ontology
alignment. Despite the increased expressivity, we show that cfSPARQL still
enjoys low computational complexity and can be evaluated efficiently.
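The key idea — replacing regular expressions over edge labels with a context-free grammar — can be illustrated with a minimal sketch. The grammar, graph, and naive fixpoint algorithm below are our own illustration of context-free path querying in general, not the cfSPARQL implementation; the "same generation" query is exactly the kind of query the abstract notes is inexpressible with regular expressions.

```python
# Minimal context-free path query evaluation by fixpoint iteration.
# Illustrative sketch only (not cfSPARQL): a same-generation-style
# grammar, S -> up down | up S down, split into binary rules.
grammar = [
    ("S", ("up", "down")),  # S -> up down
    ("S", ("up", "T")),     # S -> up T
    ("T", ("S", "down")),   # T -> S down
]

def cfpq(edges, grammar):
    """Return all (source, target, label) triples derivable from the
    labeled edges under the grammar (naive fixpoint, for exposition)."""
    rel = {(u, v, lbl) for (u, lbl, v) in edges}
    changed = True
    while changed:
        changed = False
        new = set()
        for (u, v, a) in rel:
            for (w, x, b) in rel:
                if v == w:  # paths compose: u -a-> v -b-> x
                    for head, (left, right) in grammar:
                        if (a, b) == (left, right) and (u, x, head) not in rel:
                            new.add((u, x, head))
        if new:
            rel |= new
            changed = True
    return rel

# A small tree: node 1 is the parent of 2 and 3; "up" goes child -> parent.
edges = [(2, "up", 1), (3, "up", 1), (1, "down", 2), (1, "down", 3)]
result = cfpq(edges, grammar)
# Siblings 2 and 3 are in the same generation:
assert (2, 3, "S") in result and (3, 2, "S") in result
```

No regular expression over {up, down} can express this relation, since it requires matching equal numbers of up- and down-edges.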
Transformers over Directed Acyclic Graphs
Transformer models have recently gained popularity in graph representation
learning as they have the potential to learn complex relationships beyond the
ones captured by regular graph neural networks. The main research question is
how to inject the structural bias of graphs into the transformer architecture,
and several proposals have been made for undirected molecular graphs and,
recently, also for larger network graphs. In this paper, we study transformers
over directed acyclic graphs (DAGs) and propose architecture adaptations
tailored to DAGs: (1) An attention mechanism that is considerably more
efficient than the regular quadratic complexity of transformers and at the same
time faithfully captures the DAG structure, and (2) a positional encoding of
the DAG's partial order, complementing the former. We rigorously evaluate our
approach over various types of tasks, ranging from classifying source code
graphs to nodes in citation networks, and show that it is effective in two
important aspects: in making graph transformers generally outperform graph
neural networks tailored to DAGs and in improving SOTA graph transformer
performance in terms of both quality and efficiency.
On Quasi-Interpretations, Blind Abstractions and Implicit Complexity
Quasi-interpretations are a technique to guarantee complexity bounds on
first-order functional programs: with termination orderings they give in
particular a sufficient condition for a program to be executable in polynomial
time, called here the P-criterion. We study properties of the programs
satisfying the P-criterion, in order to better understand its intensional
expressive power. Given a program on binary lists, its blind abstraction is the
nondeterministic program obtained by replacing lists by their lengths (natural
numbers). A program is blindly polynomial if its blind abstraction terminates
in polynomial time. We show that all programs satisfying a variant of the
P-criterion are in fact blindly polynomial. Then we give two extensions of the
P-criterion: one by relaxing the termination ordering condition, and the other
one (the bounded value property) giving a necessary and sufficient condition
for a program to be polynomial-time executable, with memoisation.
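The notion of blind abstraction can be made concrete with a small example of our own (not taken from the paper): a program branching on the heads of a binary list, and its abstraction where the list is replaced by its length, so the branch becomes a nondeterministic choice over both outcomes.

```python
# Illustrative sketch of "blind abstraction" (the example program is
# ours, not from the paper). The concrete program inspects list heads;
# the blind abstraction only sees the list's length, so each head test
# turns into nondeterminism over both branch results.

def count_ones(bits):
    """Concrete first-order program on a binary list."""
    if not bits:
        return 0
    head, tail = bits[0], bits[1:]
    return (1 if head == 1 else 0) + count_ones(tail)

def count_ones_blind(n):
    """Blind abstraction: the input is replaced by its length n.
    The head test is no longer decidable, so we collect the set of
    all possible results."""
    if n == 0:
        return {0}
    rec = count_ones_blind(n - 1)
    return rec | {r + 1 for r in rec}

assert count_ones([1, 0, 1, 1]) == 3
assert count_ones_blind(4) == {0, 1, 2, 3, 4}
```

Here the abstraction terminates after linearly many recursive calls, so the program is blindly polynomial in the abstract's sense.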
Unification and Matching on Compressed Terms
Term unification plays an important role in many areas of computer science,
especially in those related to logic. The universal mechanism of grammar-based
compression for terms, in particular the so-called Singleton Tree Grammars
(STG), have recently drawn considerable attention. Using STGs, terms of
exponential size and height can be represented in linear space. Furthermore,
the term representation by directed acyclic graphs (dags) can be efficiently
simulated. The present paper is the result of an investigation on term
unification and matching when the terms given as input are represented using
different compression mechanisms for terms such as dags and Singleton Tree
Grammars. We describe a polynomial time algorithm for context matching with
dags, when the number of different context variables is fixed for the problem.
For the same problem, NP-completeness is obtained when the terms are
represented using the more general formalism of Singleton Tree Grammars. For
first-order unification and matching polynomial time algorithms are presented,
each of them improving previous results for those problems.
Comment: This paper is posted at the Computing Research Repository (CoRR) as
part of the process of submission to the journal ACM Transactions on
Computational Logic (TOCL).
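For readers unfamiliar with the underlying problem, the following sketch shows textbook Robinson-style first-order unification on terms where shared subterms form a dag. This is an illustration of the problem setting only — it is not the paper's algorithm, and it omits both STG compression and the occurs check.

```python
# Hedged sketch of first-order unification on dag-shaped terms (shared
# subterms are the same Python object). Textbook Robinson unification,
# occurs check omitted for brevity; not the paper's algorithm.

def walk(term, subst):
    """Follow variable bindings to the representative term."""
    while isinstance(term, str) and term.startswith("?") and term in subst:
        term = subst[term]
    return term

def unify(s, t, subst):
    """Unify terms s, t under substitution subst (var -> term).
    Terms: strings starting with '?' are variables; otherwise tuples
    (symbol, arg1, ..., argk). Returns a substitution or None."""
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str) and s.startswith("?"):
        return {**subst, s: t}
    if isinstance(t, str) and t.startswith("?"):
        return {**subst, t: s}
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # symbol clash

# Shared subterm g(a) used twice -> a dag rather than a tree.
ga = ("g", ("a",))
result = unify(("f", "?x", ga), ("f", ga, "?y"), {})
assert result == {"?x": ga, "?y": ga}
```

The point of the compressed representations studied in the paper is that such shared (or grammar-compressed) inputs can be exponentially smaller than the unfolded trees, so the algorithms must work without decompressing them.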
Probabilistic Constraint Logic Programming
This paper addresses two central problems for probabilistic processing
models: parameter estimation from incomplete data and efficient retrieval of
most probable analyses. These questions have been answered satisfactorily only
for probabilistic regular and context-free models. We address these problems
for a more expressive probabilistic constraint logic programming model. We
present a log-linear probability model for probabilistic constraint logic
programming. On top of this model we define an algorithm to estimate the
parameters and to select the properties of log-linear models from incomplete
data. This algorithm is an extension of the improved iterative scaling
algorithm of Della Pietra, Della Pietra, and Lafferty (1995). Our algorithm
applies to log-linear models in general and is accompanied with suitable
approximation methods when applied to large data spaces. Furthermore, we
present an approach for searching for most probable analyses of the
probabilistic constraint logic programming model. This method can be applied to
the ambiguity resolution problem in natural language processing applications.
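The log-linear model the paper builds on assigns each analysis x a probability proportional to exp(Σᵢ λᵢ·fᵢ(x)). The sketch below shows only this probability model over a finite candidate set; the feature functions and weights are made up for illustration, and the paper's actual contribution — iterative-scaling parameter estimation from incomplete data — is not shown.

```python
# Minimal sketch of a log-linear probability model:
#   p(x) = exp(sum_i lambda_i * f_i(x)) / Z
# Features and weights below are hypothetical; parameter estimation
# (improved iterative scaling) is out of scope for this sketch.
import math

def log_linear_probs(analyses, features, weights):
    """Normalized probabilities over a finite set of analyses."""
    scores = [math.exp(sum(w * f(x) for w, f in zip(weights, features)))
              for x in analyses]
    z = sum(scores)  # partition function
    return [s / z for s in scores]

# Two hypothetical feature functions over candidate analyses (strings):
features = [lambda x: len(x), lambda x: x.count("a")]
weights = [0.1, 1.0]
probs = log_linear_probs(["abba", "bbbb"], features, weights)
assert abs(sum(probs) - 1.0) < 1e-9
assert probs[0] > probs[1]  # more "a"s -> higher score under these weights
```

Finding the most probable analysis then amounts to maximizing the unnormalized score, which is why efficient retrieval over large candidate spaces is the second problem the paper addresses.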
An annotation database for multimodal scientific data
Cristina Bogdanschi, Simone Santini, "An annotation database for multimodal scientific data", Proc. SPIE 7255, Multimedia Content Access: Algorithms and Systems III, 72550G (2009).
In many collaborative research environments, novel tools and techniques allow researchers to generate data from experiments and observations at a staggering rate. Researchers in these areas now face a strong need to query, share and exchange these data in a uniform and transparent fashion. However, due to the heterogeneous nature of the various types of data and the lack of local and global database structures, standard data integration approaches fail or are not applicable. A viable solution to this problem is the extensive use of metadata. In this paper we present the model of an annotation management system suitable for such research environments, and discuss some aspects of its implementation. Annotations provide a rich linkage structure between data and between themselves, which translates into a complex graph structure of which annotations and data are the nodes. We show how annotations are managed and used for data retrieval, and outline some of the query techniques used in the system.
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to a 2x improvement during periods of high cluster load.
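The objective Decima optimizes — average job completion time (JCT) — is highly sensitive to scheduling order, which is what a learned policy can exploit. The toy single-machine example below is our own illustration of the metric, not anything from Decima:

```python
# Toy illustration (ours, not Decima) of average job completion time:
# on one machine, running short jobs first sharply reduces average JCT
# compared to arrival (FIFO) order.

def average_jct(durations):
    """Run jobs back to back in the given order; return the mean
    completion time."""
    t, total = 0, 0
    for d in durations:
        t += d        # this job finishes at time t
        total += t
    return total / len(durations)

jobs = [10, 1, 2]
fifo = average_jct(jobs)         # completions 10, 11, 13 -> avg 34/3
sjf = average_jct(sorted(jobs))  # completions 1, 3, 13  -> avg 17/3
assert sjf < fifo
```

Decima's setting is far harder — jobs are DAGs of stages arriving continuously on a shared cluster — but the same objective drives the learned policy.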
CAPRI: efficient inference of cancer progression models from cross-sectional data
We devise a novel inference algorithm to effectively solve the cancer
progression model reconstruction problem. Our empirical analysis of the
accuracy and convergence rate of our algorithm, CAncer PRogression Inference
(CAPRI), shows that it outperforms the state-of-the-art algorithms addressing
similar problems.
Motivation: Several cancer-related genomic datasets have become available (e.g.,
The Cancer Genome Atlas, TCGA), typically involving hundreds of patients. At
present, most of these data are aggregated in a cross-sectional fashion
providing all measurements at the time of diagnosis. Our goal is to infer cancer
progression models from such data. These models are represented as directed
acyclic graphs (DAGs) of collections of selectivity relations, where a mutation
in a gene A selects for a later mutation in a gene B. Gaining insight into the
structure of such progressions has the potential to improve both the
stratification of patients and personalized therapy choices.
Results: The CAPRI algorithm relies on a scoring method based on a
probabilistic theory developed by Suppes, coupled with bootstrap and maximum
likelihood inference. The resulting algorithm is efficient, achieves high
accuracy, and has good complexity as well as good convergence properties.
CAPRI performs especially well in the presence of noise in the data, and with
limited sample sizes. Moreover, CAPRI, in contrast to other approaches, robustly
reconstructs different types of confluent trajectories despite irregularities
in the data. We also report on an ongoing investigation using CAPRI to study
atypical Chronic Myeloid Leukemia, in which we uncovered non-trivial
selectivity relations and exclusivity patterns among key genomic events.
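The Suppes-style scoring underlying CAPRI can be sketched with a simplified check on cross-sectional binary mutation calls (hypothetical data below; CAPRI additionally combines this with bootstrap and maximum likelihood inference, which this sketch omits): "A selects for B" requires temporal priority, P(A) > P(B), and probability raising, P(B | A) > P(B | ¬A).

```python
# Simplified sketch of a Suppes-style prima facie causation test on
# cross-sectional binary mutation data (illustration only; CAPRI's full
# scoring with bootstrap and maximum likelihood is not shown).

def suppes_prima_facie(samples, a, b):
    """samples: list of dicts mapping gene -> 0/1 mutation calls.
    Tests whether mutation a plausibly selects for mutation b."""
    n = len(samples)
    n_a = sum(s[a] for s in samples)
    p_a = n_a / n
    p_b = sum(s[b] for s in samples) / n
    p_b_given_a = sum(s[b] for s in samples if s[a]) / n_a
    p_b_given_not_a = sum(s[b] for s in samples if not s[a]) / (n - n_a)
    # Temporal priority and probability raising:
    return p_a > p_b and p_b_given_a > p_b_given_not_a

# Hypothetical cohort: A is frequent and B occurs only alongside A.
data = [
    {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0},
    {"A": 1, "B": 0}, {"A": 0, "B": 0}, {"A": 0, "B": 0},
]
assert suppes_prima_facie(data, "A", "B") is True
assert suppes_prima_facie(data, "B", "A") is False
```

Edges passing this test form candidate selectivity relations; assembling them into a DAG of confluent progression trajectories is where the rest of the CAPRI pipeline comes in.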