TAPER: query-aware, partition-enhancement for large, heterogenous, graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given workload of pattern-matching queries. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.
Comment: 12 pages, 11 figures, unpublished
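The vertex-relocation idea can be illustrated with a small sketch. This is not TAPER's actual heuristic. It is a simple greedy relocation scheme, swapped in for illustration, and it rests on several assumptions. The query workload is taken to be already summarised as a traversal weight per edge (a hypothetical `weight` map). A vertex moves to the partition that holds most of its weighted neighbourhood. A hypothetical `max_size` cap keeps partitions balanced. The objective, `weighted_cut`, stands in for the expected number of inter-partition traversals.

```python
from collections import defaultdict


def weighted_cut(edges, weight, part):
    """Sum of workload weights over edges whose endpoints lie in different partitions."""
    return sum(weight[e] for e in edges if part[e[0]] != part[e[1]])


def refine(vertices, edges, weight, part, k, max_size, max_iterations=10):
    """Hypothetical greedy refinement (not TAPER's algorithm): repeatedly move a
    vertex to the partition holding most of its workload-weighted neighbourhood,
    if that lowers the weighted cut and the target partition is below max_size."""
    part = dict(part)  # leave the caller's partitioning untouched
    adj = defaultdict(list)
    for (u, v) in edges:
        w = weight[(u, v)]
        adj[u].append((v, w))
        adj[v].append((u, w))
    sizes = defaultdict(int)
    for v in vertices:
        sizes[part[v]] += 1

    for _ in range(max_iterations):
        moved = False
        for v in vertices:
            # Total edge weight from v into each partition.
            pull = defaultdict(float)
            for (n, w) in adj[v]:
                pull[part[n]] += w
            current = part[v]
            best = max(range(k), key=lambda p: pull[p])
            # Moving v from current to best reduces the cut by pull[best] - pull[current].
            if pull[best] > pull[current] and sizes[best] < max_size:
                sizes[current] -= 1
                sizes[best] += 1
                part[v] = best
                moved = True
        if not moved:  # no improving move left: stop early
            break
    return part


# Toy example: two triangles whose internal edges are traversed often
# (weight 5), joined by a rarely traversed bridge edge (weight 1).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
weight = {e: 5 for e in edges}
weight[(2, 3)] = 1
initial = {v: v % 2 for v in range(6)}  # interleaved, hash-like starting partitioning
refined = refine(list(range(6)), edges, weight, initial, k=2, max_size=4)
print(weighted_cut(edges, weight, initial), weighted_cut(edges, weight, refined))  # prints: 21 1
```

Starting from an interleaved, hash-like assignment, the refinement separates the two triangles so that only the lightly traversed bridge edge remains cut. Weighting edges by traversal frequency rather than counting them uniformly is what makes the objective workload-aware in this sketch.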
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data have emerged in recent years, both for practical use and as an area of research in the scientific community. At the same time, the broad adoption of the internet, where keyword search is used in many applications such as search engines, has familiarized casual users with retrieving information through keyword queries. Unlike such easy-to-use querying, traditional query languages require knowledge of both the language itself and the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming to enable simple querying of semi-structured data, which is relevant, for example, in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF.
A Trichotomy for Regular Simple Path Queries on Graphs
Regular path queries (RPQs) select nodes connected by some path in a graph.
The edge labels of such a path have to form a word that matches a given regular
expression. We investigate the evaluation of RPQs with an additional constraint
that prevents multiple traversals of the same node. These regular simple path
queries (RSPQs) have several practical applications, yet they quickly become
intractable, even for basic languages such as (aa)* or a*ba*.
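The following minimal sketch makes the distinction between RPQs and RSPQs concrete. The function names and the dict-based DFA encoding are illustrative assumptions, not taken from the paper. `rpq` answers an ordinary regular path query by breadth-first search over the product of the graph and a DFA for the language, which runs in polynomial time. `rspq` adds the no-repeated-node constraint using naive backtracking, which is exponential in the worst case. It is not the paper's algorithm for the tractable fragment. On a directed 3-cycle, the language (aa)* already separates the two semantics.

```python
from collections import deque


def _adjacency(edges):
    """Map each node to its outgoing (label, target) pairs."""
    adj = {}
    for (u, label, v) in edges:
        adj.setdefault(u, []).append((label, v))
    return adj


def rpq(edges, dfa, start, accepting, source, target):
    """Arbitrary-path RPQ: BFS over (node, DFA state) pairs; polynomial time."""
    adj = _adjacency(edges)
    seen = {(source, start)}
    queue = deque([(source, start)])
    while queue:
        node, q = queue.popleft()
        if node == target and q in accepting:
            return True
        for label, nxt in adj.get(node, []):
            q2 = dfa.get((q, label))  # None means the DFA rejects this label here
            if q2 is not None and (nxt, q2) not in seen:
                seen.add((nxt, q2))
                queue.append((nxt, q2))
    return False


def rspq(edges, dfa, start, accepting, source, target):
    """Simple-path RPQ: backtracking that never revisits a node; exponential worst case."""
    adj = _adjacency(edges)

    def dfs(node, q, visited):
        if node == target and q in accepting:
            return True
        for label, nxt in adj.get(node, []):
            q2 = dfa.get((q, label))
            if q2 is not None and nxt not in visited:
                visited.add(nxt)
                if dfs(nxt, q2, visited):
                    return True
                visited.remove(nxt)  # undo the choice before trying the next edge
        return False

    return dfs(source, start, {source})


# Toy graph: a directed 3-cycle x -> y -> z -> x, every edge labelled "a".
edges = [("x", "a", "y"), ("y", "a", "z"), ("z", "a", "x")]
# DFA for (aa)*: state 0 = even number of a's read (accepting), state 1 = odd.
dfa = {(0, "a"): 1, (1, "a"): 0}
print(rpq(edges, dfa, 0, {0}, "x", "y"))   # True: x y z x y has four a's but revisits x
print(rspq(edges, dfa, 0, {0}, "x", "y"))  # False: the only simple path, x y, has one a
```

The product construction suffices for arbitrary paths because a (node, state) pair never needs to be explored twice. The simple-path constraint breaks this, since whether a node may be reused depends on the entire path taken so far.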
In this paper, we establish a comprehensive classification of regular
languages with respect to the complexity of the corresponding regular simple
path query problem. More precisely, we identify the fragment that is maximal in
the following sense: regular simple path queries can be evaluated in polynomial
time for every regular language L that belongs to this fragment and evaluation
is NP-complete for languages outside this fragment. We thus fully characterize
the frontier between tractability and intractability for RSPQs, and we refine
our results to show the following trichotomy: evaluation of RSPQs is either in
AC0, NL-complete, or NP-complete in data complexity, depending on the regular
language L. The fragment identified also admits a simple characterization in
terms of regular expressions.
Finally, we also discuss the complexity of the following decision problem:
decide, given a language L, whether finding a regular simple path for L is
tractable. We consider several alternative representations of L: DFAs, NFAs or
regular expressions, and prove that this problem is NL-complete for the first
representation and PSPACE-complete for the other two. In conclusion, we extend
our results from edge-labeled graphs to vertex-labeled graphs and vertex-edge-labeled
graphs.
Comment: 15 pages, conference submission
Bioinformatics service reconciliation by heterogeneous schema transformation
This paper focuses on the problem of bioinformatics service reconciliation in a generic and scalable manner so as to enhance interoperability in a highly evolving field. Using XML as a common representation format, but also supporting existing flat-file representation formats, we propose an approach for the scalable semi-automatic reconciliation of services, possibly invoked from within a scientific workflow tool. Service reconciliation may use the AutoMed heterogeneous data integration system as an intermediary service, or may use AutoMed to produce services that mediate between services. We discuss the application of our approach to the reconciliation of services in an example bioinformatics workflow. The main contribution of this research is an architecture for the scalable reconciliation of bioinformatics services.
Web and Semantic Web Query Languages
A number of techniques have been developed to facilitate
powerful data retrieval on the Web and Semantic Web. Three categories
of Web query languages can be distinguished, according to the format
of the data they can retrieve: XML, RDF and Topic Maps. This article
introduces the spectrum of languages falling into these categories
and summarises their salient aspects. The languages are introduced using
common sample data and query types. Key aspects of the query
languages considered are highlighted in the conclusion.