Search CORE

15,999 research outputs found

Search and Result Presentation in Scientific Workflow Repositories

Author: Davidson Susan B.
Huang Xiaocheng
Stoyanovich Julia
Yuan Xiaojie
Publication date: 01/01/2013

We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a model of workflows using context-free bag grammars. We then give efficient polynomial-time algorithms that, given a workflow and a keyword query, determine whether some execution of the workflow matches the query. Based on these algorithms, we develop a search and ranking solution that efficiently retrieves the top-k grammars from a repository. Finally, we propose a novel result presentation method for grammars matching a keyword query, based on representative parse trees. The effectiveness of our approach is validated through an extensive experimental evaluation.
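The matching question in this abstract (does some execution of a workflow cover every query keyword?) can be illustrated with a small sketch. This is not the authors' algorithm: it is a plain fixpoint computation that is exponential in the number of query keywords, and the names `achievable_sets`, `matches`, the dictionary layout, and the example modules are all assumptions made for illustration. A composite module maps to a list of alternative productions, each production is a bag (list) of child modules, and every module may carry keyword annotations.

```python
def achievable_sets(grammar, annotations, query):
    """For each module, collect the subsets of `query` covered by some execution.

    grammar:     {composite_module: [production, ...]}, production = list of modules
    annotations: {module: set of keywords}
    Hypothetical data layout; a naive fixpoint, not the paper's algorithm.
    """
    query = frozenset(query)
    modules = set(annotations) | set(grammar)
    for productions in grammar.values():
        for body in productions:
            modules.update(body)

    # Atomic modules cover exactly their own (query-relevant) keywords.
    # Composite modules start empty: no finite execution is known yet.
    reach = {m: set() for m in modules}
    for m in modules:
        if m not in grammar:
            reach[m].add(frozenset(annotations.get(m, ())) & query)

    changed = True
    while changed:
        changed = False
        for head, productions in grammar.items():
            own = frozenset(annotations.get(head, ())) & query
            for body in productions:
                # Pick one achievable keyword set per child and union them.
                combos = {own}
                for child in body:
                    combos = {a | b for a in combos for b in reach[child]}
                new = combos - reach[head]
                if new:
                    reach[head] |= new
                    changed = True
    return reach


def matches(grammar, annotations, start, query):
    """True if some execution starting at `start` covers every query keyword."""
    return frozenset(query) in achievable_sets(grammar, annotations, query)[start]


# Hypothetical workflow: W expands to either [A, B] or [A, C]; B expands to [D].
grammar = {"W": [["A", "B"], ["A", "C"]], "B": [["D"]]}
annotations = {"W": {"pipeline"}, "A": {"align"}, "C": {"blast"}, "D": {"filter"}}

print(matches(grammar, annotations, "W", {"align", "filter"}))  # True
print(matches(grammar, annotations, "W", {"blast", "filter"}))  # False
```

The second query fails because "blast" and "filter" only occur in different alternatives of W, so no single execution contains both. The fixpoint loop also terminates on recursive grammars, since each module's set of achievable subsets can only grow and is bounded by the number of subsets of the query.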

arXiv.org e-Print Archive

Crossref

ScholarlyCommons@Penn

Using Hashing to Solve the Dictionary Problem (In External Memory)

Author: Iacono John
Pǎtraşcu Mihai
Publication date: 01/01/2011

We consider the dictionary problem in external memory and improve the update time of the well-known buffer tree by roughly a logarithmic factor. For any $\lambda \ge \max\{\lg\lg n, \log_{M/B}(n/B)\}$, we can support updates in time $O(\lambda / B)$ and queries in sublogarithmic time, $O(\log_\lambda n)$. We also present a lower bound in the cell-probe model showing that our data structure is optimal. In the RAM, hash tables have been used to solve the dictionary problem faster than binary search for more than half a century. By contrast, our data structure is the first to beat the comparison barrier in external memory. Ours is also the first data structure to depart convincingly from the indivisibility paradigm.

arXiv.org e-Print Archive

CiteSeerX

DI-fusion

A limit process for partial match queries in random quadtrees and $2$-d trees

Author: Broutin Nicolas
Neininger Ralph
Sulzbach Henning
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2013

We consider the problem of recovering items matching a partially specified pattern in multidimensional trees (quadtrees and $k$-d trees). We assume the traditional model where the data consist of independent and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the number of nodes $C_n(\xi)$ to visit in order to report the items matching a random query $\xi$, independent and uniformly distributed on $[0,1]$, satisfies $\mathbf{E}[C_n(\xi)] \sim \kappa n^{\beta}$, where $\kappa$ and $\beta$ are explicit constants. We develop an approach based on the analysis of the cost $C_n(s)$ of any fixed query $s \in [0,1]$, and give precise estimates for the variance and limit distribution of the cost $C_n(x)$. Our results permit us to describe a limit process for the costs $C_n(x)$ as $x$ varies in $[0,1]$; one of the consequences is that $\mathbf{E}[\max_{x \in [0,1]} C_n(x)] \sim \gamma n^{\beta}$; this settles a question of Devroye [Pers. Comm., 2000]. Comment: Published at http://dx.doi.org/10.1214/12-AAP912 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: text overlap with arXiv:1107.223
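The cost $C_n(s)$ described in this abstract can be made concrete with a small simulation. The sketch below builds a 2-d tree by inserting points in order, alternating the split coordinate by depth, and counts the nodes visited by a partial match query that fixes the first coordinate to `s`. The names `Node`, `insert`, `build`, and `partial_match_cost` are assumptions made for illustration, and the code only measures the cost empirically; it does not reproduce the paper's analysis.

```python
import random


class Node:
    """A 2-d tree node holding one point (x, y)."""
    __slots__ = ("point", "left", "right")

    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point):
    """Insert a point; even depths compare on x, odd depths compare on y."""
    if root is None:
        return Node(point)
    node, depth = root, 0
    while True:
        axis = depth % 2
        side = "left" if point[axis] < node.point[axis] else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(point))
            return root
        node, depth = child, depth + 1


def build(points):
    """Build a 2-d tree by inserting the points in the given order."""
    root = None
    for p in points:
        root = insert(root, p)
    return root


def partial_match_cost(root, s):
    """Count the nodes visited by a query fixing the first coordinate to s."""
    visited = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        visited += 1
        if depth % 2 == 0:
            # Split on x: only one subtree can contain points with x == s.
            nxt = node.left if s < node.point[0] else node.right
            stack.append((nxt, depth + 1))
        else:
            # Split on y: the query says nothing about y, so visit both sides.
            stack.append((node.left, depth + 1))
            stack.append((node.right, depth + 1))
    return visited


# Independent uniform points in the unit square, as in the abstract's model.
rng = random.Random(0)
n = 10_000
tree = build([(rng.random(), rng.random()) for _ in range(n)])
print(partial_match_cost(tree, 0.5))
```

Repeating the last three lines for several values of `n` and of `s` gives a rough empirical picture of how the cost grows with the tree size and varies with the query position.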

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Ensuring Query Compatibility with Evolving XML Schemas

Author: Genevès Pierre
Layaïda Nabil
Quint Vincent
Publication date: 01/01/2008

During the life cycle of an XML application, both schemas and queries may change from one version to another. Schema evolutions may affect query results and potentially the validity of produced data. Nowadays, a challenge is to assess and accommodate the impact of these changes in rapidly evolving XML applications. This article proposes a logical framework and tool for verifying forward/backward compatibility issues involving schemas and queries. First, it allows analyzing relations between schemas. Second, it allows XML designers to identify queries that must be reformulated in order to produce the expected results across successive schema versions. Third, it allows examining more precisely the impact of schema changes over queries, therefore facilitating their reformulation.

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Efficient Genomic Interval Queries Using Augmented Range Trees

Author: Eran Alal
Luo Yuan
Mao Chengsheng
Publication date: 04/06/2018

Efficient large-scale annotation of genomic intervals is essential for personal genome interpretation in the realm of precision medicine. There are 13 possible relations between two intervals according to Allen's interval algebra. Conventional interval trees are routinely used to identify the genomic intervals satisfying a coarse relation with a query interval, but cannot support efficient queries for more refined relations such as all of Allen's relations. We design and implement a novel approach to address this unmet need. By rewriting Allen's interval relations, we transform an interval query into a range query, then adapt and utilize range trees for querying. We implement two types of range trees, a basic 2-dimensional range tree (2D-RT) and an augmented range tree with fractional cascading (RTFC), and compare them with the conventional interval tree (IT). Theoretical analysis shows that, among the three trees, RTFC achieves the best time complexity for interval queries regarding all of Allen's relations. We also perform comparative experiments on the efficiency of RTFC, 2D-RT and IT in querying noncoding element annotations in a large collection of personal genomes. Our experimental results show that 2D-RT is more efficient than IT for interval queries regarding most of Allen's relations, and that RTFC is even more efficient than 2D-RT. The results demonstrate that RTFC is an efficient data structure for querying large-scale datasets regarding Allen's relations between genomic intervals, such as those required by interpreting genome-wide variation in large populations. Comment: 4 figures, 4 tables
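The rewriting step mentioned in this abstract can be sketched as follows, using the standard definitions of Allen's 13 relations. Each stored interval is treated as a point (start, end), and each relation with a query interval (qs, qe) becomes an axis-aligned range over those two coordinates. The abstract does not spell out the paper's exact rewriting, so the table below is an assumption built from the textbook definitions; `Range`, `allen_to_range`, and `find` are hypothetical names. The lookup is a linear scan standing in for a range tree, and all intervals are assumed to satisfy start < end.

```python
import math
from typing import NamedTuple

INF = math.inf


class Range(NamedTuple):
    """One-dimensional range: open (lo, hi) by default, closed [lo, hi] if closed."""
    lo: float
    hi: float
    closed: bool = False

    def contains(self, v):
        if self.closed:
            return self.lo <= v <= self.hi
        return self.lo < v < self.hi


def allen_to_range(relation, qs, qe):
    """Return (start_range, end_range) for stored x such that `x relation q`."""
    def eq(v):
        return Range(v, v, closed=True)

    table = {
        "before":        (Range(-INF, qs), Range(-INF, qs)),
        "meets":         (Range(-INF, qs), eq(qs)),
        "overlaps":      (Range(-INF, qs), Range(qs, qe)),
        "starts":        (eq(qs),          Range(qs, qe)),
        "during":        (Range(qs, qe),   Range(qs, qe)),
        "finishes":      (Range(qs, qe),   eq(qe)),
        "equals":        (eq(qs),          eq(qe)),
        "finished_by":   (Range(-INF, qs), eq(qe)),
        "contains":      (Range(-INF, qs), Range(qe, INF)),
        "started_by":    (eq(qs),          Range(qe, INF)),
        "overlapped_by": (Range(qs, qe),   Range(qe, INF)),
        "met_by":        (eq(qe),          Range(qe, INF)),
        "after":         (Range(qe, INF),  Range(qe, INF)),
    }
    return table[relation]


RELATIONS = [
    "before", "meets", "overlaps", "starts", "during", "finishes", "equals",
    "finished_by", "contains", "started_by", "overlapped_by", "met_by", "after",
]


def find(intervals, relation, qs, qe):
    """Linear-scan stand-in for a 2-D range query over (start, end) points."""
    s_rng, e_rng = allen_to_range(relation, qs, qe)
    return [(s, e) for (s, e) in intervals
            if s_rng.contains(s) and e_rng.contains(e)]


# Hypothetical intervals, queried against q = (5, 10).
intervals = [(1, 3), (3, 5), (4, 6), (5, 8), (6, 8), (5, 10), (2, 12), (10, 12), (12, 15)]
print(find(intervals, "overlaps", 5, 10))  # [(4, 6)]
print(find(intervals, "during", 5, 10))    # [(6, 8)]
```

Because every relation reduces to one range per coordinate, the same `allen_to_range` output could be handed to any 2-D range-searching structure in place of the linear scan.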

arXiv.org e-Print Archive

Directory of Open Access Journals

Eliminating Recursion from Monadic Datalog Programs on Trees

Author: D Calvanese
Filip Mazowiecki
G Hillebrand
H Gaifman
Henrik Björklund
J Naughton
Michael Benedikt
Mikołaj Bojańczyk
O Shmueli
S Abiteboul
S Ceri
Y Sagiv
Publication date: 10/05/2015

We study the problem of eliminating recursion from monadic datalog programs on trees with an infinite set of labels. We show that the boundedness problem, i.e., determining whether a datalog program is equivalent to some nonrecursive one, is undecidable, but that decidability is regained if the descendant relation is disallowed. Under similar restrictions we obtain decidability of the problem of equivalence to a given nonrecursive program. We investigate the connection between these two problems in more detail.

arXiv.org e-Print Archive

Crossref