32,870 research outputs found
Non-hierarchical Structures: How to Model and Index Overlaps?
Overlap is a common phenomenon seen when structural components of a digital
object are neither disjoint nor nested inside each other. Overlapping
components resist reduction to a structural hierarchy, and tree-based indexing
and query processing techniques cannot be used for them. Our solution to this
data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a
novel extension of the XML data model for non-hierarchical structures. We
introduce an algorithm for constructing TGSA from annotated documents; the
algorithm can efficiently process non-hierarchical structures and is associated
with formal proofs, ensuring that transformation of the document to the data
model is valid. To enable high performance query analysis in large data
repositories, we further introduce an extension of XML pre-post indexing for
non-hierarchical structures, which can process both reachability and
overlapping relationships.Comment: The paper has been accepted at the Balisage 2014 conferenc
A D.C. Programming Approach to the Sparse Generalized Eigenvalue Problem
In this paper, we consider the sparse eigenvalue problem wherein the goal is
to obtain a sparse solution to the generalized eigenvalue problem. We achieve
this by constraining the cardinality of the solution to the generalized
eigenvalue problem and obtain sparse principal component analysis (PCA), sparse
canonical correlation analysis (CCA) and sparse Fisher discriminant analysis
(FDA) as special cases. Unlike the -norm approximation to the
cardinality constraint, which previous methods have used in the context of
sparse PCA, we propose a tighter approximation that is related to the negative
log-likelihood of a Student's t-distribution. The problem is then framed as a
d.c. (difference of convex functions) program and is solved as a sequence of
convex programs by invoking the majorization-minimization method. The resulting
algorithm is proved to exhibit \emph{global convergence} behavior, i.e., for
any random initialization, the sequence (subsequence) of iterates generated by
the algorithm converges to a stationary point of the d.c. program. The
performance of the algorithm is empirically demonstrated on both sparse PCA
(finding few relevant genes that explain as much variance as possible in a
high-dimensional gene dataset) and sparse CCA (cross-language document
retrieval and vocabulary selection for music retrieval) applications.Comment: 40 page
An LSH Index for Computing Kendall's Tau over Top-k Lists
We consider the problem of similarity search within a set of top-k lists
under the Kendall's Tau distance function. This distance describes how related
two rankings are in terms of concordantly and discordantly ordered items. As
top-k lists are usually very short compared to the global domain of possible
items to be ranked, creating an inverted index to look up overlapping lists is
possible but does not capture tight enough the similarity measure. In this
work, we investigate locality sensitive hashing schemes for the Kendall's Tau
distance and evaluate the proposed methods using two real-world datasets.Comment: 6 pages, 8 subfigures, presented in Seventeenth International
Workshop on the Web and Databases (WebDB 2014) co-located with ACM SIGMOD201
Ptolemaic Indexing
This paper discusses a new family of bounds for use in similarity search,
related to those used in metric indexing, but based on Ptolemy's inequality,
rather than the metric axioms. Ptolemy's inequality holds for the well-known
Euclidean distance, but is also shown here to hold for quadratic form metrics
in general, with Mahalanobis distance as an important special case. The
inequality is examined empirically on both synthetic and real-world data sets
and is also found to hold approximately, with a very low degree of error, for
important distances such as the angular pseudometric and several Lp norms.
Indexing experiments demonstrate a highly increased filtering power compared to
existing, triangular methods. It is also shown that combining the Ptolemaic and
triangular filtering can lead to better results than using either approach on
its own
Phaseless VLBI mapping of compact extragalactic radio sources
The problem of phaseless aperture synthesis is of current interest in
phase-unstable VLBI with a small number of elements when either the use of
closure phases is not possible (a two-element interferometer) or their quality
and number are not enough for acceptable image reconstruction by standard
adaptive calibration methods. Therefore, we discuss the problem of unique image
reconstruction only from the spectrum magnitude of a source. We suggest an
efficient method for phaseless VLBI mapping of compact extragalactic radio
sources. This method is based on the reconstruction of the spectrum magnitude
for a source on the entire UV plane from the measured visibility magnitude on a
limited set of points and the reconstruction of the sought-for image of the
source by Fienup's method from the spectrum magnitude reconstructed at the
first stage. We present the results of our mapping of the extragalactic radio
source 2200 +420 using astrometric and geodetic observations on a global VLBI
array. Particular attention is given to studying the capabilities of a
two-element interferometer in connection with the putting into operation of a
Russian-made radio interferometer based on Quasar RT-32 radio telescopes.Comment: 21 pages, 6 figure
- …