Improving Entity Retrieval on Structured Data
The increasing amount of data on the Web, in particular of Linked Data, has
led to a diverse landscape of datasets, which makes entity retrieval a
challenging task. Explicit cross-dataset links, for instance to indicate
co-references or related entities, can significantly improve entity retrieval.
However, only a small fraction of entities are interlinked through explicit
statements. In this paper, we propose a two-fold entity retrieval approach. In
a first, offline preprocessing step, we cluster entities based on the
x-means and spectral clustering algorithms. In the second step,
we propose an optimized retrieval model which takes advantage of our
precomputed clusters. For a given set of entities retrieved by the BM25F
retrieval approach and a given user query, we further expand the result set
with relevant entities by considering features of the queries, entities and the
precomputed clusters. Finally, we re-rank the expanded result set with respect
to the relevance to the query. We perform a thorough experimental evaluation on
the Billion Triple Challenge (BTC12) dataset. The proposed approach shows
significant improvements compared to the baseline and state-of-the-art
approaches.
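The expand-then-re-rank step described in this abstract can be illustrated with a minimal sketch. It is not the authors' retrieval model: the function names, the toy term-overlap score (a stand-in for the paper's query, entity and cluster features), and the sample entities are all assumptions made for illustration, and the initial result list stands in for whatever BM25F would return.

```python
from collections import defaultdict


def overlap_score(query, text):
    """Toy relevance (assumption): fraction of query terms found in the entity text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def expand_and_rerank(query, initial_ids, entity_text, cluster_of, top_k=5):
    """Expand an initial result list with cluster neighbours, then re-rank."""
    # Group entities by their precomputed (offline) cluster id.
    members = defaultdict(set)
    for entity_id, cluster_id in cluster_of.items():
        members[cluster_id].add(entity_id)

    # Expansion: add every entity that shares a cluster with an initial result.
    candidates = set(initial_ids)
    for entity_id in initial_ids:
        cluster_id = cluster_of.get(entity_id)
        if cluster_id is not None:
            candidates |= members[cluster_id]

    # Re-ranking: order by relevance to the query, ties broken by entity id.
    ranked = sorted(
        candidates,
        key=lambda e: (-overlap_score(query, entity_text[e]), e),
    )
    return ranked[:top_k]


# Hypothetical data: four entities, two precomputed clusters.
entity_text = {
    "e1": "Berlin capital city of Germany",
    "e2": "Berlin city in Germany",
    "e3": "Irving Berlin composer",
    "e4": "Hamburg city in Germany",
}
cluster_of = {"e1": 0, "e2": 0, "e3": 1, "e4": 0}

results = expand_and_rerank("berlin city germany", ["e1"], entity_text, cluster_of)
print(results)
```

Here the single initial hit `e1` pulls in its cluster neighbours `e2` and `e4`, while `e3` stays out because it belongs to a different cluster.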
Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization
This paper provides theoretical support for the clustering aspect of
nonnegative matrix factorization (NMF). By utilizing the Karush-Kuhn-Tucker
optimality conditions, we show that the NMF objective is equivalent to a graph
clustering objective, so the clustering aspect of NMF has a solid
justification. Unlike previous approaches, which usually discard the
nonnegativity constraints, our approach guarantees that the stationary point
used in deriving the equivalence lies in the feasible region in the
nonnegative orthant. Additionally, since the clustering capability of a matrix
decomposition technique can sometimes imply its latent semantic indexing (LSI)
aspect, we also evaluate the LSI aspect of NMF by showing its capability
in solving the synonymy and polysemy problems on synthetic datasets. A more
extensive evaluation is conducted by comparing the LSI performance of NMF
and the singular value decomposition (SVD), the standard LSI method, on some
standard datasets.
Comment: 28 pages, 5 figures
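As a rough illustration of how NMF is commonly used for clustering, the sketch below factorizes a nonnegative matrix V into W and H with the standard Lee-Seung multiplicative updates and assigns each row to its largest coefficient in W. This is a generic textbook procedure, not the derivation or the experimental setup of the paper; the helper names, the small `eps` guard, the random initialization, and the toy matrix are assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Approximate V (n x m) by W (n x k) times H (k x m), all entries nonnegative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start, so no entry is stuck at zero initially.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def cluster_labels(w):
    """Assign each row to the factor with the largest coefficient."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Toy matrix (assumption): two groups of rows with disjoint feature support.
v = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
w, h = nmf(v, k=2)
print(cluster_labels(w), frobenius_error(v, w, h))
```

Because the updates only multiply by nonnegative ratios, W and H never leave the nonnegative orthant, which is the feasibility property the abstract emphasizes.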
Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval
We summarize math search engines and search interfaces produced by the
Document and Pattern Recognition Lab in recent years, and in particular the min
math search interface and the Tangent search engine. Source code for both
systems is publicly available. "The Masses" refers to our emphasis on creating
systems for mathematical non-experts, who may be looking to define unfamiliar
notation, or browse documents based on the visual appearance of formulae rather
than their mathematical semantics.
Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer
Mathematics (July, Washington DC