8,771 research outputs found
Entity Query Feature Expansion Using Knowledge Base Links
Recent advances in automatic entity linking and knowledge base
construction have resulted in entity annotations for document and
query collections. For example, annotations of entities from large
general purpose knowledge bases, such as Freebase and the Google
Knowledge Graph. Understanding how to leverage these entity
annotations of text to improve ad hoc document retrieval is an open
research area. Query expansion is a commonly used technique to
improve retrieval effectiveness. Most previous query expansion
approaches focus on text, mainly using unigram concepts. In this
paper, we propose a new technique, called entity query feature
expansion (EQFE) which enriches the query with features from
entities and their links to knowledge bases, including structured
attributes and text. We experiment using both explicit query entity
annotations and latent entities. We evaluate our technique on TREC
text collections automatically annotated with knowledge base entity
links, including the Google Freebase Annotations (FACC1) data.
We find that entity-based feature expansion results in significant
improvements in retrieval effectiveness over state-of-the-art text
expansion approaches
Binary Models for Marginal Independence
Log-linear models are a classical tool for the analysis of contingency
tables. In particular, the subclass of graphical log-linear models provides a
general framework for modelling conditional independences. However, with the
exception of special structures, marginal independence hypotheses cannot be
accommodated by these traditional models. Focusing on binary variables, we
present a model class that provides a framework for modelling marginal
independences in contingency tables. The approach taken is graphical and draws
on analogies to multivariate Gaussian models for marginal independence. For the
graphical model representation we use bi-directed graphs, which are in the
tradition of path diagrams. We show how the models can be parameterized in a
simple fashion, and how maximum likelihood estimation can be performed using a
version of the Iterated Conditional Fitting algorithm. Finally we consider
combining these models with symmetry restrictions
Recommended from our members
Event-based hyperspace analogue to language for query expansion
Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance results in comparison with original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone
Flexible sampling of discrete data correlations without the marginal distributions
Learning the joint dependence of discrete variables is a fundamental problem
in machine learning, with many applications including prediction, clustering
and dimensionality reduction. More recently, the framework of copula modeling
has gained popularity due to its modular parametrization of joint
distributions. Among other properties, copulas provide a recipe for combining
flexible models for univariate marginal distributions with parametric families
suitable for potentially high dimensional dependence structures. More
radically, the extended rank likelihood approach of Hoff (2007) bypasses
learning marginal models completely when such information is ancillary to the
learning task at hand as in, e.g., standard dimensionality reduction problems
or copula parameter estimation. The main idea is to represent data by their
observable rank statistics, ignoring any other information from the marginals.
Inference is typically done in a Bayesian framework with Gaussian copulas, and
it is complicated by the fact this implies sampling within a space where the
number of constraints increases quadratically with the number of data points.
The result is slow mixing when using off-the-shelf Gibbs sampling. We present
an efficient algorithm based on recent advances on constrained Hamiltonian
Markov chain Monte Carlo that is simple to implement and does not require
paying for a quadratic cost in sample size.Comment: An overhauled version of the experimental section moved to the main
paper. Old experimental section moved to supplementary materia
- …