Modeling Documents as Mixtures of Persons for Expert Finding
In this paper we address the problem of searching for knowledgeable
persons within the enterprise, known as the expert finding (or
expert search) task. We present a probabilistic algorithm based on the assumption
that terms in documents are produced by the people who are mentioned
in them. We represent the documents retrieved for a query as mixtures
of the candidate experts' language models. Two methods for extracting personal
language models are proposed, together with a way of combining
them with other evidence of expertise. Experiments conducted on the
TREC Enterprise collection show that our approach outperforms
the best of the existing solutions.
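The mixture idea can be illustrated with a small sketch. It assumes, hypothetically, that each candidate's language model is already available as a term-to-probability dictionary, that a document's language model interpolates a background model (weight `lam_bg`) with a weighted mix of the mentioned candidates' models, and that only the mixture weights are fitted, using EM. Candidates are then ranked by summing relevance-weighted mixture weights over the retrieved documents. The function names, the background interpolation, and this aggregation rule are illustrative assumptions, not the paper's published method.

```python
from collections import Counter


def em_mixture_weights(doc_terms, person_lms, background, lam_bg=0.5, iters=50):
    """Fit weights pi_p so that P(t|d) = lam_bg*P(t|bg) + (1-lam_bg)*sum_p pi_p*P(t|p).

    person_lms maps each candidate mentioned in the document to a dict
    term -> probability. Only the mixture weights are estimated (EM).
    """
    people = list(person_lms)
    if not people:
        return {}
    counts = Counter(doc_terms)
    pi = {p: 1.0 / len(people) for p in people}
    for _ in range(iters):
        expected = {p: 0.0 for p in people}
        for term, n in counts.items():
            # E-step: share of this term's occurrences attributed to each person
            contrib = {p: (1 - lam_bg) * pi[p] * person_lms[p].get(term, 0.0)
                       for p in people}
            total = lam_bg * background.get(term, 1e-9) + sum(contrib.values())
            if total == 0.0:
                continue
            for p in people:
                expected[p] += n * contrib[p] / total
        # M-step: renormalise the expected counts attributed to people
        z = sum(expected.values())
        if z == 0.0:
            break
        pi = {p: expected[p] / z for p in people}
    return pi


def score_candidates(retrieved, person_lms, background):
    """Hypothetical aggregation: sum over retrieved docs of relevance * pi_p.

    retrieved is a list of (relevance, doc_terms, mentioned_people) tuples.
    """
    scores = Counter()
    for relevance, doc_terms, mentioned in retrieved:
        lms = {p: person_lms[p] for p in mentioned if p in person_lms}
        for p, w in em_mixture_weights(doc_terms, lms, background).items():
            scores[p] += relevance * w
    return scores.most_common()
```

Interpolating with a background model keeps frequent, uninformative words (such as "the") from being credited to any candidate, so the weights reflect which mentioned person best explains the topical terms.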
Distributed Information Retrieval using Keyword Auctions
This report motivates the need for large-scale distributed approaches to information retrieval and proposes solutions based on keyword auctions.
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Despite substantial interest in applications of neural networks to
information retrieval, neural ranking models have only been applied to standard
ad hoc retrieval tasks over web pages and newswire documents. This paper
proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network),
a novel neural ranking model specifically designed for ranking short social
media posts. We identify document length, informal language, and heterogeneous
relevance signals as features that distinguish documents in our domain, and
present a model specifically designed with these characteristics in mind. Our
model uses hierarchical convolutional layers to learn latent semantic
soft-match relevance signals at the character, word, and phrase levels. A
pooling-based similarity measurement layer integrates evidence from multiple
types of matches between the query and the social media post, as well as the URLs
contained in the post. Extensive experiments using Twitter data from the TREC
Microblog Tracks 2011-2014 show that our model significantly outperforms prior
feature-based as well as existing neural ranking models. To the best of our
knowledge, this paper presents the first substantial work tackling search over
social media posts using neural ranking models. Comment: AAAI 2019, 10 pages
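A minimal pure-Python sketch of multi-level soft matching follows. It is not MP-HCNN itself: it uses hypothetical, untrained convolution weights (`random_layer`), skips the character-level and URL branches, and at the word level and after each shared convolution layer it pools a query-by-post cosine similarity matrix with max and mean pooling. Each width-2 layer widens the receptive field, giving roughly phrase-level representations.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def conv1d(seq, kernels):
    """'Valid' 1-D convolution over a sequence of vectors.

    kernels holds one (out_dim x in_dim) matrix per window offset;
    output[i] = tanh(sum_k kernels[k] @ seq[i + k]).
    """
    width = len(kernels)
    out = []
    for i in range(len(seq) - width + 1):
        acc = [0.0] * len(kernels[0])
        for k, w in enumerate(kernels):
            acc = [a + p for a, p in zip(acc, matvec(w, seq[i + k]))]
        out.append([math.tanh(x) for x in acc])
    return out


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pooled_similarity(q, d):
    """Max- and mean-pool each query row of the cosine matrix, then average over the query."""
    if not q or not d:
        return 0.0, 0.0
    sims = [[cosine(qi, dj) for dj in d] for qi in q]
    max_pool = sum(max(row) for row in sims) / len(sims)
    mean_pool = sum(sum(row) / len(row) for row in sims) / len(sims)
    return max_pool, mean_pool


def multi_level_features(q_emb, d_emb, layers):
    """Pool similarities at the word level and after each shared conv layer."""
    feats = list(pooled_similarity(q_emb, d_emb))
    q, d = q_emb, d_emb
    for kernels in layers:
        # The same (shared) weights encode both the query and the post.
        q, d = conv1d(q, kernels), conv1d(d, kernels)
        feats.extend(pooled_similarity(q, d))
    return feats


def random_layer(width, dim, rng):
    """Hypothetical untrained layer: `width` random dim x dim matrices."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
            for _ in range(width)]
```

In a trained model these pooled features would feed a scoring layer. Here they only show how evidence from several match granularities can be collected into one feature vector.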
Document Informed Neural Autoregressive Topic Models with Distributional Prior
We address two challenges in topic models: (1) Context information around
words helps determine their actual meaning, e.g., "networks" used in the
contexts "artificial neural networks" vs. "biological neuron networks".
Generative topic models infer topic-word distributions while taking little or no
context into account. Here, we extend a neural autoregressive topic
model to exploit the full context information around words in a document in a
language modeling fashion. The proposed model is named iDocNADE. (2) Due to
the small number of word occurrences (i.e., lack of context) in short texts and
data sparsity in corpora of few documents, applying topic models to such texts
is challenging. Therefore, we propose a simple and efficient way of
incorporating external knowledge into neural autoregressive topic models: we
use word embeddings as a distributional prior. The proposed variants are named
DocNADEe and iDocNADEe.
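The autoregressive factorisation behind DocNADE-style models, together with the two extensions named above, can be sketched as follows. The sketch assumes the standard DocNADE formulation with a sigmoid hidden layer, h_i = sigmoid(c + sum_{k<i} W[:, v_k]) and p(v_i = w | v_<i) = softmax(b + U h_i)[w]. It treats iDocNADE as the average of a forward pass and a backward pass (context taken from the words after position i), and DocNADEe as adding lam * E[:, v_k] for a fixed embedding matrix E to the hidden pre-activation. Sharing all parameters between the two directions and the helper `random_params` are simplifying assumptions, and the paper's exact parameterisation may differ.

```python
import math
import random


def log_softmax_at(logits, idx):
    """Numerically stable log softmax(logits)[idx]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[idx] - log_z


def doc_log_likelihood(doc, W, U, b, c, E=None, lam=0.0, reverse=False):
    """Autoregressive log p(doc) for word ids in doc (DocNADE style).

    W, E: H x V (columns are word vectors); U: V x H; b: length V; c: length H.
    With reverse=True each word is conditioned on the words after it.
    """
    H, V = len(c), len(b)
    order = list(reversed(doc)) if reverse else list(doc)
    acc = list(c)  # running hidden pre-activation, starts at the bias c
    total = 0.0
    for v in order:
        h = [1.0 / (1.0 + math.exp(-a)) for a in acc]
        logits = [b[w] + sum(U[w][j] * h[j] for j in range(H)) for w in range(V)]
        total += log_softmax_at(logits, v)
        # Add this word to the context seen by the following positions.
        for j in range(H):
            acc[j] += W[j][v] + (lam * E[j][v] if E is not None else 0.0)
    return total


def idocnade_log_likelihood(doc, W, U, b, c, E=None, lam=0.0):
    """Average of the forward and backward autoregressive log-likelihoods."""
    fwd = doc_log_likelihood(doc, W, U, b, c, E, lam)
    bwd = doc_log_likelihood(doc, W, U, b, c, E, lam, reverse=True)
    return 0.5 * (fwd + bwd)


def random_params(V, H, seed=0):
    """Hypothetical helper: small random parameters for a vocabulary of size V."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 0.1) for _ in range(V)] for _ in range(H)]
    U = [[rng.gauss(0, 0.1) for _ in range(H)] for _ in range(V)]
    return W, U, [0.0] * V, [0.0] * H
```

Because the embedding term only enters the hidden pre-activation, the prior changes the conditionals of every word after the first without adding new output parameters.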
We present novel neural autoregressive topic model variants that consistently
outperform state-of-the-art generative topic models in terms of generalization,
interpretability (topic coherence) and applicability (retrieval and
classification) on 7 long-text and 8 short-text datasets from diverse
domains. Comment: AAAI 2019. arXiv admin note: substantial text overlap with
arXiv:1808.0379
- …