Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval
Semantic similarity based retrieval is playing an increasingly important role
in many IR systems such as modern web search, question-answering, similar
document retrieval etc. Improvements in retrieval of semantically similar
content are very significant to applications like Quora, Stack Overflow, Siri
etc. We propose a novel unsupervised model for semantic similarity based
content retrieval, where we construct semantic flow graphs for each query, and
introduce the concept of "soft seeding" in graph based semi-supervised learning
(SSL) to convert this into an unsupervised model.
We demonstrate the effectiveness of our model on an equivalent question
retrieval problem on the Stack Exchange QA dataset, where our unsupervised
approach significantly outperforms the state-of-the-art unsupervised models,
and produces comparable results to the best supervised models. Our research
provides a method to tackle semantic similarity based retrieval without any
training data, and allows seamless extension to different domain QA
communities, as well as to other semantic equivalence tasks.
Comment: Published in Proceedings of the 2017 ACM Conference on Information
and Knowledge Management (CIKM '17)
A neural autoencoder approach for document ranking and query refinement in pharmacogenomic information retrieval
In this study, we investigate learning-to-rank and query refinement approaches
for information retrieval in the pharmacogenomic domain. The goal is to improve
the information retrieval process of biomedical curators, who manually build
knowledge bases for personalized medicine. We study how to exploit the
relationships between genes, variants, drugs, diseases and outcomes as features
for document ranking and query refinement. For a supervised approach, we are
faced with a small amount of annotated data and a large amount of unannotated
data. Therefore, we explore ways to use a neural document auto-encoder in a
semi-supervised approach. We show that a combination of established algorithms,
feature engineering and a neural auto-encoder model yields promising results in
this setting.
Learning to Rank from Samples of Variable Quality
Training deep neural networks requires many training samples, but in
practice, training labels are expensive to obtain and may be of varying
quality, as some may be from trusted expert labelers while others might be from
heuristics or other sources of weak supervision such as crowd-sourcing. This
creates a fundamental quality-versus-quantity trade-off in the learning
process. Do we learn from the small amount of high-quality data or the
potentially large amount of weakly-labeled data? We argue that if the learner
could somehow know and take the label-quality into account when learning the
data representation, we could get the best of both worlds. To this end, we
introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher
approach for training deep neural networks using weakly-labeled data. FWL
modulates the parameter updates to a student network (trained on the task we
care about) on a per-sample basis according to the posterior confidence of its
label-quality estimated by a teacher (who has access to the high-quality
labels). Both student and teacher are learned from the data. We evaluate FWL on
document ranking where we outperform state-of-the-art alternative
semi-supervised methods.
Comment: Presented at The First International SIGIR2016 Workshop on Learning
From Limited Or Noisy Data For Information Retrieval. arXiv admin note:
substantial text overlap with arXiv:1711.0279
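The per-sample modulation described in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D linear student, the squared loss, the function names, and the hand-supplied `confidence` values (standing in for a teacher's output) are all assumptions made for illustration.

```python
def weighted_sgd_step(w, b, x, y, confidence, lr=0.1):
    """One SGD step for a 1-D linear student y ~ w*x + b with squared loss.

    `confidence` in [0, 1] is an assumed stand-in for the teacher's
    estimate of label quality; it scales the size of the update.
    """
    error = (w * x + b) - y
    step = lr * confidence  # low-confidence labels move the student less
    return w - step * error * x, b - step * error


def train(samples, epochs=200, lr=0.1):
    """Fit the student on (x, y, confidence) triples with weighted updates."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y, confidence in samples:
            w, b = weighted_sgd_step(w, b, x, y, confidence, lr)
    return w, b


# Three trusted samples from y = 2x and one noisy label with zero confidence.
samples = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.0), (2.0, 4.0, 1.0), (1.0, 10.0, 0.0)]
w, b = train(samples)
```

In this sketch the noisy label has confidence 0, so it never changes the parameters, and the student is fitted to the three trusted samples alone.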
Adversarial Sampling and Training for Semi-Supervised Information Retrieval
Ad-hoc retrieval models with implicit feedback often have problems, e.g., the
imbalanced classes in the data set. Too few clicked documents may hurt
generalization ability of the models, whereas too many non-clicked documents
may harm effectiveness of the models and efficiency of training. In addition,
recent neural network-based models are vulnerable to adversarial examples due
to the linear nature in them. To solve the problems at the same time, we
propose an adversarial sampling and training framework to learn ad-hoc
retrieval models with implicit feedback. Our key idea is (i) to augment clicked
examples by adversarial training for better generalization and (ii) to obtain
very informational non-clicked examples by adversarial sampling and training.
Experiments are performed on benchmark data sets for common ad-hoc retrieval
tasks such as Web search, item recommendation, and question answering.
Experimental results indicate that the proposed approaches significantly
outperform strong baselines especially for high-ranked documents, and they
outperform IRGAN in NDCG@5 using only 5% of labeled data for the Web search
task.
Comment: Published in WWW 201
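For reference, the NDCG@5 metric mentioned in the abstract above can be computed as in the following minimal sketch. It assumes the common exponential-gain formulation with a log2 rank discount; the abstract does not say which variant the paper uses, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items, in ranked order.

    Uses gain 2^rel - 1 and discount log2(position + 1), positions from 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based here
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG@k normalised by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance labels of the documents in the order a system ranked them.
score = ndcg_at_k([3, 2, 0, 1, 0], k=5)
```

A ranking that is already sorted by relevance scores 1.0, and a list with no relevant documents is defined here to score 0.0.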
Neural Networks for Information Retrieval
Machine learning plays a role in many aspects of modern IR systems, and deep
learning is applied in all of them. The fast pace of modern-day research has
given rise to many different approaches for many different IR problems. The
amount of information available can be overwhelming both for junior students
and for experienced researchers looking for new research topics and directions.
Additionally, it is interesting to see what key insights into IR problems the
new technologies are able to give us. The aim of this full-day tutorial is to
give a clear overview of current tried-and-trusted neural methods in IR and how
they benefit IR research. It covers key architectures, as well as the most
promising future directions.
Comment: Overview of full-day tutorial at SIGIR 201