71,316 research outputs found
Interpretable Low-Rank Document Representations with Label-Dependent Sparsity Patterns
In context of document classification, where in a corpus of documents their
label tags are readily known, an opportunity lies in utilizing label
information to learn document representation spaces with better discriminative
properties. To this end, in this paper application of a Variational Bayesian
Supervised Nonnegative Matrix Factorization (supervised vbNMF) with
label-driven sparsity structure of coefficients is proposed for learning of
discriminative nonsubtractive latent semantic components occuring in TF-IDF
document representations. Constraints are such that the components pursued are
made to be frequently occuring in a small set of labels only, making it
possible to yield document representations with distinctive label-specific
sparse activation patterns. A simple measure of quality of this kind of
sparsity structure, dubbed inter-label sparsity, is introduced and
experimentally brought into tight connection with classification performance.
Representing a great practical convenience, inter-label sparsity is shown to be
easily controlled in supervised vbNMF by a single parameter
Neural Vector Spaces for Unsupervised Information Retrieval
We propose the Neural Vector Space Model (NVSM), a method that learns
representations of documents in an unsupervised manner for news article
retrieval. In the NVSM paradigm, we learn low-dimensional representations of
words and documents from scratch using gradient descent and rank documents
according to their similarity with query representations that are composed from
word representations. We show that NVSM performs better at document ranking
than existing latent semantic vector space methods. The addition of NVSM to a
mixture of lexical language models and a state-of-the-art baseline vector space
model yields a statistically significant increase in retrieval effectiveness.
Consequently, NVSM adds a complementary relevance signal. Next to semantic
matching, we find that NVSM performs well in cases where lexical matching is
needed.
NVSM learns a notion of term specificity directly from the document
collection without feature engineering. We also show that NVSM learns
regularities related to Luhn significance. Finally, we give advice on how to
deploy NVSM in situations where model selection (e.g., cross-validation) is
infeasible. We find that an unsupervised ensemble of multiple models trained
with different hyperparameter values performs better than a single
cross-validated model. Therefore, NVSM can safely be used for ranking documents
without supervised relevance judgments.Comment: TOIS 201
Exploiting Sentence Embedding for Medical Question Answering
Despite the great success of word embedding, sentence embedding remains a
not-well-solved problem. In this paper, we present a supervised learning
framework to exploit sentence embedding for the medical question answering
task. The learning framework consists of two main parts: 1) a sentence
embedding producing module, and 2) a scoring module. The former is developed
with contextual self-attention and multi-scale techniques to encode a sentence
into an embedding tensor. This module is shortly called Contextual
self-Attention Multi-scale Sentence Embedding (CAMSE). The latter employs two
scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association
Scoring (SAS). SMS measures similarity while SAS captures association between
sentence pairs: a medical question concatenated with a candidate choice, and a
piece of corresponding supportive evidence. The proposed framework is examined
by two Medical Question Answering(MedicalQA) datasets which are collected from
real-world applications: medical exam and clinical diagnosis based on
electronic medical records (EMR). The comparison results show that our proposed
framework achieved significant improvements compared to competitive baseline
approaches. Additionally, a series of controlled experiments are also conducted
to illustrate that the multi-scale strategy and the contextual self-attention
layer play important roles for producing effective sentence embedding, and the
two kinds of scoring strategies are highly complementary to each other for
question answering problems.Comment: 8 page
Recommended from our members
Visualising and animating visual information foraging in context
Optimal information foraging provides a potentially useful framework for modelling, analysing, and interpreting search strategies of users through a spatial-semantic interface. Improving the understanding of behavioural patterns of users in such environments has implications for the design and refinement of a range of user interfaces. In this article, we outline the role of optimal information foraging in the study of visual information retrieval and how one may use visualisation and animation techniques to put behavioural patterns in context. Behavioural patterns of information
foraging in an information space are visualised and animated to aid further in-depth analysis of search strategies
Recommended from our members
Enriching videos with light semantics
This paper describes an ongoing prototypical framework to annotate and retrieve web videos with light semantics. The proposed framework reuses many existing vocabularies along with a video model. The knowledge is captured from three different information spaces (media content, context, document). We also describe ways to extract the semantic content descriptions from the existing usergenerated content using multiple approaches of linguistic processing and Named Entity Recognition, which are later identified with DBpedia resources to establish meanings for the tags. Finally, the implemented prototype is described with multiple search interfaces and retrieval processes. Evaluation on semantic enrichment shows a considerable (50% of videos) improvement in content description
- β¦