Literature Retrieval for Precision Medicine with Neural Matching and Faceted Summarization
Information retrieval (IR) for precision medicine (PM) often involves looking
for multiple pieces of evidence that characterize a patient case. This
typically includes at least the name of a condition and a genetic variation
that applies to the patient. Other factors such as demographic attributes,
comorbidities, and social determinants may also be pertinent. As such, the
retrieval problem is often formulated as ad hoc search but with multiple facets
(e.g., disease, mutation) that may need to be incorporated. In this paper, we
present a document reranking approach that combines neural query-document
matching and text summarization toward such retrieval scenarios. Our
architecture builds on the basic BERT model with three specific components for
reranking: (a) document-query matching, (b) keyword extraction, and (c)
facet-conditioned abstractive summarization. The outcomes of (b) and (c) are
used to essentially transform a candidate document into a concise summary that
can be compared with the query at hand to compute a relevance score. Component
(a) directly generates a matching score of a candidate document for a query.
The full architecture benefits from the complementary potential of
document-query matching and the novel document transformation approach based on
summarization along PM facets. Evaluations using NIST's TREC-PM track datasets
(2017--2019) show that our model achieves state-of-the-art performance. To
foster reproducibility, our code is made available here:
https://github.com/bionlproc/text-summ-for-doc-retrieval
Comment: Accepted to EMNLP 2020 Findings as Long Paper (11 pages, 4 figures)
Words are Malleable: Computing Semantic Shifts in Political and Media Discourse
Recently, researchers have started to pay attention to the detection of temporal
shifts in the meaning of words. However, most (if not all) of these approaches
restricted their efforts to uncovering change over time, thus neglecting other
valuable dimensions such as social or political variability. We propose an
approach for detecting semantic shifts between different viewpoints--broadly
defined as a set of texts that share a specific metadata feature, which can be
a time-period, but also a social entity such as a political party. For each
viewpoint, we learn a semantic space in which each word is represented as a low
dimensional neural embedded vector. The challenge is to compare the meaning of
a word in one space to its meaning in another space and measure the size of the
semantic shifts. We compare the effectiveness of a measure based on optimal
transformations between the two spaces with a measure based on the similarity
of the neighbors of the word in the respective spaces. Our experiments
demonstrate that the combination of these two performs best. We show that the
semantic shifts not only occur over time, but also along different viewpoints
in a short period of time. For evaluation, we demonstrate how this approach
captures meaningful semantic shifts and can help improve other tasks such as
contrastive viewpoint summarization and ideology detection (measured as
classification accuracy) in political texts. We also show that the two laws of
semantic change which were empirically shown to hold for temporal shifts also
hold for shifts across viewpoints. These laws state that frequent words are
less likely to shift meaning while words with many senses are more likely to do
so.
Comment: In Proceedings of the 26th ACM International Conference on
Information and Knowledge Management (CIKM 2017)
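The neighbor-based measure mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors, the function names, the choice of k, and the use of Jaccard distance over nearest-neighbor sets are all assumptions made here for illustration. The idea is only that a word whose nearest neighbors differ between two viewpoint-specific spaces has likely shifted in meaning.

```python
import math

# Toy embedding spaces for two hypothetical viewpoints (assumed data, not from the paper).
SPACE_A = {
    "freedom": [1.0, 0.0], "liberty": [0.9, 0.1], "rights": [0.8, 0.2],
    "market": [0.0, 1.0], "trade": [0.1, 0.9],
}
SPACE_B = {
    "freedom": [0.0, 1.0], "liberty": [0.9, 0.1], "rights": [0.8, 0.2],
    "market": [0.1, 0.9], "trade": [0.2, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(space, word, k):
    """Return the set of the k words most similar to `word` within one space."""
    target = space[word]
    scored = [(cosine(target, vec), other)
              for other, vec in space.items() if other != word]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}

def neighbor_shift(space_a, space_b, word, k=2):
    """Jaccard distance between the word's neighbor sets in the two spaces.

    0.0 means identical neighborhoods; 1.0 means no shared neighbors.
    """
    na = nearest_neighbors(space_a, word, k)
    nb = nearest_neighbors(space_b, word, k)
    union = na | nb
    return 1.0 - len(na & nb) / len(union) if union else 0.0

if __name__ == "__main__":
    for w in ("freedom", "liberty"):
        print(w, round(neighbor_shift(SPACE_A, SPACE_B, w), 3))
```

Because the comparison uses only neighbor identities, it needs no alignment between the two spaces. The transformation-based measure described in the abstract would additionally require learning a mapping from one space to the other.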
Collaborative Summarization of Topic-Related Videos
Large collections of videos are grouped into clusters by a topic keyword,
such as Eiffel Tower or Surfing, with many important visual concepts repeating
across them. Videos in such a topically close set influence each other, and
this can be used to summarize one of them by exploiting information
from others in the set. We build on this intuition to develop a novel approach
to extract a summary that simultaneously captures both the important
particularities of the given video and the generalities identified
from the set of videos. The topic-related videos provide visual context to
identify the important parts of the video being summarized. We achieve this by
developing a collaborative sparse optimization method which can be efficiently
solved by a half-quadratic minimization algorithm. Our work builds upon the
idea of collaborative techniques from information retrieval and natural
language processing, which typically use the attributes of other similar
objects to predict the attribute of a given object. Experiments on two
challenging and diverse datasets demonstrate the efficacy of our approach
over state-of-the-art methods.
Comment: CVPR 201
ImageSieve: Exploratory search of museum archives with named entity-based faceted browsing
Over the last few years, faceted search has emerged as an attractive alternative to the traditional "text box" search and has become one of the standard ways of interaction on many e-commerce sites. However, these applications of faceted search are limited to domains where the objects of interest have already been classified along several independent dimensions, such as price, year, or brand. While automatic approaches to generating faceted search interfaces have been proposed, it is not yet clear to what extent the automatically produced interfaces will be useful to real users, and whether their quality can match or surpass that of their manually produced predecessors. The goal of this paper is to introduce an exploratory search interface called ImageSieve, which shares many features with traditional faceted browsing but can function without traditional faceted metadata. ImageSieve uses automatically extracted and classified named entities, which play important roles in many domains (such as news collections and image archives). We describe one specific application of ImageSieve for image search. Here, named entities extracted from the descriptions of the retrieved images are used to organize a faceted browsing interface, which then helps users make sense of and further explore the retrieved images. The results of a user study of ImageSieve demonstrate that a faceted search system based on named entities can help users explore large collections and find relevant information more effectively.
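The idea of organizing retrieved images into facets by named entity can be sketched as follows. This is an illustration only, not ImageSieve's actual code: the record layout, the entity type labels, the sample values, and the function names are hypothetical, and the entities are assumed to have already been extracted from the image descriptions by some upstream tagger.

```python
from collections import Counter, defaultdict

# Hypothetical search results: (image_id, [(entity_type, entity_text), ...]).
# Entities are assumed to be pre-extracted from each image description.
RESULTS = [
    ("img1", [("PERSON", "Andrew Carnegie"), ("LOCATION", "Pittsburgh")]),
    ("img2", [("PERSON", "Andrew Carnegie"), ("ORGANIZATION", "Carnegie Museum")]),
    ("img3", [("LOCATION", "Pittsburgh")]),
]

def build_facets(results):
    """Group entities by type and count how many images mention each one."""
    facets = defaultdict(Counter)
    for _, entities in results:
        # set() so an entity repeated in one description counts once per image
        for etype, text in set(entities):
            facets[etype][text] += 1
    return facets

def filter_by_entity(results, etype, text):
    """Return the ids of images whose description mentions the chosen entity."""
    return [img for img, entities in results if (etype, text) in entities]

if __name__ == "__main__":
    for etype, counts in build_facets(RESULTS).items():
        print(etype, counts.most_common())
    print(filter_by_entity(RESULTS, "LOCATION", "Pittsburgh"))
```

Each entity type plays the role of a facet and each entity value the role of a facet option, so selecting an option narrows the result set without any manually curated metadata.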