Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of
collections of user generated reviews with self-supervision and control. We
propose a self-supervised setup that considers an individual document as a
target summary for a set of similar documents. This setting makes training
simpler than previous approaches by relying only on standard log-likelihood
loss. We address the problem of hallucinations through the use of control
codes, to steer the generation towards more coherent and relevant
summaries. Finally, we extend the Transformer architecture to allow for multiple
reviews as input. Our benchmarks on two datasets against graph-based and recent
neural abstractive unsupervised models show that our proposed method generates
summaries of superior quality and relevance. This is confirmed in our human
evaluation, which focuses explicitly on the faithfulness of generated summaries.
We also provide an ablation study, which shows the importance of the control
setup in controlling hallucinations and achieving high sentiment and topic
alignment of the summaries with the input reviews. Comment: 18 pages including 5 pages appendi
Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017)
The large scale of scholarly publications poses a challenge for scholars in
information seeking and sensemaking. Bibliometrics, information retrieval (IR),
text mining and NLP techniques could help in these search and look-up
activities, but are not yet widely used. This workshop is intended to stimulate
IR researchers and digital library professionals to elaborate on new approaches
in natural language processing, information retrieval, scientometrics, text
mining and recommendation techniques that can advance the state-of-the-art in
scholarly document understanding, analysis, and retrieval at scale. The BIRNDL
workshop at SIGIR 2017 will incorporate an invited talk, paper sessions and the
third edition of the Computational Linguistics (CL) Scientific Summarization
Shared Task. Comment: 2 pages, workshop paper accepted at the SIGIR 201
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.
Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that adaptive approaches are
best suited for tracking evolving stories on Twitter. Comment: 8 Pag