Search CORE

47,500 research outputs found

Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification

Author: Chen Yun-Nung
Lee Hung-Yi
Lee Lin-shan
Lu Bo-Ru
Shyu Frank
Publication date: 15/11/2017

Connectionist temporal classification (CTC) is a powerful approach for sequence-to-sequence learning and has been widely used in speech recognition. A central idea of CTC is the addition of a "blank" label during training. With this mechanism, CTC eliminates the need for segment alignment, and hence it has been applied to various sequence-to-sequence learning problems. In this work, we applied CTC to abstractive summarization for spoken content. Here, the "blank" indicates that the corresponding input data are less important or noisy and can therefore be ignored. This approach was shown to outperform existing methods in terms of ROUGE scores on the Chinese Gigaword and MATBN corpora. It also has the useful property that the ordering of words or characters in the input documents is better preserved in the generated summaries.
Comment: Accepted by Interspeech 201
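The blank mechanism can be illustrated with standard CTC best-path (greedy) decoding: take the highest-scoring label at each input position, merge consecutive repeats, then drop blanks. This is a minimal Python sketch, not the authors' decoder (the abstract does not say whether they decode greedily or with beam search); the `<blank>` symbol, the tiny vocabulary and the per-token scores are hypothetical. Because labels are read strictly left to right, surviving words come out in input order, which is one way to see why ordering tends to be preserved.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = "<blank>"  # hypothetical symbol standing in for the CTC "blank" label


def best_path(scores: Sequence[Sequence[float]], vocab: Sequence[str]) -> List[str]:
    """Pick the highest-scoring label at each input position (per-step argmax)."""
    return [vocab[max(range(len(row)), key=row.__getitem__)] for row in scores]


def ctc_collapse(labels: Sequence[str], blank: str = BLANK) -> List[str]:
    """Apply the CTC collapsing rule: merge consecutive repeats, then drop blanks.

    A blank between two identical labels keeps both copies, as in CTC.
    Labels are read left to right, so the output keeps input order.
    """
    merged = [label for label, _ in groupby(labels)]
    return [label for label in merged if label != blank]


if __name__ == "__main__":
    vocab = [BLANK, "talks", "resume", "monday"]
    # One row of hypothetical label scores per input token.
    scores = [
        [0.90, 0.05, 0.03, 0.02],  # -> blank (input judged unimportant)
        [0.10, 0.80, 0.05, 0.05],  # -> talks
        [0.10, 0.70, 0.10, 0.10],  # -> talks (repeat, merged away)
        [0.80, 0.10, 0.05, 0.05],  # -> blank
        [0.10, 0.05, 0.80, 0.05],  # -> resume
        [0.10, 0.05, 0.05, 0.80],  # -> monday
    ]
    print(ctc_collapse(best_path(scores, vocab)))  # ['talks', 'resume', 'monday']
```

Six input tokens collapse to a three-word summary; the blank-labelled positions are simply skipped.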

arXiv.org e-Print Archive

Crossref

Building a semantically annotated corpus of clinical texts

Author: Andrea Setzer
Angus Roberts
Denny
Franzén
Friedman
Gennari
George Demetriou
Hersh
Hripcsak
Ian Roberts
Kim
Lindberg
Mark Hepple
Meystre
Pestian
Robert Gaizauskas
Roberts
Tanabe
Yikun Guo
Publication venue: 'Elsevier BV'
Publication date: 01/10/2009

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems that automatically extract clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus to develop an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date; its value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

Elsevier - Publisher Connector

Crossref

White Rose Research Online

Development of a speech recognition system for Spanish broadcast news

Author: Jong Franciska de
Niculescu Andreea
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2008

This paper reports on the development of a speech recognition system for Spanish broadcast news within the MESH FP6 project. The system uses the SONIC recognizer developed at the Center for Spoken Language Research (CSLR), University of Colorado. Acoustic and language models were trained on Hub4 broadcast news data. Experiments and evaluation results are reported.

University of Twente Research Information

Beyond aspect: will be -ing and shall be -ing

Author: Adamczewski
AGNÈS CELLE
Aijmer
Barber
Bellow
Biber
Blokh
Bouscaren
Boyd
Brookner
Bybee
Celle
Coates
Comrie
Dahl
Declerck
Denison
Fenning
Fischer
Gachelin
Garside
Guentchéva
Heine
Henri
Hirtle
Le Carré
Leech
Ljung
Mainwaring
Mair
McCarthy
Mettouchi
Miyahara
Mossé
Mustanoja
NICHOLAS SMITH
Palmer
Quirk
Seoane
Smith
Smitterberg
Strang
Traugott
Trousdale
Visser
Wekker
Williams
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2010

This article discusses the synchronic status and diachronic development of will be -ing and shall be -ing (as in I’ll be leaving at noon). Although available since at least Middle English, the constructions did not establish a significant foothold in standard English until the twentieth century. Both types are also more prevalent in British English (BrE) than in American English (AmE). We argue that in present-day usage will/shall be -ing are aspectually underspecified: instances that clearly construe a situation as future-in-progress are in the minority. Similarly, although volition-neutrality has been identified as a key feature of will/shall be -ing, it is important to take account of other, generally richer meanings and associations, notably ‘future-as-matter-of-course’ (Leech 2004), ‘already-decided future’ (Huddleston & Pullum et al. 2002) and non-agentivity. Like volition-neutrality, these characteristics appear to be relevant not only to contemporary use but also to the constructions’ historical expansion. We show that the construction has evolved from progressive aspect towards more subjectivised evidential meaning.

University of Salford Institutional Repository

Crossref

Hal-Diderot

DancingLines: An Analytical Scheme to Depict Cross-Platform Event Popularity

Author: B Bao
F Giummolè
H Sakoe
M Newman
N Liu
V Maus
X Zhou
Y Tang
YS Jeong
Publication date: 22/12/2017

Nowadays, events typically burst and propagate online through multiple modern media such as social networks and search engines. Various studies examine event dissemination trends within a single medium, but few analyze event popularity from a cross-platform perspective. The challenges stem from the wide diversity of events and media, limited access to datasets aligned across different media, and substantial noise in the data. In this paper, we design DancingLines, an innovative scheme that captures and quantitatively analyzes event popularity between pairs of text media. It contains two models: TF-SW, a semantic-aware popularity quantification model based on an integrated weight coefficient that leverages Word2Vec and TextRank; and wDTW-CD, a pairwise model for aligning event popularity time series, adapted from Dynamic Time Warping, that matches different event phases. We also propose three metrics to interpret event popularity trends between pairs of social platforms. Experimental results on eighteen real-world event datasets from an influential social network and a popular search engine validate the effectiveness and applicability of our scheme. DancingLines is shown to have broad application potential for discovering knowledge about various aspects of events and different media.
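wDTW-CD is described only as adapted from Dynamic Time Warping; its weighting and phase-matching details are not given in the abstract. As a hedged point of reference, the sketch below implements plain DTW (not wDTW-CD) in Python on two hypothetical daily popularity series, using absolute difference as the local distance. It returns the cumulative alignment cost and the warping path as matched (i, j) index pairs.

```python
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping between two 1-D series.

    Returns the cumulative alignment cost and the warping path as (i, j) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between the two points
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b point repeats
                                 cost[i][j - 1],      # b advances, a point repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover which points were matched.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)  # ties prefer the diagonal step
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    social = [0, 1, 5, 2, 0]     # hypothetical daily popularity on a social network
    search = [0, 0, 1, 5, 2, 0]  # same event on a search engine, one day later
    total, path = dtw(social, search)
    print(total)  # 0.0
    print(path)   # [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

The search-engine series lags by one day, so DTW aligns the two at zero cost by matching the first social-network point twice. This sketch applies no weighting.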

arXiv.org e-Print Archive

Crossref

Exploring narrativity in data visualization in journalism

Author: Weber Wibke
Publication venue: Amsterdam University Press
Publication date: 01/01/2020

Many news stories are based on data visualization, and storytelling with data has become a buzzword in journalism. But what exactly does storytelling with data mean? When does a data visualization tell a story? And what are the narrative constituents of data visualization? This chapter first defines the key terms in this context: story, narrative, narrativity, showing and telling. It then sheds light on the various forms of narrativity in data visualization and, based on a corpus analysis of 73 data visualizations, describes the basic visual elements that constitute narrativity: the instance of a narrator, sequentiality, temporal dimension, and tellability. The chapter concludes that understanding how data are transformed into visual stories is key to understanding how facts are shaped and communicated in society.

Crossref

ZHAW digitalcollection


A lightweight, pattern-based approach to identification and formalisation of TimeML expressions in clinical narratives

Author: Gooch P.
Publication date: 01/01/2012

General Architecture for Text Engineering (GATE) components for identifying clinical events and temporal expressions are developed and evaluated against a corpus of 120 discharge summaries.
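The components described here are built in GATE. As a language-neutral illustration of what a lightweight, pattern-based approach to temporal expressions can look like, the Python sketch below (not GATE or JAPE code) matches two hypothetical pattern families and normalises them into TimeML TIMEX3-style type/value pairs. The patterns, the day/month/year date order and the `Timex` record are assumptions for illustration only, and the sketch does not identify clinical events.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Timex:
    text: str        # matched surface string
    start: int       # character offset in the note
    timex_type: str  # TimeML TIMEX3 type, e.g. "DATE" or "DURATION"
    value: str       # normalised value (ISO 8601 date or duration)


# Hypothetical patterns; dates are assumed to be written day/month/year.
DATE_DMY = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
DURATION = re.compile(r"\b(\d+)\s+(day|week|month|year)s?\b", re.IGNORECASE)
UNIT_CODE = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def find_timexes(text: str) -> List[Timex]:
    """Return date and duration expressions found in `text`, in document order."""
    found = []
    for m in DATE_DMY.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        found.append(Timex(m.group(0), m.start(), "DATE",
                           f"{year:04d}-{month:02d}-{day:02d}"))
    for m in DURATION.finditer(text):
        amount, unit = m.groups()
        found.append(Timex(m.group(0), m.start(), "DURATION",
                           f"P{amount}{UNIT_CODE[unit.lower()]}"))
    return sorted(found, key=lambda t: t.start)


if __name__ == "__main__":
    note = "Admitted 03/11/2011 with chest pain for 2 weeks; discharged after 5 days."
    for t in find_timexes(note):
        print(t.timex_type, t.value, repr(t.text))
```

Keeping each pattern paired with its own normalisation rule is what makes such an approach lightweight: adding a new expression type means adding one pattern and one formatter.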

City Research Online