225 research outputs found
Integrating lexical and prosodic features for automatic paragraph segmentation
Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically
identify their discourse structure is an important step to understanding what a spoken document is about. Moreover,
finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little
work has been done on discourse based speech segmentation below the level of broad topics. In order to examine how
discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical
and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models
using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical
cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak
lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural
networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that
integrate representations generated by separate lexical and prosodic models while allowing interactions between these
features streams rather than treating them as independent information sources. Application to ASR outputs shows that
adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to
transcription errors.The second author was funded from the EU’s Horizon
2020 Research and Innovation Programme under the GA
H2020-RIA-645012 and the Spanish Ministry of Economy
and Competitivity Juan de la Cierva program. The other
authors were funded by the University of Edinburgh
Automatic Paragraph Segmentation with Lexical and Prosodic Features
Comunicació presentada a la Interspeech 2016, celebrada per la International Speech Communication Association (ISCA) els dies 8 a 12 de septembre de 2016 a San Francisco (EUA).As long-form spoken documents become more ubiquitous in everyday life, so does the need for automatic discourse segmentation in spoken language processing tasks. Although previous work has focused on broad topic segmentation, detection of finer-grained discourse units, such as paragraphs, is highly desirable for presenting and analyzing spoken content. To better understand how different aspects of speech cue these subtle discourse transitions, we investigate automatic paragraph segmentation of TED talks. We build lexical and prosodic paragraph segmenters using Support Vector Machines, AdaBoost, and Long Short Term Memory (LSTM) recurrent neural networks. In general, we find that induced cue words and supra-sentential prosodic features outperform features based on topical coherence, syntactic form and complexity. However, our best performance is achieved by combining a wide range of individually weak lexical and prosodic features, with the sequence modelling LSTM generally outperforming the other classifiers by a large margin. Moreover, we find that models that allow lower level interactions between different feature types produce better results than treating lexical and prosodic contributions as separate, independent information sources.The second author is funded from the EU’s Horizon 2020 Research and Innovation Programme under the GA H2020-RIA-645012 and the Spanish Ministry of Economy and Competitivity Juan de la Cierva program
Text Segmentation Using Exponential Models
This paper introduces a new statistical approach to partitioning text
automatically into coherent segments. Our approach enlists both short-range and
long-range language models to help it sniff out likely sites of topic changes
in text. To aid its search, the system consults a set of simple lexical hints
it has learned to associate with the presence of boundaries through inspection
of a large corpus of annotated data. We also propose a new probabilistically
motivated error metric for use by the natural language processing and
information retrieval communities, intended to supersede precision and recall
for appraising segmentation algorithms. Qualitative assessment of our algorithm
as well as evaluation using this new metric demonstrate the effectiveness of
our approach in two very different domains, Wall Street Journal articles and
the TDT Corpus, a collection of newswire articles and broadcast news
transcripts.Comment: 12 pages, LaTeX source and postscript figures for EMNLP-2 pape
Handwriting Recognition of Historical Documents with few labeled data
Historical documents present many challenges for offline handwriting
recognition systems, among them, the segmentation and labeling steps. Carefully
annotated textlines are needed to train an HTR system. In some scenarios,
transcripts are only available at the paragraph level with no text-line
information. In this work, we demonstrate how to train an HTR system with few
labeled data. Specifically, we train a deep convolutional recurrent neural
network (CRNN) system on only 10% of manually labeled text-line data from a
dataset and propose an incremental training procedure that covers the rest of
the data. Performance is further increased by augmenting the training set with
specially crafted multiscale data. We also propose a model-based normalization
scheme which considers the variability in the writing scale at the recognition
phase. We apply this approach to the publicly available READ dataset. Our
system achieved the second best result during the ICDAR2017 competition
InfoLink: analysis of Dutch broadcast news and cross-media browsing
In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition and available meta-data are used to index and link items. The concept is visualised using SMIL-scripts for presenting the streaming broadcast news video and the information links
- …