27 research outputs found
Integrating lexical and prosodic features for automatic paragraph segmentation
Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically
identify their discourse structure is an important step to understanding what a spoken document is about. Moreover,
finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little
work has been done on discourse based speech segmentation below the level of broad topics. In order to examine how
discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical
and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models
using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical
cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak
lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural
networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that
integrate representations generated by separate lexical and prosodic models while allowing interactions between these
features streams rather than treating them as independent information sources. Application to ASR outputs shows that
adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to
transcription errors.The second author was funded from the EU’s Horizon
2020 Research and Innovation Programme under the GA
H2020-RIA-645012 and the Spanish Ministry of Economy
and Competitivity Juan de la Cierva program. The other
authors were funded by the University of Edinburgh
Broadcast News Segmentation Using Automatic Speech Recognition System Combination With Rescoring And Noun Unification
Siaran berita memaklumkan perkembangan terbaru, peristiwa dan isu-isu terkini yang berlaku di dunia kepada penonton. Pada masa kini, berita yang disiarkan boleh diakses dengan mudah atas talian.
Broadcast news keeps viewers informed about the latest developments, events and issues occurring in the world. Nowadays, broadcast news can be easily accessed online
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
Indexing and retrieval of broadcast news
This paper describes a spoken document retrieval (SDR) system for British and North American Broadcast News. The system is based on a connectionist large vocabulary speech recognizer and a probabilistic information retrieval
system. We discuss the development of a realtime Broadcast News speech recognizer, and its integration into an SDR system. Two advances were made for this task: automatic segmentation and statistical query expansion using a secondary
corpus. Precision and recall results using the Text Retrieval Conference (TREC) SDR evaluation infrastructure are reported throughout the paper, and we discuss the application of these developments to a large scale SDR task based on
an archive of British English broadcast news