540 research outputs found
A detection-based pattern recognition framework and its applications
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation.
Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages.
A detection-based framework is a â divide-and-conquerâ design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts
can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage.
This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on the statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation. Several novel detection algorithms and evidence fusion methods were proposed and their effectiveness was justified in automatic speech recognition and broadcast news video segmentation system. We believe such a detection-based framework can be employed
in more applications in the future.Ph.D.Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Ghovanloo, Maysam; Committee Member: Romberg, Justin; Committee Member: Yuan, Min
Integrating lexical and prosodic features for automatic paragraph segmentation
Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically
identify their discourse structure is an important step to understanding what a spoken document is about. Moreover,
finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little
work has been done on discourse based speech segmentation below the level of broad topics. In order to examine how
discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical
and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models
using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical
cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak
lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural
networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that
integrate representations generated by separate lexical and prosodic models while allowing interactions between these
features streams rather than treating them as independent information sources. Application to ASR outputs shows that
adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to
transcription errors.The second author was funded from the EU’s Horizon
2020 Research and Innovation Programme under the GA
H2020-RIA-645012 and the Spanish Ministry of Economy
and Competitivity Juan de la Cierva program. The other
authors were funded by the University of Edinburgh
- …