141 research outputs found
A detection-based pattern recognition framework and its applications
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation.
Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages.
A detection-based framework is a â divide-and-conquerâ design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts
can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage.
This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on the statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation. Several novel detection algorithms and evidence fusion methods were proposed and their effectiveness was justified in automatic speech recognition and broadcast news video segmentation system. We believe such a detection-based framework can be employed
in more applications in the future.Ph.D.Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Ghovanloo, Maysam; Committee Member: Romberg, Justin; Committee Member: Yuan, Min
Soft margin estimation for automatic speech recognition
In this study, a new discriminative learning framework, called soft margin estimation (SME), is proposed for estimating the parameters of continuous density hidden Markov models (HMMs). The proposed method makes direct use of the successful ideas of margin in support vector machines to improve generalization capability and decision feedback learning in discriminative training to enhance model separation in classifier design. SME directly maximizes the separation of competing models to enhance the testing samples to approach a correct decision if the deviation from training samples is within a safe margin. Frame and utterance selections are integrated into a unified framework to select the training utterances and frames critical for discriminating competing models. SME offers a flexible and rigorous framework to facilitate the incorporation of new margin-based optimization criteria into HMMs training. The choice of various loss functions is illustrated and different kinds of separation measures are defined under a unified SME framework. SME is also shown to be able to jointly optimize feature extraction and HMMs. Both the generalized probabilistic descent algorithm and the Extended Baum-Welch algorithm are applied to solve SME.
SME has demonstrated its great advantage over other discriminative training methods in several speech recognition tasks. Tested on the TIDIGITS digit recognition task, the proposed SME approach achieves a string accuracy of 99.61%, the best result ever reported in literature. On the 5k-word Wall Street Journal task, SME reduced the word error rate (WER) from 5.06% of MLE models to 3.81%, with relative 25% WER reduction. This is the first attempt to show the effectiveness of margin-based acoustic modeling for large vocabulary continuous speech recognition in a HMMs framework. The generalization of SME was also well demonstrated on the Aurora 2 robust speech recognition task, with around 30% relative WER reduction from the clean-trained baseline.Ph.D.Committee Chair: Dr. Chin-Hui Lee; Committee Member: Dr. Anthony Joseph Yezzi; Committee Member: Dr. Biing-Hwang (Fred) Juang; Committee Member: Dr. Mark Clements; Committee Member: Dr. Ming Yua
Support Vector Machines for Speech Recognition
Hidden Markov models (HMM) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling which can often be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm centered on principles of structural risk minimization using a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs have the ability to simultaneously optimize the representational and discriminative ability of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task ? a 10% improvement relative to an HMM system. On SWITCHBOARD, a large vocabulary task, the system improves performance over a traditional HMM system from 41.6% word error rate to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system
Deep Learning Models For Biomedical Data Analysis
The field of biomedical data analysis is a vibrant area of research dedicated to extracting valuable insights from a wide range of biomedical data sources, including biomedical images and genomics data. The emergence of deep learning, an artificial intelligence approach, presents significant prospects for enhancing biomedical data analysis and knowledge discovery. This dissertation focused on exploring innovative deep-learning methods for biomedical image processing and gene data analysis.
During the COVID-19 pandemic, biomedical imaging data, including CT scans and chest x-rays, played a pivotal role in identifying COVID-19 cases by categorizing patient chest x-ray outcomes as COVID-19-positive or negative. While supervised deep learning methods have effectively recognized COVID-19 patterns in chest x-ray datasets, the availability of annotated training data remains limited. To address this challenge, the thesis introduced a semi-supervised deep learning model named ssResNet, built upon the Residual Neural Network (ResNet) architecture. The model combines supervised and unsupervised paths, incorporating a weighted supervised loss function to manage data imbalance. The strategies to diminish prediction uncertainty in deep learning models for critical applications like medical image processing is explore. It achieves this through an ensemble deep learning model, integrating bagging deep learning and model calibration techniques. This ensemble model not only boosts biomedical image segmentation accuracy but also reduces prediction uncertainty, as validated on a comprehensive chest x-ray image segmentation dataset.
Furthermore, the thesis introduced an ensemble model integrating Proformer and ensemble learning methodologies. This model constructs multiple independent Proformers for predicting gene expression, their predictions are combined through weighted averaging to generate final predictions. Experimental outcomes underscore the efficacy of this ensemble model in enhancing prediction performance across various metrics.
In conclusion, this dissertation advances biomedical data analysis by harnessing the potential of deep learning techniques. It devises innovative approaches for processing biomedical images and gene data. By leveraging deep learning\u27s capabilities, this work paves the way for further progress in biomedical data analytics and its applications within clinical contexts.
Index Terms- biomedical data analysis, COVID-19, deep learning, ensemble learning, gene data analytics, medical image segmentation, prediction uncertainty, Proformer, Residual Neural Network (ResNet), semi-supervised learning
Hardware-software codesign in a high-level synthesis environment
Interfacing hardware-oriented high-level synthesis to software development is a computationally hard problem for which no general solution exists. Under special conditions, the hardware-software codesign (system-level synthesis) problem may be analyzed with traditional tools and efficient heuristics. This dissertation introduces a new alternative to the currently used heuristic methods. The new approach combines the results of top-down hardware development with existing basic hardware units (bottom-up libraries) and compiler generation tools. The optimization goal is to maximize operating frequency or minimize cost with reasonable tradeoffs in other properties.
The dissertation research provides a unified approach to hardware-software codesign. The improvements over previously existing design methodologies are presented in the frame-work of an academic CAD environment (PIPE). This CAD environment implements a sufficient subset of functions of commercial microelectronics CAD packages. The results may be generalized for other general-purpose algorithms or environments.
Reference benchmarks are used to validate the new approach. Most of the well-known benchmarks are based on discrete-time numerical simulations, digital filtering applications, and cryptography (an emerging field in benchmarking). As there is a need for high-performance applications, an additional requirement for this dissertation is to investigate pipelined hardware-software systems\u27 performance and design methods. The results demonstrate that the quality of existing heuristics does not change in the enhanced, hardware-software environment
Framework for Human Computer Interaction for Learning Dialogue Strategies using Controlled Natural Language in Information Systems
Spoken Language systems are going to have a tremendous impact in all
the real world applications, be it healthcare enquiry, public transportation
system or airline booking system maintaining the language ethnicity for
interaction among users across the globe. These system have the capability
of interacting with the user in di erent languages that the system
supports. Normally when a person interacts with another person there are
many non-verbal clues which guide the dialogue and all the utterances have
a contextual relationship, which manage the dialogue as its mixed by the
two speakers. Human Computer Interaction has a wide impact on the design
of the applications and has become one of the emerging interest area of
the researchers. All of us are witness to an explosive electronic revolution
where lots of gadgets and gizmo's have surrounded us, advanced not only
in power, design, applications but the ease of access or what we call user
friendly interfaces are designed that we can easily use and control all the
functionality of the devices. Since speech is one of the most intuitive form
of interaction that humans use. It provides potential bene ts such as handfree
access to machines, ergonomics and greater e ciency of interaction.
Yet, speech-based interfaces design has been an expert job for a long time.
Lot of research has been done in building real spoken Dialogue Systems
which can interact with humans using voice interactions and help in performing
various tasks as are done by humans. Last two decades have seen
utmost advanced research in the automatic speech recognition, dialogue
management, text to speech synthesis and Natural Language Processing
for various applications which have shown positive results. This dissertation
proposes to apply machine learning (ML) techniques to the problem
of optimizing the dialogue management strategy selection in the Spoken
Dialogue system prototype design. Although automatic speech recognition
and system initiated dialogues where the system expects an answer in the
form of `yes' or `no' have already been applied to Spoken Dialogue Systems(
SDS), no real attempt to use those techniques in order to design a
new system from scratch has been made. In this dissertation, we propose
some novel ideas in order to achieve the goal of easing the design of Spoken
Dialogue Systems and allow novices to have access to voice technologies.
A framework for simulating and evaluating dialogues and learning optimal
dialogue strategies in a controlled Natural Language is proposed. The simulation
process is based on a probabilistic description of a dialogue and
on the stochastic modelling of both arti cial NLP modules composing a
SDS and the user. This probabilistic model is based on a set of parameters
that can be tuned from the prior knowledge from the discourse or learned
from data. The evaluation is part of the simulation process and is based
on objective measures provided by each module. Finally, the simulation
environment is connected to a learning agent using the supplied evaluation
metrics as an objective function in order to generate an optimal behaviour
for the SDS
- …