On discriminative semi-supervised incremental learning with a multi-view perspective for image concept modeling
This dissertation presents the development of a semi-supervised incremental learning framework with a multi-view perspective for image concept modeling. For reliable image concept characterization, having a large number of labeled images is crucial. However, the size of the training set is often limited due to the cost required for generating concept labels associated with objects in a large quantity of images. To address this issue, in this research, we propose to incrementally incorporate unlabeled samples into a learning process to enhance concept models originally learned with a small number of labeled samples. To tackle the sub-optimality problem of conventional techniques, the proposed incremental learning framework selects unlabeled samples based on an expected error reduction function that measures contributions of the unlabeled samples based on their ability to increase the modeling accuracy. To improve the convergence property of the proposed incremental learning framework, we further propose a multi-view learning approach that makes use of multiple features such as color, texture, etc., of images when including unlabeled samples. For robustness to mismatches between training and testing conditions, a discriminative learning algorithm, namely a kernelized maximal-figure-of-merit (kMFoM) learning approach, is also developed. Combining individual techniques, we conduct a set of experiments on various image concept modeling problems, such as handwritten digit recognition, object recognition, and image spam detection to highlight the effectiveness of the proposed framework.
PhD
Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Lee, Hsien-Hsin; Committee Member: McClellan, James; Committee Member: Yuan, Min
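The selection step described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes one-dimensional features, a nearest-centroid model with softmax posteriors, and an expected-error estimate computed on the remaining unlabeled pool. The multi-view and kMFoM components are left out, and all function names are hypothetical.

```python
import math

def centroids(labeled):
    """Per-class mean of 1-D samples; `labeled` is a list of (x, y) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def posterior(x, cents):
    """Softmax over negative squared distances to the class centroids."""
    dists = {y: (x - c) ** 2 for y, c in cents.items()}
    smallest = min(dists.values())  # shift for numerical stability
    scores = {y: math.exp(-(d - smallest)) for y, d in dists.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

def expected_error(labeled, pool):
    """Estimated error on the pool: sum of (1 - highest class posterior)."""
    cents = centroids(labeled)
    return sum(1.0 - max(posterior(u, cents).values()) for u in pool)

def select_sample(labeled, pool):
    """Pick the pool index whose inclusion minimizes the expected error.

    Each candidate is tried with every possible label; the resulting
    errors are weighted by the current posterior of that label.
    """
    cents = centroids(labeled)
    best_index, best_error = None, float("inf")
    for i, x in enumerate(pool):
        rest = pool[:i] + pool[i + 1:]
        err = sum(p * expected_error(labeled + [(x, y)], rest)
                  for y, p in posterior(x, cents).items())
        if err < best_error:
            best_index, best_error = i, err
    return best_index, best_error

# Toy data (assumed): two labeled samples and three unlabeled ones.
labeled = [(0.0, "a"), (4.0, "b")]
pool = [0.5, 2.1, 3.8]
index, error = select_sample(labeled, pool)
```

In an incremental loop, the selected sample would be moved from the pool into the labeled set with its most probable label, and the selection repeated.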
Master of Science thesis
Presently, speech recognition is gaining worldwide popularity in applications like Google Voice, speech-to-text reporting (speech-to-text transcription, video captioning, real-time transcription), hands-free computing, and video games. Research has been done for several years and many speech recognizers have been built. However, most speech recognizers still fail to recognize speech accurately. Consider the well-known application Google Voice, which helps users search the web by voice. Though Google Voice does a good job of transcribing spoken words, it does not accurately recognize words spoken with different accents. Because many accents continue to evolve around the world, it is essential to train the speech recognizer to recognize accented speech. Accent classification is defined as the problem of classifying the accents in a given language. This thesis explores various methods to identify accents. We introduce a new concept of clustering windows of a speech signal and learn a distance metric using a specific distance measure over phonetic strings to classify the accents. A language structure is incorporated to learn this distance metric. We also show how kernel approximation algorithms help in learning a distance metric.
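The abstract does not say which distance measure over phonetic strings is used. As a hedged illustration only, the sketch below assumes a plain Levenshtein (edit) distance over sequences of phoneme symbols; the function name and the phoneme sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols.

    Assumption: unit cost for insertion, deletion, and substitution.
    The thesis may use a different, learned measure.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, sym_a in enumerate(a, start=1):
        curr = [i]
        for j, sym_b in enumerate(b, start=1):
            cost = 0 if sym_a == sym_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Two hypothetical pronunciations of the same word, one phoneme apart.
us = ["t", "ah", "m", "ey", "t", "ow"]
uk = ["t", "ah", "m", "aa", "t", "ow"]
distance = edit_distance(us, uk)
```

Pairwise distances of this kind could then feed a nearest-neighbor or kernel-based accent classifier.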
The DD-classifier in the functional setting
The Maximum Depth was the first attempt to use data depths instead of
multivariate raw data to construct a classification rule. Recently, the
DD-classifier has solved several serious limitations of the Maximum Depth
classifier, but some issues remain. This paper extends the
DD-classifier in three ways: first, to overcome the limitation of the
DD-classifier when more than two groups are involved; second, to apply regular
classification methods (such as NN, linear or quadratic classifiers, recursive
partitioning, ...) to DD-plots to obtain useful insights through the diagnostics
of these methods; and third, to integrate different sources of information
(data depths or multivariate functional data) in a unified way in the
classification procedure. Besides, as the DD-classifier trick is especially
useful in the functional framework, an enhanced revision of several functional
data depths is done in the paper. A simulation study and applications to some
classical real datasets are also provided showing the power of the new
proposal.
Comment: 29 pages, 6 figures, 6 tables, Supplemental R Code and Dat
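The idea of classifying in depth space can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it uses univariate data rather than functional data, the Mahalanobis depth as the depth function, and k-NN as the classifier applied to the DD-plot. All function names are hypothetical.

```python
def mahalanobis_depth(x, sample):
    """Mahalanobis depth of scalar x with respect to a univariate sample."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((s - mean) ** 2 for s in sample) / (n - 1)
    return 1.0 / (1.0 + (x - mean) ** 2 / var)

def dd_point(x, groups):
    """Depth of x with respect to every group: one point of the DD-plot."""
    return tuple(mahalanobis_depth(x, g) for g in groups)

def max_depth_classify(x, groups):
    """Maximum Depth rule: assign x to the group in which it is deepest."""
    depths = dd_point(x, groups)
    return max(range(len(groups)), key=lambda k: depths[k])

def dd_knn_classify(x, groups, k=3):
    """DD-classifier sketch: k-NN applied to the depth coordinates."""
    train = [(dd_point(s, groups), label)
             for label, g in enumerate(groups) for s in g]
    query = dd_point(x, groups)
    nearest = sorted(
        train,
        key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], query)),
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy data (assumed): two well-separated groups.
groups = [[0.0, 0.5, 1.0, 1.5, 2.0], [5.0, 5.5, 6.0, 6.5, 7.0]]
label_max_depth = max_depth_classify(1.0, groups)
label_dd_knn = dd_knn_classify(5.8, groups)
```

Because `dd_point` returns one coordinate per group, the same code runs unchanged with more than two groups, which is the first extension the abstract mentions.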
Scalable learning for geostatistics and speaker recognition
With improved data acquisition methods, the amount of data that is being collected has increased severalfold. One of the objectives in data collection is to learn useful underlying patterns. In order to work with data at this scale, the methods not only need to be effective with the underlying data, but also have to be scalable to handle larger data collections. This thesis focuses on developing scalable and effective methods targeted towards different domains, geostatistics and speaker recognition in particular.
Initially, we focus on kernel-based learning methods and develop a GPU-based parallel framework for this class of problems. An improved numerical algorithm that utilizes GPU parallelization to further enhance the computational performance of kernel regression is proposed. These methods are then demonstrated on problems arising in geostatistics and speaker recognition.
In geostatistics, data is often collected at scattered locations, and factors like instrument malfunctioning lead to missing observations. Applications often require the ability to interpolate this scattered spatiotemporal data onto a regular grid continuously over time. This problem can be formulated as a regression problem, and one of the most popular geostatistical interpolation techniques, kriging, is analogous to a standard kernel method: Gaussian process regression. Kriging is computationally expensive and needs major modifications and accelerations in order to be used practically. The GPU framework developed for kernel methods is extended to kriging, and the GPU's texture memory is better utilized for enhanced computational performance.
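The analogy between kriging and Gaussian process regression can be made concrete with a small CPU-only sketch. It assumes a zero-mean process (simple kriging), one-dimensional locations, and a squared-exponential kernel with hypothetical `length` and `noise` parameters. It does not reproduce the thesis's GPU framework or its accelerations.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential covariance between two 1-D locations."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_predict(xs, ys, x_new, length=1.0, noise=1e-6):
    """Posterior mean at x_new: k(x_new, X) (K + noise * I)^-1 y."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    weights = solve(K, ys)
    return sum(rbf(x_new, a, length) * w for a, w in zip(xs, weights))

# Scattered observations (assumed values) interpolated at a new location.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.8, 0.9, 0.1]
estimate = gp_predict(xs, ys, 1.5)
```

The dense linear solve costs O(n^3) in the number of observations, which is the expense that motivates the accelerations described in the paragraph above.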
Speaker recognition deals with the task of verifying a person's identity based on samples of his/her speech ("utterances"). This thesis focuses on the text-independent framework, and three new recognition frameworks were developed for this problem. We propose a kernelized Renyi distance based similarity scoring for speaker recognition. While its performance is promising, it does not generalize well for limited training data and therefore does not compare well to state-of-the-art recognition systems. These systems compensate for the variability in the speech data due to the message, channel variability, noise, and reverberation. State-of-the-art systems model each speaker as a mixture of Gaussians (GMM) and compensate for the variability (termed "nuisance"). We propose a novel discriminative framework using a latent variable technique, partial least squares (PLS), for improved recognition. The kernelized version of this algorithm is used to achieve a state-of-the-art speaker ID system that shows results competitive with the best systems reported on in NIST's 2010 Speaker Recognition Evaluation.