66 research outputs found
Isometry and convexity in dimensionality reduction
The size of data generated every year follows an exponential growth. The number of data points as well as the number of dimensions have increased dramatically over the past 15 years. The gap between the demand from industry in data processing and the solutions provided by the machine learning community is increasing. Despite the growth in memory and computational power, advanced statistical processing of data on the order of gigabytes remains out of reach. Most sophisticated machine learning algorithms have at least quadratic complexity. With the current computer architecture, algorithms with complexity higher than linear O(N) or O(N log N) are not considered practical.
Dimensionality reduction is a challenging problem in machine learning. Data represented as multidimensional points often have high dimensionality, yet the information they carry can be expressed with far fewer dimensions. Moreover, the reduced dimensions of the data can have better interpretability than the original ones. There is a great variety of dimensionality reduction algorithms under the theory of Manifold Learning. Most of the methods, such as Isomap, Local Linear Embedding, Local Tangent Space Alignment, and Diffusion Maps, have been extensively studied under the framework of Kernel Principal Component Analysis (KPCA). In this dissertation we study two state-of-the-art dimensionality reduction methods, Maximum Variance Unfolding (MVU) and Non-Negative Matrix Factorization (NMF). These two methods do not fit under the umbrella of Kernel PCA. MVU is cast as a Semidefinite Program, a modern convex nonlinear optimization algorithm that offers more flexibility and power compared to KPCA. Although MVU and NMF seem to be two disconnected problems, we show that there is a connection between them. Both are special cases of a general nonlinear factorization algorithm that we developed.
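As a point of reference for NMF, a minimal sketch follows. It assumes the classic multiplicative-update rule for the Frobenius objective ||V - WH||, which is chosen here only for illustration; it is not the convex formulation or the general factorization algorithm developed in the dissertation. The helper names (`nmf`, `matmul`, `transpose`, `frob_error`) and the toy matrix are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Frobenius norm of the residual V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for rv, rwh in zip(V, WH)
               for v, wh in zip(rv, rwh)) ** 0.5


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates, so W and H stay non-negative as long as
    V is non-negative and the initial factors are positive.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Toy non-negative matrix (illustrative data only).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
W, H = nmf(V, rank=2)
print("residual:", frob_error(V, W, H))
```

The updates are not guaranteed to reach a global optimum, which is one reason convexity matters for the formulations discussed in this abstract.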
Two aspects of the algorithms are of particular interest: computational complexity and interpretability. Computational complexity answers the question of how fast we can find the best solution of MVU/NMF for large data volumes. Since we are dealing with optimization programs, we need to find the global optimum, which is strongly connected with the convexity of the problem. Interpretability is strongly connected with local isometry, which gives meaning to relationships between data points. Another aspect of interpretability is the association of data with labeled information. The contributions of this thesis are the following:
1. MVU is modified so that it can scale more efficiently. Results are shown on speech datasets of 1 million points. Limitations of the method are highlighted.
2. An algorithm for fast computation of the furthest neighbors is presented for the first time in the literature.
3. Construction of optimal kernels for Kernel Density Estimation with modern convex programming is presented. For the first time we show that the Leave-One-Out Cross Validation (LOOCV) function is quasi-concave.
4. For the first time NMF is formulated as a convex optimization problem.
5. An algorithm for the problem of Completely Positive Matrix Factorization is presented.
6. A hybrid algorithm of MVU and NMF, isoNMF, is presented, combining advantages of both methods.
7. Isometric Separation Maps (ISM), a variation of MVU that contains classification information, is presented.
8. Large scale nonlinear dimensional analysis on the TIMIT speech database is performed.
9. A general nonlinear factorization algorithm is presented based on sequential convex programming.
Despite the efforts to scale the proposed methods up to 1 million data points in reasonable time, the gap between the industrial demand and the current state of the art is still orders of magnitude wide.
Ph.D.
Committee Chair: David Anderson; Committee Co-Chair: Alexander Gray; Committee Member: Anthony Yezzi; Committee Member: Hongyuan Zha; Committee Member: Justin Romberg; Committee Member: Ronald Schafe
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important challenge
when using deep learning (DL). It requires large-scale training datasets and
high computational and storage resources. Moreover, DL techniques, and machine
learning (ML) approaches in general, assume that training and testing data
come from the same domain, with the same input feature space and data
distribution characteristics. This assumption, however, does not hold in
some real-world artificial intelligence (AI) applications. Moreover, there are
situations where real data are challenging or expensive to gather, or occur
rarely, so the data requirements of DL models cannot be met. Deep transfer
learning (DTL) has been introduced to overcome these issues; it helps
develop high-performing models using real datasets that are small or slightly
different but related to the training data. This paper presents a comprehensive
survey of DTL-based ASR frameworks to shed light on the latest developments and
help academics and professionals understand current challenges. Specifically,
after presenting the DTL background, a well-designed taxonomy is adopted to
inform the state-of-the-art. A critical analysis is then conducted to identify
the limitations and advantages of each framework. Moving on, a comparative
study is introduced to highlight the current challenges before deriving
opportunities for future research.
Synergizing human-machine intelligence: Visualizing, labeling, and mining the electronic health record
We live in a world where data surround us in every aspect of our lives. The key challenge for humans and machines is how we can make better use of such data. Imagine what would happen if you were to have intelligent machines that could give you insight into the data: insight that will enable you to better 1) reason about, 2) learn, and 3) understand the underlying phenomena that produced the data. The possibilities of combined human-machine intelligence are endless and will impact our lives in ways we cannot even imagine today.
Synergistic human-machine intelligence aims to facilitate the analytical reasoning and inference process of humans by creating machines that maximize a human's ability to 1) reason about, 2) learn, and 3) understand large, complex, and heterogeneous data. Combined human-machine intelligence is a powerful symbiosis of mutual benefit, in which we depend on the computational capabilities of the machine for the tasks we are not good at, and the machine requires human intervention for the tasks it performs poorly on.
This relationship provides a compelling alternative to either approach in isolation for solving today's and tomorrow's arising data challenges. In this regard, this dissertation proposes a diverse analytical framework that leverages synergistic human-machine intelligence to maximize a human's ability to better 1) reason about, 2) learn, and 3) understand different biomedical imaging and healthcare data present in the patient's electronic health record (EHR). Correspondingly, we approach the data analysis problem from the 1) visualization, 2) labeling, and 3) mining perspective and demonstrate the efficacy of our analytics on specific application scenarios and various data domains.
In the first part of this dissertation we explore the question of how we can build intelligent imaging analytics that are commensurate with human capabilities and constraints, specifically for optimizing data visualization and automated labeling workflows. Our journey starts with heuristic rule-based analytical models that are derived from task-specific human knowledge. From this experience, we move on to data-driven analytics, where we adapt and combine the intelligence of the model based on prior information provided by the human and synthetic knowledge learned from partial data observations. Within this realm, we propose a novel Bayesian transductive Markov random field model that requires minimal human intervention and is able to cope with scarce label information to learn and infer object shapes in complex spatial, multimodal, spatio-temporal, and longitudinal data. We then study the question of how machines can learn discriminative object representations from dense human-provided label information by investigating learning and inference mechanisms that make use of deep learning architectures. The developed analytics can aid visualization and labeling tasks, which enables the interpretation and quantification of clinically relevant image information.
The second part explores the question of how we can build data-driven analytics for exploratory analysis in longitudinal event data that are commensurate with human capabilities and constraints. We propose human-intuitive analytics that enable the representation and discovery of interpretable event patterns to ease knowledge absorption and comprehension of the employed analytics model and the underlying data. We propose a novel doubly-constrained convolutional sparse-coding framework that learns interpretable and shift-invariant latent temporal event patterns. We apply the model to mine complex event data in EHRs. By mapping the event space to heterogeneous patient encounters in the EHR we explore the linkage between healthcare resource utilization (HRU) and disease severity. This linkage may help to better understand how disease-specific co-morbidities and their clinical attributes incur different HRU patterns. Such insight helps to characterize the patient's care history, which then enables the comparison against clinical practice guidelines, the discovery of prevailing practices based on common HRU group patterns, and the identification of outliers that might indicate poor patient management.
Single channel overlapped-speech detection and separation of spontaneous conversations
PhD Thesis
In the thesis, spontaneous conversation containing both speech mixture and speech dialogue is considered. The speech mixture refers to speakers speaking simultaneously (i.e. overlapped speech). The speech dialogue refers to segments in which only one speaker is actively speaking and the other is silent. The input conversation is first processed by overlapped-speech detection. The output is then segregated into two signals, dialogue and mixture. The dialogue is processed by speaker diarization, whose outputs are the individual speech of each speaker. The mixture is processed by speech separation, whose outputs are independent separated speech signals of the speakers. When the separation input contains only the mixture, a blind speech separation approach is used. When the separation is assisted by the outputs of the speaker diarization, it is informed speech separation. The research presents a novel overlapped-speech detection algorithm and two novel speech separation algorithms.
The proposed overlapped-speech detection is an algorithm to estimate the switching instants of the input. An optimization loop is adapted to adopt the best capsulated audio features and to avoid the worst. The optimization depends on principles of pattern recognition and k-means clustering. Over 300 simulated conversations, the average False-Alarm Error is 1.9%, Missed-Speech Error is 0.4%, and Overlap-Speaker Error is 1%. These errors approximately equal the errors of the best recent reliable speaker diarization corpuses.
The proposed blind speech separation algorithm consists of four sequential techniques: filter-bank analysis, Non-negative Matrix Factorization (NMF), speaker clustering and filter-bank synthesis. Instead of the required speaker segmentation, effective standard framing is contributed. The average objective scores (SAR, SDR and SIR) over 51 simulated conversations are 5.06 dB, 4.87 dB and 12.47 dB, respectively.
For the proposed informed speech separation algorithm, the outputs of the speaker diarization form a generated database. The database assists the speech separation by creating virtual targeted speech and mixture. The contributed virtual signals are trained to facilitate the separation by homogenising them with the NMF-matrix elements of the real mixture. Contributed masking optimizes the resulting speech. The average SAR, SDR and SIR over 341 simulated conversations are 9.55 dB, 1.12 dB, and 2.97 dB, respectively.
Per the objective tests of the two speech separation algorithms, they are in the mid-range of the well-known NMF-based audio and speech separation methods.
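To give a sense of what a decibel score such as SDR measures, the sketch below computes a simplified signal-to-distortion ratio as the energy of the reference signal over the energy of the estimation error. This is an assumption made for illustration: the thesis does not state its exact metric definitions, and standard evaluation toolkits further decompose the error into interference and artifact terms (giving SIR and SAR), which is not reproduced here. The function name `sdr_db` and the sample signals are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Simplified signal-to-distortion ratio in dB.

    Computes 10 * log10(||s||^2 / ||s - s_hat||^2) for a reference signal s
    and an estimate s_hat of equal length. Higher is better.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal has zero energy")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Illustrative data only: an estimate that is a slightly attenuated reference.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9 * r for r in reference]
print(sdr_db(reference, estimate))
```

Because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in error energy relative to the reference.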
Text Classification: A Review, Empirical, and Experimental Evaluation
The explosive and widespread growth of data necessitates the use of text
classification to extract crucial information from vast amounts of data.
Consequently, there has been a surge of research in both classical and deep
learning text classification methods. Despite the numerous methods proposed in
the literature, there is still a pressing need for a comprehensive and
up-to-date survey. Existing survey papers categorize algorithms for text
classification into broad classes, which can lead to the misclassification of
unrelated algorithms and incorrect assessments of their qualities and behaviors
using the same metrics. To address these limitations, our paper introduces a
novel methodological taxonomy that classifies algorithms hierarchically into
fine-grained classes and specific techniques. The taxonomy includes methodology
categories, methodology techniques, and methodology sub-techniques. Our study
is the first survey to utilize this methodological taxonomy for classifying
algorithms for text classification. Furthermore, our study also conducts
empirical evaluation and experimental comparisons and rankings of different
algorithms that employ the same specific sub-technique, different
sub-techniques within the same technique, different techniques within the same
category, and categorie
Semi-supervised musical instrument recognition
The application areas of music information retrieval have been gaining popularity over the last decades. Musical instrument recognition is an example of a specific research topic in the field. In this thesis, semi-supervised learning techniques are explored in the context of musical instrument recognition. The conventional approaches employed for musical instrument recognition rely on annotated data, i.e., example recordings of the target instruments with associated information about the target labels, in order to perform training. This implies the highly laborious and tedious work of manually annotating the collected training data. The semi-supervised methods enable incorporating additional unannotated data into training. Such data consists of merely the recordings of the instruments and is therefore significantly easier to acquire. Hence, these methods allow keeping the overall development cost at the same level while notably improving the performance of a system.
The implemented musical instrument recognition system utilises the mixture model semi-supervised learning scheme in the form of two EM-based algorithms. Furthermore, upgraded versions are proposed, namely additional labelled data weighting and class-wise retraining, aimed at improved performance and convergence criteria for the particular classification scenario. The evaluation is performed on sets consisting of four and ten instruments and yields overall average recognition accuracy rates of 95.3% and 68.4%, respectively. These correspond to absolute gains of 6.1% and 9.7% compared to the initial, purely supervised cases. Additional experiments are conducted on the effects of the proposed modifications, as well as on the optimal relative labelled dataset size. In general, the obtained performance improvement is quite noteworthy, and future research directions suggest subsequently investigating the behaviour of the implemented algorithms along with the proposed and further extended approaches.
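The general idea of mixture-model semi-supervised learning with EM can be sketched as follows. This is a generic illustration under strong assumptions (one 1-D Gaussian per class, scalar features, made-up data) and is not a reconstruction of the two EM-based algorithms or of the weighting and retraining schemes from the thesis. The names `semi_supervised_em`, `classify`, and `gaussian_pdf` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labelled, unlabelled, n_classes, iters=50, min_var=1e-3):
    """Fit one Gaussian per class from labelled (x, class) pairs and unlabelled x values.

    Labelled points keep hard class memberships; unlabelled points receive
    soft responsibilities that are re-estimated in every E-step.
    """
    means, variances, priors = [], [], []
    # Initialise from the labelled data only (the purely supervised model).
    for k in range(n_classes):
        xs = [x for x, y in labelled if y == k]
        if not xs:
            raise ValueError("every class needs at least one labelled example")
        mu = sum(xs) / len(xs)
        means.append(mu)
        variances.append(max(sum((x - mu) ** 2 for x in xs) / len(xs), min_var))
        priors.append(len(xs) / len(labelled))

    points = [x for x, _ in labelled] + list(unlabelled)
    for _ in range(iters):
        # E-step: soft class responsibilities for the unlabelled points.
        resp = []
        for x in unlabelled:
            w = [priors[k] * gaussian_pdf(x, means[k], variances[k])
                 for k in range(n_classes)]
            total = sum(w)
            if total == 0.0:  # numerical underflow: fall back to uniform
                resp.append([1.0 / n_classes] * n_classes)
            else:
                resp.append([wi / total for wi in w])
        # M-step: weighted re-estimation over labelled and unlabelled points.
        for k in range(n_classes):
            weights = [1.0 if y == k else 0.0 for _, y in labelled] + [r[k] for r in resp]
            nk = sum(weights)
            means[k] = sum(w * x for w, x in zip(weights, points)) / nk
            variances[k] = max(
                sum(w * (x - means[k]) ** 2 for w, x in zip(weights, points)) / nk,
                min_var)
            priors[k] = nk / len(points)
    return means, variances, priors


def classify(x, means, variances, priors):
    """Return the index of the most probable class for x."""
    scores = [p * gaussian_pdf(x, m, v) for m, v, p in zip(means, variances, priors)]
    return scores.index(max(scores))


# Illustrative data only: two well-separated classes, few labels.
labelled = [(0.0, 0), (0.5, 0), (5.0, 1), (5.5, 1)]
unlabelled = [-0.2, 0.1, 0.3, 4.8, 5.2, 5.4, 0.4, 5.1]
model = semi_supervised_em(labelled, unlabelled, n_classes=2)
print(classify(0.2, *model), classify(5.0, *model))
```

In this setup the unlabelled points refine the class statistics beyond what the few labelled examples provide, which mirrors the motivation given in the abstract for using unannotated recordings.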
- …