397 research outputs found

    Improving Deep Representation Learning with Complex and Multimodal Data.

    Representation learning has emerged as a way to learn meaningful representations from data and has led to breakthroughs in many applications, including visual object recognition, speech recognition, and text understanding. However, learning representations from complex high-dimensional sensory data is challenging because there are many irrelevant factors of variation (e.g., data transformation, random noise). On the other hand, to build an end-to-end prediction system for structured output variables, one needs to incorporate probabilistic inference to properly model a mapping from a single input to possible configurations of output variables. This thesis addresses limitations of current representation learning in two parts. The first part discusses efficient learning algorithms for invariant representations based on restricted Boltzmann machines (RBMs). Pointing out the difficulty of learning, we develop an efficient initialization method for sparse and convolutional RBMs. On top of that, we develop variants of the RBM that learn representations invariant to data transformations such as translation, rotation, or scale variation by pooling the filter responses of input data after a transformation, or to irrelevant patterns such as random or structured noise, by jointly performing feature selection and feature learning. We demonstrate improved performance on visual object recognition and weakly supervised foreground object segmentation. The second part discusses conditional graphical models and learning frameworks for structured output variables using deep generative models as priors. For example, we combine the best properties of the CRF and the RBM to enforce both local and global (e.g., object shape) consistencies for visual object segmentation. Furthermore, we develop a deep conditional generative model of structured output variables, which is an end-to-end system trainable by backpropagation. We demonstrate the importance of global prior and probabilistic inference for visual object segmentation. Second, we develop a novel multimodal learning framework by casting the problem into structured output representation learning problems, where the output is one data modality to be predicted from the other modalities, and vice versa. We explain how our method could be more effective than maximum likelihood learning and demonstrate state-of-the-art performance on visual-text and visual-only recognition tasks.
    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113549/1/kihyuks_1.pd
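    The pooling idea mentioned in the first part of this abstract can be illustrated with a minimal sketch. It is not the thesis's RBM model: it assumes a single fixed linear filter `w` and uses circular shifts of a 1-D signal as the set of transformations, and the names `shift`, `response`, and `pooled_response` are hypothetical. Max-pooling the filter response over every shifted copy of the input gives a feature that does not change when the input itself is shifted.

```python
def shift(x, k):
    """Circularly shift the list x to the right by k positions."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


def response(w, x):
    """Linear filter response: dot product of filter w and signal x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pooled_response(w, x):
    """Max-pool the filter response over all circular shifts of x.

    Assumption for illustration: the transformation set is the full
    group of circular shifts, so the pooled value is shift-invariant.
    """
    return max(response(w, shift(x, k)) for k in range(len(x)))


w = [1.0, -1.0, 0.0, 0.0]   # hypothetical edge-like filter
x = [0.0, 2.0, 5.0, 1.0]    # hypothetical input signal

# The raw response depends on where the pattern sits in the signal,
# but the pooled response is the same for every shifted input.
raw = [response(w, shift(x, k)) for k in range(len(x))]
pooled = [pooled_response(w, shift(x, k)) for k in range(len(x))]
```

    Here `raw` varies with the shift while every entry of `pooled` is identical, which is the sense in which pooling over transformed inputs yields an invariant feature.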

    On discriminative semi-supervised incremental learning with a multi-view perspective for image concept modeling

    This dissertation presents the development of a semi-supervised incremental learning framework with a multi-view perspective for image concept modeling. For reliable image concept characterization, having a large number of labeled images is crucial. However, the size of the training set is often limited due to the cost required for generating concept labels associated with objects in a large quantity of images. To address this issue, in this research, we propose to incrementally incorporate unlabeled samples into a learning process to enhance concept models originally learned with a small number of labeled samples. To tackle the sub-optimality problem of conventional techniques, the proposed incremental learning framework selects unlabeled samples based on an expected error reduction function that measures contributions of the unlabeled samples based on their ability to increase the modeling accuracy. To improve the convergence property of the proposed incremental learning framework, we further propose a multi-view learning approach that makes use of multiple features such as color, texture, etc., of images when including unlabeled samples. For robustness to mismatches between training and testing conditions, a discriminative learning algorithm, namely a kernelized maximal-figure-of-merit (kMFoM) learning approach, is also developed. Combining individual techniques, we conduct a set of experiments on various image concept modeling problems, such as handwritten digit recognition, object recognition, and image spam detection to highlight the effectiveness of the proposed framework.
    PhD. Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Lee, Hsien-Hsin; Committee Member: McClellan, James; Committee Member: Yuan, Min

    Learning with Single View Co-training and Marginalized Dropout

    The generalization properties of most existing machine learning techniques are predicated on the assumptions that 1) a sufficiently large quantity of training data is available; 2) the training and testing data come from some common distribution. Although these assumptions are often met in practice, there are also many scenarios in which training data from the relevant distribution is insufficient. We focus on making use of additional data, which is readily available or can be obtained easily but comes from a different distribution than the testing data, to aid learning. We present five learning scenarios, depending on how the distribution we used to sample the additional training data differs from the testing distribution: 1) learning with weak supervision; 2) domain adaptation; 3) learning from multiple domains; 4) learning from corrupted data; 5) learning with partial supervision. We introduce two strategies and manifest them in five ways to cope with the difference between the training and testing distributions. The first strategy, which gives rise to Pseudo Multi-view Co-training (PMC) and Co-training for Domain Adaptation (CODA), is inspired by the co-training algorithm for multi-view data. PMC generalizes co-training to the more common single-view data and allows us to learn from weakly labeled data retrieved for free from the web. CODA integrates PMC with an additional feature selection component to address the feature incompatibility between domains for domain adaptation. PMC and CODA are evaluated on a variety of real datasets, and both yield record performance. The second strategy, marginalized dropout, leads to marginalized Stacked Denoising Autoencoders (mSDA), Marginalized Corrupted Features (MCF), and FastTag. mSDA diminishes the difference between distributions associated with different domains by learning a new representation through marginalized corruption and reconstruction. MCF learns from a known distribution, which is created by corrupting a small set of training data, and improves the robustness of learned classifiers by training on "infinitely" many data sampled from the distribution. FastTag applies marginalized dropout to the output of partially labeled data to recover missing labels for multi-label tasks. These three algorithms not only achieve state-of-the-art performance in various tasks, but also deliver orders-of-magnitude speedups in training and testing compared to competing algorithms.
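    The idea of training on "infinitely" many corrupted samples can be made concrete with a small sketch. It is an illustration under stated assumptions, not the thesis's implementation: it assumes a squared loss, a linear predictor, and unscaled blankout noise in which each feature is independently set to zero with probability `p`. The function names are hypothetical. Under these assumptions the expected loss over all corruptions has a closed form (squared bias of the mean prediction plus the prediction variance), which the sketch checks against an exhaustive enumeration of every corruption mask.

```python
from itertools import product


def expected_sq_loss(w, x, y, p):
    """Closed-form E[(w . x_corrupt - y)^2] under blankout noise.

    Assumption: each feature x_d is zeroed independently with
    probability p (and kept unchanged otherwise), so
    E[x_corrupt_d] = (1 - p) * x_d and
    Var[x_corrupt_d] = p * (1 - p) * x_d ** 2.
    """
    mean_pred = sum(wd * (1 - p) * xd for wd, xd in zip(w, x))
    var_pred = sum(wd ** 2 * p * (1 - p) * xd ** 2 for wd, xd in zip(w, x))
    return (mean_pred - y) ** 2 + var_pred


def enumerated_sq_loss(w, x, y, p):
    """Same expectation, computed by summing over all 2^D keep/drop masks."""
    total = 0.0
    for mask in product([0, 1], repeat=len(x)):
        prob = 1.0
        for m in mask:
            prob *= (1 - p) if m else p   # m == 1 means the feature is kept
        pred = sum(wd * m * xd for wd, m, xd in zip(w, mask, x))
        total += prob * (pred - y) ** 2
    return total


# Hypothetical toy example: two features, dropout rate 0.5.
closed = expected_sq_loss([1.0, 1.0], [1.0, 2.0], 3.0, 0.5)
brute = enumerated_sq_loss([1.0, 1.0], [1.0, 2.0], 3.0, 0.5)
```

    The closed form costs time linear in the number of features, whereas explicit enumeration (or sampling) grows with the number of corrupted copies, which is one way to read the speedup claimed for marginalized approaches.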

    Contribution to supervised representation learning: algorithms and applications.

    278 p.
    In this thesis, we focus on supervised learning methods for pattern categorization. In this context, it remains a major challenge to establish efficient relationships between the discriminant properties of the extracted features and the inter-class sparsity structure. Our first attempt to address this problem was to develop a method called "Robust Discriminant Analysis with Feature Selection and Inter-class Sparsity" (RDA_FSIS). This method performs feature selection and extraction simultaneously. The targeted projection transformation focuses on the most discriminative original features while guaranteeing that the extracted (or transformed) features belonging to the same class share a common sparse structure, which contributes to small intra-class distances. In a further study on this approach, some improvements have been introduced in terms of the optimization criterion and the applied optimization process. In fact, we proposed an improved version of the original RDA_FSIS called "Enhanced Discriminant Analysis with Class Sparsity using Gradient Method" (EDA_CS). The basic improvement is twofold: on the one hand, in the alternating optimization, we update the linear transformation and tune it with the gradient descent method, resulting in a more efficient and less complex solution than the closed form adopted in RDA_FSIS. On the other hand, the method could be used as a fine-tuning technique for many feature extraction methods. The main feature of this approach lies in the fact that it is a gradient descent based refinement applied to a closed form solution. This makes it suitable for combining several extraction methods and can thus improve the performance of the classification process. In accordance with the above methods, we proposed a hybrid linear feature extraction scheme called "feature extraction using gradient descent with hybrid initialization" (FE_GD_HI). This method, based on a unified criterion, was able to take advantage of several powerful linear discriminant methods. The linear transformation is computed using a gradient descent method. The strength of this approach is that it is generic in the sense that it allows fine tuning of the hybrid solution provided by different methods. Finally, we proposed a new efficient ensemble learning approach that aims to estimate an improved data representation. The proposed method is called "ICS Based Ensemble Learning for Image Classification" (EM_ICS). Instead of using multiple classifiers on the transformed features, we aim to estimate multiple extracted feature subsets. These were obtained by multiple learned linear embeddings. Multiple feature subsets were used to estimate the transformations, which were ranked using multiple feature selection techniques. The derived extracted feature subsets were concatenated into a single data representation vector with strong discriminative properties. Experiments conducted on various benchmark datasets ranging from face images, handwritten digit images, and object images to text datasets showed promising results that outperformed the existing state-of-the-art and competing methods.

    Kernel Methods for Machine Learning with Life Science Applications


    Apprentissage de dictionnaires pour la reconnaissance de motifs en imagerie médicale (Dictionary learning for pattern recognition in medical imaging)

    Most natural signals can be approximated by a linear combination of a few atoms in a dictionary. Such sparse representations of signals and dictionary learning (DL) methods have received special attention over the past few years. While standard DL approaches are effective in applications such as image denoising or compression, several discriminative DL methods have been proposed to achieve better image classification. In this thesis, we show that the dictionary size for each class is an important factor in pattern recognition applications where variability differs between classes, for both standard and discriminative DL methods. We validated the proposal of using different dictionary sizes based on the complexity of the class data in a computer vision application, lip detection in face images, followed by a more complex medical imaging application, the classification of multiple sclerosis (MS) lesions using MR images. The class-specific dictionaries are learned for the lesions and individual healthy brain tissues, and the size of the dictionary for each class is adapted according to the complexity of the underlying data. The algorithm is validated using 52 multi-sequence MR images acquired from 13 MS patients.