32 research outputs found

    Multi-view Regularized Gaussian Processes

    Full text link
    Gaussian processes (GPs) have been proven to be powerful tools in various areas of machine learning. However, there are very few applications of GPs in the scenario of multi-view learning. In this paper, we present a new GP model for multi-view learning. Unlike existing methods, it combines multiple views by regularizing marginal likelihood with the consistency among the posterior distributions of latent functions from different views. Moreover, we give a general point selection scheme for multi-view learning and improve the proposed model by this criterion. Experimental results on multiple real world data sets have verified the effectiveness of the proposed model and witnessed the performance improvement through employing this novel point selection scheme

    Semi-supervised Multi-sensor Classification via Consensus-based Multi-View Maximum Entropy Discrimination

    Full text link
    In this paper, we consider multi-sensor classification when there is a large number of unlabeled samples. The problem is formulated under the multi-view learning framework and a Consensus-based Multi-View Maximum Entropy Discrimination (CMV-MED) algorithm is proposed. By iteratively maximizing the stochastic agreement between multiple classifiers on the unlabeled dataset, the algorithm simultaneously learns multiple high accuracy classifiers. We demonstrate that our proposed method can yield improved performance over previous multi-view learning approaches by comparing performance on three real multi-sensor data sets.Comment: 5 pages, 4 figures, Accepted in 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 15

    Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional Data

    Full text link
    There are threefold challenges in emotion recognition. First, it is difficult to recognize human's emotional states only considering a single modality. Second, it is expensive to manually annotate the emotional data. Third, emotional data often suffers from missing modalities due to unforeseeable sensor malfunction or configuration issues. In this paper, we address all these problems under a novel multi-view deep generative framework. Specifically, we propose to model the statistical relationships of multi-modality emotional data using multiple modality-specific generative networks with a shared latent space. By imposing a Gaussian mixture assumption on the posterior approximation of the shared latent variables, our framework can learn the joint deep representation from multiple modalities and evaluate the importance of each modality simultaneously. To solve the labeled-data-scarcity problem, we extend our multi-view model to semi-supervised learning scenario by casting the semi-supervised classification problem as a specialized missing data imputation task. To address the missing-modality problem, we further extend our semi-supervised multi-view model to deal with incomplete data, where a missing view is treated as a latent variable and integrated out during inference. This way, the proposed overall framework can utilize all available (both labeled and unlabeled, as well as both complete and incomplete) data to improve its generalization ability. The experiments conducted on two real multi-modal emotion datasets demonstrated the superiority of our framework.Comment: arXiv admin note: text overlap with arXiv:1704.07548, 2018 ACM Multimedia Conference (MM'18

    Probabilistic models for multi-view semi-supervised learning and coding

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 146-160).This thesis investigates the problem of classification from multiple noisy sensors or modalities. Examples include speech and gesture interfaces and multi-camera distributed sensor networks. Reliable recognition in such settings hinges upon the ability to learn accurate classification models in the face of limited supervision and to cope with the relatively large amount of potentially redundant information transmitted by each sensor or modality (i.e., view). We investigate and develop novel multi view learning algorithms capable of learning from semi-supervised noisy sensor data, for automatically adapting to new users and working conditions, and for performing distributed feature selection on bandwidth limited sensor networks. We propose probabilistic models built upon multi-view Gaussian Processes (GPs) for solving this class of problems, and demonstrate our approaches for solving audio-visual speech and gesture, and multi-view object classification problems. Multi-modal tasks are good candidates for multi-view learning, since each modality provides a potentially redundant view to the learning algorithm. On audio-visual speech unit classification, and user agreement recognition using spoken utterances and head gestures, we demonstrate that multi-modal co-training can be used to learn from only a few labeled examples in one or both of the audio-visual modalities. We also propose a co-adaptation algorithm, which adapts existing audio-visual classifiers to a particular user or noise condition by leveraging the redundancy in the unlabeled data. Existing methods typically assume constant per-channel noise models.(cont.) In contrast we develop co-training algorithms that are able to learn from noisy sensor data corrupted by complex per-sample noise processes, e.g., occlusion common to multi sensor classification problems. We propose a probabilistic heteroscedastic approach to co-training that simultaneously discovers the amount of noise on a per-sample basis, while solving the classification task. This results in accurate performance in the presence of occlusion or other complex noise processes. We also investigate an extension of this idea for supervised multi-view learning where we develop a Bayesian multiple kernel learning algorithm that can learn a local weighting over each view of the input space. We additionally consider the problem of distributed object recognition or indexing from multiple cameras, where the computational power available at each camera sensor is limited and communication between cameras is prohibitively expensive. In this scenario, it is desirable to avoid sending redundant visual features from multiple views. Traditional supervised feature selection approaches are inapplicable as the class label is unknown at each camera. In this thesis, we propose an unsupervised multi-view feature selection algorithm based on a distributed coding approach. With our method, a Gaussian Process model of the joint view statistics is used at the receiver to obtain a joint encoding of the views without directly sharing information across encoders. We demonstrate our approach on recognition and indexing tasks with multi-view image databases and show that our method compares favorably to an independent encoding of the features from each camera.by C. Mario Christoudias.Ph.D

    Learning multiple views with orthogonal denoising autoencoders

    Get PDF
    Multi-view learning techniques are necessary when data is described by multiple distinct feature sets because single-view learning algorithms tend to overt on these high-dimensional data. Prior successful approaches followed either consensus or complementary principles. Recent work has focused on learning both the shared and private latent spaces of views in order to take advantage of both principles. However, these methods can not ensure that the latent spaces are strictly independent through encouraging the orthogonality in their objective functions. Also little work has explored representation learning techniques for multiview learning. In this paper, we use the denoising autoencoder to learn shared and private latent spaces, with orthogonal constraints | disconnecting every private latent space from the remaining views. Instead of computationally expensive optimization, we adapt the backpropagation algorithm to train our model
    corecore