258,599 research outputs found

    Feature Partitioning for the Co-Traning Setting

    Get PDF
    Supervised learning algorithms rely on availability of labeled data. Labeled data is either scarce or involves substantial human effort in the labeling process. These two factors, along with the abundance of unlabeled data, have spurred research initiatives that exploit unlabeled data to boost supervised learning. This genre of learning algorithms that utilize unlabeled data alongside a small set of labeled data are known as semi-supervised learning algorithms. Data characteristics, such as the presence of a generative model, provide the foundation for applying these learning algorithms. Co-training is one such al gorithm that leverages existence of two redundant views for a data instance. Based on these two views, the co-training algorithm trains two classifiers using the labeled data. The small set of labeled data results in a pair of weak classi fiers. With the help of the unlabeled data the two classifiers alternately boost each other to achieve a high-accuracy classifier. The conditions imposed by the co-training algorithm regarding the data characteristics restrict its application to data that possesses a natural split of the feature set. In this thesis we study the co-training setting and propose to overcome the above mentioned constraint by manufacturing feature splits. We pose and investigate the following questions: 1 . Can a feature split be constructed for a dataset such that the co-training algorithm can be applied to it? 2. If a feature split can be engineered, would splitting the features into more than two partitions give a better classifier? In essence, does moving from co-training (2 classifiers) to k-training (k-classifiers) help? 3. Is there an optimal number of views for a dataset such that k-training leads to an optimal classifier? The task of obtaining feature splits is approached by modeling the problem as a graph partitioning problem. Experiments are conducted on a breadth of text datasets. Results of k-training using constructed feature sets are compared with that of the expectation-maximization algorithm, which has been successful in a semi-supervised setting

    CD-CNN: A Partially Supervised Cross-Domain Deep Learning Model for Urban Resident Recognition

    Full text link
    Driven by the wave of urbanization in recent decades, the research topic about migrant behavior analysis draws great attention from both academia and the government. Nevertheless, subject to the cost of data collection and the lack of modeling methods, most of existing studies use only questionnaire surveys with sparse samples and non-individual level statistical data to achieve coarse-grained studies of migrant behaviors. In this paper, a partially supervised cross-domain deep learning model named CD-CNN is proposed for migrant/native recognition using mobile phone signaling data as behavioral features and questionnaire survey data as incomplete labels. Specifically, CD-CNN features in decomposing the mobile data into location domain and communication domain, and adopts a joint learning framework that combines two convolutional neural networks with a feature balancing scheme. Moreover, CD-CNN employs a three-step algorithm for training, in which the co-training step is of great value to partially supervised cross-domain learning. Comparative experiments on the city Wuxi demonstrate the high predictive power of CD-CNN. Two interesting applications further highlight the ability of CD-CNN for in-depth migrant behavioral analysis.Comment: 8 pages, 5 figures, conferenc

    Cooperative Learning and its Application to Emotion Recognition from Speech

    Get PDF
    In this paper, we propose a novel method for highly efficient exploitation of unlabeled data-Cooperative Learning. Our approach consists of combining Active Learning and Semi-Supervised Learning techniques, with the aim of reducing the costly effects of human annotation. The core underlying idea of Cooperative Learning is to share the labeling work between human and machine efficiently in such a way that instances predicted with insufficient confidence value are subject to human labeling, and those with high confidence values are machine labeled. We conducted various test runs on two emotion recognition tasks with a variable number of initial supervised training instances and two different feature sets. The results show that Cooperative Learning consistently outperforms individual Active and Semi-Supervised Learning techniques in all test cases. In particular, we show that our method based on the combination of Active Learning and Co-Training leads to the same performance of a model trained on the whole training set, but using 75% fewer labeled instances. Therefore, our method efficiently and robustly reduces the need for human annotations
    • …
    corecore