Feature Partitioning for the Co-Training Setting
Supervised learning algorithms rely on the availability of labeled data. Labeled data is either scarce or involves substantial human effort in the labeling process. These two factors, along with the abundance of unlabeled data, have spurred research initiatives that exploit unlabeled data to boost supervised learning. This genre of learning algorithms, which utilize unlabeled data alongside a small set of labeled data, is known as semi-supervised learning. Data characteristics, such as the presence of a generative model, provide the foundation for applying these learning algorithms. Co-training is one such algorithm that leverages the existence of two redundant views of a data instance. Based on these two views, the co-training algorithm trains two classifiers using the labeled data. The small set of labeled data results in a pair of weak classifiers. With the help of the unlabeled data, the two classifiers alternately boost each other to achieve a high-accuracy classifier. The conditions the co-training algorithm imposes on the data restrict its application to datasets that possess a natural split of the feature set. In this thesis we study the co-training setting and propose to overcome the above-mentioned constraint by manufacturing feature splits. We pose and investigate the following questions: 1. Can a feature split be constructed for a dataset such that the co-training algorithm can be applied to it? 2. If a feature split can be engineered, would splitting the features into more than two partitions give a better classifier? In essence, does moving from co-training (2 classifiers) to k-training (k classifiers) help? 3. Is there an optimal number of views for a dataset such that k-training leads to an optimal classifier? The task of obtaining feature splits is approached by modeling the problem as a graph partitioning problem. Experiments are conducted on a breadth of text datasets.
Results of k-training using the constructed feature sets are compared with those of the expectation-maximization algorithm, which has been successful in semi-supervised settings.
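The two-view loop described in the abstract can be illustrated with a minimal, self-contained sketch. The nearest-centroid learners, the synthetic two-view data, and the margin-based confidence below are toy stand-ins chosen for brevity, not the thesis's actual classifiers or datasets; each classifier labels the unlabeled instances it is most confident about and adds them to the shared labeled pool for the other:

```python
class CentroidClassifier:
    """Toy nearest-centroid classifier standing in for each view's learner."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, lab in zip(X, y) if lab == label]
            dim = len(pts[0])
            self.centroids[label] = [sum(p[d] for p in pts) / len(pts)
                                     for d in range(dim)]
        return self

    def predict_with_margin(self, x):
        # Squared distance to each centroid; margin = runner-up minus best,
        # used here as a crude confidence score.
        dists = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
                 for lab, c in self.centroids.items()}
        best = min(dists, key=dists.get)
        rest = [d for lab, d in dists.items() if lab != best]
        margin = (min(rest) - dists[best]) if rest else 0.0
        return best, margin


def co_train(view1, view2, labeled_idx, labels, rounds=3, per_round=2):
    """view1/view2: all instances under each view; labeled_idx/labels: seed set."""
    L = dict(zip(labeled_idx, labels))
    U = [i for i in range(len(view1)) if i not in L]
    for _ in range(rounds):
        idx = list(L)
        c1 = CentroidClassifier().fit([view1[i] for i in idx], [L[i] for i in idx])
        c2 = CentroidClassifier().fit([view2[i] for i in idx], [L[i] for i in idx])
        # Each classifier promotes its most confidently labeled unlabeled points.
        for clf, view in ((c1, view1), (c2, view2)):
            ranked = sorted(U, key=lambda i: -clf.predict_with_margin(view[i])[1])
            for i in ranked[:per_round]:
                L[i] = clf.predict_with_margin(view[i])[0]
                U.remove(i)
    idx = list(L)
    return CentroidClassifier().fit([view1[i] for i in idx], [L[i] for i in idx])


# Synthetic two-view data: indices 0-2 are class 0, indices 3-5 are class 1.
view1 = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
view2 = [(0, 1), (0, 0), (1, 1), (6, 6), (5, 5), (6, 5)]
clf = co_train(view1, view2, labeled_idx=[0, 3], labels=[0, 1])
```

Generalizing this loop from two views to k views is exactly the "k-training" question the thesis raises: the same confident-promotion step runs once per view instead of twice.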
CD-CNN: A Partially Supervised Cross-Domain Deep Learning Model for Urban Resident Recognition
Driven by the wave of urbanization in recent decades, the research topic
about migrant behavior analysis draws great attention from both academia and
the government. Nevertheless, subject to the cost of data collection and the
lack of modeling methods, most of existing studies use only questionnaire
surveys with sparse samples and non-individual level statistical data to
achieve coarse-grained studies of migrant behaviors. In this paper, a partially
supervised cross-domain deep learning model named CD-CNN is proposed for
migrant/native recognition using mobile phone signaling data as behavioral
features and questionnaire survey data as incomplete labels. Specifically,
CD-CNN features in decomposing the mobile data into location domain and
communication domain, and adopts a joint learning framework that combines two
convolutional neural networks with a feature balancing scheme. Moreover, CD-CNN
employs a three-step algorithm for training, in which the co-training step is
of great value to partially supervised cross-domain learning. Comparative
experiments on the city Wuxi demonstrate the high predictive power of CD-CNN.
Two interesting applications further highlight the ability of CD-CNN for
in-depth migrant behavioral analysis. Comment: 8 pages, 5 figures, conference
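The dual-branch fusion the abstract outlines can be sketched roughly as follows. This is an assumption-laden illustration: single linear-plus-ReLU layers stand in for the two CNNs, the feature balancing scheme is assumed here to be per-branch L2 normalization (the abstract does not specify it), and all weights are made up for the example:

```python
import math

def branch(x, weights):
    """One domain branch: a single linear layer + ReLU stands in for a CNN."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def balance(feat):
    """Assumed feature balancing: L2-normalize each branch's output so that
    neither domain's features dominate the fused representation."""
    norm = math.sqrt(sum(v * v for v in feat)) or 1.0
    return [v / norm for v in feat]

def joint_forward(loc_x, com_x, loc_w, com_w, head_w):
    # Balance each domain's features, concatenate, then score with a
    # logistic head to get a migrant/native probability.
    fused = balance(branch(loc_x, loc_w)) + balance(branch(com_x, com_w))
    score = sum(w * f for w, f in zip(head_w, fused))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical inputs and weights, purely for illustration.
p = joint_forward(
    loc_x=[1.0, 2.0], com_x=[0.5, -1.0],
    loc_w=[[0.3, 0.1], [0.2, 0.4]],
    com_w=[[0.5, 0.2], [-0.1, 0.3]],
    head_w=[0.4, -0.2, 0.1, 0.3],
)
```

The normalization step is what makes the fusion "balanced": without it, whichever domain produces larger activations would dominate the head's score.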
Cooperative Learning and its Application to Emotion Recognition from Speech
In this paper, we propose a novel method for highly efficient exploitation of unlabeled data: Cooperative Learning. Our approach combines Active Learning and Semi-Supervised Learning techniques, with the aim of reducing the costly effects of human annotation. The core underlying idea of Cooperative Learning is to share the labeling work between human and machine efficiently, in such a way that instances predicted with insufficient confidence are subject to human labeling, while those with high confidence are machine-labeled. We conducted various test runs on two emotion recognition tasks with a variable number of initial supervised training instances and two different feature sets. The results show that Cooperative Learning consistently outperforms individual Active and Semi-Supervised Learning techniques in all test cases. In particular, we show that our method based on the combination of Active Learning and Co-Training reaches the same performance as a model trained on the whole training set, but using 75% fewer labeled instances. Therefore, our method efficiently and robustly reduces the need for human annotations.
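The confidence-based routing at the heart of this idea can be sketched in a few lines. The logistic scorer, the threshold value, and the simulated oracle below are all illustrative assumptions, not the paper's actual models: confident predictions are machine-labeled (the semi-supervised half), and uncertain ones are routed to a human annotator (the active-learning half):

```python
import math

def predict_proba(x, boundary=0.5):
    """Toy scorer: logistic confidence that 1-D instance x belongs to class 1."""
    return 1.0 / (1.0 + math.exp(-(x - boundary) * 4.0))

def cooperative_split(unlabeled, oracle, threshold=0.8):
    """Share the labeling work: machine-label confident instances,
    send uncertain ones to the human oracle."""
    machine, human = [], []
    for x in unlabeled:
        p = predict_proba(x)
        conf = max(p, 1.0 - p)          # confidence in the predicted class
        if conf >= threshold:
            machine.append((x, int(p >= 0.5)))   # machine-labeled
        else:
            human.append((x, oracle(x)))         # human-labeled
    return machine, human

# Instances far from the boundary are machine-labeled; those near it go
# to the (here simulated) human annotator.
machine, human = cooperative_split(
    unlabeled=[0.0, 0.45, 0.55, 1.0],
    oracle=lambda x: int(x >= 0.5),
)
```

In a full loop, both the machine-labeled and human-labeled pairs would be added to the training set and the scorer retrained, shrinking the human workload each round.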