459 research outputs found
Unsupervised feature learning with discriminative encoder
In recent years, deep discriminative models have achieved extraordinary
performance on supervised learning tasks, significantly outperforming their
generative counterparts. However, their success relies on the presence of a
large amount of labeled data. How can one use the same discriminative models
for learning useful features in the absence of labels? We address this question
in this paper, by jointly modeling the distribution of data and latent features
in a manner that explicitly assigns zero probability to unobserved data. Rather
than maximizing the marginal probability of observed data, we maximize the
joint probability of the data and the latent features using a two-step EM-like
procedure. To prevent the model from overfitting to our initial selection of
latent features, we use adversarial regularization. Depending on the task, we
allow the latent features to be one-hot or real-valued vectors and define a
suitable prior on the features. For instance, one-hot features correspond to
class labels and are directly used for the unsupervised and semi-supervised
classification task, whereas real-valued feature vectors are fed as input to
simple classifiers for auxiliary supervised discrimination tasks. The proposed
model, which we dub discriminative encoder (or DisCoder), is flexible in the
type of latent features that it can capture. The proposed model achieves
state-of-the-art performance on several challenging tasks.
Comment: 10 pages, 4 figures, International Conference on Data Mining, 201
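The two-step, EM-like alternation the abstract describes can be illustrated with a deliberately simplified sketch: step 1 assigns each data point its most likely one-hot latent feature, and step 2 refits the model to those assignments. Here the "model" is just a set of 1-D centroids and the adversarial regularizer is omitted; the function name and toy setup are illustrative assumptions, not the paper's implementation.

```python
import random

def discriminative_encoder_em(points, k, iters=20, seed=0):
    """Toy alternation in the spirit of the two-step EM-like procedure:
    step 1 (E-like) assigns each point a one-hot latent feature (here:
    index of the nearest centroid); step 2 (M-like) refits the model
    parameters (here: the centroids) to the assigned features.  The
    paper's adversarial regularization is omitted in this sketch."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Step 1: most likely one-hot latent feature per point.
        labels = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
                  for p in points]
        # Step 2: update parameters given the selected features.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels
```

On well-separated 1-D data this alternation recovers the two groups; the real model replaces centroids with a deep discriminative encoder trained on the assigned features.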
A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics
The combination of multiple classifiers using ensemble methods is
increasingly important for making progress in a variety of difficult prediction
problems. We present a comparative analysis of several ensemble methods through
two case studies in genomics, namely the prediction of genetic interactions and
protein functions, to demonstrate their efficacy on real-world datasets and
draw useful conclusions about their behavior. These methods include simple
aggregation, meta-learning, cluster-based meta-learning, and ensemble selection
using heterogeneous classifiers trained on resampled data to improve the
diversity of their predictions. We present a detailed analysis of these methods
across four genomics datasets and find that the best of these methods offer
statistically significant improvements over the state of the art in their
respective domains. In addition, we establish a novel connection between
ensemble selection and meta-learning, demonstrating how both of these disparate
methods establish a balance between ensemble diversity and performance.
Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Mining
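Two of the combination schemes the abstract names can be sketched in a few lines: simple aggregation as a majority vote over base-classifier outputs, and ensemble selection as a greedy, Caruana-style loop that repeatedly adds whichever model most improves validation accuracy of the voted ensemble. The function names and the with-replacement greedy rule are illustrative assumptions, not the paper's exact setup.

```python
from collections import Counter

def majority_vote(predictions):
    """Simple aggregation: combine base-classifier outputs by majority vote."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

def ensemble_selection(model_preds, y_val, rounds=3):
    """Greedy, Caruana-style ensemble selection (a sketch): repeatedly add,
    with replacement, the base model whose inclusion maximizes the voted
    ensemble's accuracy on the validation labels y_val."""
    def acc(preds):
        voted = majority_vote(preds)
        return sum(v == y for v, y in zip(voted, y_val)) / len(y_val)
    ensemble = []
    for _ in range(rounds):
        best = max(model_preds, key=lambda p: acc(ensemble + [p]))
        ensemble.append(best)
    return ensemble, acc(ensemble)
```

Meta-learning (stacking) differs from both: instead of voting, the base predictions become input features to a second-level classifier.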
To go deep or wide in learning?
To achieve acceptable performance for AI tasks, one can either use
sophisticated feature extraction methods as the first layer in a two-layered
supervised learning model, or learn the features directly using a deep
(multi-layered) model. While the first approach is very problem-specific, the
second approach incurs computational overhead in learning multiple layers and
fine-tuning the model. In this paper, we propose an approach called wide
learning, based on arc-cosine kernels, that learns a single layer of infinite
width. We propose exact and inexact learning strategies for wide learning and
show that wide learning with a single layer outperforms both single-layer and
deep architectures of finite width on some benchmark datasets.
Comment: 9 pages, 1 figure, Accepted for publication in Seventeenth International Conference on Artificial Intelligence and Statistics
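The arc-cosine kernel underlying this "wide" layer has a closed form due to Cho and Saul (2009); at degree 1 it is k(x, y) = (1/pi) * ||x|| * ||y|| * (sin t + (pi - t) cos t) with t the angle between x and y, and it corresponds to an infinitely wide single layer of ReLU units. The sketch below implements that published formula; the function name is ours, and the abstract's exact and inexact learning strategies are not shown.

```python
import math

def arc_cosine_kernel(x, y):
    """Degree-1 arc-cosine kernel (Cho & Saul, 2009): the inner product
    induced by an infinitely wide single layer of ReLU units, i.e. the
    kind of 'wide' single layer the abstract refers to."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(a * a for a in y))
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp the cosine for numerical safety before acos.
    cos_t = max(-1.0, min(1.0, dot / (nx * ny)))
    theta = math.acos(cos_t)
    return (nx * ny / math.pi) * (math.sin(theta) + (math.pi - theta) * cos_t)
```

Two sanity checks follow from the formula: k(x, x) = ||x||^2 (theta = 0), and for orthogonal inputs k(x, y) = ||x|| * ||y|| / pi.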
Learning to segment with image-level supervision
Deep convolutional networks have achieved the state-of-the-art for semantic
image segmentation tasks. However, training these networks requires access to
densely labeled images, which are known to be very expensive to obtain. On the
other hand, the web provides an almost unlimited source of images annotated at
the image level. How can one utilize this much larger weakly annotated set for
tasks that require dense labeling? Prior work often relied on localization
cues, such as saliency maps, objectness priors, bounding boxes etc., to address
this challenging problem. In this paper, we propose a model that generates
auxiliary labels for each image, while simultaneously forcing the output of the
CNN to satisfy the mean-field constraints imposed by a conditional random
field. We show that one can enforce the CRF constraints by forcing the
distribution at each pixel to be close to the distribution of its neighbors.
This is in stark contrast with methods that compute a recursive expansion of
the mean-field distribution using a recurrent architecture and train the
resultant distribution. Instead, the proposed model adds an extra loss term to
the output of the CNN, and hence, is faster than recursive implementations. We
achieve the state-of-the-art for weakly supervised semantic image segmentation
on the VOC 2012 dataset, assuming no manually labeled pixel-level information
is available. Furthermore, the incorporation of conditional random fields in
the CNN incurs little extra time during training.
Comment: Published in WACV 201
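The extra loss term the abstract describes, forcing each pixel's distribution to be close to that of its neighbors, can be sketched as a neighbor-consistency penalty on a grid of per-pixel class distributions. The KL form and 4-connected neighborhood below are our assumptions for illustration; the paper's exact CRF potentials may differ.

```python
import math

def neighbor_consistency_loss(probs):
    """Sketch of a mean-field-style consistency loss: for each pixel,
    penalize the KL divergence between its class distribution and the
    average distribution of its 4-connected neighbors.  `probs[i][j]` is
    a list of class probabilities for pixel (i, j)."""
    h, w = len(probs), len(probs[0])
    loss = 0.0
    for i in range(h):
        for j in range(w):
            nbrs = [probs[a][b] for a, b in
                    ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < h and 0 <= b < w]
            k = len(probs[i][j])
            avg = [sum(n[c] for n in nbrs) / len(nbrs) for c in range(k)]
            # KL(pixel || neighbor average); zero-probability terms skipped.
            loss += sum(p * math.log(p / q)
                        for p, q in zip(probs[i][j], avg) if p > 0 and q > 0)
    return loss / (h * w)
```

Because this is just an additional differentiable loss on the CNN output, it avoids the recurrent mean-field unrolling used by other methods, which is the speed advantage the abstract claims.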