86,106 research outputs found
PLUME: Polyhedral Learning Using Mixture of Experts
In this paper, we propose a novel mixture of expert architecture for learning
polyhedral classifiers. We learn the parameters of the classifierusing an
expectation maximization algorithm. Wederive the generalization bounds of the
proposedapproach. Through an extensive simulation study, we show that the
proposed method performs comparably to other state-of-the-art approaches
Nesti-Net: Normal Estimation for Unstructured 3D Point Clouds using Convolutional Neural Networks
In this paper, we propose a normal estimation method for unstructured 3D
point clouds. This method, called Nesti-Net, builds on a new local point cloud
representation which consists of multi-scale point statistics (MuPS), estimated
on a local coarse Gaussian grid. This representation is a suitable input to a
CNN architecture. The normals are estimated using a mixture-of-experts (MoE)
architecture, which relies on a data-driven approach for selecting the optimal
scale around each point and encourages sub-network specialization. Interesting
insights into the network's resource distribution are provided. The scale
prediction significantly improves robustness to different noise levels, point
density variations and different levels of detail. We achieve state-of-the-art
results on a benchmark synthetic dataset and present qualitative results on
real scanned scenes.Comment: Code will be available after publication. Figure quality reduced to
fit size requirement. Higher quality images will be available in the final
pape
Learning Gating ConvNet for Two-Stream based Methods in Action Recognition
For the two-stream style methods in action recognition, fusing the two
streams' predictions is always by the weighted averaging scheme. This fusion
method with fixed weights lacks of pertinence to different action videos and
always needs trial and error on the validation set. In order to enhance the
adaptability of two-stream ConvNets and improve its performance, an end-to-end
trainable gated fusion method, namely gating ConvNet, for the two-stream
ConvNets is proposed in this paper based on the MoE (Mixture of Experts)
theory. The gating ConvNet takes the combination of feature maps from the same
layer of the spatial and the temporal nets as input and adopts ReLU (Rectified
Linear Unit) as the gating output activation function. To reduce the
over-fitting of gating ConvNet caused by the redundancy of parameters, a new
multi-task learning method is designed, which jointly learns the gating fusion
weights for the two streams and learns the gating ConvNet for action
classification. With our gated fusion method and multi-task learning approach,
a high accuracy of 94.5% is achieved on the dataset UCF101.Comment: 8 pages, 4 figure
Making Tree Ensembles Interpretable
Tree ensembles, such as random forest and boosted trees, are renowned for
their high prediction performance, whereas their interpretability is critically
limited. In this paper, we propose a post processing method that improves the
model interpretability of tree ensembles. After learning a complex tree
ensembles in a standard way, we approximate it by a simpler model that is
interpretable for human. To obtain the simpler model, we derive the EM
algorithm minimizing the KL divergence from the complex ensemble. A synthetic
experiment showed that a complicated tree ensemble was approximated reasonably
as interpretable.Comment: presented at 2016 ICML Workshop on Human Interpretability in Machine
Learning (WHI 2016), New York, N
Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments
Object detection is an essential task for autonomous robots operating in
dynamic and changing environments. A robot should be able to detect objects in
the presence of sensor noise that can be induced by changing lighting
conditions for cameras and false depth readings for range sensors, especially
RGB-D cameras. To tackle these challenges, we propose a novel adaptive fusion
approach for object detection that learns weighting the predictions of
different sensor modalities in an online manner. Our approach is based on a
mixture of convolutional neural network (CNN) experts and incorporates multiple
modalities including appearance, depth and motion. We test our method in
extensive robot experiments, in which we detect people in a combined indoor and
outdoor scenario from RGB-D data, and we demonstrate that our method can adapt
to harsh lighting changes and severe camera motion blur. Furthermore, we
present a new RGB-D dataset for people detection in mixed in- and outdoor
environments, recorded with a mobile robot. Code, pretrained models and dataset
are available at http://adaptivefusion.cs.uni-freiburg.deComment: Published at the 2016 IEEE/RSJ International Conference on
Intelligent Robots and Systems. Added a new baseline with respect to the IROS
version. Project page with code, pretrained models and our InOutDoorPeople
RGB-D dataset at http://adaptivefusion.cs.uni-freiburg.de
Boosted Generative Models
We propose a novel approach for using unsupervised boosting to create an
ensemble of generative models, where models are trained in sequence to correct
earlier mistakes. Our meta-algorithmic framework can leverage any existing base
learner that permits likelihood evaluation, including recent deep expressive
models. Further, our approach allows the ensemble to include discriminative
models trained to distinguish real data from model-generated data. We show
theoretical conditions under which incorporating a new model in the ensemble
will improve the fit and empirically demonstrate the effectiveness of our
black-box boosting algorithms on density estimation, classification, and sample
generation on benchmark datasets for a wide range of generative models.Comment: AAAI 201
On missing label patterns in semi-supervised learning
We investigate model based classification with partially labelled training
data. In many biostatistical applications, labels are manually assigned by
experts, who may leave some observations unlabelled due to class uncertainty.
We analyse semi-supervised learning as a missing data problem and identify
situations where the missing label pattern is non-ignorable for the purposes of
maximum likelihood estimation. In particular, we find that a relationship
between classification difficulty and the missing label pattern implies a
non-ignorable missingness mechanism. We examine a number of real datasets and
conclude the pattern of missing labels is related to the difficulty of
classification. We propose a joint modelling strategy involving the observed
data and the missing label mechanism to account for the systematic missing
labels. Full likelihood inference including the missing label mechanism can
improve the efficiency of parameter estimation, and increase classification
accuracy
Generative Models of Visually Grounded Imagination
It is easy for people to imagine what a man with pink hair looks like, even
if they have never seen such a person before. We call the ability to create
images of novel semantic concepts visually grounded imagination. In this paper,
we show how we can modify variational auto-encoders to perform this task. Our
method uses a novel training objective, and a novel product-of-experts
inference network, which can handle partially specified (abstract) concepts in
a principled and efficient way. We also propose a set of easy-to-compute
evaluation metrics that capture our intuitive notions of what it means to have
good visual imagination, namely correctness, coverage, and compositionality
(the 3 C's). Finally, we perform a detailed comparison of our method with two
existing joint image-attribute VAE methods (the JMVAE method of Suzuki et.al.
and the BiVCCA method of Wang et.al.) by applying them to two datasets: the
MNIST-with-attributes dataset (which we introduce here), and the CelebA
dataset.Comment: International Conference on Learning Representations (ICLR), 201
K-Plane Regression
In this paper, we present a novel algorithm for piecewise linear regression
which can learn continuous as well as discontinuous piecewise linear functions.
The main idea is to repeatedly partition the data and learn a liner model in in
each partition. While a simple algorithm incorporating this idea does not work
well, an interesting modification results in a good algorithm. The proposed
algorithm is similar in spirit to -means clustering algorithm. We show that
our algorithm can also be viewed as an EM algorithm for maximum likelihood
estimation of parameters under a reasonable probability model. We empirically
demonstrate the effectiveness of our approach by comparing its performance with
the state of art regression learning algorithms on some real world datasets
Hierarchical Deep Recurrent Architecture for Video Understanding
This paper introduces the system we developed for the Youtube-8M Video
Understanding Challenge, in which a large-scale benchmark dataset was used for
multi-label video classification. The proposed framework contains hierarchical
deep architecture, including the frame-level sequence modeling part and the
video-level classification part. In the frame-level sequence modelling part, we
explore a set of methods including Pooling-LSTM (PLSTM), Hierarchical-LSTM
(HLSTM), Random-LSTM (RLSTM) in order to address the problem of large amount of
frames in a video. We also introduce two attention pooling methods, single
attention pooling (ATT) and multiply attention pooling (Multi-ATT) so that we
can pay more attention to the informative frames in a video and ignore the
useless frames. In the video-level classification part, two methods are
proposed to increase the classification performance, i.e.
Hierarchical-Mixture-of-Experts (HMoE) and Classifier Chains (CC). Our final
submission is an ensemble consisting of 18 sub-models. In terms of the official
evaluation metric Global Average Precision (GAP) at 20, our best submission
achieves 0.84346 on the public 50% of test dataset and 0.84333 on the private
50% of test data.Comment: Accepted as Classification Challenge Track paper in CVPR 2017
Workshop on YouTube-8M Large-Scale Video Understandin
- …