The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge
This article describes the final solution of team monkeytyping, who finished
in second place in the YouTube-8M video understanding challenge. The dataset
used in this challenge is a large-scale benchmark for multi-label video
classification. We extend the work in [1] and propose several improvements for
frame sequence modeling. We propose a network structure called Chaining that
can better capture the interactions between labels. Also, we report our
approaches in dealing with multi-scale information and attention pooling. In
addition, we find that using the output of a model ensemble as a side target in
training can boost single model performance. We report our experiments in
bagging, boosting, cascade, and stacking, and propose a stacking algorithm
called attention weighted stacking. Our final submission is an ensemble that
consists of 74 sub-models, all of which are listed in the appendix.
Comment: Submitted to the CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding.
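The abstract names an "attention weighted stacking" algorithm without giving details. A minimal sketch of one plausible reading, assuming per-example attention logits (e.g. from a small learned scoring network) are softmax-normalised over the sub-models and used to mix their predictions; the function name and array shapes here are illustrative assumptions, not the authors' actual method:

```python
import numpy as np

def attention_weighted_stack(preds, scores):
    """Combine sub-model predictions with per-example attention weights.

    preds:  (n_models, n_examples, n_labels) array of model outputs.
    scores: (n_models, n_examples) unnormalised attention logits,
            assumed to come from a small learned scoring network.
    """
    # Softmax over the model axis gives each example its own mixing weights.
    w = np.exp(scores - scores.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)          # (n_models, n_examples)
    # Weighted sum over models yields one stacked prediction per example.
    return (w[:, :, None] * preds).sum(axis=0)    # (n_examples, n_labels)
```

With equal scores this reduces to plain averaging, which is the usual sanity check for any learned stacking weights.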
Designing mixture of deep experts
Mixture of Experts (MoE) is a classical architecture for ensembles where each
member is specialised in a given part of the input space or its expertise area.
Working in this manner, we aim to specialise the experts on smaller problems,
solving the original problem through some type of divide and conquer approach.
The goal of our research is first to reproduce the work of Collobert et
al. [1] (2002), and then to extend it by using neural networks as experts on
different datasets. Specialised representations will be learned over different
aspects of the problem, and the results of the different members will be merged
according to their specific expertise. This expertise can then be learned itself by a
given network acting as a gating function.
The MoE architecture is composed of N expert networks. These experts are combined
via a gating network, which partitions the input space accordingly. The model
follows a divide-and-conquer strategy supervised by the gating network: using a
specialised cost function, the experts specialise in their own sub-space.
Exploiting the discriminative power of experts is much better than simply
clustering. The gating network needs to learn how to assign examples to the
different specialists.
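The architecture described above can be sketched as follows. This is a minimal NumPy toy, assuming linear experts and a linear softmax gate; the class and parameter names are my own illustration, not the report's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax along the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MixtureOfExperts:
    """N linear experts combined by a softmax gating network."""

    def __init__(self, n_experts, d_in, d_out):
        # One weight matrix per expert, plus one gating matrix.
        self.W_experts = rng.normal(0.0, 0.1, (n_experts, d_in, d_out))
        self.W_gate = rng.normal(0.0, 0.1, (d_in, n_experts))

    def forward(self, x):
        # Gate assigns each example a distribution over experts.
        gate = softmax(x @ self.W_gate)                       # (batch, n_experts)
        # Every expert processes every input (no hard routing here).
        expert_out = np.einsum('bi,eio->beo', x, self.W_experts)
        # Output is the gate-weighted mixture of expert outputs.
        return np.einsum('be,beo->bo', gate, expert_out)
```

In practice each expert would be a full neural network and the gate could be trained jointly with a specialised cost function, as the text describes; hard (top-1) routing would recover the "cheap at test time" property mentioned below.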
Such models show promise for building larger networks that are still cheap to
compute at test time and more parallelizable at training time. We were able to
reproduce the authors' work and implemented a multi-class gater to classify
images.
We know that neural networks perform best with lots of data. However, some of
our experiments require us to divide the dataset and train multiple neural
networks. We observe that in this data-deprived condition our MoE models are
almost on par with, and compete with, ensembles trained on the complete data.
Keywords: Machine Learning, Multi-Layer Perceptrons, Mixture of Experts, Support
Vector Machines, Divide and Conquer, Stochastic Gradient Descent, Optimization