The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge
This article describes the final solution of team monkeytyping, who finished
in second place in the YouTube-8M video understanding challenge. The dataset
used in this challenge is a large-scale benchmark for multi-label video
classification. We extend the work in [1] and propose several improvements for
frame sequence modeling. We propose a network structure called Chaining that
can better capture the interactions between labels. Also, we report our
approaches in dealing with multi-scale information and attention pooling. In
addition, we find that using the output of a model ensemble as a side target in
training can boost single model performance. We report our experiments in
bagging, boosting, cascade, and stacking, and propose a stacking algorithm
called attention weighted stacking. Our final submission is an ensemble that
consists of 74 sub models, all of which are listed in the appendix.
Comment: Submitted to the CVPR 2017 Workshop on YouTube-8M Large-Scale Video
Understanding
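The abstract's idea of using ensemble output as a side target can be illustrated with a minimal sketch. It assumes a per-label binary cross-entropy loss and a simple linear mix of the hard-label term and the ensemble term. The function names and the `side_weight` coefficient are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-7):
    """Cross-entropy between a target in [0, 1] and a predicted probability."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def side_target_loss(labels, ensemble_probs, model_probs, side_weight=0.5):
    """Mean per-label loss that mixes ground truth with ensemble predictions.

    side_weight is a hypothetical mixing coefficient (assumption, not from
    the paper): 0.0 trains on the labels only, 1.0 on the ensemble only.
    """
    total = 0.0
    for y, q, p in zip(labels, ensemble_probs, model_probs):
        hard = binary_cross_entropy(y, p)  # term against the true label
        soft = binary_cross_entropy(q, p)  # term against the ensemble output
        total += (1.0 - side_weight) * hard + side_weight * soft
    return total / len(labels)


# Toy multi-label example with three labels (made-up numbers).
labels = [1.0, 0.0, 0.0]
ensemble_probs = [0.9, 0.2, 0.05]
model_probs = [0.8, 0.3, 0.1]
loss = side_target_loss(labels, ensemble_probs, model_probs)
```

Setting `side_weight` to 0.0 recovers ordinary training on the ground-truth labels, which makes it easy to compare against a baseline.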
Explainable improved ensembling for natural language and vision
Ensemble methods are well-known in machine learning for improving prediction
accuracy. However, they do not adequately discriminate among underlying
component models. The measure of how good a model is can sometimes be estimated
from “why” it made a specific prediction. We propose a novel approach
called Stacking With Auxiliary Features (SWAF) that effectively leverages component
models by integrating such relevant information from context to improve
ensembling. Using auxiliary features, our algorithm learns to rely on systems that
not just agree on an output prediction but also the source or origin of that output.
We demonstrate our approach to challenging structured prediction problems
in Natural Language Processing and Vision including Information Extraction, Object
Detection, and Visual Question Answering. We also present a variant of SWAF
for combining systems that do not have training data in an unsupervised ensemble
with systems that do have training data. Our combined approach obtains a new
state-of-the-art, beating our prior performance on Information Extraction.
The state-of-the-art systems on many AI applications are ensembles of deep-learning
models. These models are hard to interpret and can sometimes make odd
mistakes. Explanations make AI systems more transparent and also justify their
predictions. We propose a scalable approach to generate visual explanations for
ensemble methods using the localization maps of the component systems. Crowdsourced
human evaluation on two new metrics indicates that our ensemble’s explanation
significantly qualitatively outperforms individual systems’ explanations.
Computer Science
Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success on speech
recognition, computer vision and natural language processing. However, the
exploration of deep neural networks on recommender systems has received
relatively less scrutiny. In this work, we strive to develop techniques based
on neural networks to tackle the key problem in recommendation -- collaborative
filtering -- on the basis of implicit feedback. Although some recent work has
employed deep learning for recommendation, they primarily used it to model
auxiliary information, such as textual descriptions of items and acoustic
features of music. When it comes to modeling the key factor in collaborative
filtering -- the interaction between user and item features, they still
resorted to matrix factorization and applied an inner product on the latent
features of users and items. By replacing the inner product with a neural
architecture that can learn an arbitrary function from data, we present a
general framework named NCF, short for Neural network-based Collaborative
Filtering. NCF is generic and can express and generalize matrix factorization
under its framework. To supercharge NCF modelling with non-linearities, we
propose to leverage a multi-layer perceptron to learn the user-item interaction
function. Extensive experiments on two real-world datasets show significant
improvements of our proposed NCF framework over the state-of-the-art methods.
Empirical evidence shows that using deeper layers of neural networks offers
better recommendation performance.
Comment: 10 pages, 7 figures
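The contrast the abstract draws, between a fixed inner product and a learned interaction function, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' implementation: the function names, the single hidden layer, and all weight values are hypothetical. The `weighted_product_score` function shows one way a learned model can reduce to matrix factorization, namely when its output weights are all ones.

```python
import math


def mf_score(user_vec, item_vec):
    """Matrix factorization: inner product of user and item latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def weighted_product_score(user_vec, item_vec, out_w):
    """Weighted element-wise product; equals mf_score when out_w is all ones."""
    return sum(w * u * v for w, u, v in zip(out_w, user_vec, item_vec))


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_score(user_vec, item_vec, hidden_w, hidden_b, out_w, out_b):
    """Learned interaction: concatenate embeddings, ReLU layer, sigmoid output."""
    x = list(user_vec) + list(item_vec)
    hidden = [max(0.0, h) for h in dense(x, hidden_w, hidden_b)]
    logit = dense(hidden, [out_w], [out_b])[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy latent vectors and hand-picked weights (assumptions for illustration).
user = [0.5, -1.0]
item = [2.0, 0.5]
hidden_w = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2]]
hidden_b = [0.0, 0.0]

inner = mf_score(user, item)
same_as_inner = weighted_product_score(user, item, [1.0, 1.0])
learned = mlp_score(user, item, hidden_w, hidden_b, [1.0, -1.0], 0.0)
```

In practice the embeddings and layer weights would be trained on implicit feedback rather than set by hand. The sigmoid output is a natural fit for predicting whether an interaction occurs.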