Deep Architectures and Ensembles for Semantic Video Classification
This work addresses the problem of accurate semantic labelling of short
videos. To this end, a multitude of different deep nets is evaluated, ranging from
traditional recurrent neural networks (LSTM, GRU) and temporal agnostic networks
(FV, VLAD, BoW) to fully connected neural networks mid-stage AV fusion and others.
Additionally, we propose a residual architecture-based DNN for video
classification, with state-of-the-art classification performance at
significantly reduced complexity. Furthermore, we propose four new approaches
to diversity-driven multi-net ensembling, one based on fast correlation measure
and three incorporating a DNN-based combiner. We show that significant
performance gains can be achieved by ensembling diverse nets and we investigate
factors contributing to high diversity. Based on the extensive YouTube-8M
dataset, we provide an in-depth evaluation and analysis of their behaviour. We
show that the performance of the ensemble is state-of-the-art, achieving the
highest accuracy on the YouTube-8M Kaggle test data. The performance of the
ensemble of classifiers was also evaluated on the HMDB51 and UCF101 datasets,
and we show that the resulting method achieves accuracy comparable with
state-of-the-art methods using similar input features.
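
The abstract does not spell out the correlation-based ensembling procedure, so the following is only a minimal sketch of one way a diversity-driven selection could work: start from the most accurate net, then greedily add the net whose predictions correlate least with those already chosen, and average the selected scores. The function names (`pearson`, `select_diverse`, `average_ensemble`) and the greedy rule are assumptions made for illustration, not the authors' method.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant predictions carry no correlation signal
    return cov / sqrt(var_a * var_b)


def select_diverse(preds, accuracy, k):
    """Hypothetical greedy rule: seed with the most accurate model, then
    repeatedly add the model whose highest correlation with any already
    chosen model is lowest."""
    names = sorted(preds, key=lambda m: accuracy[m], reverse=True)
    chosen = [names[0]]
    while len(chosen) < min(k, len(names)):
        rest = [m for m in names if m not in chosen]
        best = min(
            rest,
            key=lambda m: max(pearson(preds[m], preds[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


def average_ensemble(preds, chosen):
    """Average the per-example scores of the chosen models."""
    n = len(preds[chosen[0]])
    return [sum(preds[m][i] for m in chosen) / len(chosen) for i in range(n)]


# Illustrative scores only; these are not results from the paper.
example_preds = {
    "lstm": [0.9, 0.2, 0.8, 0.1],
    "gru": [0.88, 0.22, 0.79, 0.12],  # nearly identical to "lstm"
    "vlad": [0.6, 0.4, 0.9, 0.3],
}
example_accuracy = {"lstm": 0.80, "gru": 0.79, "vlad": 0.75}
members = select_diverse(example_preds, example_accuracy, k=2)
combined = average_ensemble(example_preds, members)
```

In this toy data the less accurate but less correlated net is preferred over the near-duplicate one, which is the intuition behind ensembling diverse nets.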
A large-scale evaluation framework for EEG deep learning architectures
EEG is the most common signal source for noninvasive BCI applications. For
such applications, the EEG signal needs to be decoded and translated into
appropriate actions. A recently emerging EEG decoding approach is deep learning
with Convolutional or Recurrent Neural Networks (CNNs, RNNs) with many
different architectures already published. Here we present a novel framework
for the large-scale evaluation of different deep-learning architectures on
different EEG datasets. This framework comprises (i) a collection of EEG
datasets currently including 100 examples (recording sessions) from six
different classification problems, (ii) a collection of different EEG decoding
algorithms, and (iii) a wrapper linking the decoders to the data as well as
handling structured documentation of all settings and (hyper-) parameters and
statistics, designed to ensure transparency and reproducibility. As an
application example, we used our framework to compare three publicly
available CNN architectures: the Braindecode Deep4 ConvNet, Braindecode Shallow
ConvNet, and two versions of EEGNet. We also show how our framework can be used
to study similarities and differences in the performance of different decoding
methods across tasks. We argue that the deep learning EEG framework as
described here could help to tap the full potential of deep learning for BCI
applications.
Comment: 7 pages, 3 figures, final version accepted for presentation at IEEE
SMC 2018 conference
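
The three-part structure described in this abstract (datasets, decoders, and a wrapper that records settings alongside results) might be sketched roughly as below. Everything here is hypothetical: `majority_decoder` is a toy stand-in rather than any of the named ConvNets, and `run_benchmark` and its record layout are assumptions, not the framework's actual interface.

```python
import json


def majority_decoder(train_labels, n_test):
    """Toy stand-in decoder: always predicts the most frequent training label."""
    top = max(set(train_labels), key=train_labels.count)
    return [top] * n_test


def run_benchmark(datasets, decoders, settings):
    """Hypothetical wrapper: run every decoder on every dataset and store
    the settings used next to the resulting accuracy."""
    records = []
    for data_name, (train_labels, test_labels) in datasets.items():
        for dec_name, decoder in decoders.items():
            predicted = decoder(train_labels, len(test_labels))
            correct = sum(p == t for p, t in zip(predicted, test_labels))
            records.append(
                {
                    "dataset": data_name,
                    "decoder": dec_name,
                    "settings": settings.get(dec_name, {}),
                    "accuracy": correct / len(test_labels),
                }
            )
    return records


# Illustrative labels only; these are not EEG recordings.
toy_datasets = {"session_a": (["left", "left", "right"], ["left", "right"])}
toy_decoders = {"majority": majority_decoder}
toy_settings = {"majority": {"seed": 0}}
report = json.dumps(
    run_benchmark(toy_datasets, toy_decoders, toy_settings), indent=2
)
```

Serialising each run's settings together with its score is one simple way to keep results traceable, in the spirit of the transparency and reproducibility goal the abstract states.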
The Long-Short Story of Movie Description
Generating descriptions for videos has many applications including assisting
blind people and human-robot interaction. The recent advances in image
captioning as well as the release of large-scale movie description datasets
such as MPII Movie Description allow this task to be studied in more depth. Many of
the proposed methods for image captioning rely on pre-trained object classifier
CNNs and Long Short-Term Memory recurrent networks (LSTMs) for generating
descriptions. While image description focuses on objects, we argue that it is
important to distinguish verbs, objects, and places in the challenging setting
of movie description. In this work we show how to learn robust visual
classifiers from the weak annotations of the sentence descriptions. Based on
these visual classifiers we learn how to generate a description using an LSTM.
We explore different design choices to build and train the LSTM and achieve the
best performance to date on the challenging MPII-MD dataset. We compare and
analyze our approach and prior work along various dimensions to better
understand the key challenges of the movie description task.