Learning and Interpreting Multi-Multi-Instance Learning Networks
We introduce an extension of the multi-instance learning problem where
examples are organized as nested bags of instances (e.g., a document could be
represented as a bag of sentences, which in turn are bags of words). This
framework can be useful in various scenarios, such as text and image
classification, as well as supervised learning over graphs. As a further
advantage, multi-multi-instance learning enables a particular way of
interpreting predictions and the decision function. Our approach is based on a
special neural network layer, called bag-layer, whose units aggregate bags of
inputs of arbitrary size. We prove theoretically that the associated class of
functions contains all Boolean functions over sets of sets of instances, and we
provide empirical evidence that functions of this kind can actually be learned
on semi-synthetic datasets. Finally, we present experiments on text
classification and on citation and social graph data, which show that our model
achieves competitive accuracy compared to other approaches such as
convolutional networks on graphs, while also supporting a general approach to
interpreting the learnt model and explaining individual predictions.
Comment: JML
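As a rough illustration of the bag-layer idea described above, the sketch below (plain NumPy; the function names, weight shapes, and the choice of element-wise max as the aggregator are illustrative assumptions, not taken from the paper) applies a shared affine map plus ReLU to every instance in a bag and collapses the bag into a fixed-size vector; stacking two such layers handles bags of bags:

```python
import numpy as np

def bag_layer(bag, W, b):
    """Aggregate a bag of instance vectors into one fixed-size vector.

    Each instance is mapped through a shared affine transform + ReLU,
    then the bag is collapsed by an element-wise max (one common choice
    of aggregation function).
    """
    transformed = [np.maximum(W @ x + b, 0.0) for x in bag]  # shared weights
    return np.max(transformed, axis=0)                       # order-invariant

def multi_multi(bag_of_bags, W1, b1, W2, b2):
    """Two stacked bag-layers: aggregate inner bags, then the outer bag."""
    inner = [bag_layer(inner_bag, W1, b1) for inner_bag in bag_of_bags]
    return bag_layer(inner, W2, b2)
```

Because the max is symmetric in its arguments, the output is invariant to the order and size of each bag, which is what lets the layer accept bags of arbitrary size.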
A Practitioners' Guide to Transfer Learning for Text Classification using Convolutional Neural Networks
Transfer Learning (TL) plays a crucial role when a given dataset has
insufficient labeled examples to train an accurate model. In such scenarios,
the knowledge accumulated within a model pre-trained on a source dataset can be
transferred to a target dataset, resulting in the improvement of the target
model. Although TL has proven successful in image-based applications, its
impact and practical use in Natural Language Processing (NLP) applications are
still a subject of research. Due to their hierarchical architecture, Deep
Neural Networks (DNNs) offer flexibility in adjusting their parameters and the
depth of their layers, making them an apt area for exploiting TL. In this
paper, we report the results and conclusions of extensive empirical experiments
using a Convolutional Neural Network (CNN) and try to uncover rules of thumb
that ensure a successful positive transfer. We also highlight flawed practices
that can lead to a negative transfer. We explore the transferability of the
various layers and describe the effect of varying hyper-parameters on transfer
performance. In addition, we compare accuracy and model size against
state-of-the-art methods. Finally, we draw inferences from the empirical
results and provide best practices for achieving a successful positive
transfer.
Comment: 9 pages, 2 figures, accepted in SDM 201
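The mechanical core of layer transfer, copying and freezing the first layers of a pretrained model while re-initializing the rest, can be sketched as follows (a minimal sketch in plain NumPy; the list-of-weight-arrays representation, the function name `transfer`, and the Gaussian re-initialization are assumptions for illustration, not the paper's setup):

```python
import numpy as np

def transfer(pretrained, n_transfer, rng=None):
    """Build target-model weights from a pretrained stack of layers.

    The first `n_transfer` layers are copied and marked frozen; the
    remaining layers are re-initialized and left trainable. Returns the
    new weight list and a parallel list of trainable flags.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    target, trainable = [], []
    for i, W in enumerate(pretrained):
        if i < n_transfer:
            target.append(W.copy())                      # transferred, frozen
            trainable.append(False)
        else:
            target.append(rng.normal(0.0, 0.01, W.shape))  # fresh init
            trainable.append(True)
    return target, trainable
```

Varying `n_transfer` is exactly the kind of knob the layer-transferability experiments above explore: too few transferred layers wastes the source knowledge, too many can cause negative transfer.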
Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks
One of the challenges in modeling cognitive events from electroencephalogram
(EEG) data is finding representations that are invariant to inter- and
intra-subject differences, as well as to inherent noise associated with such
data. Herein, we propose a novel approach for learning such representations
from multi-channel EEG time-series, and demonstrate its advantages in the
context of a mental load classification task. First, we transform EEG activities
into a sequence of topology-preserving multi-spectral images, as opposed to
standard EEG analysis techniques that ignore such spatial information. Next, we
train a deep recurrent-convolutional network inspired by state-of-the-art video
classification to learn robust representations from the sequence of images. The
proposed approach is designed to preserve the spatial, spectral, and temporal
structure of EEG which leads to finding features that are less sensitive to
variations and distortions within each dimension. Empirical evaluation on the
cognitive load classification task demonstrated significant improvements in
classification accuracy over current state-of-the-art approaches in this field.
Comment: To be published as a conference paper at ICLR 201
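The first step, mapping per-electrode spectral power onto a topology-preserving multi-spectral image, might look like the sketch below (nearest-cell placement on a square grid is a deliberate simplification; the paper's exact projection and interpolation scheme, and all names here, are not reproduced from it):

```python
import numpy as np

def eeg_to_image(band_power, positions, grid=8):
    """Project per-electrode spectral power onto a 2-D image.

    band_power: (n_channels, n_bands) power per frequency band
                (e.g. theta/alpha/beta).
    positions:  (n_channels, 2) electrode (x, y) coordinates in [0, 1]^2,
                obtained by flattening the 3-D scalp locations.
    Returns a (grid, grid, n_bands) multi-spectral image.
    """
    n_bands = band_power.shape[1]
    img = np.zeros((grid, grid, n_bands))
    for (x, y), p in zip(positions, band_power):
        r = min(int(y * grid), grid - 1)
        c = min(int(x * grid), grid - 1)
        img[r, c] += p  # electrodes sharing a cell accumulate their power
    return img
```

A sequence of such images over time windows is what the recurrent-convolutional network then consumes, so spatial (grid), spectral (bands), and temporal (sequence) structure are all preserved.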
Adversarial Multi-task Learning for Text Classification
Neural network models have shown promise for multi-task learning, which
focuses on learning shared layers that extract common, task-invariant features.
However, in most existing approaches, the extracted shared features are prone
to contamination by task-specific features or by noise brought in by other
tasks. In this paper, we propose an adversarial multi-task learning framework
that prevents the shared and private latent feature spaces from interfering
with each other. We conduct extensive experiments on 16 different text
classification tasks, which demonstrate the benefits of our approach. Moreover,
we show that the shared knowledge learned by our model can be regarded as
off-the-shelf knowledge and easily transferred to new tasks. The datasets of
all 16 tasks are publicly available at \url{http://nlp.fudan.edu.cn/data/}
Comment: Accepted by ACL201
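Shared-private models of this kind typically combine two ingredients: a task discriminator on the shared features (which the shared encoder is trained to fool, e.g. via gradient reversal), and a penalty keeping the shared and private subspaces apart. The NumPy sketch below shows both losses; the linear discriminator, the exact loss forms, and all names are illustrative assumptions, not the paper's precise objectives:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adversarial_loss(shared, task_ids, D):
    """Cross-entropy of a linear task discriminator D on shared features
    (batch x dim). The discriminator minimizes this; the shared encoder
    maximizes it, pushing the shared space to be task-invariant."""
    probs = softmax(shared @ D)  # (batch, n_tasks)
    picked = probs[np.arange(len(task_ids)), task_ids]
    return float(-np.mean(np.log(picked + 1e-12)))

def diff_loss(shared, private):
    """Orthogonality penalty keeping shared and private features apart:
    L_diff = || shared^T private ||_F^2 (zero when the two feature
    matrices span orthogonal directions)."""
    return float(np.sum((shared.T @ private) ** 2))
```

When the discriminator cannot tell which task a shared feature vector came from, the cross-entropy saturates and the shared space carries only task-invariant information, which is what makes it transferable off the shelf.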
A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning
Multi-task learning leverages potential correlations among related tasks to
extract common features and yield performance gains. However, most previous
works only consider simple or weak interactions, thereby failing to model
complex correlations among three or more tasks. In this paper, we propose a
multi-task learning architecture with four types of recurrent neural layers to
fuse information across multiple related tasks. The architecture is
structurally flexible and considers various interactions among tasks, which can
be regarded as a generalized case of many previous works. Extensive experiments
on five benchmark datasets for text classification show that our model can
significantly improve the performance of related tasks using additional
information from the others.
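One simple instance of a recurrent layer that fuses information across tasks is sketched below (plain NumPy; the mean coupling over the other tasks' states and all names here are illustrative, the paper defines four richer layer types):

```python
import numpy as np

def coupled_rnn_step(xs, hs, Wx, Wh, Wc):
    """One step of a coupled recurrent layer over n >= 2 tasks.

    Each task's new hidden state reads its own input, its own previous
    state, and the mean of the other tasks' previous states, so related
    tasks can exchange information at every time step.
    """
    n = len(hs)
    new_hs = []
    for i in range(n):
        others = np.mean([hs[j] for j in range(n) if j != i], axis=0)
        new_hs.append(np.tanh(Wx @ xs[i] + Wh @ hs[i] + Wc @ others))
    return new_hs
```

Because every task reads every other task's state, pairwise and higher-order interactions among three or more tasks fall out of the same update rule, rather than being limited to the simple two-task couplings the abstract criticizes.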