Learning Student Networks via Feature Embedding
Deep convolutional neural networks have been widely used in numerous
applications, but their demanding storage and computational resource
requirements prevent their applications on mobile devices. Knowledge
distillation aims to optimize a portable student network by taking the
knowledge from a well-trained heavy teacher network. Traditional
teacher-student based methods used to rely on additional fully-connected layers
to bridge intermediate layers of teacher and student networks, which brings in
a large number of auxiliary parameters. In contrast, this paper aims to
propagate information from teacher to student without introducing new variables
which need to be optimized. We regard the teacher-student paradigm from a new
perspective of feature embedding. By introducing the locality preserving loss,
the student network is encouraged to generate low-dimensional features
that inherit the intrinsic properties of the corresponding
high-dimensional features from the teacher network. The resulting portable network
can thus naturally maintain performance comparable to that of the teacher network.
Theoretical analysis is provided to justify the lower computation complexity of
the proposed method. Experiments on benchmark datasets and well-trained
networks suggest that the proposed algorithm is superior to state-of-the-art
teacher-student learning methods in terms of computational and storage
complexity.
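As a rough illustration of the locality preserving idea described in this abstract, the sketch below penalizes a student whose features pull apart samples that are close neighbours in the teacher's feature space. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the k-nearest-neighbour selection, the Gaussian weighting and the `delta` bandwidth are all illustrative choices, and the paper's exact formulation may differ.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def locality_preserving_loss(teacher_feats, student_feats, k=2, delta=1.0):
    """Hypothetical locality preserving loss.

    For each sample i, find its k nearest neighbours in the teacher
    (high-dimensional) space, then penalize the squared distance between
    the corresponding student (low-dimensional) features, weighted by how
    close the pair is in teacher space. No extra trainable parameters
    are introduced.
    """
    n = len(teacher_feats)
    total = 0.0
    for i in range(n):
        # Neighbours are chosen in teacher space, excluding the sample itself.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sq_dist(teacher_feats[i], teacher_feats[j]),
        )[:k]
        for j in neighbours:
            # Assumed Gaussian affinity between teacher features.
            weight = math.exp(-sq_dist(teacher_feats[i], teacher_feats[j]) / delta ** 2)
            total += weight * sq_dist(student_feats[i], student_feats[j])
    return total / n
```

A student embedding that keeps teacher-space neighbours together yields a small loss, while one that scatters them yields a large loss; the student and teacher dimensions do not need to match, since only pairwise distances within each space are used.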
Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference
Modern convolutional neural networks apply the same operations on every pixel
in an image. However, not all image regions are equally important. To address
this inefficiency, we propose a method to dynamically apply convolutions
conditioned on the input image. We introduce a residual block where a small
gating branch learns which spatial positions should be evaluated. These
discrete gating decisions are trained end-to-end using the Gumbel-Softmax
trick, in combination with a sparsity criterion. Our experiments on CIFAR,
ImageNet and MPII show that our method has better focus on the region of
interest and better accuracy than existing methods, at a lower computational
complexity. Moreover, we provide an efficient CUDA implementation of our
dynamic convolutions using a gather-scatter approach, achieving a significant
improvement in inference speed with MobileNetV2 residual blocks. On human pose
estimation, a task that is inherently spatially sparse, the processing speed is
increased by 60% with no loss in accuracy.
Comment: CVPR 2020 (poster) https://github.com/thomasverelst/dyncon
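The gating mechanism in this abstract can be sketched for a single spatial position as follows. This is a minimal sketch, assuming a two-class (execute / skip) Gumbel-Softmax relaxation and a simple squared-error sparsity term; the function names, the `tau` temperature and the `target` execution ratio are hypothetical and not taken from the paper or its CUDA implementation.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample; u is clamped to avoid log(0)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def stable_sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gumbel_gate(logit, tau=1.0, rng=random):
    """Binary Gumbel-Softmax gate for one spatial position.

    With class logits [logit, 0], the softmax probability of the
    "execute" class reduces to a sigmoid of the noisy logit difference.
    Returns the soft (differentiable) value and the hard 0/1 decision.
    """
    noisy = (logit + sample_gumbel(rng) - sample_gumbel(rng)) / tau
    soft = stable_sigmoid(noisy)
    hard = 1 if soft > 0.5 else 0
    return soft, hard


def sparsity_penalty(hard_gates, target):
    """Assumed sparsity criterion: squared gap between the fraction of
    executed positions and a target execution ratio."""
    executed = sum(hard_gates) / len(hard_gates)
    return (executed - target) ** 2
```

In a training setup, the hard decision would typically be used in the forward pass while gradients flow through the soft value (a straight-through estimator); at inference time, only positions with a hard gate of 1 would be gathered and convolved.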
Dual Path Networks
In this work, we present a simple, highly efficient and modularized Dual Path
Network (DPN) for image classification which presents a new topology of
connection paths internally. By revealing the equivalence of the
state-of-the-art Residual Network (ResNet) and Densely Connected Convolutional
Network (DenseNet) within the HORNN framework, we find that ResNet enables
feature re-usage while DenseNet enables the exploration of new features, both of
which are important for learning good representations. To enjoy the benefits from both
path topologies, our proposed Dual Path Network shares common features while
maintaining the flexibility to explore new features through dual path
architectures. Extensive experiments on three benchmark datasets, ImageNet-1k,
Places365 and PASCAL VOC, clearly demonstrate superior performance of the
proposed DPN over state-of-the-art methods. In particular, on the ImageNet-1k dataset,
a shallow DPN surpasses the best ResNeXt-101(64x4d) with 26% smaller model
size, 25% less computational cost and 8% lower memory consumption, and a deeper
DPN (DPN-131) further pushes the state-of-the-art single model performance with
about 2 times faster training speed. Experiments on the Places365 large-scale
scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation
dataset also demonstrate its consistently better performance than DenseNet,
ResNet and the latest ResNeXt model over various applications.
Comment: for code and models, see https://github.com/cypw/DPN
PILAE: A Non-gradient Descent Learning Scheme for Deep Feedforward Neural Networks
In this work, a non-gradient descent learning scheme is proposed for deep
feedforward neural networks (DNN). As is well known, autoencoders can be used as
the building blocks of a multi-layer perceptron (MLP) deep neural network. So,
the MLP will be taken as an example to illustrate the proposed scheme of
pseudoinverse learning algorithm for autoencoder (PILAE) training. The PILAE
with low rank approximation is a non-gradient based learning algorithm, and the
encoder weight matrix is set to be the low rank approximation of the
pseudoinverse of the input matrix, while the decoder weight matrix is
calculated by the pseudoinverse learning algorithm. It is worth noting that
only a few network structure hyperparameters need to be tuned. Hence, the
proposed algorithm can be regarded as a quasi-automated training algorithm
which can be utilized in the autonomous machine learning research field. The
experimental results show that the proposed learning scheme for DNN can achieve
better performance when the tradeoff between training efficiency and
classification accuracy is considered.
Comment: This work is our effort toward to realize AutoM
Doing the impossible: Why neural networks can be trained at all
As deep neural networks grow in size, from thousands to millions to billions
of weights, the performance of those networks becomes limited by our ability to
accurately train them. A common naive question arises: if we have a system with
billions of degrees of freedom, don't we also need billions of samples to train
it? Of course, the success of deep learning indicates that reliable models can
be learned with reasonable amounts of data. Similar questions arise in protein
folding, spin glasses and biological neural networks. With effectively infinite
potential folding/spin/wiring configurations, how does the system find the
precise arrangement that leads to useful and robust results? Simple sampling of
the possible configurations until an optimal one is reached is not a viable
option even if one waited for the age of the universe. On the contrary, there
appears to be a mechanism in the above phenomena that forces them to achieve
configurations that live on a low-dimensional manifold, avoiding the curse of
dimensionality. In the current work we use the concept of mutual information
between successive layers of a deep neural network to elucidate this mechanism
and suggest possible ways of exploiting it to accelerate training. We show that
adding structure to the neural network that enforces higher mutual information
between layers speeds training and leads to more accurate results. High mutual
information between layers implies that the effective number of free parameters
is exponentially smaller than the raw number of tunable weights.
Comment: The material is based on a poster from the 15th Neural Computation
and Psychology Workshop "Contemporary Neural Network Models: Machine
Learning, Artificial Intelligence, and Cognition" August 8-9, 2016, Drexel
University, Philadelphia, PA, US
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
The ability to act in multiple environments and transfer previous knowledge
to new situations can be considered a critical aspect of any intelligent agent.
Towards this goal, we define a novel method of multitask and transfer learning
that enables an autonomous agent to learn how to behave in multiple tasks
simultaneously, and then generalize its knowledge to new domains. This method,
termed "Actor-Mimic", exploits the use of deep reinforcement learning and model
compression techniques to train a single policy network that learns how to act
in a set of distinct tasks by using the guidance of several expert teachers. We
then show that the representations learnt by the deep policy network are
capable of generalizing to new tasks with no prior expert guidance, speeding up
learning in novel environments. Although our method can in general be applied
to a wide range of problems, we use Atari games as a testing environment to
demonstrate these methods.
Comment: Accepted as a conference paper at ICLR 201
Integrating kinematics and environment context into deep inverse reinforcement learning for predicting off-road vehicle trajectories
Predicting the motion of a mobile agent from a third-person perspective is an
important component for many robotics applications, such as autonomous
navigation and tracking. With accurate motion prediction of other agents,
robots can plan for more intelligent behaviors to achieve specified objectives,
instead of acting in a purely reactive way. Previous work addresses motion
prediction by either only filtering kinematics, or using hand-designed and
learned representations of the environment. Instead of separating kinematic and
environmental context, we propose a novel approach to integrate both into an
inverse reinforcement learning (IRL) framework for trajectory prediction.
Instead of exponentially increasing the state-space complexity with kinematics,
we propose a two-stage neural network architecture that considers motion and
environment together to recover the reward function. The first-stage network
learns feature representations of the environment using low-level LiDAR
statistics and the second-stage network combines those learned features with
kinematics data. We collected over 30 km of off-road driving data and validated
experimentally that our method can effectively extract useful environmental and
kinematic features. We generate accurate predictions of the distribution of
future trajectories of the vehicle, encoding complex behaviors such as
multi-modal distributions at road intersections, and even show different
predictions at the same intersection depending on the vehicle's speed.
Comment: CoRL 201
Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification
The x-vector based deep neural network (DNN) embedding systems have
demonstrated effectiveness for text-independent speaker verification. This
paper presents a multi-task learning architecture for training the speaker
embedding DNN with the primary task of classifying the target speakers, and the
auxiliary task of reconstructing the first- and higher-order statistics of the
original input utterance. The proposed training strategy aggregates both the
supervised and unsupervised learning into one framework to make the speaker
embeddings more discriminative and robust. Experiments are carried out using
the NIST SRE16 evaluation dataset and the VOiCES dataset. The results
demonstrate that our proposed method outperforms the original x-vector approach
with very little additional complexity.
Comment: 5 pages, 2 figures, submitted to INTERSPEECH 201
Multi-Level Recurrent Residual Networks for Action Recognition
Most existing Convolutional Neural Networks (CNNs) used for action recognition
are either difficult to optimize or underuse crucial temporal information.
Inspired by the consistent breakthroughs of recurrent models on
sequence-related tasks, we propose novel Multi-Level Recurrent
Residual Networks (MRRN), which incorporate three recognition streams. Each
stream consists of a Residual Network (ResNet) and a recurrent model. The
proposed model captures spatiotemporal information by employing both
alternative ResNets to learn spatial representations from static frames and
stacked Simple Recurrent Units (SRUs) to model temporal dynamics. The three
distinct-level streams, which independently learn low-, mid-, and high-level
representations, are fused by computing a weighted average of their softmax scores
to obtain complementary representations of the video. Unlike previous
models which boost performance at the cost of time complexity and space
complexity, our models have a lower complexity by employing shortcut connections
and are trained end-to-end with greater efficiency. MRRN displays significant
performance improvements compared to CNN-RNN framework baselines and obtains
comparable performance with the state-of-the-art, achieving 51.3% on HMDB-51
dataset and 81.9% on UCF-101 dataset although no additional data
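The late-fusion step mentioned in this abstract, a weighted average of per-stream softmax scores, can be sketched as below. This is a minimal sketch only: the function names, the example logits and the stream weights are hypothetical, and the abstract does not state how the weights are chosen.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits, weights):
    """Weighted average of the softmax scores of several streams.

    stream_logits: one list of class logits per stream (same length each).
    weights: one non-negative weight per stream (assumed to be given).
    Returns fused class probabilities that sum to 1.
    """
    weight_sum = sum(weights)
    probs = [softmax(logits) for logits in stream_logits]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / weight_sum
        for c in range(n_classes)
    ]


# Illustrative logits for the low-, mid- and high-level streams.
low, mid, high = [2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [0.2, 3.0, 0.1]
fused = fuse_streams([low, mid, high], weights=[1.0, 1.0, 2.0])
predicted_class = max(range(len(fused)), key=lambda c: fused[c])
```

Because each stream's scores are normalized before averaging, a stream with large raw logits cannot dominate the fusion beyond the weight assigned to it.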
Fast and Accurate Person Re-Identification with RMNet
In this paper we introduce a new neural network architecture designed for use
in embedded vision applications. It merges the best working practices of
network architectures like MobileNets and ResNets into an architecture we name
RMNet. We also focus on key aspects of building mobile architectures that
operate within a limited computation budget. Additionally, to demonstrate the
effectiveness of our architecture we evaluate the RMNet backbone on the person
re-identification task. The proposed approach ranks in the top 3 of
state-of-the-art solutions on the Market-1501 challenge, while significantly
outperforming them in inference speed.