Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition
We study in this paper how to initialize the parameters of multinomial
logistic regression (a fully connected layer followed with softmax and cross
entropy loss), which is widely used in deep neural network (DNN) models for
classification problems. As logistic regression is widely known to have no
closed-form solution, it is usually randomly initialized, leading to several
deficiencies, especially in transfer learning, where all the layers except for
the last task-specific layer are initialized using a pre-trained model. The
deficiencies include slow convergence, the possibility of getting stuck in a
local minimum, and the risk of over-fitting. To address those deficiencies, we first
study the properties of logistic regression and propose a closed-form
approximate solution named regularized Gaussian classifier (RGC). Then we adopt
this approximate solution to initialize the task-specific linear layer and
demonstrate superior performance over random initialization in terms of both
accuracy and convergence speed on various tasks and datasets. For example, for
image classification, our approach reduces training time by a factor of 10 and
achieves a 3.2% accuracy gain on Flickr-style classification. For object
detection, our approach likewise trains 10 times faster at the same accuracy,
or achieves 5% higher mAP on VOC 2007 with slightly longer training.
Comment: tech report
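The abstract does not spell out the RGC formula, but a Gaussian classifier with a shared, regularized covariance admits an LDA-style closed form that could seed the task-specific layer. A minimal sketch under that assumption (the name rgc_init and the regularizer reg are illustrative, not from the paper):

    import numpy as np

    def rgc_init(features, labels, num_classes, reg=1e-3):
        # Fit a Gaussian classifier with a shared, regularized covariance
        # over pre-trained features (an LDA-style closed form; the paper's
        # exact RGC derivation may differ).
        d = features.shape[1]
        means = np.stack([features[labels == c].mean(axis=0)
                          for c in range(num_classes)])        # (C, d)
        centered = features - means[labels]                    # (N, d)
        cov = centered.T @ centered / len(features) + reg * np.eye(d)
        precision = np.linalg.inv(cov)
        W = means @ precision                                  # (C, d)
        b = -0.5 * np.einsum('cd,cd->c', W, means)             # (C,)
        return W, b   # initial weight and bias of the softmax layer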
Frank-Wolfe Network: An Interpretable Deep Structure for Non-Sparse Coding
The problem of $\ell_p$-norm constrained coding is to convert a signal into a
code that lies inside an $\ell_p$-ball and most faithfully reconstructs the signal.
Previous works under the name of sparse coding considered the cases of the
$\ell_0$ and $\ell_1$ norms. The cases with other $p$ values, i.e. the non-sparse
coding studied in this paper, remain a difficulty. We propose an interpretable
deep structure, namely the Frank-Wolfe Network (F-W Net), whose architecture is
inspired by unrolling and truncating the Frank-Wolfe algorithm for solving an
$\ell_p$-norm constrained problem with $p \geq 1$. We show that the Frank-Wolfe
solver for the $\ell_p$-norm constraint leads to a novel closed-form nonlinear
unit, which is parameterized by $p$ and termed $pool_p$. The $pool_p$ unit links the
conventional pooling, activation, and normalization operations, making F-W Net
distinct from existing deep networks either heuristically designed or converted
from projected gradient descent algorithms. We further show that the
hyper-parameter $p$ can be made learnable instead of pre-chosen in F-W Net,
which gracefully solves the non-sparse coding problem even with unknown $p$. We
evaluate the performance of F-W Net on an extensive range of simulations as
well as the task of handwritten digit recognition, where F-W Net exhibits
strong learning capability. We then propose a convolutional version of F-W Net,
and apply it to image denoising and super-resolution tasks, where the
convolutional F-W Net demonstrates impressive effectiveness, flexibility,
and robustness. Comment: Accepted to IEEE Transactions on Circuits and Systems
for Video Technology. Code and pretrained models: https://github.com/sunke123/FW-Net
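For intuition, the Frank-Wolfe step being unrolled has a closed-form linear minimization oracle over the $\ell_p$-ball, obtained from the dual norm $\ell_q$ with $1/p + 1/q = 1$. A minimal sketch of the plain solver for $\ell_p$-constrained least squares (without the learnable per-layer parameters that F-W Net adds):

    import numpy as np

    def fw_lp_coding(A, y, p=1.5, radius=1.0, steps=50):
        # Solve min_x ||A x - y||^2  s.t.  ||x||_p <= radius by Frank-Wolfe;
        # one iteration of this loop corresponds to one unrolled layer.
        q = p / (p - 1.0)                    # dual exponent, requires p > 1
        x = np.zeros(A.shape[1])
        for t in range(steps):
            g = 2 * A.T @ (A @ x - y)        # gradient of the data term
            # Closed-form oracle: argmin_{||s||_p <= radius} <g, s>
            #   s = -radius * sign(g) * |g|^(q-1) / ||g||_q^(q-1)
            scale = np.linalg.norm(g, q) ** (q - 1) + 1e-12
            s = -radius * np.sign(g) * np.abs(g) ** (q - 1) / scale
            gamma = 2.0 / (t + 2.0)          # standard step-size schedule
            x = (1 - gamma) * x + gamma * s
        return x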
Training Deeper Convolutional Networks with Deep Supervision
One of the most promising ways of improving the performance of deep
convolutional neural networks is by increasing the number of convolutional
layers. However, adding layers makes training more difficult and
computationally expensive. In order to train deeper networks, we propose to add
auxiliary supervision branches after certain intermediate layers during
training. We formulate a simple rule of thumb to determine where these branches
should be added. The resulting deeply supervised structure makes the training
much easier and also produces better classification results on ImageNet and the
recently released, larger MIT Places dataset.
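As a sketch of the general recipe (branch placement and the loss weight below are illustrative; the paper's rule of thumb determines where the branches actually go):

    import torch.nn as nn
    import torch.nn.functional as F

    class DeeplySupervisedNet(nn.Module):
        # A backbone split into two stages, with an auxiliary classifier
        # attached after the intermediate stage during training only.
        def __init__(self, stage1, stage2, aux_head, main_head, aux_weight=0.3):
            super().__init__()
            self.stage1, self.stage2 = stage1, stage2
            self.aux_head, self.main_head = aux_head, main_head
            self.aux_weight = aux_weight

        def forward(self, x, target=None):
            mid = self.stage1(x)
            out = self.main_head(self.stage2(mid))
            if target is None:               # inference: auxiliary branch unused
                return out
            loss = F.cross_entropy(out, target) \
                 + self.aux_weight * F.cross_entropy(self.aux_head(mid), target)
            return out, loss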
Deep Neural Networks
Deep Neural Networks (DNNs) are universal function approximators providing
state-of-the-art solutions on a wide range of applications. Common perceptual
tasks such as speech recognition, image classification, and object tracking are
now commonly tackled via DNNs. Some fundamental problems remain: (1) the lack
of a mathematical framework providing an explicit and interpretable
input-output formula for any topology, (2) the quantification of DNN stability
with respect to adversarial examples (i.e. modified inputs fooling DNN
predictions whilst undetectable to humans), (3) the absence of generalization
guarantees and controllable behaviors for ambiguous patterns, and (4) the
inability to leverage unlabeled data to apply DNNs to domains where expert
labeling is scarce, as in the medical field.
Answering those points would provide theoretical perspectives for further
developments based on a common ground. Furthermore, DNNs are now deployed in
tremendous societal applications, pushing the need to fill this theoretical gap
to ensure control, reliability, and interpretability. Comment: Technical Report
Knowledge Matters: Importance of Prior Information for Optimization
We explore the effect of introducing prior information into the intermediate
level of neural networks for a learning task on which all the state-of-the-art
machine learning algorithms tested failed to learn. We motivate our work from
the hypothesis that humans learn such intermediate concepts from other
individuals via a form of supervision or guidance using a curriculum. The
experiments we have conducted provide positive evidence in favor of this
hypothesis. In our experiments, a two-tiered MLP architecture is trained on a
dataset of 64x64 binary input images, each containing three sprites. The
final task is to decide whether all the sprites are the same or one of them is
different. Sprites are pentomino tetris shapes and they are placed in an image
with different locations using scaling and rotation transformations. The first
part of the two-tiered MLP is pre-trained with intermediate-level targets being
the presence of sprites at each location, while the second part takes the
output of the first part as input and predicts the final task's target binary
event. The two-tiered MLP architecture, with a few tens of thousands of
examples, was able to learn the task perfectly, whereas all other algorithms
(including unsupervised pre-training, but also traditional algorithms like
SVMs, decision trees, and boosting) perform no better than chance. We
hypothesize that the
optimization difficulty involved when the intermediate pre-training is not
performed is due to the {\em composition} of two highly non-linear tasks. Our
findings are also consistent with hypotheses on cultural learning inspired by
the observations of optimization problems with deep learning, presumably
because of effective local minima. Comment: 37 pages, 5 figures, 5 tables.
JMLR Special Topics on Representation Learning submission
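A minimal sketch of the two-tiered setup; the grid size and the number of sprite classes below are illustrative, not the paper's exact values:

    import torch
    import torch.nn as nn

    # P1 is pre-trained to predict the intermediate concept (which sprite,
    # if any, occupies each location); P2 maps P1's outputs to the final
    # same/different bit.
    GRID, CLASSES = 8 * 8, 11     # e.g. 64 locations, 10 sprite types + "empty"
    p1 = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1024), nn.Tanh(),
                       nn.Linear(1024, GRID * CLASSES))
    p2 = nn.Sequential(nn.Linear(GRID * CLASSES, 256), nn.Tanh(),
                       nn.Linear(256, 1))

    def stage1_loss(images, sprite_targets):
        # Supervise the hidden concept: a class label per grid location.
        logits = p1(images).view(-1, GRID, CLASSES)
        return nn.functional.cross_entropy(logits.reshape(-1, CLASSES),
                                           sprite_targets.reshape(-1))

    def stage2_loss(images, same_or_different):
        # Predict the final binary event from the learned concepts.
        logits = p2(p1(images)).squeeze(1)
        return nn.functional.binary_cross_entropy_with_logits(
            logits, same_or_different.float())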
Predicting Adversarial Examples with High Confidence
It has been suggested that adversarial examples cause deep learning models to
make incorrect predictions with high confidence. In this work, we take the
opposite stance: an overly confident model is more likely to be vulnerable to
adversarial examples. This work is one of the most proactive approaches taken
to date, as we link robustness with non-calibrated model confidence on noisy
images, providing a data-augmentation-free path forward. The adversarial
examples phenomenon is most easily explained by the trend of increasing
non-regularized model capacity, while the diversity and number of samples in
common datasets have remained flat. Test accuracy has incorrectly been
associated with true generalization performance, ignoring that training and
test splits are often extremely similar in terms of the overall representation
space. The transferability property of adversarial examples was previously used
as evidence against overfitting arguments, a perceived random effect, but
overfitting is not always random. Comment: Under review by the International
Conference on Machine Learning (ICML)
Fast and Accurate Person Re-Identification with RMNet
In this paper we introduce a new neural network architecture designed for use
in embedded vision applications. It merges the best working practices of
network architectures such as MobileNets and ResNets into our proposed RMNet
architecture. We also focus on the key aspects of building mobile architectures
that must operate within a limited computation budget. Additionally, to
demonstrate the effectiveness of our architecture, we evaluate the RMNet
backbone on the Person Re-identification task. The proposed approach is among
the top 3 state-of-the-art solutions on the Market-1501 challenge, while
significantly outperforming them in inference speed.
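The abstract does not detail the block design, but merging the two families plausibly means a ResNet-style identity shortcut around MobileNet-style depthwise-separable convolutions. A hedged sketch of such a block (the name RMBlock and the exact layer order are assumptions, not the paper's specification):

    import torch.nn as nn

    class RMBlock(nn.Module):
        # Residual block built from depthwise-separable convolutions:
        # MobileNet-style factorized convs inside a ResNet-style shortcut.
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1,
                          groups=channels, bias=False),        # depthwise
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 1, bias=False),  # pointwise
                nn.BatchNorm2d(channels))
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(x + self.body(x))  # identity shortcut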
Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition
Research in human action recognition has accelerated significantly since the
introduction of powerful machine learning tools such as Convolutional Neural
Networks (CNNs). However, effective and efficient methods for incorporation of
temporal information into CNNs are still being actively explored in the recent
literature. Motivated by the popular recurrent attention models in the research
area of natural language processing, we propose the Attention-based Temporal
Weighted CNN (ATW), which embeds a visual attention model into a temporal
weighted multi-stream CNN. This attention model is simply implemented as
temporal weighting, yet it effectively boosts the recognition performance of
video representations. Moreover, each stream in the proposed ATW framework is
capable of end-to-end training, with both network parameters and temporal
weights optimized by stochastic gradient descent (SGD) with backpropagation.
Our experiments show that the proposed attention mechanism contributes
substantially to the performance gains by focusing on the more discriminative
snippets and more relevant video segments. Comment: 14th International
Conference on Artificial Intelligence Applications and Innovations (AIAI 2018),
May 25-27, 2018, Rhodes, Greece
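A minimal sketch of attention implemented as temporal weighting, assuming one learned weight per snippet normalized by a softmax (the paper may instead condition the weights on snippet features):

    import torch
    import torch.nn as nn

    class TemporalAttentionPool(nn.Module):
        # Aggregates per-snippet class scores with learned temporal weights;
        # training can thus emphasize the more discriminative snippets.
        def __init__(self, num_snippets):
            super().__init__()
            self.w = nn.Parameter(torch.zeros(num_snippets))

        def forward(self, snippet_scores):       # (batch, snippets, classes)
            attn = torch.softmax(self.w, dim=0)  # temporal attention weights
            return (snippet_scores * attn.view(1, -1, 1)).sum(dim=1)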
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then examine representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
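For the second category, a sketch of count-based exploration via hashing in the spirit of SimHash-based counting: states are projected to binary codes, codes are counted, and the reward receives a bonus that decays with the visit count (the coefficient beta and the code length are illustrative):

    import numpy as np
    from collections import defaultdict

    class HashingExplorationBonus:
        # SimHash-style counting: a random projection maps states to short
        # binary codes, and less-encountered codes earn larger bonuses.
        def __init__(self, state_dim, code_bits=32, beta=0.01, seed=0):
            rng = np.random.default_rng(seed)
            self.proj = rng.standard_normal((code_bits, state_dim))
            self.counts = defaultdict(int)
            self.beta = beta

        def bonus(self, state):
            code = tuple((self.proj @ state) > 0)   # binary hash code
            self.counts[code] += 1
            return self.beta / np.sqrt(self.counts[code])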
Harnessing Deep Neural Networks with Logic Rules
Combining deep neural networks with structured logic rules is desirable to
harness flexibility and reduce uninterpretability of the neural models. We
propose a general framework capable of enhancing various types of neural
networks (e.g., CNNs and RNNs) with declarative first-order logic rules.
Specifically, we develop an iterative distillation method that transfers the
structured information of logic rules into the weights of neural networks. We
deploy the framework on a CNN for sentiment analysis, and an RNN for named
entity recognition. With a few highly intuitive rules, we obtain substantial
improvements and achieve state-of-the-art or comparable results to previous
best-performing systems. Comment: Fix typos in appendix. ACL 2016
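A sketch of one distillation step as described: the student's predictive distribution is reweighted toward rule-consistent labels to form a teacher, and the student is trained on a mix of the true labels and the teacher's soft outputs (the constants pi and C and the rule_penalty encoding are illustrative, not the paper's exact parameterization):

    import torch
    import torch.nn.functional as F

    def rule_distillation_loss(student_logits, labels, rule_penalty,
                               pi=0.5, C=6.0):
        # rule_penalty[i, y] should be large when predicting label y for
        # example i violates a first-order logic rule.
        p = F.softmax(student_logits, dim=1)
        teacher = p * torch.exp(-C * rule_penalty)   # project toward the rules
        teacher = teacher / teacher.sum(dim=1, keepdim=True)
        hard = F.cross_entropy(student_logits, labels)
        soft = -(teacher.detach()
                 * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()
        return (1 - pi) * hard + pi * soft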