Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition
We study in this paper how to initialize the parameters of multinomial
logistic regression (a fully connected layer followed with softmax and cross
entropy loss), which is widely used in deep neural network (DNN) models for
classification problems. As logistic regression is widely known to have no
closed-form solution, it is usually randomly initialized, leading to several
deficiencies, especially in transfer learning, where all the layers except for
the last task-specific layer are initialized using a pre-trained model. The
deficiencies include slow convergence, the possibility of getting stuck in a
local minimum, and the risk of over-fitting. To address those deficiencies, we first
study the properties of logistic regression and propose a closed-form
approximate solution named regularized Gaussian classifier (RGC). Then we adopt
this approximate solution to initialize the task-specific linear layer and
demonstrate superior performance over random initialization in terms of both
accuracy and convergence speed on various tasks and datasets. For example, for
image classification, our approach reduces training time by a factor of 10 and
achieves a 3.2% accuracy gain on Flickr-style classification. For object
detection, our approach likewise trains 10 times faster at the same accuracy,
or achieves 5% higher mAP on VOC 2007 with slightly longer training.
Comment: tech report
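The abstract does not spell out the RGC formula, but a Gaussian classifier with a shared, regularized covariance admits an LDA-style closed form that could seed the task-specific layer. A minimal sketch under that assumption (the name rgc_init and the regularizer reg are illustrative, not from the paper):

    import numpy as np

    def rgc_init(features, labels, num_classes, reg=1e-3):
        # Fit a Gaussian classifier with a shared, regularized covariance
        # over pre-trained features (an LDA-style closed form; the paper's
        # exact RGC derivation may differ).
        d = features.shape[1]
        means = np.stack([features[labels == c].mean(axis=0)
                          for c in range(num_classes)])        # (C, d)
        centered = features - means[labels]                    # (N, d)
        cov = centered.T @ centered / len(features) + reg * np.eye(d)
        precision = np.linalg.inv(cov)
        W = means @ precision                                  # (C, d)
        b = -0.5 * np.einsum('cd,cd->c', W, means)             # (C,)
        return W, b   # initial weight and bias of the softmax layer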
Frank-Wolfe Network: An Interpretable Deep Structure for Non-Sparse Coding
The problem of $\ell_p$-norm constrained coding is to convert a signal into a
code that lies inside an $\ell_p$-ball and most faithfully reconstructs the signal.
Previous works under the name of sparse coding considered the cases of the
$\ell_0$ and $\ell_1$ norms. The cases with other $p$ values, i.e. the non-sparse
coding studied in this paper, remain a difficulty. We propose an interpretable
deep structure, namely the Frank-Wolfe Network (F-W Net), whose architecture is
inspired by unrolling and truncating the Frank-Wolfe algorithm for solving an
$\ell_p$-norm constrained problem with $p \geq 1$. We show that the Frank-Wolfe
solver for the $\ell_p$-norm constraint leads to a novel closed-form nonlinear
unit, which is parameterized by $p$ and termed $pool_p$. The $pool_p$ unit links the
conventional pooling, activation, and normalization operations, making F-W Net
distinct from existing deep networks either heuristically designed or converted
from projected gradient descent algorithms. We further show that the
hyper-parameter $p$ can be made learnable instead of pre-chosen in F-W Net,
which gracefully solves the non-sparse coding problem even with unknown $p$. We
evaluate the performance of F-W Net on an extensive range of simulations as
well as the task of handwritten digit recognition, where F-W Net exhibits
strong learning capability. We then propose a convolutional version of F-W Net,
and apply it to image denoising and super-resolution tasks, where the
convolutional F-W Net demonstrates impressive effectiveness, flexibility,
and robustness. Comment: Accepted to IEEE Transactions on Circuits and Systems
for Video Technology. Code and pretrained models: https://github.com/sunke123/FW-Net
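For intuition, the Frank-Wolfe step being unrolled has a closed-form linear minimization oracle over the $\ell_p$-ball, obtained from the dual norm $\ell_q$ with $1/p + 1/q = 1$. A minimal sketch of the plain solver for $\ell_p$-constrained least squares (without the learnable per-layer parameters that F-W Net adds):

    import numpy as np

    def fw_lp_coding(A, y, p=1.5, radius=1.0, steps=50):
        # Solve min_x ||A x - y||^2  s.t.  ||x||_p <= radius by Frank-Wolfe;
        # one iteration of this loop corresponds to one unrolled layer.
        q = p / (p - 1.0)                    # dual exponent, requires p > 1
        x = np.zeros(A.shape[1])
        for t in range(steps):
            g = 2 * A.T @ (A @ x - y)        # gradient of the data term
            # Closed-form oracle: argmin_{||s||_p <= radius} <g, s>
            #   s = -radius * sign(g) * |g|^(q-1) / ||g||_q^(q-1)
            scale = np.linalg.norm(g, q) ** (q - 1) + 1e-12
            s = -radius * np.sign(g) * np.abs(g) ** (q - 1) / scale
            gamma = 2.0 / (t + 2.0)          # standard step-size schedule
            x = (1 - gamma) * x + gamma * s
        return x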
Training Deeper Convolutional Networks with Deep Supervision
One of the most promising ways of improving the performance of deep
convolutional neural networks is by increasing the number of convolutional
layers. However, adding layers makes training more difficult and
computationally expensive. In order to train deeper networks, we propose to add
auxiliary supervision branches after certain intermediate layers during
training. We formulate a simple rule of thumb to determine where these branches
should be added. The resulting deeply supervised structure makes the training
much easier and also produces better classification results on ImageNet and the
recently released, larger MIT Places dataset.
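As a sketch of the general recipe (branch placement and the loss weight below are illustrative; the paper's rule of thumb determines where the branches actually go):

    import torch.nn as nn
    import torch.nn.functional as F

    class DeeplySupervisedNet(nn.Module):
        # A backbone split into two stages, with an auxiliary classifier
        # attached after the intermediate stage during training only.
        def __init__(self, stage1, stage2, aux_head, main_head, aux_weight=0.3):
            super().__init__()
            self.stage1, self.stage2 = stage1, stage2
            self.aux_head, self.main_head = aux_head, main_head
            self.aux_weight = aux_weight

        def forward(self, x, target=None):
            mid = self.stage1(x)
            out = self.main_head(self.stage2(mid))
            if target is None:               # inference: auxiliary branch unused
                return out
            loss = F.cross_entropy(out, target) \
                 + self.aux_weight * F.cross_entropy(self.aux_head(mid), target)
            return out, loss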
Deep Neural Networks
Deep Neural Networks (DNNs) are universal function approximators providing
state-of-the-art solutions on a wide range of applications. Common perceptual
tasks such as speech recognition, image classification, and object tracking are
now commonly tackled via DNNs. Some fundamental problems remain: (1) the lack
of a mathematical framework providing an explicit and interpretable
input-output formula for any topology, (2) the quantification of DNN stability
with respect to adversarial examples (i.e. modified inputs fooling DNN
predictions whilst undetectable to humans), (3) the absence of generalization
guarantees and controllable behaviors for ambiguous patterns, and (4) the
inability to leverage unlabeled data to apply DNNs to domains where expert
labeling is scarce, as in the medical field.
Answering those points would provide theoretical perspectives for further
developments based on a common ground. Furthermore, DNNs are now deployed in
tremendous societal applications, pushing the need to fill this theoretical gap
to ensure control, reliability, and interpretability. Comment: Technical Report
Knowledge Matters: Importance of Prior Information for Optimization
We explore the effect of introducing prior information into the intermediate
level of neural networks for a learning task on which all the state-of-the-art
machine learning algorithms tested failed to learn. We motivate our work from
the hypothesis that humans learn such intermediate concepts from other
individuals via a form of supervision or guidance using a curriculum. The
experiments we have conducted provide positive evidence in favor of this
hypothesis. In our experiments, a two-tiered MLP architecture is trained on a
dataset of 64x64 binary input images, each containing three sprites. The
final task is to decide whether all the sprites are the same or one of them is
different. Sprites are pentomino tetris shapes and they are placed in an image
with different locations using scaling and rotation transformations. The first
part of the two-tiered MLP is pre-trained with intermediate-level targets being
the presence of sprites at each location, while the second part takes the
output of the first part as input and predicts the final task's target binary
event. The two-tiered MLP architecture, with a few tens of thousands of
examples, was able to learn the task perfectly, whereas all other algorithms
(including unsupervised pre-training, but also traditional algorithms like
SVMs, decision trees, and boosting) perform no better than chance. We
hypothesize that the
optimization difficulty involved when the intermediate pre-training is not
performed is due to the {\em composition} of two highly non-linear tasks. Our
findings are also consistent with hypotheses on cultural learning inspired by
the observations of optimization problems with deep learning, presumably
because of effective local minima. Comment: 37 pages, 5 figures, 5 tables.
JMLR Special Topics on Representation Learning submission
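A minimal sketch of the two-tiered setup; the grid size and the number of sprite classes below are illustrative, not the paper's exact values:

    import torch
    import torch.nn as nn

    # P1 is pre-trained to predict the intermediate concept (which sprite,
    # if any, occupies each location); P2 maps P1's outputs to the final
    # same/different bit.
    GRID, CLASSES = 8 * 8, 11     # e.g. 64 locations, 10 sprite types + "empty"
    p1 = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1024), nn.Tanh(),
                       nn.Linear(1024, GRID * CLASSES))
    p2 = nn.Sequential(nn.Linear(GRID * CLASSES, 256), nn.Tanh(),
                       nn.Linear(256, 1))

    def stage1_loss(images, sprite_targets):
        # Supervise the hidden concept: a class label per grid location.
        logits = p1(images).view(-1, GRID, CLASSES)
        return nn.functional.cross_entropy(logits.reshape(-1, CLASSES),
                                           sprite_targets.reshape(-1))

    def stage2_loss(images, same_or_different):
        # Predict the final binary event from the learned concepts.
        logits = p2(p1(images)).squeeze(1)
        return nn.functional.binary_cross_entropy_with_logits(
            logits, same_or_different.float())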
Predicting Adversarial Examples with High Confidence
It has been suggested that adversarial examples cause deep learning models to
make incorrect predictions with high confidence. In this work, we take the
opposite stance: an overly confident model is more likely to be vulnerable to
adversarial examples. This work is one of the most proactive approaches taken
to date, as we link robustness with non-calibrated model confidence on noisy
images, providing a data-augmentation-free path forward. The adversarial
examples phenomenon is most easily explained by the trend of increasing
non-regularized model capacity, while the diversity and number of samples in
common datasets have remained flat. Test accuracy has incorrectly been
associated with true generalization performance, ignoring that training and
test splits are often extremely similar in terms of the overall representation
space. The transferability property of adversarial examples was previously used
as evidence against overfitting arguments, a perceived random effect, but
overfitting is not always random. Comment: Under review by the International
Conference on Machine Learning (ICML)
Fast and Accurate Person Re-Identification with RMNet
In this paper we introduce a new neural network architecture designed for use
in embedded vision applications. It merges the best working practices of
network architectures such as MobileNets and ResNets into our proposed RMNet
architecture. We also focus on the key aspects of building mobile architectures
that must operate within a limited computation budget. Additionally, to
demonstrate the effectiveness of our architecture, we evaluate the RMNet
backbone on the Person Re-identification task. The proposed approach is among
the top 3 state-of-the-art solutions on the Market-1501 challenge, while
significantly outperforming them in inference speed.
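The abstract does not detail the block design, but merging the two families plausibly means a ResNet-style identity shortcut around MobileNet-style depthwise-separable convolutions. A hedged sketch of such a block (the name RMBlock and the exact layer order are assumptions, not the paper's specification):

    import torch.nn as nn

    class RMBlock(nn.Module):
        # Residual block built from depthwise-separable convolutions:
        # MobileNet-style factorized convs inside a ResNet-style shortcut.
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1,
                          groups=channels, bias=False),        # depthwise
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 1, bias=False),  # pointwise
                nn.BatchNorm2d(channels))
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(x + self.body(x))  # identity shortcut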
Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition
Research in human action recognition has accelerated significantly since the
introduction of powerful machine learning tools such as Convolutional Neural
Networks (CNNs). However, effective and efficient methods for incorporation of
temporal information into CNNs are still being actively explored in the recent
literature. Motivated by the popular recurrent attention models in the research
area of natural language processing, we propose the Attention-based Temporal
Weighted CNN (ATW), which embeds a visual attention model into a temporal
weighted multi-stream CNN. This attention model is simply implemented as
temporal weighting, yet it effectively boosts the recognition performance of
video representations. Moreover, each stream in the proposed ATW framework is
capable of end-to-end training, with both network parameters and temporal
weights optimized by stochastic gradient descent (SGD) with backpropagation.
Our experiments show that the proposed attention mechanism contributes
substantially to the performance gains by focusing on the more discriminative
snippets and more relevant video segments. Comment: 14th International
Conference on Artificial Intelligence Applications and Innovations (AIAI 2018),
May 25-27, 2018, Rhodes, Greece
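A minimal sketch of attention implemented as temporal weighting, assuming one learned weight per snippet normalized by a softmax (the paper may instead condition the weights on snippet features):

    import torch
    import torch.nn as nn

    class TemporalAttentionPool(nn.Module):
        # Aggregates per-snippet class scores with learned temporal weights;
        # training can thus emphasize the more discriminative snippets.
        def __init__(self, num_snippets):
            super().__init__()
            self.w = nn.Parameter(torch.zeros(num_snippets))

        def forward(self, snippet_scores):       # (batch, snippets, classes)
            attn = torch.softmax(self.w, dim=0)  # temporal attention weights
            return (snippet_scores * attn.view(1, -1, 1)).sum(dim=1)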
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then examine representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
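For the second category, a sketch of count-based exploration via hashing in the spirit of SimHash-based counting: states are projected to binary codes, codes are counted, and the reward receives a bonus that decays with the visit count (the coefficient beta and the code length are illustrative):

    import numpy as np
    from collections import defaultdict

    class HashingExplorationBonus:
        # SimHash-style counting: a random projection maps states to short
        # binary codes, and less-encountered codes earn larger bonuses.
        def __init__(self, state_dim, code_bits=32, beta=0.01, seed=0):
            rng = np.random.default_rng(seed)
            self.proj = rng.standard_normal((code_bits, state_dim))
            self.counts = defaultdict(int)
            self.beta = beta

        def bonus(self, state):
            code = tuple((self.proj @ state) > 0)   # binary hash code
            self.counts[code] += 1
            return self.beta / np.sqrt(self.counts[code])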
Harnessing Deep Neural Networks with Logic Rules
Combining deep neural networks with structured logic rules is desirable to
harness flexibility and reduce uninterpretability of the neural models. We
propose a general framework capable of enhancing various types of neural
networks (e.g., CNNs and RNNs) with declarative first-order logic rules.
Specifically, we develop an iterative distillation method that transfers the
structured information of logic rules into the weights of neural networks. We
deploy the framework on a CNN for sentiment analysis, and an RNN for named
entity recognition. With a few highly intuitive rules, we obtain substantial
improvements and achieve state-of-the-art or comparable results to previous
best-performing systems. Comment: Fix typos in appendix. ACL 2016
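A sketch of one distillation step as described: the student's predictive distribution is reweighted toward rule-consistent labels to form a teacher, and the student is trained on a mix of the true labels and the teacher's soft outputs (the constants pi and C and the rule_penalty encoding are illustrative, not the paper's exact parameterization):

    import torch
    import torch.nn.functional as F

    def rule_distillation_loss(student_logits, labels, rule_penalty,
                               pi=0.5, C=6.0):
        # rule_penalty[i, y] should be large when predicting label y for
        # example i violates a first-order logic rule.
        p = F.softmax(student_logits, dim=1)
        teacher = p * torch.exp(-C * rule_penalty)   # project toward the rules
        teacher = teacher / teacher.sum(dim=1, keepdim=True)
        hard = F.cross_entropy(student_logits, labels)
        soft = -(teacher.detach()
                 * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()
        return (1 - pi) * hard + pi * soft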