Understanding Deep Learning Generalization by Maximum Entropy
Deep learning achieves remarkable generalization capability despite an
overwhelming number of model parameters. A theoretical understanding of deep
learning generalization has received recent attention yet remains not fully
explored. This paper attempts to provide an alternative understanding from the
perspective of maximum entropy. We first derive two feature conditions under
which softmax regression strictly applies the maximum entropy principle. A DNN
is then regarded as approximating these feature conditions with multilayer
feature learning, and is proved to be a recursive solution towards the maximum
entropy principle. The connection between DNNs and maximum entropy explains
why typical designs such as shortcuts and regularization improve model
generalization, and provides guidance for future model development.
Comment: 13 pages, 2 figures
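The softmax distribution the abstract refers to is the Gibbs form that maximizes entropy subject to linear feature-expectation constraints. A minimal sketch of that distribution (an illustration, not the paper's derivation):

```python
import math

def softmax(logits):
    """Maximum-entropy distribution under linear constraints: p_i ∝ exp(z_i),
    the Gibbs form that softmax regression outputs."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])          # a valid probability vector
```

Note that equal logits yield the uniform distribution, the unconstrained entropy maximizer.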
An Optimal Transport View on Generalization
We derive upper bounds on the generalization error of learning algorithms
based on their \emph{algorithmic transport cost}: the expected Wasserstein
distance between the output hypothesis and the output hypothesis conditioned on
an input example. The bounds provide a novel approach to study the
generalization of learning algorithms from an optimal transport view and impose
fewer constraints on the loss function, such as sub-Gaussianity or boundedness. We
further provide several upper bounds on the algorithmic transport cost in terms
of total variation distance, relative entropy (or KL-divergence), and VC
dimension, thus further bridging optimal transport theory and information
theory with statistical learning theory. Moreover, we also study different
conditions for loss functions under which the generalization error of a
learning algorithm can be upper bounded by different probability metrics
between distributions relating to the output hypothesis and/or the input data.
Finally, under our established framework, we analyze the generalization in deep
learning and conclude that the generalization error in deep neural networks
(DNNs) decreases exponentially to zero as the number of layers increases. Our
analyses of generalization error in deep learning mainly exploit the
hierarchical structure in DNNs and the contraction property of the
f-divergence, which may be of independent interest in analyzing other learning
models with hierarchical structure.
Comment: 27 pages, 2 figures, 1 table
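The Wasserstein distance the bounds are built on has a simple closed form in one dimension: the optimal transport plan matches samples in sorted order. A minimal sketch for equal-size empirical samples (an illustration of the metric itself, not of the paper's bounds):

```python
def wasserstein1(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D empirical samples.
    In 1-D the optimal coupling pairs the sorted samples, so the distance is
    the mean absolute difference of the sorted values."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

d = wasserstein1([0.0, 1.0], [1.0, 2.0])  # → 1.0 (shift every point by 1)
```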
Short-term Load Forecasting with Deep Residual Networks
We present in this paper a model for forecasting short-term power loads based
on deep residual networks. The proposed model is able to integrate domain
knowledge and researchers' understanding of the task by virtue of different
neural network building blocks. Specifically, a modified deep residual network
is formulated to improve the forecast results. Further, a two-stage ensemble
strategy is used to enhance the generalization capability of the proposed
model. We also apply the proposed model to probabilistic load forecasting using
Monte Carlo dropout. Three public datasets are used to prove the effectiveness
of the proposed model. Multiple test cases and comparison with existing models
show that the proposed model is able to provide accurate load forecasting
results and has high generalization capability.
Comment: This paper has been accepted by IEEE Transactions on Smart Grid
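Monte Carlo dropout, as used here for probabilistic forecasting, keeps dropout active at prediction time and averages many stochastic forward passes to get a predictive mean and spread. A hand-rolled sketch on a toy linear model (the model and names are hypothetical, not the paper's network):

```python
import random
import statistics

def forward(x, weights, p_drop=0.2, rng=random):
    """One stochastic pass: drop each weight with probability p_drop.
    Inverted dropout (dividing by the keep rate) preserves the expected output."""
    keep = 1.0 - p_drop
    return sum(w * xi * (rng.random() < keep) / keep
               for w, xi in zip(weights, x))

def mc_dropout_predict(x, weights, n_samples=200, seed=0):
    """Predictive mean and spread from repeated stochastic forward passes."""
    rng = random.Random(seed)
    preds = [forward(x, weights, rng=rng) for _ in range(n_samples)]
    return statistics.mean(preds), statistics.stdev(preds)

mean, spread = mc_dropout_predict([1.0, 2.0], [0.5, -0.3])
```

The spread gives the model's predictive uncertainty, which is what turns a point load forecast into a probabilistic one.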
Synchronous locating and imaging behind scattering medium in a large depth based on deep learning
A scattering medium makes it difficult to locate and image planar objects,
especially when the object lies at a large depth. In this Letter, a novel
learning-based method is presented to locate and image an object hidden behind
a thin scattering diffuser. A multi-task network, named DINet, is constructed
to predict the depth and the image of the hidden object from the captured
speckle patterns. Experiments verify that the proposed method can locate the
object with a mean depth error of less than 0.05 mm and image the object with
an average PSNR above 24 dB, over a large depth range from 350 mm to 1150 mm.
The constructed DINet obtains multiple pieces of physical information from a
single speckle pattern, including both the depth and the image. Compared with
traditional methods, it paves the way for practical applications requiring a
large imaging depth of field behind scattering media.
Gradient-Free Learning Based on the Kernel and the Range Space
In this article, we show that solving the system of linear equations by
manipulating the kernel and the range space is equivalent to solving the
problem of least squares error approximation. This establishes the ground for a
gradient-free learning search when the system can be expressed in the form of a
linear matrix equation. When the nonlinear activation function is invertible,
the learning problem of a fully-connected multilayer feedforward neural network
can be easily adapted to this novel learning framework. By a series of kernel
and range space manipulations, such network learning boils
down to solving a set of cross-coupling equations. By having the weights
randomly initialized, the equations can be decoupled and the network solution
shows relatively good learning capability for real world data sets of small to
moderate dimensions. Based on the structural information of the matrix
equation, the network representation is found to depend on the number of
data samples and the output dimension.
Comment: The idea of kernel and range projection was first introduced at the
IEEE/ACIS ICIS conference held in Singapore in June 2018. This article
presents a full development of the method, supported by extensive numerical
results.
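The equivalence the abstract claims — that range-space manipulation of a linear matrix equation reproduces least squares — can be illustrated with the normal equations, which project the target onto the range of the design matrix. A minimal pure-Python sketch for a two-column system (an illustration, not the paper's algorithm):

```python
def lstsq_2col(A, b):
    """Least-squares solution of A x = b for a 2-column matrix A via the
    normal equations A^T A x = A^T b, i.e. projecting b onto the range of A."""
    n = len(A)
    # Build A^T A (2x2) and A^T b (length-2), then solve by Cramer's rule.
    ata = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(2)]
           for i in range(2)]
    atb = [sum(A[k][i] * b[k] for k in range(n)) for i in range(2)]
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    x0 = (atb[0] * ata[1][1] - atb[1] * ata[0][1]) / det
    x1 = (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det
    return [x0, x1]

# Fit y = 1 + 2x through three consistent points.
x = lstsq_2col([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0])  # → [1.0, 2.0]
```

No gradient is computed anywhere; the solution falls out of the linear-algebraic structure alone, which is the sense in which the learning search is gradient-free.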
A Global Algorithm for Training Multilayer Neural Networks
We present a global algorithm for training multilayer neural networks in this
Letter. The algorithm is focused on controlling the local fields of neurons
induced by the input of samples by random adaptations of the synaptic weights.
Unlike the backpropagation algorithm, the networks may have discrete-state
weights, and may apply either differentiable or nondifferentiable neural
transfer functions. A two-layer network is trained as an example to separate a
linearly inseparable set of samples into two categories, and its powerful
generalization capacity is emphasized. The extension to more general cases is
straightforward.
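The core idea — accept random weight adaptations when they reduce the error, with no gradients and a nondifferentiable transfer function — can be sketched on a single sign-activation unit. This is a hedged illustration of the accept-if-better search, not the Letter's exact algorithm:

```python
import random

def train_random_search(data, n_steps=2000, seed=1):
    """Gradient-free training of a sign-activation unit: propose Gaussian
    weight perturbations and keep them only when the error count does not
    increase. Works even though sign(.) is nondifferentiable."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]                       # two weights + bias

    def errors(w):
        return sum(1 for x, y in data
                   if (1 if w[0]*x[0] + w[1]*x[1] + w[2] > 0 else -1) != y)

    best = errors(w)
    for _ in range(n_steps):
        trial = [wi + rng.gauss(0.0, 0.5) for wi in w]
        e = errors(trial)
        if e <= best:                         # accept non-worsening adaptations
            w, best = trial, e
    return w, best

# AND gate with ±1 labels, a linearly separable toy problem.
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, err = train_random_search(data)
```

Because acceptance is monotone in the error count, the training error never increases, which is what makes discrete-state weights unproblematic here.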
(Yet) Another Theoretical Model of Thinking
This paper presents a theoretical, idealized model of the thinking process
with the following characteristics: 1) the model can produce complex thought
sequences and can generalize to new inputs, 2) it can receive and maintain
input information indefinitely for the generation of thoughts and later use,
and 3) it supports learning while executing. The crux of the model lies in
the concept of internal consistency: the generated thoughts should always be
consistent with the inputs from which they are created. Its merit, apart from
the capability to generate new creative thoughts from an internal mechanism,
depends on the potential to help training to generalize better. This is
consequently enabled by separating input information into several parts to be
handled by different processing components with a focus mechanism to fetch
information for each. This modularized view, together with the focus
mechanism, ties the model to computationally capable Turing machines. As a
final remark, this paper constructively shows that the computational
complexity of the model is at least that of a universal Turing machine, if
not greater.
Human-Like Autonomous Car-Following Model with Deep Reinforcement Learning
This study proposes a framework for human-like autonomous car-following
planning based on deep reinforcement learning (deep RL). Historical driving
data are fed into a simulation environment where an RL agent learns from trial
and error interactions based on a reward function that signals how much the
agent deviates from the empirical data. Through these interactions, an optimal
policy, or car-following model that maps in a human-like way from speed,
relative speed between a lead and following vehicle, and inter-vehicle spacing
to acceleration of a following vehicle is finally obtained. The model can be
continuously updated when more data are fed in. Two thousand car-following
periods extracted from the 2015 Shanghai Naturalistic Driving Study were used
to train the model and compare its performance with that of traditional and
recent data-driven car-following models. As the results of this study show, a
deep deterministic policy gradient car-following model that uses the disparity
between simulated and observed speed as the reward function and considers a
reaction delay of 1 s, denoted DDPGvRT, can reproduce human-like car-following
behavior with higher accuracy than traditional and recent data-driven
car-following models. Specifically, the DDPGvRT model has a spacing validation
error of 18% and speed validation error of 5%, which are less than those of
other models, including the intelligent driver model, models based on locally
weighted regression, and conventional neural network-based models. Moreover,
the DDPGvRT demonstrates good capability of generalization to various driving
situations and can adapt to different drivers by continuously learning. This
study demonstrates that reinforcement learning methodology can offer insight
into driver behavior and can contribute to the development of human-like
autonomous driving algorithms and traffic-flow models.
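The reward described above penalizes the agent in proportion to how far its simulated speed drifts from the empirical trace, with the observation shifted by the reaction delay. A hedged sketch of such a signal (names and the 0.1 s step size are assumptions, not the paper's exact formulation):

```python
def speed_disparity_reward(sim_speeds, obs_speeds, t, delay_steps=10):
    """Reward at step t: negative absolute disparity between the simulated
    speed and the observed speed shifted by a reaction delay.
    With a 0.1 s step, delay_steps=10 models the 1 s delay of DDPGvRT."""
    t_obs = max(0, t - delay_steps)           # clamp at the trace start
    return -abs(sim_speeds[t] - obs_speeds[t_obs])

r = speed_disparity_reward([12.0] * 20, [10.0] * 20, t=15)  # → -2.0
```

A reward of 0 is only reachable by exactly tracking the delayed human speed profile, which is what drives the learned policy toward human-like behavior.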
Generalization and Expressivity for Deep Nets
Along with the rapid development of deep learning in practice, the
theoretical explanations for its success become urgent. Generalization and
expressivity are two widely used measurements to quantify theoretical behaviors
of deep learning. Expressivity focuses on finding functions that are
expressible by deep nets but cannot be approximated by shallow nets with a
similar number of neurons; it usually implies large capacity. Generalization
aims at deriving fast learning rates for deep nets; it usually requires small
capacity to reduce the variance. Different from previous studies on deep learning,
pursuing either expressivity or generalization, we take both factors into
account to explore the theoretical advantages of deep nets. For this purpose,
we construct a deep net with two hidden layers possessing excellent
expressivity in terms of localized and sparse approximation. Then, utilizing
the well-known covering number to measure the capacity, we find that deep nets
possess excellent expressive power (measured by localized and sparse
approximation) without enlarging the capacity of shallow nets. As a
consequence, we derive near optimal learning rates for implementing empirical
risk minimization (ERM) on the constructed deep nets. These results
theoretically exhibit the advantages of deep nets from a learning theory
viewpoint.
1D Convolutional Neural Networks and Applications: A Survey
During the last decade, Convolutional Neural Networks (CNNs) have become the
de facto standard for various Computer Vision and Machine Learning operations.
CNNs are feed-forward Artificial Neural Networks (ANNs) with alternating
convolutional and subsampling layers. Deep 2D CNNs with many hidden layers and
millions of parameters have the ability to learn complex objects and patterns
provided that they can be trained on a massive visual database with
ground-truth labels. With proper training, this unique ability makes them the
primary tool for various engineering applications for 2D signals such as images
and video frames. Yet, this may not be a viable option in numerous applications
over 1D signals especially when the training data is scarce or
application-specific. To address this issue, 1D CNNs have recently been
proposed and immediately achieved the state-of-the-art performance levels in
several applications such as personalized biomedical data classification and
early diagnosis, structural health monitoring, anomaly detection and
identification in power electronics and motor-fault detection. Another major
advantage is that a real-time and low-cost hardware implementation is feasible
due to the simple and compact configuration of 1D CNNs that perform only 1D
convolutions (scalar multiplications and additions). This paper presents a
comprehensive review of the general architecture and principles of 1D CNNs
along with their major engineering applications, especially focused on the
recent progress in this field. Their state-of-the-art performance is
highlighted, concluding with their unique properties. The benchmark datasets
and the principal 1D CNN software used in those applications are also publicly
shared on a dedicated website.
Comment: 20 pages, 17 figures, MSSP (Elsevier) submission
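The "only scalar multiplications and additions" point about 1D convolutions can be made concrete with a few lines of pure Python (a valid-mode, cross-correlation-style convolution as CNN layers actually compute it):

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation form, as in CNN layers):
    each output sample is a dot product of the kernel with one signal window,
    i.e. nothing but scalar multiplications and additions."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

out = conv1d([1, 2, 3, 4], [1, 1])  # moving pair-sum → [3, 5, 7]
```

The output length is len(signal) - len(kernel) + 1, and the absence of any 2D index arithmetic is what makes the low-cost hardware implementations mentioned above feasible.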