Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions
Quantification of the stationary points and the associated basins of
attraction of neural network loss surfaces is an important step towards a
better understanding of neural network loss surfaces at large. This work
proposes a novel method to visualise basins of attraction together with the
associated stationary points via gradient-based random sampling. The proposed
technique is used to perform an empirical study of the loss surfaces generated
by two different error metrics: quadratic loss and entropic loss. The empirical
observations confirm the theoretical hypothesis regarding the nature of neural
network attraction basins. Entropic loss is shown to exhibit stronger gradients
and fewer stationary points than quadratic loss, indicating that entropic loss
has a more searchable landscape. Quadratic loss is shown to be more resilient
to overfitting than entropic loss. Both losses are shown to exhibit local
minima, but the number of local minima is shown to decrease with an increase in
dimensionality. Thus, the proposed visualisation technique successfully
captures the local minima properties exhibited by the neural network loss
surfaces, and can be used for the purpose of fitness landscape analysis of
neural networks. Comment: Preprint submitted to the Neural Networks journal.
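The sampling procedure described above is simple to prototype. Below is a minimal sketch, assuming a toy two-dimensional loss in place of a real network loss: many walks start from random points in weight space, follow the negative gradient, and log their paths, so that endpoints (and the paths leading to them) can be grouped into basins of attraction. All names and hyperparameters are illustrative, not the authors' code.

```python
# Minimal sketch of gradient-based random sampling for basin-of-attraction
# visualisation: launch many gradient walks from random points in weight
# space and record each path. The toy 2-D loss is purely illustrative.
import numpy as np

def loss(w):
    # Toy non-convex surface standing in for a neural network loss.
    return np.sin(3 * w[0]) * np.cos(3 * w[1]) + 0.1 * np.dot(w, w)

def grad(w, eps=1e-5):
    # Central-difference gradient; a real study would use backprop.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

def gradient_walk(w0, lr=0.05, steps=200, tol=1e-6):
    # One sampling walk: follow the negative gradient and log the path.
    path, w = [w0.copy()], w0.copy()
    for _ in range(steps):
        g = grad(w)
        if np.linalg.norm(g) < tol:   # near a stationary point
            break
        w -= lr * g
        path.append(w.copy())
    return np.array(path)

rng = np.random.default_rng(0)
walks = [gradient_walk(rng.uniform(-2, 2, size=2)) for _ in range(50)]
# Endpoints cluster by basin; plotting the paths coloured by final loss
# visualises the attraction basins.
endpoints = np.array([p[-1] for p in walks])
print(np.round(endpoints, 2))
```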
Mish: A Self Regularized Non-Monotonic Activation Function
We propose Mish, a novel self-regularized non-monotonic activation
function which can be mathematically defined as $f(x) = x\tanh(\mathrm{softplus}(x))$. As
activation functions play a crucial role in the performance and training
dynamics in neural networks, we validated Mish experimentally on several well-known
benchmarks against the best combinations of architectures and activation
functions. We also observe that data augmentation techniques have a favorable
effect on benchmarks like ImageNet-1k and MS-COCO across multiple
architectures. For example, Mish outperformed Leaky ReLU on YOLOv4 with a
CSP-DarkNet-53 backbone on average precision ($AP^{val}_{50}$) by 2.1% in
MS-COCO object detection and ReLU on ResNet-50 on ImageNet-1k in Top-1 accuracy
by approximately 1% while keeping all other network parameters and
hyperparameters constant. Furthermore, we explore the mathematical formulation
of Mish in relation with the Swish family of functions and propose an intuitive
understanding on how the first derivative behavior may be acting as a
regularizer helping the optimization of deep neural networks. Code is publicly
available at https://github.com/digantamisra98/Mish. Comment: Accepted to BMVC 2020.
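For reference, the stated definition is straightforward to implement. Here is a minimal NumPy sketch of $f(x) = x\tanh(\mathrm{softplus}(x))$; see the linked repository for the authors' implementation.

```python
# Minimal NumPy sketch of the Mish activation, f(x) = x * tanh(softplus(x)).
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def mish(x):
    return x * np.tanh(softplus(x))

x = np.linspace(-4, 4, 9)
print(np.round(mish(x), 3))
# Mish is smooth, non-monotonic, and bounded below (minimum near -0.31),
# unlike ReLU, which is piecewise linear and zero for negative inputs.
```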
Deep Built-Structure Counting in Satellite Imagery Using Attention Based Re-Weighting
In this paper, we attempt to address the challenging problem of counting
built-structures in satellite imagery. Building density is a more accurate
estimate of population density, urban expansion, and environmental impact than
built-up area segmentation. However, variations in building shape, overlapping
boundaries, and varying densities make this a complex task. To tackle this
difficult problem, we propose a deep learning based
regression technique for counting built-structures in satellite imagery. Our
proposed framework intelligently combines features from different regions of a
satellite image using attention based re-weighting techniques. Multiple
parallel convolutional networks are designed to capture information at
different granularities. These features are combined in the FusionNet, which is
trained to weight features from different granularities differently, allowing us
to predict a precise building count. To train and evaluate the proposed method,
we put forward a new large-scale and challenging built-structure-count dataset.
Our dataset is constructed by collecting satellite imagery from diverse
geographical areas (plains, urban centers, deserts, etc.) across the globe
(Asia, Europe, North America, and Africa) and captures the wide density of
built structures. Detailed experimental results and analysis validate the
proposed technique. FusionNet has a Mean Absolute Error of 3.65 and an
R-squared measure of 88% over the testing data. Finally, we test on an unseen
region of 274.3 × 10³ m², with an error of 19 buildings out of the 656
buildings in that area.
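The architecture described above can be outlined in a few lines. The following is a hypothetical PyTorch sketch, not the authors' code: parallel convolutional branches with different kernel sizes stand in for the different granularities, a learned softmax produces the attention re-weighting, and a small regression head plays the role of FusionNet. All layer sizes are invented for illustration.

```python
# Hypothetical sketch: parallel branches at different granularities,
# attention re-weighting, and a fused regression head for the count.
import torch
import torch.nn as nn

class CountingNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Parallel branches; kernel size stands in for granularity.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, k, padding=k // 2), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(8))
            for k in (3, 5, 7)
        ])
        # Attention head: one scalar weight per branch.
        self.attn = nn.Linear(3 * 16 * 8 * 8, 3)
        # "FusionNet" stand-in: regression head over re-weighted features.
        self.head = nn.Sequential(nn.Linear(3 * 16 * 8 * 8, 64),
                                  nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        feats = [b(x).flatten(1) for b in self.branches]   # per-branch features
        cat = torch.cat(feats, dim=1)
        w = torch.softmax(self.attn(cat), dim=1)           # attention re-weighting
        fused = torch.cat([w[:, i:i + 1] * f for i, f in enumerate(feats)], dim=1)
        return self.head(fused).squeeze(1)                 # predicted count

model = CountingNet()
print(model(torch.randn(2, 3, 128, 128)).shape)  # torch.Size([2])
```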
Denoising Autoencoders for fast Combinatorial Black Box Optimization
Estimation of Distribution Algorithms (EDAs) require flexible probability
models that can be efficiently learned and sampled. Autoencoders (AE) are
generative stochastic networks with these desired properties. We integrate a
special type of AE, the Denoising Autoencoder (DAE), into an EDA and evaluate
the performance of DAE-EDA on several combinatorial optimization problems with
a single objective. We assess the number of fitness evaluations as well as the
required CPU times. We compare the results to the performance of the Bayesian
Optimization Algorithm (BOA) and of RBM-EDA, another EDA based on a
generative neural network that has proven competitive with BOA. For the
considered problem instances, DAE-EDA is considerably faster than BOA and
RBM-EDA, sometimes by orders of magnitude. The number of fitness evaluations is
higher than for BOA, but competitive with RBM-EDA. These results show that DAEs
can be useful tools for problems with low but non-negligible fitness evaluation
costs. Comment: corrected typos and small inconsistencies.
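A compact way to see how a DAE slots into an EDA is the loop below: each generation, fit a denoising autoencoder to the selected parents, then sample offspring by corrupting parents and reconstructing them. This is a minimal NumPy sketch on the OneMax toy problem, with illustrative layer sizes and rates, not the paper's implementation.

```python
# Minimal DAE-EDA-style loop on OneMax (illustrative, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
N_BITS, POP, HID = 32, 100, 24
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_dae(X, epochs=200, lr=0.5, p_corrupt=0.1):
    # One-hidden-layer DAE with tied weights, trained with cross-entropy
    # to reconstruct clean bits from bit-flip-corrupted inputs.
    W = rng.normal(0, 0.1, (N_BITS, HID))
    for _ in range(epochs):
        Xn = np.where(rng.random(X.shape) < p_corrupt, 1 - X, X)  # bit flips
        H = sigmoid(Xn @ W)                    # encode
        R = sigmoid(H @ W.T)                   # decode (tied weights)
        dR = R - X                             # cross-entropy output gradient
        dH = (dR @ W) * H * (1 - H)            # backprop into hidden layer
        gW = Xn.T @ dH + (H.T @ dR).T          # tied gradient: encoder + decoder
        W -= lr / len(X) * gW
    return W

def sample(W, parents, p_corrupt=0.1):
    # New candidates: corrupt parents, reconstruct, sample stochastically.
    Xn = np.where(rng.random(parents.shape) < p_corrupt, 1 - parents, parents)
    R = sigmoid(sigmoid(Xn @ W) @ W.T)
    return (rng.random(R.shape) < R).astype(float)

pop = (rng.random((POP, N_BITS)) < 0.5).astype(float)
for gen in range(30):
    fit = pop.sum(axis=1)                      # OneMax fitness
    parents = pop[np.argsort(fit)[-POP // 2:]] # truncation selection
    W = train_dae(parents)                     # fit model to parents
    pop = np.vstack([parents, sample(W, parents)])
print("best fitness:", int(pop.sum(axis=1).max()), "of", N_BITS)
```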
An Approximate Backpropagation Learning Rule for Memristor Based Neural Networks Using Synaptic Plasticity
We describe an approximation to the backpropagation algorithm for training deep
neural networks, which is designed to work with synapses implemented with
memristors. The key idea is to represent the values of both the input signal
and the backpropagated delta value with a series of pulses that trigger
multiple positive or negative updates of the synaptic weight, and to use the
min operation instead of the product of the two signals. In computational
simulations, we show that the proposed approximation to backpropagation is well
converged and may be suitable for memristor implementations of multilayer
neural networks. Comment: 21 pages, 6 figures, 1 table; title changed,
manuscript thoroughly rewritten.
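The min-based update is easy to state concretely. Below is an illustrative NumPy sketch, not the paper's exact scheme: activations and deltas are quantized into pulse counts, and each synapse is updated by the number of coincident pulses, min(|x|, |delta|), with the sign taken from the two signals.

```python
# Illustrative pulse-coded "min" update replacing the product x * delta.
import numpy as np

def pulses(v, v_max=1.0, n_levels=8):
    # Number of update pulses encoding the magnitude |v|.
    return np.clip(np.round(np.abs(v) / v_max * n_levels), 0, n_levels)

def min_rule_update(x, delta, lr=0.01):
    # Each coincident pulse nudges the memristive weight one step;
    # min(|x|, |delta|) pulses approximate the outer product of backprop.
    px, pd = pulses(x), pulses(delta)
    n = np.minimum(px[:, None], pd[None, :])        # coincident pulse count
    sign = np.sign(x)[:, None] * np.sign(delta)[None, :]
    return lr * sign * n                            # weight change, shape (in, out)

x = np.array([0.9, -0.2, 0.05])
delta = np.array([0.4, -0.7])
print(min_rule_update(x, delta))
print(np.outer(x, delta) * 0.01 * 8)  # exact-product reference for comparison
```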
Application of Deep Learning on Predicting Prognosis of Acute Myeloid Leukemia with Cytogenetics, Age, and Mutations
We explore how Deep Learning (DL) can be utilized to predict prognosis of
acute myeloid leukemia (AML). Out of TCGA (The Cancer Genome Atlas) database,
94 AML cases are used in this study. Input data include age, 10 common
cytogenetic results, and the 23 most common mutations; output is the prognosis
(diagnosis to death, DTD). In our DL network, autoencoders are stacked to form
a hierarchical DL model from which raw data are compressed and organized and
high-level features are extracted. The network is written in R language and is
designed to predict prognosis of AML for a given case (DTD of more than or less
than 730 days). The DL network achieves an excellent accuracy of 83% in
predicting prognosis. As a proof-of-concept study, our preliminary results
demonstrate a practical application of DL in future practice of prognostic
prediction using next-gen sequencing (NGS) data. Comment: 11 pages, 1 table, 1
figure. arXiv admin note: substantial text overlap with arXiv:1801.0101
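The stacking scheme can be sketched in a few lines. Since the original network is in R, the following is a hypothetical PyTorch rendering with invented layer sizes and placeholder data: two autoencoders are trained greedily to compress the 34 inputs (age plus 10 cytogenetic and 23 mutation features), then a classifier head is fine-tuned on the stacked representation.

```python
# Hypothetical sketch of stacked-autoencoder prognosis prediction.
import torch
import torch.nn as nn

def train_autoencoder(X, in_dim, hid_dim, epochs=200):
    # Train one autoencoder layer and return its encoder.
    enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
    dec = nn.Linear(hid_dim, in_dim)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(X)), X)
        loss.backward()
        opt.step()
    return enc

X = torch.rand(94, 34)              # placeholder for the 94 TCGA cases
y = (torch.rand(94) > 0.5).float()  # placeholder labels: DTD > 730 days or not

enc1 = train_autoencoder(X, 34, 16)    # first compression layer
H1 = enc1(X).detach()
enc2 = train_autoencoder(H1, 16, 8)    # second, higher-level layer

# Fine-tune a classifier head on the stacked representation.
clf = nn.Sequential(enc1, enc2, nn.Linear(8, 1))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(clf(X).squeeze(1), y)
    loss.backward()
    opt.step()
```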
Learning, Memory, and the Role of Neural Network Architecture
The performance of information processing systems, from artificial neural
networks to natural neuronal ensembles, depends heavily on the underlying
system architecture. In this study, we compare the performance of parallel and
layered network architectures during sequential tasks that require both
acquisition and retention of information, thereby identifying tradeoffs
between learning and memory processes. During the task of supervised,
sequential function approximation, networks produce and adapt representations
of external information. Performance is evaluated by statistically analyzing
the error in these representations while varying the initial network state,
the structure of the external information, and the time given to learn the
information. We link performance to complexity in network architecture by
characterizing local error landscape curvature. We find that variations in
error landscape structure give rise to tradeoffs in performance; these include
the ability of the network to maximize accuracy versus minimize inaccuracy and
produce specific versus generalizable representations of information. Parallel
networks generate smooth error landscapes with deep, narrow minima, enabling
them to find highly specific representations given sufficient time. While
accurate, however, these representations are difficult to generalize. In
contrast, layered networks generate rough error landscapes with a variety of
local minima, allowing them to quickly find coarse representations. Although
less accurate, these representations are easily adaptable. The presence of
measurable performance tradeoffs in both layered and parallel networks has
implications for understanding the behavior of a wide variety of natural and
artificial learning systems.
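One way to make the curvature characterization concrete is to inspect the Hessian eigenvalue spectrum of a small network's error at a point in weight space. The sketch below is illustrative and not the authors' analysis; comparing spectra of wider ("parallel") versus deeper ("layered") variants of the toy model would mirror the study's architectural comparison.

```python
# Illustrative curvature probe: summarize local error-landscape curvature
# by the eigenvalue spectrum of the loss Hessian at the current weights.
# Large positive eigenvalues suggest deep, narrow minima; a broad, mixed
# spectrum suggests a rough landscape. Requires PyTorch >= 2.0.
import torch

torch.manual_seed(0)
X, y = torch.randn(64, 2), torch.randn(64, 1)
model = torch.nn.Sequential(torch.nn.Linear(2, 4), torch.nn.Tanh(),
                            torch.nn.Linear(4, 1))
params = dict(model.named_parameters())

def loss_fn(p):
    out = torch.func.functional_call(model, p, (X,))
    return torch.nn.functional.mse_loss(out, y)

# Nested dict of Hessian blocks, one block per pair of parameter tensors;
# flatten into a single matrix and inspect its spectrum.
hess = torch.func.hessian(loss_fn)(params)
names = list(params)
H = torch.cat([torch.cat([hess[a][b].reshape(params[a].numel(),
                                             params[b].numel())
                          for b in names], dim=1)
               for a in names], dim=0)
print(torch.linalg.eigvalsh(H)[-5:])  # five largest local curvatures
```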
On the energy landscape of deep networks
We introduce "AnnealSGD", a regularized stochastic gradient descent algorithm
motivated by an analysis of the energy landscape of a particular class of deep
networks with sparse random weights. The loss function of such networks can be
approximated by the Hamiltonian of a spherical spin glass with Gaussian
coupling. While different from currently-popular architectures such as
convolutional ones, spin glasses are amenable to analysis, which provides
insights on the topology of the loss function and motivates algorithms to
minimize it. Specifically, we show that a regularization term akin to a
magnetic field can be modulated with a single scalar parameter to transition
the loss function from a complex, non-convex landscape with exponentially many
local minima, to a phase with a polynomial number of minima, all the way down
to a trivial landscape with a unique minimum. AnnealSGD starts training in the
relaxed polynomial regime and gradually tightens the regularization parameter
to steer the energy towards the original exponential regime. Even for
convolutional neural networks, which are quite unlike sparse random networks,
we empirically show that AnnealSGD improves the generalization error using
competitive baselines on MNIST and CIFAR-10.
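The annealing schedule is the essence of the method and can be shown generically. In the sketch below, the linear "field" term gamma * (h · w) is a hypothetical stand-in for the paper's regularizer; the point is the schedule itself, which starts with a strong term that simplifies the landscape and decays it to zero so the original loss is recovered.

```python
# Generic annealed-regularizer sketch (the exact regularizer is the
# paper's; the linear "field" term here is an illustrative stand-in).
import numpy as np

rng = np.random.default_rng(0)
dim, lr, steps, gamma0 = 10, 0.01, 2000, 2.0

def loss_grad(w):
    # Gradient of a toy multi-minimum loss, sum_i sin(5 w_i) + 0.1 w_i^2.
    return 5 * np.cos(5 * w) + 0.2 * w

w = rng.normal(0.0, 1.0, size=dim)
h = rng.normal(0.0, 1.0, size=dim)       # fixed random "magnetic field"
for t in range(steps):
    gamma = gamma0 * (1 - t / steps)     # anneal the regularizer to zero
    g = loss_grad(w) + gamma * h         # gradient of loss + gamma * (h . w)
    w -= lr * (g + rng.normal(0.0, 0.1, size=dim))  # noisy, SGD-like step
print(np.round(w, 3))                    # settles in minima of the raw loss
```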
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
We describe an approach to understand the peculiar and counterintuitive
generalization properties of deep neural networks. The approach involves going
beyond worst-case theoretical capacity control frameworks that have been
popular in machine learning in recent years to revisit old ideas in the
statistical mechanics of neural networks. Within this approach, we present a
prototypical Very Simple Deep Learning (VSDL) model, whose behavior is
controlled by two control parameters, one describing an effective amount of
data, or load, on the network (that decreases when noise is added to the
input), and one with an effective temperature interpretation (that increases
when algorithms are early stopped). Using this model, we describe how a very
simple application of ideas from the statistical mechanics theory of
generalization provides a strong qualitative description of recently-observed
empirical results regarding the inability of deep neural networks to avoid
overfitting training data, discontinuous learning and sharp transitions in the
generalization properties of learning algorithms, etc. Comment: 31 pages; added
brief discussion of recent papers that use/extend these ideas.
High Dimensional Spaces, Deep Learning and Adversarial Examples
In this paper, we analyze deep learning from a mathematical point of view and
derive several novel results. The results are based on intriguing mathematical
properties of high dimensional spaces. We first look at perturbation based
adversarial examples and show how they can be understood using topological and
geometrical arguments in high dimensions. We point out a mistake in an argument
presented in prior published literature, and we present a more rigorous,
general and correct mathematical result to explain adversarial examples in
terms of topology of image manifolds. Second, we look at optimization
landscapes of deep neural networks and examine the number of saddle points
relative to that of local minima. Third, we show how multiresolution nature of
images explains perturbation based adversarial examples in the form of a
stronger result. Our results state that the expectation of the $L_2$-norm of
adversarial perturbations is $O(1/\sqrt{n})$ and therefore shrinks to 0 as the
image resolution $n$ becomes arbitrarily large. Finally, by incorporating
the parts-whole manifold learning hypothesis for natural images, we investigate
the working of deep neural networks and root causes of adversarial examples and
discuss how future improvements can be made and how adversarial examples can be
eliminated. Comment: 29 pages, 15 figures.
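To see why such a norm can vanish with resolution, here is an illustrative back-of-the-envelope scaling argument; the per-pixel perturbation size c/n is an assumption for exposition, not the paper's proof.

```latex
% Back-of-the-envelope scaling (illustrative assumption, not the proof):
% suppose an attack needs a per-pixel change of size c/n in an n-pixel
% image, i.e. the total perturbation "mass" stays fixed as resolution
% grows. Then the L2 norm of the perturbation delta satisfies
\[
  \|\delta\|_2
    = \sqrt{\sum_{i=1}^{n} \left(\frac{c}{n}\right)^{2}}
    = \sqrt{n \cdot \frac{c^{2}}{n^{2}}}
    = \frac{c}{\sqrt{n}}
    \longrightarrow 0 \quad \text{as } n \to \infty,
\]
% matching the $O(1/\sqrt{n})$ rate quoted in the abstract.
```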