Training Faster by Separating Modes of Variation in Batch-normalized Models
Batch Normalization (BN) is essential to effectively train state-of-the-art
deep Convolutional Neural Networks (CNN). It normalizes inputs to the layers
during training using the statistics of each mini-batch. In this work, we study
BN from the viewpoint of Fisher kernels. We show that, assuming samples within a
mini-batch are drawn from the same probability density function, BN is identical
to the Fisher vector of a Gaussian distribution. That means BN can be explained
in terms of kernels that naturally emerge from the probability density function
of the underlying data distribution. However, given the rectifying
non-linearities employed in CNN architectures, the distributions of inputs to the
layers exhibit heavy-tailed and asymmetric characteristics. Therefore, we propose
approximating the underlying data distribution not with a single Gaussian, but
with a mixture of Gaussian densities. Deriving the Fisher vector for a Gaussian
Mixture Model (GMM) reveals that BN can be improved by independently normalizing
with respect to the statistics of disentangled sub-populations. We refer to our
proposed soft piecewise version of BN as Mixture Normalization (MN). Through an
extensive set of experiments on CIFAR-10 and CIFAR-100, we show that MN not only
effectively accelerates the training of image classification and Generative
Adversarial Networks, but also reaches higher-quality models.
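To make the idea concrete, here is a minimal NumPy sketch of such a soft piecewise normalization, assuming per-channel activations and a fixed, already-fitted Gaussian mixture; the function name and per-sample formulation are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def mixture_normalize(x, means, variances, priors, eps=1e-5):
    """Soft piecewise normalization sketch: x is a 1-D array of activations
    for one channel across a mini-batch; means, variances, priors
    parameterize a K-component Gaussian mixture (assumed given, e.g.
    fitted with EM on recent activations)."""
    K = len(means)
    # Gaussian likelihood of each sample under each component: shape (N, K)
    lik = np.stack([
        priors[k] * np.exp(-0.5 * (x - means[k])**2 / (variances[k] + eps))
        / np.sqrt(2 * np.pi * (variances[k] + eps))
        for k in range(K)
    ], axis=1)
    # Posterior responsibilities (soft assignments to sub-populations)
    resp = lik / lik.sum(axis=1, keepdims=True)
    # Normalize with respect to each component, recombine by responsibility
    return sum(
        resp[:, k] * (x - means[k]) / np.sqrt(variances[k] + eps)
        for k in range(K)
    )
```

With K = 1 this reduces to ordinary per-batch standardization, which is the sense in which BN is the single-Gaussian special case.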
A probabilistic model for learning in cortical microcircuit motifs with data-based divisive inhibition
Previous theoretical studies on the interaction of excitatory and inhibitory
neurons proposed to model this cortical microcircuit motif as a so-called
Winner-Take-All (WTA) circuit. A recent modeling study, however, found that the
WTA model is not adequate for data-based softer forms of divisive inhibition as
found in a microcircuit motif in cortical layer 2/3. We investigate here
through theoretical analysis the role of such softer divisive inhibition for
the emergence of computational operations and neural codes under spike-timing
dependent plasticity (STDP). We show that, in contrast to WTA models, where the
network activity has been interpreted as probabilistic inference in a
generative mixture distribution, the network dynamics here approximates inference
in a noisy-OR-like generative model that explains the network input based on
multiple hidden causes. Furthermore, we show that STDP optimizes the parameters
of this model by approximating online the expectation maximization (EM)
algorithm. This theoretical analysis corroborates a preceding modeling study
which suggested that the learning dynamics of this layer 2/3 microcircuit motif
extracts a specific modular representation of the input and thus performs blind
source separation on the input statistics.
Comment: 24 pages, 5 figures
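For reference, a noisy-OR likelihood of the kind the analysis points to can be written in a few lines; the leak parameter and array shapes here are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def noisy_or_prob(h, W, leak=0.01):
    """Probability that each binary input unit is active given hidden
    causes h under a noisy-OR generative model: an input fires unless
    every active cause independently fails to trigger it.
    h: (K,) binary hidden causes; W: (K, D) per-cause activation
    probabilities in [0, 1]; leak: baseline activation probability."""
    # Each active cause k fails to activate input d with prob (1 - W[k, d])
    fail = np.prod(np.where(h[:, None] == 1, 1.0 - W, 1.0), axis=0)
    return 1.0 - (1.0 - leak) * fail
```

Unlike a mixture model, which attributes each input to exactly one cause, this model lets multiple hidden causes jointly explain the input, which is the key distinction the abstract draws against WTA circuits.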
On Generalized Bayesian Data Fusion with Complex Models in Large Scale Networks
Recent advances in communications, mobile computing, and artificial
intelligence have greatly expanded the application space of intelligent
distributed sensor networks. This in turn motivates the development of
generalized Bayesian decentralized data fusion (DDF) algorithms for robust and
efficient information sharing among autonomous agents using probabilistic
belief models. However, DDF is significantly challenging to implement for
general real-world applications requiring the use of dynamic/ad hoc network
topologies and complex belief models, such as Gaussian mixtures or hybrid
Bayesian networks. To tackle these issues, we first discuss some new key
mathematical insights about exact DDF and conservative approximations to DDF.
These insights are then used to develop novel generalized DDF algorithms for
complex beliefs based on mixture pdfs and conditional factors. Numerical
examples motivated by multi-robot target search demonstrate that our methods
lead to significantly better fusion results, and thus have great potential to
enhance distributed intelligent reasoning in sensor networks.
Comment: Revised version of paper submitted to the 2013 Workshop on Wireless
Intelligent Sensor Networks (WISeNET 2013) at Duke University, June 5, 2013
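As a concrete example of a conservative approximation to exact fusion, covariance intersection fuses two Gaussian beliefs whose cross-correlation is unknown. This standard rule is sketched below for illustration; it is not necessarily the specific algorithm developed in the paper:

```python
import numpy as np

def covariance_intersection(xa, Pa, xb, Pb, omega=0.5):
    """Covariance intersection: a standard conservative fusion rule for
    two Gaussian beliefs (xa, Pa) and (xb, Pb) with unknown common
    information. omega in [0, 1] weights the two sources; any choice
    yields a consistent (non-overconfident) fused estimate."""
    Ia, Ib = np.linalg.inv(Pa), np.linalg.inv(Pb)
    P = np.linalg.inv(omega * Ia + (1 - omega) * Ib)
    x = P @ (omega * Ia @ xa + (1 - omega) * Ib @ xb)
    return x, P
```

The appeal for ad hoc topologies is that no bookkeeping of common information between agents is required, at the cost of a deliberately conservative fused covariance.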
HyperAdam: A Learnable Task-Adaptive Adam for Network Training
Deep neural networks are traditionally trained using human-designed
stochastic optimization algorithms, such as SGD and Adam. Recently, the
approach of learning to optimize network parameters has emerged as a promising
research topic. However, these learned black-box optimizers sometimes do not
fully utilize the experience embodied in human-designed optimizers and therefore
have limited generalization ability. In this paper, a new optimizer, dubbed
HyperAdam, is proposed that combines the idea of "learning to
optimize" with the traditional Adam optimizer. Given a network for training, the
parameter update HyperAdam generates in each iteration is an adaptive
combination of multiple updates generated by Adam with varying decay rates. The
combination weights and decay rates in HyperAdam are adaptively learned
depending on the task. HyperAdam is modeled as a recurrent neural network with
AdamCell, WeightCell and StateCell. It is shown to achieve state-of-the-art
performance in training various networks, such as multilayer perceptrons, CNNs and LSTMs.
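A minimal sketch of the core mechanism follows, with the combination weights taken as given rather than produced by the recurrent AdamCell/WeightCell/StateCell structure described above; names and signatures are illustrative:

```python
import numpy as np

def hyperadam_step(grad, states, betas, weights, lr=1e-3, eps=1e-8):
    """Sketch of the core HyperAdam idea: run several Adam branches with
    different decay rates and combine their candidate updates with
    weights that, in the real model, a recurrent network predicts per
    task. states holds one (m, v, t) tuple per branch."""
    updates = []
    for i, (beta1, beta2) in enumerate(betas):
        m, v, t = states[i]
        t += 1
        m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
        m_hat = m / (1 - beta1**t)                  # bias correction
        v_hat = v / (1 - beta2**t)
        updates.append(m_hat / (np.sqrt(v_hat) + eps))
        states[i] = (m, v, t)
    # Adaptive combination of candidate updates (weights assumed to sum to 1)
    return -lr * sum(w * u for w, u in zip(weights, updates))
```

With a single branch and weight 1.0 this reduces exactly to a standard Adam step.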
Machine Learning for Wireless Communications in the Internet of Things: A Comprehensive Survey
The Internet of Things (IoT) is expected to require more effective and
efficient wireless communications than ever before. For this reason, techniques
such as spectrum sharing, dynamic spectrum access, extraction of signal
intelligence and optimized routing will soon become essential components of the
IoT wireless communication paradigm. Given that the majority of the IoT will be
composed of tiny, mobile, and energy-constrained devices, traditional
techniques based on a priori network optimization may not be suitable, since
(i) an accurate model of the environment may not be readily available in
practical scenarios; (ii) the computational requirements of traditional
optimization techniques may prove prohibitive for IoT devices. To address the
above challenges, much research has been devoted to exploring the use of
machine learning to address problems in the IoT wireless communications domain.
This work provides a comprehensive survey of the state of the art in the
application of machine learning techniques to address key problems in IoT
wireless communications with an emphasis on its ad hoc networking aspect.
First, we present extensive background notions of machine learning techniques.
Then, by adopting a bottom-up approach, we examine existing work on machine
learning for the IoT at the physical, data-link and network layers of the
protocol stack. Thereafter, we discuss directions taken by the community
towards hardware implementation to ensure the feasibility of these techniques.
Additionally, before concluding, we also provide a brief discussion of the
application of machine learning in IoT beyond wireless communication. Finally,
each of these discussions is accompanied by a detailed analysis of the related
open problems and challenges.
Comment: Ad Hoc Networks Journal
Neural Simpletrons - Minimalistic Directed Generative Networks for Learning with Few Labels
Classifiers for the semi-supervised setting often combine strong supervised
models with additional learning objectives to make use of unlabeled data. This
results in powerful though very complex models that are hard to train and that
demand additional labels for optimal parameter tuning, which are often not
given when labeled data is very sparse. We here study a minimalistic
multi-layer generative neural network for semi-supervised learning in a form
and setting as similar to standard discriminative networks as possible. Based
on normalized Poisson mixtures, we derive compact and local learning and neural
activation rules. Learning and inference in the network can be scaled using
standard deep learning tools for parallelized GPU implementation. With the
single objective of likelihood optimization, both labeled and unlabeled data
are naturally incorporated into learning. Empirical evaluations on standard
benchmarks show that, for datasets with few labels, the derived minimalistic
network improves on all classical deep learning approaches and is competitive
with their recent variants, without the need for additional labels for parameter
tuning. Furthermore, we find that the studied network is the best-performing
monolithic (`non-hybrid') system for few labels, and that it can be applied in
the limit of very few labels, where no other system has been reported to
operate so far.
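A rough sketch of what a normalized-Poisson-mixture layer with local learning could look like is given below; the exact activation and update rules derived in the paper may differ from this simplified version:

```python
import numpy as np

def simpletron_layer(x, W, lr=0.01):
    """Sketch of a normalized-Poisson-mixture layer: a softmax-like
    posterior over hidden causes followed by a local, Hebbian-like
    online EM update. x: (D,) non-negative input; W: (K, D) weights
    with rows normalized to sum to one, so the Poisson rate term
    cancels in the posterior."""
    # Neural activation rule: posterior over causes from log-likelihoods
    log_lik = x @ np.log(W + 1e-12).T            # shape (K,)
    s = np.exp(log_lik - log_lik.max())
    s /= s.sum()
    # Local learning rule: online EM-like step toward the data statistics
    W += lr * s[:, None] * (x / (x.sum() + 1e-12) - W)
    W /= W.sum(axis=1, keepdims=True)            # keep rows normalized
    return s, W
```

Both the activation and the update use only quantities local to each unit and its incoming weights, which is what makes the model compatible with standard parallelized GPU implementations.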
From Bayesian Sparsity to Gated Recurrent Nets
The iterations of many first-order algorithms, when applied to minimizing
common regularized regression functions, often resemble neural network layers
with pre-specified weights. This observation has prompted the development of
learning-based approaches that purport to replace these iterations with
enhanced surrogates forged as DNN models from available training data. For
example, important NP-hard sparse estimation problems have recently benefitted
from this genre of upgrade, with simple feedforward or recurrent networks
ousting proximal gradient-based iterations. Analogously, this paper
demonstrates that more powerful Bayesian algorithms for promoting sparsity,
which rely on complex multi-loop majorization-minimization techniques, mirror
the structure of more sophisticated long short-term memory (LSTM) networks, or
alternative gated feedback networks previously designed for sequence
prediction. As part of this development, we examine the parallels between
latent variable trajectories operating across multiple time-scales during
optimization, and the activations within deep network structures designed to
adaptively model such characteristic sequences. The resulting insights lead to
a novel sparse estimation system that, when granted training data, can estimate
optimal solutions efficiently in regimes where other algorithms fail, including
practical direction-of-arrival (DOA) and 3D geometry recovery problems. The
underlying principles we expose are also suggestive of a learning process for a
richer class of multi-loop algorithms in other domains.
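The structural observation underlying this line of work is easiest to see for plain ISTA, whose iterations for the l1-regularized least-squares problem read as feedforward layers with fixed weights. A minimal sketch (illustrative of the iteration-as-layer view, not the paper's LSTM-based system):

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ista_as_layers(y, A, lam=0.1, n_layers=10):
    """One ISTA iteration for min 0.5*||y - Ax||^2 + lam*||x||_1 reads as
    a layer x <- soft_threshold(W @ x + B @ y) with fixed W and B; a
    learned surrogate would instead train W, B and the threshold."""
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
    W = np.eye(A.shape[1]) - A.T @ A / L    # "recurrent" weight
    B = A.T / L                             # "input" weight
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):               # unrolled iterations = layers
        x = soft_threshold(W @ x + B @ y, lam / L)
    return x
```

The paper's point is that the richer, multi-loop Bayesian algorithms play the same role for gated architectures: their latent-variable trajectories map onto the gating and memory structure of LSTM-like networks rather than onto plain feedforward layers.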
High-dimensional Time Series Prediction with Missing Values
High-dimensional time series prediction is needed in applications as diverse
as demand forecasting and climatology. Often, such applications require methods
that are both highly scalable, and deal with noisy data in terms of corruptions
or missing values. Classical time series methods usually fall short of handling
both these issues. In this paper, we propose to adapt matrix completion
approaches that have previously been successfully applied to large-scale noisy
data, but which fail to adequately model high-dimensional time series due to
temporal dependencies. We present a novel temporal regularized matrix
factorization (TRMF) framework which supports data-driven temporal dependency
learning and brings forecasting ability to our new matrix factorization
approach. TRMF is highly general and subsumes many existing matrix
factorization approaches for time series data. We make interesting connections
to graph regularized matrix factorization methods in the context of learning
the dependencies. Experiments on both real and synthetic data show that TRMF
outperforms several existing approaches for common time series tasks.
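A sketch of the kind of objective TRMF optimizes: squared error on the observed entries plus an autoregressive regularizer on the latent time series. The diagonal AR parameterization and the names below are simplifying assumptions for illustration:

```python
import numpy as np

def trmf_loss(Y, mask, F, X, W, lags, lam_x=1.0, lam_f=0.1):
    """Sketch of a TRMF-style objective. Y: (n, T) data; mask: (n, T)
    indicator of observed entries (handles missing values); F: (n, k)
    factor loadings; X: (k, T) latent time series; W: (k, len(lags))
    per-dimension AR weights over the given lags (a simplified diagonal
    AR model; the actual framework learns the dependency structure)."""
    # Data fit restricted to observed entries
    fit = np.sum(mask * (Y - F @ X) ** 2)
    # AR temporal regularizer: each x_t should follow its lagged values
    max_lag = max(lags)
    ar_pred = sum(W[:, [j]] * X[:, max_lag - l:X.shape[1] - l]
                  for j, l in enumerate(lags))
    temporal = np.sum((X[:, max_lag:] - ar_pred) ** 2)
    return fit + lam_x * temporal + lam_f * np.sum(F ** 2)
```

Forecasting then amounts to rolling the learned AR model forward on X and projecting through F, which is how the temporal regularizer turns a matrix factorization into a forecaster.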
Multiscale CNN based Deep Metric Learning for Bioacoustic Classification: Overcoming Training Data Scarcity Using Dynamic Triplet Loss
This paper proposes multiscale convolutional neural network (CNN)-based deep
metric learning for bioacoustic classification, under low training data
conditions. The proposed CNN is characterized by the utilization of four
different filter sizes at each level to analyze input feature maps. This
multiscale nature helps in describing different bioacoustic events effectively:
smaller filters help in learning the finer details of bioacoustic events,
whereas larger filters help in analyzing a larger context, leading to global
details. A dynamic triplet loss is employed in the proposed CNN architecture to
learn a transformation from the input space to the embedding space, where
classification is performed. The triplet loss helps in learning this
transformation by analyzing three examples at a time, referred to as triplets,
where the intra-class distance is minimized while the inter-class separation is
maximized by a dynamically increasing margin. The number of possible triplets
increases cubically with the dataset size, making triplet loss more suitable
than the softmax cross-entropy loss in low training data conditions.
Experiments on three different publicly available datasets show that the
proposed framework performs better than existing bioacoustic classification
frameworks. Experimental results also confirm the superiority of the triplet
loss over the cross-entropy loss in low training data conditions.
Comment: Under review at JASA. Preliminary version of the paper; we are still
working on getting better performance out of the comparative method.
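For illustration, a triplet loss with a dynamically growing margin can be written as follows; the linear margin schedule and its parameters are guesses, not the paper's exact rule:

```python
import numpy as np

def dynamic_triplet_loss(anchor, positive, negative, epoch,
                         margin0=0.2, growth=0.05, margin_max=1.0):
    """Triplet loss with a margin that increases over training: pushes
    the anchor-positive distance below the anchor-negative distance by
    a progressively larger margin. Inputs are (N, d) embedding batches;
    the schedule below is an illustrative assumption."""
    margin = min(margin0 + growth * epoch, margin_max)
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # intra-class distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # inter-class distance
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

Because any (anchor, positive, negative) combination forms a training example, the number of triplets grows cubically with dataset size, which is why this objective stretches scarce labeled data further than per-example cross-entropy.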
Machine learning in acoustics: theory and applications
Acoustic data provide scientific and engineering insights in fields ranging
from biology and communications to ocean and Earth science. We survey the
recent advances and transformative potential of machine learning (ML),
including deep learning, in the field of acoustics. ML is a broad family of
techniques, which are often based in statistics, for automatically detecting
and utilizing patterns in data. Relative to conventional acoustics and signal
processing, ML is data-driven. Given sufficient training data, ML can discover
complex relationships between features and desired labels or actions, or
between features themselves. With large volumes of training data, ML can
discover models describing complex acoustic phenomena such as human speech and
reverberation. ML in acoustics is rapidly developing with compelling results
and significant future promise. We first introduce ML, then highlight ML
developments in four acoustics research areas: source localization in speech
processing, source localization in ocean acoustics, bioacoustics, and
environmental sounds in everyday scenes.
Comment: Published with free access in the Journal of the Acoustical Society of
America, 27 Nov. 2019