A Scale Mixture Perspective of Multiplicative Noise in Neural Networks
Corrupting the input and hidden layers of deep neural networks (DNNs) with
multiplicative noise, often drawn from the Bernoulli distribution (or
'dropout'), provides regularization that has significantly contributed to deep
learning's success. However, understanding how multiplicative corruptions
prevent overfitting has been difficult due to the complexity of a DNN's
functional form. In this paper, we show that when a Gaussian prior is placed on
a DNN's weights, applying multiplicative noise induces a Gaussian scale
mixture, which can be reparameterized to circumvent the problematic likelihood
function. Analysis can then proceed by using a type-II maximum likelihood
procedure to derive a closed-form expression revealing how regularization
evolves as a function of the network's weights. Results show that
multiplicative noise forces weights to become either sparse or invariant to
rescaling. We find our analysis has implications for model compression as it
naturally reveals a weight pruning rule that starkly contrasts with the
commonly used signal-to-noise ratio (SNR). While the SNR prunes weights with
large variances, seeing them as noisy, our approach recognizes their robustness
and retains them. We empirically demonstrate our approach has a strong
advantage over the SNR heuristic and is competitive with retraining with soft
targets produced from a teacher model.
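The SNR heuristic that the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each weight has a posterior mean and standard deviation, defines SNR as |mean| / std, and keeps the highest-SNR fraction. The function name `snr_prune` and the `keep_fraction` parameter are hypothetical. The paper's own pruning rule is not stated in the abstract and is not reproduced here.

```python
def snr_prune(means, stds, keep_fraction):
    """Zero out the weights with the lowest |mean| / std ratio.

    Assumption: `means` and `stds` are per-weight posterior statistics,
    and all standard deviations are strictly positive.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")

    # Signal-to-noise ratio of each weight.
    snr = [abs(m) / s for m, s in zip(means, stds)]

    # Indices ordered from highest to lowest SNR.
    order = sorted(range(len(means)), key=lambda i: snr[i], reverse=True)
    keep = set(order[:round(keep_fraction * len(means))])

    # Pruned weights are set to zero; kept weights retain their mean.
    return [m if i in keep else 0.0 for i, m in enumerate(means)]


means = [1.0, 2.0, 0.5, 3.0]
stds = [0.1, 4.0, 0.05, 3.0]
pruned = snr_prune(means, stds, keep_fraction=0.5)
```

In this toy input the two weights with the largest means also have the largest variances, so the SNR heuristic removes them. That is the behavior the abstract argues against.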
Generalization of Extended Baum-Welch Parameter Estimation for Discriminative Training and Decoding
We demonstrate the generalizability of the Extended Baum-Welch (EBW) algorithm not only for HMM parameter estimation but for decoding as well.
We show that there can exist a general function associated with the objective function under EBW that reduces to the well-known auxiliary function used in the Baum-Welch algorithm for maximum likelihood estimates.
We generalize representation for the updates of model parameters by making use of a differentiable function (such as arithmetic or geometric mean) on the updated and current model parameters and describe their effect on the learning rate during HMM parameter estimation. Improvements on speech recognition tasks are also presented here.
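The idea of combining updated and current parameters through a mean can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the exact form, so the weight `alpha` and the function names `arithmetic_blend` and `geometric_blend` are hypothetical, and the parameters are treated as single scalars.

```python
import math


def arithmetic_blend(current, updated, alpha):
    """Weighted arithmetic mean of the current and updated parameter.

    alpha = 0 keeps the current value; alpha = 1 takes the full update.
    """
    return (1.0 - alpha) * current + alpha * updated


def geometric_blend(current, updated, alpha):
    """Weighted geometric mean; defined here for positive parameters only."""
    if current <= 0.0 or updated <= 0.0:
        raise ValueError("geometric blend requires positive parameters")
    # Computed in log space: exp((1 - alpha) * log(current) + alpha * log(updated)).
    return math.exp((1.0 - alpha) * math.log(current) + alpha * math.log(updated))


step_arith = arithmetic_blend(2.0, 8.0, alpha=0.5)
step_geom = geometric_blend(2.0, 8.0, alpha=0.5)
```

With the same `alpha`, the geometric blend moves less far toward a larger updated value than the arithmetic blend does, which is one way the choice of mean can act like a change in learning rate.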
Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines
Recent studies have shown that synaptic unreliability is a robust and
sufficient mechanism for inducing the stochasticity observed in cortex. Here,
we introduce Synaptic Sampling Machines, a class of neural network models that
uses synaptic stochasticity as a means to Monte Carlo sampling and unsupervised
learning. Similar to the original formulation of Boltzmann machines, these
models can be viewed as a stochastic counterpart of Hopfield networks, but
where stochasticity is induced by a random mask over the connections. Synaptic
stochasticity plays the dual role of an efficient mechanism for sampling, and a
regularizer during learning akin to DropConnect. A local synaptic plasticity
rule implementing an event-driven form of contrastive divergence enables the
learning of generative models in an on-line fashion. Synaptic sampling machines
perform equally well using discrete-timed artificial units (as in Hopfield
networks) or continuous-timed leaky integrate & fire neurons. The learned
representations are remarkably sparse and robust to reductions in bit precision
and synapse pruning: removal of more than 75% of the weakest connections
followed by cursory re-learning causes a negligible performance loss on
benchmark classification tasks. The spiking neuron-based synaptic sampling
machines outperform existing spike-based unsupervised learners, while
potentially offering substantial advantages in terms of power and complexity,
and are thus promising models for on-line learning in brain-inspired hardware.
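The pruning experiment mentioned in the abstract (removing the weakest connections) can be sketched as a magnitude-based rule. This is a minimal sketch, assuming "weakest" means smallest absolute weight; the function name `prune_weakest` and the flat-list weight layout are hypothetical, and the re-learning step that follows pruning in the abstract is not shown.

```python
def prune_weakest(weights, fraction):
    """Set the given fraction of smallest-magnitude weights to zero."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")

    n_prune = int(fraction * len(weights))

    # Indices ordered from weakest to strongest connection.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])

    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.2, -1.5, 0.01, 0.3, -0.1, 0.02]
sparse = prune_weakest(weights, fraction=0.75)
```

Here 6 of the 8 connections are removed and only the two largest-magnitude weights survive.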
Normal-Mixture-of-Inverse-Gamma Priors for Bayesian Regularization and Model Selection in Structured Additive Regression Models
In regression models with many potential predictors, choosing an appropriate subset of covariates and their interactions at the same time as determining whether linear or more flexible functional forms are required is a challenging and important task. We propose a spike-and-slab prior structure in order to include or exclude single coefficients as well as blocks of coefficients associated with factor variables, random effects or basis expansions of smooth functions. Structured additive models with this prior structure are estimated with Markov Chain Monte Carlo using a redundant multiplicative parameter expansion. We discuss shrinkage properties of the novel prior induced by the redundant parameterization, investigate its sensitivity to hyperparameter settings and compare performance of the proposed method in terms of model selection, sparsity recovery, and estimation error for Gaussian, binomial and Poisson responses on real and simulated data sets with that of component-wise boosting and other approaches.
Statistical inference with anchored Bayesian mixture of regressions models: A case study analysis of allometric data
We present a case study in which we use a mixture of regressions model to
improve on an ill-fitting simple linear regression model relating log brain
mass to log body mass for 100 placental mammalian species. The slope of this
regression model is of particular scientific interest because it corresponds to
a constant that governs a hypothesized allometric power law relating brain mass
to body mass. A specific line of investigation is to determine whether the
regression parameters vary across subgroups of related species.
We model these data using an anchored Bayesian mixture of regressions model,
which modifies the standard Bayesian Gaussian mixture by pre-assigning small
subsets of observations to given mixture components with probability one. These
observations (called anchor points) break the relabeling invariance typical of
exchangeable model specifications (the so-called label-switching problem). A
careful choice of which observations to pre-classify to which mixture
components is key to the specification of a well-fitting anchor model.
In the article we compare three strategies for the selection of anchor
points. The first assumes that the underlying mixture of regressions model
holds and assigns anchor points to different components to maximize the
information about their labeling. The second makes no assumption about the
relationship between x and y and instead identifies anchor points using a
bivariate Gaussian mixture model. The third strategy begins with the assumption
that there is only one mixture regression component and identifies anchor
points that are representative of a clustering structure based on case-deletion
importance sampling weights. We compare the performance of the three strategies
on the allometric data set and use auxiliary taxonomic information about the
species to evaluate the model-based classifications estimated from these
models.
Extended Object Tracking: Introduction, Overview and Applications
This article provides an elaborate overview of current research in extended
object tracking. We provide a clear definition of the extended object tracking
problem and discuss its delimitation to other types of object tracking. Next,
different aspects of extended object modelling are extensively discussed.
Subsequently, we give a tutorial introduction to two basic and well used
extended object tracking approaches - the random matrix approach and the Kalman
filter-based approach for star-convex shapes. The next part treats the tracking
of multiple extended objects and elaborates how the large number of feasible
association hypotheses can be tackled using both Random Finite Set (RFS) and
Non-RFS multi-object trackers. The article concludes with a summary of current
applications, where four example applications involving camera, X-band radar,
light detection and ranging (lidar), red-green-blue-depth (RGB-D) sensors are
highlighted.
Comment: 30 pages, 19 figures