Structural Drift: The Population Dynamics of Sequential Learning
We introduce a theory of sequential causal inference in which learners in a
chain estimate a structural model from their upstream teacher and then pass
samples from the model to their downstream student. It extends the population
dynamics of genetic drift, recasting Kimura's selectively neutral theory as a
special case of a generalized drift process using structured populations with
memory. We examine the diffusion and fixation properties of several drift
processes and propose applications to learning, inference, and evolution. We
also demonstrate how the organization of drift process space controls fidelity,
facilitates innovations, and leads to information loss in sequential learning
with and without memory.
Comment: 15 pages, 9 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/sdrift.ht
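As a rough illustration of the memoryless (Kimura) special case, the sketch below simulates a chain of learners that each fit a categorical distribution to a finite sample from their teacher and then generate the next learner's training data; the function name and parameters are illustrative, not the paper's structured-population model with memory.

    import numpy as np

    rng = np.random.default_rng(0)

    def sequential_drift(p0, n_samples, n_generations):
        # Each learner re-estimates a categorical distribution from a
        # finite sample of its teacher's output, then becomes the
        # teacher for the next learner (Wright-Fisher-style resampling).
        p = np.asarray(p0, dtype=float)
        history = [p.copy()]
        for _ in range(n_generations):
            counts = rng.multinomial(n_samples, p)
            p = counts / n_samples      # maximum-likelihood re-estimate
            history.append(p)
        return np.array(history)

    # With small samples, variants drift to fixation or loss along the chain,
    # i.e. information is lost generation by generation.
    traj = sequential_drift([0.25, 0.25, 0.25, 0.25], n_samples=50, n_generations=200)
    print(traj[-1])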
Inherent Weight Normalization in Stochastic Neural Networks
Multiplicative stochasticity such as Dropout improves the robustness and
generalizability of deep neural networks. Here, we further demonstrate that
always-on multiplicative stochasticity combined with simple threshold neurons
is a sufficient set of operations for deep neural networks. We call such models
Neural
Sampling Machines (NSM). We find that the probability of activation of the NSM
exhibits a self-normalizing property that mirrors Weight Normalization, a
previously studied mechanism that fulfills many of the features of Batch
Normalization in an online fashion. The normalization of activities during
training speeds up convergence by preventing internal covariate shift caused by
changes in the input distribution. The always-on stochasticity of the NSM
confers the following advantages: the network is identical in the inference and
learning phases, making the NSM suitable for online learning; it can exploit
stochasticity inherent to a physical substrate, such as analog non-volatile
memories for in-memory computing; and it is suitable for Monte Carlo sampling,
while requiring almost exclusively addition and comparison operations. We
demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and
event-based classification benchmarks (N-MNIST and DVS Gestures). Our results
show that NSMs perform comparably or better than conventional artificial neural
networks with the same architecture.
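A minimal sketch of the core operation, assuming Bernoulli multiplicative noise and a zero-threshold neuron (the paper's trained NSMs are more elaborate):

    import numpy as np

    rng = np.random.default_rng(1)

    def nsm_layer(x, W, p=0.5):
        # Always-on multiplicative Bernoulli noise on every synapse,
        # followed by a hard threshold: additions and comparisons only.
        xi = rng.binomial(1, p, size=W.shape)   # one noise draw per synapse
        pre = (xi * W) @ x                      # noisy pre-activation
        return (pre > 0).astype(float)          # threshold neuron

    x = rng.binomial(1, 0.5, size=100).astype(float)
    W = rng.normal(0.0, 0.1, size=(32, 100))
    # Monte Carlo estimate of each unit's activation probability over noise draws;
    # these probabilities are what exhibit the self-normalizing behavior.
    p_act = np.mean([nsm_layer(x, W) for _ in range(500)], axis=0)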
Inference of the sparse kinetic Ising model using the decimation method
In this paper we study the inference of the kinetic Ising model on sparse
graphs by the decimation method. The decimation method, which was first
proposed in [Phys. Rev. Lett. 112, 070603] for the static inverse Ising
problem, tries to recover the topology of the inferred system by setting the
weakest couplings to zero iteratively. During the decimation process the
likelihood function is maximized over the remaining couplings. Unlike the
ℓ1-optimization-based methods, the decimation method does not use the
Laplace distribution as a heuristic choice of prior to select a sparse
solution. In our case, the whole process can be done automatically without
fixing any parameters by hand. We show that in the dynamical inference problem,
where the task is to reconstruct the couplings of an Ising model given the
data, the decimation process can be incorporated naturally into a
maximum-likelihood optimization algorithm, as opposed to the static case, where
a pseudo-likelihood
method needs to be adopted. We also use extensive numerical studies to validate
the accuracy of our methods in dynamical inference problems. Our results
illustrate that on various topologies and with different distributions of
couplings, the decimation method outperforms the widely used
ℓ1-optimization-based methods.
Comment: 11 pages, 5 figures
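For one spin of a kinetic Ising model under Glauber dynamics, the decimation loop might look like the sketch below; the fixed decimation schedule and generic optimizer are simplifications of the paper's likelihood-based procedure, and all names are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    def fit_row(S, i, active):
        # S: (T, n) array of +/-1 spins. Maximum likelihood for spin i under
        # Glauber dynamics: P(s_i(t+1) = s | s(t)) = exp(s*h) / (2 cosh h),
        # with field h = J_i . s(t).
        X, y = S[:-1][:, active], S[1:, i]
        def nll(J):
            h = X @ J
            return np.sum(np.logaddexp(h, -h) - y * h)  # log(2 cosh h) - y*h
        return minimize(nll, np.zeros(len(active))).x

    def decimate_row(S, i, keep=0.9, rounds=20):
        # Repeatedly refit by maximum likelihood, then clamp the weakest
        # couplings to zero and drop them from the active set.
        n = S.shape[1]
        active = list(range(n))
        for _ in range(rounds):
            J = fit_row(S, i, active)
            order = np.argsort(np.abs(J))[::-1]        # strongest first
            active = [active[k] for k in order[: max(1, int(keep * len(active)))]]
        J_full = np.zeros(n)
        J_full[active] = fit_row(S, i, active)
        return J_full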
Massively parallel approximate Gaussian process regression
We explore how the big-three computing paradigms -- symmetric multi-processor
(SMP), graphical processing units (GPUs), and cluster computing -- can together
be brought to bear on large-data Gaussian process (GP) regression problems
via a careful implementation of a newly developed local approximation scheme.
Our methodological contribution focuses primarily on GPU computation, as this
requires the most care and also provides the largest performance boost.
However, in our empirical work we study the relative merits of all three
paradigms to determine how best to combine them. The paper concludes with two
case studies. One is a real-data fluid-dynamics computer experiment that
benefits from the local nature of our approximation; the second is a
synthetic-data example designed to find the largest design for which (accurate)
GP emulation can be performed on a commensurate predictive set in under an hour.
Comment: 24 pages, 6 figures, 1 table
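The independence that makes the scheme parallelizable is easy to see in a serial sketch; the fixed lengthscale and nugget below are assumptions, since the actual method estimates hyperparameters locally and builds each neighborhood sequentially rather than by plain nearest neighbors.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.linalg import cho_factor, cho_solve

    def local_gp_predict(X, y, Xstar, n_local=50, ell=0.5, g=1e-6):
        # Each predictive location gets its own small GP, fit only to its
        # n_local nearest neighbors; the per-location solves are independent,
        # which is what the SMP/GPU/cluster decomposition exploits.
        tree = cKDTree(X)
        preds = np.empty(len(Xstar))
        for j, xs in enumerate(Xstar):          # embarrassingly parallel loop
            _, idx = tree.query(xs, k=n_local)
            Xn, yn = X[idx], y[idx]
            D = np.sum((Xn[:, None, :] - Xn[None, :, :]) ** 2, axis=-1)
            K = np.exp(-D / ell**2) + g * np.eye(n_local)   # Gaussian kernel
            k = np.exp(-np.sum((Xn - xs) ** 2, axis=-1) / ell**2)
            preds[j] = k @ cho_solve(cho_factor(K), yn)     # predictive mean
        return preds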
Approximate Message Passing with Restricted Boltzmann Machine Priors
Approximate Message Passing (AMP) has been shown to be an excellent
statistical approach to signal inference and compressed sensing problems. The
AMP framework provides modularity in the choice of signal prior; here we
propose a hierarchical form of the Gauss-Bernoulli prior which utilizes a
Restricted Boltzmann Machine (RBM) trained on the signal support to push
reconstruction performance beyond that of simple iid priors for signals whose
support can be well represented by a trained binary RBM. We present and analyze
two methods of RBM factorization and demonstrate how these affect signal
reconstruction performance within our proposed algorithm. Finally, using the
MNIST handwritten digit dataset, we show experimentally that using an RBM
allows AMP to approach oracle-support performance.
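The modularity in the prior amounts to swapping the scalar denoiser AMP calls at each iteration. Below is a sketch for an iid Gauss-Bernoulli prior; in the paper's hierarchical variant, the scalar sparsity rho would be replaced by per-component support probabilities computed from the trained RBM. The fixed mu and v are illustrative.

    import numpy as np

    def gauss_bernoulli_denoiser(r, tau, rho, mu=0.0, v=1.0):
        # Posterior mean of x under the prior
        #   x ~ rho * N(mu, v) + (1 - rho) * delta_0,
        # given the AMP pseudo-observation r = x + N(0, tau).
        s2 = v + tau
        p_on = rho * np.exp(-(r - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        p_off = (1 - rho) * np.exp(-r ** 2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)
        pi = p_on / (p_on + p_off)               # posterior support probability
        m_on = (v * r + tau * mu) / (v + tau)    # posterior mean if "on"
        return pi * m_on

    # Example: denoise pseudo-observations at 10% sparsity, noise variance 0.1.
    r = np.random.default_rng(2).normal(size=784)
    x_hat = gauss_bernoulli_denoiser(r, tau=0.1, rho=0.1)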