Multi-dimensional Boltzmann Sampling of Languages
This paper addresses the uniform random generation of words from a
context-free language (over an alphabet of size $k$), while constraining every
letter to a targeted frequency of occurrence. Our approach consists in a
multidimensional extension of Boltzmann samplers \cite{Duchon2004}. We show
that, under mostly \emph{strong-connectivity} hypotheses, our samplers return a
word of size in $[(1-\varepsilon)n, (1+\varepsilon)n]$ and exact letter
frequencies in $\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept
tolerance intervals of width in $\Omega(\sqrt{n})$ for the number of
occurrences of each letter, our samplers perform an approximate-size
generation of words in $\mathcal{O}(n)$ expected time. We illustrate these
techniques on the generation of Tetris tessellations with uniform statistics
in the different types of tetrominoes.
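To make the mechanism concrete, here is a minimal sketch of frequency-tuned Boltzmann sampling with rejection, assuming a toy regular language S → ε | a·S | b·S | c·S in place of a general context-free grammar; the function names and the hand-tuned weights are illustrative, not the paper's oracle machinery.

```python
import random

def boltzmann_word(stop_prob, letter_probs):
    """Draw one word from the Boltzmann distribution of S -> eps | a.S | ...
    At each step we halt with probability `stop_prob`, otherwise emit a
    letter chosen with the (weight-tuned) probabilities in `letter_probs`."""
    letters, probs = zip(*letter_probs.items())
    word = []
    while random.random() > stop_prob:
        word.append(random.choices(letters, weights=probs)[0])
    return word

def tuned_sampler(n, freq, tol):
    """Rejection loop: keep drawing until both the size and every letter
    frequency land inside the tolerance windows (approximate-size regime)."""
    stop_prob = 1.0 / (n + 1)          # tunes the expected size to ~n
    while True:
        w = boltzmann_word(stop_prob, freq)
        m = len(w)
        if not (1 - tol) * n <= m <= (1 + tol) * n:
            continue
        if all(abs(w.count(a) - f * m) <= tol * m for a, f in freq.items()):
            return w

# Example: words of size ~1000 with letters a/b/c at 50/30/20 percent.
sample = tuned_sampler(1000, {"a": 0.5, "b": 0.3, "c": 0.2}, tol=0.05)
print(len(sample), {a: sample.count(a) for a in "abc"})
```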
Exact-size Sampling for Motzkin Trees in Linear Time via Boltzmann Samplers and Holonomic Specification
Boltzmann samplers are a kind of random sampler; in 2004, Duchon, Flajolet, Louchard and Schaeffer showed that, given a combinatorial class and a combinatorial specification for that class, one can automatically build a Boltzmann sampler. In this paper, we introduce a Boltzmann sampler for Motzkin trees built from a holonomic specification, that is, a specification that uses the pointing operator. This sampler is inspired by Rémy's algorithm on binary trees. We show that our algorithm gives an exact-size sampler with linear time and space complexity on average.
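For contrast with the paper's linear-time algorithm, the sketch below shows the classic baseline it improves on: a singular Boltzmann sampler for Motzkin trees (each node gets 0, 1 or 2 children with probability 1/3 each) combined with rejection until the target size is hit exactly. All names are illustrative.

```python
import random

def motzkin_size(cap):
    """Sample the size of one Boltzmann-distributed Motzkin tree at the
    singular parameter, aborting early (None) once the size exceeds `cap`."""
    size, pending = 0, 1                 # `pending` = subtrees left to expand
    while pending:
        size += 1
        if size > cap:
            return None
        children = random.randrange(3)   # 0, 1 or 2, each with prob 1/3
        pending += children - 1
    return size

def exact_size_by_rejection(n):
    """Reject until the sampler hits size exactly n: superlinear expected
    time, which is the cost the paper's Remy-style sampler avoids."""
    attempts = 0
    while True:
        attempts += 1
        if motzkin_size(n) == n:
            return attempts

print("attempts needed for n=500:", exact_size_by_rejection(500))
```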
Stochastic Synapses Enable Efficient Brain-Inspired Learning Machines
Recent studies have shown that synaptic unreliability is a robust and
sufficient mechanism for inducing the stochasticity observed in cortex. Here,
we introduce Synaptic Sampling Machines, a class of neural network models that
uses synaptic stochasticity as a means of Monte Carlo sampling and unsupervised
learning. Similar to the original formulation of Boltzmann machines, these
models can be viewed as a stochastic counterpart of Hopfield networks, but
where stochasticity is induced by a random mask over the connections. Synaptic
stochasticity plays the dual role of an efficient mechanism for sampling, and a
regularizer during learning akin to DropConnect. A local synaptic plasticity
rule implementing an event-driven form of contrastive divergence enables the
learning of generative models in an on-line fashion. Synaptic sampling machines
perform equally well using discrete-time artificial units (as in Hopfield
networks) or continuous-time leaky integrate-and-fire neurons. The learned
representations are remarkably sparse and robust to reductions in bit precision
and synapse pruning: removal of more than 75% of the weakest connections
followed by cursory re-learning causes a negligible performance loss on
benchmark classification tasks. The spiking neuron-based synaptic sampling
machines outperform existing spike-based unsupervised learners, while
potentially offering substantial advantages in terms of power and complexity,
and are thus promising models for on-line learning in brain-inspired hardware.
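The toy sketch below illustrates the core mechanism, assuming a conventional RBM trained with CD-1 in place of the paper's event-driven, spiking formulation: every use of the weight matrix is multiplied by a fresh Bernoulli "reliability" mask, so synaptic unreliability acts both as sampling noise and as a DropConnect-style regularizer. The constants (reliability 0.75, layer sizes) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked(W, p_rel):
    """Random Bernoulli mask over the connections, redrawn at every use."""
    return W * (rng.random(W.shape) < p_rel)

def cd1_step(W, v0, p_rel=0.75, lr=0.01):
    """One CD-1 update with stochastic synapses (the paper's event-driven
    contrastive divergence is approximated by discrete-time CD here)."""
    h0 = (rng.random(W.shape[1]) < sigmoid(v0 @ masked(W, p_rel))) * 1.0
    v1 = (rng.random(W.shape[0]) < sigmoid(masked(W, p_rel) @ h0)) * 1.0
    h1 = sigmoid(v1 @ masked(W, p_rel))
    return W + lr * (np.outer(v0, h0) - np.outer(v1, h1))

W = rng.normal(0, 0.1, size=(16, 8))        # 16 visible, 8 hidden units
data = (rng.random((100, 16)) < 0.3) * 1.0  # toy binary patterns
for v in data:                              # on-line, one pattern at a time
    W = cd1_step(W, v)
```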
Dynamics of Genome Rearrangement in Bacterial Populations
Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes.
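As a small illustration of the "symmetric inversion" notion used above, the sketch below tests whether an inversion's endpoints are approximately equidistant from the origin of replication on a circular chromosome; the coordinates, tolerance, and chromosome length are hypothetical, not values from the study.

```python
def dist_from_origin(pos, origin, length):
    """Shortest circular distance from `pos` to the replication origin."""
    d = abs(pos - origin) % length
    return min(d, length - d)

def is_symmetric(a, b, origin, length, tol=0.05):
    """Endpoints (a, b) equally distant from the origin, up to tol*length."""
    da = dist_from_origin(a, origin, length)
    db = dist_from_origin(b, origin, length)
    return abs(da - db) <= tol * length

# Example on a 4.6 Mb chromosome with the origin at coordinate 0: both
# endpoints lie 150 kb from the origin, one on each replichore.
print(is_symmetric(150_000, 4_450_000, origin=0, length=4_600_000))  # True
```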
Self-Adapting Noise-Contrastive Estimation for Energy-Based Models
Training energy-based models (EBMs) with noise-contrastive estimation (NCE)
is theoretically feasible but practically challenging. Effective learning
requires the noise distribution to be close to the target
distribution, especially in high-dimensional domains. Previous works have
explored modelling the noise distribution as a separate generative model, and
then concurrently training this noise model with the EBM. While this method
allows for more effective noise-contrastive estimation, it comes at the cost of
extra memory and training complexity. Instead, this thesis proposes a
self-adapting NCE algorithm which uses static instances of the EBM along its
training trajectory as the noise distribution. During training, these static
instances progressively converge to the target distribution, thereby
circumventing the need to simultaneously train an auxiliary noise model.
Moreover, we express this self-adapting NCE algorithm in the framework of
Bregman divergences and show that it is a generalization of maximum likelihood
learning for EBMs. The performance of our algorithm is evaluated across a range
of noise update intervals, and experimental results show that shorter update
intervals are conducive to higher synthesis quality.
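A minimal sketch of the self-adapting scheme on a one-dimensional toy target, assuming a grid-normalized EBM so that noise sampling and log-densities are exact; in higher dimensions the thesis's setting would need MCMC instead. The frozen snapshot serving as the noise distribution is refreshed every 200 steps; all hyperparameters are illustrative.

```python
import copy
import torch

torch.manual_seed(0)
grid = torch.linspace(-6, 6, 1201)

class EBM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    def forward(self, x):                     # unnormalized log-density
        return self.net(x.unsqueeze(-1)).squeeze(-1)

def log_density(model, x):
    """Exact log-density via numerical normalization (1-D shortcut)."""
    log_z = torch.logsumexp(model(grid), 0) + torch.log(grid[1] - grid[0])
    return model(x) - log_z

model = EBM()
noise = copy.deepcopy(model)                  # static snapshot = noise model
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
data = torch.randn(4096) * 0.5 + 1.5          # toy target: N(1.5, 0.5^2)

for step in range(2000):
    if step % 200 == 0:                       # self-adapting refresh
        noise = copy.deepcopy(model)
    x = data[torch.randint(len(data), (128,))]
    with torch.no_grad():                     # sample noise from the snapshot
        probs = torch.softmax(noise(grid), 0)
        y = grid[torch.multinomial(probs, 128, replacement=True)]
        log_q = log_density(noise, torch.cat([x, y]))
    # NCE: classify data vs. noise via the log-density ratio
    logits = log_density(model, torch.cat([x, y])) - log_q
    labels = torch.cat([torch.ones(128), torch.zeros(128)])
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```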