Particle algorithms for optimization on binary spaces
We discuss a unified approach to stochastic optimization of pseudo-Boolean
objective functions based on particle methods, including the cross-entropy
method and simulated annealing as special cases. We point out the need for
auxiliary sampling distributions, that is, parametric families on binary spaces
which are able to reproduce complex dependency structures, and illustrate their
usefulness in our numerical experiments. We provide numerical evidence that
particle-driven optimization algorithms based on parametric families yield
superior results on strongly multi-modal optimization problems while local
search heuristics outperform them on easier problems.
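
To make the cross-entropy special case mentioned above concrete, here is a minimal sketch in Python. It assumes the simplest possible sampling family, a product of independent Bernoulli distributions, which cannot reproduce the dependency structures the abstract argues for. The function names, the toy objective, and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import random


def cross_entropy_binary(f, d, n_samples=200, elite_frac=0.1,
                         n_iter=50, smooth=0.7, seed=0):
    """Maximize f over {0,1}^d with an independent-Bernoulli sampling family.

    Assumption: a product-Bernoulli family is used purely for illustration.
    """
    rng = random.Random(seed)
    p = [0.5] * d  # start from the uniform distribution on {0,1}^d
    best_x, best_val = None, float("-inf")
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iter):
        # Draw a population of binary vectors from the current family.
        samples = [[1 if rng.random() < pj else 0 for pj in p]
                   for _ in range(n_samples)]
        # Keep the best-scoring fraction (the "elite" particles).
        samples.sort(key=f, reverse=True)
        elite = samples[:n_elite]
        top_val = f(elite[0])
        if top_val > best_val:
            best_x, best_val = elite[0], top_val
        # Refit the marginals to the elite set, with smoothing so that
        # no marginal collapses to 0 or 1 too early.
        for j in range(d):
            freq = sum(x[j] for x in elite) / n_elite
            p[j] = smooth * freq + (1 - smooth) * p[j]
    return best_x, best_val


def toy_objective(x):
    # Hypothetical pseudo-Boolean objective: number of positions that
    # match the alternating pattern 0, 1, 0, 1, ...
    return sum(1 for i, xi in enumerate(x) if xi == i % 2)


best_x, best_val = cross_entropy_binary(toy_objective, d=12)
```

On a separable toy objective like this one, independent marginals are adequate; a strongly multi-modal objective is where a richer parametric family would be expected to matter.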
On clustering procedures and nonparametric mixture estimation
This paper deals with nonparametric estimation of conditional densities in
mixture models in the case when additional covariates are available. The
proposed approach consists of performing a preliminary clustering algorithm on
the additional covariates to guess the mixture component of each observation.
Conditional densities of the mixture model are then estimated using kernel
density estimates applied separately to each cluster. We investigate the
expected L1-error of the resulting estimates and derive optimal rates of
convergence over classical nonparametric density classes provided the
clustering method is accurate. The performance of clustering algorithms is
measured by the maximal misclassification error. We obtain upper bounds on this
quantity for a single-linkage hierarchical clustering algorithm. Lastly,
applications of the proposed method to mixture models involving electricity
distribution data and simulated data are presented.
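
The two-step procedure described above (cluster on the covariates, then estimate a density within each cluster) can be sketched as follows. This is a simplified illustration under stated assumptions: the covariate is one-dimensional, so single-linkage clustering cut at k clusters reduces to cutting the k - 1 largest gaps between sorted points; the kernel is Gaussian; and the bandwidth and the data are made up for the example.

```python
import math


def single_linkage_1d(covariates, k):
    """Labels from single-linkage clustering of 1-D points, cut at k clusters.

    In one dimension this is equivalent to removing the k - 1 largest gaps
    between consecutive sorted points (assuming no ties among those gaps).
    """
    order = sorted(range(len(covariates)), key=lambda i: covariates[i])
    gaps = [(covariates[order[i + 1]] - covariates[order[i]], i)
            for i in range(len(order) - 1)]
    cuts = {i for _, i in sorted(gaps, reverse=True)[:k - 1]}
    labels = [0] * len(covariates)
    label = 0
    for pos, idx in enumerate(order):
        labels[idx] = label
        if pos in cuts:  # a cut after this position starts a new cluster
            label += 1
    return labels


def gaussian_kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from `points`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(y):
        return sum(math.exp(-0.5 * ((y - p) / bandwidth) ** 2)
                   for p in points) / norm

    return density


# Hypothetical data: covariates reveal the component, responses are what
# we want the per-component density of.
covariates = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
responses = [1.0, 1.2, 0.9, 3.0, 3.1, 2.8]

labels = single_linkage_1d(covariates, k=2)
densities = {
    c: gaussian_kde([y for y, l in zip(responses, labels) if l == c],
                    bandwidth=0.3)
    for c in set(labels)
}
```

Here `densities[0]` and `densities[1]` are the estimated densities of the responses in each guessed component; their quality depends on how few observations the clustering step misclassifies.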
Haplotype frequency inference from pooled genetic data with a latent multinomial model
In genetic studies, haplotype data provide more refined information than data
about separate genetic markers. However, large-scale studies that genotype
hundreds to thousands of individuals may only provide results of pooled data,
where only the total allele counts of each marker in each pool are reported.
Methods for inferring haplotype frequencies from pooled genetic data that scale
well with pool size rely on a normal approximation, which we observe to produce
unreliable inference when applied to real data. We illustrate cases where the
approximation breaks down, due to the normal covariance matrix being
near-singular. As an alternative to approximate methods, in this paper we
propose exact methods to infer haplotype frequencies from pooled genetic data
based on a latent multinomial model, where the observed allele counts are
considered integer combinations of latent, unobserved haplotype counts. One of
our methods, latent count sampling via Markov bases, achieves approximately
linear runtime with respect to pool size. Our exact methods produce more
accurate inference than existing approximate methods for synthetic data and for
data based on haplotype information from the 1000 Genomes Project. We also
demonstrate how our methods can be applied to time-series of pooled genetic
data, as a proof of concept of how our methods are relevant to more complex
hierarchical settings, such as spatiotemporal models.
Comment: 35 pages, 16 figures, 3 algorithms, submitted to Biometrics journal
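
The phrase "integer combinations of latent, unobserved haplotype counts" can be illustrated with a small worked example. The sketch below enumerates, by brute force, every vector of latent haplotype counts that is consistent with the observed allele counts of one pool. It is not the Markov-basis sampler from the abstract, which is designed to avoid this kind of enumeration. The function names are hypothetical, and `pool_size` is assumed here to mean the number of haplotypes in the pool.

```python
from itertools import product


def haplotypes(n_markers):
    """All binary haplotypes over n_markers biallelic markers."""
    return list(product([0, 1], repeat=n_markers))


def allele_counts(z, haps):
    """Observed per-marker allele counts implied by latent counts z."""
    n_markers = len(haps[0])
    return tuple(sum(z_h * h[m] for z_h, h in zip(z, haps))
                 for m in range(n_markers))


def compatible_latent_counts(y, pool_size, haps):
    """Brute-force list of latent count vectors consistent with y.

    Feasible only for tiny examples: the search space grows as
    (pool_size + 1) ** len(haps).
    """
    solutions = []
    for z in product(range(pool_size + 1), repeat=len(haps)):
        if sum(z) == pool_size and allele_counts(z, haps) == tuple(y):
            solutions.append(z)
    return solutions


# Two markers, so four haplotypes in the order 00, 01, 10, 11.
haps = haplotypes(2)
# A pool of 4 haplotypes in which each marker shows allele "1" twice.
solutions = compatible_latent_counts((2, 2), pool_size=4, haps=haps)
# solutions == [(0, 2, 2, 0), (1, 1, 1, 1), (2, 0, 0, 2)]
```

Three different latent configurations produce the same pooled observation, which is why inference has to account for the unobserved haplotype counts rather than read them off the data.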