
    Scaling analysis of MCMC algorithms

    Markov Chain Monte Carlo (MCMC) methods have become a workhorse for modern scientific computations. Practitioners use MCMC in many areas of applied science, yet very few rigorous results are available to justify these methods. The purpose of this dissertation is to analyse random walk type MCMC algorithms in several limiting regimes that frequently occur in applications. Scaling limit arguments are used as a unifying method for studying the asymptotic complexity of these MCMC algorithms. Two distinct strands of research are developed: (a) We analyse and prove diffusion limit results for MCMC algorithms in high or infinite dimensional state spaces. In contrast to previous results in the literature, the target distributions that we consider do not have a product structure; this leads to Stochastic Partial Differential Equation (SPDE) limits. This proves, among other things, that optimal proposal results already known for product-form target distributions extend to much more general settings. We then show how to use these MCMC algorithms in an infinite dimensional Hilbert space in order to imitate a gradient descent without computing any derivative. (b) We analyse the behaviour of the Random Walk Metropolis (RWM) algorithm when used to explore target distributions concentrating on the neighbourhood of a low dimensional manifold of ℝ^n. We prove that the algorithm behaves, after being suitably rescaled, as a diffusion process evolving on a manifold.

    Noisy gradient flow from a random walk in Hilbert space

    Consider a probability measure on a Hilbert space defined via its density with respect to a Gaussian. The purpose of this paper is to demonstrate that an appropriately defined Markov chain, which is reversible with respect to the measure in question, exhibits a diffusion limit to a noisy gradient flow, also reversible with respect to the same measure. The Markov chain is defined by applying a Metropolis–Hastings accept–reject mechanism (Tierney, Ann Appl Probab 8:1–9, 1998) to an Ornstein–Uhlenbeck (OU) proposal which is itself reversible with respect to the underlying Gaussian measure. The resulting noisy gradient flow is a stochastic partial differential equation driven by a Wiener process with spatial correlation given by the underlying Gaussian structure. There are two primary motivations for this work. The first concerns insight into Markov chain Monte Carlo (MCMC) methods for sampling measures on a Hilbert space defined via a density with respect to a Gaussian measure. These measures must be approximated on finite dimensional spaces of dimension N in order to be sampled. A conclusion of the work herein is that MCMC methods based on prior-reversible OU proposals will explore the target measure in O(1) steps with respect to dimension N. This is to be contrasted with standard MCMC methods based on the random walk or Langevin proposals, which require O(N) and O(N^(1/3)) steps respectively (Mattingly et al., Ann Appl Prob 2011; Pillai et al., Ann Appl Prob 22:2320–2356 2012). The second motivation relates to optimization. There are many applications where it is of interest to find global or local minima of a functional defined on an infinite dimensional Hilbert space. Gradient flow or steepest descent is a natural approach to this problem, but in its basic form requires computation of a gradient which, in some applications, may be an expensive or complex task.
    This paper shows that a stochastic gradient descent described by a stochastic partial differential equation can emerge from certain carefully specified Markov chains. This idea is well-known in the finite state (Kirkpatrick et al., Science 220:671–680, 1983; Cerny, J Optim Theory Appl 45:41–51, 1985) or finite dimensional context (Geman, IEEE Trans Geosci Remote Sens 1:269–276, 1985; Geman, SIAM J Control Optim 24:1031, 1986; Chiang, SIAM J Control Optim 25:737–753, 1987; J Funct Anal 83:333–347, 1989). The novelty of the work in this paper is that the emergence of the noisy gradient flow is developed on an infinite dimensional Hilbert space. In the context of global optimization, when the noise level is also adjusted as part of the algorithm, methods of the type studied here go by the name of simulated annealing; see the review (Bertsimas and Tsitsiklis, Stat Sci 8:10–15, 1993) for further references. Although we do not consider adjusting the noise level as part of the algorithm, the noise strength is a tuneable parameter in our construction, and the methods developed here could potentially be used to study simulated annealing in a Hilbert space setting. The transferable idea behind this work is that conceiving of algorithms directly in the infinite dimensional setting leads to methods which are robust to finite dimensional approximation. We emphasize that discretizing, and then applying standard finite dimensional techniques in ℝ^N, to either sample or optimize, can lead to algorithms which degenerate as the dimension N increases.
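The accept–reject step with a prior-reversible OU proposal described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a finite dimensional truncation in which the Gaussian reference measure is N(0, diag(prior_std²)), a target density proportional to exp(-Φ(x)) with respect to that Gaussian, and a step-size parameter delta in (0, 1/2). The names `pcn_step`, `phi`, `prior_std` and `delta` are hypothetical.

```python
import math
import random


def pcn_step(x, phi, prior_std, delta, rng):
    """One Metropolis-Hastings step with an OU-type proposal (a sketch).

    Assumptions (not taken from the paper): the Gaussian reference measure
    is N(0, diag(prior_std**2)) and the target has density proportional to
    exp(-phi(x)) with respect to it.
    """
    # Draw xi from the Gaussian reference measure.
    xi = [s * rng.gauss(0.0, 1.0) for s in prior_std]
    # OU proposal: a**2 + b**2 == 1, so the proposal is reversible with
    # respect to the Gaussian reference measure.
    a = math.sqrt(1.0 - 2.0 * delta)
    b = math.sqrt(2.0 * delta)
    y = [a * xi_old + b * xi_new for xi_old, xi_new in zip(x, xi)]
    # Because the proposal preserves the Gaussian, the acceptance
    # probability only involves the change in phi, not the dimension.
    accept_prob = math.exp(min(0.0, phi(x) - phi(y)))
    if rng.random() < accept_prob:
        return y, True
    return x, False


if __name__ == "__main__":
    rng = random.Random(0)
    state = [0.0] * 8
    std = [1.0 / (k + 1) for k in range(8)]  # hypothetical decaying spectrum
    potential = lambda v: 0.5 * sum((vi - 0.1) ** 2 for vi in v)
    accepted = 0
    for _ in range(1000):
        state, ok = pcn_step(state, potential, std, 0.05, rng)
        accepted += ok
    print("acceptance rate:", accepted / 1000)
```

No derivative of Φ is evaluated anywhere in the step, which is the sense in which such a chain can imitate a gradient flow without gradient computations.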

    Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment

    Rapid advances in high-throughput DNA sequencing technologies have enabled the conduct of whole genome sequencing (WGS) studies, and several bioinformatics pipelines have become available. The aim of this study was to compare 6 WGS data pre-processing pipelines, involving two mapping and alignment approaches (GATK utilizing BWA-MEM2 2.2.1, and DRAGEN 3.8.4) and three variant calling pipelines (GATK 4.2.4.1, DRAGEN 3.8.4 and DeepVariant 1.1.0). We sequenced one Genome in a Bottle (GIAB) sample 70 times in different runs, and one GIAB trio in triplicate. The truth set of the GIAB samples was used for comparison, and performance was assessed by computation time, F1 score, precision, and recall. In the mapping and alignment step, the DRAGEN pipeline was faster than the GATK with BWA-MEM2 pipeline. DRAGEN showed systematically higher F1 score, precision, and recall values than GATK for single nucleotide variations (SNVs) and Indels in simple-to-map, complex-to-map, coding and non-coding regions. In the variant calling step, DRAGEN was fastest. In terms of accuracy, DRAGEN and DeepVariant performed similarly and both were superior to GATK, with slight advantages for DRAGEN for Indels and for DeepVariant for SNVs. The DRAGEN pipeline showed the lowest Mendelian inheritance error fraction for the GIAB trios. Mapping and alignment played a key role in variant calling of WGS, with DRAGEN outperforming GATK.
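For readers unfamiliar with the accuracy metrics named in this abstract, the following sketch shows the standard definitions of precision, recall and F1 score from counts of true positive, false positive and false negative variant calls. The function name and the counts in the usage note are illustrative assumptions, not values or tooling from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from call counts against a truth set.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With made-up counts of 90 true positives, 10 false positives and 30 false negatives, `precision_recall_f1(90, 10, 30)` gives a precision of 0.9 and a recall of 0.75; F1 lies between the two and is pulled towards the lower value.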

    OCT-GAN: single step shadow and noise removal from optical coherence tomography images of the human optic nerve head

    Speckle noise and retinal shadows within OCT B-scans occlude important edges, fine textures and deep tissues, preventing accurate and robust diagnosis by algorithms and clinicians. We developed a single process that successfully removed both noise and retinal shadows from unseen single-frame B-scans within 10.4 ms. Mean average gradient magnitude (AGM) for the proposed algorithm was 57.2% higher than the current state of the art, while mean peak signal to noise ratio (PSNR), contrast to noise ratio (CNR), and structural similarity index metric (SSIM) increased by 11.1%, 154% and 187% respectively compared to single-frame B-scans. Mean intralayer contrast (ILC) for the retinal nerve fiber layer (RNFL), photoreceptor layer (PR) and retinal pigment epithelium (RPE) layers improved, decreasing from 0.362 ± 0.133 to 0.142 ± 0.102, 0.449 ± 0.116 to 0.0904 ± 0.0769, and 0.381 ± 0.100 to 0.0590 ± 0.0451 respectively. The proposed algorithm reduces the necessity for long image acquisition times, minimizes expensive hardware requirements and reduces motion artifacts in OCT images.

    The genomic origins of the world’s first farmers

    The precise genetic origins of the first Neolithic farming populations in Europe and Southwest Asia, as well as the processes and the timing of their differentiation, remain largely unknown. Demogenomic modeling of high-quality ancient genomes reveals that the early farmers of Anatolia and Europe emerged from a multiphase mixing of a Southwest Asian population with a strongly bottlenecked western hunter-gatherer population after the last glacial maximum. Moreover, the ancestors of the first farmers of Europe and Anatolia went through a period of extreme genetic drift during their westward range expansion, contributing highly to their genetic distinctiveness. This modeling elucidates the demographic processes at the root of the Neolithic transition and leads to a spatial interpretation of the population history of Southwest Asia and Europe during the late Pleistocene and early Holocene.

    Defining the importance of landscape metrics for large branchiopod biodiversity and conservation: the case of the Iberian Peninsula and Balearic Islands

    The lack of distributional data for invertebrate taxa is one of the major factors behind the low awareness of their conservation status. The present study sets a basic framework to understand the distribution of large branchiopods in the Iberian Peninsula and Balearic Islands. Since the extensive surveys performed in the late 1980s, no further studies have updated the information for the whole studied area. The present study fills the gap, gathering together all available information on large branchiopod distribution since 1995, and analysing the effect of human population density and several landscape characteristics on their distribution, taking into consideration different spatial scales (100 m, 1 km and 10 km). Overall, 28 large branchiopod taxa (17 anostracans, 7 notostracans and 4 spinicaudatans) are known to occur in the area. Approximately 30% of the sites hosted multiple species, with a maximum of 6 species. Significant positive co-occurring species pairs were found clustered together, forming 4 different associations of large branchiopod species. In general, species clustered in the same group showed similar responses to the analysed landscape characteristics, usually showing a better fit at higher spatial scales. Funding: Brazilian Conselho Nacional de Desenvolvimento Cientifico e Tecnologico-CNPq [401045/2014-5]; Spanish Ministry of Education, Culture and Sport [FPU014/06783].