686 research outputs found

    Orthogonal parallel MCMC methods for sampling and optimization

    Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes to reduce the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and the choice of the parameters.

    On Identifying the Optimal Number of Population Clusters via the Deviance Information Criterion

    Inferring population structure using Bayesian clustering programs often requires a priori specification of the number of subpopulations from which the sample has been drawn. Here, we explore the utility of a common Bayesian model selection criterion, the Deviance Information Criterion (DIC), for estimating this number. We evaluate the accuracy of DIC, as well as other popular approaches, on datasets generated by coalescent simulations under various demographic scenarios. We find that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.
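
For orientation, the standard textbook definition of the DIC can be written as below. This is the usual form due to Spiegelhalter and co-authors; the abstract does not say which variant the paper uses, and the symbols (data y, parameters θ, posterior mean θ̄) are notation assumed here rather than taken from the listing.

```latex
\begin{align}
  D(\theta) &= -2 \log p(y \mid \theta)
    && \text{(deviance)} \\
  \bar{D} &= \mathbb{E}_{\theta \mid y}\!\left[ D(\theta) \right]
    && \text{(posterior mean deviance)} \\
  p_D &= \bar{D} - D(\bar{\theta})
    && \text{(effective number of parameters)} \\
  \mathrm{DIC} &= \bar{D} + p_D = D(\bar{\theta}) + 2\,p_D
\end{align}
```

Under this definition, candidate models (here, candidate numbers of subpopulations) are compared by their DIC values, with smaller values preferred.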

    Gene loss and lineage specific restriction-modification systems associated with niche differentiation in the Campylobacter jejuni Sequence Type 403 clonal complex

    Campylobacter jejuni is a highly diverse species of bacteria commonly associated with infectious intestinal disease of humans and zoonotic carriage in poultry, cattle, pigs, and other animals. The species contains a large number of distinct clonal complexes that vary from host generalist lineages commonly found in poultry, livestock, and human disease cases to host-adapted specialized lineages primarily associated with livestock or poultry. Here, we present novel data on the ST403 clonal complex of C. jejuni, a lineage that has not been reported in avian hosts. Our data show that the lineage exhibits a distinctive pattern of intralineage recombination that is accompanied by the presence of lineage-specific restriction-modification systems. Furthermore, we show that the ST403 complex has undergone gene decay at a number of loci. Our data provide a putative link between the lack of association with avian hosts of C. jejuni ST403 and both gene gain and gene loss through nonsense mutations in coding sequences of genes, resulting in pseudogene formation.

    Bayesian modeling of recombination events in bacterial populations

    Background: We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection capable of probabilistic characterization of uncertainty have a limited applicability in practice as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and of estimating the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at http://web.abo.fi/fak/mnf/mate/jc/software/brat.html

    Directional gene flow and ecological separation in Yersinia enterocolitica

    Yersinia enterocolitica is a common cause of food-borne gastroenteritis worldwide. Recent work defining the phylogeny of the genus Yersinia subdivided Y. enterocolitica into six distinct phylogroups. Here, we provide detailed analyses of the evolutionary processes leading to the emergence of these phylogroups. The dominant phylogroups isolated from human infections, PG3–5, show very little diversity at the sequence level, but do present marked patterns of gain and loss of functions, such as those involved in pathogenicity and metabolism, including the acquisition of phylogroup-specific O-antigen loci. We tracked gene flow across the species in the core and accessory genome, and show that the non-pathogenic PG1 strains act as a reservoir for diversity, frequently acting as donors in recombination events. Analysis of the core and accessory genome also suggested that the different Y. enterocolitica phylogroups may be ecologically separated, in contrast to the long-held belief of common shared ecological niches across the Y. enterocolitica species.

    Machine learning accelerated likelihood-free event reconstruction in dark matter direct detection

    Reconstructing the position of an interaction for any dual-phase time projection chamber (TPC) with the best precision is key to directly detecting Dark Matter. Using the likelihood-free framework, a new algorithm to reconstruct the 2-D (x; y) position and the size of the charge signal (e) of an interaction is presented. The algorithm uses the secondary scintillation light distribution (S2) obtained by simulating events using a waveform generator. To deal with the computational effort required by the likelihood-free approach, we employ the Bayesian Optimization for Likelihood-Free Inference (BOLFI) algorithm. Together with BOLFI, prior distributions for the parameters of interest (x; y; e) and highly informative discrepancy measures to perform the analyses are introduced. We evaluate the quality of the proposed algorithm by a comparison against the currently existing alternative methods using a large-scale simulation study. BOLFI provides a natural probabilistic uncertainty measure for the reconstruction and it improved the accuracy of the reconstruction over the next best algorithm by up to 15% when focusing on events at large radii (R > 30 cm, the outer 37% of the detector). In addition, BOLFI provides the smallest uncertainties among all the tested methods.

    Integrating genetic analysis of mixed populations with a spatially explicit population dynamics model

    1. Inferring the dynamics of populations in time and space is a central challenge in ecology. Intra-specific structure (for example genetically distinct sub-populations or meta-populations) may require methods that can jointly infer the dynamics of multiple populations. This is of particular importance for harvested species, for which management must balance utilization of productive populations with protection of weak ones. 2. Here we present a novel method for simultaneous learning about the spatio-temporal dynamics of multiple populations that combines genetic data with prior information about abundance and movement, akin to an integrated population modelling approach. We apply the Bayesian genetic mixed stock analysis to 17 wild and 10 hatchery-reared Baltic salmon (S. salar) stocks, quantifying uncertainty in stock composition in time and space, and in population dynamics parameters such as migration timing and speed. 3. The genetic data were informative about stock-specific movement patterns, updating priors for migration path, timing and speed. Use of a population dynamics model allowed robust interpolation of expected catch composition at areas and times with no genetic observations. Our results indicate that the commonly used "equal prior probabilities" assumption may not be appropriate for all mixed stock analyses: incorporation of prior information about stock abundance and movement resulted in more plausible and precise estimates of mixture compositions in time and space. 4. The model we present here forms the basis for optimizing the spatial and temporal allocation of harvest to support the management of mixed populations of migratory species.

    Identifying Currents in the Gene Pool for Bacterial Populations Using an Integrative Approach

    The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html

    Accounting for stellar activity signals in radial-velocity data by using change point detection techniques

    Context. Active regions on the photosphere of a star have been the major obstacle for detecting Earth-like exoplanets using the radial velocity (RV) method. A commonly employed solution for addressing stellar activity is to assume a linear relationship between the RV observations and the activity indicators along the entire time series, and then remove the estimated contribution of activity from the variation in RV data (overall correction method). However, since active regions evolve on the photosphere over time, correlations between the RV observations and the activity indicators will correspondingly be anisotropic. Aims. We present an approach that recognizes the RV locations where the correlations between the RV and the activity indicators significantly change in order to better account for variations in RV caused by stellar activity. Methods. The proposed approach uses a general family of statistical breakpoint methods, often referred to as change point detection (CPD) algorithms; several implementations of which are available in R and Python. A thorough comparison is made between the breakpoint-based approach and the overall correction method. To ensure wide representativity, we use measurements from real stars that have different levels of stellar activity and whose spectra have different signal-to-noise ratios. Results. When the corrections for stellar activity are applied separately to each temporal segment identified by the breakpoint method, the corresponding residuals in the RV time series are typically much smaller than those obtained by the overall correction method. Consequently, the generalized Lomb-Scargle periodogram contains a smaller number of peaks caused by active regions. The CPD algorithm is particularly effective when focusing on active stars with long time series, such as alpha Cen B. In that case, we demonstrate that the breakpoint method improves the detection limit of exoplanets by 74% on average with respect to the overall correction method. Conclusions. CPD algorithms provide a useful statistical framework for estimating the presence of change points in a time series. Since the process underlying the RV measurements generates anisotropic data by its intrinsic properties, it is natural to use CPD to obtain cleaner signals from RV data. We anticipate that the improved exoplanet detection limit may lead to a widespread adoption of such an approach. Our test on the HD 192310 planetary system is encouraging, as we confirm the presence of the two hosted exoplanets and we determine orbital parameters consistent with the literature, also providing much more precise estimates for HD 192310 c.
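
The contrast drawn in this abstract between an overall linear correction and a correction applied separately to each segment can be illustrated with a minimal sketch. It is not the authors' pipeline: the breakpoints are assumed to be already known (the change point detection step itself is omitted), the data are synthetic, and the function names `linear_residuals` and `segmentwise_residuals` are hypothetical.

```python
import numpy as np


def linear_residuals(rv, indicator):
    """Residuals of rv after removing a least-squares line fitted on the indicator."""
    slope, intercept = np.polyfit(indicator, rv, deg=1)
    return rv - (slope * indicator + intercept)


def segmentwise_residuals(rv, indicator, breakpoints):
    """Apply the linear correction separately on each segment.

    breakpoints: sorted indices at which a new segment starts. They are
    assumed to be given here; in practice a CPD algorithm would supply them.
    """
    edges = [0, *breakpoints, len(rv)]
    out = np.empty_like(rv, dtype=float)
    for start, stop in zip(edges[:-1], edges[1:]):
        out[start:stop] = linear_residuals(rv[start:stop], indicator[start:stop])
    return out


# Synthetic series: the RV-indicator relation changes at index 20.
indicator = np.linspace(0.0, 1.0, 40)
rv = np.where(np.arange(40) < 20, 2.0 * indicator, -3.0 * indicator + 1.0)

overall = linear_residuals(rv, indicator)                          # one fit for the whole series
segmented = segmentwise_residuals(rv, indicator, breakpoints=[20])  # one fit per segment

print("overall residual std:  ", np.std(overall))
print("segmented residual std:", np.std(segmented))
```

On this noise-free toy series the segment-wise residuals vanish up to floating-point error, while the single overall fit leaves structured residuals. Real RV data would also contain noise and planetary signals that this sketch ignores.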