
    Control variates for stochastic gradient MCMC

    It is well known that Markov chain Monte Carlo (MCMC) methods scale poorly with dataset size. A popular class of methods for solving this issue is stochastic gradient MCMC (SGMCMC). These methods use a noisy estimate of the gradient of the log-posterior, which reduces the per-iteration computational cost of the algorithm. Despite this, there are a number of results suggesting that stochastic gradient Langevin dynamics (SGLD), probably the most popular of these methods, still has computational cost proportional to the dataset size. We suggest an alternative log-posterior gradient estimate for stochastic gradient MCMC which uses control variates to reduce the variance. We analyse SGLD using this gradient estimate, and show that, under log-concavity assumptions on the target distribution, the computational cost required for a given level of accuracy is independent of the dataset size. Next we show that a different control variate technique, known as zero variance control variates, can be applied to SGMCMC algorithms for free. This post-processing step improves the inference of the algorithm by reducing the variance of the MCMC output. Zero variance control variates rely on the gradient of the log-posterior; we explore how the variance reduction is affected by replacing this with the noisy gradient estimate calculated by SGMCMC.
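As a rough illustration of the control-variate gradient estimate described in this abstract, the sketch below runs SGLD on a one-parameter logistic regression. The abstract does not give the estimator's exact form, so everything here is an assumption: the model, the N(0, 10^2) prior, the reference point `theta_hat` (found by plain gradient ascent), and all function names. The idea shown is to pay for one full-data gradient at `theta_hat` and then correct it with a rescaled minibatch sum of gradient differences.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_i(theta, x, y):
    """Gradient of one logistic log-likelihood term, with y in {0, 1}."""
    return (y - sigmoid(theta * x)) * x


def grad_prior(theta, prior_sd=10.0):
    """Gradient of an assumed N(0, prior_sd^2) log-prior."""
    return -theta / prior_sd ** 2


def full_grad(theta, data):
    """Exact log-posterior gradient; costs one pass over all N points."""
    return grad_prior(theta) + sum(grad_i(theta, x, y) for x, y in data)


def find_mode(data, n_iters=200):
    """Rough posterior mode by gradient ascent (an assumption of this sketch)."""
    theta, lr = 0.0, 4.0 / len(data)
    for _ in range(n_iters):
        theta += lr * full_grad(theta, data)
    return theta


def simple_estimate(theta, data, batch):
    """Standard SGLD estimate: prior gradient plus rescaled minibatch sum."""
    scale = len(data) / len(batch)
    return grad_prior(theta) + scale * sum(grad_i(theta, *data[i]) for i in batch)


def cv_estimate(theta, theta_hat, full_grad_hat, data, batch):
    """Control-variate estimate: full gradient at theta_hat plus a
    rescaled minibatch sum of per-point gradient differences."""
    scale = len(data) / len(batch)
    diff = sum(grad_i(theta, *data[i]) - grad_i(theta_hat, *data[i]) for i in batch)
    return full_grad_hat + (grad_prior(theta) - grad_prior(theta_hat)) + scale * diff


def sgld_cv(data, theta_hat, step, n_iters, batch_size, rng):
    """SGLD started at theta_hat, using the control-variate estimate."""
    full_grad_hat = full_grad(theta_hat, data)  # single O(N) pass, reused below
    theta, samples = theta_hat, []
    for _ in range(n_iters):
        batch = rng.sample(range(len(data)), batch_size)
        g = cv_estimate(theta, theta_hat, full_grad_hat, data, batch)
        theta += 0.5 * step * g + math.sqrt(step) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples
```

Both estimates are unbiased for the full gradient. The control-variate version has small variance whenever `theta` stays close to `theta_hat`, because the per-point differences shrink with the distance between the two.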

    Modeling human ad hoc coordination

    Whether in groups of humans or groups of computer agents, collaboration is most effective between individuals who have the ability to coordinate on a joint strategy for collective action. However, in general a rational actor will only intend to coordinate if that actor believes the other group members have the same intention. This circular dependence makes rational coordination difficult in uncertain environments if communication between actors is unreliable and no prior agreements have been made. An important normative question with regard to coordination in these ad hoc settings is therefore how one can come to believe that other actors will coordinate, and with regard to systems involving humans, an important empirical question is how humans arrive at these expectations. We introduce an exact algorithm for computing the infinitely recursive hierarchy of graded beliefs required for rational coordination in uncertain environments, and we introduce a novel mechanism for multiagent coordination that uses it. Our algorithm is valid in any environment with a finite state space, and extensions to certain countably infinite state spaces are likely possible. We test our mechanism for multiagent coordination as a model for human decisions in a simple coordination game using existing experimental data. We then explore via simulations whether modeling humans in this way may improve human-agent collaboration.

    Deep-water chemosynthetic ecosystem research during the Census of Marine Life Decade and Beyond: A Proposed Deep-Ocean Road Map

    The ChEss project of the Census of Marine Life (2002–2010) helped foster internationally coordinated studies worldwide focusing on exploration for, and characterization of, new deep-sea chemosynthetic ecosystem sites. This work has advanced our understanding of the nature and factors controlling the biogeography and biodiversity of these ecosystems in four geographic locations: the Atlantic Equatorial Belt (AEB), the New Zealand region, the Arctic and Antarctic, and the SE Pacific off Chile. In the AEB, major discoveries include hydrothermal seeps on the Costa Rica margin, the deepest vents found on the Mid-Cayman Rise and the hottest vents found on the Southern Mid-Atlantic Ridge. It was also shown that the major fracture zones on the MAR do not create barriers to dispersal but may act as trans-Atlantic conduits for larvae. In New Zealand, investigations of a newly found large cold-seep area suggest that this region may be a new biogeographic province. In the Arctic, the newly discovered sites on the Mohns Ridge (71°N) showed extensive mats of sulfur-oxidising bacteria, but only one gastropod potentially bears chemosynthetic symbionts, while cold seeps on the Haakon Mossby Mud Volcano (72°N) are dominated by siboglinid worms. In the Antarctic region, the first hydrothermal vents south of the Polar Front were located and biological results indicate that they may represent a new biogeographic province. The recent exploration of the South Pacific region has provided evidence for a sediment-hosted hydrothermal source near a methane-rich cold-seep area.
Based on our 8 years of investigations of deep-water chemosynthetic ecosystems worldwide, we suggest the highest priorities for future research: (i) continued exploration of the deep-ocean ridge-crest; (ii) increased focus on anthropogenic impacts; (iii) concerted effort to coordinate a major investigation of the deep South Pacific Ocean – the largest contiguous habitat for life within Earth's biosphere, but also the world's least investigated deep-ocean basin.

    Microbial Symbionts and Ecological Divergence of Caribbean Sponges: A New Perspective on an Ancient Association

    Marine sponges host diverse communities of microbial symbionts that expand the metabolic capabilities of their host, but the abundance and structure of these communities are highly variable across sponge species. Specificity in these interactions may fuel host niche partitioning on crowded coral reefs by allowing individual sponge species to exploit unique sources of carbon and nitrogen, but this hypothesis is yet to be tested. Given the presence of high sponge biomass and the coexistence of diverse sponge species, the Caribbean Sea provides a unique system in which to investigate this hypothesis. To test for ecological divergence among sympatric Caribbean sponges and investigate whether these trends are mediated by microbial symbionts, we measured stable isotope (δ13C and δ15N) ratios and characterized the microbial community structure of sponge species at sites within four regions spanning a 1700 km latitudinal gradient. There was a low (median of 8.2%) overlap in the isotopic niches of sympatric species; in addition, host identity accounted for over 75% of the dissimilarity in both δ13C and δ15N values and microbiome community structure among individual samples within a site. There was also a strong phylogenetic signal in both δ15N values and microbial community diversity across host phylogeny, as well as a correlation between microbial community structure and variation in δ13C and δ15N values across samples. Together, this evidence supports a hypothesis of strong evolutionary selection for ecological divergence across sponge lineages and suggests that this divergence is at least partially mediated by associations with microbial symbionts.

    Large-Scale Stochastic Sampling from the Probability Simplex

    Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous-time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space, such as the simplex, the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that while current SGMCMC methods for the simplex perform well in certain cases, they struggle with sparse simplex spaces, where many of the components are close to zero. However, most popular large-scale applications of Bayesian inference on simplex spaces, such as network or topic models, are sparse. We argue that this poor performance is due to the biases of SGMCMC caused by the discretization error. To get around this, we propose the stochastic CIR process, which removes all discretization error, and we prove that samples from the stochastic CIR process are asymptotically unbiased. Use of the stochastic CIR process within an SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
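To make the "no discretization error" point concrete, the sketch below simulates a Cox-Ingersoll-Ross (CIR) process using its exact transition density, a standard result, rather than an Euler-type approximation. The abstract does not specify the process, so the particular form used here (stationary distribution Gamma(a, 1), with independent components normalised to land on the simplex) and all function names are assumptions. A stochastic variant would presumably replace `a` with a minibatch-based estimate; that step is not shown.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def cir_step(theta, a, h, rng):
    """Exact transition over time h of
    d(theta) = (a - theta) dt + sqrt(2 * theta) dW,
    whose stationary distribution is Gamma(shape=a, rate=1)."""
    decay = math.exp(-h)
    c = (1.0 - decay) / 2.0
    noncentrality = theta * decay / c
    # A noncentral chi-squared(2a, nc) draw is a chi-squared(2a + 2J) draw
    # with J ~ Poisson(nc / 2), i.e. a Gamma(shape=a + J, scale=2) draw.
    j = poisson(noncentrality / 2.0, rng)
    return c * rng.gammavariate(a + j, 2.0)


def sample_simplex(a_vec, h, n_iters, rng):
    """Run one CIR chain per component and normalise each state.
    Normalised independent Gamma(a_j, 1) variables are Dirichlet(a_vec)."""
    theta = list(a_vec)
    draws = []
    for _ in range(n_iters):
        theta = [cir_step(t, a, h, rng) for t, a in zip(theta, a_vec)]
        total = sum(theta)
        draws.append([t / total for t in theta])
    return draws
```

Because each step draws from the exact transition, the step size `h` only affects how correlated successive draws are, not the distribution they target. Every state is also a positive Gamma draw, so the chain cannot step outside the constrained space near the boundary.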

    Repetitive, Non-Globular Proteins: Nature to Nanotechnology, Held at the University of York

    The ability of bacteria to adhere to other cells or to surfaces depends on long, thin adhesive structures that are anchored to their cell walls. These structures include extended protein oligomers known as pili and single, multi-domain polypeptides, mostly based on multiple tandem Ig-like domains. Recent structural studies have revealed the widespread presence of covalent cross-links, not previously seen within proteins, which stabilize these domains. The cross-links discovered so far are either isopeptide bonds that link lysine side chains to the side chains of asparagine or aspartic acid residues, or ester bonds between threonine and glutamine side chains. These bonds appear to be formed by spontaneous intramolecular reactions as the proteins fold and are strategically placed so as to impart considerable mechanical strength.

    sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo

    This paper introduces the R package sgmcmc, which can be used for Bayesian inference on problems with large datasets using stochastic gradient Markov chain Monte Carlo (SGMCMC). Traditional Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings, are known to run prohibitively slowly as the dataset size increases. SGMCMC solves this issue by only using a subset of data at each iteration. SGMCMC requires calculating gradients of the log-likelihood and log-priors, which can be time-consuming and error-prone to perform by hand. The sgmcmc package calculates these gradients itself using automatic differentiation, making the implementation of these methods much easier. To do this, the package uses the software library TensorFlow, which has a variety of statistical distributions and mathematical operations as standard, meaning a wide class of models can be built using this framework. SGMCMC has become widely adopted in the machine learning literature, but less so in the statistics community. We believe this may be partly due to lack of software; this package aims to bridge this gap.