    Application of Bayesian regression with singular value decomposition method in association studies for sequence data

    Genetic association studies usually involve a large number (k) of single-nucleotide polymorphisms (SNPs) and a relatively small sample size (n), so that k is much greater than n. Because conventional statistical approaches cannot handle multiple SNPs simultaneously when k is much greater than n, single-SNP association studies have been used to identify genes involved in a disease's pathophysiology, which creates a multiple-testing problem. To evaluate the contribution of multiple SNPs simultaneously to disease traits when k is much greater than n, we developed the Bayesian regression with singular value decomposition (BRSVD) method. The method reduces the dimension of the design matrix from k to n by applying singular value decomposition to the design matrix. We evaluated the model by Markov chain Monte Carlo simulation, using a Gibbs sampler constructed from the posterior densities derived from conjugate prior densities. Permutation was incorporated to generate empirical p-values. We applied the BRSVD method to the sequence data provided by Genetic Analysis Workshop 17 and found that, compared with the single-SNP association test and the penalized regression method, BRSVD is a practical method for analyzing sequence data.
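    A minimal NumPy sketch of the SVD reduction step, assuming a centred phenotype y with no intercept, a fixed-variance normal prior θ ~ N(0, τ²I) on the n rotated coefficients, and an inverse-gamma prior on the residual variance; these priors, the function name `brsvd_gibbs`, and all default values are illustrative choices, not the settings used in the paper. Writing X = U D Vᵀ and θ = Vᵀβ turns the k-column regression into an n-column one, and because ZᵀZ = D² for Z = U D, each Gibbs update of θ factorises into independent normals.

```python
import numpy as np


def brsvd_gibbs(X, y, n_iter=2000, tau2=1.0, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for y = X beta + e after SVD reduction (illustrative sketch).

    Assumed priors (not from the paper): theta ~ N(0, tau2 * I) with tau2 fixed,
    sigma2 ~ InvGamma(a, b). y is assumed centred; no intercept is fitted.
    Returns an (n_iter, k) array of draws of the k SNP effects beta.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    # Thin SVD: X = U @ diag(d) @ Vt, with U (n x r), d (r,), Vt (r x k), r = min(n, k)
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y  # with Z = U diag(d), Z^T y = d * (U^T y)
    sigma2 = 1.0
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        # theta | sigma2, y: independent normals because Z^T Z = diag(d**2)
        prec = d**2 / sigma2 + 1.0 / tau2
        mean = (d * uty / sigma2) / prec
        theta = mean + rng.standard_normal(d.shape) / np.sqrt(prec)
        # sigma2 | theta, y: inverse-gamma, drawn as 1 / Gamma(shape, scale = 1 / rate)
        resid = y - U @ (d * theta)
        rate = b + 0.5 * (resid @ resid)
        sigma2 = 1.0 / rng.gamma(a + 0.5 * n, 1.0 / rate)
        # Map the n rotated coefficients back to k SNP effects: beta = V theta
        draws[t] = Vt.T @ theta
    return draws
```

    Empirical p-values of the kind described above could be obtained by rerunning the sampler on permuted copies of y and comparing a chosen summary statistic against its permutation distribution; that step is omitted here.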

    Analyzing stochastic computer models: A review with opportunities

    This is the author accepted manuscript. The final version is available from the Institute of Mathematical Statistics via the DOI in this record. In modern science, computer models are often used to understand complex phenomena, and a thriving statistical community has grown around analyzing them. This review aims to shine a spotlight on the growing prevalence of stochastic computer models, providing a catalogue of statistical methods for practitioners, an introductory view for statisticians (whether or not they are familiar with deterministic computer models), and an emphasis on open questions relevant to both practitioners and statisticians. Gaussian process surrogate models take center stage in this review, and these, along with several extensions needed for stochastic settings, are explained. The basic issues of designing a stochastic computer experiment and calibrating a stochastic computer model feature prominently in the discussion. Instructive examples, with data and code, are used to describe the implementation of, and results from, various methods. Funding: European Union FP7; DOE LAB; National Science Foundation.
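    As a sketch of the simplest stochastic-setting extension, the code below adds a homoscedastic nugget (noise variance) to a Gaussian process surrogate so that replicated, noisy simulator runs are smoothed rather than interpolated. The zero prior mean, squared-exponential kernel, fixed hyperparameters, the toy simulator, and the names `sq_exp_kernel` and `gp_predict` are assumptions for illustration; in practice the hyperparameters would be estimated, and input-dependent (heteroscedastic) noise calls for richer models than this one.

```python
import numpy as np


def sq_exp_kernel(a, b, lengthscale, variance):
    # Squared-exponential covariance between 1-D input arrays a and b
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_predict(x_train, y_train, x_test, nugget=0.1, lengthscale=0.3, variance=1.0):
    """Posterior mean and variance of the latent mean response at x_test.

    Zero prior mean. The nugget is a homoscedastic noise variance on the diagonal,
    so replicated noisy runs are smoothed instead of interpolated. Hyperparameters are fixed.
    """
    K = sq_exp_kernel(x_train, x_train, lengthscale, variance) + nugget * np.eye(len(x_train))
    Ks = sq_exp_kernel(x_test, x_train, lengthscale, variance)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = variance - np.sum(v**2, axis=0)
    return mean, var


# Hypothetical stochastic simulator: 10 design points, 5 replicates each
rng = np.random.default_rng(0)
x_design = np.repeat(np.linspace(0.0, 1.0, 10), 5)
y_runs = np.sin(2 * np.pi * x_design) + 0.3 * rng.standard_normal(x_design.size)
x_new = np.linspace(0.1, 0.9, 9)
mean, var = gp_predict(x_design, y_runs, x_new, nugget=0.09, lengthscale=0.2)
```

    The returned variance describes uncertainty about the latent mean response; adding `nugget` to it gives the predictive variance for a single new simulator run at the same input.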

    Transmission Selects for HIV-1 Strains of Intermediate Virulence: A Modelling Approach

    Recent data show that HIV-1 is characterised by variation in viral virulence factors that is heritable between infections, which suggests that viral virulence can be naturally selected at the population level. A trade-off between transmissibility and duration of infection appears to favour viruses of intermediate virulence. We developed a mathematical model to simulate the dynamics of putative viral genotypes that differ in their virulence. As a proxy for virulence, we use set-point viral load (SPVL), the steady density of viral particles in the blood during asymptomatic infection. Mutation, the dependence of survival and transmissibility on SPVL, and host effects were incorporated into the model. The model was fitted to data to estimate unknown parameters and was found to fit existing data well. With the maximum likelihood estimates of the parameters, SPVL in the model converged from any initial conditions to observed values within 100–150 years of the first emergence of HIV-1. We estimated (1) the host effect and (2) the extent to which the viral virulence genotype mutates from one infection to the next, and found a trade-off between these two parameters in explaining the variation in SPVL. The model confirms that evolution of virulence towards intermediate levels is rapid enough to have occurred in the early stages of the HIV epidemic, and that existing viral loads are nearly optimal given the assumed constraints on evolution. The model provides a useful framework for examining the future evolution of HIV-1 virulence.
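    The trade-off argument can be illustrated with a toy calculation in which the transmission rate rises and the duration of infection falls with SPVL, so that their product (expected onward transmission) peaks at an intermediate value. Both functional forms and every parameter value below (`beta_max`, `half_sat`, `d_max`, `half_dur`, `hill`) are hypothetical and chosen only for illustration; this is not the fitted model from the study, which also includes mutation and host effects.

```python
import math


def transmission_rate(spvl, beta_max=0.3, half_sat=1e4):
    # Hypothetical: per-year transmission rate rises and saturates with SPVL (copies/mL)
    return beta_max * spvl / (spvl + half_sat)


def duration_years(spvl, d_max=25.0, half_dur=3e4, hill=0.4):
    # Hypothetical: asymptomatic duration falls as SPVL increases
    return d_max * half_dur**hill / (spvl**hill + half_dur**hill)


def transmission_potential(spvl):
    # Expected onward infections over an infection: rate x duration (an R0-like quantity)
    return transmission_rate(spvl) * duration_years(spvl)


grid = [10 ** (2 + 0.01 * i) for i in range(501)]  # SPVL from 10^2 to 10^7 copies/mL
best = max(grid, key=transmission_potential)
print(f"Transmission potential peaks near log10 SPVL = {math.log10(best):.2f}")
```

    With these made-up curves the product is small at both ends of the grid (too little transmission at low SPVL, too short an infection at high SPVL), which is the qualitative shape the trade-off argument relies on.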

    Design of Experiments for Screening

    The aim of this paper is to review methods of designing screening experiments, ranging from designs originally developed for physical experiments to those especially tailored to experiments on numerical models. The strengths and weaknesses of the various designs for screening variables in numerical models are discussed. First, classes of factorial designs for experiments to estimate main effects and interactions through a linear statistical model are described, specifically regular and nonregular fractional factorial designs, supersaturated designs, and systematic fractional replicate designs. Generic issues of aliasing, bias, and cancellation of factorial effects are discussed. Second, group screening experiments are considered, including factorial group screening and sequential bifurcation. Third, random sampling plans are discussed, including Latin hypercube sampling and sampling plans to estimate elementary effects. Fourth, a variety of modelling methods commonly employed with screening designs are briefly described. Finally, a novel study demonstrates six screening methods on two frequently used exemplars and compares their performance.
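    To make the aliasing issue concrete, here is a small sketch of a regular 2^(4-1) fractional factorial built from the generator D = ABC (defining relation I = ABCD): halving the run count makes the main effect of D indistinguishable from the ABC interaction and pairs two-factor interactions such as AB with CD. The factor labels and the helper names `fractional_factorial_2_4_1` and `contrast` are hypothetical.

```python
from itertools import product

FACTORS = "ABCD"


def fractional_factorial_2_4_1():
    """Regular 2^(4-1) design: full factorial in A, B, C; D set by generator D = ABC."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


def contrast(runs, effect):
    # Contrast column for an effect such as "AB": product of the named factor levels per run
    column = []
    for run in runs:
        value = 1
        for factor in effect:
            value *= run[FACTORS.index(factor)]
        column.append(value)
    return column


runs = fractional_factorial_2_4_1()
print(len(runs))                                     # → 8 (a full 2^4 design needs 16)
print(contrast(runs, "D") == contrast(runs, "ABC"))  # → True: D aliased with ABC
print(contrast(runs, "AB") == contrast(runs, "CD"))  # → True: AB aliased with CD
print(sum(contrast(runs, "A")))                      # → 0: main-effect column is balanced
```

    Identical contrast columns mean the corresponding effects cannot be separated by any analysis of this design, which is why screening relies on assumptions such as effect sparsity or on follow-up runs.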

    An open challenge to advance probabilistic forecasting for dengue epidemics

    This is the final version, available on open access from the National Academy of Sciences via the DOI in this record. Data availability: The data are available at https://github.com/cdcepi/dengue-forecasting-project-2015 (DOI: https://doi.org/10.5281/zenodo.3519270). A wide range of research has promised new tools for forecasting infectious disease dynamics, but little of that research is currently applied in practice, because the tools do not address key public health needs, do not produce probabilistic forecasts, have not been evaluated on external data, or do not provide sufficient forecast skill to be useful. We developed an open collaborative forecasting challenge to assess probabilistic forecasts for seasonal epidemics of dengue, a major global public health problem. Sixteen teams used a variety of methods and data to generate forecasts for 3 epidemiological targets (peak incidence, the week of the peak, and total incidence) over 8 dengue seasons in Iquitos, Peru, and San Juan, Puerto Rico. Forecast skill was highly variable across teams and targets. While numerous forecasts showed high skill for midseason situational awareness, early-season skill was low, and skill was generally lowest for high-incidence seasons, precisely those for which forecasts would be most valuable. A comparison of modeling approaches revealed that average forecast skill was lower for models including biologically meaningful data and mechanisms, and that both multimodel and multiteam ensemble forecasts consistently outperformed individual model forecasts. Leveraging these insights, data, and the forecasting framework will be critical to improving forecast skill and the real-time application of forecasts for epidemic preparedness and response. Moreover, key components of this project (integration with public health needs, a common forecasting framework, shared and standardized data, and open participation) can help advance infectious disease forecasting beyond dengue.
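    The following sketch shows how an equal-weight ensemble of binned probabilistic forecasts can be scored, assuming for illustration a logarithmic score (the log of the probability placed on the observed bin). All forecasts, bins, and outcomes are hypothetical. Because the logarithm is concave, the ensemble's log score for any single outcome is never below the average of its members' scores; in this toy example it also beats each member because the two models miss in different seasons, which is not guaranteed in general.

```python
import math


def mean_log_score(seasonal_forecasts, observed):
    # Average log probability each season's forecast placed on the bin that occurred
    return sum(math.log(f[o]) for f, o in zip(seasonal_forecasts, observed)) / len(observed)


def equal_weight_ensemble(models):
    # For each season, average the models' binned distributions bin by bin
    n_models = len(models)
    return [[sum(bins) / n_models for bins in zip(*season)] for season in zip(*models)]


# Hypothetical peak-week forecasts: 5 bins per season, each distribution sums to 1
forecasts = {
    "A": [[0.05, 0.60, 0.25, 0.05, 0.05], [0.10, 0.50, 0.30, 0.05, 0.05]],
    "B": [[0.10, 0.10, 0.20, 0.50, 0.10], [0.05, 0.10, 0.15, 0.60, 0.10]],
}
observed = [1, 3]  # hypothetical bin in which the peak actually fell, per season
forecasts["ensemble"] = equal_weight_ensemble([forecasts["A"], forecasts["B"]])

for name, seasonal in forecasts.items():
    score = mean_log_score(seasonal, observed)
    print(f"{name}: mean log score {score:.3f}, exp(score) {math.exp(score):.3f}")
```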

    Player Pairs Valuation in Ice Hockey

    To overcome the shortcomings of simple metrics for evaluating player performance, recent work has introduced more advanced metrics that take into account the context of players' actions and perform look-ahead. However, because ice hockey is a team sport, individual ratings are not enough: coaches also want to identify players who play particularly well together. In this paper we therefore extend earlier work on evaluating the performance of individual players to the related problem of evaluating the performance of player pairs. We experiment with data from seven NHL seasons, discuss the top pairs, and present analyses and insights based on both the absolute and the relative ice time that pairs spend together.

    Exploratory ensemble designs for environmental models using k‐extended Latin Hypercubes

    Copyright © 2015 John Wiley & Sons, Ltd. Publication status: accepted. Open access article. In this paper we present a novel, flexible, and multi-purpose class of designs for initial exploration of the parameter spaces of computer models, such as those used to study many features of the environment. The idea applies existing technology for expanding a Latin Hypercube (LHC) to generate initial LHC designs that are composed of many smaller LHCs. The resulting design and its component parts are constructed so that each is approximately orthogonal and maximises a measure of coverage of the parameter space. Designs of the type advocated in this paper are particularly useful when we want to quantify, simultaneously, parametric uncertainty and any uncertainty due to the initial conditions, boundary conditions, or forcing functions required to run the model. This makes the class of designs particularly suited to environmental models, such as climate models, which contain all of these features. The proposed designs are also well suited to initial exploratory ensembles whose goal is to guide the design of further ensembles aimed at, for example, calibrating the model. We introduce a new emulator diagnostic that exploits the structure of the advocated ensemble designs and allows for the assessment of structural weaknesses in the statistical modelling. We illustrate the method with a simple example and describe a 400-member ensemble of the Nucleus for European Modelling of the Ocean (NEMO) ocean model designed using the method. We build an emulator for NEMO using the created design to illustrate the use of our emulator diagnostic test. Funding: Engineering and Physical Sciences Research Council (EPSRC).
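    A minimal sketch of the nesting property, assuming a purely random construction: each coarse bin of width 1/m is split into k fine bins, every sub-design visits each coarse bin once, and within a coarse bin the k sub-designs take distinct fine bins, so each sub-design is an m-point LHC and their union is a (k·m)-point LHC. Unlike the designs in the paper, this sketch makes no attempt at near-orthogonality or coverage maximisation, and the function name `k_extended_lhc` is hypothetical.

```python
import random


def k_extended_lhc(k, m, dims, seed=0):
    """Return k sub-designs (lists of m points in [0, 1)^dims).

    Each sub-design is an m-point Latin hypercube and the union of all k is a
    (k*m)-point Latin hypercube. Purely random: no orthogonality or coverage optimisation.
    """
    rng = random.Random(seed)
    n = k * m
    design = [[[0.0] * dims for _ in range(m)] for _ in range(k)]
    for d in range(dims):
        # Within each coarse bin, assign each sub-design a distinct fine bin (offset 0..k-1)
        offsets = [rng.sample(range(k), k) for _ in range(m)]
        for i in range(k):
            # Sub-design i visits every coarse bin (width 1/m) exactly once
            for r, coarse in enumerate(rng.sample(range(m), m)):
                fine = coarse * k + offsets[coarse][i]  # fine bin of width 1/n
                design[i][r][d] = (fine + rng.random()) / n
    return design
```

    For example, `k_extended_lhc(k=4, m=5, dims=3)` returns four 5-point sub-designs whose 20 points together form a 20-point LHC. A simple way to move toward better-behaved designs would be to draw many random candidates and keep the one that scores best on a chosen orthogonality or coverage criterion.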