
    Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data

    We propose a resampling-based fast variable selection technique for selecting important Single Nucleotide Polymorphisms (SNPs) in multi-marker mixed effect models used in twin studies. Due to computational complexity, current practice tests the effect of one SNP at a time, commonly termed 'single-SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect the relevant genetic variants, hence we adapt our recently proposed framework of e-values to address this. In this paper, we propose a computationally efficient approach for single-SNP detection in families while utilizing information on multiple SNPs simultaneously. We achieve this through improvements in two aspects. First, unlike other model selection techniques, our method requires training only one model, with all possible predictors. Second, we utilize a fast and scalable bootstrap procedure that only requires Monte Carlo sampling to obtain bootstrapped copies of the estimated vector of coefficients. Using this bootstrap sample, we obtain the e-value for each SNP and select SNPs having e-values below a threshold. We illustrate through numerical studies that our method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. We also use the e-values to perform gene-level analysis in nuclear families and detect several SNPs previously implicated in association with alcohol consumption.
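
    A minimal sketch, in Python, of the bootstrap-then-threshold workflow described above. The Gaussian Monte Carlo bootstrap and the per-SNP score below are simplified stand-ins for the paper's e-value construction; all names and toy numbers are illustrative assumptions, not the authors' implementation.

```python
# Simplified sketch: one fit of the full multi-SNP model is assumed to supply
# beta_hat and cov_hat; bootstrap copies are then drawn by Monte Carlo only.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_coefficients(beta_hat, cov_hat, n_boot=1000):
    """Monte Carlo copies of the estimated coefficient vector (Gaussian sketch)."""
    return rng.multivariate_normal(beta_hat, cov_hat, size=n_boot)

def snp_scores(boot_betas):
    """Illustrative per-SNP score: two-sided tail mass of zero under the bootstrap law."""
    frac_pos = (boot_betas > 0.0).mean(axis=0)
    return 2.0 * np.minimum(frac_pos, 1.0 - frac_pos)   # small => clearly nonzero effect

def select_snps(beta_hat, cov_hat, threshold=0.05, n_boot=1000):
    """Indices of SNPs whose score falls below the selection threshold."""
    scores = snp_scores(bootstrap_coefficients(beta_hat, cov_hat, n_boot))
    return np.where(scores < threshold)[0]

# Toy usage with five SNPs: only the first has a clearly nonzero effect.
beta_hat = np.array([0.8, 0.02, -0.01, 0.03, 0.0])
cov_hat = 0.01 * np.eye(5)
print(select_snps(beta_hat, cov_hat))   # expected: [0]
```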

    Multi-scale uncertainty quantification in geostatistical seismic inversion

    Geostatistical seismic inversion is commonly used to infer the spatial distribution of subsurface petro-elastic properties by perturbing the model parameter space through iterative stochastic sequential simulations/co-simulations. The spatial uncertainty of the inferred petro-elastic properties is represented by the updated a posteriori variance of an ensemble of simulated realizations. Within this setting, the large-scale geological parameters (metaparameters) used to generate the petro-elastic realizations, such as the spatial correlation model and the global a priori distribution of the properties of interest, are assumed to be known and stationary for the entire inversion domain. This assumption leads to underestimation of the uncertainty associated with the inverted models. We propose a practical framework to quantify the uncertainty of the large-scale geological parameters in seismic inversion. The framework couples geostatistical seismic inversion with stochastic adaptive sampling and Bayesian inference of the metaparameters to provide a more accurate and realistic prediction of uncertainty that is not restricted by strong assumptions about large-scale geological parameters. The proposed framework is illustrated with both synthetic and real case studies. The results show the ability to retrieve more reliable acoustic impedance models with a more adequate uncertainty spread when compared with conventional geostatistical seismic inversion techniques. The proposed approach accounts separately for geological uncertainty at the large scale (metaparameters) and the local scale (trace-by-trace inversion).
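
    A schematic sketch, in Python, of the kind of outer loop the framework describes: sample large-scale metaparameters, run a (placeholder) inner geostatistical inversion, and weight the samples by data misfit. The placeholder inversion, the Gaussian likelihood, and all numbers are assumptions for illustration only, not the authors' adaptive-sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_inner_inversion(corr_range, prior_mean, observed):
    """Placeholder for one stochastic sequential simulation + trace-by-trace inversion."""
    synthetic = prior_mean + 0.1 * corr_range * rng.standard_normal(observed.shape)
    return np.mean((synthetic - observed) ** 2)          # data misfit of this realization

def weight_metaparameters(observed, n_samples=200, sigma=100.0):
    """Sample metaparameters from a prior and weight them by a Gaussian misfit likelihood."""
    corr_ranges = rng.uniform(10.0, 100.0, n_samples)    # spatial correlation range
    prior_means = rng.normal(2500.0, 250.0, n_samples)   # global a priori mean (e.g. impedance)
    log_w = np.array([-0.5 * run_inner_inversion(r, m, observed) / sigma**2
                      for r, m in zip(corr_ranges, prior_means)])
    w = np.exp(log_w - log_w.max())
    return corr_ranges, prior_means, w / w.sum()

# Toy usage: posterior-weighted estimates of the correlation range and global mean.
observed = 2500.0 + 50.0 * rng.standard_normal(64)
r, m, w = weight_metaparameters(observed)
print("weighted mean range:", np.sum(w * r), " weighted mean prior:", np.sum(w * m))
```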

    The SIMRAND methodology: Theory and application for the simulation of research and development projects

    A research and development (R&D) project often involves a number of decisions about which subset of systems or tasks should be undertaken to achieve the goal of the R&D project. To support this decision making, SIMRAND (SIMulation of Research ANd Development Projects) is a methodology for selecting the optimal subset of systems or tasks to be undertaken on an R&D project. The SIMRAND methodology models the alternative subsets of systems or tasks under consideration as alternative networks. Each path through an alternative network represents one way of satisfying the project goals. Equations are developed that relate the system or task variables to the measure of preference. Uncertainty is incorporated by treating the variables of the equations probabilistically as random variables, with cumulative distribution functions assessed by technical experts. Analytical techniques of probability theory are used to reduce the complexity of the alternative networks. Cardinal utility functions over the measure of preference are assessed for the decision makers. A run of the SIMRAND I computer program combines, in a Monte Carlo simulation model, the network structure, the equations, the cumulative distribution functions, and the utility functions.
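
    A minimal Monte Carlo sketch, in Python, of the SIMRAND idea of scoring alternative network paths by the expected utility of the measure of preference. The triangular task-cost distributions, the cost equation, and the utility function are hypothetical stand-ins for the expert-assessed inputs.

```python
import random

random.seed(0)

def sample_path_cost(tasks):
    """One realization of total path cost; each task cost drawn from a triangular CDF."""
    return sum(random.triangular(low, high, mode) for (low, mode, high) in tasks)

def utility(cost, budget=100.0):
    """Illustrative risk-averse cardinal utility over the measure of preference (cost)."""
    return max(0.0, 1.0 - cost / budget) ** 2

def expected_utility(tasks, n_runs=10_000):
    """Monte Carlo estimate combining the path equations, the CDFs and the utility function."""
    return sum(utility(sample_path_cost(tasks)) for _ in range(n_runs)) / n_runs

# Two alternative paths through the network, each a list of (low, mode, high) task costs.
path_a = [(10, 15, 30), (20, 25, 45), (5, 10, 20)]
path_b = [(15, 20, 25), (25, 30, 35)]
print("path A:", expected_utility(path_a), " path B:", expected_utility(path_b))
```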

    Reliable ABC model choice via random forests

    Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP model to a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computational efficiency of a factor of at least fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of dataset sizes and model complexities that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf, available on CRAN.
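
    An illustrative Python analogue of the two-stage ABC random-forest workflow (the paper's own implementation is the R package abcrf); the simulated reference table and summary statistics below are toy stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(2)

# 1) Reference table: summary statistics simulated under each competing model (toy data).
n_per_model, n_stats = 2000, 6
X = np.vstack([rng.normal(0.0, 1.0, (n_per_model, n_stats)),
               rng.normal(0.5, 1.2, (n_per_model, n_stats))])
y = np.repeat([1, 2], n_per_model)

# 2) First stage: treat model choice as classification of the model index.
clf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
clf.fit(X, y)

# 3) Second stage: regress the out-of-bag misclassification indicator on the summaries;
#    the posterior probability of the selected model is approximated by 1 - predicted error.
oob_pred = clf.classes_[clf.oob_decision_function_.argmax(axis=1)]
error_indicator = (oob_pred != y).astype(float)
reg = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, error_indicator)

x_obs = rng.normal(0.4, 1.1, (1, n_stats))     # observed summary statistics (toy)
print("selected model:", clf.predict(x_obs)[0],
      "approx. posterior probability:", 1.0 - reg.predict(x_obs)[0])
```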

    Proton-air cross section measurement with the ARGO-YBJ cosmic ray experiment

    The proton-air cross section in the energy range 1-100 TeV has been measured by the ARGO-YBJ cosmic ray experiment. The analysis is based on the flux attenuation at different atmospheric depths (i.e. zenith angles) and exploits the detector's capability of selecting the shower development stage by means of hit multiplicity, density, and lateral profile measurements at ground level. The effects of shower fluctuations, the contribution of heavier primaries, and the uncertainties of the hadronic interaction models have been taken into account. The results have been used to estimate the total proton-proton cross section at center-of-mass energies between 70 and 500 GeV, where no accelerator data are currently available.
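
    A schematic sketch, in Python, of the flux-attenuation idea: fit the shower counts versus zenith angle to an exponential in slant depth, correct the observed attenuation length by an assumed factor accounting for shower fluctuations, and convert the resulting interaction length to a cross section. The toy counts, the correction factor, and the vertical depth value are assumptions, not ARGO-YBJ data.

```python
import numpy as np

X_V = 606.0        # g/cm^2, approximate vertical atmospheric depth at the YBJ altitude
K_FACTOR = 1.6     # assumed ratio between observed attenuation and interaction length

def sigma_p_air_mb(lambda_int, mean_a=14.5):
    """sigma_p-air [mb] from the interaction length via sigma = <A> * m_u / lambda."""
    return 2.41e4 / lambda_int * (mean_a / 14.5)

# Toy shower counts at several zenith angles, generated with Lambda_obs = 120 g/cm^2.
theta = np.deg2rad([0.0, 15.0, 25.0, 35.0, 45.0])
slant_excess = X_V * (1.0 / np.cos(theta) - 1.0)
counts = 1.0e5 * np.exp(-slant_excess / 120.0)

# Linear fit of log(counts) versus the extra slant depth gives the attenuation length.
slope, _ = np.polyfit(slant_excess, np.log(counts), 1)
lambda_obs = -1.0 / slope
lambda_int = lambda_obs / K_FACTOR
print("Lambda_obs ~ %.1f g/cm^2, sigma_p-air ~ %.0f mb"
      % (lambda_obs, sigma_p_air_mb(lambda_int)))
```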

    Machine Learning-Based Elastic Cloud Resource Provisioning in the Solvency II Framework

    The Solvency II Directive (Directive 2009/138/EC) is a European directive issued in November 2009 and effective from January 2016, enacted by the European Union to regulate the insurance and reinsurance sector through the discipline of risk management. Solvency II requires European insurance companies to conduct consistent evaluation and continuous monitoring of risks, a process which is computationally complex and extremely resource-intensive. To this end, companies are required to equip themselves with adequate IT infrastructures, facing a significant outlay. In this paper we present the design and development of a Machine Learning-based approach to transparently deploy the most resource-intensive portion of the Solvency II-related computation on a cloud environment. Our proposal targets DISAR®, a Solvency II-oriented system initially designed to work on a grid of conventional computers. We show how our solution reduces the overall expenses associated with the computation without hampering the privacy of the companies' data (making it suitable for conventional public cloud environments), while meeting the strict temporal requirements imposed by the Directive. Additionally, the system is organized as a self-optimizing loop, which uses information gathered from actual (useful) computations and thus requires a shorter training phase. We present an experimental study conducted on Amazon EC2 to assess the validity and efficiency of our proposal.
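
    A hypothetical sketch, in Python, of a self-optimizing provisioning loop of the kind described: predict batch runtime from workload features, size the VM pool against a deadline, and refit the predictor with every completed run. The features, the linear sizing rule, and the numbers are assumptions, not the DISAR® implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

DEADLINE_S = 2 * 3600.0        # time budget for the whole risk-evaluation batch
BASELINE_VMS = 8               # pool size on which past runtimes were measured
history_X, history_y = [], []  # (workload features, measured runtime on baseline pool)
model = LinearRegression()

def record_run(features, runtime_s):
    """Self-optimizing step: every real (useful) computation becomes a training sample."""
    history_X.append(features)
    history_y.append(runtime_s)
    model.fit(history_X, history_y)

def vms_for_next_batch(features):
    """Scale the baseline pool assuming roughly linear speed-up with the VM count."""
    predicted_s = float(model.predict([features])[0])
    return max(BASELINE_VMS, int(np.ceil(BASELINE_VMS * predicted_s / DEADLINE_S)))

# Warm-up on a few past runs; features = (policies / 1e5, scenarios / 1e3).
for f, t in [((1.0, 1.0), 3600.0), ((2.0, 1.0), 7000.0), ((1.0, 2.0), 7600.0)]:
    record_run(f, t)
print("VMs for the next batch:", vms_for_next_batch((2.0, 2.0)))
```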

    Processing techniques development, volume 2. Part 1: Crop inventory techniques

    There are no author-identified significant results in this report.