Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data
We propose a resampling-based fast variable selection technique for selecting
important Single Nucleotide Polymorphisms (SNP) in multi-marker mixed effect
models used in twin studies. Due to computational complexity, current practice
includes testing the effect of one SNP at a time, commonly termed 'single
SNP association analysis'. Joint modeling of genetic variants within a gene or
pathway may have better power to detect the relevant genetic variants, hence we
adapt our recently proposed framework of e-values to address this. In this
paper, we propose a computationally efficient approach for single SNP detection
in families while utilizing information on multiple SNPs simultaneously. We
achieve this through improvements in two aspects. First, unlike other model
selection techniques, our method requires training only a single model that
includes all possible predictors. Second, we use a fast and scalable bootstrap
procedure that requires only Monte Carlo sampling to obtain bootstrapped copies
of the estimated vector of coefficients. Using this bootstrap sample, we obtain
the e-value for each SNP and select the SNPs whose e-values fall below a threshold. We
illustrate through numerical studies that our method is more effective in
detecting SNPs associated with a trait than either single-marker analysis using
family data or model selection methods that ignore the familial dependency
structure. We also use the e-values to perform gene-level analysis in nuclear
families and detect several SNPs that have previously been implicated as
associated with alcohol consumption.
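
A minimal sketch of the resampling idea, assuming a Gaussian approximation to the fitted coefficient vector: draw Monte Carlo copies of the coefficients from the single full multi-SNP model, score each SNP from the resamples, and threshold. The scoring rule and function names below are illustrative placeholders, not the authors' exact e-value construction.

```python
# Minimal sketch (not the authors' e-value definition): Monte Carlo resampling of
# the coefficient vector from the single full model, followed by per-SNP scoring.
import numpy as np

def select_snps(beta_hat, cov_hat, n_boot=10_000, threshold=0.05, seed=0):
    """beta_hat: (p,) coefficient estimates from the multi-SNP mixed model fit.
    cov_hat:  (p, p) estimated covariance of beta_hat (Gaussian approximation)."""
    rng = np.random.default_rng(seed)
    boot = rng.multivariate_normal(beta_hat, cov_hat, size=n_boot)      # (n_boot, p)
    # Placeholder score: fraction of resamples whose sign disagrees with the
    # point estimate; small values suggest a coefficient stably away from zero.
    scores = np.mean(np.sign(boot) != np.sign(beta_hat), axis=0)
    return np.flatnonzero(scores < threshold), scores

# Synthetic example with six SNPs, three of which carry signal.
beta = np.array([0.8, 0.02, -0.6, 0.01, 0.0, 0.5])
selected, scores = select_snps(beta, 0.05 * np.eye(beta.size))
print("selected SNP indices:", selected)
```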
Multi-scale uncertainty quantification in geostatistical seismic inversion
Geostatistical seismic inversion is commonly used to infer the spatial
distribution of the subsurface petro-elastic properties by perturbing the model
parameter space through iterative stochastic sequential
simulations/co-simulations. The spatial uncertainty of the inferred
petro-elastic properties is represented with the updated a posteriori variance
from an ensemble of the simulated realizations. Within this setting, the
large-scale geological parameters (metaparameters) used to generate the petro-elastic
realizations, such as the spatial correlation model and the global a priori
distribution of the properties of interest, are assumed to be known and
stationary for the entire inversion domain. This assumption leads to
underestimation of the uncertainty associated with the inverted models. We
propose a practical framework to quantify uncertainty of the large-scale
geological parameters in seismic inversion. The framework couples
geostatistical seismic inversion with a stochastic adaptive sampling and
Bayesian inference of the metaparameters to provide a more accurate and
realistic prediction of uncertainty not restricted by heavy assumptions on
large-scale geological parameters. The proposed framework is illustrated with
both synthetic and real case studies. The results show the ability to retrieve
more reliable acoustic impedance models, with a more adequate uncertainty spread,
when compared with conventional geostatistical seismic inversion techniques.
The proposed approach accounts separately for geological uncertainty at the
large scale (metaparameters) and at the local scale (trace-by-trace inversion).
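
A rough sketch of the coupling, with invented metaparameter ranges (spatial correlation range, global prior mean and standard deviation of acoustic impedance) and a stubbed inner inversion; the actual framework uses stochastic adaptive sampling rather than the plain reweighting shown here.

```python
# Illustrative sketch only: draw "metaparameters", run an inner inversion stub for
# each draw, and reweight the draws by their fit to the observed seismic data.
import numpy as np

rng = np.random.default_rng(1)
d_obs = rng.standard_normal(200)        # stand-in for an observed seismic trace

def inner_inversion(corr_range, prior_mean, prior_std):
    """Stub for geostatistical inversion under fixed metaparameters. A real
    implementation would run stochastic sequential simulation/co-simulation of
    impedance and convolve it with a wavelet to predict seismic data."""
    kernel = np.ones(int(corr_range)) / corr_range
    smooth = np.convolve(rng.standard_normal(d_obs.size), kernel, mode="same")
    return (prior_std / 500.0) * smooth + (prior_mean - 7e3) / 1e3

n_draws, sigma = 500, 1.0
draws, log_w = [], []
for _ in range(n_draws):
    theta = dict(corr_range=rng.uniform(5, 50),      # prior over metaparameters
                 prior_mean=rng.uniform(5e3, 9e3),
                 prior_std=rng.uniform(200, 800))
    d_syn = inner_inversion(**theta)
    log_w.append(-np.sum((d_obs - d_syn) ** 2) / (2 * sigma ** 2))
    draws.append(theta)

log_w = np.array(log_w)
w = np.exp(log_w - log_w.max())
w /= w.sum()                            # posterior weights over metaparameter draws
print("highest-weight metaparameters:", draws[int(np.argmax(w))])
```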
The SIMRAND methodology: Theory and application for the simulation of research and development projects
A research and development (R&D) project often involves a number of decisions that must be made concerning which subset of systems or tasks is to be undertaken to achieve the goal of the R&D project. To help in this decision making, SIMRAND (SIMulation of Research ANd Development Projects) is a methodology for selecting the optimal subset of systems or tasks to be undertaken on an R&D project. The SIMRAND methodology models the alternative subsets of systems or tasks under consideration as alternative networks. Each path through an alternative network represents one way of satisfying the project goals. Equations are developed that relate the system or task variables to the measure of preference. Uncertainty is incorporated by treating the variables of the equations probabilistically as random variables, with cumulative distribution functions assessed by technical experts. Analytical techniques of probability theory are used to reduce the complexity of the alternative networks. Cardinal utility functions over the measure of preference are assessed for the decision makers. A run of the SIMRAND I computer program combines, in a Monte Carlo simulation model, the network structure, the equations, the cumulative distribution functions, and the utility functions.
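
A hedged sketch of the Monte Carlo step described above, with made-up task distributions and a made-up cardinal utility function in place of expert-assessed CDFs and elicited utilities: each alternative path's task variables are sampled, combined into a measure of preference, and the alternatives are ranked by expected utility.

```python
# SIMRAND-style Monte Carlo sketch: sample task variables per path, combine them
# into a measure of preference (here total cost), apply a cardinal utility, and
# rank the alternatives by expected utility. All distributions are placeholders.
import numpy as np

rng = np.random.default_rng(42)

def utility(cost):
    """Hypothetical risk-averse cardinal utility over total project cost."""
    return -np.log1p(cost)

# Each path: samplers for its task variables; preference measure = summed cost.
paths = {
    "path_A": [lambda n: rng.triangular(2, 4, 9, n), lambda n: rng.uniform(1, 3, n)],
    "path_B": [lambda n: rng.triangular(3, 5, 7, n), lambda n: rng.uniform(0.5, 4, n)],
}

n_trials = 100_000
expected_utility = {}
for name, tasks in paths.items():
    total_cost = sum(sample(n_trials) for sample in tasks)   # combine task variables
    expected_utility[name] = utility(total_cost).mean()

best = max(expected_utility, key=expected_utility.get)
print(expected_utility, "-> preferred alternative:", best)
```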
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP
to a second stage, also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computational efficiency of at least a factor of fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. The call to random forests will undoubtedly extend the range of
size of datasets and complexity of models that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf, available on CRAN.
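
A minimal sketch of ABC model choice recast as classification, written in Python with scikit-learn rather than the authors' R package abcrf; the toy simulators and summary statistics are placeholders.

```python
# Simulate reference tables from each candidate model, train a random forest on
# summary statistics to predict the model index, then classify observed summaries.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def simulate(model_index, n):
    """Toy simulators standing in for the competing models; each output row is a
    vector of summary statistics for one simulated dataset."""
    theta = rng.uniform(0, 2, size=n)
    if model_index == 0:                       # e.g. Gaussian-like model
        data = rng.normal(theta[:, None], 1.0, size=(n, 50))
    else:                                      # e.g. heavier-tailed model
        data = theta[:, None] + rng.standard_t(df=3, size=(n, 50))
    return np.column_stack([data.mean(1), data.std(1), np.median(data, 1)])

# Reference table: summaries plus model labels.
n_ref = 5000
X = np.vstack([simulate(m, n_ref) for m in (0, 1)])
y = np.repeat([0, 1], n_ref)

rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0).fit(X, y)

s_obs = simulate(1, 1)                         # pretend these are observed summaries
print("predicted (MAP) model:", rf.predict(s_obs)[0])
# In the paper, a second (regression) forest is then trained to approximate the
# posterior probability of the selected model; that stage is omitted here.
```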
Proton-air cross section measurement with the ARGO-YBJ cosmic ray experiment
The proton-air cross section in the energy range 1-100 TeV has been measured
by the ARGO-YBJ cosmic ray experiment. The analysis is based on the flux
attenuation for different atmospheric depths (i.e. zenith angles) and exploits
the detector capabilities of selecting the shower development stage by means of
hit multiplicity, density and lateral profile measurements at ground. The
effects of shower fluctuations, the contribution of heavier primaries and the
uncertainties of the hadronic interaction models have been taken into account.
The results have been used to estimate the total proton-proton cross section at
center of mass energies between 70 and 500 GeV, where no accelerator data are
currently available.
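
A toy illustration of the flux-attenuation approach: fit an exponential decrease of the event rate with slant depth (zenith angle) to obtain an attenuation length, then convert it to a proton-air cross section. The vertical depth, k-factor, and conversion constant below are assumed round numbers, not the experiment's calibrated values.

```python
# Fit counts(theta) ~ exp(-X_v * (sec(theta) - 1) / Lambda) and convert Lambda
# to a proton-air cross section. Constants here are illustrative assumptions.
import numpy as np

X_VERTICAL = 606.0   # g/cm^2, rough vertical atmospheric depth at high altitude
K_FACTOR = 1.6       # assumed ratio between attenuation and interaction length

def proton_air_cross_section(zenith_deg, counts):
    sec_minus_1 = 1.0 / np.cos(np.radians(zenith_deg)) - 1.0
    # Linear fit of log(counts) vs X_v*(sec(theta)-1): slope = -1/Lambda.
    slope, _ = np.polyfit(X_VERTICAL * sec_minus_1, np.log(counts), 1)
    attenuation_length = -1.0 / slope                 # g/cm^2
    interaction_length = attenuation_length / K_FACTOR
    # sigma[mb] ~ 2.41e4 / lambda_int[g/cm^2] (mean air mass over Avogadro's number)
    return 2.41e4 / interaction_length

# Synthetic example with a true attenuation length of 100 g/cm^2.
theta = np.array([10.0, 20.0, 30.0, 40.0])
counts = 1e5 * np.exp(-X_VERTICAL * (1 / np.cos(np.radians(theta)) - 1) / 100.0)
print(f"sigma_p-air ~ {proton_air_cross_section(theta, counts):.0f} mb")
```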
Uncertainty quantification of satellite precipitation estimation and Monte Carlo assessment of the error propagation into hydrologic response
The aim of this paper is to foster the development of an end-to-end uncertainty analysis framework that can quantify the error characteristics of satellite-based precipitation estimation and assess how this error propagates into hydrologic simulation. First, the error associated with the satellite-based precipitation estimates is assumed to be a nonlinear function of rainfall space-time integration scale, rain intensity, and sampling frequency. Parameters of this function are determined by using high-resolution satellite-based precipitation estimates and gauge-corrected radar rainfall data over the southwestern United States. Parameter sensitivity analysis at 16 selected 5° × 5° latitude-longitude grids shows a variation of about 12-16% in each parameter with respect to its mean value. Afterward, the influence of precipitation estimation error on the uncertainty of the hydrologic response is examined with Monte Carlo simulation. In this approach, 100 ensemble members of precipitation data are generated as forcing input to a conceptual rainfall-runoff hydrologic model, and the resulting uncertainty in the streamflow prediction is quantified. Case studies are demonstrated over the Leaf River basin in Mississippi. Compared with the conventional procedure, i.e., treating precipitation estimation error as a fixed ratio of rain rates, the proposed framework provides a more realistic quantification of precipitation estimation error and offers an improved assessment of how that error propagates into hydrologic simulation. Further study shows that the radar rainfall-generated streamflow sequences are consistently contained within the uncertainty bound of the satellite rainfall-generated streamflow at the 95% confidence interval. Copyright 2006 by the American Geophysical Union.
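
A compact sketch of the Monte Carlo propagation step, with an invented rain-rate-dependent error model and a toy single-reservoir runoff model standing in for the calibrated error function and the conceptual hydrologic model used in the paper.

```python
# Perturb a rainfall series with a rain-rate-dependent error model, force a toy
# linear-reservoir runoff model with each ensemble member, and summarize the spread.
import numpy as np

rng = np.random.default_rng(7)
rain = np.clip(rng.gamma(shape=0.4, scale=5.0, size=240), 0, None)  # hourly rain, mm

def perturb(rain, a=0.3, b=0.5):
    """Multiplicative error whose spread grows nonlinearly with rain rate
    (illustrative form, not the calibrated error function)."""
    sigma = a * np.power(rain + 1e-6, b)
    return np.clip(rain + sigma * rng.standard_normal(rain.shape), 0, None)

def linear_reservoir(rain, k=0.05, runoff_coeff=0.4):
    """Toy conceptual rainfall-runoff model (single linear reservoir)."""
    storage, q = 0.0, np.empty_like(rain)
    for t, p in enumerate(rain):
        storage += runoff_coeff * p
        q[t] = k * storage
        storage -= q[t]
    return q

ensemble = np.array([linear_reservoir(perturb(rain)) for _ in range(100)])
lower, upper = np.percentile(ensemble, [2.5, 97.5], axis=0)   # 95% uncertainty bound
reference = linear_reservoir(rain)
coverage = np.mean((reference >= lower) & (reference <= upper))
print(f"fraction of reference streamflow inside the 95% bound: {coverage:.2f}")
```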
Machine Learning-Based Elastic Cloud Resource Provisioning in the Solvency II Framework
The Solvency II Directive (Directive 2009/138/EC) is a European Directive issued in November 2009 and effective from January 2016, which has been enacted by the European Union to regulate the insurance and reinsurance sector through the discipline of risk management. Solvency II requires European insurance companies to conduct consistent evaluation and continuous monitoring of risks—a process which is computationally complex and extremely resource-intensive. To this end, companies are required to equip themselves with adequate IT infrastructures, facing a significant outlay.
In this paper we present the design and the development of a Machine Learning-based approach to transparently deploy on a cloud environment the most resource-intensive portion of the Solvency II-related computation. Our proposal targets DISAR®, a Solvency II-oriented system initially designed to work on a grid of conventional computers. We show how our solution reduces the overall expenses associated with the computation without compromising the privacy of the companies' data (making it suitable for conventional public cloud environments), while meeting the strict temporal requirements imposed by the Directive. Additionally, the system is organized as a self-optimizing loop, which exploits information gathered from actual (useful) computations and thus requires a shorter training phase. We present an experimental study conducted on Amazon EC2 to assess the validity and the efficiency of our proposal.
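
A hedged sketch of the provisioning idea: learn a runtime model from features of past computation batches and pick the cheapest instance count predicted to meet the deadline. The feature set, regressor, and cost model are assumptions, not the DISAR integration described in the paper.

```python
# Learn batch runtime from workload features, then choose the cheapest number of
# cloud instances whose predicted runtime satisfies the regulatory deadline.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)

# Synthetic history of past runs: (scenarios, policies, instances) -> hours.
X_hist = rng.uniform([1e3, 1e4, 1], [1e4, 1e5, 64], size=(500, 3))
y_hist = 0.002 * X_hist[:, 0] * X_hist[:, 1] / 1e4 / X_hist[:, 2] + rng.normal(0, 0.2, 500)

model = GradientBoostingRegressor().fit(X_hist, y_hist)

def provision(n_scenarios, n_policies, deadline_h, price_per_instance_h=0.5):
    """Smallest-cost instance count whose predicted runtime meets the deadline."""
    best = None
    for n_inst in range(1, 65):
        runtime = model.predict([[n_scenarios, n_policies, n_inst]])[0]
        if runtime <= deadline_h:
            cost = n_inst * runtime * price_per_instance_h
            if best is None or cost < best[2]:
                best = (n_inst, runtime, cost)
    return best

print(provision(8000, 60000, deadline_h=6.0))
# In a self-optimizing loop, the measured runtime of each real batch would be
# appended to the training set and the model refit, shortening the training phase.
```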
Processing techniques development, volume 2. Part 1: Crop inventory techniques
There are no author-identified significant results in this report.