Forecasting Financial Volatility Using Nested Monte Carlo Expression Discovery
We are interested in discovering expressions for financial prediction using Nested Monte Carlo Search and Genetic Programming. Both methods are applied to financial time series to generate nonlinear functions for market volatility prediction. The input data, a series of daily prices of the European S&P500 index, is filtered and sampled to improve the training process. The best models generated by each approach for each training subsample are evaluated and compared using several assessment metrics. Results show that Nested Monte Carlo generates better forecasting models than Genetic Programming for the majority of learning samples.
Differential expression analysis for multiple conditions
As high-throughput sequencing has become common practice, the cost of
sequencing large amounts of genetic data has been drastically reduced, leading
to much larger data sets for analysis. One important task is to identify
biological conditions that lead to unusually high or low expression of a
particular gene. Packages such as DESeq implement a simple method for testing
differential signal when exactly two biological conditions are possible. For
more than two conditions, pairwise testing is typically used. Here the DESeq
method is extended so that three or more biological conditions can be assessed
simultaneously. Because the computation time grows exponentially in the number
of conditions, a Monte Carlo approach provides a fast way to approximate the
p-values for the new test. The approach is studied on both simulated data and
a data set of C. jejuni, the bacteria responsible for most food poisoning
in the United States.
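As a rough illustration of approximating a p-value by Monte Carlo when exact computation is too expensive, the sketch below uses a label-permutation null on toy counts for three conditions. It is not the DESeq extension described in the abstract: the statistic `group_spread`, the toy data, and all function names are assumptions chosen only to show the general idea.

```python
import random


def monte_carlo_p_value(observed_stat, simulate_null_stat, n_draws=2000, seed=0):
    """Estimate P(T >= observed) under the null from n_draws simulated statistics.

    The +1 terms count the observed value itself, so the estimate is never 0.
    """
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_draws) if simulate_null_stat(rng) >= observed_stat)
    return (exceed + 1) / (n_draws + 1)


def group_spread(groups):
    """Hypothetical test statistic: largest minus smallest group mean."""
    means = [sum(g) / len(g) for g in groups]
    return max(means) - min(means)


def permutation_null(groups):
    """Return a sampler that reshuffles condition labels and recomputes the statistic."""
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]

    def draw(rng):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        regrouped, start = [], 0
        for size in sizes:
            regrouped.append(shuffled[start:start + size])
            start += size
        return group_spread(regrouped)

    return draw


# Toy expression counts for one gene under three conditions (made-up numbers).
counts = [[12, 15, 11, 14], [13, 12, 16, 15], [40, 38, 45, 42]]
p = monte_carlo_p_value(group_spread(counts), permutation_null(counts))
print(f"Monte Carlo p-value estimate: {p:.4f}")
```

The number of draws trades accuracy for run time; the smallest p-value this estimator can report is 1 / (n_draws + 1).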
Efficient inference for genetic association studies with multiple outcomes
Combined inference for heterogeneous high-dimensional data is critical in
modern biology, where clinical and various kinds of molecular data may be
available from a single study. Classical genetic association studies regress a
single clinical outcome on many genetic variants one by one, but there is an
increasing demand for joint analysis of many molecular outcomes and genetic
variants in order to unravel functional interactions. Unfortunately, most
existing approaches to joint modelling are either too simplistic to be powerful
or are impracticable for computational reasons. Inspired by Richardson et al.
(2010, Bayesian Statistics 9), we consider a sparse multivariate regression
model that allows simultaneous selection of predictors and associated
responses. As Markov chain Monte Carlo (MCMC) inference on such models can be
prohibitively slow when the number of genetic variants exceeds a few thousand,
we propose a variational inference approach which produces posterior
information very close to that of MCMC inference, at a much reduced
computational cost. Extensive numerical experiments show that our approach
outperforms popular variable selection methods and tailored Bayesian
procedures, dealing within hours with problems involving hundreds of thousands
of genetic variants and tens to hundreds of clinical or molecular outcomes.
QuickMMCTest - Quick Multiple Monte Carlo Testing
Multiple hypothesis testing is widely used to evaluate scientific studies
involving statistical tests. However, for many of these tests, p-values are not
available and are thus often approximated using Monte Carlo tests such as
permutation tests or bootstrap tests. This article presents a simple algorithm
based on Thompson Sampling to test multiple hypotheses. It works with arbitrary
multiple testing procedures, in particular with step-up and step-down
procedures. Its main feature is to sequentially allocate Monte Carlo effort,
generating more Monte Carlo samples for tests whose decisions are so far less
certain. A simulation study demonstrates that for a low computational effort,
the new approach yields a higher power and a higher degree of reproducibility
of its results than previously suggested methods.
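The sequential allocation idea can be sketched roughly as follows. This is a simplified illustration in the spirit of the abstract, not the published QuickMMCTest algorithm: the Beta(1, 1) prior, the Bonferroni procedure, the weight `min(f, 1 - f)` with a small floor, and every name and default value are assumptions. Each Monte Carlo sample is simulated here as a Bernoulli exceedance indicator with a known probability `true_p[i]`; in practice it would come from a permutation or bootstrap draw.

```python
import random


def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def allocate_effort(true_p, rounds=60, batch=100, posterior_draws=50,
                    alpha=0.05, seed=1):
    """Spend Monte Carlo samples preferentially on hypotheses with uncertain decisions."""
    rng = random.Random(seed)
    m = len(true_p)
    exceed = [0] * m  # exceedances observed so far, per hypothesis
    total = [0] * m   # Monte Carlo samples drawn so far, per hypothesis
    for _ in range(rounds):
        # Draw p-values from the Beta posteriors and record how often
        # the multiple testing procedure rejects each hypothesis.
        reject_count = [0] * m
        for _ in range(posterior_draws):
            sampled = [rng.betavariate(1 + exceed[i], 1 + total[i] - exceed[i])
                       for i in range(m)]
            for i, rejected in enumerate(bonferroni(sampled, alpha)):
                if rejected:
                    reject_count[i] += 1
        freq = [c / posterior_draws for c in reject_count]
        # A decision is least certain when the rejection frequency is near 1/2.
        weights = [min(f, 1 - f) + 1e-3 for f in freq]
        for _ in range(batch):
            i = rng.choices(range(m), weights=weights)[0]
            total[i] += 1
            if rng.random() < true_p[i]:
                exceed[i] += 1
    return total, exceed


# Three hypotheses; the second has a p-value close to the threshold 0.05 / 3.
total, exceed = allocate_effort([0.001, 0.02, 0.6])
print("samples per hypothesis:", total)
```

In this toy setup the hypothesis whose p-value sits near the rejection threshold should receive most of the samples, while the clearly non-significant one is settled after relatively few.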
A Two-Tiered Correlation of Dark Matter with Missing Transverse Energy: Reconstructing the Lightest Supersymmetric Particle Mass at the LHC
We suggest that non-trivial correlations between the dark matter particle
mass and collider based probes of missing transverse energy H_T^miss may
facilitate a two tiered approach to the initial discovery of supersymmetry and
the subsequent reconstruction of the LSP mass at the LHC. These correlations
are demonstrated via extensive Monte Carlo simulation of seventeen benchmark
models, each sampled at five distinct LHC center-of-mass beam energies,
spanning the parameter space of No-Scale F-SU(5). This construction is defined
in turn by the union of the Flipped SU(5) Grand Unified Theory, two pairs of
hypothetical TeV scale vector-like supersymmetric multiplets with origins in
F-theory, and the dynamically established boundary conditions of No-Scale
Supergravity. In addition, we consider a control sample comprised of a standard
minimal Supergravity benchmark point. Led by a striking similarity between the
H_T^miss distribution and the familiar power spectrum of a black body radiator
at various temperatures, we implement a broad empirical fit of our simulation
against a Poisson distribution ansatz. We advance the resulting fit as a
theoretical blueprint for deducing the mass of the LSP, utilizing only the
missing transverse energy in a statistical sampling of >= 9 jet events.
Cumulative uncertainties central to the method subsist at a satisfactory 12-15%
level. The fact that the supersymmetric particle spectrum of No-Scale F-SU(5) has
survived the withering onslaught of early LHC data that is steadily decimating
the Constrained Minimal Supersymmetric Standard Model and minimal Supergravity
parameter spaces is a prime motivation for augmenting more conventional LSP
search methodologies with the presently proposed alternative.
Comment: JHEP version, 17 pages, 9 Figures, 2 Tables
A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data
Due to rapid technological advances, a wide range of different measurements
can be obtained from a given biological sample including single nucleotide
polymorphisms, copy number variation, gene expression levels, DNA methylation
and proteomic profiles. Each of these distinct measurements provides the means
to characterize a certain aspect of biological diversity, and a fundamental
problem of broad interest concerns the discovery of shared patterns of
variation across different data types. Such data types are heterogeneous in the
sense that they represent measurements taken at very different scales or
described by very different data structures. We propose a distance-based
statistical test, the generalized RV (GRV) test, to assess whether there is a
common and non-random pattern of variability between paired biological
measurements obtained from the same random sample. The measurements enter the
test through distance measures which can be chosen to capture particular
aspects of the data. An approximate null distribution is proposed to compute
p-values in closed-form and without the need to perform costly Monte Carlo
permutation procedures. Compared to the classical Mantel test for association
between distance matrices, the GRV test has been found to be more powerful in a
number of simulation settings. We also report on an application of the GRV test
to detect biological pathways in which genetic variability is associated with
variation in gene expression levels in ovarian cancer samples, and present
results obtained from two independent cohorts.
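For context, the classical Mantel test that the abstract uses as a baseline can be sketched as below. This illustrates only the permutation-based baseline, not the GRV test or its closed-form null approximation. The toy distance matrices and all function names are assumptions.

```python
import math
import random


def upper_triangle(d):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(dx, dy, n_perm=999, seed=0):
    """Correlate two distance matrices; permute samples of dy to build the null."""
    rng = random.Random(seed)
    n = len(dx)
    x = upper_triangle(dx)
    observed = pearson(x, upper_triangle(dy))
    hits = 0
    idx = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Rows and columns are permuted together so dy stays a valid distance matrix.
        permuted = [[dy[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy paired data: the second "data type" measures the same samples on a different scale.
points = [0.0, 1.0, 2.5, 4.0, 7.0, 11.0]
dx = [[abs(a - b) for b in points] for a in points]
dy = [[math.sqrt(abs(a - b)) for b in points] for a in points]
r, p_value = mantel_test(dx, dy)
print(f"Mantel r = {r:.3f}, permutation p-value = {p_value:.3f}")
```

Each permutation recomputes the correlation over all sample pairs, which is the per-test cost that a closed-form null approximation avoids.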