    Forecasting Financial Volatility Using Nested Monte Carlo Expression Discovery

    We are interested in discovering expressions for financial prediction using Nested Monte Carlo Search and Genetic Programming. Both methods are applied to learn from financial time series and to generate nonlinear functions for market volatility prediction. The input data, a series of daily prices of the European S&P500 index, are filtered and sampled in order to improve the training process. Using several assessment metrics, the best models generated by each approach for each training subsample are evaluated and compared. Results show that Nested Monte Carlo is able to generate better forecasting models than Genetic Programming for the majority of learning samples.
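
    The abstract does not give the grammar, fitness function, or data that were used. A minimal sketch of Nested Monte Carlo Search (NMCS) applied to expression discovery might look like the following, assuming a toy grammar (binary +, -, * and the terminals x and 1, written in prefix notation with a hypothetical length cap MAX_LEN), negative mean squared error as the score, and synthetic data rather than S&P500 prices. All names here are illustrative, not the authors' code.

```python
import math
import random

# Hypothetical toy grammar: binary operators and terminals in prefix notation.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["x", "1"]
MAX_LEN = 7  # assumed cap on expression length (number of tokens)

# A state is (prefix tokens so far, number of operand slots still open).
def legal_moves(state):
    prefix, slots = state
    moves = list(TERMINALS)
    # An operator adds one open slot, so allow it only if the expression
    # can still be completed within MAX_LEN tokens.
    if len(prefix) + slots + 2 <= MAX_LEN:
        moves += list(OPS)
    return moves

def play(state, move):
    prefix, slots = state
    return (prefix + (move,), slots + (1 if move in OPS else -1))

def is_terminal(state):
    return state[1] == 0

def evaluate(prefix, x):
    tokens = iter(prefix)
    def rec():
        tok = next(tokens)
        if tok in OPS:
            left = rec()
            right = rec()
            return OPS[tok](left, right)
        return x if tok == "x" else 1.0
    return rec()

def score(prefix, data):
    # Higher is better: negative mean squared error on (x, y) pairs.
    return -sum((evaluate(prefix, x) - y) ** 2 for x, y in data) / len(data)

def playout(state, data, rng):
    moves = []
    while not is_terminal(state):
        move = rng.choice(legal_moves(state))
        state = play(state, move)
        moves.append(move)
    return score(state[0], data), moves

def nested(state, level, data, rng):
    """Return (best score, moves from `state` to a complete expression)."""
    if is_terminal(state):
        return score(state[0], data), []
    if level == 0:
        return playout(state, data, rng)
    best_score, best_seq, played = -math.inf, [], []
    while not is_terminal(state):
        for move in legal_moves(state):
            s, seq = nested(play(state, move), level - 1, data, rng)
            if s > best_score or not best_seq:
                best_score, best_seq = s, played + [move] + seq
        # Follow the best complete sequence found so far.
        move = best_seq[len(played)]
        state = play(state, move)
        played.append(move)
    return best_score, best_seq

if __name__ == "__main__":
    data = [(x / 2, (x / 2) ** 2 + 1) for x in range(-4, 5)]  # synthetic target x^2 + 1
    best, moves = nested(((), 1), 2, data, random.Random(0))
    print(" ".join(moves), -best)  # prefix expression and its mean squared error
```

    At each step a level-n search tries every legal move, rates it with a level-(n-1) search (level 0 being a random playout), and commits to the next move of the best complete sequence seen so far; this memorization of the best sequence is what distinguishes NMCS from repeated independent playouts.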

    Differential expression analysis for multiple conditions

    As high-throughput sequencing has become common practice, the cost of sequencing large amounts of genetic data has been drastically reduced, leading to much larger data sets for analysis. One important task is to identify biological conditions that lead to unusually high or low expression of a particular gene. Packages such as DESeq implement a simple method for testing differential signal when exactly two biological conditions are compared. For more than two conditions, pairwise testing is typically used. Here the DESeq method is extended so that three or more biological conditions can be assessed simultaneously. Because the computation time grows exponentially in the number of conditions, a Monte Carlo approach provides a fast way to approximate the p-values for the new test. The approach is studied on both simulated data and a data set of C. jejuni, the bacterium responsible for most food poisoning in the United States.
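
    The abstract does not specify the extended DESeq test statistic. The sketch below shows only the generic idea of approximating a p-value by simulating the statistic under the null hypothesis. The dispersion statistic, the Poisson null (DESeq itself models counts with a negative binomial distribution), the made-up counts, and all function names are assumptions for illustration.

```python
import math
import random

def monte_carlo_pvalue(observed, simulate_null_stat, n_sim=9999, seed=0):
    """Estimate P(T >= observed) under the null by simulation.

    Adding 1 to numerator and denominator counts the observed statistic as
    one of the draws, so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n_sim) if simulate_null_stat(rng) >= observed)
    return (exceed + 1) / (n_sim + 1)

def poisson_draw(rng, lam):
    # Knuth's multiplication method; adequate for small means only.
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def dispersion(counts):
    # Hypothetical statistic: Pearson-style dispersion of counts across conditions.
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    return sum((c - mean) ** 2 for c in counts) / mean

if __name__ == "__main__":
    counts = [12, 30, 15]  # made-up counts for one gene in three conditions
    lam = sum(counts) / len(counts)

    def null_stat(rng):
        # Under the assumed null, all conditions share the same Poisson mean.
        return dispersion([poisson_draw(rng, lam) for _ in counts])

    print(monte_carlo_pvalue(dispersion(counts), null_stat))
```

    The number of simulations trades run time for precision: the Monte Carlo standard error of the estimate shrinks roughly as one over the square root of n_sim.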

    Efficient inference for genetic association studies with multiple outcomes

    Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modelling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson et al. (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, handling within hours problems that involve hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.

    QuickMMCTest - Quick Multiple Monte Carlo Testing

    Multiple hypothesis testing is widely used to evaluate scientific studies involving statistical tests. However, for many of these tests, p-values are not available analytically and are thus often approximated using Monte Carlo tests such as permutation tests or bootstrap tests. This article presents a simple algorithm based on Thompson Sampling to test multiple hypotheses. It works with arbitrary multiple testing procedures, in particular with step-up and step-down procedures. Its main feature is to sequentially allocate Monte Carlo effort, generating more Monte Carlo samples for tests whose decisions are so far less certain. A simulation study demonstrates that, for a low computational effort, the new approach yields higher power and a higher degree of reproducibility of its results than previously suggested methods.
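
    The sequential allocation idea can be sketched as follows. This is a simplified illustration, not the published QuickMMCTest algorithm: it assumes Beta(1, 1) priors on each p-value, uses the Benjamini-Hochberg step-up procedure as the multiple testing procedure, and splits each batch of Monte Carlo samples in proportion to min(r, 1 - r), where r is the fraction of posterior draws in which a hypothesis is rejected. That weighting rule, the burn-in, the function names, and the Bernoulli stand-in for real permutation or bootstrap replicates are all hypothetical choices.

```python
import random

def benjamini_hochberg(pvals, alpha):
    """Step-up Benjamini-Hochberg procedure; returns indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank  # step-up: keep the largest rank that passes
    return set(order[:k])

def thompson_allocate(true_p, alpha=0.05, rounds=20, batch=1000,
                      draws=200, burn_in=10, seed=1):
    rng = random.Random(seed)
    m = len(true_p)
    exceed = [0] * m  # replicates at least as extreme as the observed statistic
    n = [0] * m       # replicates drawn so far

    def sample(i, k):
        # Stand-in for k Monte Carlo replicates of hypothesis i: each one
        # "exceeds" the observed statistic with probability true_p[i].
        exceed[i] += sum(rng.random() < true_p[i] for _ in range(k))
        n[i] += k

    for i in range(m):
        sample(i, burn_in)
    for _ in range(rounds):
        # Thompson step: draw p-values from the Beta posteriors and record
        # how often the procedure rejects each hypothesis.
        hits = [0] * m
        for _ in range(draws):
            draw = [rng.betavariate(1 + exceed[i], 1 + n[i] - exceed[i]) for i in range(m)]
            for i in benjamini_hochberg(draw, alpha):
                hits[i] += 1
        # Assumed rule: more effort where the decision is closest to a coin flip.
        weights = [min(h / draws, 1 - h / draws) for h in hits]
        total = sum(weights)
        if total == 0:
            break  # every decision looks settled
        for i in range(m):
            sample(i, round(batch * weights[i] / total))
    p_hat = [(exceed[i] + 1) / (n[i] + 1) for i in range(m)]
    return benjamini_hochberg(p_hat, alpha), n

if __name__ == "__main__":
    rejected, effort = thompson_allocate([0.00001, 0.0001, 0.5, 0.7, 0.9])
    print(sorted(rejected), effort)  # samples spent per hypothesis
```

    Hypotheses whose decision is already clear after the burn-in (here the ones with large p-values) receive essentially no further samples, which is where the savings over a fixed per-test budget come from.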

    A Two-Tiered Correlation of Dark Matter with Missing Transverse Energy: Reconstructing the Lightest Supersymmetric Particle Mass at the LHC

    We suggest that non-trivial correlations between the dark matter particle mass and collider-based probes of missing transverse energy H_T^miss may facilitate a two-tiered approach to the initial discovery of supersymmetry and the subsequent reconstruction of the LSP mass at the LHC. These correlations are demonstrated via extensive Monte Carlo simulation of seventeen benchmark models, each sampled at five distinct LHC center-of-mass beam energies, spanning the parameter space of No-Scale F-SU(5). This construction is defined in turn by the union of the Flipped SU(5) Grand Unified Theory, two pairs of hypothetical TeV-scale vector-like supersymmetric multiplets with origins in F-theory, and the dynamically established boundary conditions of No-Scale Supergravity. In addition, we consider a control sample consisting of a standard minimal Supergravity benchmark point. Led by a striking similarity between the H_T^miss distribution and the familiar power spectrum of a black body radiator at various temperatures, we implement a broad empirical fit of our simulation against a Poisson distribution ansatz. We advance the resulting fit as a theoretical blueprint for deducing the mass of the LSP, utilizing only the missing transverse energy in a statistical sampling of events with >= 9 jets. Cumulative uncertainties central to the method remain at a satisfactory 12-15% level. The fact that the supersymmetric particle spectrum of No-Scale F-SU(5) has survived the withering onslaught of early LHC data, which is steadily decimating the Constrained Minimal Supersymmetric Standard Model and minimal Supergravity parameter spaces, is a prime motivation for augmenting more conventional LSP search methodologies with the presently proposed alternative. Comment: JHEP version, 17 pages, 9 Figures, 2 Tables

    A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data

    Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample, including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at very different scales or described by very different data structures. We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through distance measures which can be chosen to capture particular aspects of the data. An approximate null distribution is proposed to compute p-values in closed form, without the need to perform costly Monte Carlo permutation procedures. Compared to the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also report on an application of the GRV test to detect biological pathways in which genetic variability is associated with variation in gene expression levels in ovarian cancer samples, and present results obtained from two independent cohorts.
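
    The abstract does not give the GRV statistic or its closed-form null approximation. As a rough illustration, assuming the statistic is the classical RV coefficient computed from Gower double-centred squared distance matrices (one common way to extend RV to distances; the paper's exact definition may differ), the sketch below computes it with NumPy. The permutation p-value is included only for comparison: it is precisely the costly Monte Carlo step that the proposed approximation avoids. Function names are hypothetical.

```python
import numpy as np

def pairwise_distances(x):
    # Euclidean distances between rows of x (one row per sample).
    return np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))

def gower_center(d):
    # A = -1/2 * J D^2 J with the centering matrix J = I - 11^T / n.
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * j @ (d ** 2) @ j

def rv_centered(a, b):
    # RV = tr(AB) / sqrt(tr(A^2) tr(B^2)), which lies in [0, 1] for PSD A, B.
    return np.trace(a @ b) / np.sqrt(np.trace(a @ a) * np.trace(b @ b))

def rv_from_distances(dx, dy):
    return rv_centered(gower_center(dx), gower_center(dy))

def permutation_pvalue(dx, dy, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    a, b = gower_center(dx), gower_center(dy)
    observed = rv_centered(a, b)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(a.shape[0])
        # Relabel the samples of the second data type; since P J P^T = J,
        # permuting the centred matrix equals centring the permuted distances.
        if rv_centered(a, b[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(20, 3))                    # e.g. one data type
    y = x[:, :2] + 0.5 * rng.normal(size=(20, 2))   # a related second data type
    print(permutation_pvalue(pairwise_distances(x), pairwise_distances(y)))
```

    Because the statistic depends on the data only through the distance matrices, swapping in a different distance (for example one suited to genotype data) changes which aspect of the data the test is sensitive to, as the abstract describes.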