Forecasting Financial Volatility Using Nested Monte Carlo Expression Discovery
We are interested in discovering expressions for financial prediction using Nested Monte Carlo Search and Genetic Programming. Both methods are applied to financial time series to generate nonlinear functions for market volatility prediction. The input data, a series of daily prices of the European S&P500 index, is filtered and sampled to improve the training process. The best models generated by each approach for each training subsample are evaluated and compared using several assessment metrics. Results show that Nested Monte Carlo generates better forecasting models than Genetic Programming for the majority of learning samples.
A Framework for Monte Carlo based Multiple Testing
We are concerned with a situation in which we would like to test multiple
hypotheses with tests whose p-values cannot be computed explicitly but can be
approximated using Monte Carlo simulation. This scenario occurs widely in
practice. We are interested in obtaining the same rejections and non-rejections
as the ones obtained if the p-values for all hypotheses had been available. The
present article introduces a framework for this scenario by providing a generic
algorithm for a general multiple testing procedure. We establish conditions
which guarantee that the rejections and non-rejections obtained through Monte
Carlo simulations are identical to the ones obtained with the p-values. Our
framework is applicable to a general class of step-up and step-down procedures
which includes many established multiple testing corrections such as the ones
of Bonferroni, Holm, Sidak, Hochberg or Benjamini-Hochberg. Moreover, we show
how to use our framework to improve algorithms available in the literature in
such a way as to yield theoretical guarantees on their results. These
modifications can easily be implemented in practice and lead to a particular
way of reporting multiple testing results as three sets together with an error
bound on their correctness, demonstrated exemplarily using a real biological
dataset.
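The scenario in this abstract can be illustrated with a minimal Python sketch: each p-value is approximated by simulating the test statistic under the null, and the estimates are then passed to the Benjamini-Hochberg step-up procedure. This is the naive plug-in baseline, not the framework proposed in the article, and it carries none of the article's guarantees. The names `mc_p_value`, `benjamini_hochberg`, and `simulate_null` are hypothetical and chosen for illustration only.

```python
import random

def mc_p_value(observed, simulate_null, n_sim, rng):
    """Monte Carlo p-value estimate: (exceedances + 1) / (n_sim + 1)."""
    # Count simulated null statistics at least as extreme as the observed one.
    exceed = sum(1 for _ in range(n_sim) if simulate_null(rng) >= observed)
    return (exceed + 1) / (n_sim + 1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose sorted p-value lies under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])

if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy setup: the null statistic is |Z| with Z standard normal.
    null_stat = lambda r: abs(r.gauss(0.0, 1.0))
    observed_stats = [4.0, 3.5, 0.3, 1.0, 0.1]
    p_hat = [mc_p_value(t, null_stat, 999, rng) for t in observed_stats]
    print(p_hat, benjamini_hochberg(p_hat, alpha=0.05))
```

Because the estimated p-values are random, rerunning with a different seed can change which hypotheses fall near a threshold. That instability is the problem the abstract addresses.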
Getting started in probabilistic graphical models
Probabilistic graphical models (PGMs) have become a popular tool for
computational analysis of biological data in a variety of domains. But, what
exactly are they and how do they work? How can we use PGMs to discover patterns
that are biologically relevant? And to what extent can PGMs help us formulate
new hypotheses that are testable at the bench? This note sketches out some
answers and illustrates the main ideas behind the statistical approach to
biological pattern discovery. Comment: 12 pages, 1 figure
On Nesting Monte Carlo Estimators
Many problems in machine learning and statistics involve nested expectations
and thus do not permit conventional Monte Carlo (MC) estimation. For such
problems, one must nest estimators, such that terms in an outer estimator
themselves involve calculation of a separate, nested, estimation. We
investigate the statistical implications of nesting MC estimators, including
cases of multiple levels of nesting, and establish the conditions under which
they converge. We derive corresponding rates of convergence and provide
empirical evidence that these rates are observed in practice. We further
establish a number of pitfalls that can arise from naive nesting of MC
estimators, provide guidelines about how these can be avoided, and lay out
novel methods for reformulating certain classes of nested expectation problems
into single expectations, leading to improved convergence rates. We demonstrate
the applicability of our work by using our results to develop a new estimator
for discrete Bayesian experimental design problems and derive error bounds for
a class of variational objectives. Comment: To appear at International Conference on Machine Learning 201
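As a rough illustration of what nesting estimators means, the Python sketch below estimates a quantity of the form E_y[f(E_z[g(y, z)])] by running an inner Monte Carlo average for every outer sample. The function name `nested_mc` and the toy problem are assumptions made for this example and are not taken from the paper. In the toy problem, y and z are independent standard normals, g(y, z) = y + z and f(x) = x^2, so the true value is 1, while the naive estimator has expectation 1 + 1/M for M inner samples.

```python
import random

def nested_mc(sample_outer, sample_inner, g, f, n_outer, n_inner, rng):
    """Naive nested Monte Carlo estimate of E_y[ f( E_z[ g(y, z) ] ) ]."""
    total = 0.0
    for _ in range(n_outer):
        y = sample_outer(rng)
        # Inner estimator: average g(y, z) over fresh draws of z.
        inner = sum(g(y, sample_inner(rng)) for _ in range(n_inner)) / n_inner
        # Outer estimator: apply the (possibly nonlinear) f to the inner mean.
        total += f(inner)
    return total / n_outer

if __name__ == "__main__":
    rng = random.Random(0)
    normal = lambda r: r.gauss(0.0, 1.0)
    for n_inner in (1, 10, 100):
        est = nested_mc(normal, normal, lambda y, z: y + z,
                        lambda x: x * x, 2000, n_inner, rng)
        # The estimate should sit near 1 + 1/n_inner, not near the true value 1.
        print(n_inner, est)
```

The bias in this toy case comes from applying a nonlinear f to a noisy inner mean, and it shrinks only as the inner sample size grows.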
Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles
Reconstructing transcriptional regulatory networks is an important task in
functional genomics. Data obtained from experiments that perturb genes by
knockouts or RNA interference contain useful information for addressing this
reconstruction problem. However, such data can be limited in size and/or are
expensive to acquire. On the other hand, observational data of the organism in
steady state (e.g. wild-type) are more readily available, but their
informational content is inadequate for the task at hand. We develop a
computational approach to appropriately utilize both data sources for
estimating a regulatory network. The proposed approach is based on a three-step
algorithm to estimate the underlying directed but cyclic network, that uses as
input both perturbation screens and steady state gene expression data. In the
first step, the algorithm determines causal orderings of the genes that are
consistent with the perturbation data, by combining an exhaustive search method
with a fast heuristic that in turn couples a Monte Carlo technique with a fast
search algorithm. In the second step, for each obtained causal ordering, a
regulatory network is estimated using a penalized likelihood based method,
while in the third step a consensus network is constructed from the highest
scored ones. Extensive computational experiments show that the algorithm
performs well in reconstructing the underlying network and clearly outperforms
competing approaches that rely only on a single data source. Further, it is
established that the algorithm produces a consistent estimate of the regulatory
network. Comment: 24 pages, 4 figures, 6 tables