793 research outputs found
Monte Carlo methods for the valuation of multiple exercise options
We discuss Monte Carlo methods for valuing options with multiple exercise features in discrete time. By extending the recently developed duality ideas for American option pricing we show how to obtain estimates on the prices of such options using Monte Carlo techniques. We prove convergence of our approach and estimate the error. The methods are applied to options in the energy and interest rate derivative markets.
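As a rough illustration of the setting (and only of the primal side; the paper's contribution is the dual construction, which is not reproduced here), the sketch below estimates a lower bound for a multiple-exercise call by Longstaff-Schwartz-style regression over simulated paths. All model parameters, the quadratic regression basis and the function name are illustrative assumptions.

import numpy as np

def swing_option_lower_bound(n_paths=20000, n_steps=50, n_rights=3,
                             s0=1.0, strike=1.0, sigma=0.3, r=0.05, T=1.0, seed=0):
    # Regression-based lower bound for an option with up to n_rights exercises,
    # at most one per date, on a simulated geometric Brownian motion.
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    df = np.exp(-r * dt)
    z = rng.standard_normal((n_paths, n_steps))
    log_s = np.cumsum((r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z, axis=1)
    s = s0 * np.exp(np.concatenate([np.zeros((n_paths, 1)), log_s], axis=1))
    payoff = np.maximum(s - strike, 0.0)
    # value[:, k] holds the pathwise value at the current date with k rights left
    value = np.zeros((n_paths, n_rights + 1))
    value[:, 1:] = payoff[:, -1][:, None]               # at maturity a single right can still be used
    for t in range(n_steps - 1, 0, -1):
        basis = np.column_stack([np.ones(n_paths), s[:, t], s[:, t] ** 2])
        cont = np.zeros_like(value)
        for k in range(n_rights + 1):                   # regression estimate of continuation values
            coef, *_ = np.linalg.lstsq(basis, df * value[:, k], rcond=None)
            cont[:, k] = basis @ coef
        new_value = df * value                          # default: hold all remaining rights
        for k in range(1, n_rights + 1):
            exercise = payoff[:, t] + cont[:, k - 1] > cont[:, k]
            new_value[exercise, k] = payoff[exercise, t] + df * value[exercise, k - 1]
        value = new_value
    return df * value[:, n_rights].mean()               # discount back to time 0

print(round(swing_option_lower_bound(), 4))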
Random intersection trees
Finding interactions between variables in large and high-dimensional datasets
is often a serious computational challenge. Most approaches build up
interaction sets incrementally, adding variables in a greedy fashion. The
drawback is that potentially informative high-order interactions may be
overlooked. Here, we propose an alternative approach for classification
problems with binary predictor variables, called Random Intersection Trees. It
works by starting with a maximal interaction that includes all variables, and
then gradually removing variables if they fail to appear in randomly chosen
observations of a class of interest. We show that informative interactions are
retained with high probability, and the computational complexity of our
procedure is of order $p^\kappa$ for a value of $\kappa$ that can reach values
as low as 1 for very sparse data; in many more general settings, it will still
beat the exponent $s$ obtained when using a brute force search constrained to
order-$s$ interactions. In addition, by using some new ideas based on min-wise
hash schemes, we are able to further reduce the computational cost.
Interactions found by our algorithm can be used for predictive modelling in
various forms, but they are also often of interest in their own right as useful
characterisations of what distinguishes a certain class from others. This is the author's accepted manuscript. The final version of the manuscript can be found in the Journal of Machine Learning Research here: jmlr.csail.mit.edu/papers/volume15/shah14a/shah14a.pdf
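As the abstract describes the core loop quite concretely, here is a minimal single-chain illustration of the idea (one chain of random intersections per repeat, rather than the tree structure and min-wise hashing of the actual algorithm); the function name and parameters are illustrative, not the authors' implementation.

import numpy as np
from collections import Counter

def random_intersection_chain(X, y, target_class=1, depth=5, n_repeats=200, rng=None):
    # X: (n, p) binary matrix, y: class labels. Start from the maximal interaction
    # (all variables) and intersect it with randomly chosen observations of the
    # target class; sets surviving `depth` intersections are candidate interactions.
    rng = np.random.default_rng(rng)
    idx = np.flatnonzero(y == target_class)
    counts = Counter()
    for _ in range(n_repeats):
        active = np.ones(X.shape[1], dtype=bool)        # maximal interaction
        for _ in range(depth):
            i = rng.choice(idx)                         # random observation of the class of interest
            active &= (X[i] == 1)                       # drop variables it does not contain
        if active.any():
            counts[frozenset(np.flatnonzero(active))] += 1
    return counts

# toy example: variables 0 and 1 jointly characterise class 1
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 20))
y = ((X[:, 0] == 1) & (X[:, 1] == 1)).astype(int)
print(random_intersection_chain(X, y, rng=1).most_common(3))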
Discussion: A tale of three cousins: Lasso, L2Boosting and Dantzig
Discussion of ``The Dantzig selector: Statistical estimation when $p$ is much
larger than $n$'' by Emmanuel Candes and Terence Tao [math/0506081]. Comment: Published
at http://dx.doi.org/10.1214/009053607000000460 in the Annals of Statistics
(http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
The xyz algorithm for fast interaction search in high-dimensional data
When performing regression on a data set with p variables, it is often of interest to go beyond using main linear effects and include interactions as products between individual variables. For small-scale problems, these interactions can be computed explicitly but this leads to a computational complexity of at least O(p^2) if done naively. This cost can be prohibitive if p is very large. We introduce a new randomised algorithm that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in p. We show that strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires O(p^α) operations for 1 < α < 2 depending on their strength. The underlying idea is to transform interaction search into a closest pair problem which can be solved efficiently in subquadratic time. The algorithm is called xyz and is implemented in the language R. We demonstrate its efficiency for application to genome-wide association studies, where more than 10^11 interactions can be screened in under 280 seconds with a single-core 1.2 GHz CPU. Isaac Newton Trust Early Career Support Scheme
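The closest-pair reformulation can be sketched for ±1-coded binary data: writing Z for the matrix with columns Z_j = Y * X_j, a strong interaction pair (j, k) makes column Z_j nearly equal to column X_k, so candidate pairs can be found by hashing all columns of X and Z on a small random subset of rows and checking only the collisions exactly. The collision scheme and parameters below are simplified assumptions, not the authors' R implementation.

import numpy as np
from collections import defaultdict

def xyz_candidates(X, Y, n_projections=20, subsample=8, rng=None):
    # X: (n, p) matrix with +/-1 entries, Y: length-n vector with +/-1 entries.
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Z = Y[:, None] * X
    candidates = set()
    for _ in range(n_projections):
        rows = rng.choice(n, size=subsample, replace=False)
        buckets = defaultdict(lambda: ([], []))         # key -> (X columns, Z columns)
        for j in range(p):
            buckets[X[rows, j].tobytes()][0].append(j)
            buckets[Z[rows, j].tobytes()][1].append(j)
        for xs, zs in buckets.values():                 # columns colliding on the subsample
            candidates.update((j, k) for k in xs for j in zs if j != k)
    # exact check of the surviving pairs only
    strength = {(j, k): np.mean(X[:, j] * X[:, k] == Y) for j, k in candidates}
    return sorted(strength, key=strength.get, reverse=True)[:10], strength

# toy example: Y is exactly the interaction of variables 3 and 7
rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(300, 50))
Y = X[:, 3] * X[:, 7]
pairs, _ = xyz_candidates(X, Y, rng=1)
print(pairs[:2])                                        # (3, 7) and (7, 3) should rank highest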
Right singular vector projection graphs: fast high dimensional covariance matrix estimation under latent confounding
In this work we consider the problem of estimating a high-dimensional covariance matrix $\Sigma$, given $n$ observations of confounded data
with covariance $\Sigma + \Gamma\Gamma^T$, where $\Gamma$ is an unknown matrix of latent factor loadings. We propose a simple and scalable
estimator based on the projection on to the right singular vectors of the
observed data matrix, which we call RSVP. Our theoretical analysis of this
method reveals that in contrast to PCA-based approaches, RSVP is able to cope
well with settings where the smallest eigenvalue of $\Gamma^T\Gamma$ is close
to the largest eigenvalue of $\Sigma$, as well as settings where the
eigenvalues of $\Gamma^T\Gamma$ are diverging fast. It is also able to handle
data that may have heavy tails and only requires that the data has an
elliptical distribution. RSVP does not require knowledge or estimation of the
number of latent factors $q$, but only recovers $\Sigma$ up to an unknown
positive scale factor. We argue this suffices in many applications, for example
if an estimate of the correlation matrix is desired. We also show that by using
subsampling, we can further improve the performance of the method. We
demonstrate the favourable performance of RSVP through simulation experiments
and an analysis of gene expression datasets collated by the GTEx consortium. Supported by an EPSRC First Grant and the Alan Turing Institute under the EPSRC grant EP/N510129/1.
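The remark that recovery up to a positive scale factor suffices when a correlation matrix is the target can be made concrete: converting any positive multiple of the covariance matrix to a correlation matrix cancels the scale. A minimal check (the RSVP estimator itself is not reproduced here):

import numpy as np

def to_correlation(cov):
    d = np.sqrt(np.diag(cov))              # standard deviations implied by the estimate
    return cov / np.outer(d, d)

sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
c = 7.3                                    # arbitrary unknown positive scale factor
assert np.allclose(to_correlation(c * sigma), to_correlation(sigma))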
Analysis of the Copenhagen Accord pledges and its global climatic impacts – a snapshot of dissonant ambitions
This analysis of the Copenhagen Accord evaluates emission reduction pledges by individual countries against the Accord's climate-related objectives. Probabilistic estimates of the climatic consequences for a set of resulting multi-gas scenarios over the 21st century are calculated with a reduced complexity climate model, yielding global temperature increase and atmospheric CO2 and CO2-equivalent concentrations. Provisions for banked surplus emission allowances and credits from land use, land-use change and forestry are assessed and are shown to have the potential to lead to significant deterioration of the ambition levels implied by the pledges in 2020. This analysis demonstrates that the Copenhagen Accord and the pledges made under it represent a set of dissonant ambitions. The ambition level of the current pledges for 2020 and the lack of commonly agreed goals for 2050 place in peril the Accord's own ambition: to limit global warming to below 2 °C, and even more so for 1.5 °C, which is referenced in the Accord in association with potentially strengthening the long-term temperature goal in 2015. Due to the limited level of ambition by 2020, the ability to limit emissions afterwards to pathways consistent with either the 2 or 1.5 °C goal is likely to become less feasible.
A roadmap for rapid decarbonization
Although the Paris Agreement's goals (1) are aligned with science (2) and can, in principle, be technically and economically achieved (3), alarming inconsistencies remain between science-based targets and national commitments. Despite progress during the 2016 Marrakech climate negotiations, long-term goals can be trumped by political short-termism. Following the Agreement, which became international law earlier than expected, several countries published mid-century decarbonization strategies, with more due soon. Model-based decarbonization assessments (4) and scenarios often struggle to capture transformative change and the dynamics associated with it: disruption, innovation, and nonlinear change in human behavior. For example, in just 2 years, China's coal use swung from 3.7% growth in 2013 to a decline of 3.7% in 2015 (5). To harness these dynamics and to calibrate for short-term realpolitik, we propose framing the decarbonization challenge in terms of a global decadal roadmap based on a simple heuristic, a "carbon law", of halving gross anthropogenic carbon-dioxide (CO2) emissions every decade. Complemented by immediately instigated, scalable carbon removal and efforts to ramp down land-use CO2 emissions, this can lead to net-zero emissions around mid-century, a path necessary to limit warming to well below 2°C.
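The "carbon law" heuristic is simple enough to state as arithmetic. The sketch below assumes, purely for illustration, gross emissions of roughly 40 GtCO2 per year around 2020.

e0, year0 = 40.0, 2020                          # assumed baseline, GtCO2 per year (illustrative)
for year in range(2020, 2061, 10):
    gross = e0 * 0.5 ** ((year - year0) / 10)   # halve gross emissions every decade
    print(f"{year}: ~{gross:.1f} GtCO2/yr gross")
# Halving each decade leaves ~5 GtCO2/yr of gross emissions by 2050; the roadmap pairs
# this with scaled-up carbon removal and declining land-use emissions to reach net zero
# around mid-century.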
A Human Development Framework for CO2 Reductions
Although developing countries are called to participate in CO2 emission
reduction efforts to avoid dangerous climate change, the implications of
proposed reduction schemes for the human development standards of developing
countries remain a matter of debate. We show the existence of a positive and
time-dependent correlation between the Human Development Index (HDI) and per
capita CO2 emissions from fossil fuel combustion. Employing this empirical
relation, extrapolating the HDI, and using three population scenarios, the
cumulative CO2 emissions necessary for developing countries to achieve
particular HDI thresholds are assessed following a Development As Usual
approach (DAU). If current demographic and development trends are maintained,
we estimate that by 2050 around 85% of the world's population will live in
countries with high HDI (above 0.8). In particular, 300Gt of cumulative CO2
emissions between 2000 and 2050 are estimated to be necessary for the
development of 104 developing countries in the year 2000. This value represents
between 20% and 30% of previously calculated CO2 budgets limiting global warming
to 2{\deg}C. These constraints and results are incorporated into a CO2
reduction framework involving four domains of climate action for individual
countries. The framework reserves a fair emission path for developing countries
to proceed with their development by indexing country-dependent reduction rates
proportional to the HDI in order to preserve the 2{\deg}C target after a
particular development threshold is reached. Under this approach, global
cumulative emissions by 2050 are estimated to range from 850 up to 1100Gt of
CO2. These values are within the uncertainty range of emissions to limit global
temperatures to 2{\deg}C. Comment: 14 pages, 7 figures, 1 table.
Missing values: sparse inverse covariance estimation and an extension to sparse regression
We propose an l1-regularized likelihood method for estimating the inverse
covariance matrix in the high-dimensional multivariate normal model in presence
of missing data. Our method is based on the assumption that the data are
missing at random (MAR), which also covers the missing completely at random
case. The implementation of the method is non-trivial as the observed negative
log-likelihood generally is a complicated and non-convex function. We propose
an efficient EM algorithm for optimization with provable numerical convergence
properties. Furthermore, we extend the methodology to handle missing values in
a sparse regression context. We demonstrate both methods on simulated and real
data. Comment: The final publication is available at http://www.springerlink.co
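The EM strategy described in the abstract can be sketched as follows: the E-step fills each incomplete row with its conditional Gaussian expectation under the current parameters and accumulates the conditional covariance of its missing block, and the M-step runs a graphical lasso on the resulting expected covariance. This is a minimal illustration, not the authors' implementation; the penalty level, starting values and iteration count are arbitrary choices.

import numpy as np
from sklearn.covariance import graphical_lasso

def em_sparse_precision(X, alpha=0.1, n_iter=25):
    # X: (n, p) array with np.nan marking entries that are missing at random.
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + 1e-3)         # simple diagonal starting value
    for _ in range(n_iter):
        S = np.zeros((p, p))
        X_hat = np.where(miss, mu, X)                     # observed entries are kept as-is
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            soo_inv = np.linalg.inv(sigma[np.ix_(o, o)])
            smo = sigma[np.ix_(m, o)]
            # E-step: conditional mean of the missing block given the observed block ...
            X_hat[i, m] = mu[m] + smo @ soo_inv @ (X[i, o] - mu[o])
            # ... plus its conditional covariance, added to the sufficient statistics
            S[np.ix_(m, m)] += sigma[np.ix_(m, m)] - smo @ soo_inv @ smo.T
        mu = X_hat.mean(axis=0)
        Xc = X_hat - mu
        S = (S + Xc.T @ Xc) / n
        sigma, theta = graphical_lasso(S, alpha=alpha)    # M-step: l1-penalised covariance fit
    return mu, sigma, theta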