On-line predictive linear regression
We consider the on-line predictive version of the standard problem of linear
regression; the goal is to predict each consecutive response given the
corresponding explanatory variables and all the previous observations. We are
mainly interested in prediction intervals rather than point predictions. The
standard treatment of prediction intervals in linear regression analysis has
two drawbacks: (1) the classical prediction intervals guarantee that the
probability of error is equal to the nominal significance level epsilon, but
this property per se does not imply that the long-run frequency of error is
close to epsilon; (2) it is not suitable for prediction of complex systems as
it assumes that the number of observations exceeds the number of parameters. We
state a general result showing that in the on-line protocol the frequency of
error for the classical prediction intervals does equal the nominal
significance level, up to statistical fluctuations. We also describe
alternative regression models in which informative prediction intervals can be
found before the number of observations exceeds the number of parameters. One
of these models, which only assumes that the observations are independent and
identically distributed, is popular in machine learning but greatly underused
in the statistical theory of regression.

Comment: 34 pages; 6 figures; 1 table. arXiv admin note: substantial text overlap with arXiv:0906.312
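As a concrete illustration of the on-line protocol above, the following minimal sketch (not the paper's code; the simulated Gaussian data, the variable names, and the choice eps = 0.1 are assumptions made purely for illustration) computes the classical studentized prediction interval at each step and tracks the resulting long-run error frequency:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    p, n_steps, eps = 3, 500, 0.1      # eps: nominal significance level
    beta = rng.normal(size=p)          # true coefficients (simulation only)

    X_hist, y_hist = [], []
    errors, checked = 0, 0
    for _ in range(n_steps):
        x = rng.normal(size=p)         # next explanatory variables
        y = x @ beta + rng.normal()    # next response
        if len(y_hist) > p:            # classical intervals need n > p
            X, Y = np.array(X_hist), np.array(y_hist)
            XtX_inv = np.linalg.inv(X.T @ X)
            bhat = XtX_inv @ (X.T @ Y)                  # least squares
            dof = len(Y) - p
            s2 = ((Y - X @ bhat) ** 2).sum() / dof      # residual variance
            se = np.sqrt(s2 * (1.0 + x @ XtX_inv @ x))  # predictive std. error
            t = stats.t.ppf(1 - eps / 2, dof)
            # an error occurs iff y falls outside the prediction interval
            errors += int(abs(y - x @ bhat) > t * se)
            checked += 1
        X_hist.append(x)
        y_hist.append(y)

    print(f"error frequency: {errors / checked:.3f} (nominal eps = {eps})")

Under the model assumptions the printed frequency settles near eps, which is the long-run property that the general result stated in the abstract establishes for the on-line protocol, up to statistical fluctuations.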
Significance of log-periodic precursors to financial crashes
We clarify the status of log-periodicity associated with speculative bubbles
preceding financial crashes. In particular, we address Feigenbaum's [2001]
criticism and show how it can be rebutted. Feigenbaum's main result is as
follows: "the hypothesis that the log-periodic component is present in the
data cannot be rejected at the 95% confidence level when using all the data
prior to the 1987 crash; however, it can be rejected by removing the last year
of data" (i.e., by removing the 15% of the data closest to the critical point).
We stress that it is naive to expect to analyze a critical point phenomenon,
i.e., a power-law divergence, reliably after removing the most important part
of the data, the part
closest to the critical point. We also present the history of log-periodicity
in the present context explaining its essential features and why it may be
important. We offer an extension of the rational expectation bubble model for
general and arbitrary risk-aversion within the general stochastic discount
factor theory. We suggest guidelines for using log-periodicity and explain how
to develop and interpret statistical tests of log-periodicity. We discuss the
issue of prediction based on our results and the evidence of outliers in the
distribution of drawdowns. New statistical tests demonstrate that the 1% to 10%
quantile of the largest events of the population of drawdowns of the Nasdaq
composite index and of the Dow Jones Industrial Average index belong to a
distribution significantly different from the rest of the population. This
suggests that very large drawdowns result from an amplification mechanism that
may make them more predictable than smaller market moves.

Comment: LaTeX document of 38 pages including 16 eps figures and 3 tables, in press in Quantitative Finance
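For orientation, the log-periodic signature discussed above is usually fitted with the Johansen-Ledoit-Sornette power law; the following is a sketch of that standard form for reference, not an equation quoted from the paper:

    \[
      \log p(t) \;\approx\; A + B\,(t_c - t)^{m}
      \Big[1 + C\cos\big(\omega \ln(t_c - t) + \phi\big)\Big],
      \qquad t < t_c,
    \]

where t_c is the critical time of the crash, 0 < m < 1 controls the power-law divergence, and omega sets the log-frequency of the accelerating oscillations. This makes the truncation argument concrete: the oscillations compress and the divergence steepens as t approaches t_c, so discarding the data closest to t_c removes precisely the part of the sample that carries most of the log-periodic signal.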
Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models
Learning accurate probabilistic models from data is crucial in many practical
tasks in data mining. In this paper we present a new non-parametric calibration
method called "ensemble of near isotonic regression" (ENIR). The method
can be considered as an extension of BBQ, a recently proposed calibration
method, as well as the commonly used calibration method based on isotonic
regression. ENIR is designed to address the key limitation of isotonic
regression, namely its monotonicity assumption on the predictions. Similar to
BBQ, the method post-processes the output of a binary classifier to obtain
calibrated probabilities. Thus it can be combined with many existing
classification models. We demonstrate the performance of ENIR on synthetic and
real datasets for commonly used binary classification models. Experimental
results show that the method outperforms several common binary classifier
calibration methods. In particular, on the real data ENIR commonly performs
statistically significantly better than the other methods, and never worse. It
is able to improve the calibration power of classifiers, while retaining their
discrimination power. The method is also computationally tractable for
large-scale datasets, as it runs in O(N log N) time, where N is the number of
samples.
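As context for the method described above, here is a minimal sketch of the plain isotonic-regression calibration that ENIR extends (using scikit-learn; the synthetic data, the random-forest scorer, and the 50/50 split are assumptions for illustration, and the ensemble of near-isotonic fits that defines ENIR itself is not shown):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.isotonic import IsotonicRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5,
                                                random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_cal)[:, 1]   # raw classifier scores

    # Post-process the scores: learn a monotone map from raw scores to
    # calibrated probabilities on held-out data. ENIR relaxes this hard
    # monotonicity constraint and combines several near-isotonic fits.
    iso = IsotonicRegression(out_of_bounds="clip").fit(scores, y_cal)
    calibrated = iso.predict(scores)

The single hard-monotone fit shown here is exactly the baseline whose monotonicity assumption ENIR is designed to relax: ENIR instead fits near-isotonic models, in which monotonicity is enforced only softly through a penalty, and combines them into an ensemble.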