On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions
This paper is about two related decision-theoretic problems, nonparametric
two-sample testing and independence testing. There is a belief that two
recently proposed solutions, based on kernels and distances between pairs of
points, behave well in high-dimensional settings. We identify several sources
of misconception that give rise to this belief. Specifically, we
differentiate the hardness of estimation of test statistics from the hardness
of testing whether these statistics are zero or not, and explicitly discuss a
notion of "fair" alternative hypotheses for these problems as dimension
increases. We then demonstrate that the power of these tests actually drops
polynomially with increasing dimension against fair alternatives. We end with
some theoretical insights and shed light on the median heuristic for
kernel bandwidth selection. Our work advances the current understanding of the
power of modern nonparametric hypothesis tests in high dimensions.
Comment: 19 pages, 9 figures, published in AAAI-15: The 29th AAAI Conference
on Artificial Intelligence (with author order reversed from ArXiv)
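To make the kernel side of this discussion concrete, here is a minimal sketch of a common kernel two-sample statistic, the maximum mean discrepancy (MMD), with a Gaussian kernel whose bandwidth is set by the median heuristic (median pairwise distance on the pooled sample) and a permutation test for the p-value. The function names, the pure-Python implementation and the permutation count are illustrative assumptions, not the paper's code.

```python
import math
import random
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def median_heuristic(points):
    """Bandwidth = median pairwise Euclidean distance over all distinct pairs."""
    dists = [math.sqrt(sq_dist(points[i], points[j]))
             for i in range(len(points)) for j in range(i + 1, len(points))]
    return median(dists)


def mmd2_unbiased(x, y, bw):
    """Unbiased estimate of squared MMD with a Gaussian kernel of bandwidth bw."""
    def k(a, b):
        return math.exp(-sq_dist(a, b) / (2.0 * bw ** 2))

    m, n = len(x), len(y)
    kxx = sum(k(x[i], x[j]) for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(k(y[i], y[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(k(a, b) for a in x for b in y) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(x, y, n_perm=200, seed=0):
    """Return (statistic, p-value); the null distribution comes from relabeling the pooled sample."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    bw = median_heuristic(pooled)  # bandwidth chosen once, on the pooled sample
    stat = mmd2_unbiased(x, y, bw)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null of equal distributions
        if mmd2_unbiased(pooled[:len(x)], pooled[len(x):], bw) >= stat:
            exceed += 1
    return stat, (exceed + 1) / (n_perm + 1)
```

Calling `mmd_permutation_test(x, y)` on two lists of equal-dimension vectors returns the statistic and a permutation p-value. Rerunning it on simulated data of growing dimension and tracking the rejection rate is one way to probe the power behaviour the abstract describes.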
Some Recent Developments in Nonparametric Finance
This paper gives a selective review of some recent developments in nonparametric methods in both continuous- and discrete-time finance, particularly in the areas of nonparametric estimation and testing of diffusion processes, nonparametric testing of parametric diffusion models, nonparametric pricing of derivatives, nonparametric estimation and hypothesis testing for nonlinear pricing kernels, and nonparametric predictability of asset returns. For each financial context, the paper discusses the suitable statistical concepts, models, and modeling procedures, as well as some of their applications to financial data. Their relative strengths and weaknesses are discussed. Much theoretical and empirical research remains to be done in this area; in particular, the paper points to several aspects that deserve further investigation.
This paper was published in Advances in Econometrics, Volume 25 (2009), 379–432
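As one standard illustration of nonparametric estimation for a diffusion dX = mu(X) dt + sigma(X) dW observed at spacing dt, the drift and squared diffusion can be estimated by kernel (Nadaraya-Watson) regression of the increments and squared increments on the current level. This is a generic sketch under that assumption, not a procedure singled out by the review; the simulated Ornstein-Uhlenbeck path and all function names are hypothetical.

```python
import math
import random


def simulate_ou(n, dt, theta, sigma, seed=0):
    """Euler-discretized Ornstein-Uhlenbeck path dX = -theta*X dt + sigma dW, starting at 0."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n):
        prev = path[-1]
        path.append(prev - theta * prev * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path


def nw_drift_diffusion(path, dt, x0, h):
    """Nadaraya-Watson estimates of mu(x0) and sigma^2(x0) with a Gaussian kernel of bandwidth h.

    Uses E[dX | X = x] / dt ~ mu(x) and E[dX^2 | X = x] / dt ~ sigma^2(x).
    """
    num_mu = num_s2 = den = 0.0
    for a, b in zip(path[:-1], path[1:]):
        w = math.exp(-0.5 * ((a - x0) / h) ** 2)  # kernel weight of the current level
        d = b - a                                 # one-step increment
        num_mu += w * d
        num_s2 += w * d * d
        den += w
    return num_mu / (den * dt), num_s2 / (den * dt)
```

For example, `nw_drift_diffusion(simulate_ou(20000, 0.01, 1.0, 0.5), 0.01, 0.0, 0.1)` should give a squared-diffusion estimate near 0.25. The drift estimate is typically much noisier than the diffusion estimate for a fixed time span.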
Information Theoretic Structure Learning with Confidence
Information-theoretic measures (e.g., the Kullback-Leibler divergence and
Shannon mutual information) have been used for exploring possibly nonlinear
multivariate dependencies in high dimension. If these dependencies are assumed
to follow a Markov factor graph model, this exploration process is called
structure discovery. For discrete-valued samples, estimates of the information
divergence over the parametric class of multinomial models lead to structure
discovery methods whose mean squared error achieves parametric convergence
rates as the sample size grows. However, a naive application of this method to
continuous nonparametric multivariate models converges much more slowly. In
this paper we introduce a new method for nonparametric structure discovery that
uses weighted ensemble divergence estimators that achieve parametric
convergence rates and obey an asymptotic central limit theorem that facilitates
hypothesis testing and other types of statistical validation.
Comment: 10 pages, 3 figures
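For discrete-valued samples, the multinomial baseline mentioned above can be illustrated with the plug-in estimate of mutual information, which is the KL divergence between the empirical joint pmf and the product of the empirical marginals. This sketch shows only that simple baseline, not the weighted ensemble estimator the paper introduces for continuous data; the function name is hypothetical.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in (empirical multinomial) estimate of I(X; Y) in nats.

    Computes KL(p_hat(x, y) || p_hat(x) * p_hat(y)) from a list of (x, y) samples.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

A perfectly dependent binary pair yields log 2 nats and an exactly balanced independent table yields 0. Structure-discovery procedures compare such pairwise scores across variables to decide which dependencies to keep.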
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments are discussed.
Comment: 61 pages
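A small example of why directional data need their own exploratory tools: the arithmetic mean of the angles 350° and 10° is 180°, pointing the wrong way. The usual circular summaries average unit vectors instead, giving a mean direction and a mean resultant length between 0 (dispersed) and 1 (concentrated). The function name below is illustrative.

```python
import math


def circular_summary(angles):
    """Return (mean direction in radians, mean resultant length) for angles given in radians."""
    n = len(angles)
    c = sum(math.cos(t) for t in angles) / n  # average x-component of the unit vectors
    s = sum(math.sin(t) for t in angles) / n  # average y-component of the unit vectors
    return math.atan2(s, c), math.hypot(c, s)
```

For `[math.radians(350), math.radians(10)]` the mean direction is 0 radians and the mean resultant length is cos(10°), about 0.985.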
On a Nonparametric Notion of Residual and its Applications
Let $(X, \mathbf{Z})$ be a continuous random vector in $\mathbb{R} \times \mathbb{R}^d$, $d \ge 1$. In this paper, we define the notion of a
nonparametric residual of $X$ on $\mathbf{Z}$ that is always independent of the
predictor $\mathbf{Z}$. We study its properties and show that the proposed
notion of residual matches the usual residual (error) in a multivariate
normal regression model. Given a random vector $(X, Y, \mathbf{Z})$ in
$\mathbb{R} \times \mathbb{R} \times \mathbb{R}^d$, we use this notion of
residual to show that the conditional independence between $X$ and $Y$, given
$\mathbf{Z}$, is equivalent to the mutual independence of the residuals (of $X$
on $\mathbf{Z}$ and $Y$ on $\mathbf{Z}$) and $\mathbf{Z}$. This result is used
to develop a test for conditional independence. We propose a bootstrap scheme
to approximate the critical value of this test. We compare the proposed test,
which is easily implementable, with some of the existing procedures through a
simulation study.
Comment: 19 pages, 2 figures
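The abstract does not give the construction, so the following is a sketch under a stated assumption: we take the residual of X on Z to be the conditional distribution transform F_{X|Z}(X | Z), which for a continuous conditional law is Uniform(0,1) and independent of Z. In the standard bivariate normal case with correlation rho this equals Phi((X - rho*Z) / sqrt(1 - rho^2)), a monotone function of the usual regression error. The simulation and all names are hypothetical.

```python
import math
import random


def std_normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def gaussian_residual(x, z, rho):
    """F_{X|Z}(x | z) for a standard bivariate normal (X, Z) with correlation rho."""
    return std_normal_cdf((x - rho * z) / math.sqrt(1.0 - rho ** 2))


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def simulate(n=5000, rho=0.8, seed=1):
    """Draw (X, Z) pairs and return X values, Z values and the residuals of X on Z."""
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        xs.append(rho * z + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        zs.append(z)
    res = [gaussian_residual(x, z, rho) for x, z in zip(xs, zs)]
    return xs, zs, res
```

With `xs, zs, res = simulate()`, X is strongly correlated with Z while the residuals average about 0.5 and are essentially uncorrelated with Z. Zero correlation is only a sanity check here, not a proof of independence.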
Online Nonparametric Anomaly Detection based on Geometric Entropy Minimization
We consider the online and nonparametric detection of abrupt and persistent
anomalies, such as a change in the regular system dynamics at a time instance
due to an anomalous event (e.g., a failure, a malicious activity). Combining
the simplicity of the nonparametric Geometric Entropy Minimization (GEM) method
with the timely detection capability of the Cumulative Sum (CUSUM) algorithm we
propose a computationally efficient online anomaly detection method that is
applicable to high-dimensional datasets and at the same time achieves a
near-optimum average detection delay performance for a given false alarm
constraint. We provide new insights to both GEM and CUSUM, including new
asymptotic analysis for GEM, which enables soft decisions for outlier
detection, and a novel interpretation of CUSUM in terms of discrepancy
theory, which helps us generalize it to the nonparametric GEM statistic. We
numerically show, using both simulated and real datasets, that the proposed
nonparametric algorithm attains performance close to that of the clairvoyant
parametric CUSUM test.
Comment: to appear in IEEE International Symposium on Information Theory
(ISIT) 201
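The CUSUM component can be sketched with its standard recursion S_t = max(0, S_{t-1} + s_t), raising an alarm the first time S_t reaches a threshold h. In the parametric test s_t is a log-likelihood ratio; the paper replaces it with a GEM-based statistic that is not reproduced here, so the scores below are hypothetical inputs that drift negative before the change and positive after it.

```python
def cusum_alarm(scores, threshold):
    """Return the first index t with S_t >= threshold, where S_t = max(0, S_{t-1} + scores[t]); None if no alarm."""
    s = 0.0
    for t, score in enumerate(scores):
        s = max(0.0, s + score)  # reset at zero so pre-change evidence cannot accumulate negatively
        if s >= threshold:
            return t
    return None
```

For 20 scores of -0.5 followed by 10 scores of 1.0 and a threshold of 3.0, the alarm fires at index 22, two samples after the change. Raising the threshold trades a longer detection delay for fewer false alarms.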