
    On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions

    This paper is about two related decision-theoretic problems: nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and on distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to this belief. Specifically, we differentiate the hardness of estimating the test statistics from the hardness of testing whether these statistics are zero, and explicitly discuss a notion of "fair" alternative hypotheses for these problems as the dimension increases. We then demonstrate that the power of these tests actually drops polynomially with increasing dimension against fair alternatives. We end with some theoretical insights and shed light on the median heuristic for kernel bandwidth selection. Our work advances the current understanding of the power of modern nonparametric hypothesis tests in high dimensions.
    Comment: 19 pages, 9 figures, published in AAAI-15: The 29th AAAI Conference on Artificial Intelligence (with author order reversed from ArXiv).
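
    To make the kernel test and the median heuristic concrete, here is a minimal sketch, assuming a Gaussian kernel, the unbiased squared-MMD statistic, and a permutation test for calibration. The bandwidth convention (median of pooled pairwise Euclidean distances) is one common variant; others use squared distances or extra constant factors. All function names are hypothetical and this is not the authors' code.

```python
import math
import random
from statistics import median


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def median_heuristic(xs, ys):
    """Bandwidth = median pairwise Euclidean distance in the pooled sample (one common convention)."""
    pooled = list(xs) + list(ys)
    dists = [math.sqrt(sq_dist(pooled[i], pooled[j]))
             for i in range(len(pooled)) for j in range(i + 1, len(pooled))]
    return median(dists)


def gaussian_kernel(a, b, bw):
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 bw^2))."""
    return math.exp(-sq_dist(a, b) / (2.0 * bw ** 2))


def mmd2_unbiased(xs, ys, bw):
    """Unbiased estimate of squared MMD: within-sample terms exclude the diagonal."""
    m, n = len(xs), len(ys)
    kxx = sum(gaussian_kernel(xs[i], xs[j], bw)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(gaussian_kernel(ys[i], ys[j], bw)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(gaussian_kernel(x, y, bw) for x in xs for y in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy


def permutation_test(xs, ys, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value).

    The bandwidth is computed once on the pooled sample, which is
    invariant under relabelling, so it is reused for every permutation.
    """
    rng = random.Random(seed)
    bw = median_heuristic(xs, ys)
    observed = mmd2_unbiased(xs, ys, bw)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], bw) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

    To probe the dimension effect the abstract describes, one could keep the mean shift in the first coordinate fixed while appending pure-noise coordinates to every point and watch how the p-values behave; no results are reported here.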

    Some Recent Developments in Nonparametric Finance

    This paper gives a selective review of some recent developments in nonparametric methods in both continuous- and discrete-time finance, particularly in the areas of nonparametric estimation and testing of diffusion processes, nonparametric testing of parametric diffusion models, nonparametric pricing of derivatives, nonparametric estimation and hypothesis testing for nonlinear pricing kernels, and nonparametric predictability of asset returns. For each financial context, the paper discusses the appropriate statistical concepts, models, and modeling procedures, as well as some of their applications to financial data. The relative strengths and weaknesses of these methods are also discussed. Much theoretical and empirical research remains to be done in this area; in particular, the paper points to several aspects that deserve further investigation.
    This paper was published in Advances in Econometrics, Volume 25 (2009), 379–432.
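
    As an illustration of nonparametric estimation for a diffusion dX_t = mu(X_t) dt + sigma(X_t) dW_t, a minimal sketch, assuming an equally spaced path with step dt and the first-order approximations mu(x) ~ E[X_{t+dt} - X_t | X_t = x] / dt and sigma^2(x) ~ E[(X_{t+dt} - X_t)^2 | X_t = x] / dt, estimated with Nadaraya-Watson kernel weights. This is our illustrative choice, not necessarily one of the specific estimators the review covers; the function name and Gaussian kernel are assumptions.

```python
import math


def nw_drift_diffusion(path, dt, x, bandwidth):
    """Nadaraya-Watson estimates of drift mu(x) and squared diffusion sigma^2(x).

    Hypothetical helper. Uses first-order approximations:
        mu(x)      ~ E[X_{t+dt} - X_t     | X_t = x] / dt
        sigma^2(x) ~ E[(X_{t+dt} - X_t)^2 | X_t = x] / dt
    with a Gaussian kernel in the conditioning variable X_t.
    """
    num_mu = num_s2 = den = 0.0
    for a, b in zip(path[:-1], path[1:]):
        w = math.exp(-0.5 * ((a - x) / bandwidth) ** 2)  # weight of the pair starting at X_t = a
        inc = b - a
        num_mu += w * inc
        num_s2 += w * inc * inc
        den += w
    if den == 0.0:
        raise ValueError("no observations near x; increase the bandwidth")
    return num_mu / (den * dt), num_s2 / (den * dt)
```

    The bandwidth controls the usual bias-variance trade-off; in practice dt is fixed by the sampling frequency, and the first-order approximation is only accurate when dt is small.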

    Information Theoretic Structure Learning with Confidence

    Information-theoretic measures (e.g., the Kullback-Leibler divergence and Shannon mutual information) have been used to explore possibly nonlinear multivariate dependencies in high dimensions. If these dependencies are assumed to follow a Markov factor graph model, this exploration process is called structure discovery. For discrete-valued samples, estimates of the information divergence over the parametric class of multinomial models lead to structure discovery methods whose mean squared error achieves parametric convergence rates as the sample size grows. However, a naive application of this method to continuous nonparametric multivariate models converges much more slowly. In this paper we introduce a new method for nonparametric structure discovery that uses weighted ensemble divergence estimators, which achieve parametric convergence rates and obey an asymptotic central limit theorem that facilitates hypothesis testing and other types of statistical validation.
    Comment: 10 pages, 3 figures
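
    For the discrete case the abstract mentions, a minimal sketch of the multinomial plug-in (empirical-frequency) estimate of mutual information, followed by one classical way pairwise estimates feed structure discovery: the Chow-Liu maximum-weight spanning tree, which assumes a tree-structured model. Both are swapped-in textbook techniques for illustration; this is not the paper's weighted ensemble estimator, and the function names are hypothetical.

```python
import math
from collections import Counter


def plugin_mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))) using empirical frequencies
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def chow_liu_edges(columns):
    """Maximum-weight spanning tree over variables, weighted by plug-in MI (Kruskal)."""
    p = len(columns)
    weighted = sorted(
        ((plugin_mutual_information(columns[i], columns[j]), i, j)
         for i in range(p) for j in range(i + 1, p)),
        reverse=True)
    parent = list(range(p))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

    On continuous data one would first have to discretize or swap in a continuous divergence estimator, which is exactly where, per the abstract, naive approaches converge slowly.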

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds such as the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed.
    Comment: 61 pages
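
    Why Euclidean methods break on the circle can be seen in the most basic exploratory summary. A minimal sketch, assuming angles in radians, of the standard circular mean direction and mean resultant length R (R near 1 means concentrated directions, near 0 means dispersed). For angles 0.1 and 2*pi - 0.1, the arithmetic mean is pi, pointing the opposite way, while the circular mean is 0. The function name is hypothetical.

```python
import math


def circular_summary(angles):
    """Return (mean direction in [-pi, pi], mean resultant length R in [0, 1]).

    Averages the unit vectors (cos t, sin t) instead of the raw angles,
    so the result does not depend on where the circle is "cut".
    """
    n = len(angles)
    c = sum(math.cos(t) for t in angles) / n
    s = sum(math.sin(t) for t in angles) / n
    return math.atan2(s, c), math.hypot(c, s)
```

    When R is close to 0 the mean direction is numerically unstable and carries little information, which is one reason R is usually reported alongside it.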

    On a Nonparametric Notion of Residual and its Applications

    Let $(X, \mathbf{Z})$ be a continuous random vector in $\mathbb{R} \times \mathbb{R}^d$, $d \ge 1$. In this paper, we define the notion of a nonparametric residual of $X$ on $\mathbf{Z}$ that is always independent of the predictor $\mathbf{Z}$. We study its properties and show that the proposed notion of residual matches the usual residual (error) in a multivariate normal regression model. Given a random vector $(X, Y, \mathbf{Z})$ in $\mathbb{R} \times \mathbb{R} \times \mathbb{R}^d$, we use this notion of residual to show that the conditional independence between $X$ and $Y$, given $\mathbf{Z}$, is equivalent to the mutual independence of the residuals (of $X$ on $\mathbf{Z}$ and $Y$ on $\mathbf{Z}$) and $\mathbf{Z}$. This result is used to develop a test for conditional independence. We propose a bootstrap scheme to approximate the critical value of this test. We compare the proposed test, which is easily implementable, with some of the existing procedures through a simulation study.
    Comment: 19 pages, 2 figures
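
    The abstract does not spell out how the residual is defined. Assuming it is the conditional distribution function of $X$ given $\mathbf{Z}$ evaluated at the observation (with $x \mapsto F_{X\mid\mathbf{Z}}(x \mid \mathbf{z})$ continuous), both claimed properties follow in a few lines. The second line is the probability integral transform applied conditionally on $\mathbf{Z} = \mathbf{z}$: the conditional law is $\mathrm{Uniform}(0,1)$ for every $\mathbf{z}$, so it does not depend on $\mathbf{z}$ and the residual is independent of $\mathbf{Z}$. In the third line, with hypothetical regression parameters $\alpha$, $\boldsymbol{\beta}$, $\sigma$, the residual is a strictly increasing transformation ($\Phi$ is the standard normal CDF) of the usual regression residual, which is one sense in which the two "match".

```latex
\begin{align*}
  \hat{\varepsilon} &:= F_{X\mid\mathbf{Z}}(X \mid \mathbf{Z}), \\
  \Pr\bigl(\hat{\varepsilon} \le u \bigm| \mathbf{Z} = \mathbf{z}\bigr) &= u
    \qquad \text{for all } u \in [0,1],\ \mathbf{z} \in \mathbb{R}^d, \\
  X \mid \mathbf{Z} \sim N(\alpha + \boldsymbol{\beta}^{\top}\mathbf{Z}, \sigma^2)
    \;&\Longrightarrow\;
  \hat{\varepsilon} = \Phi\!\left(\frac{X - \alpha - \boldsymbol{\beta}^{\top}\mathbf{Z}}{\sigma}\right).
\end{align*}
```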

    Online Nonparametric Anomaly Detection based on Geometric Entropy Minimization

    We consider the online, nonparametric detection of abrupt and persistent anomalies, such as a change in the regular system dynamics at some time instant due to an anomalous event (e.g., a failure or a malicious activity). Combining the simplicity of the nonparametric Geometric Entropy Minimization (GEM) method with the timely detection capability of the Cumulative Sum (CUSUM) algorithm, we propose a computationally efficient online anomaly detection method that is applicable to high-dimensional datasets and at the same time achieves near-optimal average detection delay for a given false alarm constraint. We provide new insights into both GEM and CUSUM, including a new asymptotic analysis for GEM, which enables soft decisions for outlier detection, and a novel interpretation of CUSUM in terms of discrepancy theory, which helps us generalize it to the nonparametric GEM statistic. Using both simulated and real datasets, we show numerically that the proposed nonparametric algorithm attains performance close to that of the clairvoyant parametric CUSUM test.
    Comment: to appear in IEEE International Symposium on Information Theory (ISIT) 201
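
    The "clairvoyant parametric CUSUM" baseline knows both the pre- and post-change densities. A minimal sketch, assuming a Gaussian mean shift from N(mu0, sigma^2) to N(mu1, sigma^2) with all parameters known, of Page's CUSUM recursion S_t = max(0, S_{t-1} + log-likelihood ratio), alarming when S_t reaches a threshold h. This is the parametric baseline only, not the paper's GEM-based nonparametric statistic; the function name and Gaussian model are assumptions.

```python
def gaussian_cusum(samples, mu1, threshold, mu0=0.0, sigma=1.0):
    """Return the first index at which CUSUM raises an alarm, or None.

    Pre-change N(mu0, sigma^2) and post-change N(mu1, sigma^2) are assumed known.
    """
    stat = 0.0
    for t, x in enumerate(samples):
        # log f1(x) / f0(x) for two Gaussians with a common variance
        llr = ((mu1 - mu0) / sigma ** 2) * (x - (mu0 + mu1) / 2.0)
        # Page's recursion: the statistic restarts at zero when evidence for a change turns negative
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t
    return None
```

    Raising the threshold lowers the false alarm rate at the cost of a longer detection delay, which is the trade-off the abstract refers to as detection delay under a false alarm constraint.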