52,480 research outputs found

    A framework for dependency estimation in heterogeneous data streams

    Estimating dependencies from data is a fundamental task of Knowledge Discovery. Identifying the relevant variables leads to a better understanding of data and improves both the runtime and the outcomes of downstream Data Mining tasks. Dependency estimation from static numerical data has received much attention. However, real-world data often occurs as heterogeneous data streams: on the one hand, data is collected online and is virtually infinite; on the other hand, the various components of a stream may be of different types, e.g., numerical, ordinal or categorical. For this setting, we propose Monte Carlo Dependency Estimation (MCDE), a framework that quantifies multivariate dependency as the average statistical discrepancy between marginal and conditional distributions, via Monte Carlo simulations. MCDE handles heterogeneity by leveraging three statistical tests: the Mann–Whitney U, the Kolmogorov–Smirnov and the Chi-Squared test. We demonstrate that MCDE goes beyond the state of the art regarding dependency estimation by meeting a broad set of requirements. Finally, we show with a real-world use case that MCDE can discover useful patterns in heterogeneous data streams.
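
The idea of averaging a marginal-versus-conditional discrepancy over random conditions can be sketched as follows. This is a minimal illustration for numerical data only, not the authors' implementation: the function names (`ks_statistic`, `mc_dependency`), the use of the raw Kolmogorov–Smirnov statistic as the discrepancy, and the slice-size rule are all assumptions made for this example.

```python
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so that ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def mc_dependency(rows, iterations=200, slice_fraction=0.5, rng=None):
    """Average discrepancy between marginal and conditional distributions.

    Each iteration picks a random reference dimension, restricts every other
    dimension to a random contiguous window of ranks (the "condition"), and
    compares the reference values inside the slice with all reference values.
    """
    rng = rng or random.Random(0)
    n, dims = len(rows), len(rows[0])
    # Assumed rule: per-dimension window chosen so that roughly
    # slice_fraction * n points survive all dims - 1 conditions.
    width = max(2, math.ceil(n * slice_fraction ** (1.0 / (dims - 1))))
    total, used = 0.0, 0
    for _ in range(iterations):
        ref = rng.randrange(dims)
        keep = set(range(n))
        for dim in range(dims):
            if dim == ref:
                continue
            order = sorted(range(n), key=lambda r: rows[r][dim])
            start = rng.randrange(n - width + 1)
            keep &= set(order[start:start + width])
        if len(keep) < 2:
            continue  # slice too small to compare; skip this iteration
        marginal = [row[ref] for row in rows]
        conditional = [rows[r][ref] for r in keep]
        total += ks_statistic(marginal, conditional)
        used += 1
    return total / used if used else 0.0


demo_rng = random.Random(42)
xs = [demo_rng.random() for _ in range(200)]
print("dependent  :", mc_dependency([(x, x * x) for x in xs], iterations=50))
print("independent:", mc_dependency([(x, demo_rng.random()) for x in xs], iterations=50))
```

In this sketch a score near zero suggests that conditioning on the other dimensions barely changes the reference distribution, while larger scores suggest dependency. Ordinal or categorical components would need a different discrepancy, such as the Mann–Whitney U or Chi-Squared tests named in the abstract.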

    Robust Unit Root and Cointegration Rank Tests for Panels and Large Systems

    This study develops new tests for unit roots and cointegration rank in heterogeneous time series panels using methods that are robust to the presence of both incidental trends and cross-sectional dependency of unknown form. Furthermore, the procedures do not require a choice of lag truncation or bandwidth to accommodate higher-order serial correlation. The cointegration rank tests can also be implemented in relatively large systems of equations for which conventional VECM-based tests become infeasible. Monte Carlo simulations demonstrate that the procedures have high power and good size properties even in panels with relatively small dimensions. Keywords: Panel Unit Roots, Cointegration Rank Tests, Robust Autocovariance Estimation

    An efficient estimator for locally stationary Gaussian long-memory processes

    This paper addresses the estimation of locally stationary long-range dependent processes, a methodology that allows the statistical analysis of time series data exhibiting both nonstationarity and strong dependency. A time-varying parametric formulation of these models is introduced, and a Whittle likelihood technique is proposed for estimating the parameters involved. Large-sample properties of these Whittle estimates, such as consistency, normality and efficiency, are established. Furthermore, the finite-sample behavior of the estimators is investigated through Monte Carlo experiments. These simulations show that the estimates behave well even for relatively small sample sizes. Comment: Published at http://dx.doi.org/10.1214/10-AOS812 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
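
For orientation, a Whittle-type estimate can be sketched in its classical stationary form. This is a deliberate simplification and not the time-varying estimator of the paper: it assumes a fractional-noise spectral shape proportional to |2 sin(λ/2)|^(-2d), profiles out the scale, and minimizes the resulting objective over a grid of d values. The function names and the grid are assumptions for this example.

```python
import cmath
import math
import random


def periodogram(x):
    """Periodogram at the Fourier frequencies 2*pi*j/n, j = 1..floor((n-1)/2)."""
    n = len(x)
    mean = sum(x) / n
    freqs, values = [], []
    for j in range(1, (n - 1) // 2 + 1):
        lam = 2 * math.pi * j / n
        dft = sum((v - mean) * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
        freqs.append(lam)
        values.append(abs(dft) ** 2 / (2 * math.pi * n))
    return freqs, values


def whittle_objective(d, freqs, values):
    """Profile Whittle objective for the assumed fractional-noise shape."""
    shape = [abs(2 * math.sin(lam / 2)) ** (-2 * d) for lam in freqs]
    scale = sum(i / g for i, g in zip(values, shape)) / len(values)
    return math.log(scale) + sum(math.log(g) for g in shape) / len(shape)


def whittle_estimate_d(x, grid_step=0.01):
    """Grid-search minimizer of the objective over d in (-0.5, 0.5)."""
    freqs, values = periodogram(x)
    grid = [k * grid_step for k in range(-49, 50)]
    return min(grid, key=lambda d: whittle_objective(d, freqs, values))


demo_rng = random.Random(3)
white_noise = [demo_rng.gauss(0.0, 1.0) for _ in range(128)]
print("estimated d for white noise:", whittle_estimate_d(white_noise))
```

A locally stationary variant would, roughly speaking, apply such an objective to blocks of the series with parameters that vary smoothly over time; that step is not shown here.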

    Nonparametric Beta Kernel Estimator for Long Memory Time Series

    The paper introduces a new nonparametric estimator of the spectral density, obtained by smoothing the periodogram with the probability density of a Beta random variable (Beta kernel). The estimator is proved to be bounded for short-memory data and to diverge at the origin for long-memory data. The convergence in probability of the relative error and Monte Carlo simulations suggest that the estimator automatically adapts to the long- or short-range dependency of the process. A cross-validation procedure is also studied in order to select the nuisance parameter of the estimator. Illustrations on historical as well as recent returns and absolute returns of the S&P500 index show the reasonable performance of the estimation, and show that the data-driven estimator is a valuable tool for the detection of long memory as well as hidden periodicities in stock returns. Keywords: spectral density, long-range dependence, nonparametric estimation, periodogram, kernel smoothing, Beta kernel, cross-validation
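
Smoothing with a Beta kernel can be sketched as below. The sketch assumes frequencies rescaled to the open interval (0, 1), a Beta density with parameters x/b + 1 and (1 - x)/b + 1 as the kernel at evaluation point x, and weights normalized to sum to one. These choices, the function names and the bandwidth value are assumptions for illustration and may differ from the estimator defined in the paper.

```python
import math


def beta_pdf(u, a, b):
    """Density of a Beta(a, b) random variable at u in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(u) + (b - 1) * math.log(1 - u) - log_norm)


def beta_kernel_smooth(x, grid, values, bandwidth):
    """Weighted average of `values` with Beta-kernel weights centred near x.

    `grid` holds rescaled frequencies in (0, 1) and `values` the raw
    (periodogram-like) ordinates observed at those frequencies.
    """
    a = x / bandwidth + 1
    b = (1 - x) / bandwidth + 1
    weights = [beta_pdf(u, a, b) for u in grid]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Stand-in for a raw periodogram: a smooth curve plus a deterministic wiggle.
m = 200
grid = [(j + 0.5) / m for j in range(m)]
raw = [1.0 + 0.5 * u + 0.3 * math.sin(40 * math.pi * u) for u in grid]
for x in (0.0, 0.25, 0.5, 1.0):
    print(x, beta_kernel_smooth(x, grid, raw, bandwidth=0.05))
```

Because the kernel's shape changes with x and its support is always [0, 1], no weight is placed outside the frequency range, which is the usual motivation for Beta kernels near a boundary such as the origin.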

    A simulation study of semiparametric estimation in copula models based on minimum Alpha-Divergence

    The purpose of this paper is to introduce two semiparametric methods for the estimation of the copula parameter. These methods are based on the minimum Alpha-Divergence between a nonparametric estimate of the copula density, obtained with the local likelihood probit transformation method, and a true copula density function. A Monte Carlo study is performed to measure the performance of these methods based on Hellinger distance and Neyman divergence as special cases of Alpha-Divergence. Simulation results are compared to the Maximum Pseudo-Likelihood (MPL) estimation as a conventional estimation method in well-known bivariate copula models. These results show that the proposed method based on Minimum Pseudo Hellinger Distance estimation performs well in small-sample and weak-dependency situations. The parameter estimation methods are applied to a real data set in hydrology. Comment: 14 pages
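
The principle of minimum Hellinger distance estimation can be illustrated on a toy discrete model. The sketch below does not involve copulas or the probit-transformation density estimate from the paper: it fits the success probability of a hypothetical Binomial(3, θ) model by grid search, choosing the θ whose probability mass function is closest in squared Hellinger distance to an empirical distribution. All names and the model choice are assumptions for this example.

```python
import math


def squared_hellinger(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


def binomial_pmf(theta, trials=3):
    """Probability mass function of a Binomial(trials, theta) variable."""
    return [math.comb(trials, k) * theta ** k * (1 - theta) ** (trials - k)
            for k in range(trials + 1)]


def min_hellinger_theta(empirical, grid_size=100):
    """Grid-search estimate of theta minimizing the distance to `empirical`."""
    grid = [k / grid_size for k in range(1, grid_size)]
    return min(grid, key=lambda t: squared_hellinger(empirical, binomial_pmf(t)))


# Hypothetical observed counts of 0, 1, 2 and 3 successes.
counts = [35, 44, 18, 3]
total = sum(counts)
empirical = [c / total for c in counts]
print("minimum Hellinger estimate of theta:", min_hellinger_theta(empirical))
```

In the semiparametric copula setting described above, the empirical distribution would be replaced by a nonparametric copula density estimate and the sum by an integral over the unit square.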
