69,876 research outputs found
Estimation of intrinsic dimension via clustering
The problem of estimating the intrinsic dimension of a set of points in high-dimensional space is a critical issue for a wide range of disciplines, including genomics, finance, and networking. The computational complexity of current estimation techniques depends on either the ambient or the intrinsic dimension, which may cause these methods to become intractable for large data sets. In this paper, we present a clustering-based methodology that exploits the inherent self-similarity of data to efficiently estimate the intrinsic dimension of a set of points. When the data satisfies a specified general clustering condition, we prove that the estimated dimension approaches the true Hausdorff dimension. Experiments show that the clustering-based approach allows for more efficient and accurate intrinsic dimension estimation compared with all prior techniques, even when the data does not conform to obvious self-similarity structure. Finally, we present empirical results which show that the clustering-based estimation allows for a natural partitioning of the data points that lie on separate manifolds of varying intrinsic dimension.
A New Estimator of Intrinsic Dimension Based on the Multipoint Morisita Index
The size of datasets has been increasing rapidly both in terms of number of
variables and number of events. As a result, the empty space phenomenon and the
curse of dimensionality complicate the extraction of useful information. But,
in general, data lie on non-linear manifolds of much lower dimension than that
of the spaces in which they are embedded. In many pattern recognition tasks,
learning these manifolds is a key issue and it requires the knowledge of their
true intrinsic dimension. This paper introduces a new estimator of intrinsic
dimension based on the multipoint Morisita index. It is applied to both
synthetic and real datasets of varying complexities and comparisons with other
existing estimators are carried out. The proposed estimator turns out to be
fairly robust to sample size and noise, unaffected by edge effects, able to
handle large datasets, and computationally efficient.
Investigating dynamic dependence using copulae
A general methodology for time series modelling is developed which works down from distributional
properties to implied structural models including the standard regression relationship. This
general to specific approach is important since it can avoid spurious assumptions such as linearity
in the form of the dynamic relationship between variables. It is based on splitting the multivariate
distribution of a time series into two parts: (i) the marginal unconditional distribution, (ii) the
serial dependence encompassed in a general function, the copula. General properties of the class of
copula functions that fulfill the necessary requirements for Markov chain construction are exposed.
Special cases for the Gaussian copula with AR(p) dependence structure and for Archimedean copulae
are presented. We also develop copula-based dynamic dependency measures: auto-concordance
in place of autocorrelation. Finally, we provide empirical applications using financial returns and
transactions based forex data. Our model encompasses the AR(p) model and allows non-linearity.
Moreover, we introduce non-linear time dependence functions that generalize the autocorrelation
function.
Exact Dimensionality Selection for Bayesian PCA
We present a Bayesian model selection approach to estimate the intrinsic
dimensionality of a high-dimensional dataset. To this end, we introduce a novel
formulation of the probabilistic principal component analysis model based on a
normal-gamma prior distribution. In this context, we exhibit a closed-form
expression of the marginal likelihood, which allows us to infer an optimal number
of components. We also propose a heuristic based on the expected shape of the
marginal likelihood curve in order to choose the hyperparameters. In
non-asymptotic frameworks, we show on simulated data that this exact
dimensionality selection approach is competitive with both Bayesian and
frequentist state-of-the-art methods.
A comparative evaluation of nonlinear dynamics methods for time series prediction
A key problem in time series prediction using autoregressive models is fixing the model order, namely the number of past samples required to model the time series adequately. Estimating the model order by cross-validation can be a long process. In this paper, we investigate alternatives to cross-validation based on nonlinear dynamics methods, namely the Grassberger-Procaccia, Kégl, Levina-Bickel and False Nearest Neighbors algorithms. The experiments have been performed in two different ways. In the first case, the estimated model order has been used to carry out the prediction, performed by an SVM for regression on three real time series, showing that nonlinear dynamics methods perform very close to cross-validation. In the second case, we have tested the accuracy of nonlinear dynamics methods in predicting the known model order of synthetic time series. In this case, most of the methods have yielded a correct estimate and, when the estimate was not correct, the value was very close to the real one.
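Of the estimators named above, the Levina-Bickel maximum-likelihood estimator is short enough to sketch. The version below is a minimal, brute-force illustration that assumes the standard form of the estimator (the inverse mean log-ratio of the k-th nearest-neighbour distance to the closer neighbour distances, averaged over all points). It is not the implementation evaluated in the paper; the function name and the choice of k are illustrative.

```python
import math
import random


def levina_bickel_dimension(points, k=10):
    """Average the per-point Levina-Bickel MLE over all points.

    Brute-force O(n^2 log n) sketch. Assumes no duplicate points
    (a zero distance would make the log-ratio undefined) and that
    len(points) > k.
    """
    local_estimates = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        t_k = dists[k - 1]  # distance to the k-th nearest neighbour
        log_sum = sum(math.log(t_k / t_j) for t_j in dists[:k - 1])
        local_estimates.append((k - 1) / log_sum)
    return sum(local_estimates) / len(local_estimates)


if __name__ == "__main__":
    rng = random.Random(0)
    # A 2-D square embedded in 5-D ambient space.
    plane = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(400)]
    print(levina_bickel_dimension(plane, k=15))
```

With the `k - 1` normalisation used here, the estimate is biased slightly upward for small k, so the printed value should land near 2 rather than exactly on it.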
Lumpy Price Adjustments: A Microeconometric Analysis
This paper presents a simple model of state-dependent pricing that allows identification of the relative importance of the price rigidity that is inherent to the price-setting mechanism (intrinsic) and that which is due to the price’s driving variables (extrinsic). Using two data sets consisting of a large fraction of the price quotes used to compute the Belgian and French CPI, we are able to assess the role of intrinsic and extrinsic price stickiness in explaining the occurrence and magnitude of price changes at the outlet level. We find that infrequent price changes are not necessarily associated with large adjustment costs. Indeed, extrinsic rigidity appears to be significant in many cases. We also find that asymmetry in the price adjustment could be due to trends in marginal costs and/or desired mark-ups rather than asymmetric cost of adjustment bands.