
    Quantization/clustering: when and why does k-means work?

    Though mostly used as a clustering algorithm, k-means was originally designed as a quantization algorithm: it aims at compressing a probability distribution with k points. Building upon [21, 33], we investigate how and when these two approaches are compatible. Namely, we show that, provided the sample distribution satisfies a margin-like condition (in the sense of [27] for supervised learning), both the associated empirical risk minimizer and the output of Lloyd's algorithm provide almost optimal classification in certain cases (in the sense of [6]). Besides, we also show that they achieve fast and optimal convergence rates in terms of sample size and compression risk.
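    The two viewpoints the abstract contrasts can be sketched in a few lines: Lloyd's algorithm produces k centres, and the quantization (compression) risk is the mean squared distance to the nearest centre. This is an illustrative sketch only; the sample data and the choice k = 2 are assumptions, not taken from the paper.

    ```python
    # Lloyd's algorithm viewed as quantization: the k centres compress the
    # empirical distribution; compression risk = E[min_j |X - c_j|^2].

    def lloyd(points, centers, iters=50):
        """Alternate nearest-centre assignment and centroid update (1-D)."""
        for _ in range(iters):
            clusters = {i: [] for i in range(len(centers))}
            for x in points:
                i = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
                clusters[i].append(x)
            # Keep the old centre if a cluster empties out.
            centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in clusters.items()]
        return centers

    def compression_risk(points, centers):
        """Empirical quantization risk: mean squared distance to nearest centre."""
        return sum(min((x - c) ** 2 for c in centers) for x in points) / len(points)

    sample = [0.1, 0.2, 0.15, 0.9, 1.0, 1.1]   # two well-separated groups
    centers = lloyd(sample, centers=[0.0, 0.5])
    risk = compression_risk(sample, centers)
    ```

    On well-separated data like this, the quantization optimum and the "correct" clustering coincide, which is the compatibility the paper makes precise via the margin-like condition.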

    Can jumps improve the futures margin level? An empirical study based on an SE-SVCJ-GPD model

    In addition to leptokurtic, fat-tailed distributions, financial time series also exhibit pronounced volatility and jumps, and jumps are self-exciting and cluster under extreme events. However, studies on dynamic margin levels often ignore jumps. In this study, we combine the self-exciting stochastic volatility with correlated jumps (SE-SVCJ) model with a generalized Pareto distribution (GPD) to measure the optimal margin level for the stock index futures market. Value at risk (VaR) is estimated and forecast using the SE-SVCJ-GPD, SVCJ-GPD, and generalized autoregressive conditional heteroskedasticity with GPD (GARCH-GPD) models. The SE-SVCJ-GPD model can cover more risk in both long and short trading positions in stock index futures contracts. Moreover, backtesting results show that the SE-SVCJ-GPD model provides a more accurate margin-level forecast than the other methods in both positions. This study's findings have practical significance and theoretical value for assessing risk levels and taking corresponding risk-prevention measures.
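    The GPD half of such a hybrid model is the standard peaks-over-threshold step: losses above a threshold u are modelled as GPD, and the margin level is read off as a VaR quantile. The sketch below shows only that generic POT formula; the parameter values are illustrative assumptions, not estimates from the paper, and the SE-SVCJ dynamics are not modelled here.

    ```python
    # Peaks-over-threshold Value-at-Risk from a fitted GPD tail:
    #   VaR_q = u + (sigma/xi) * ((p_exceed / (1 - q))**xi - 1)
    # where p_exceed = N_u / n is the empirical exceedance probability.

    def gpd_var(u, xi, sigma, p_exceed, q):
        """VaR at confidence level q from GPD(xi, sigma) exceedances over u."""
        return u + (sigma / xi) * ((p_exceed / (1.0 - q)) ** xi - 1.0)

    # Illustrative numbers: 5% of losses exceed u = 2.0, xi = 0.2, sigma = 0.5.
    var99 = gpd_var(u=2.0, xi=0.2, sigma=0.5, p_exceed=0.05, q=0.99)
    ```

    A margin level set at `var99` would then be backtested by counting how often realised losses breach it, which is the kind of experiment the abstract reports.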

    Minimum Density Hyperplanes

    Associating distinct groups of objects (clusters) with contiguous regions of high probability density (high-density clusters) is central to many statistical and machine learning approaches to the classification of unlabelled data. We propose a novel hyperplane classifier for clustering and semi-supervised classification motivated by this objective. The proposed minimum density hyperplane minimises the integral of the empirical probability density function along the hyperplane, thereby avoiding intersection with high-density clusters. We show that the minimum density and maximum margin hyperplanes are asymptotically equivalent, thus linking this approach to maximum margin clustering and semi-supervised support vector classifiers. We propose a projection pursuit formulation of the associated optimisation problem, which allows us to find minimum density hyperplanes efficiently in practice, and evaluate its performance on a range of benchmark datasets. The proposed approach is found to be very competitive with state-of-the-art methods for clustering and semi-supervised classification.
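    The core idea can be sketched in one dimension of the projection pursuit: project the data onto a unit direction v, estimate the density of the projections with a Gaussian kernel, and place the hyperplane {x : v·x = b} where that density is lowest. The grid search, bandwidth, and fixed direction below are illustrative assumptions; the paper also optimises over v.

    ```python
    # Minimum-density split along a fixed projection direction.
    import math

    def kde(points, x, h=0.3):
        """Gaussian kernel density estimate at x."""
        return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / (
            len(points) * h * math.sqrt(2 * math.pi))

    def min_density_split(data, v, grid=200):
        """Offset b minimising the projected density, so {x : v.x = b} avoids
        high-density clusters."""
        proj = [sum(vi * xi for vi, xi in zip(v, x)) for x in data]
        lo, hi = min(proj), max(proj)
        candidates = [lo + (hi - lo) * t / grid for t in range(grid + 1)]
        return min(candidates, key=lambda b: kde(proj, b))

    # Two clusters in the plane; split along the first coordinate, v = (1, 0).
    data = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0),
            (2.0, 0.1), (2.1, -0.2), (1.9, 0.0)]
    b = min_density_split(data, v=(1.0, 0.0))
    ```

    The returned offset lands in the low-density gap between the two groups, which is exactly where a maximum margin separator would also lie on separable data, hinting at the asymptotic equivalence the paper proves.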

    Maximum Margin Clustering for State Decomposition of Metastable Systems

    When studying a metastable dynamical system, a prime concern is how to decompose the phase space into a set of metastable states. Unfortunately, metastable state decomposition based on simulation or experimental data remains a challenge. The most popular and simplest approach is geometric clustering, which builds on classical clustering techniques. However, this approach requires that (1) the data come from simulations or experiments in global equilibrium and (2) the coordinate system is appropriately selected. Recently, the kinetic clustering approach based on phase space discretization and transition probability estimation has drawn much attention due to its applicability to more general cases, but choosing a discretization policy is a difficult task. In this paper, a new decomposition method, designated maximum margin metastable clustering, is proposed: it converts metastable state decomposition into a semi-supervised learning problem so that large margin techniques can be used to search for the optimal decomposition without phase space discretization. Several simulation examples illustrate the effectiveness of the proposed method.
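    The large-margin ingredient can be sketched as follows: frames that clearly sit in a metastable core receive labels, and a linear max-margin classifier extends the decomposition to all other frames without discretising the phase space. The sketch below uses a tiny Pegasos-style subgradient SVM on 1-D data; the data, labels, and hyperparameters are illustrative assumptions and this is not the paper's algorithm.

    ```python
    # Pegasos-style linear SVM on labelled "core" frames of two basins.
    import random

    def pegasos(xs, ys, lam=0.1, epochs=200, seed=0):
        """Stochastic subgradient descent on lam/2*w^2 + hinge loss (1-D)."""
        rng = random.Random(seed)
        w, b, t = 0.0, 0.0, 0
        for _ in range(epochs):
            for i in rng.sample(range(len(xs)), len(xs)):  # random pass
                t += 1
                eta = 1.0 / (lam * t)
                if ys[i] * (w * xs[i] + b) < 1.0:          # margin violated
                    w = (1 - eta * lam) * w + eta * ys[i] * xs[i]
                    b += eta * ys[i]
                else:                                      # only shrink w
                    w = (1 - eta * lam) * w
        return w, b

    # Labelled core frames of two metastable basins (e.g. a reaction coordinate).
    xs = [-1.2, -1.0, -0.9, 0.9, 1.0, 1.1]
    ys = [-1, -1, -1, 1, 1, 1]
    w, b = pegasos(xs, ys)

    def state(x):
        """Assign an arbitrary frame to one of the two metastable states."""
        return 1 if w * x + b > 0 else -1
    ```

    The decision boundary plays the role of the state boundary, and because it is found by margin maximisation rather than by binning, no discretization policy has to be chosen.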

    Anisotropic oracle inequalities in noisy quantization

    The effect of errors in variables in quantization is investigated. We prove general exact and non-exact oracle inequalities with fast rates for an empirical minimization based on a noisy sample $Z_i = X_i + \epsilon_i$, $i = 1, \ldots, n$, where the $X_i$ are i.i.d. with density $f$ and the $\epsilon_i$ are i.i.d. with density $\eta$. These rates depend on the geometry of the density $f$ and the asymptotic behaviour of the characteristic function of $\eta$. This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under standard Pollard regularity assumptions.
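    The deconvolution step can be sketched as follows. This is a hedged reconstruction using the standard deconvolution kernel estimator (Stefanski–Carroll type); the exact estimator and minimization criterion used in the paper may differ.

    ```latex
    % With noisy observations $Z_i = X_i + \epsilon_i$, a deconvolution kernel
    % estimator of $f$ divides out the noise characteristic function
    % $\phi_\eta$ in the Fourier domain:
    \hat f_n(x) = \frac{1}{n h} \sum_{i=1}^{n}
        \tilde K\!\Big(\frac{x - Z_i}{h}\Big),
    \qquad
    \tilde K(u) = \frac{1}{2\pi} \int e^{-\mathrm{i} t u}\,
        \frac{\phi_K(t)}{\phi_\eta(t/h)}\, \mathrm{d}t .
    % A deconvolution $k$-means codebook $c = (c_1, \ldots, c_k)$ then
    % minimises the plug-in distortion
    \hat R_n(c) = \int \min_{1 \le j \le k}
        \lVert x - c_j \rVert^2 \, \hat f_n(x)\, \mathrm{d}x .
    ```

    Minimising $\hat R_n$ targets the distortion of the clean density $f$ rather than that of the noisy observations, which is why naive $k$-means on the $Z_i$ is biased while the deconvolution version can attain the fast rates the abstract claims.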