Quantization/clustering: when and why does k-means work?
Though mostly used as a clustering algorithm, k-means was originally designed
as a quantization algorithm. Namely, it aims at providing a compression of a
probability distribution with k points. Building upon [21, 33], we investigate
how and when these two approaches are compatible. Namely, we show that,
provided the sample distribution satisfies a margin-like condition (in the
sense of [27] for supervised learning), both the associated empirical risk
minimizer and the output of Lloyd's algorithm provide almost optimal
classification in certain cases (in the sense of [6]). Besides, we also show
that they achieve fast and optimal convergence rates in terms of sample size
and compression risk.
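Lloyd's algorithm, whose output the abstract analyzes, alternates nearest-centroid assignment with centroid recomputation. A minimal NumPy sketch of the quantization view (the data, k, seeds, and function name here are chosen purely for illustration, not taken from the paper):

```python
import numpy as np

def lloyd(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid recomputation; returns the k codebook points and labels."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# two tight, well-separated blobs: a margin-like condition plausibly holds,
# so the quantizer's cells also recover the clusters
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers, labels = lloyd(pts, k=2)
```

When the blobs are this well separated, the two codebook points settle near the blob means, illustrating how the compression objective and the clustering objective coincide under a margin condition.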
Can jumps improve the futures margin level? An empirical study based on an SE-SVCJ-GPD model
In addition to leptokurtic, fat-tailed distributions,
financial series also exhibit pronounced volatility and jumps.
Moreover, jumps exhibit self-exciting and clustering characteristics
under extreme events. However, studies on dynamic margin levels
often ignore jumps. In this study, we combine the self-exciting
stochastic volatility with correlated jumps (SE-SVCJ) model with a
generalized Pareto distribution (GPD) to measure the optimal
margin level for the stock index futures market. Value at risk (VaR)
is estimated and forecasted using the SE-SVCJ-GPD, SVCJ-GPD,
and generalized autoregressive conditional heteroskedasticity with
GPD (GARCH-GPD) models. The SE-SVCJ-GPD model can cover more risk
in both the long and short trading positions of stock index futures contracts.
Moreover, the backtesting experiment results show that
the SE-SVCJ-GPD model provides a more accurate margin level
forecast than the other methods in both positions. This study’s
findings have practical significance and theoretical value for
assessing the level of risk and taking corresponding risk-prevention
measures.
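The GPD tail component shared by all three models can be sketched with the standard peaks-over-threshold (POT) VaR estimator. This is a minimal sketch of that one ingredient, not the paper's SE-SVCJ-GPD model; the synthetic losses, threshold choice, and the helper name `pot_var` are assumptions made here for illustration:

```python
import numpy as np
from scipy.stats import genpareto

def pot_var(losses, threshold, p=0.99):
    """Peaks-over-threshold VaR: fit a GPD to the exceedances over
    `threshold`, then invert the standard POT tail estimator at level p."""
    exceed = losses[losses > threshold] - threshold
    n, n_u = len(losses), len(exceed)
    xi, _, sigma = genpareto.fit(exceed, floc=0)  # shape, loc, scale
    return threshold + (sigma / xi) * ((n / n_u * (1 - p)) ** (-xi) - 1.0)

# heavy-tailed synthetic losses stand in for futures returns
rng = np.random.default_rng(0)
losses = np.abs(rng.standard_t(3, size=5000))
u = np.quantile(losses, 0.9)        # threshold at the 90th percentile
var99 = pot_var(losses, u, p=0.99)  # a candidate margin level
```

In the paper's setting, the conditional volatility and jump intensity from the SE-SVCJ filter would scale this tail quantile dynamically rather than applying it to raw losses.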
Minimum Density Hyperplanes
Associating distinct groups of objects (clusters) with contiguous regions of
high probability density (high-density clusters) is central to many
statistical and machine learning approaches to the classification of unlabelled
data. We propose a novel hyperplane classifier for clustering and
semi-supervised classification which is motivated by this objective. The
proposed minimum density hyperplane minimises the integral of the empirical
probability density function along it, thereby avoiding intersection with high
density clusters. We show that the minimum density and the maximum margin
hyperplanes are asymptotically equivalent, thus linking this approach to
maximum margin clustering and semi-supervised support vector classifiers. We
propose a projection pursuit formulation of the associated optimisation problem
which allows us to find minimum density hyperplanes efficiently in practice,
and evaluate its performance on a range of benchmark datasets. The proposed
approach is found to be very competitive with state-of-the-art methods for
clustering and semi-supervised classification.
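For a fixed projection direction, the integral of the density along a hyperplane orthogonal to that direction reduces to the projected one-dimensional density at the split point. The sketch below shows that one-direction step only (the paper additionally optimises the direction by projection pursuit); the data, KDE choice, and the name `min_density_split` are illustrative assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde

def min_density_split(points, direction):
    """Project the data onto `direction`, estimate the projected density
    with a KDE, and split at the lowest-density point away from the tails."""
    v = np.asarray(direction, float)
    v /= np.linalg.norm(v)
    proj = points @ v
    kde = gaussian_kde(proj)
    # search between the 10th and 90th percentiles to avoid trivial
    # low-density splits in the tails
    grid = np.linspace(np.quantile(proj, 0.1), np.quantile(proj, 0.9), 200)
    split = grid[np.argmin(kde(grid))]
    return split, (proj > split).astype(int)

# two well-separated Gaussian blobs; the low-density valley lies between them
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),
                 rng.normal([5, 0], 0.3, (50, 2))])
split, labels = min_density_split(pts, direction=[1.0, 0.0])
```

The hyperplane through `split`, orthogonal to the chosen direction, passes through the density valley and so avoids intersecting either high-density cluster.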
Maximum Margin Clustering for State Decomposition of Metastable Systems
When studying a metastable dynamical system, a prime concern is how to
decompose the phase space into a set of metastable states. Unfortunately, the
metastable state decomposition based on simulation or experimental data is
still a challenge. The most popular and simplest approach is geometric
clustering, which builds on classical clustering techniques.
However, the prerequisites of this approach are: (1) data are obtained from
simulations or experiments which are in global equilibrium and (2) the
coordinate system is appropriately selected. Recently, the kinetic clustering
approach based on phase space discretization and transition probability
estimation has drawn much attention due to its applicability to more general
cases, but the choice of discretization policy is a difficult task. In this
paper, a new decomposition method designated as maximum margin metastable
clustering is proposed, which converts the problem of metastable state
decomposition to a semi-supervised learning problem so that the large margin
technique can be utilized to search for the optimal decomposition without phase
space discretization. Moreover, several simulation examples are given to
illustrate the effectiveness of the proposed method.
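The core idea of casting decomposition as a semi-supervised large-margin problem can be sketched as follows: label only the frames that lie confidently inside each presumed metastable core, then let a max-margin classifier place the boundary between states. This is a minimal sketch of that idea, not the paper's full algorithm; the one-dimensional double-well trajectory and the core thresholds are assumptions made here:

```python
import numpy as np
from sklearn.svm import SVC

# synthetic trajectory from a 1-D double-well system: two metastable wells
rng = np.random.default_rng(0)
traj = np.concatenate([rng.normal(-1, 0.2, 300), rng.normal(1, 0.2, 300)])
X = traj.reshape(-1, 1)

# label only frames deep inside each presumed metastable core
core_a = X[traj < -1.0]   # confidently in the left well
core_b = X[traj > 1.0]    # confidently in the right well
X_lab = np.vstack([core_a, core_b])
y_lab = np.array([0] * len(core_a) + [1] * len(core_b))

# a max-margin (linear SVM) boundary decomposes the phase space
clf = SVC(kernel="linear", C=10.0).fit(X_lab, y_lab)
states = clf.predict(X)   # assign every frame, with no discretization
```

The margin-maximising boundary falls in the sparsely populated transition region between the wells, which is exactly where a metastable state boundary should lie.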
Anisotropic oracle inequalities in noisy quantization
The effect of errors in variables in quantization is investigated. We prove
general exact and non-exact oracle inequalities with fast rates for an
empirical minimization based on a noisy sample Z_i = X_i + ε_i, i = 1, ..., n,
where the X_i are i.i.d. with density f and the ε_i are i.i.d. with density η.
These rates depend on the geometry of the density f and the asymptotic
behaviour of the characteristic function of η.
This general study can be applied to the problem of k-means clustering with
noisy data. For this purpose, we introduce a deconvolution k-means stochastic
minimization which reaches fast rates of convergence under Pollard's standard
regularity assumptions.