6,977 research outputs found
A fast and recursive algorithm for clustering large datasets with k-medians
Clustering large samples of high-dimensional data with fast algorithms is an
important challenge in computational statistics. Borrowing ideas from MacQueen
(1967), who introduced a sequential version of the k-means algorithm, a new
class of recursive stochastic gradient algorithms designed for the k-medians
loss criterion is proposed. By their recursive nature, these algorithms are
very fast and are well adapted to deal with large samples of data that are
allowed to arrive sequentially. It is proved that the stochastic gradient
algorithm converges almost surely to the set of stationary points of the
underlying loss criterion. Particular attention is paid to the averaged
versions, which are known to perform better, and a data-driven
procedure that allows automatic selection of the value of the descent step is
proposed.
The performance of the averaged sequential estimator is compared in a
simulation study, in terms of both computation speed and accuracy of the
estimates, with more classical partitioning techniques such as k-means,
trimmed k-means and PAM (partitioning around medoids). Finally, this new
online clustering technique is illustrated by determining television audience
profiles from a sample of more than 5000 individual television audiences
measured every minute over a period of 24 hours.
Comment: Under revision for Computational Statistics and Data Analysis
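The recursive update described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `online_k_medians`, the step size `c / count ** alpha`, and the choice to average the whole centre matrix over all iterations are assumptions made here for illustration only. Each new observation moves its nearest centre by one step along the unit vector pointing toward the observation, which is the stochastic gradient direction for a sum-of-distances (k-medians) loss.

```python
import numpy as np

def online_k_medians(X, init_centers, c=1.0, alpha=0.75):
    """Single pass of a stochastic gradient k-medians update with averaging.

    Assumed (hypothetical) step size: c / n_r ** alpha, where n_r is the number
    of observations assigned so far to centre r.
    """
    centers = np.array(init_centers, dtype=float)
    averaged = centers.copy()                      # running mean of the iterates
    counts = np.zeros(len(centers), dtype=int)
    for n, x in enumerate(X, start=1):
        dists = np.linalg.norm(centers - x, axis=1)
        r = int(np.argmin(dists))                  # nearest centre gets updated
        if dists[r] > 0.0:                         # gradient undefined at distance 0
            counts[r] += 1
            step = c / counts[r] ** alpha
            # Move along the unit vector from the centre toward the observation.
            centers[r] += step * (x - centers[r]) / dists[r]
        # Averaged version: mean of all iterates seen so far.
        averaged += (centers - averaged) / n
    return centers, averaged
```

Because every observation is processed once and then discarded, the cost per observation is proportional to the number of centres times the dimension, which is what makes this kind of scheme suitable for data arriving sequentially.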
Polyhedral Predictive Regions For Power System Applications
Despite substantial improvement in the development of forecasting approaches,
conditional and dynamic uncertainty estimates ought to be accommodated in
decision-making in power system operation and markets, in order to yield either
cost-optimal decisions in expectation or decisions with probabilistic
guarantees. The representation of uncertainty serves as an interface between
forecasting and decision-making problems, with different approaches handling
various objects and their parameterization as input. Following substantial
developments based on scenario-based stochastic methods, robust and
chance-constrained optimization approaches have gained increasing attention.
These often rely on polyhedra as a representation of the convex envelope of
uncertainty. In this work, we aim to bridge the gap between the probabilistic
forecasting literature and such optimization approaches by generating forecasts
in the form of polyhedra with probabilistic guarantees. For that, we see
polyhedra as parameterized objects under alternative definitions (under
$L_1$ and $L_\infty$ norms), the parameters of which may be modelled and predicted.
We additionally discuss assessing the predictive skill of such multivariate
probabilistic forecasts. An application and related empirical investigation
results allow us to verify the probabilistic calibration and predictive skill of
our polyhedra.
Comment: 8 pages
Discussion of "Multivariate quantiles and multiple-output regression quantiles: From optimization to halfspace depth"
Discussion of "Multivariate quantiles and multiple-output regression
quantiles: From optimization to halfspace depth" by M. Hallin, D.
Paindaveine and M. Siman [arXiv:1002.4486]
Comment: Published at http://dx.doi.org/10.1214/09-AOS723B in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)