106 research outputs found
Interpretable Clustering using Unsupervised Binary Trees
We herein introduce a new method of interpretable clustering that uses
unsupervised binary trees. It is a three-stage procedure, the first stage of
which entails a series of recursive binary splits to reduce the heterogeneity
of the data within the new subsamples. During the second stage (pruning),
consideration is given to whether adjacent nodes can be aggregated. Finally,
during the third stage (joining), similar clusters are joined together, even if
they do not descend from the same node originally. Consistency results are
obtained, and the procedure is used on simulated and real data sets.
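The splitting stage described above lends itself to a short illustration. Below is a minimal Python sketch of that first stage only, assuming within-node inertia (sum of squared distances to the node mean) as the heterogeneity measure and axis-aligned splits; the function names and stopping rules are illustrative and not taken from the paper.

    import numpy as np

    def inertia(X):
        """Within-node heterogeneity: sum of squared distances to the node mean."""
        return ((X - X.mean(axis=0)) ** 2).sum()

    def best_split(X):
        """Axis-aligned split that most reduces within-node inertia, or None."""
        best, parent = None, inertia(X)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:
                left, right = X[X[:, j] <= t], X[X[:, j] > t]
                gain = parent - inertia(left) - inertia(right)
                if best is None or gain > best[0]:
                    best = (gain, j, t)
        return best

    def split_stage(X, depth=0, max_depth=3, min_size=10):
        """Stage 1 only: recursive binary splits; leaves are candidate clusters."""
        split = None if depth >= max_depth or len(X) < 2 * min_size else best_split(X)
        if split is None or split[0] <= 0:
            return [X]
        _, j, t = split
        left, right = X[X[:, j] <= t], X[X[:, j] > t]
        return (split_stage(left, depth + 1, max_depth, min_size)
                + split_stage(right, depth + 1, max_depth, min_size))

    # Two well-separated Gaussian blobs: the first split sends one blob down each branch.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    print([len(leaf) for leaf in split_stage(X)])

In the full procedure, the pruning and joining stages would then aggregate adjacent nodes and merge similar clusters that do not descend from the same node; they are omitted here for brevity.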
Aggregating density estimators: an empirical study
We present some new density estimation algorithms obtained by bootstrap aggregation, as in bagging. Our algorithms are analyzed and empirically compared to other methods found in the statistical literature, such as stacking and boosting for density estimation. We show by extensive simulations that ensemble learning is effective for density estimation, just as it is for classification. Although our algorithms do not always outperform other methods, some of them are as simple as bagging, more intuitive, and have a lower computational cost.
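As a concrete illustration of bootstrap aggregation for density estimation, the following Python sketch averages kernel density estimates fitted on bootstrap resamples of the data. It is one illustrative bagged estimator under assumed names (bagged_kde, with a Gaussian KDE base learner), not necessarily one of the algorithms studied here.

    import numpy as np
    from scipy.stats import gaussian_kde

    def bagged_kde(sample, grid, n_boot=50, seed=None):
        """Bagging for density estimation: average the kernel density estimates
        fitted on bootstrap resamples of the sample, evaluated on a fixed grid."""
        rng = np.random.default_rng(seed)
        density = np.zeros_like(grid, dtype=float)
        for _ in range(n_boot):
            resample = rng.choice(sample, size=len(sample), replace=True)
            density += gaussian_kde(resample)(grid)
        return density / n_boot

    # Estimate the density of a bimodal sample on a grid.
    rng = np.random.default_rng(1)
    sample = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 0.5, 200)])
    grid = np.linspace(-6, 6, 400)
    estimate = bagged_kde(sample, grid, n_boot=50, seed=0)
    print(estimate.sum() * (grid[1] - grid[0]))  # Riemann sum of the estimate, close to 1

Averaging over resamples smooths out the variability of any single estimate, which is the intuition behind comparing such ensembles with stacking and boosting.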
Prévisions par arbres de classification
We first present the construction of predictors by classification trees, then focus on the instability of this method and propose a methodology that relies on the bootstrap. A detailed empirical study illustrates this work.
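To make the bootstrap-based stabilisation concrete, here is a minimal sketch of bagged classification trees (one tree per bootstrap resample, prediction by majority vote). It assumes scikit-learn and illustrative names, and sketches the general idea rather than the exact methodology of the paper.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def bagged_trees_predict(X_train, y_train, X_test, n_boot=50, seed=None):
        """Fit one classification tree per bootstrap resample and predict
        by majority vote, which reduces the instability of a single tree."""
        rng = np.random.default_rng(seed)
        votes = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap indices
            tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
            votes.append(tree.predict(X_test))
        votes = np.stack(votes)
        return np.array([np.bincount(col).argmax() for col in votes.T])

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    pred = bagged_trees_predict(X_tr, y_tr, X_te, n_boot=50, seed=0)
    print((pred == y_te).mean())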
Time series sampling
This article is part of the proceedings of "The 8th International Conference on Time Series and Forecasting." Complex models are frequently employed to describe physical and mechanical phenomena. In this setting, we have an input X, which is a time series, and an output Y = f(X), where f is a very complicated function whose computational cost for every new input is very high. We are given two sets of observations of X, S1 and S2, of different sizes, such that only f(S1) is available. We tackle the problem of selecting a subsample S3 ⊂ S2 of smaller size on which to run the complex model f, such that the distribution of f(S3) is close to that of f(S1). We adapt five algorithms introduced in the previous work "Subsampling under Distributional Constraints" to this new framework and show their efficiency using time series data.
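As an illustration of the subsampling problem, here is a minimal sketch of one plausible greedy strategy: elements of S2 are added to S3 one at a time so that a per-series summary of the growing subsample stays close, in one-dimensional Wasserstein distance, to the same summary of S1. The function name, summary statistic, and distance are assumptions made for illustration; the algorithms actually adapted are the five from "Subsampling under Distributional Constraints".

    import numpy as np
    from scipy.stats import wasserstein_distance

    def greedy_subsample(S1, S2, k, summary=lambda x: x.mean(axis=1)):
        """Greedily pick k series from S2 whose summary-statistic distribution
        stays close (1-D Wasserstein) to that of S1."""
        target = summary(S1)
        feats = summary(S2)
        chosen, remaining = [], list(range(len(S2)))
        for _ in range(k):
            best_i, best_d = None, np.inf
            for i in remaining:
                d = wasserstein_distance(target, feats[chosen + [i]])
                if d < best_d:
                    best_i, best_d = i, d
            chosen.append(best_i)
            remaining.remove(best_i)
        return chosen

    # Synthetic "time series": rows are series, columns are time steps.
    rng = np.random.default_rng(2)
    S1 = rng.normal(0.0, 1.0, (100, 50))   # series for which f(S1) is already known
    S2 = rng.normal(0.5, 1.5, (300, 50))   # larger pool to subsample from
    S3 = S2[greedy_subsample(S1, S2, k=60)]
    print(wasserstein_distance(S1.mean(axis=1), S3.mean(axis=1)))

Only the inputs are used to choose S3, since f has not yet been run on S2; the expensive model is then evaluated on the selected S3 alone.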