
    Interpretable Clustering using Unsupervised Binary Trees

    We introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure: the first stage consists of a series of recursive binary splits that reduce the heterogeneity of the data within the new subsamples. In the second stage (pruning), adjacent nodes are examined to decide whether they can be aggregated. Finally, in the third stage (joining), similar clusters are merged, even if they did not originally descend from the same node. Consistency results are obtained, and the procedure is applied to simulated and real data sets.

    Comment: 25 pages, 6 figures
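
    A minimal sketch of the splitting stage may help fix ideas. The code below is an illustration, not the paper's implementation: it assumes heterogeneity is measured by the within-node sum of squared deviations and that splits are axis-aligned thresholds; the names (deviance, best_split, grow) and the stopping rule are choices of this sketch.

    import numpy as np

    def deviance(X):
        # Within-node heterogeneity: total squared deviation from the node mean.
        return ((X - X.mean(axis=0)) ** 2).sum()

    def best_split(X):
        # Exhaustive search for the axis-aligned split that most reduces deviance.
        best_j, best_t, best_gain = None, None, 0.0
        parent = deviance(X)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:
                left, right = X[X[:, j] <= t], X[X[:, j] > t]
                gain = parent - deviance(left) - deviance(right)
                if gain > best_gain:
                    best_j, best_t, best_gain = j, t, gain
        return best_j, best_t, best_gain

    def grow(X, min_size=10):
        # Stage one: recursive binary splits until no split reduces heterogeneity
        # or the node is too small; pruning and joining would operate on this tree.
        j, t, gain = best_split(X)
        if j is None or len(X) < 2 * min_size:
            return {"leaf": True, "n": len(X), "center": X.mean(axis=0)}
        return {"leaf": False, "feature": j, "threshold": t,
                "left": grow(X[X[:, j] <= t], min_size),
                "right": grow(X[X[:, j] > t], min_size)}

    # Grow the splitting tree on two Gaussian blobs; the pruning and joining
    # stages would then merge the over-split leaves back into two clusters.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    tree = grow(X)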

    Aggregating density estimators: an empirical study

    We present new density estimation algorithms obtained by bootstrap aggregation, in the manner of bagging. Our algorithms are analyzed and compared empirically with other methods from the statistical literature, such as stacking and boosting for density estimation. Extensive simulations show that ensemble learning is as effective for density estimation as it is for classification. Although our algorithms do not always outperform the other methods, some of them are as simple as bagging, more intuitive, and have a lower computational cost.
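
    As a sketch of bootstrap aggregation applied to densities (the simplest variant one could assume, not necessarily the paper's exact algorithms): fit one kernel density estimator per bootstrap resample and average them pointwise. Since each KDE integrates to one, the average is itself a valid density. The name bagged_kde is illustrative.

    import numpy as np
    from scipy.stats import gaussian_kde

    def bagged_kde(x, n_boot=50, seed=0):
        # Bagging for density estimation: fit a KDE on each bootstrap
        # resample and aggregate by averaging the estimated densities.
        rng = np.random.default_rng(seed)
        kdes = [gaussian_kde(rng.choice(x, size=len(x), replace=True))
                for _ in range(n_boot)]
        return lambda grid: np.mean([k(grid) for k in kdes], axis=0)

    # Bimodal example: evaluate the aggregated density on a grid.
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 0.5, 200)])
    grid = np.linspace(-6, 6, 200)
    density = bagged_kde(x)(grid)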

    Prévisions par arbres de classification

    After a presentation of the construction of classification-tree predictors, we focus on the instability of this method and propose a methodology based on the bootstrap. A detailed empirical study illustrates this work.
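
    A short sketch of the bootstrap idea described above, using bagged classification trees (scikit-learn is our choice for the illustration, not the paper's software): trees are fit on bootstrap resamples and their predictions aggregated by majority vote, which stabilizes the predictor.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # A single tree is unstable: small perturbations of the data can
    # change its structure and its predictions.
    single = DecisionTreeClassifier(random_state=0)

    # Bagging fits many trees on bootstrap resamples and aggregates
    # their predictions by majority vote.
    bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                               random_state=0)

    print("single tree :", cross_val_score(single, X, y, cv=5).mean())
    print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())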

    Time series sampling

    This article is part of the proceedings of "The 8th International Conference on Time Series and Forecasting." Complex models are frequently employed to describe physical and mechanical phenomena. In this setting, we have an input X, which is a time series, and an output Y = f(X), where f is a very complicated function whose computational cost for every new input is very high. We are given two sets of observations of X, S1 and S2, of different sizes, such that only f(S1) is available. We tackle the problem of selecting a subsample S3 ⊆ S2 of smaller size on which to run the complex model f, such that the distribution of f(S3) is close to that of f(S1). We adapt five algorithms introduced in the previous work "Subsampling under Distributional Constraints" to this new framework and show their efficiency using time series data.
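
    One plausible way to select S3, sketched below under our own assumptions (the paper's five algorithms are not reproduced here): greedily add candidates from S2 that keep the empirical energy distance to S1 small, treating each time series as a flattened vector. Matching input distributions is a proxy; closeness of S3 to S1 in input space is assumed to carry over to f(S3) and f(S1).

    import numpy as np
    from scipy.spatial.distance import cdist

    def energy_distance(A, B):
        # Empirical energy distance between two samples of flattened series.
        return 2 * cdist(A, B).mean() - cdist(A, A).mean() - cdist(B, B).mean()

    def greedy_subsample(S1, S2, k):
        # Greedily pick k rows of S2 so that the chosen subsample's
        # empirical distribution stays close to that of S1.
        chosen, remaining = [], list(range(len(S2)))
        for _ in range(k):
            scores = [energy_distance(S2[chosen + [i]], S1) for i in remaining]
            best = remaining[int(np.argmin(scores))]
            chosen.append(best)
            remaining.remove(best)
        return np.asarray(chosen)

    # Each row is one short series; keep 20 of 200 candidates, then run the
    # expensive model f only on the selected subsample S3.
    rng = np.random.default_rng(2)
    S1 = rng.normal(size=(100, 50))   # series for which f(S1) is already known
    S2 = rng.normal(size=(200, 50))   # candidate series, f not yet computed
    S3 = S2[greedy_subsample(S1, S2, k=20)]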