    Interpretable Clustering using Unsupervised Binary Trees

    We introduce a new method of interpretable clustering based on unsupervised binary trees. It is a three-stage procedure. The first stage applies a series of recursive binary splits to reduce the heterogeneity of the data within the resulting subsamples. During the second stage (pruning), we consider whether adjacent nodes can be aggregated. Finally, during the third stage (joining), similar clusters are joined together, even if they do not descend from the same node. Consistency results are obtained, and the procedure is applied to simulated and real data sets.
    Comment: 25 pages, 6 figures
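    A minimal sketch of the first (splitting) stage only, assuming that heterogeneity is measured by the within-node sum of squared deviations from the node mean and that each split is an axis-aligned threshold on one coordinate. The abstract does not state these choices, and the names `heterogeneity`, `best_split`, `grow`, `min_size`, and `min_gain` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def heterogeneity(points: List[Point]) -> float:
    """Within-node sum of squared deviations from the node mean (assumed measure)."""
    if not points:
        return 0.0
    dim = len(points[0])
    means = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    return sum((p[j] - means[j]) ** 2 for p in points for j in range(dim))


def best_split(points: List[Point], min_size: int) -> Optional[Tuple[int, float, float]]:
    """Return (coordinate, threshold, heterogeneity reduction) for the best
    axis-aligned split whose children both hold at least min_size points."""
    parent = heterogeneity(points)
    best: Optional[Tuple[int, float, float]] = None
    for j in range(len(points[0])):
        values = sorted({p[j] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [p for p in points if p[j] <= t]
            right = [p for p in points if p[j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue  # hypothetical minimum-node-size rule
            gain = parent - heterogeneity(left) - heterogeneity(right)
            if best is None or gain > best[2]:
                best = (j, t, gain)
    return best


def grow(points: List[Point], min_size: int = 2, min_gain: float = 1e-9) -> List[List[Point]]:
    """Recursively split until no admissible split reduces heterogeneity
    by more than min_gain; return the leaves (candidate clusters)."""
    split = best_split(points, min_size)
    if split is None or split[2] <= min_gain:
        return [points]
    j, t, _ = split
    left = [p for p in points if p[j] <= t]
    right = [p for p in points if p[j] > t]
    return grow(left, min_size, min_gain) + grow(right, min_size, min_gain)
```

    On two well-separated groups of three 2-D points, `grow` returns two leaves of three points each. The pruning and joining stages, which would then aggregate adjacent or similar leaves, are not sketched here.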

    A Multiscale Approach for Statistical Characterization of Functional Images

    Increasingly, scientific studies yield functional image data, in which the observed data consist of sets of curves recorded on the pixels of the image. Examples include temporal brain response intensities measured by fMRI and NMR frequency spectra measured at each pixel. This article presents a new methodology for improving the characterization of pixels in functional imaging, formulated as a spatial curve clustering problem. Our method treats each curve as a unit. It is nonparametric and involves multiple stages: (i) wavelet thresholding, aggregation, and Neyman truncation to effectively reduce dimensionality; (ii) clustering based on an extended EM algorithm; and (iii) multiscale penalized dyadic partitioning to create a spatial segmentation. We motivate the different stages with theoretical considerations and arguments, and illustrate the overall procedure on simulated and real datasets. Our method appears to offer substantial improvements over monoscale pixel-wise methods. An appendix giving some theoretical justification of the methodology, together with computer code, documentation, and the dataset, is available in the online supplements.
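    A minimal sketch of the wavelet-thresholding idea in stage (i): each pixel's curve is transformed into wavelet coefficients, and small detail coefficients are set to zero so that the curve is summarized by a few retained values. It assumes an orthonormal Haar basis, curves whose length is a power of two, and hard thresholding at a user-chosen level `lam`. The article's actual wavelet, threshold rule, aggregation, and Neyman truncation steps are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List

SQRT2 = math.sqrt(2.0)


def haar_forward(x: List[float]) -> List[float]:
    """Orthonormal Haar transform of a signal whose length is a power of two.
    Output layout: [approximation, coarsest detail, ..., finest details]."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(x)
    details: List[float] = []
    while len(approx) > 1:
        pairs = range(len(approx) // 2)
        d = [(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in pairs]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in pairs]
        details = d + details  # coarser levels go in front
    return approx + details


def haar_inverse(c: List[float]) -> List[float]:
    """Invert haar_forward, rebuilding the signal one level at a time."""
    approx = c[:1]
    pos = 1
    while pos < len(c):
        d = c[pos:pos + len(approx)]
        nxt: List[float] = []
        for a, b in zip(approx, d):
            nxt.extend(((a + b) / SQRT2, (a - b) / SQRT2))
        approx = nxt
        pos += len(d)
    return approx


def hard_threshold(c: List[float], lam: float) -> List[float]:
    """Keep the approximation coefficient; zero detail coefficients with |d| <= lam."""
    return c[:1] + [v if abs(v) > lam else 0.0 for v in c[1:]]
```

    For a noisy step curve such as `[4.1, 3.9, 4.0, 4.0, 0.1, -0.1, 0.0, 0.0]`, thresholding at `lam = 0.5` leaves only two nonzero coefficients, and the inverse transform recovers the clean step `[4, 4, 4, 4, 0, 0, 0, 0]`. The retained coefficients, rather than the raw samples, would then feed the clustering stage.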

    Exploiting Recurring Patterns to Improve Scalability of Parking Availability Prediction Systems

    Parking Guidance and Information (PGI) systems aim to help drivers find suitable parking spaces, including by predicting availability at the driver’s Estimated Time of Arrival (ETA) from information about the general parking availability situation. To make these predictions, most proposals in the literature on on-street parking train a separate model for each road segment, which raises significant scalability issues when deploying a city-wide PGI. By investigating a real dataset, we found that on-street parking dynamics show high temporal auto-correlation. In this paper we present a new processing pipeline that exploits these recurring trends to improve scalability. The proposal includes two steps that reduce both the number of required models and the number of training examples. The effectiveness of the proposed pipeline has been assessed empirically on a real dataset of on-street parking availability from San Francisco (USA). Results show that the proposal provides parking predictions whose accuracy is comparable to state-of-the-art solutions based on one model per road segment, while requiring only a fraction of the training cost, and is therefore more likely to scale to city-wide scenarios.
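    The abstract's key observation is that on-street availability shows high temporal auto-correlation. A minimal sketch of how that could be measured for a single road segment, using the standard sample autocorrelation function; the input series, the `dominant_lag` helper, and the idea of scanning candidate lags are illustrative assumptions, not the authors' pipeline.

```python
from typing import Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(x))")
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        raise ValueError("constant series has undefined autocorrelation")
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return num / denom


def dominant_lag(x: Sequence[float], max_lag: int) -> int:
    """Hypothetical helper: the lag in 1..max_lag with the highest sample autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(x, k))
```

    For a toy series that repeats with period 4, such as `[1.0, 2.0, 3.0, 4.0] * 6`, `dominant_lag(x, 6)` returns 4. On hourly availability data, a peak near lag 24 or 168 would suggest daily or weekly recurrence, which is the kind of regularity a pipeline could exploit to share models or training data across time.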