169 research outputs found

    Agglomerative Clustering with Threshold Optimization via Extreme Value Theory

    Clustering is a critical part of many tasks and, in most applications, the number of clusters in the data is unknown and must be estimated. This paper presents an Extreme Value Theory-based approach to threshold selection for clustering, proving that the “correct” linkage distances must follow a Weibull distribution for smooth feature spaces. Deep networks and their associated deep features have transformed many aspects of learning, and this paper shows they are consistent with our extreme-linkage theory and provide Unreasonable Clusterability. We show how our novel threshold selection can be applied to both classic agglomerative clustering and the more recent FINCH (First Integer Neighbor Clustering Hierarchy) algorithm. Our evaluation utilizes over a dozen different large-scale vision datasets/subsets, including multiple face-clustering datasets and ImageNet for both in-domain and, more importantly, out-of-domain object clustering. Across multiple deep-feature clustering tasks with very different characteristics, our novel automated threshold selection performs well, often outperforming state-of-the-art clustering techniques even when they select parameters on the test set.
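To illustrate why the linkage-distance threshold matters in the abstract above, here is a minimal sketch of single-linkage agglomerative clustering cut at a fixed threshold. It is not the paper's method: the threshold is supplied by hand rather than selected through an Extreme Value Theory fit, and the names `single_linkage_clusters` and `threshold` are hypothetical.

```python
import math
from itertools import combinations


def single_linkage_clusters(points, threshold):
    """Label points by cutting a single-linkage hierarchy at `threshold`.

    Cutting a single-linkage dendrogram at a distance is equivalent to
    taking the connected components of the graph that links every pair
    of points whose distance is at most that threshold.
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the root of i's component, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Relabel roots as consecutive cluster ids in order of first appearance.
    labels, ids = [], {}
    for i in range(len(points)):
        labels.append(ids.setdefault(find(i), len(ids)))
    return labels
```

The number of clusters is never passed in; it follows entirely from the threshold, which is why choosing that threshold automatically is the central problem.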

    Unsupervised amplitude and texture based classification of SAR images with multinomial latent model

    We combine both amplitude and texture statistics of Synthetic Aperture Radar (SAR) images for classification purposes. We use the Nakagami density to model the class amplitudes and a non-Gaussian Markov Random Field (MRF) texture model with t-distributed regression error to model the textures of the classes. A non-stationary Multinomial Logistic (MnL) latent class label model is used as a mixture density to obtain spatially smooth class segments. The Classification Expectation-Maximization (CEM) algorithm is used to estimate the class parameters and to classify the pixels. We use the Integrated Classification Likelihood (ICL) criterion to determine the number of classes in the model. We obtained classification results for water, land and urban areas in both supervised and unsupervised cases on TerraSAR-X and COSMO-SkyMed data.
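As a small illustration of the amplitude model named in this abstract, the sketch below evaluates the standard Nakagami density and estimates its two parameters with the textbook method of moments. This is an assumption-laden aside: the paper estimates parameters with CEM, not with this estimator, and the function names `nakagami_pdf` and `fit_nakagami_moments` are hypothetical.

```python
import math


def nakagami_pdf(x, m, omega):
    """Nakagami density with shape m >= 0.5 and spread omega > 0.

    f(x) = 2 m^m / (Gamma(m) omega^m) * x^(2m - 1) * exp(-m x^2 / omega)
    """
    if m < 0.5 or omega <= 0:
        raise ValueError("require m >= 0.5 and omega > 0")
    if x < 0:
        return 0.0
    coef = 2.0 * m**m / (math.gamma(m) * omega**m)
    return coef * x ** (2.0 * m - 1.0) * math.exp(-m * x * x / omega)


def fit_nakagami_moments(samples):
    """Method-of-moments estimate (not the CEM procedure of the paper).

    omega is the mean of x^2; m is that mean squared divided by the
    variance of x^2.
    """
    squares = [s * s for s in samples]
    omega = sum(squares) / len(squares)
    variance = sum((q - omega) ** 2 for q in squares) / len(squares)
    return omega * omega / variance, omega
```

With `m = 1` the density reduces to a Rayleigh density, which gives a quick sanity check on the implementation.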


    Operational and environmental variance can skew reliability metrics and increase uncertainty around lifetime estimates. For this reason, fleet-wide analysis is often too general for accurate predictions on heterogeneous populations. In addition, modern sensor-based reliability and maintainability field and test data provide a higher level of specialization and disaggregation for relevant integrity metrics (e.g., amount of damage, remaining useful life). Modern advances, such as Dynamic Bayesian Networks, reduce uncertainty on a unit-by-unit basis to support condition-based maintenance. This thesis presents a methodology for leveraging covariate information to identify sub-populations. This population-segmentation-based methodology reduces fleet uncertainty for more practical resource allocation and scheduled maintenance. First, the author proposes, validates, and demonstrates a clustering-based methodology. Afterwards, the author proposes the application of the Student-T Mixture Model (SMM) within the methodology as a versatile tool for modeling fleets with unclear sub-population boundaries. SMM’s fully Bayesian formulation, which is approximated with Variational Bayes (VB), is motivated and discussed. The scope of this research includes a new modeling approach, a proposed algorithm, and example applications.
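To make the idea of soft sub-population membership in this abstract concrete, the sketch below computes the membership probabilities of one observation under a univariate Student-t mixture with fixed, known parameters. It is only an illustration: the thesis uses a fully Bayesian formulation approximated with Variational Bayes, which is not reproduced here, and the names `student_t_pdf` and `responsibilities` are hypothetical.

```python
import math


def student_t_pdf(x, mu, sigma, nu):
    """Location-scale Student-t density with nu degrees of freedom."""
    z = (x - mu) / sigma
    # Work in log space so large nu does not overflow the gamma function.
    log_norm = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
    )
    return math.exp(log_norm - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def responsibilities(x, weights, components):
    """Posterior probability that x belongs to each mixture component.

    `components` is a list of (mu, sigma, nu) tuples and `weights` the
    matching mixing proportions.
    """
    weighted = [
        w * student_t_pdf(x, mu, sigma, nu)
        for w, (mu, sigma, nu) in zip(weights, components)
    ]
    total = sum(weighted)
    return [v / total for v in weighted]
```

Because the Student-t tails are heavy, an observation far from every component still receives a graded membership rather than being forced into one group, which suits fleets with unclear sub-population boundaries.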