
    Element-centric clustering comparison unifies overlaps and hierarchy

    Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases that undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests a far-reaching impact of our framework throughout all areas of science.
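    As a rough illustration of the element-centric idea, restricted to the special case of disjoint clusterings, each element can be scored by how similar its set of co-clustered elements is under the two clusterings. The sketch below uses a plain Jaccard overlap in place of the paper's affinity-based construction, so the function names and the averaging scheme are illustrative assumptions, not the published measure.

```python
def co_cluster_sets(labels):
    """Map each element index to the set of elements sharing its cluster."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, set()).add(i)
    return {i: clusters[lab] for i, lab in enumerate(labels)}

def element_centric_similarity(labels_a, labels_b):
    """Average per-element Jaccard similarity of co-cluster neighbourhoods.

    A simplified stand-in for an element-centric comparison of two
    disjoint clusterings given as flat label lists over the same elements.
    """
    sets_a = co_cluster_sets(labels_a)
    sets_b = co_cluster_sets(labels_b)
    scores = []
    for i in range(len(labels_a)):
        inter = len(sets_a[i] & sets_b[i])
        union = len(sets_a[i] | sets_b[i])
        scores.append(inter / union)
    return sum(scores) / len(scores)
```

    Note that identical clusterings score 1.0 regardless of how the cluster labels are permuted, and the per-element scores localize exactly where two clusterings disagree, which is the kind of element-level insight the abstract refers to.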

    A New Approach to Time Domain Classification of Broadband Noise in Gravitational Wave Data

    Broadband noise in gravitational wave (GW) detectors, also known as triggers, can often be a deterrent to the efficiency with which astrophysical search pipelines detect sources. It is important to understand their instrumental or environmental origin so that they can be eliminated or accounted for in the data. Since the number of triggers is large, data mining approaches such as clustering and classification are useful tools for this task. Classification of triggers based on a handful of discrete properties has been done in the past. A rich information content is available in the waveform, or 'shape', of the triggers that has had rather restricted exploration so far. This paper presents a new way to classify triggers, deriving information from both the trigger waveforms and their discrete physical properties, using a sequential combination of the Longest Common Sub-Sequence (LCSS) and LCSS coupled with Fast Time Series Evaluation (FTSE) for waveform classification, and the multidimensional hierarchical classification (MHC) analysis for grouping based on physical properties. A generalized k-means algorithm is used with the LCSS (and LCSS+FTSE) for clustering the triggers, using a validity measure to determine the correct number of clusters in the absence of any prior knowledge. The results are demonstrated by simulations and by application to a segment of real LIGO data from the sixth science run. Comment: 16 pages, 16 figures
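    The core of the waveform comparison is the LCSS measure, which is a standard dynamic program. A minimal sketch, without the FTSE speed-up and with an assumed matching tolerance `eps` that is not specified in the abstract, might look like this:

```python
def lcss_length(x, y, eps):
    """Longest Common Sub-Sequence length for two real-valued series.

    Two samples are considered matching when they differ by less than
    `eps`; the standard O(len(x) * len(y)) dynamic program fills the table.
    """
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) < eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def lcss_distance(x, y, eps=0.1):
    """Normalised dissimilarity in [0, 1]; 0 means one series is an
    eps-matching subsequence of the other."""
    return 1.0 - lcss_length(x, y, eps) / min(len(x), len(y))
```

    A distance like this can then be plugged into a generalized (medoid-style) k-means over trigger waveforms; FTSE is an acceleration of the same computation, so the clustering logic is unchanged.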

    Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection

    There has been a recent emergence of sampling-based techniques for estimating epistemic uncertainty in deep neural networks. While these methods can be applied to classification or semantic segmentation tasks by simply averaging samples, this is not the case for object detection, where detection sample bounding boxes must be accurately associated and merged. A weak merging strategy can significantly degrade the performance of the detector and yield an unreliable uncertainty measure. This paper provides the first in-depth investigation of the effect of different association and merging strategies. We compare different combinations of three spatial and two semantic affinity measures with four clustering methods for MC Dropout with a Single Shot Multi-Box Detector. Our results show that the correct choice of affinity-clustering combination can greatly improve the effectiveness of the classification and spatial uncertainty estimation and the resulting object detection performance. We base our evaluation on a new mix of datasets that emulate near open-set conditions (semantically similar unknown classes), distant open-set conditions (semantically dissimilar unknown classes) and the common closed-set conditions (only known classes). Comment: to appear in IEEE International Conference on Robotics and Automation 2019 (ICRA 2019).
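    To make the association-and-merging problem concrete, here is a hedged sketch using one spatial affinity (intersection-over-union) with a simple greedy, BSAS-style clustering of sampled boxes. The 0.5 threshold and the coordinate-averaging merge are illustrative choices, not the specific strategies evaluated in the paper.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def cluster_and_merge(boxes, iou_thresh=0.5):
    """Greedy BSAS-style association: assign each sampled box to the first
    cluster whose running mean box overlaps it enough, else open a new
    cluster; merge each cluster by coordinate-wise averaging."""
    clusters = []  # list of lists of member boxes
    for box in boxes:
        for members in clusters:
            mean = [sum(c[k] for c in members) / len(members) for k in range(4)]
            if iou(box, mean) >= iou_thresh:
                members.append(box)
                break
        else:
            clusters.append([box])
    return [tuple(sum(c[k] for c in members) / len(members) for k in range(4))
            for members in clusters]
```

    In a full MC Dropout pipeline the same association step would also aggregate the per-box class score samples, which is where the semantic affinity measures mentioned in the abstract come into play.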

    An empirical comparison of classification algorithms for diagnosis of depression from brain sMRI scans

    To be diagnostically effective, structural magnetic resonance imaging (sMRI) must reliably distinguish a depressed individual from a healthy individual at the individual scan level. One of the tasks in the automated diagnosis of depression from brain sMRI is classification: it determines the class to which a sample belongs (i.e., depressed/not depressed, remitted/not-remitted depression) based on the values of its features. Thus far, very few works have been reported on identifying a suitable classification algorithm for depression detection. In this paper, different types of classification algorithms are compared for effective diagnosis of depression. Ten independent classification schemas are applied and a comparative study is carried out. The algorithms are: Naïve Bayes, Support Vector Machines (SVM) with Radial Basis Function (RBF), SVM Sigmoid, J48, Random Forest, Random Tree, Voting Feature Intervals (VFI), LogitBoost, Classification Via Clustering with Simple KMeans, and Classification Via Clustering with Expectation Maximization (EM). The performances of the algorithms are determined through a set of experiments on sMRI brain scans. An experimental procedure is developed to measure the performance of the tested algorithms. A classification accuracy evaluation method is employed for evaluation and comparison of the performance of the examined classifiers.
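    The comparison protocol itself is straightforward: train each candidate classifier on labelled scans, predict on held-out samples, and rank the algorithms by accuracy. The following minimal sketch uses a stand-in nearest-centroid classifier (not one of the ten algorithms above) purely to illustrate the fit/predict/score loop on toy feature vectors.

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def nearest_centroid_fit(X, y):
    """Compute the per-class mean feature vector (the 'centroid')."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, X):
    """Assign each sample to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(x, centroids[lab]))
            for x in X]
```

    In the study, the same accuracy computation would simply be repeated for each of the ten classifiers over the same sMRI-derived feature sets, making the results directly comparable.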

    Groundwater quality assessment: an improved approach to K-means clustering, principal component analysis and spatial analysis: a case study

    K-means clustering and principal component analysis (PCA) are widely used in water quality analysis and management. Nevertheless, numerous studies have pointed out that K-means with the squared Euclidean distance is not suitable for high-dimensional datasets. We evaluate a methodology (K-means based on PCA) for water quality evaluation. It uses PCA to reduce the dataset from high dimensional to low dimensional in order to improve K-means clustering. For this, a large dataset of 28 hydrogeochemical variables and 582 wells in a coastal aquifer is classified with both high-dimensional K-means clustering and K-means clustering based on PCA. The proposed method achieved increased cluster cohesion according to the average Silhouette index, which ranged from 0.13 for high-dimensional K-means clustering to 5.94 for K-means based on PCA, and a practical spatial geographic information systems (GIS) evaluation of the clustering indicates higher-quality results for K-means clustering based on PCA. K-means based on PCA identified three hydrogeochemical classes and their sources: high salinity was attributed to seawater intrusion and the mineralization process, high levels of heavy metals were related to domestic-industrial wastewater discharge, and low heavy-metal concentrations were associated with point-source discharges of industrial wastewater. This approach allowed the demarcation of natural and anthropogenic variation sources in the aquifer and provided greater certainty and accuracy to the data classification.
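    The pipeline described above, PCA for dimensionality reduction followed by K-means on the reduced coordinates, can be sketched as follows. The SVD-based projection and the plain Lloyd-style K-means (with an assumed random initialisation and iteration cap) are illustrative simplifications, not the study's exact configuration.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project data onto the top principal components via SVD of the
    centred data matrix (rows are samples, columns are variables)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

    For a dataset like the one described, one would call `pca_reduce` on the 582-by-28 matrix of standardised hydrogeochemical variables and then run `kmeans` with k = 3 on the reduced coordinates; the Silhouette index would then be computed on the resulting labels to compare against plain high-dimensional K-means.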