Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests far-reaching impact of our framework throughout all areas
of science.
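The element-centric philosophy can be illustrated with a deliberately simplified sketch: for two disjoint clusterings, compare each element's set of co-clustered elements across the clusterings and average the per-element agreement. This Jaccard-based variant is an illustration only; the measure proposed in the paper builds element affinities from a diffusion process and also handles overlapping and hierarchical clusterings.

```python
def element_centric_similarity(c1, c2):
    """Compare two disjoint clusterings element by element.

    c1, c2 map each element to a cluster label.  For every element we
    compare the set of elements it shares a cluster with in each
    clustering (Jaccard similarity) and average over all elements.
    Simplified sketch; not the diffusion-based measure of the paper.
    """
    def co_members(labels):
        groups = {}
        for elem, lab in labels.items():
            groups.setdefault(lab, set()).add(elem)
        # each element maps to the full membership set of its cluster
        return {e: groups[lab] for e, lab in labels.items()}

    m1, m2 = co_members(c1), co_members(c2)
    scores = [len(m1[e] & m2[e]) / len(m1[e] | m2[e]) for e in c1]
    return sum(scores) / len(scores)

c1 = {"a": 0, "b": 0, "c": 1, "d": 1}
c2 = {"a": 0, "b": 0, "c": 0, "d": 1}
print(element_centric_similarity(c1, c1))  # identical clusterings -> 1.0
```

Because the score is assembled per element, it also shows *which* elements drive the disagreement, the kind of insight a single cluster-level number cannot give.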
A New Approach to Time Domain Classification of Broadband Noise in Gravitational Wave Data
Broadband noise in gravitational wave (GW) detectors, also known as triggers,
can often be a deterrent to the efficiency with which astrophysical search
pipelines detect sources. It is important to understand their instrumental or
environmental origin so that they could be eliminated or accounted for in the
data. Since the number of triggers is large, data mining approaches such as
clustering and classification are useful tools for this task. Classification of
triggers based on a handful of discrete properties has been done in the past. A
rich information content is available in the waveform or 'shape' of the
triggers that has had a rather restricted exploration so far. This paper
presents a new way to classify triggers deriving information from both trigger
waveforms as well as their discrete physical properties using a sequential
combination of the Longest Common Sub-Sequence (LCSS) and LCSS coupled with
Fast Time Series Evaluation (FTSE) for waveform classification and the
multidimensional hierarchical classification (MHC) analysis for the grouping
based on physical properties. A generalized k-means algorithm is used with the
LCSS (and LCSS+FTSE) for clustering the triggers using a validity measure to
determine the correct number of clusters in absence of any prior knowledge. The
results have been demonstrated by simulations and by application to a segment
of real LIGO data from the sixth science run.
Comment: 16 pages, 16 figures
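The LCSS similarity at the core of the waveform clustering step admits a compact dynamic-programming sketch. The epsilon tolerance, variable names, and distance normalization below are illustrative choices, and the FTSE acceleration used in the paper is omitted.

```python
def lcss_length(x, y, eps=0.1):
    """Longest Common Sub-Sequence length for two real-valued series.

    Two samples "match" when they differ by less than eps; this is the
    standard O(len(x) * len(y)) dynamic program.  (FTSE, which the
    paper couples with LCSS for speed, is not sketched here.)
    """
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) < eps:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def lcss_distance(x, y, eps=0.1):
    """Turn the LCSS length into a distance in [0, 1], suitable as the
    dissimilarity inside a generalized k-means loop."""
    return 1.0 - lcss_length(x, y, eps) / min(len(x), len(y))

a = [0.0, 0.1, 0.5, 0.9]
b = [0.0, 0.12, 0.5, 0.9]
print(lcss_distance(a, b))  # near-identical waveforms -> 0.0
```

A generalized k-means then only needs this distance plus a medoid-style center update, since LCSS distances do not live in a vector space.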
Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection
There has been a recent emergence of sampling-based techniques for estimating
epistemic uncertainty in deep neural networks. While these methods can be
applied to classification or semantic segmentation tasks by simply averaging
samples, this is not the case for object detection, where detection sample
bounding boxes must be accurately associated and merged. A weak merging
strategy can significantly degrade the performance of the detector and yield an
unreliable uncertainty measure. This paper provides the first in-depth
investigation of the effect of different association and merging strategies. We
compare different combinations of three spatial and two semantic affinity
measures with four clustering methods for MC Dropout with a Single Shot
Multi-Box Detector. Our results show that the correct choice of
affinity-clustering combination can greatly improve the effectiveness of the
classification and spatial uncertainty estimation and the resulting object
detection performance. We base our evaluation on a new mix of datasets that
emulate near open-set conditions (semantically similar unknown classes),
distant open-set conditions (semantically dissimilar unknown classes) and the
common closed-set conditions (only known classes).
Comment: to appear in IEEE International Conference on Robotics and Automation 2019 (ICRA 2019)
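A minimal sketch of one association-and-merging choice of the kind compared in this paper: greedy clustering of sampled detection boxes by IoU (a spatial affinity), followed by coordinate-wise averaging of each cluster. The threshold and the greedy scheme are illustrative assumptions, not the paper's specific configuration, and semantic affinities are left out.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cluster_and_merge(boxes, iou_thr=0.5):
    """Greedy association: assign each sampled box to the first cluster
    whose running mean box it overlaps above iou_thr, then merge each
    cluster by coordinate-wise averaging.  The spread of boxes within a
    cluster is what yields the spatial uncertainty estimate."""
    clusters = []  # list of lists of boxes
    for box in boxes:
        for c in clusters:
            merged = [sum(v) / len(c) for v in zip(*c)]
            if iou(box, merged) >= iou_thr:
                c.append(box)
                break
        else:
            clusters.append([box])
    return [tuple(sum(v) / len(c) for v in zip(*c)) for c in clusters]

samples = [(0, 0, 10, 10), (1, 0, 10, 10), (50, 50, 60, 60)]
print(cluster_and_merge(samples))  # two merged boxes
```

A weak affinity or threshold here would fuse distinct objects or split one object across clusters, which is exactly how a poor merging strategy corrupts both the detections and the uncertainty estimate.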
An empirical comparison of classification algorithms for diagnosis of depression from brain sMRI scans
To be diagnostically effective, structural magnetic resonance imaging (sMRI) must reliably distinguish a depressed individual from a healthy individual at the level of individual scans. One task in the automated diagnosis of depression from brain sMRI is classification: determining the class to which a sample belongs (e.g., depressed/not depressed, remitted/not-remitted depression) based on the values of its features. Thus far, very few works have been reported on identifying a suitable classification algorithm for depression detection. In this paper, different types of classification algorithms are compared for effective diagnosis of depression. Ten independent classification schemes are applied and a comparative study is carried out. The algorithms are: Naïve Bayes, Support Vector Machines (SVM) with a Radial Basis Function (RBF) kernel, SVM with a sigmoid kernel, J48, Random Forest, Random Tree, Voting Feature Intervals (VFI), LogitBoost, Classification Via Clustering with Simple KMeans (KMeans), and Classification Via Clustering with Expectation Maximization (EM). The performance of each algorithm is determined through a set of experiments on sMRI brain scans, following an experimental procedure developed for this purpose. Classification accuracy was employed to evaluate and compare the performance of the examined classifiers.
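The comparison protocol reduces to scoring every classifier with the same accuracy metric on the same held-out scans. A toy harness along those lines, where the two classifiers are illustrative stand-ins rather than any of the ten algorithms actually tested:

```python
def accuracy(classifier, data):
    """Fraction of (features, label) pairs the classifier predicts correctly."""
    correct = sum(1 for x, y in data if classifier(x) == y)
    return correct / len(data)

# Illustrative stand-ins for real classification schemes.
def majority_baseline(x):
    return 0  # always predicts the majority class

def threshold_rule(x):
    return 1 if x[0] > 0.5 else 0  # one-feature decision stump

# toy held-out set: (features, label) pairs
test_set = [((0.9,), 1), ((0.2,), 0), ((0.7,), 1), ((0.1,), 0)]
results = {"baseline": accuracy(majority_baseline, test_set),
           "stump": accuracy(threshold_rule, test_set)}
print(results)  # {'baseline': 0.5, 'stump': 1.0}
```

Evaluating all candidates against the identical split and metric is what makes the resulting ranking of algorithms comparable.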
Groundwater quality assessment: an improved approach to K-means clustering, principal component analysis and spatial analysis: a case study
K-means clustering and principal component analysis (PCA) are widely used in water quality analysis and management. Nevertheless, numerous studies have pointed out that K-means with the squared Euclidean distance is not suitable for high-dimensional datasets. We evaluate a methodology (K-means based on PCA) for water quality assessment, in which PCA reduces the dataset from high to low dimensionality to improve K-means clustering. To this end, a large dataset of 28 hydrogeochemical variables measured at 582 wells in the coastal aquifer is classified with both high-dimensional K-means clustering and K-means clustering based on PCA. The proposed method achieved improved cluster cohesion according to the average Silhouette index, which rose from 0.13 for high-dimensional K-means clustering to 5.94 for K-means based on PCA, and a practical spatial evaluation in a geographic information system (GIS) indicates higher-quality results for K-means clustering based on PCA. K-means based on PCA identified three hydrogeochemical classes and their sources: high salinity was attributed to seawater intrusion and mineralization processes, high levels of heavy metals were related to domestic-industrial wastewater discharge, and low heavy-metal concentrations were associated with point-source industrial wastewater discharges. This approach allowed the demarcation of natural and anthropogenic variation sources in the aquifer and provided greater certainty and accuracy to the data classification.
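The pipeline described here, PCA for dimensionality reduction followed by K-means on the reduced coordinates, can be sketched as below. The synthetic two-group data is a stand-in for the 582-well, 28-variable hydrogeochemical table, and the plain NumPy PCA and Lloyd's-iteration K-means are simplified assumptions rather than the study's implementation.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project centered data onto its leading principal components."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return Xc @ vecs[:, order]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# synthetic stand-in: two well-separated groups of 28-variable samples
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 28)), rng.normal(5, 1, (50, 28))])
labels = kmeans(pca_reduce(X, 2), k=2)
```

Running K-means in the reduced space is what the study credits with the tighter, more cohesive clusters; in practice one would also check the per-component explained variance before fixing `n_components`.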