A Short Survey on Data Clustering Algorithms
With rapidly increasing data, clustering algorithms are important tools for
data analytics in modern research. They have been successfully applied to a
wide range of domains, for instance bioinformatics, speech recognition, and
financial analysis. Formally speaking, given a set of data instances, a
clustering algorithm is expected to divide the set of data instances into
subsets that maximize intra-subset similarity and inter-subset
dissimilarity, where a similarity measure is defined beforehand. In this work,
state-of-the-art clustering algorithms are reviewed from design concept to
methodology, and different clustering paradigms are discussed. Advanced clustering
algorithms are also discussed. After that, existing clustering evaluation
metrics are reviewed. A summary with future insights is provided at the end.
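As an illustration of the criterion stated above (high similarity within subsets, high dissimilarity between them), the sketch below computes the mean silhouette coefficient, one widely used internal evaluation metric; whether the survey covers this particular metric is not stated here. It assumes Euclidean distance as the dissimilarity measure, and the function name `silhouette` is hypothetical.

```python
import math
from collections import defaultdict


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (hypothetical helper).

    For each point i:
      a = mean distance to the other members of its own cluster (intra-subset),
      b = smallest mean distance to the members of any other cluster (inter-subset),
      s(i) = (b - a) / max(a, b), and s(i) = 0 for points in singleton clusters.
    Values near 1 mean tight, well-separated clusters; negative values mean
    points sit closer to another cluster than to their own.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

On two well-separated groups, a labelling that follows the groups scores close to 1, while a labelling that mixes them scores below 0.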
Gravitational Clustering: A Simple, Robust and Adaptive Approach for Distributed Networks
Distributed signal processing for wireless sensor networks enables
different devices to cooperate on different signal processing tasks. A
crucial first step is to answer the question: who observes what? Recently,
several distributed algorithms have been proposed that frame the
signal/object labelling problem in terms of cluster analysis after extracting
source-specific features; however, these algorithms assume that the number of
clusters is known. We propose a new method called Gravitational Clustering (GC) to
adaptively estimate the time-varying number of clusters based on a set of
feature vectors. The key idea is to exploit the physical principle of
gravitational force between mass units: streaming-in feature vectors are
considered as mass units of fixed position in the feature space, around which
mobile mass units are injected at each time instant. The cluster enumeration
exploits the fact that the highest attraction on the mobile mass units is
exerted by regions with a high density of feature vectors, i.e., gravitational
clusters. By sharing estimates among neighboring nodes via a
diffusion-adaptation scheme, cooperative and distributed cluster enumeration is
achieved. Numerical experiments concerning robustness against outliers,
convergence and computational complexity are conducted. The application in a
distributed cooperative multi-view camera network illustrates the applicability
to real-world problems. Comment: 12 pages, 9 figures
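The attraction mechanism described above can be illustrated with a toy, centralized sketch. It is not the authors' GC algorithm: it processes one static batch and has no streaming, injection schedule or diffusion-adaptation step, and the softening parameter, merge tolerance and function name `gravitational_modes` are all hypothetical. Fixed masses sit at the feature vectors, one mobile mass starts at each feature vector and follows the net softened inverse-square force until it settles, and mobile masses that settle together are counted as one cluster.

```python
import math


def gravitational_modes(points, softening=1.0, max_iter=500, tol=1e-6, merge_tol=0.5):
    """Toy, centralized illustration of gravitational cluster enumeration.

    NOT the paper's GC algorithm (no streaming, no diffusion between nodes);
    all parameter names and defaults here are hypothetical.

    Fixed masses sit at the feature vectors x_j. A mobile mass at y feels
        F(y) = sum_j w_j(y) * (x_j - y),  w_j(y) = (|x_j - y|^2 + softening^2)^(-3/2),
    i.e. a softened inverse-square attraction. The step y <- y + F(y) / sum_j w_j(y)
    lands exactly on the w-weighted mean of the fixed masses, a mean-shift-style
    fixed-point iteration toward points where the net force vanishes.
    """
    dim = len(points[0])
    settled = []
    for start in points:  # one mobile mass injected at each feature vector
        y = list(start)
        for _ in range(max_iter):
            weights = [(math.dist(x, y) ** 2 + softening ** 2) ** -1.5 for x in points]
            total = sum(weights)
            new_y = [sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dim)]
            shift = math.dist(new_y, y)
            y = new_y
            if shift < tol:
                break
        settled.append(y)

    # Mobile masses that settle within merge_tol of each other form one cluster.
    modes = []
    for y in settled:
        if not any(math.dist(y, m) < merge_tol for m in modes):
            modes.append(y)
    return modes
```

The number of returned modes is the cluster-count estimate. With `softening` comparable to the spread inside a cluster, two well-separated groups of points yield two settling points; a much smaller value can let mobile masses stop at individual feature vectors and overcount clusters.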
Dynamic Fuzzy c-Means (dFCM) Clustering and its Application to Calorimetric Data Reconstruction in High Energy Physics
In high energy physics experiments, calorimetric data reconstruction requires
a suitable clustering technique in order to obtain accurate information about
the shower characteristics such as position of the shower and energy
deposition. Fuzzy clustering techniques have high potential in this regard, as
they assign data points to more than one cluster, thereby acting as a tool to
distinguish between overlapping clusters. Fuzzy c-means (FCM) is one such
clustering technique that can be applied to calorimetric data reconstruction.
However, it has a drawback: it cannot easily identify and distinguish clusters
that are not uniformly spread. A version of the FCM algorithm called dynamic
fuzzy c-means (dFCM) allows clusters to be generated and eliminated as
required, with the ability to resolve non-uniformly distributed clusters. Both
the FCM and dFCM algorithms have been studied and successfully applied to
simulated data of a sampling tungsten-silicon calorimeter. The FCM technique
works reasonably well, and the dFCM technique improves performance further.
Comment: 15 pages, 10 figures. Accepted for publication in NIM.
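For reference, the standard FCM iteration that dFCM builds on can be sketched as follows. This is plain FCM with a fixed number of clusters `c`, not the dynamic variant: the generation and elimination of clusters in dFCM is not reproduced. The fuzzifier `m = 2`, the random initial memberships and the function name `fuzzy_c_means` are assumptions for illustration. Each iteration recomputes cluster centres as membership-weighted means and then reassigns memberships from relative distances, so a point can belong partly to several clusters.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Plain fuzzy c-means (a sketch; NOT the dFCM variant from the paper).

    Returns (centers, u), where u[j][i] is the membership of point j in
    cluster i and every row of u sums to 1. Names/defaults are hypothetical.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / total for d in range(dim)])

        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk) ** (2 / (m - 1)).
        exp = 2.0 / (m - 1.0)
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            zeros = [i for i, d in enumerate(dists) if d == 0.0]
            if zeros:
                # Point coincides with one or more centres: share membership among them.
                row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                row = [1.0 / sum((dists[i] / dists[k]) ** exp for k in range(c)) for i in range(c)]
            new_u.append(row)

        change = max(abs(a - b) for r1, r2 in zip(u, new_u) for a, b in zip(r1, r2))
        u = new_u
        if change < tol:
            break
    return centers, u
```

Because each membership row sums to 1, a point lying between two clusters receives intermediate memberships rather than a hard assignment, which is the property the abstract relies on for overlapping clusters.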
A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets
The term "outlier" can generally be defined as an observation that is significantly different from
the other values in a data set. Outliers may result from errors, or they may indicate events. The
task of outlier detection aims at identifying such outliers in order to improve the analysis of
data and further discover interesting and useful knowledge about unusual events in numerous
application domains. In this paper, we report on contemporary unsupervised outlier detection
techniques for multiple types of data sets and provide a comprehensive taxonomy framework and
two decision trees to select the most suitable technique based on the data set. Furthermore, we
highlight the advantages, disadvantages and performance issues of each class of outlier detection
techniques under this taxonomy framework.
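As a concrete example, here is a minimal sketch of one commonly used distance-based unsupervised technique: each observation is scored by the distance to its k-th nearest neighbour, so points far from all others rank highest. It handles numeric vectors only; Euclidean distance, `k = 2` and the function names `knn_outlier_scores` and `top_outliers` are assumptions, and whether the paper's taxonomy places this method in a particular class is not stated here.

```python
import math


def knn_outlier_scores(points, k=2):
    """Distance-based outlier score (hypothetical helper): the distance from
    each point to its k-th nearest neighbour. Larger means more isolated."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first (O(n^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k=2, n_outliers=1):
    """Indices of the n_outliers points with the highest k-NN distance score."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_outliers]
```

In a small 2-D example with five points in and around the unit square and one point at (8, 8), the isolated point receives the highest score. The brute-force neighbour search keeps the sketch short; larger data sets would normally use a spatial index.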
- …