175 research outputs found
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
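To make the three properties named in the abstract above concrete (linear in the number of points, deterministic, order-invariant), here is a minimal sketch of a farthest-first (maximin-style) seeding. It is an illustrative assumption and not necessarily one of the six methods evaluated in the chapter. The function name `maximin_init`, the choice of the first center (the point farthest from the data mean), and the coordinate-based tie-breaking are all hypothetical choices made for this example.

```python
import math


def sqdist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k):
    """Pick k initial centers by farthest-first traversal.

    Illustrative sketch only: uses O(n * k) distance evaluations, involves no
    randomness, and breaks ties by coordinates instead of by position in the
    input, so reordering the points does not change the chosen centers.
    Assumes 1 <= k <= number of distinct points.
    """
    n = len(points)
    dim = len(points[0])
    # math.fsum returns a correctly rounded sum, so the mean does not depend
    # on the order in which the points are listed.
    mean = [math.fsum(p[d] for p in points) / n for d in range(dim)]

    # Assumed starting rule: the point farthest from the data mean.
    first = max(points, key=lambda p: (sqdist(p, mean), tuple(p)))
    centers = [first]

    # nearest[i] is the squared distance from points[i] to its closest center.
    nearest = [sqdist(p, first) for p in points]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        idx = max(range(n), key=lambda i: (nearest[i], tuple(points[i])))
        center = points[idx]
        centers.append(center)
        nearest = [min(d, sqdist(p, center)) for d, p in zip(nearest, points)]
    return centers
```

Because the seeding is deterministic, a single k-means run started from these centers is reproducible, which avoids the multiple-restart practice the abstract describes. Whether the resulting clustering is good is a separate, empirical question.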
Integrating rough set theory and medical applications
Medical science is not an exact science in which processes can be easily analyzed and modeled. Rough set theory has proven well suited to accommodating such inexactness in the medical profession. As rough set theory has matured and its theoretical perspective has been extended, innovative rough set systems have been developed. Unique concerns in the medical sciences, as well as the need for integrated rough set systems, are discussed. We present a short survey of ongoing research and a case study on integrating rough set theory and medical applications. Issues in the current state of rough sets in advancing medical technology, and some of the associated challenges, are also highlighted.
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology to identify
clusters based on their associations with subspaces in high dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and well
accepted by the scientific community, SSC algorithms are relatively new but
gaining more attention in recent years due to their better adaptability. In this
paper, a comprehensive survey of existing SSC algorithms and recent
developments is presented. The SSC algorithms are classified systematically
into three main categories, namely, conventional SSC (CSSC), independent SSC
(ISSC) and extended SSC (XSSC). The characteristics of these algorithms are
highlighted and the potential future development of SSC is also discussed.
Comment: This paper has been published in Information Sciences Journal in 201
Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review
Fifty years have gone by since the publication of the first paper on clustering based on fuzzy set theory. In 1965, L.A. Zadeh published "Fuzzy Sets" [335]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba, and Zadeh [33], in which they proposed a prototype of a clustering algorithm based on fuzzy set theory.
Analysis and Detection of Outliers in GNSS Measurements by Means of Machine Learning Algorithms
The abstract is in the attachment.
A Review of using Data Mining Techniques in Power Plants
Data mining techniques and their applications have developed rapidly during the last two decades. This paper reviews the application of data mining techniques in power systems, especially in power plants, through a survey of the literature published between 2000 and 2015. Keyword indices, abstracts, and conclusions were used to classify more than 86 articles on the application of data mining in power plants, drawn from many academic journals and research centers. Because the paper spans two different disciplines, it begins with a brief introduction to data mining and power systems to give the reader a better view of both. It then presents a comprehensive survey of the collected articles and classifies them according to three categories: the techniques used, the problem, and the application area. The review found that data mining techniques (classification, regression, clustering, and association rules) can be used to solve many types of problems in power plants, such as predicting the amount of generated power, failure prediction, failure diagnosis, failure detection, and many others. It also found that there is no standard technique for any specific problem. The application of data mining in power plants is a rich research area that still needs more exploration.