
4,985 research outputs found

Towards Optimized K Means Clustering using Nature-inspired Algorithms for Software Bug Prediction

Author: Geerish Suddul
Kumar Dookhitram
Tameswar Kajal
Publication venue: Global Journals Inc. (US)
Publication date: 20/05/2023

In today's software development environment, the necessity of providing quality software products has undoubtedly remained the largest difficulty. As a result, early software bug prediction in the development phase is critical for lowering maintenance costs and improving overall software performance. Clustering is a well-known unsupervised method for data classification and for finding related patterns hidden in datasets.

Global Journal of Computer Science and Technology (GJCST)

Clustering: finding patterns in the darkness

Author: Menéndez H.
Publication venue: Endless Science Ltd
Publication date: 01/01/2021

Machine learning is changing the world and fuelling Industry 4.0. These statistical methods focus on identifying patterns in data to provide an intelligent response to specific requests. Although understanding data tends to require expert knowledge to supervise the decision-making process, some techniques need no supervision. These unsupervised techniques can work blindly, but they rely on data similarity. One of the most popular areas in this field is clustering. Clustering groups data so that the elements of each cluster are strongly similar while the clusters remain distinct from one another. This field started with the K-means algorithm, one of the most popular algorithms in machine learning, with extensive applications. Currently, there are multiple strategies to deal with the clustering problem. This review introduces some of the classical algorithms, focusing significantly on algorithms based on evolutionary computation, and explains some current applications of clustering to large datasets.
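
The K-means algorithm mentioned in this abstract can be illustrated with a minimal sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then recompute centroids). This is not code from the review; the function name `kmeans`, its parameters, and the toy data are assumptions chosen for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd-style K-means on a list of equal-length tuples.

    Hypothetical helper for illustration; returns (centroids, assignment).
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points (results depend on this choice).
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the iteration has converged.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # Keep the old centroid if a cluster ends up empty.
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(data, k=2)
```

On this toy data the two well-separated groups end up in different clusters; on harder data the outcome can depend on the initial centroids, which is one motivation for the evolutionary approaches the review covers.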

Middlesex University Research Repository


A niching memetic algorithm for simultaneous clustering and feature selection

Author: Fairhurst M
Liu X
Sheng W
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2008

Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data.

Brunel University Research Archive

Using Self-Organizing Maps to Visualize, Filter and Cluster Multidimensional Bio-Omics Data

Author: Fang Hai
Zhang Ji
Publication venue: IntechOpen
Publication date: 21/11/2012

IntechOpen

Silhouette + Attraction: A Simple and Effective Method for Text Clustering

Author: Marcelo L. Errecalde
Leticia C. Cagnina
Paolo Rosso
Publication venue: Cambridge University Press (CUP)
Publication date: 01/08/2015

This article presents silhouette attraction (Sil Att), a simple and effective method for text clustering, which is based on two main concepts: the silhouette coefficient and the idea of attraction. The combination of both principles yields a general technique that can be used either as a boosting method, which improves the results of other clustering algorithms, or as an independent clustering algorithm. The experimental work shows that Sil Att obtains high-quality results on text corpora with very different characteristics. Furthermore, its stable performance on all the considered corpora indicates that it is a very robust method. This is a notable advantage of Sil Att over the other algorithms used in the experiments, whose performance depends heavily on specific characteristics of the corpora being considered.

This research work has been partially funded by UNSL, CONICET (Argentina), the DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) research project, and the WIQ-EI IRSES project (grant no. 269180) within the FP 7 Marie Curie People Framework on Web Information Quality Evaluation Initiative. The work of the third author was also done in the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.

Errecalde, M.; Cagnina, L.; Rosso, P. (2015). Silhouette + Attraction: A Simple and Effective Method for Text Clustering. Natural Language Engineering. 1-40. https://doi.org/10.1017/S1351324915000273
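
For readers unfamiliar with the silhouette coefficient on which this method builds, the sketch below computes the standard per-point score s = (b - a) / max(a, b), where a is the mean distance to the other members of the point's own cluster and b is the smallest mean distance to any other cluster. It illustrates only that general measure, not the Sil Att method itself; the function name `silhouette_scores` and the toy data are assumptions, and Euclidean distance is used purely for simplicity.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette scores; hypothetical helper for illustration.

    Assumes at least two distinct labels. Singleton clusters score 0.0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette_scores(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_scores(points, [0, 1, 0, 1])   # clusters that cut across the groups
```

Scores near 1 indicate a point that sits well inside its cluster, while negative scores suggest it would fit better elsewhere; here the first labelling scores clearly higher than the second.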

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

RiuNet

Intelligent Modelling of the Environmental Behaviour of Chemicals

Author: Kumar Shefali (gnd: 135777798)
Publication venue: Universität Rostock Rostock
Publication date

In view of the new European Union chemical policy REACH (Registration, Evaluation, and Authorization of Chemicals), interest in "non-animal" methods for assessing the risk potential of chemicals to human health and the environment has increased. The inability of classical modelling approaches to handle the complex and ill-defined problems of modelling chemicals' environmental behaviour, together with the availability of large computing power, has raised interest in applying computational models inspired by artificial intelligence. This thesis promotes the application of neuro/fuzzy techniques to assessing the environmental behaviour of chemicals. Some of the bottlenecks in the neuro/fuzzy modelling of chemicals' behaviour towards the environment are identified, and solutions based on computational intelligence techniques are provided. The quality of the modelling techniques depends on several factors, such as the input and the structure. In many cases no suitable results are obtained, which leads to models with low generalization ability.

Rostocker Dokumentenserver

Methods for Analysing Endothelial Cell Shape and Behaviour in Relation to the Focal Nature of Atherosclerosis

Author: Iftikhar Saadia
Publication venue: Bioengineering, Imperial College London
Publication date: 01/06/2011

The aim of this thesis is to develop automated methods for the analysis of the spatial patterns, and the functional behaviour of endothelial cells, viewed under microscopy, with applications to the understanding of atherosclerosis. Initially, a radial search approach to segmentation was attempted in order to trace the cell and nuclei boundaries using a maximum likelihood algorithm; it was found inadequate to detect the weak cell boundaries present in the available data. A parametric cell shape model was then introduced to fit an equivalent ellipse to the cell boundary by matching phase-invariant orientation fields of the image and a candidate cell shape. This approach succeeded on good quality images, but failed on images with weak cell boundaries. Finally, a support vector machines based method, relying on a rich set of visual features, and a small but high quality training dataset, was found to work well on large numbers of cells even in the presence of strong intensity variations and imaging noise. Using the segmentation results, several standard shear-stress dependent parameters of cell morphology were studied, and evidence for similar behaviour in some cell shape parameters was obtained in in-vivo cells and their nuclei. Nuclear and cell orientations around immature and mature aortas were broadly similar, suggesting that the pattern of flow direction near the wall stayed approximately constant with age. The relation was less strong for the cell and nuclear length-to-width ratios. Two novel shape analysis approaches were attempted to find other properties of cell shape which could be used to annotate or characterise patterns, since a wide variability in cell and nuclear shapes was observed which did not appear to fit the standard parameterisations. Although no firm conclusions can yet be drawn, the work lays the foundation for future studies of cell morphology. 
To draw inferences about patterns in the functional response of cells to flow, which may play a role in the progression of disease, single-cell analysis was performed using calcium-sensitive fluorescence probes. Calcium transient rates were found to change with flow, but more importantly, local patterns of synchronisation in multi-cellular groups were discernible and appear to change with flow. The patterns suggest a new functional mechanism in flow-mediation of cell-cell calcium signalling.

Spiral - Imperial College Digital Repository

A review of quantum-inspired metaheuristic algorithms for automatic clustering

Author: Bhattacharyya Siddhartha
Dey Alokananda
Dey Sandip
Konar Debanjan
Mršić Leo
Pal Pankaj
Platoš Jan
Snášel Václav
Publication venue: MDPI
Publication date: 01/01/2023

In real-world scenarios, identifying the optimal number of clusters in a dataset is a difficult task due to insufficient knowledge. Therefore, the indispensability of sophisticated automatic clustering algorithms for this purpose has been contemplated by some researchers. Several automatic clustering algorithms assisted by quantum-inspired metaheuristics have been developed in recent years. However, the literature lacks definitive documentation of the state-of-the-art quantum-inspired metaheuristic algorithms for automatically clustering datasets. This article presents a brief overview of the automatic clustering process to establish the importance of making the clustering process automatic. The fundamental concepts of the quantum computing paradigm are also presented to highlight the utility of quantum-inspired algorithms. This article thoroughly analyses some algorithms employed to address the automatic clustering of various datasets. The reviewed algorithms were classified according to their main sources of inspiration. In addition, some representative works of each classification were chosen from the existing works. Thirty-six such prominent algorithms were further critically analysed based on their aims, used mechanisms, data specifications, merits and demerits. Comparative results based on the performance and optimal computational time are also presented to critically analyse the reviewed algorithms. As such, this article promises to provide a detailed analysis of the state-of-the-art quantum-inspired metaheuristic algorithms, while highlighting their merits and demerits.

DSpace at VSB Technical University of Ostrava

Improved K-means clustering algorithms : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, New Zealand

Author: Liu Tong
Publication venue: Massey University
Publication date: 01/01/2020

The K-means clustering algorithm is designed to divide samples into subsets so as to maximize intra-subset similarity and inter-subset dissimilarity, where similarity measures the relationship between two samples. As an unsupervised learning technique, K-means is considered one of the most widely used clustering algorithms and has been applied in a variety of areas such as artificial intelligence, data mining, biology, psychology, marketing, and medicine. The K-means clustering algorithm is not robust, and its clustering result depends on the initialization, the similarity measure, and the predefined cluster number. Previous research focused on solving a part of these issues but not on solving them in a unified framework. However, fixing one of these issues does not guarantee the best performance. Improving the K-means clustering algorithm by solving its issues simultaneously is challenging and significant. This thesis conducts extensive research on the K-means clustering algorithm with the aim of improving it. First, we propose the Initialization-Similarity (IS) clustering algorithm to solve the issues of the initialization and the similarity measure of the K-means clustering algorithm in a unified way. Specifically, we propose to fix the initialization of the clustering by using sum-of-norms (SON), which outputs a new representation of the original samples, and to learn the similarity matrix based on the data distribution. Furthermore, the derived new representation is used to conduct K-means clustering. Second, we propose a Joint Feature Selection with Dynamic Spectral (FSDS) clustering algorithm to solve the issues of the cluster number determination, the similarity measure, and the robustness of the clustering by selecting effective features and reducing the influence of outliers simultaneously. Specifically, we propose to learn the similarity matrix based on the data distribution and to add the ranked constraint on the Laplacian matrix of the learned similarity matrix to automatically output the cluster number. Furthermore, the proposed algorithm employs the L2,1-norm as the sparse constraint on the regularization term and the loss function to remove the redundant features and reduce the influence of outliers, respectively. Third, we propose a Joint Robust Multi-view (JRM) spectral clustering algorithm that conducts clustering for multi-view data while solving the initialization issue, the cluster number determination, the similarity measure learning, the removal of the redundant features, and the reduction of outlier influence in a unified way. Finally, the proposed algorithms outperformed the state-of-the-art clustering algorithms on real data sets. Moreover, we theoretically prove the convergence of the proposed optimization methods for the proposed objective functions.
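
The L2,1-norm named in this abstract is commonly defined as the sum of the Euclidean norms of a matrix's rows. The sketch below shows only that common definition; the row-wise convention, the function name `l21_norm`, and the example matrix are assumptions, and the thesis may define or apply the norm differently.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (L2) norms of the rows of a matrix.

    Hypothetical helper for illustration; assumes the row-wise convention.
    """
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


weights = [
    [3.0, 4.0],   # row norm 5
    [0.0, 0.0],   # row norm 0: an all-zero row adds nothing to the penalty
    [5.0, 12.0],  # row norm 13
]
total = l21_norm(weights)  # 5 + 0 + 13
```

Because whole rows are penalised by their unsquared length, minimising this quantity tends to drive entire rows to zero, which is why it is often used to discard redundant features. It also grows only linearly with a row's magnitude, making it less sensitive to outliers than a squared loss.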

Massey Research Online