Search CORE

4,544 research outputs found

A Short Survey on Data Clustering Algorithms

Author: Wong Ka-Chun
Publication date: 25/11/2015

With data volumes growing rapidly, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied in a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize intra-subset similarity and inter-subset dissimilarity, where the similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology, and different clustering paradigms are discussed. Advanced clustering algorithms are also discussed. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.

arXiv.org e-Print Archive

Crossref
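
As a rough illustration of the objective stated in the survey abstract above (high intra-subset similarity, high inter-subset dissimilarity), the sketch below scores a given partition by its mean pairwise distances. It assumes Euclidean distance as the dissimilarity measure; the function name `mean_pairwise_distances` and the toy points are hypothetical and are not taken from the survey.

```python
from itertools import combinations


def mean_pairwise_distances(points, labels):
    """Return (mean intra-cluster distance, mean inter-cluster distance).

    Assumes Euclidean distance and that the partition yields at least one
    same-cluster pair and one cross-cluster pair.
    """
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        (intra if lp == lq else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = mean_pairwise_distances(points, [0, 0, 1, 1])  # groups nearby points
bad = mean_pairwise_distances(points, [0, 1, 0, 1])   # splits nearby points
```

A partition that groups nearby points should show a smaller intra-cluster mean and a larger inter-cluster mean than one that splits them, which is the behaviour a clustering algorithm is expected to optimize.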

Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach

Author: Fabrice Rossi
Nathalie Villa-Vialaneix
Publication venue: 'Elsevier BV'
Publication date: 01/03/2010

This paper proposes an organized generalization of Newman and Girvan's modularity measure for graph clustering. Optimized via a deterministic annealing scheme, this measure produces topologically ordered graph clusterings that lead to faithful and readable graph representations based on clustering-induced graphs. Topographic graph clustering provides an alternative to more classical solutions in which a standard graph clustering method is applied to build a simpler graph that is then represented with a graph layout algorithm. A comparative study on four real-world graphs ranging from 34 to 1,133 vertices shows the value of the proposed approach compared with classical solutions and with self-organizing maps for graphs.

arXiv.org e-Print Archive

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse
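
For readers unfamiliar with the measure that the abstract above generalizes, here is a minimal sketch of the standard Newman and Girvan modularity Q for an undirected, unweighted graph. It is not the organized variant proposed in the paper and does not include deterministic annealing; the function name `modularity` and the toy graph are assumptions made for illustration.

```python
def modularity(edges, community):
    """Standard Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges that fall inside a community.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # Expected fraction under random rewiring that preserves degrees:
    # sum over communities of (total degree / 2m) squared.
    degree_sum = {}
    for node, k in degree.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return inside - expected


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
q_single = modularity(edges, {n: "a" for n in range(6)})
```

Splitting the toy graph into its two triangles gives a positive Q, while placing every node in one community gives Q = 0, which is why modularity can serve as an objective for choosing among partitions.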

Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Author: Korenblum Daniel
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references

arXiv.org e-Print Archive

Directory of Open Access Journals

Hierarchically Clustered Adaptive Quantization CMAC and Its Learning Convergence

Author: Lai Edmund M-K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2007

No abstract available

Massey Research Online

Modularity revisited: A novel dynamics-based concept for decomposing complex networks

Author: Bruckner S.
Conrad T. O. F.
Djurdjevac N.
Sarich M.
Schütte Ch.
Publication venue: 'American Institute of Mathematical Sciences (AIMS)'
Publication date: 01/06/2014

Finding modules (or clusters) in large, complex networks is a challenging task, in particular if one is not interested in a full decomposition of the whole network into modules. We consider modular networks that also contain nodes that do not belong to exactly one of the modules but to several or to none at all. A new method for analyzing such networks is presented. It is based on spectral analysis of random walks on modular networks. In contrast to other spectral clustering approaches, we use different transition rules for the random walk. This leads to much more prominent gaps in the spectrum of the adapted random walk and allows for easy identification of the network's modular structure and of the nodes belonging to these modules. We also give a characterization of the set of nodes that do not belong to any module, which we call the transition region. Finally, by analyzing the transition region, we describe an algorithm that identifies so-called hub nodes inside the transition region that are important connections between modules or between a module and the rest of the network. The resulting algorithms scale linearly with network size (if the network connectivity is sparse) and thus can also be applied to very large networks.

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

Customer churn prediction in telecom using machine learning and social network analysis in big data platform

Author: Ahmad Abdelrahim Kasem
Aljoumaa Kadan
Jafar Assef
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2019

Customer churn is a major problem and one of the most important concerns for large companies. Because of its direct effect on revenue, especially in the telecom field, companies are seeking ways to predict which customers are likely to churn. Finding the factors that increase customer churn is therefore important for taking the actions needed to reduce it. The main contribution of our work is a churn prediction model that helps telecom operators predict the customers who are most likely to churn. The model developed in this work uses machine learning techniques on a big data platform and introduces a new approach to feature engineering and selection. To measure the performance of the model, the standard Area Under Curve (AUC) measure is adopted, and the AUC value obtained is 93.3%. Another main contribution is the use of the customer social network in the prediction model by extracting Social Network Analysis (SNA) features. The use of SNA improved the AUC of the model from 84% to 93.3%. The model was prepared and tested in a Spark environment on a large dataset created by transforming big raw data provided by the SyriaTel telecom company. The dataset contained all customers' information over 9 months and was used to train, test, and evaluate the system at SyriaTel. Four algorithms were tested: Decision Tree, Random Forest, Gradient Boosted Machine Tree "GBM", and Extreme Gradient Boosting "XGBOOST". The best results were obtained with the XGBOOST algorithm, which was used for classification in this churn prediction model. Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK

arXiv.org e-Print Archive

Directory of Open Access Journals
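
The AUC measure cited in the churn abstract above can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it directly from that definition; the function name `auc`, the labels, and the scores are hypothetical toy values, not data or code from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. churned), 0 for the negative class.
    scores: predicted scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: two positives, three negatives.
toy_auc = auc([1, 0, 1, 0, 0], [0.9, 0.4, 0.35, 0.2, 0.8])
perfect_auc = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

This pairwise form is quadratic in the number of samples, so it suits small checks only; on datasets of the size described in the abstract a rank-based computation would be the more practical choice.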

Network Analysis of Microarray Data

Author: Cattelani Luca
Federico Antonio
Greco Dario
Pavel Alisa
Serra Angela
Publication venue: Springer, UK
Publication date: 01/01/2022

DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it is widely applied in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. Peer reviewed

Helsingin yliopiston digitaalinen arkisto