Clustering and Community Detection in Directed Networks: A Survey

Authors: Malliaros Fragkiskos D.; Vazirgiannis Michalis
Publication venue: Elsevier BV
Publication date: 05/08/2013

Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of these cases graphs are directed, in the sense that there is directionality on the edges, making the semantics of the edges non-symmetric. An interesting feature of real networks is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence is that nodes of the same community are highly similar, while nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Naturally, there is a recent wealth of research in the area of mining directed graphs, with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks, along with the necessary methodological background and related applications. The survey commences with a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize. Then we present the relevant work along two orthogonal classifications. The first is mostly concerned with the methodological principles of the clustering algorithms, while the second approaches the methods from the viewpoint of the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions.

Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)

arXiv.org e-Print Archive

CiteSeerX

Towards combinatorial clustering: preliminary research survey

Author: Levin Mark Sh.
Publication date: 28/05/2015

The paper describes clustering problems from the combinatorial viewpoint. A brief systemic survey is presented, including the following: (i) basic clustering problems (e.g., classification, clustering, sorting, clustering with an order over clusters), (ii) basic approaches to assessment of objects and object proximities (i.e., scales, comparison, aggregation issues), (iii) basic approaches to evaluation of local quality characteristics for clusters and total quality characteristics for clustering solutions, (iv) clustering as a multicriteria optimization problem, (v) a generalized modular clustering framework, (vi) basic clustering models/methods (e.g., hierarchical clustering, k-means clustering, minimum spanning tree based clustering, clustering as assignment, detection of clique/quasi-clique based clustering, correlation clustering, network communities based clustering). Special attention is given to the formulation of clustering as multicriteria optimization models. Combinatorial optimization models are used as auxiliary problems (e.g., assignment, partitioning, knapsack problem, multiple choice problem, morphological clique problem, searching for consensus/median for structures). Numerical examples illustrate problem formulations, solving methods, and applications. The material can be used as follows: (a) a research survey, (b) a foundation for designing the structure/architecture of composite modular clustering software, (c) a bibliography reference collection, and (d) a tutorial.

Comment: 102 pages, 66 figures, 67 tables

arXiv.org e-Print Archive

Accelerating drug repurposing for COVID-19 via modeling drug mechanism of action with large scale gene-expression profiles

Authors: Gao S. Q.; Han Lu; Shan G. C.; Wang H. Y.; Zhou W. X.
Publication date: 15/05/2020

The novel coronavirus disease, named COVID-19, emerged in China in December 2019 and has rapidly spread around the world. It is clearly urgent to fight COVID-19 at a global scale. The development of methods for identifying drug uses based on phenotypic data can improve the efficiency of drug development. However, there are still many difficulties in identifying drug applications based on cell image data. This work reports a state-of-the-art machine learning method to identify drug uses based on the cell image features of 1024 drugs generated in the LINCS program. Because the multi-dimensional image features are affected by non-experimental factors, the features of similar drugs vary greatly, and the current number of samples is not enough to use deep learning and other methods for learning optimization. As a consequence, this study uses the supervised ITML algorithm to transform the drug features. The results show that the ITML-transformed features are more conducive to the recognition of drug functions. The analysis of the feature transformation shows that different features play important roles in identifying different drug functions. For COVID-19, Chloroquine and Hydroxychloroquine, which achieve antiviral effects by inhibiting endocytosis, etc., were classified into the same community. Clomiphene, in the same community, inhibited the entry of Ebola virus, indicating a similar mechanism of action (MoA) that could be reflected in cell images.

Comment: 22 pages

arXiv.org e-Print Archive

Bridging belief function theory to modern machine learning

Author: Burger Thomas
Publication date: 15/04/2015

Machine learning is a quickly evolving field which now looks very different from what it was 15 years ago, when classification and clustering were major issues. This document proposes several trends to explore the new questions of modern machine learning, with the strong afterthought that the belief function framework has a major role to play.

arXiv.org e-Print Archive

Survey of state-of-the-art mixed data clustering algorithms

Authors: Ahmad Amir; Khan Shehroz S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/03/2019

Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we present a taxonomy for the study of mixed data clustering algorithms by identifying five major research themes. We then present a state-of-the-art review of the research works within each research theme. We analyze the strengths and weaknesses of these methods with pointers for future research directions. Lastly, we present an in-depth analysis of the overall challenges in this field, highlight open research questions and discuss guidelines to make progress in the field.

Comment: 20 Pages, 2 columns, 6 Tables, 209 References

arXiv.org e-Print Archive

Orthogonal symmetric non-negative matrix factorization under the stochastic block model

Authors: Chen Yuguo; Paul Subhadeep
Publication date: 17/05/2016

We present a method based on the orthogonal symmetric non-negative matrix tri-factorization of the normalized Laplacian matrix for community detection in complex networks. While the exact factorization of a given order may not exist and is NP-hard to compute, we obtain an approximate factorization by solving an optimization problem. We establish the connection of the factors obtained through the factorization to a non-negative basis of an invariant subspace of the estimated matrix, drawing a parallel with spectral clustering. Using such a factorization for clustering in networks is motivated by analyzing a block-diagonal Laplacian matrix with the blocks representing the connected components of a graph. The method is shown to be consistent for community detection in graphs generated from the stochastic block model and the degree-corrected stochastic block model. Simulation results and real data analysis show the effectiveness of these methods under a wide variety of situations, including sparse and highly heterogeneous graphs where the usual spectral clustering is known to fail. Our method also performs better than the state of the art on popular benchmark network datasets, e.g., the political web blogs and the karate club data.

Comment: 35 pages, 3 figures

arXiv.org e-Print Archive

Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization

Authors: Drake Barry; Du Rundong; Park Haesun
Publication date: 28/03/2017

We present a hybrid method for latent information discovery on data sets containing both text content and connection structure, based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results compared to the other NMF clustering methods that use content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real world data, such as citation recommendations of papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities, and can be extended to the case with multiple feature spaces or multiple similarity measures.

Comment: 9 pages, Submitted to a conference, Feb. 201

arXiv.org e-Print Archive

A Memetic Algorithm for the Minimum Conductance Graph Partitioning Problem

Author: Chalupa David
Publication date: 10/04/2017

The minimum conductance problem is an NP-hard graph partitioning problem. Apart from the search for bottlenecks in complex networks, the problem is very closely related to the popular area of network community detection. In this paper, we tackle the minimum conductance problem as a pseudo-Boolean optimisation problem and propose a memetic algorithm to solve it. An efficient local search strategy is established. Our memetic algorithm starts by using this local search strategy with different random strings to sample a set of diverse initial solutions. This is followed by an evolutionary phase based on a steady-state framework and two intensification subroutines. We compare the algorithm to a wide range of multi-start local search approaches and classical genetic algorithms with different crossover operators. The experimental results are presented for a diverse set of real-world networks. These results indicate that the memetic algorithm outperforms the alternative stochastic approaches.
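
For readers unfamiliar with the objective, the conductance of a cut can be computed directly from an edge list. The sketch below is only an illustration, assuming the common definition φ(S) = cut(S, V∖S) / min(vol(S), vol(V∖S)) for an undirected, unweighted graph, where vol is the sum of node degrees. It is not the memetic algorithm described in the abstract; the function name `conductance` and the toy graph are hypothetical.

```python
def conductance(edges, subset):
    """Conductance of the cut (subset, rest) in an undirected, unweighted graph.

    edges  : iterable of (u, v) pairs, each undirected edge listed once
    subset : collection of nodes forming one side of the cut
    """
    subset = set(subset)
    cut = 0      # edges with exactly one endpoint in subset
    vol_in = 0   # sum of degrees of nodes inside subset
    vol_out = 0  # sum of degrees of nodes outside subset
    for u, v in edges:
        u_in, v_in = u in subset, v in subset
        if u_in != v_in:
            cut += 1
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("one side of the cut has zero volume")
    return cut / denom


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi = conductance(edges, {0, 1, 2})  # cut along the bridge
```

Cutting along the bridge gives a low conductance because only one edge crosses the cut while both sides have the same volume. A search procedure such as the one in the abstract would look for the subset minimising this value.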

arXiv.org e-Print Archive

Community detection in network analysis: a survey

Author: Zhang Lingjia
Publication date: 13/10/2016

Community structures are common in networks, including in the domains of sociology, biology, and business. The characteristic of community structure is that nodes of the same community are highly similar, while nodes across communities present low similarity. In academia, there is a surge in research efforts on community detection in network analysis, especially in developing statistically sound methodologies for exploring, modeling, and interpreting these kinds of structures and relationships. This survey paper aims to provide a brief comparative review of currently applicable statistical methodologies and approaches, along with metrics for evaluating graph clustering results and applications using R. At the end, we provide promising future research directions.

Statistics

Texas ScholarWorks

Identification of Overlapping Communities via Constrained Egonet Tensor Decomposition

Authors: Giannakis Georgios B.; Sheikholeslami Fatemeh
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/07/2017

Detection of overlapping communities in real-world networks is a generally challenging task. Upon recognizing that a network is in fact the union of its egonets, a novel network representation using multi-way data structures is advocated in this contribution. The introduced sparse tensor-based representation exhibits richer structure compared to its matrix counterpart, and thus enables a more robust approach to community detection. To leverage this structure, a constrained tensor approximation framework is introduced using PARAFAC decomposition. The arising constrained trilinear optimization is handled via alternating minimization, where intermediate subproblems are solved using the alternating direction method of multipliers (ADMM) to ensure convergence. The factors obtained provide soft community memberships, which can further be exploited for crisp and possibly overlapping community assignments. The framework is further broadened to include time-varying graphs, where the edge set as well as the underlying communities evolve through time. Performance of the proposed approach is assessed via tests on benchmark synthetic graphs as well as real-world networks. As corroborated by numerical tests, the proposed tensor-based representation captures multi-hop nodal connections, that is, connectivity patterns within single-hop neighbors, whose exploitation yields a more robust community identification in the presence of mixing as well as overlapping communities.
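
The observation that a network is the union of its egonets can be checked on a small example. The sketch below assumes the usual definition of an egonet (a node, its direct neighbors, and all edges among them) and uses plain Python sets. The helper names `adjacency` and `egonet_edges` and the toy graph are hypothetical, and this is not the paper's constrained tensor decomposition, only an illustration of the representation it starts from.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def egonet_edges(adj, ego):
    """Edges of the subgraph induced by `ego` and its direct neighbors."""
    members = adj[ego] | {ego}
    return {
        frozenset((u, v))
        for u in members
        for v in adj[u]
        if v in members
    }


# Toy graph (hypothetical): a triangle with a two-edge tail.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
adj = adjacency(edges)

# One edge set per node. Stacking the corresponding adjacency matrices is one
# plausible way to obtain a three-way (node x node x egonet) array.
egonets = {node: egonet_edges(adj, node) for node in list(adj)}

# The union of all egonets recovers the original edge set.
union = set().union(*egonets.values())
```

Each egonet also contains edges between the ego's neighbors, which is how the per-node slices carry connectivity information beyond the ego's own edges.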

arXiv.org e-Print Archive