Community detection in networks by soft modularity maximization: A new approach and empirical comparisons

Abstract

Community detection is one of the fundamental problems of network science, an emerging discipline within the computing sciences that studies network data and, in particular, analyzes the links and interconnections within networks. Network science did not attract significant interest until the rapid growth of the Internet in the early 2000s. Since then it has become increasingly popular and has spread to diverse scientific areas such as physics, biology, ecology and marketing. In general terms, a network is modeled as a graph: a mathematical object composed of elements called "nodes", pairs of which are connected by an edge when a relation exists between them. As network science reaches more sectors, networks appear in a growing number of contexts. One of the best known is the World Wide Web, in which web pages are interconnected by hyperlinks. A more recent example is Facebook, the well-known social network through which people connect on the basis of friendship or other characteristics they share. The aim of this thesis is to examine a characteristic feature of networks, their community structure, and in particular the detection of communities using clustering methods. Clustering groups nodes into communities according to their similarities or differences, without prior knowledge of the class labels underlying the graph. Clustering algorithms thus generally produce a partition of the nodes into communities, in which every node is assigned to a single community. This classic view has recently been challenged by the concept of fuzzy communities, in which a node may belong to more than one community at a time. Communities may then overlap, and the community structure of the graph becomes more complex to analyze.
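To make the difference between a classic partition and fuzzy communities concrete, the following minimal sketch encodes both on a small hypothetical graph (two triangles joined by one edge). A hard partition is written as a one-hot membership matrix; a fuzzy partition is a matrix whose rows hold membership degrees summing to 1. The graph, the membership values and the helper name `is_valid_membership` are illustrative assumptions, not data or code from the thesis.

```python
import numpy as np

# Hypothetical example graph: two triangles {0, 1, 2} and {3, 4, 5}
# joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0  # undirected graph: symmetric adjacency matrix

# Hard (crisp) partition: each node gets exactly one community label.
labels = np.array([0, 0, 0, 1, 1, 1])
U_hard = np.eye(2)[labels]  # one-hot membership matrix, shape (n, 2)

# Fuzzy partition: row i holds node i's degree of membership in each community.
# The bridge nodes 2 and 3 are shared between both communities (illustrative values).
U_fuzzy = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.6, 0.4],
    [0.4, 0.6],
    [0.0, 1.0],
    [0.0, 1.0],
])

def is_valid_membership(U):
    """A membership matrix must be non-negative with rows summing to 1."""
    return bool(np.all(U >= 0) and np.allclose(U.sum(axis=1), 1.0))

print(is_valid_membership(U_hard), is_valid_membership(U_fuzzy))  # True True
print("communities of node 2:", np.nonzero(U_fuzzy[2] > 0)[0])
```

In the hard matrix every row has a single 1, so each node sits in exactly one community; in the fuzzy matrix the bridge nodes have non-zero membership in both communities, which is what makes the communities overlap.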
That is why this thesis introduces two new clustering algorithms that find a fuzzy partition of the communities in a network. Both are based on modularity, a quality measure for community partitions introduced by the physicist M. E. J. Newman, which we modify into a fuzzy (soft) version suited to these new expectations in community detection. The purpose of this thesis is to study the community-detection performance of our two new algorithms by comparing them with clustering methods that are already well established in network science. To guide the study, we posed two research questions:
• Are the entropy-based soft modularity and the deterministic annealing entropy-based soft modularity algorithms competitive with the kernel k-means algorithms when the natural number of clusters is used?
• Are the entropy-based soft modularity and the deterministic annealing entropy-based soft modularity algorithms competitive with the kernel k-means algorithms and the Louvain method when the number of clusters is not determined in advance?
To answer these questions, we conduct two experiments. In the first, we compare our algorithms with kernel k-means using four kernels (the Sigmoid Commute Time, the Sigmoid Corrected Commute Time, the Log Forest and the Free Energy), with the natural number of clusters for each dataset. In the second, we again compare our algorithms with these four kernel k-means variants and also with the Louvain method, but without fixing the number of clusters beforehand. The number of clusters therefore has to be chosen empirically for each dataset and each algorithm, except for the Louvain method, which determines the number of clusters by itself.
Master [120] in Business Engineering (Ingénieur de gestion), Université catholique de Louvain, 201
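Newman's modularity of a hard partition is Q = (1/2m) Σ_ij (A_ij - k_i k_j / 2m) δ(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, m the number of edges and δ(c_i, c_j) = 1 when nodes i and j share a community. The abstract does not state the thesis's own soft-modularity formulas, so the sketch below rests on an assumption: it replaces δ(c_i, c_j) with Σ_c u_ic u_jc for a membership matrix U, adds an entropy term T·H(U), and lowers the temperature T step by step in the spirit of deterministic annealing. The function names, the cooling schedule and the fixed-point update are hypothetical illustrations, not the algorithms evaluated in the thesis.

```python
import numpy as np

def modularity_matrix(A):
    """B_ij = A_ij - k_i k_j / (2m) for an undirected adjacency matrix A."""
    k = A.sum(axis=1)
    return A - np.outer(k, k) / k.sum()  # k.sum() == 2m

def hard_modularity(A, labels):
    """Newman's modularity Q of a crisp partition given as integer labels."""
    B = modularity_matrix(A)
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    return B[same].sum() / A.sum()  # A.sum() == 2m

def soft_modularity(A, U):
    """Assumed soft generalization: delta(c_i, c_j) -> sum_c u_ic u_jc."""
    B = modularity_matrix(A)
    return np.trace(U.T @ B @ U) / A.sum()

def annealed_soft_modularity(A, n_clusters, t_start=1.0, t_end=0.01,
                             cooling=0.9, n_iter=50, seed=0):
    """Hypothetical sketch: maximize Q_soft(U) + T * H(U) while lowering T.

    The gradient of Q_soft is (B U) / m. Setting the gradient of
    Q_soft + T * H to zero, with each row of U summing to 1, gives
    u_ic proportional to exp((B U)_ic / (m T)).
    """
    rng = np.random.default_rng(seed)
    B = modularity_matrix(A)
    m = A.sum() / 2.0
    n = A.shape[0]
    # Random start: the uniform matrix is a fixed point (rows of B sum to 0).
    U = rng.dirichlet(np.ones(n_clusters), size=n)
    T = t_start
    while T > t_end:
        for _ in range(n_iter):
            for i in range(n):  # sequential (node-by-node) updates
                z = B[i] @ U / (m * T)
                z -= z.max()  # numerical stability before exponentiating
                U[i] = np.exp(z) / np.exp(z).sum()
        T *= cooling  # deterministic annealing: cool the temperature
    return U

# Usage on a hypothetical graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])
print(hard_modularity(A, labels))  # 5/14, about 0.357
U = annealed_soft_modularity(A, n_clusters=2)
print(U.round(2), soft_modularity(A, U))
```

For a one-hot U the soft formula reduces exactly to Newman's Q, which is why it is a natural fuzzy generalization. At high T the entropy term keeps memberships close to uniform; as T decreases they sharpen toward a crisp partition. The sketch uses sequential row updates because updating all rows in parallel can oscillate; this is a design choice of the example, not something the abstract specifies.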
