thesis

Complex information networks – detecting community structure in bipartite networks

Abstract

The last decade has witnessed great expansion in research and study of complex networks. A complex network is a large-scale network that reflects the interactions between objects or components of complicated systems. These components, known as clusters, communities or modules, perform together in order to provide one or more functions of the system. A vast number of systems, from the brain to ecosystems, power grids and the Internet, criminal relationships and financial transactions, can all be described as large complex networks. For most complex networks, the complexity arises from the fact that the structure is highly irregular, complex and dynamically evolving in time; and that the observed patterns of interactions highly influence the behaviour of the entire system. One of the topological properties that can expose the hierarchical structure of complex networks is community structure. Community detection is a common problem in complex networks that consists in general of finding groups of densely connected nodes with few connections to nodes outside of a group. The lack of consensus on a definition for a community leads to extensive studies on community structure of complex networks in order to provide improved community detection methods. Community structure is a common and important topological characteristic of many real world complex networks. In particular, identifying communities in bipartite networks is an important task in many scientific domains. In a bipartite network, the node set consists of two disjoint sets of nodes, primary set (P) and secondary set (S), such that links between nodes may occur only if the nodes belong to different sets. There are really two approaches to identifying clusters in a bipartite network: the first, and more common, is when our real interest is in community structure within the primary node set P; and the second is when our real interest is in bipartite communities within the whole network. Thus, in this research we investigate and study the state-of-the-art of community detection algorithms, in particular, those to identify the communities in bipartite networks in order to provide us with a more complete understanding of the relationship between communities. The practical aim is to derive a coarse-grain description of the network topology that will aid understanding of its hierarchical structure. The research of the thesis consists of four main phases. First, one of the best algorithms for community detection in classical networks, Infomap, has not been adapted to the big and important class of bipartite networks. This research gap is one focus of the thesis. We integrate the weighted projection method for bipartite networks based on common neighbors similarity into Infomap, to acquire a weighted one mode network that can be clustered by this random walks technique. We apply this method to a number of real world bipartite networks, to detect significant community structure. We measure the performance of our approach based on the ground truth. This requires deep knowledge of the formation of relations within and between clusters in these real world networks. Although such investigation is excessively time consuming, and impractical or impossible in large networks, the result is much more accurate and more meaningful and gives us confidence that this method can be usefully applied to large networks where ground truth is not known. Second, several possible edge additions are conducted to test how random walks based algorithm, Infomap, performs when the minimal modification is made to convert a bipartite network to a nearly bipartite (but unipartite) network. The experiments on small bipartite networks obtain encouraging results. Third, we shift focus from community detection based on random walks to community detection based on the strongest communities possible in a bipartite network, which are bicliques. We develop a novel algorithm to identify overlapping communities at the base level of hierarchy in bipartite networks. We combine existing techniques (bicliques, cliques, structural equivalence) into a novel method to solve this new research problem. We classify the output communities into 5 categories based on community strength. From this base level, we apply the Jaccard index as a threshold in order to reduce the redundancy of overlapping communities, to obtain higher levels of the hierarchy. We compare results from our overlapping approach with other concurrent approaches not only directly to the ground truth, but also using a widely accepted scale for evaluating the quality of partitions, Normalized Mutual Information (NMI). In the last phase of the thesis, a large financial bipartite network collected during 6 months fieldwork is analysed and tested in order to reveal its hierarchical structure. We apply all methods presented in Chapter 3, Chapter 4 and Chapter 5. The main contribution of this thesis is an improved method to detect the hierarchical and overlapping community structure in bipartite complex networks based on structural equivalence of nodes. More generally, it aims to derive a coarse-grain depiction of real large-scale networks through structural properties of their identified communities as well as their performance with respect to the known ground truth

    Similar works