Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has focused only
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experiment results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.
Comment: 14 pages, 5 figures, journal paper
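The edge-ego-network defined in this abstract (the subgraph induced by the two end nodes of an edge and their neighbors) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `edge_ego_network`, the adjacency-set representation, and the toy graph are all assumptions made for the example.

```python
def edge_ego_network(adj, u, v):
    """Return (nodes, edges) of the subgraph induced by the end nodes
    u and v of an edge together with all of their neighbours.

    adj is assumed to map each node to the set of its neighbours in an
    undirected graph (hypothetical representation for this sketch).
    """
    nodes = {u, v} | adj[u] | adj[v]
    # Keep every edge whose two end points both lie in the node set.
    edges = {
        frozenset((a, b))
        for a in nodes
        for b in adj[a]
        if b in nodes
    }
    return nodes, edges


# Toy undirected graph (made-up data): a triangle 1-2-3 with a tail 2-4-5-6.
adj = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2},
    4: {2, 5},
    5: {4, 6},
    6: {5},
}

nodes, edges = edge_ego_network(adj, 1, 2)
print(sorted(nodes))
print(sorted(tuple(sorted(e)) for e in edges))
```

For the edge (1, 2), the induced subgraph contains nodes 1 to 4; the edge 4-5 is excluded because node 5 is not adjacent to either end node.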
Community detection by consensus genetic-based algorithm for directed networks
Finding communities in networks is a commonly used form of network analysis, and the literature offers a myriad of community detection algorithms for this task. Nevertheless, far fewer community detection algorithms exist for directed networks than for undirected ones. Evaluation measures that estimate the quality of communities in undirected networks have, however, been adapted to directed networks; the well-known modularity measure is one example. This paper introduces a genetic-based consensus clustering to detect communities in directed networks with the directed modularity as the fitness function. Consensus strategies involve combining computational models to improve the quality of solutions generated by a single model. The motivation for developing a consensus strategy is that recent studies indicate that modularity may fail to detect the expected clusterings. Computational experiments with artificial LFR networks show that the proposed method was very competitive in comparison to existing strategies in the literature. (C) 2016 The Authors. Published by Elsevier B.V.
Instituto de Ciência e Tecnologia, Universidade Federal de São Paulo (UNIFESP) Av. Cesare M. G. Lattes, 1201, Eugênio de Mello, São José dos Campos-SP, CEP: 12247-014, Brasil. Web of Science
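The abstract names directed modularity as the fitness function but does not state its formula. A minimal sketch, assuming the commonly used definition Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j) for an unweighted directed graph with m edges, is given below. The function name `directed_modularity`, the edge-list input, and the toy graph are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def directed_modularity(edges, community):
    """Directed modularity of a partition (assumed standard definition).

    edges: list of distinct (source, target) pairs of a directed graph.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    intra = defaultdict(int)    # edges with both ends inside a community
    out_deg = defaultdict(int)  # summed out-degree per community
    in_deg = defaultdict(int)   # summed in-degree per community
    for src, dst in edges:
        out_deg[community[src]] += 1
        in_deg[community[dst]] += 1
        if community[src] == community[dst]:
            intra[community[src]] += 1
    labels = set(community.values())
    # Per-community form of the double sum over node pairs.
    return sum(
        intra[c] / m - out_deg[c] * in_deg[c] / (m * m) for c in labels
    )


# Toy graph (made-up data): two directed 3-cycles joined by one edge.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = directed_modularity(edges, two_groups)
q_one = directed_modularity(edges, one_group)
print(q_two, q_one)
```

In a genetic algorithm of the kind described, a function like this would score each candidate partition; placing all nodes in one community always scores zero under this definition.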
Accelerating the Information-Theoretic Approach of Community Detection Using Distributed and Hybrid Memory Parallel Schemes
There are several approaches for discovering communities in a network (graph). Despite being approximate in nature, discovering communities based on the laws of Information Theory has a proven standard of accuracy. The information-theoretic algorithm known as Infomap, developed a decade ago for detecting communities, did not foresee the tremendous growth of social networking and multimedia or the massive information boom. To discover communities in massive networks, we have designed a distributed-memory-parallel Infomap in the MPI framework. Our design scales to over 500 processes and can process networks with millions of edges while maintaining quality comparable to the sequential Infomap. We have further developed a novel parallel hybrid approach for Infomap that combines distributed and shared memory parallelism using the MPI and OpenMP frameworks. This achieves a speedup of more than 11x in processing a network of over 100 million edges, which is significantly greater than that of state-of-the-art techniques.
Towards a hybrid recommendation approach using a community detection and evaluation algorithm
In social learning platforms, community detection algorithms are used to identify groups of learners with similar interests, behavior, and levels, while recommendation algorithms personalize the learning experience based on learners' profile information, including interests and past behavior. Combining these algorithms can improve recommendation quality: once learners with similar needs and interests are grouped, recommendation algorithms can leverage those similarities to generate more accurate and relevant suggestions. In this article, we propose a novel approach that combines community detection and recommendation algorithms into a single framework to provide learners with personalized recommendations and opportunities for collaborative learning. Our proposed approach consists of three steps: first, applying the maximal clique-based algorithm to detect learning communities with common characteristics and interests; second, evaluating learners within their communities using static and dynamic evaluation; and third, generating personalized recommendations within each detected cluster using a recommendation system based on correlation and co-occurrence. To evaluate the effectiveness of our proposed approach, we conducted experiments on a real-world dataset. Our results show that our approach outperforms existing methods in terms of modularity, precision, and accuracy.
An algorithm for network community structure detection by Surprise
The success of network science in describing many complex systems, together with the ubiquity of such systems, has brought the development of new, more efficient methods of analysis into the spotlight. However, some problems remain open. One of them, the focus of our work, is the determination of a network's community structure. Even though there is no agreed formal definition, communities come from the intuitive idea that nodes form subgroups within the larger network. Many different algorithms have been proposed to identify such groups. Here we tackle this problem on two fronts: first, we developed a new algorithm based on the Surprise function, and second, we created a novel benchmark, a set of artificial networks with a seeded community structure, to compare the performance of competing algorithms. Our own Surpriser algorithm was tested against seven other methods from the literature on three different benchmarks. We show that the Surprise-based methods are the most consistent across benchmarks, with Surpriser having an edge over the competition. Finally, we show that our benchmark is the hardest of the three, as very few algorithms are able to solve it.
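The abstract does not state the Surprise function itself. A minimal sketch, assuming the hypergeometric-tail definition commonly given in the literature for undirected, unweighted graphs, is S = −log Σ_{j=p}^{min(M,n)} C(M,j) C(F−M, n−j) / C(F,n), where F is the number of node pairs, M the number of pairs inside communities, n the number of edges, and p the number of intra-community edges. The function name `surprise`, the use of the natural logarithm, and the toy graph are assumptions for illustration; this is not the Surpriser algorithm, only the score it would optimize under that assumption.

```python
from collections import Counter
from math import comb, log


def surprise(edges, community):
    """Surprise of a partition (assumed hypergeometric-tail definition).

    edges: list of distinct undirected edges given as (a, b) pairs.
    community: dict mapping every node to its community label.
    """
    num_nodes = len(community)
    sizes = Counter(community.values())
    total_pairs = comb(num_nodes, 2)                    # F
    intra_pairs = sum(comb(s, 2) for s in sizes.values())  # M
    n = len(edges)
    p = sum(1 for a, b in edges if community[a] == community[b])
    # Exact integer tail sum; math.comb returns 0 when k exceeds n.
    tail = sum(
        comb(intra_pairs, j) * comb(total_pairs - intra_pairs, n - j)
        for j in range(p, min(intra_pairs, n) + 1)
    )
    # -log(tail / C(F, n)), computed on integers to avoid underflow.
    return log(comb(total_pairs, n)) - log(tail)


# Toy graph (made-up data): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

s_two = surprise(edges, two_groups)
s_one = surprise(edges, one_group)
print(s_two, s_one)
```

Higher values indicate a partition whose concentration of intra-community edges is less likely under a random placement of edges; the trivial single-community partition scores zero.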
Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics
Background: Network communities help the functional organization and
evolution of complex networks. However, developing a method that is
both fast and accurate and that provides modular overlaps and partitions of a
heterogeneous network has proven to be rather difficult. Methodology/Principal
Findings: Here we introduce the novel concept of ModuLand, an integrative
method family determining overlapping network modules as hills of an influence
function-based, centrality-type community landscape, and including several
widely used modularization methods as special cases. As various adaptations of
the method family, we developed several algorithms, which provide an efficient
analysis of weighted and directed networks, and (1) determine pervasively
overlapping modules with high resolution; (2) uncover a detailed hierarchical
network structure allowing an efficient, zoom-in analysis of large networks;
(3) allow the determination of key network nodes and (4) help to predict
network dynamics. Conclusions/Significance: The concept opens a wide range of
possibilities to develop new approaches and applications including network
routing, classification, comparison and prediction.
Comment: 25 pages with 6 figures and a Glossary + Supporting Information
containing pseudo-codes of all algorithms used, 14 Figures, 5 Tables (with 18
module definitions, 129 different modularization methods, 13 module
comparison methods) and 396 references. All algorithms can be downloaded
from this web-site: http://www.linkgroup.hu/modules.ph