79 research outputs found

    Outlier Edge Detection Using Random Graph Generation Models and Applications

    Outliers are samples generated by mechanisms different from those that produce normal data samples. Graphs, in particular social network graphs, may contain nodes and edges created by scammers, by malicious programs, or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused only on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, defined as the induced graph that contains the two end nodes of an edge, their neighboring nodes, and the edges that link these nodes, contains critical information for detecting outlier edges. We evaluated the proposed algorithms by injecting outlier edges into real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently performs well regardless of the test graph data. Furthermore, the proposed algorithms are not limited to outlier edge detection. We demonstrate three applications that benefit from them: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. Comment: 14 pages, 5 figures, journal pape
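
The edge-ego-network defined in the abstract above can be illustrated with a minimal sketch. It assumes an undirected graph stored as a dictionary of neighbor sets; the function name `edge_ego_network` and the toy graph are hypothetical, and the sketch covers only the subgraph extraction, not the authors' outlier scoring.

```python
def edge_ego_network(adj, u, v):
    """Return the induced subgraph around edge (u, v) as an adjacency dict.

    adj maps each node to the set of its neighbors (undirected graph).
    The node set is the two end nodes plus all of their neighbors; the
    edges kept are exactly those whose both endpoints are in that set.
    """
    nodes = {u, v} | adj[u] | adj[v]
    return {n: adj[n] & nodes for n in nodes}


# Toy graph (illustrative data only): a triangle a-b-c with a tail b-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}
ego = edge_ego_network(graph, "a", "b")
```

Here node `e` falls outside the edge-ego-network of edge `(a, b)`, because it is a neighbor of neither end node, so the edge `d-e` is dropped as well.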

    Community detection by consensus genetic-based algorithm for directed networks

    Finding communities in networks is a commonly used form of network analysis, and the literature contains a myriad of community detection algorithms for this task. Nevertheless, far fewer community detection algorithms exist for directed networks than for undirected networks. Evaluation measures that estimate the quality of communities in undirected networks, however, now have adaptations to directed networks; the well-known modularity measure is one example. This paper introduces a genetic-based consensus clustering method to detect communities in directed networks, with the directed modularity as the fitness function. Consensus strategies combine computational models to improve the quality of the solutions generated by a single model. The consensus strategy is motivated by recent studies indicating that modularity may fail to detect the expected clusterings. Computational experiments with artificial LFR networks show that the proposed method was very competitive with existing strategies in the literature. (C) 2016 The Authors. Published by Elsevier B.V. Instituto de Ciência e Tecnologia, Universidade Federal de São Paulo (UNIFESP), Av. Cesare M. G. Lattes, 1201, Eugênio de Mello, São José dos Campos-SP, CEP: 12247-014, Brasil. Web of Scienc
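
As a rough illustration of a directed modularity fitness function, the sketch below assumes the commonly used form Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j) for an unweighted directed graph without repeated edges. The abstract does not state which variant the authors use, and the function name and toy data are hypothetical.

```python
def directed_modularity(edges, community):
    """Directed modularity of a partition (assumed Leicht-Newman style form).

    edges: list of (source, target) pairs, one entry per directed edge.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    out_deg, in_deg = {}, {}
    for s, t in edges:
        out_deg[s] = out_deg.get(s, 0) + 1
        in_deg[t] = in_deg.get(t, 0) + 1

    # Observed number of edges that stay inside one community.
    inside = sum(1 for s, t in edges if community[s] == community[t])

    # Per-community totals of out-degree and in-degree.
    out_c, in_c = {}, {}
    for node, c in community.items():
        out_c[c] = out_c.get(c, 0) + out_deg.get(node, 0)
        in_c[c] = in_c.get(c, 0) + in_deg.get(node, 0)

    # Expected number of intra-community edges under the null model.
    expected = sum(out_c[c] * in_c[c] for c in out_c) / m
    return (inside - expected) / m


# Illustrative data: two directed 3-cycles joined by the single edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}
q_two = directed_modularity(edges, two_groups)
q_one = directed_modularity(edges, one_group)
```

In a genetic algorithm of the kind the abstract describes, a function like this would score each candidate partition; placing every node in one community scores zero under this form, so the two-cycle split scores higher.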

    Accelerating the Information-Theoretic Approach of Community Detection Using Distributed and Hybrid Memory Parallel Schemes

    There are several approaches for discovering communities in a network (graph). Although approximate in nature, community discovery based on the laws of information theory has a proven standard of accuracy. The information-theoretic algorithm known as Infomap, developed a decade ago for detecting communities, did not foresee the tremendous growth of social networking, multimedia, and the massive information boom. To discover communities in massive networks, we have designed a distributed-memory-parallel Infomap in the MPI framework. Our design scales to over 500 processes and can process networks with millions of edges while maintaining quality comparable to the sequential Infomap. We have further developed a novel parallel hybrid approach for Infomap that combines distributed and shared memory parallelism using the MPI and OpenMP frameworks. This achieves a speedup of more than 11x in processing a network of over 100 million edges, which is significantly greater than that of state-of-the-art techniques.

    Towards a hybrid recommendation approach using a community detection and evaluation algorithm

    In social learning platforms, community detection algorithms are used to identify groups of learners with similar interests, behavior, and levels, while recommendation algorithms personalize the learning experience based on learners' profile information, including interests and past behavior. Combining these algorithms can improve recommendation quality: community detection identifies learners with similar needs and interests, and recommendation algorithms leverage those similarities to generate more accurate and relevant suggestions. In this article, we propose a novel approach that combines community detection and recommendation algorithms into a single framework to provide learners with personalized recommendations and opportunities for collaborative learning. Our proposed approach consists of three steps: first, applying a maximal clique-based algorithm to detect learning communities with common characteristics and interests; second, evaluating learners within their communities using static and dynamic evaluation; and third, generating personalized recommendations within each detected cluster using a recommendation system based on correlation and co-occurrence. To evaluate the effectiveness of our proposed approach, we conducted experiments on a real-world dataset. Our results show that our approach outperforms existing methods in terms of modularity, precision, and accuracy.
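
The first step above relies on maximal cliques. The abstract does not say which enumeration method the authors use, so the sketch below substitutes the basic Bron-Kerbosch recursion (without pivoting) as a stand-in. The function name and the toy learner graph are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion.

    adj maps each node to the set of its neighbors (undirected, no self-loops).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Illustrative data: learners a, b, c share interests pairwise; d links only to c.
learners = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
found = maximal_cliques(learners)
```

On this toy graph the recursion reports two maximal cliques, `{a, b, c}` and `{c, d}`; a clique-based community method would then merge or filter such cliques, a step this sketch leaves out.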

    An algorithm for network community structure detection by Surprise

    The success of network science in describing many complex systems, together with the ubiquity of such systems, has brought the development of new, more efficient methods of analysis into the spotlight. However, some problems remain open. One of them, the focus of our work, is the determination of a network's community structure. Even though there is no consensual formal definition, communities come from the intuitive idea that nodes form subgroups within the larger network. Many different algorithms have been proposed to identify such groups. Here we tackle this problem on two fronts: first, we developed a new algorithm based on the Surprise function; second, we created a novel benchmark, a set of artificial networks with a seeded community structure, to compare the performance of competing algorithms. Our own Surpriser algorithm was tested against seven other methods from the literature on three different benchmarks. We show that the Surprise-based methods are the most consistent across benchmarks, with Surpriser having an edge over the competition. Finally, we show that our benchmark is the hardest of the three, as very few algorithms are able to solve it.
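
For readers unfamiliar with the Surprise function, the sketch below assumes its usual definition as the negative log of a cumulative hypergeometric probability: S = −log Σ_{j=p}^{min(M,n)} C(M,j) C(F−M, n−j) / C(F,n), where F is the number of possible node pairs, M the number of possible intra-community pairs, n the number of edges, and p the number of intra-community edges. The abstract does not spell out the formula, so this form, the function name, and the toy data are assumptions; the sketch only evaluates a partition and is not the Surpriser algorithm.

```python
import math


def surprise(edges, community):
    """Surprise of a partition of a simple undirected graph (assumed form).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    n_nodes = len(community)
    f = n_nodes * (n_nodes - 1) // 2  # all possible node pairs

    sizes = {}
    for c in community.values():
        sizes[c] = sizes.get(c, 0) + 1
    m = sum(s * (s - 1) // 2 for s in sizes.values())  # possible intra pairs

    n = len(edges)
    p = sum(1 for u, v in edges if community[u] == community[v])

    # Upper tail of the hypergeometric distribution, kept as exact integers.
    tail = sum(
        math.comb(m, j) * math.comb(f - m, n - j)
        for j in range(p, min(m, n) + 1)
    )
    # Logs of big integers avoid floating-point underflow of the probability.
    return math.log(math.comb(f, n)) - math.log(tail)


# Illustrative data: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
s_split = surprise(edges, split)
s_merged = surprise(edges, merged)
```

Under this form a higher value marks a partition whose intra-community edge count is less likely to arise by chance, so the two-triangle split scores above the trivial single-community partition, which scores zero.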

    Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics

    Background: Network communities help the functional organization and evolution of complex networks. However, developing a method that is both fast and accurate and that provides modular overlaps and partitions of a heterogeneous network has proven to be rather difficult. Methodology/Principal Findings: Here we introduce the novel concept of ModuLand, an integrative method family that determines overlapping network modules as hills of an influence function-based, centrality-type community landscape and includes several widely used modularization methods as special cases. As various adaptations of the method family, we developed several algorithms that provide an efficient analysis of weighted and directed networks and (1) determine pervasively overlapping modules with high resolution; (2) uncover a detailed hierarchical network structure allowing an efficient, zoom-in analysis of large networks; (3) allow the determination of key network nodes; and (4) help to predict network dynamics. Conclusions/Significance: The concept opens a wide range of possibilities to develop new approaches and applications including network routing, classification, comparison and prediction. Comment: 25 pages with 6 figures and a Glossary + Supporting Information containing pseudo-codes of all algorithms used, 14 Figures, 5 Tables (with 18 module definitions, 129 different modularization methods, 13 module comparison methods) and 396 references. All algorithms can be downloaded from this web-site: http://www.linkgroup.hu/modules.ph

    Scalable Community Detection
