114 research outputs found

    A smart local moving algorithm for large-scale modularity-based community detection

    We introduce a new algorithm for modularity-based community detection in large networks. The algorithm, which we refer to as a smart local moving algorithm, takes advantage of a well-known local moving heuristic that is also used by other algorithms. Compared with these other algorithms, our proposed algorithm uses the local moving heuristic in a more sophisticated way. Based on an analysis of a diverse set of networks, we show that our smart local moving algorithm identifies community structures with higher modularity values than other algorithms for large-scale modularity optimization, among which is the popular 'Louvain algorithm' introduced by Blondel et al. (2008). The computational efficiency of our algorithm makes it possible to perform community detection in networks with tens of millions of nodes and hundreds of millions of edges. Our smart local moving algorithm also performs well in small and medium-sized networks. In short computing times, it identifies community structures with modularity values as high as, or almost as high as, the highest values reported in the literature, and sometimes even higher.
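    The local moving heuristic the abstract refers to can be sketched in a few lines: repeatedly try to move each node to one of its neighbors' communities and keep the move only if it increases modularity. The following is a minimal pure-Python illustration with names of our choosing, not the authors' smart local moving implementation, which additionally splits communities and re-applies the heuristic to the induced subnetworks.

```python
# Minimal sketch of the basic local moving heuristic (not the authors'
# smart variant). The graph is an undirected adjacency dict; 'comm' maps
# each node to a community label.

def modularity(adj, comm):
    """Newman-Girvan modularity Q of the partition 'comm'."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for u in adj:
        for v in adj:
            if comm[u] == comm[v]:
                a_uv = 1.0 if v in adj[u] else 0.0
                q += a_uv - len(adj[u]) * len(adj[v]) / (2 * m)
    return q / (2 * m)

def local_moving(adj):
    """Move nodes between neighboring communities while Q improves."""
    comm = {u: u for u in adj}  # start from singleton communities
    improved = True
    while improved:
        improved = False
        for u in adj:
            best, best_q = comm[u], modularity(adj, comm)
            for c in {comm[v] for v in adj[u]}:
                old, comm[u] = comm[u], c
                q = modularity(adj, comm)
                if q > best_q + 1e-12:
                    best, best_q, improved = c, q, True
                comm[u] = old
            comm[u] = best
    return comm
```

    On a toy graph of two triangles joined by a single edge, this pass recovers the two triangles as communities; recomputing modularity from scratch for every candidate move, as above, is of course far too slow for large networks, which is why practical implementations track incremental modularity gains instead.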

    Accuracy Optimization of Centrality Score Based Community Detection

    Many concepts can be represented as a graph or network. The network representation characterizes the varied relations between a set of objects by taking each object as a vertex and each interaction between objects as an edge, so that different systems can be modelled and analyzed in terms of graph theory. Community structure is a property that seems to be common to many networks: a division of objects into groups within which the connections are dense, while connections to objects outside the group are sparser. A large body of research shows that many real-world networks have such communities, groups, or modules, that is, subgraphs with more edges connecting vertices of the same group and comparatively fewer links joining vertices outside it. These groups or communities reflect both the topological relations between the elements of the underlying system and its functional entities. The proposed approach exploits global as well as local information about the network topology. The authors propose a hybrid strategy that uses an edge centrality measure to find communities and then applies a local moving heuristic to increase the modularity of those communities. The resulting communities can be more efficient and accurate for some applications. DOI: 10.17762/ijritcc2321-8169.15073
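    One natural choice of edge centrality for this purpose is edge betweenness, the number of shortest paths that cross an edge; bridges between dense groups score highest. The sketch below is our own illustration of the idea, not the paper's method, and computes edge betweenness with Brandes' BFS-based accumulation for unweighted, undirected graphs.

```python
# Illustrative edge-betweenness computation (Brandes' algorithm) on an
# unweighted, undirected adjacency dict. Inter-community bridges receive
# the highest scores, which is what a centrality-based detector exploits.
from collections import deque

def edge_betweenness(adj):
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, recording shortest-path counts and predecessors
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # accumulate edge dependencies bottom-up
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                bc[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2 for e, b in bc.items()}  # undirected: halve
```

    Removing the highest-scoring edges until the graph splits yields candidate communities, which a local moving pass can then refine, matching the hybrid strategy described above.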

    A Fast and Efficient Incremental Approach toward Dynamic Community Detection

    Community detection is a discovery tool used by network scientists to analyze the structure of real-world networks. It seeks to identify natural divisions that may exist in the input networks that partition the vertices into coherent modules (or communities). While this problem space is rich with efficient algorithms and software, most of this literature caters to the static use-case where the underlying network does not change. However, many emerging real-world use-cases give rise to a need to incorporate dynamic graphs as inputs. In this paper, we present a fast and efficient incremental approach toward dynamic community detection. The key contribution is a generic technique called Δ-screening, which examines the most recent batch of changes made to an input graph and selects a subset of vertices to reevaluate for potential community (re)assignment. This technique can be incorporated into any of the community detection methods that use modularity as their objective function for clustering. For demonstration purposes, we incorporated the technique into two well-known community detection tools. Our experiments demonstrate that our new incremental approach is able to generate performance speedups without compromising on the output quality (despite its heuristic nature). For instance, on a real-world network with 63M temporal edges (over 12 time steps), our approach was able to complete in 1056 seconds, yielding a 3x speedup over a baseline implementation. In addition to demonstrating the performance benefits, we also show how to use our approach to delineate appropriate intervals of temporal resolutions at which to analyze an input network.
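    The coarse idea behind such screening can be sketched as follows: given a batch of edge insertions or deletions, only vertices in the vicinity of the changed edges are marked for community reevaluation, and the rest of the graph keeps its previous assignment. The paper's actual Δ-screening rule filters candidates with modularity-aware criteria; this simplified version, with names of our choosing, only restricts reevaluation to the one-hop neighborhood of the changed edges.

```python
# Simplified illustration of incremental screening for dynamic community
# detection: after a batch of edge changes, only vertices near the changes
# are reevaluated. The paper's Delta-screening applies a tighter,
# modularity-aware filter; this sketch uses the one-hop neighborhood.

def screen_affected(adj, batch):
    """Return the set of vertices to reevaluate for this edge batch.

    adj   -- adjacency dict of the updated graph
    batch -- iterable of (u, v) edges inserted or deleted in this step
    """
    affected = set()
    for u, v in batch:
        affected.update((u, v))
        affected.update(adj.get(u, ()))
        affected.update(adj.get(v, ()))
    return affected

def incremental_step(adj, comm, batch, reevaluate):
    """Reassign communities only for screened vertices; the rest of the
    partition 'comm' is carried over unchanged from the previous step."""
    for u in screen_affected(adj, batch):
        comm[u] = reevaluate(u, adj, comm)
    return comm
```

    The speedup comes from the screened set usually being far smaller than the vertex set, so each time step costs work proportional to the change rather than to the whole graph.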

    The Intellectual Structure of Digital Humanities: An Author Co-Citation Analysis


    Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
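    For reference, the stand-alone metrics compared in the study are straightforward to compute; the following is a minimal sketch in our own code for undirected, unweighted adjacency-dict graphs.

```python
# Minimal definitions of two stand-alone cluster quality metrics from the
# study: conductance (per cluster; lower is better) and coverage (per
# partition; higher is better). Undirected, unweighted graphs.

def conductance(adj, cluster):
    """Edges leaving 'cluster' divided by the smaller side's volume."""
    cluster = set(cluster)
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol = sum(len(adj[u]) for u in cluster)
    total = sum(len(nbrs) for nbrs in adj.values())
    return cut / min(vol, total - vol)

def coverage(adj, comm):
    """Fraction of edges that fall inside a single community."""
    intra = sum(1 for u in adj for v in adj[u] if comm[u] == comm[v])
    total = sum(len(nbrs) for nbrs in adj.values())
    return intra / total
```

    Note that a partition can score well on modularity or coverage while recovering none of the planted clusters, which is why the study pairs these stand-alone metrics with information recovery measures such as the adjusted Rand score.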

    Mapping social media attention in Microbiology: Identifying main topics and actors

    This paper aims to map and identify topics of interest within the field of Microbiology and identify the main sources driving such attention. We combine data from Web of Science and Altmetric.com, a platform which retrieves mentions of scientific literature from social media and other non-academic communication outlets. We focus on the dissemination of microbial publications on Twitter, in news media, and in policy briefs. A two-mode network of social accounts shows distinctive areas of activity. We identify a cluster of papers mentioned solely by regional news media. A central area of the network is formed by papers discussed by all three outlets. A large portion of the network is driven by Twitter activity. When analyzing the top actors contributing to this network, we observe that more than half of the Twitter accounts are bots, mentioning 32% of the documents in our dataset. Within news media outlets, there is a predominance of popular science outlets. With regard to policy briefs, both international and national bodies are represented. Finally, our topic analysis shows that the thematic focus of the papers mentioned varies by outlet. While news media cover the widest range of topics, policy briefs are focused on translational medicine and bacterial outbreaks.

    Journal Maps, Interactive Overlays, and the Measurement of Interdisciplinarity on the Basis of Scopus Data (1996-2012)

    Using Scopus data, we construct a global map of science based on aggregated journal-journal citations from 1996-2012 (N of journals = 20,554). This base map enables users to overlay downloads from Scopus interactively. Using a single year (e.g., 2012), results can be compared with mappings based on the Journal Citation Reports at the Web-of-Science (N = 10,936). The Scopus maps are more detailed at both the local and global levels because of their greater coverage, including, for example, the arts and humanities. The base maps can be interactively overlaid with journal distributions in sets downloaded from Scopus, for example, for the purpose of portfolio analysis. Rao-Stirling diversity can be used as a measure of interdisciplinarity in the sets under study. Maps at the global and the local level, however, can be very different because of the different levels of aggregation involved. Two journals, for example, can both belong to the humanities in the global map, but participate in different specialty structures locally. The base map and interactive tools are available online (with instructions) at http://www.leydesdorff.net/scopus_ovl. Comment: accepted for publication in the Journal of the Association for Information Science and Technology (JASIST).
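    The Rao-Stirling diversity mentioned here combines the proportions of a journal portfolio across categories with the pairwise distances between those categories: Δ = Σ_{i≠j} p_i p_j d_ij. A minimal sketch in our own code follows; d[i][j] is commonly taken as 1 minus the cosine similarity between category citation profiles, though the choice of distance is up to the analyst.

```python
# Rao-Stirling diversity: sum of p_i * p_j * d_ij over distinct category
# pairs, where p gives the share of the set in each category and d[i][j]
# is a distance between categories (e.g. 1 - cosine similarity of their
# citation profiles). Illustrative code, not the paper's implementation.

def rao_stirling(p, d):
    n = len(p)
    return sum(p[i] * p[j] * d[i][j]
               for i in range(n) for j in range(n) if i != j)
```

    A set concentrated in one category scores 0; the score rises as activity spreads over categories that are distant from each other, which is why it is read as an interdisciplinarity measure.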

    Characterizing the potential of being emerging generic technologies: A Bi-Layer Network Analytics-based Prediction Method

    © 2019 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings. All rights reserved. Despite tremendous involvement of bibliometrics in profiling technological landscapes and identifying emerging topics, how to predict potential technological change is still unclear. This paper proposes a bi-layer network analytics-based prediction method to characterize the potential of being emerging generic technologies. Initially, based on the innovation literature, three technological characteristics are defined and quantified by topological indicators in network analytics; a link prediction approach is then applied to reconstruct the network with weighted missing links, and this reconstruction also changes the related technological characteristics; finally, comparing the term rankings before and after reconstruction helps identify potential emerging generic technologies. A case study on predicting emerging generic technologies in information science demonstrates the feasibility and reliability of the proposed method.
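    The link prediction step can be illustrated with a simple similarity index. For instance, the Adamic-Adar index weights the common neighbors of a candidate pair of terms by the inverse log of their degree, so that rare shared neighbors count more. This is a generic sketch of one common predictor, in our own code, not necessarily the one used in the paper.

```python
# Illustrative link prediction score (Adamic-Adar index) for weighting a
# missing link between terms u and v in a term co-occurrence network.
# One common choice of predictor, not necessarily the paper's.
import math

def adamic_adar(adj, u, v):
    """Sum of 1/log(deg(w)) over common neighbors w of u and v."""
    common = set(adj[u]) & set(adj[v])
    # degree-1 neighbors are skipped to avoid division by log(1) = 0
    return sum(1.0 / math.log(len(adj[w]))
               for w in common if len(adj[w]) > 1)
```

    Ranking all non-adjacent term pairs by such a score and adding the top-scoring pairs as weighted missing links is the kind of reconstruction the abstract describes.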