1,392 research outputs found

    Distributed Community Detection with the WCC Metric

    Full text link
    Community detection has become an extremely active area of research in recent years, with researchers proposing various new metrics and algorithms to address the problem. Recently, the Weighted Community Clustering (WCC) metric was proposed as a novel way to judge the quality of a community partitioning based on the distribution of triangles in the graph, and was demonstrated to yield superior results over other commonly used metrics like modularity. The same authors later presented a parallel algorithm for optimizing WCC on large graphs. In this paper, we propose a new distributed, vertex-centric algorithm for community detection using the WCC metric. Results are presented that demonstrate the algorithm's performance and scalability on up to 32 worker machines and real graphs of up to 1.8 billion vertices. The algorithm scales best with the largest graphs, and to our knowledge, it is the first distributed algorithm for optimizing the WCC metric.Comment: 6 pages, 6 figure

    Put three and three together: Triangle-driven community detection

    Get PDF
    Community detection has arisen as one of the most relevant topics in the field of graph data mining due to its applications in many fields such as biology, social networks, or network traffic analysis. Although the existing metrics used to quantify the quality of a community work well in general, under some circumstances, they fail at correctly capturing such notion. The main reason is that these metrics consider the internal community edges as a set, but ignore how these actually connect the vertices of the community. We propose the Weighted Community Clustering (WCC), which is a new community metric that takes the triangle instead of the edge as the minimal structural motif indicating the presence of a strong relation in a graph. We theoretically analyse WCC in depth and formally prove, by means of a set of properties, that the maximization of WCC guarantees communities with cohesion and structure. In addition, we propose Scalable Community Detection (SCD), a community detection algorithm based on WCC, which is designed to be fast and scalable on SMP machines, showing experimentally that WCC correctly captures the concept of community in social networks using real datasets. Finally, using ground-truth data, we show that SCD provides better quality than the best disjoint community detection algorithms of the state of the art while performing faster.Peer ReviewedPostprint (author's final draft

    Using Triangles to Improve Community Detection in Directed Networks

    Full text link
    In a graph, a community may be loosely defined as a group of nodes that are more closely connected to one another than to the rest of the graph. While there are a variety of metrics that can be used to specify the quality of a given community, one common theme is that flows tend to stay within communities. Hence, we expect cycles to play an important role in community detection. For undirected graphs, the importance of triangles -- an undirected 3-cycle -- has been known for a long time and can be used to improve community detection. In directed graphs, the situation is more nuanced. The smallest cycle is simply two nodes with a reciprocal connection, and using information about reciprocation has proven to improve community detection. Our new idea is based on the four types of directed triangles that contain cycles. To identify communities in directed networks, then, we propose an undirected edge-weighting scheme based on the type of the directed triangles in which edges are involved. We also propose a new metric on quality of the communities that is based on the number of 3-cycles that are split across communities. To demonstrate the impact of our new weighting, we use the standard METIS graph partitioning tool to determine communities and show experimentally that the resulting communities result in fewer 3-cycles being cut. The magnitude of the effect varies between a 10 and 50% reduction, and we also find evidence that this weighting scheme improves a task where plausible ground-truth communities are known.Comment: 10 pages, 3 figure

    Evaluation Criteria for Object-oriented Metrics

    Get PDF
    In this paper an evaluation model for object-oriented (OO) metrics is proposed. We have evaluated the existing evaluation criteria for OO metrics, and based on the observations, a model is proposed which tries to cover most of the features for the evaluation of OO metrics. The model is validated by applying it to existing OO metrics. In contrast to the other existing criteria, the proposed model is simple in implementation and includes the practical and important aspects of evaluation; hence it suitable to evaluate and validate any OO complexity metric

    LazyFox: Fast and parallelized overlapping community detection in large graphs

    Get PDF
    The detection of communities in graph datasets provides insight about a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecast, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities that can only be determined at much higher computational cost. We build on an efficient algorithm, Fox, that detects such overlapping communities. Fox measures the closeness of a node to a community by approximating the count of triangles which that node forms with that community. We propose LazyFox, a multi-threaded version of the Fox algorithm, which provides even faster detection without an impact on community quality. This allows for the analyses of significantly larger and more complex datasets. LazyFox enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LazyFox's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox.Comment: 17 pages, 5 figure

    The LDBC Graphalytics Benchmark

    Full text link
    In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Due to the diversity of bottlenecks and performance issues such platforms need to address, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of systems scalability, weak and strong, and robustness, such as failures and performance variability. The benchmark also balances comprehensiveness with runtime necessary to obtain the deep metrics. The benchmark comes with open-source software for generating performance data, for validating algorithm results, for monitoring and sharing performance data, and for obtaining the final benchmark result as a standard performance report

    Land Cover and Rainfall Interact to Shape Waterbird Community Composition

    Get PDF
    Human land cover can degrade estuaries directly through habitat loss and fragmentation or indirectly through nutrient inputs that reduce water quality. Strong precipitation events are occurring more frequently, causing greater hydrological connectivity between watersheds and estuaries. Nutrient enrichment and dissolved oxygen depletion that occur following these events are known to limit populations of benthic macroinvertebrates and commercially harvested species, but the consequences for top consumers such as birds remain largely unknown. We used non-metric multidimensional scaling (MDS) and structural equation modeling (SEM) to understand how land cover and annual variation in rainfall interact to shape waterbird community composition in Chesapeake Bay, USA. The MDS ordination indicated that urban subestuaries shifted from a mixed generalist-specialist community in 2002, a year of severe drought, to generalist-dominated community in 2003, of year of high rainfall. The SEM revealed that this change was concurrent with a sixfold increase in nitrate-N concentration in subestuaries. In the drought year of 2002, waterbird community composition depended only on the direct effect of urban development in watersheds. In the wet year of 2003, community composition depended both on this direct effect and on indirect effects associated with high nitrate-N inputs to northern parts of the Bay, particularly in urban subestuaries. Our findings suggest that increased runoff during periods of high rainfall can depress water quality enough to alter the composition of estuarine waterbird communities, and that this effect is compounded in subestuaries dominated by urban development. Estuarine restoration programs often chart progress by monitoring stressors and indicators, but rarely assess multivariate relationships among them. Estuarine management planning could be improved by tracking the structure of relationships among land cover, water quality, and waterbirds. Unraveling these complex relationships may help managers identify and mitigate ecological thresholds that occur with increasing human land cover
    • …
    corecore