Distributed Community Detection with the WCC Metric
Community detection has become an extremely active area of research in recent
years, with researchers proposing various new metrics and algorithms to address
the problem. Recently, the Weighted Community Clustering (WCC) metric was
proposed as a novel way to judge the quality of a community partitioning based
on the distribution of triangles in the graph, and was demonstrated to yield
superior results over other commonly used metrics like modularity. The same
authors later presented a parallel algorithm for optimizing WCC on large
graphs. In this paper, we propose a new distributed, vertex-centric algorithm
for community detection using the WCC metric. Results are presented that
demonstrate the algorithm's performance and scalability on up to 32 worker
machines and real graphs of up to 1.8 billion vertices. The algorithm scales
best with the largest graphs, and to our knowledge, it is the first distributed
algorithm for optimizing the WCC metric. Comment: 6 pages, 6 figures
Put three and three together: Triangle-driven community detection
Community detection has emerged as one of the most relevant topics in graph data mining because of its applications in fields such as biology, social networks, and network traffic analysis. Although the existing metrics used to quantify the quality of a community work well in general, under some circumstances they fail to capture this notion correctly. The main reason is that these metrics treat the internal community edges as a set but ignore how those edges actually connect the vertices of the community. We propose Weighted Community Clustering (WCC), a new community metric that takes the triangle, instead of the edge, as the minimal structural motif indicating a strong relation in a graph. We analyse WCC theoretically in depth and formally prove, by means of a set of properties, that maximizing WCC guarantees communities with cohesion and structure. In addition, we propose Scalable Community Detection (SCD), a community detection algorithm based on WCC that is designed to be fast and scalable on SMP machines, and we show experimentally, using real datasets, that WCC correctly captures the concept of community in social networks. Finally, using ground-truth data, we show that SCD provides better quality than the best state-of-the-art disjoint community detection algorithms while running faster. Peer reviewed. Postprint (author's final draft).
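To make the triangle-as-motif idea concrete, here is a minimal Python sketch. It does not implement WCC itself, whose full definition is not given in the abstract. It only computes one illustrative ingredient: the share of a vertex's triangles that close inside its community. The names `triangles_at` and `triangle_share` and the toy graph are assumptions made for illustration.

```python
from itertools import combinations

def triangles_at(adj, x, allowed=None):
    """Count triangles through x whose other two vertices lie in `allowed`.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    With allowed=None, all neighbours of x are considered.
    """
    nbrs = adj[x] if allowed is None else adj[x] & allowed
    # A pair of neighbours (u, v) closes a triangle with x when u and v are adjacent.
    return sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])

def triangle_share(adj, x, community):
    """Fraction of x's triangles closed entirely inside `community` (hypothetical helper)."""
    total = triangles_at(adj, x)
    if total == 0:
        return 0.0  # a vertex with no triangles gets no triangle-based support
    return triangles_at(adj, x, community - {x}) / total

# Toy graph (assumed): two triangles {0, 1, 2} and {2, 3, 4} sharing vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}

if __name__ == "__main__":
    for vertex in sorted(adj):
        print(vertex, triangle_share(adj, vertex, {0, 1, 2}))
```

An edge-counting metric would treat every internal edge of the community alike, whereas this quantity rewards only edges that take part in closed triangles.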
Using Triangles to Improve Community Detection in Directed Networks
In a graph, a community may be loosely defined as a group of nodes that are
more closely connected to one another than to the rest of the graph. While
there are a variety of metrics that can be used to specify the quality of a
given community, one common theme is that flows tend to stay within
communities. Hence, we expect cycles to play an important role in community
detection. For undirected graphs, the importance of triangles -- an undirected
3-cycle -- has been known for a long time and can be used to improve community
detection. In directed graphs, the situation is more nuanced. The smallest
cycle is simply two nodes with a reciprocal connection, and using information
about reciprocation has proven to improve community detection. Our new idea is
based on the four types of directed triangles that contain cycles. To identify
communities in directed networks, then, we propose an undirected edge-weighting
scheme based on the type of the directed triangles in which edges are involved.
We also propose a new community-quality metric based on the
number of 3-cycles that are split across communities. To demonstrate the impact
of our new weighting, we use the standard METIS graph partitioning tool to
determine communities and show experimentally that the resulting communities
cut fewer 3-cycles. The magnitude of the effect ranges from a 10% to a 50%
reduction, and we also find evidence that this weighting scheme
improves a task where plausible ground-truth communities are known. Comment: 10 pages, 3 figures
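The cut-3-cycle idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' metric: it enumerates only triangles that contain a directed cycle a -> b -> c -> a, it does not distinguish the four triangle types mentioned in the abstract, and it reports the fraction of those cycles whose vertices fall in more than one community. The function names and the toy graph are hypothetical.

```python
def directed_three_cycles(succ):
    """List each directed 3-cycle a -> b -> c -> a once, smallest vertex first.

    `succ` maps each vertex to the set of its out-neighbours.
    """
    cycles = []
    for a in succ:
        for b in succ[a]:
            if b <= a:
                continue  # require a to be the smallest vertex of the cycle
            for c in succ.get(b, ()):
                if c <= a or c == b:
                    continue
                if a in succ.get(c, ()):
                    cycles.append((a, b, c))
    return cycles

def cut_cycle_fraction(succ, community_of):
    """Fraction of directed 3-cycles that span more than one community."""
    cycles = directed_three_cycles(succ)
    if not cycles:
        return 0.0
    cut = sum(1 for cyc in cycles if len({community_of[v] for v in cyc}) > 1)
    return cut / len(cycles)

# Toy directed graph (assumed): cycles 0 -> 1 -> 2 -> 0 and 2 -> 3 -> 4 -> 2.
succ = {0: {1}, 1: {2}, 2: {0, 3}, 3: {4}, 4: {2}}
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}

if __name__ == "__main__":
    print(sorted(directed_three_cycles(succ)))
    print(cut_cycle_fraction(succ, community_of))
```

Here the second cycle straddles communities A and B, so half of the cycles are cut. A partition that kept vertices 2, 3, and 4 together would cut the first cycle instead.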
Evaluation Criteria for Object-oriented Metrics
In this paper, an evaluation model for object-oriented (OO) metrics is proposed. We have evaluated the existing evaluation criteria for OO metrics and, based on our observations, propose a model that tries to cover most of the features needed to evaluate OO metrics. The model is validated by applying it to existing OO metrics. In contrast to the other existing criteria, the proposed model is simple to implement and includes the practical and important aspects of evaluation; hence it is suitable for evaluating and validating any OO complexity metric.
LazyFox: Fast and parallelized overlapping community detection in large graphs
The detection of communities in graph datasets provides insight about a
graph's underlying structure and is an important tool for various domains such
as social sciences, marketing, traffic forecast, and drug discovery. While most
existing algorithms provide fast approaches for community detection, their
results usually contain strictly separated communities. However, most datasets
would semantically allow for or even require overlapping communities that can
only be determined at much higher computational cost. We build on an efficient
algorithm, Fox, that detects such overlapping communities. Fox measures the
closeness of a node to a community by approximating the count of triangles
which that node forms with that community. We propose LazyFox, a multi-threaded
version of the Fox algorithm, which provides even faster detection without an
impact on community quality. This allows for the analysis of significantly
larger and more complex datasets. LazyFox enables overlapping community
detection on complex graph datasets with millions of nodes and billions of
edges in days instead of weeks. As part of this work, LazyFox's implementation
was published and is available as a tool under an MIT licence at
https://github.com/TimGarrels/LazyFox. Comment: 17 pages, 5 figures
The LDBC Graphalytics Benchmark
In this document, we describe LDBC Graphalytics, an industrial-grade
benchmark for graph analysis platforms. The main goal of Graphalytics is to
enable the fair and objective comparison of graph analysis platforms. Due to
the diversity of bottlenecks and performance issues such platforms need to
address, Graphalytics consists of a set of selected deterministic algorithms
for full-graph analysis, standard graph datasets, synthetic dataset generators,
and reference output for validation purposes. Its test harness produces deep
metrics that quantify multiple kinds of systems scalability, weak and strong,
and robustness, such as failures and performance variability. The benchmark
also balances comprehensiveness with runtime necessary to obtain the deep
metrics. The benchmark comes with open-source software for generating
performance data, for validating algorithm results, for monitoring and sharing
performance data, and for obtaining the final benchmark result as a standard
performance report.
Land Cover and Rainfall Interact to Shape Waterbird Community Composition
Human land cover can degrade estuaries directly through habitat loss and fragmentation or indirectly through nutrient inputs that reduce water quality. Strong precipitation events are occurring more frequently, causing greater hydrological connectivity between watersheds and estuaries. Nutrient enrichment and dissolved oxygen depletion that occur following these events are known to limit populations of benthic macroinvertebrates and commercially harvested species, but the consequences for top consumers such as birds remain largely unknown. We used non-metric multidimensional scaling (MDS) and structural equation modeling (SEM) to understand how land cover and annual variation in rainfall interact to shape waterbird community composition in Chesapeake Bay, USA. The MDS ordination indicated that urban subestuaries shifted from a mixed generalist-specialist community in 2002, a year of severe drought, to a generalist-dominated community in 2003, a year of high rainfall. The SEM revealed that this change was concurrent with a sixfold increase in nitrate-N concentration in subestuaries. In the drought year of 2002, waterbird community composition depended only on the direct effect of urban development in watersheds. In the wet year of 2003, community composition depended both on this direct effect and on indirect effects associated with high nitrate-N inputs to northern parts of the Bay, particularly in urban subestuaries. Our findings suggest that increased runoff during periods of high rainfall can depress water quality enough to alter the composition of estuarine waterbird communities, and that this effect is compounded in subestuaries dominated by urban development. Estuarine restoration programs often chart progress by monitoring stressors and indicators, but rarely assess multivariate relationships among them. Estuarine management planning could be improved by tracking the structure of relationships among land cover, water quality, and waterbirds.
Unraveling these complex relationships may help managers identify and mitigate ecological thresholds that occur with increasing human land cover.