Faster unfolding of communities: speeding up the Louvain algorithm
Many complex networks exhibit a modular structure of densely connected groups
of nodes. Usually, such a modular structure is uncovered by the optimization of
some quality function. Although flawed, modularity remains one of the most
popular quality functions. The Louvain algorithm was originally developed for
optimizing modularity, but has been applied to a variety of methods. As such,
speeding up the Louvain algorithm enables the analysis of larger graphs in a
shorter time for various methods. We suggest moving nodes to the community of a
random neighbor, instead of the best neighbor community. Although
incredibly simple, it reduces the theoretical runtime complexity from
$\mathcal{O}(m)$ to $\mathcal{O}(n \log \langle k \rangle)$ in networks with a
clear community structure. In benchmark networks, it speeds up the algorithm
roughly 2-3 times, while in some real networks it even reaches 10 times faster
runtimes. This improvement is due to two factors: (1) a random neighbor is
likely to be in a "good" community; and (2) random neighbors are likely to be
hubs, helping the convergence. Finally, the performance gain only slightly
diminishes the quality, especially for modularity, thus providing a good
quality-performance ratio. However, these gains are less pronounced, or even
disappear, for some other measures such as significance or surprise.
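The random-neighbor move described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the local moving phase (no aggregation), it assumes an unweighted, undirected graph without self-loops given as a dict of adjacency lists, and the names `modularity` and `random_neighbor_moves` are hypothetical. Stopping after the first sweep without a move is a simplification, since a random draw may simply have missed an improving move.

```python
import random
from collections import defaultdict


def modularity(adj, comm):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    internal = defaultdict(int)  # twice the number of internal edges
    total = defaultdict(int)     # sum of degrees per community
    for v, nbrs in adj.items():
        total[comm[v]] += len(nbrs)
        for u in nbrs:
            if comm[u] == comm[v]:
                internal[comm[v]] += 1
    return sum(internal[c] / (2 * m) - (total[c] / (2 * m)) ** 2 for c in total)


def random_neighbor_moves(adj, seed=0, max_sweeps=100):
    """Local moving phase that only evaluates one random neighbor's community."""
    rng = random.Random(seed)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    comm = {v: v for v in adj}              # start from singleton communities
    total = {v: len(adj[v]) for v in adj}   # degree sum per community label
    nodes = list(adj)
    for _ in range(max_sweeps):
        moved = False
        rng.shuffle(nodes)
        for v in nodes:
            if not adj[v]:
                continue
            u = rng.choice(adj[v])          # random neighbor, not the best one
            a, b = comm[v], comm[u]
            if a == b:
                continue
            k_v = len(adj[v])
            k_va = sum(1 for w in adj[v] if comm[w] == a)
            k_vb = sum(1 for w in adj[v] if comm[w] == b)
            # Change in modularity when v moves from community a to b.
            gain = (k_vb - k_va) / m - k_v * (total[b] - total[a] + k_v) / (2 * m * m)
            if gain > 0:
                comm[v] = b
                total[a] -= k_v
                total[b] += k_v
                moved = True
        if not moved:
            break
    return comm
```

Each node evaluates a single candidate community per visit, which is where the saving over scanning all neighboring communities comes from. Because only moves with positive gain are accepted, modularity never decreases.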
Systematic analysis of agreement between metrics and peer review in the UK REF
When performing a national research assessment, some countries rely on
citation metrics whereas others, such as the UK, primarily use peer review. In
the influential Metric Tide report, a low agreement between metrics and peer
review in the UK Research Excellence Framework (REF) was found. However,
earlier studies observed much higher agreement between metrics and peer review
in the REF and argued in favour of using metrics. This shows that there is
considerable ambiguity in the discussion on agreement between metrics and peer
review. We provide clarity in this discussion by considering four important
points: (1) the level of aggregation of the analysis; (2) the use of either a
size-dependent or a size-independent perspective; (3) the suitability of
different measures of agreement; and (4) the uncertainty in peer review. In the
context of the REF, we argue that agreement between metrics and peer review
should be assessed at the institutional level rather than at the publication
level. Both a size-dependent and a size-independent perspective are relevant in
the REF. Because the interpretation of correlations may be problematic, we
instead use measures of agreement that are based on the
absolute or relative differences between metrics and peer review. To get an
idea of the uncertainty in peer review, we rely on a model to bootstrap peer
review outcomes. We conclude that particularly in Physics, Clinical Medicine,
and Public Health, metrics agree quite well with peer review and may offer an
alternative to peer review.
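To make the idea of difference-based agreement concrete, here is a small sketch. The abstract does not name the exact measures, so the two functions below (a median absolute difference and a median absolute percentage difference) are assumptions chosen for illustration, and the institution-level numbers are made up rather than taken from the REF.

```python
from statistics import median


def median_abs_diff(metric, peer):
    """Median of |metric - peer| across institutions (same units as the inputs)."""
    return median(abs(m - p) for m, p in zip(metric, peer))


def median_abs_pct_diff(metric, peer):
    """Median of |metric - peer| relative to the peer review value, in percent."""
    return median(100 * abs(m - p) / p for m, p in zip(metric, peer))


# Hypothetical scores for four institutions (illustrative values only).
metric_score = [30.0, 45.0, 20.0, 60.0]
peer_score = [25.0, 50.0, 20.0, 40.0]

print(median_abs_diff(metric_score, peer_score))
print(median_abs_pct_diff(metric_score, peer_score))
```

Unlike a correlation coefficient, both quantities are expressed directly in terms of how far the metric-based outcome lies from the peer review outcome for a typical institution.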
From Louvain to Leiden: guaranteeing well-connected communities
Community detection is often used to understand the structure of large and
complex networks. One of the most popular algorithms for uncovering community
structure is the so-called Louvain algorithm. We show that this algorithm has a
major defect that largely went unnoticed until now: the Louvain algorithm may
yield arbitrarily badly connected communities. In the worst case, communities
may even be disconnected, especially when running the algorithm iteratively. In
our experimental analysis, we observe that up to 25% of the communities are
badly connected and up to 16% are disconnected. To address this problem, we
introduce the Leiden algorithm. We prove that the Leiden algorithm yields
communities that are guaranteed to be connected. In addition, we prove that,
when the Leiden algorithm is applied iteratively, it converges to a partition
in which all subsets of all communities are locally optimally assigned.
Furthermore, by relying on a fast local move approach, the Leiden algorithm
runs faster than the Louvain algorithm. We demonstrate the performance of the
Leiden algorithm for several benchmark and real-world networks. We find that
the Leiden algorithm is faster than the Louvain algorithm and uncovers better
partitions, in addition to providing explicit guarantees.
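The defect described in this abstract, communities that are internally disconnected, can be checked for any given partition. The sketch below is only a diagnostic, not the Leiden algorithm itself. It assumes an undirected graph stored as a dict of adjacency lists and a dict mapping each node to a community label, and the function name `disconnected_communities` is hypothetical.

```python
from collections import defaultdict, deque


def disconnected_communities(adj, comm):
    """Return the labels of communities whose induced subgraph is not connected."""
    members = defaultdict(set)
    for v, c in comm.items():
        members[c].add(v)

    bad = []
    for c, nodes in members.items():
        # Breadth-first search that never leaves the community.
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u in nodes and u not in seen:
                    seen.add(u)
                    queue.append(u)
        if len(seen) < len(nodes):
            bad.append(c)
    return bad
```

For example, on the path 0-1-2, placing nodes 0 and 2 together while node 1 sits in its own community yields one disconnected community, because the only route between 0 and 2 passes through node 1.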
Significant Scales in Community Structure
Many complex networks show signs of modular structure, uncovered by community
detection. Although many methods succeed in revealing various partitions, it
remains difficult to detect at what scale some partition is significant. This
problem shows foremost in multi-resolution methods. We here introduce an
efficient method for scanning for resolutions in one such method. Additionally,
we introduce the notion of "significance" of a partition, based on subgraph
probabilities. Significance is independent of the exact method used, so could
also be applied in other methods, and can be interpreted as the gain in
encoding a graph by making use of a partition. Using significance, we can
determine "good" resolution parameters, which we demonstrate on benchmark
networks. Moreover, optimizing significance itself also shows excellent
performance. We demonstrate our method on voting data from the European
Parliament. Our analysis suggests the European Parliament has become
increasingly ideologically divided and that nationality plays no role.
Comment: To appear in Scientific Reports
Detecting communities using asymptotical Surprise
Nodes in real-world networks are repeatedly observed to form dense clusters,
often referred to as communities. Methods to detect these groups of nodes
usually maximize an objective function, which implicitly contains the
definition of a community. We here analyze a recently proposed measure called
surprise, which assesses the quality of the partition of a network into
communities. In its current form, the formulation of surprise is rather
difficult to analyze. We here therefore develop an accurate asymptotic
approximation. This allows for the development of an efficient algorithm for
optimizing surprise. Incidentally, this leads to a straightforward extension of
surprise to weighted graphs. Additionally, the approximation makes it possible
to analyze surprise more closely and compare it to other methods, especially
modularity. We show that surprise is (nearly) unaffected by the well-known
resolution limit, a particular problem for modularity. However, surprise may
tend to overestimate the number of communities, whereas they may be
underestimated by modularity. In short, surprise works well in the limit of
many small communities, whereas modularity works better in the limit of few
large communities. In this sense, surprise is more discriminative than
modularity, and may find communities where modularity fails to discern any
structure.
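The abstract does not state the approximation itself. The sketch below assumes the form S ≈ m · D(q ‖ ⟨q⟩), where m is the number of edges, q is the observed fraction of edges that fall inside communities, ⟨q⟩ is the fraction of node pairs that fall inside communities, and D is the binary Kullback-Leibler divergence. Treat that formula, the function names, and the restriction to simple unweighted graphs as assumptions for illustration, not as the paper's definitive statement.

```python
import math
from collections import Counter


def binary_kl(q, p):
    """Binary Kullback-Leibler divergence D(q || p), with 0 * log(0) taken as 0."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def asymptotic_surprise(edges, comm):
    """Assumed asymptotic surprise m * D(q || <q>) for a simple undirected graph."""
    n = len(comm)
    m = len(edges)
    m_int = sum(1 for u, v in edges if comm[u] == comm[v])
    sizes = Counter(comm.values())
    pairs_int = sum(s * (s - 1) / 2 for s in sizes.values())
    pairs = n * (n - 1) / 2
    q = m_int / m              # observed fraction of internal edges
    q_exp = pairs_int / pairs  # expected fraction of internal edges
    return m * binary_kl(q, q_exp)
```

Under this form, a partition scores highly when many more edges fall inside communities than the number of internal node pairs would suggest, and placing all nodes in one community scores zero.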
Early school-leaving in the Netherlands
The role of student-, family- and school factors for early school-leaving in lower secondary education
Most studies on early school-leaving address only partial causes of why some
students leave school early. This study aims to develop a more elaborate model
to explain early school-leaving in lower secondary education, taking into
account individual, family and school factors at the same time. By using a
longitudinal dataset we are able to attribute clear causal relations between
the different factors. We distinguish four groups of school-leavers, separating
"dropouts" (those without any qualification) from those who left school after
attaining a diploma in lower secondary education ("low qualified"), those who
pursued education as an apprentice ("apprentices") and the ones who continued
education and received a full upper secondary qualification ("full
qualification"). Discerning these four groups shows clear differences in the
background of different types of early school-leavers and in the effects of
school factors.
labour market entry and occupational careers