20,637 research outputs found
A Fast and Efficient Incremental Approach toward Dynamic Community Detection
Community detection is a discovery tool used by network scientists to analyze
the structure of real-world networks. It seeks to identify natural divisions
that may exist in the input networks that partition the vertices into coherent
modules (or communities). While this problem space is rich with efficient
algorithms and software, most of this literature caters to the static use-case
where the underlying network does not change. However, many emerging real-world
use-cases give rise to a need to incorporate dynamic graphs as inputs.
In this paper, we present a fast and efficient incremental approach toward
dynamic community detection. The key contribution is a generic technique called
, which examines the most recent batch of changes made to an
input graph and selects a subset of vertices to reevaluate for potential
community (re)assignment. This technique can be incorporated into any of the
community detection methods that use modularity as its objective function for
clustering. For demonstration purposes, we incorporated the technique into two
well-known community detection tools. Our experiments demonstrate that our new
incremental approach is able to generate performance speedups without
compromising on the output quality (despite its heuristic nature). For
instance, on a real-world network with 63M temporal edges (over 12 time steps),
our approach was able to complete in 1056 seconds, yielding a 3x speedup over a
baseline implementation. In addition to demonstrating the performance benefits,
we also show how to use our approach to delineate appropriate intervals of
temporal resolutions at which to analyze an input network
Multi-level algorithms for modularity clustering
Modularity is one of the most widely used quality measures for graph
clusterings. Maximizing modularity is NP-hard, and the runtime of exact
algorithms is prohibitive for large graphs. A simple and effective class of
heuristics coarsens the graph by iteratively merging clusters (starting from
singletons), and optionally refines the resulting clustering by iteratively
moving individual vertices between clusters. Several heuristics of this type
have been proposed in the literature, but little is known about their relative
performance.
This paper experimentally compares existing and new coarsening- and
refinement-based heuristics with respect to their effectiveness (achieved
modularity) and efficiency (runtime). Concerning coarsening, it turns out that
the most widely used criterion for merging clusters (modularity increase) is
outperformed by other simple criteria, and that a recent algorithm by Schuetz
and Caflisch is no improvement over simple greedy coarsening for these
criteria. Concerning refinement, a new multi-level algorithm is shown to
produce significantly better clusterings than conventional single-level
algorithms. A comparison with published benchmark results and algorithm
implementations shows that combinations of coarsening and multi-level
refinement are competitive with the best algorithms in the literature.Comment: 12 pages, 10 figures, see
http://www.informatik.tu-cottbus.de/~rrotta/ for downloading the graph
clustering softwar
Learning from the Success of MPI
The Message Passing Interface (MPI) has been extremely successful as a
portable way to program high-performance parallel computers. This success has
occurred in spite of the view of many that message passing is difficult and
that other approaches, including automatic parallelization and directive-based
parallelism, are easier to use. This paper argues that MPI has succeeded
because it addresses all of the important issues in providing a parallel
programming model.Comment: 12 pages, 1 figur
An Empirical Study of Cohesion and Coupling: Balancing Optimisation and Disruption
Search based software engineering has been extensively applied to the problem of finding improved modular structures that maximise cohesion and minimise coupling. However, there has, hitherto, been no longitudinal study of developers’ implementations, over a series of sequential releases. Moreover, results validating whether developers respect the fitness functions are scarce, and the potentially disruptive effect of search-based remodularisation is usually overlooked. We present an empirical study of 233 sequential releases of 10 different systems; the largest empirical study reported in the literature so far, and the first longitudinal study. Our results provide evidence that developers do, indeed, respect the fitness functions used to optimise cohesion/coupling (they are statistically significantly better than arbitrary choices with p << 0.01), yet they also leave considerable room for further improvement (cohesion/coupling can be improved by 25% on average). However, we also report that optimising the structure is highly disruptive (on average more than 57% of the structure must change), while our results reveal that developers tend to avoid such disruption. Therefore, we introduce and evaluate a multi-objective evolutionary approach that minimises disruption while maximising cohesion/coupling improvement. This allows developers to balance reticence to disrupt existing modular structure, against their competing need to improve cohesion and coupling. The multi-objective approach is able to find modular structures that improve the cohesion of developers’ implementations by 22.52%, while causing an acceptably low level of disruption (within that already tolerated by developers)
Proactive Empirical Assessment of New Language Feature Adoption via Automated Refactoring: The Case of Java 8 Default Methods
Programming languages and platforms improve over time, sometimes resulting in
new language features that offer many benefits. However, despite these
benefits, developers may not always be willing to adopt them in their projects
for various reasons. In this paper, we describe an empirical study where we
assess the adoption of a particular new language feature. Studying how
developers use (or do not use) new language features is important in
programming language research and engineering because it gives designers
insight into the usability of the language to create meaning programs in that
language. This knowledge, in turn, can drive future innovations in the area.
Here, we explore Java 8 default methods, which allow interfaces to contain
(instance) method implementations.
Default methods can ease interface evolution, make certain ubiquitous design
patterns redundant, and improve both modularity and maintainability. A focus of
this work is to discover, through a scientific approach and a novel technique,
situations where developers found these constructs useful and where they did
not, and the reasons for each. Although several studies center around assessing
new language features, to the best of our knowledge, this kind of construct has
not been previously considered.
Despite their benefits, we found that developers did not adopt default
methods in all situations. Our study consisted of submitting pull requests
introducing the language feature to 19 real-world, open source Java projects
without altering original program semantics. This novel assessment technique is
proactive in that the adoption was driven by an automatic refactoring approach
rather than waiting for developers to discover and integrate the feature
themselves. In this way, we set forth best practices and patterns of using the
language feature effectively earlier rather than later and are able to possibly
guide (near) future language evolution. We foresee this technique to be useful
in assessing other new language features, design patterns, and other
programming idioms
A Case for Custom, Composable Composition Operators
Programming languages typically support a fixed set of com- position operators, with fixed semantics. This may impose limits on software designers, in case a desired operator or semantics are not supported by a language, resulting in suboptimal quality characteristics of the designed software system. We demonstrate this using the well-known State design pattern, and propose the use of a composition infrastructure that allows the designer to define custom, composable composition operators. We demonstrate how this approach improves several quality factors of the State design pattern, such as reusability and modularity, while taking a reason- able amount of effort to define the necessary pattern-related code
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, a precise understanding
of the realtionship between different cluster quality metrics is unknown. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters
Above and Beyond the Landauer Bound: Thermodynamics of Modularity
Information processing typically occurs via the composition of modular units,
such as universal logic gates. The benefit of modular information processing,
in contrast to globally integrated information processing, is that complex
global computations are more easily and flexibly implemented via a series of
simpler, localized information processing operations which only control and
change local degrees of freedom. We show that, despite these benefits, there
are unavoidable thermodynamic costs to modularity---costs that arise directly
from the operation of localized processing and that go beyond Landauer's
dissipation bound for erasing information. Integrated computations can achieve
Landauer's bound, however, when they globally coordinate the control of all of
an information reservoir's degrees of freedom. Unfortunately, global
correlations among the information-bearing degrees of freedom are easily lost
by modular implementations. This is costly since such correlations are a
thermodynamic fuel. We quantify the minimum irretrievable dissipation of
modular computations in terms of the difference between the change in global
nonequilibrium free energy, which captures these global correlations, and the
local (marginal) change in nonequilibrium free energy, which bounds modular
work production. This modularity dissipation is proportional to the amount of
additional work required to perform the computational task modularly. It has
immediate consequences for physically embedded transducers, known as
information ratchets. We show how to circumvent modularity dissipation by
designing internal ratchet states that capture the global correlations and
patterns in the ratchet's information reservoir. Designed in this way,
information ratchets match the optimum thermodynamic efficiency of globally
integrated computations.Comment: 17 pages, 9 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/idolip.ht
- …