Community Detection using Locality Statistics
The goal of community detection is to identify clusters and groups of vertices that share common properties or play similar roles in a graph, using only the information encoded in the graph. Our work analyzes two methods of identifying an anomalous community in temporal graphs and another method of identifying active communities in a static massive graph. All methods are based on locality statistics.
In [50], we detect an anomalous community that shows growing connectivity over a time series of graphs. We formulate the task as a hypothesis-testing problem on a time series of stochastic block models. We derive the limiting properties and power characteristics of two competing test statistics built on distinct underlying locality statistics. In addition, we provide practical implementations of the two competing test statistics and detailed experimental results for a neural imaging application in [36].
In [51], active communities are detected in a static massive graph on which many community detection algorithms scale poorly. We propose a novel framework for detecting active communities that consist of the most active vertices. Our framework utilizes a parallelizable trimming algorithm based on a locality statistic to filter out inactive vertices, and then clusters the remaining active vertices via spectral decomposition of their similarity matrix. The framework is applicable to graphs consisting of billions of vertices and hundreds of billions of edges.
In summary, this work provides developments in community detection, in both temporal graphs and static massive graphs, by employing locality statistics.
Distributed Graph Clustering using Modularity and Map Equation
We study large-scale, distributed graph clustering. Given an undirected
graph, our objective is to partition the nodes into disjoint sets called
clusters. A cluster should contain many internal edges while being sparsely
connected to other clusters. In the context of a social network, a cluster
could be a group of friends. Modularity and map equation are established
formalizations of this internally-dense-externally-sparse principle. We present
two versions of a simple distributed algorithm to optimize both measures. They
are based on Thrill, a distributed big data processing framework that
implements an extended MapReduce model. The algorithms for the two measures,
DSLM-Mod and DSLM-Map, differ only slightly. Adapting them for similar quality
measures is straightforward. We conduct an extensive experimental study on
real-world graphs and on synthetic benchmark graphs with up to 68 billion
edges. Our algorithms are fast while detecting clusterings similar to those
detected by other sequential, parallel and distributed clustering algorithms.
Compared to the distributed GossipMap algorithm, DSLM-Map needs less memory, is
up to an order of magnitude faster and achieves better quality.
Comment: 14 pages, 3 figures; v3: Camera ready for Euro-Par 2018, more
details, more results; v2: extended experiments to include comparison with
competing algorithms, shortened for submission to Euro-Par 201
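The internally-dense-externally-sparse principle named in this abstract can be made concrete with the standard modularity formula, Q = sum over clusters c of (e_c / m - (vol_c / 2m)^2), where e_c counts edges inside c, vol_c is the total degree of c, and m is the number of edges. The following is a minimal sequential sketch of that formula only, not the distributed DSLM-Mod algorithm; the function name `modularity` and the input format (an edge list plus a node-to-cluster dict) are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity of a clustering of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    cluster_of: dict mapping node -> cluster id (hypothetical input format).
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # edges with both endpoints in the cluster
    volume = defaultdict(int)    # sum of node degrees in the cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Fraction of internal edges minus the fraction expected at random.
    return sum(internal[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)
```

For two triangles joined by a single edge, clustering each triangle separately gives Q = 5/14, while putting all nodes in one cluster gives Q = 0, which matches the intuition that the first clustering is the better one.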
Guided Machine Learning for power grid segmentation
The segmentation of large scale power grids into zones is crucial for control
room operators when managing the grid complexity near real time. In this paper
we propose a new method in two steps which is able to automatically do this
segmentation, while taking into account the real time context, in order to help
them handle shifting dynamics. Our method relies on a "guided" machine learning
approach. As a first step, we define and compute a task specific "Influence
Graph" in a guided manner: we simulate, on a given grid state, chosen
interventions that are representative of our task of interest (managing active power
flows in our case). For visualization and interpretation, we then build a
higher-level representation of the grid relevant to this task by applying the graph
community detection algorithm Infomap on this Influence Graph. To
illustrate our method and demonstrate its practical interest, we apply it on
commonly used systems, the IEEE-14 and IEEE-118. We show promising and original
interpretable results, especially on the previously well studied RTS-96 system
for grid segmentation. Finally, we share an initial investigation and results on
a large-scale system, the French power grid, whose segmentation shows a
surprising resemblance to RTE's historical partitioning.
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in
Astronomy, special issue "Robotic Astronomy
Computing Vertex Centrality Measures in Massive Real Networks with a Neural Learning Model
Vertex centrality measures are a multi-purpose analysis tool, commonly used
in many application environments to retrieve information and unveil knowledge
from the graphs and network structural properties. However, the algorithms of
such metrics are expensive in terms of computational resources when running
real-time applications or massive real world networks. Thus, approximation
techniques have been developed and used to compute the measures in such
scenarios. In this paper, we demonstrate and analyze the use of neural network
learning algorithms to tackle such a task and compare their performance in terms
of solution quality and computation time with other techniques from the
literature. Our work offers several contributions. We highlight both the pros
and cons of approximating centralities through neural learning. By empirical
means and statistics, we then show that the regression model generated with a
feedforward neural network trained by the Levenberg-Marquardt algorithm is not
only the best option considering computational resources, but also achieves the
best solution quality for relevant applications and large-scale networks.
Keywords: Vertex Centrality Measures, Neural Networks, Complex Network Models,
Machine Learning, Regression Model
Comment: 8 pages, 5 tables, 2 figures, version accepted at IJCNN 2018. arXiv
admin note: text overlap with arXiv:1810.1176
Partitioning Complex Networks via Size-constrained Clustering
The most commonly used method to tackle the graph partitioning problem in
practice is the multilevel approach. During a coarsening phase, a multilevel
graph partitioning algorithm reduces the graph size by iteratively contracting
nodes and edges until the graph is small enough to be partitioned by some other
algorithm. A partition of the input graph is then constructed by successively
transferring the solution to the next finer graph and applying a local search
algorithm to improve the current solution.
In this paper, we describe a novel approach to partition graphs effectively,
especially if the networks have a highly irregular structure. More precisely,
our algorithm provides graph coarsening by iteratively contracting
size-constrained clusterings that are computed using a label propagation
algorithm. The same algorithm that provides the size-constrained clusterings
can also be used during uncoarsening as a fast and simple local search
algorithm.
Depending on the algorithm's configuration, we are able to compute partitions
of very high quality outperforming all competitors, or partitions that are
comparable to the best competitor in terms of quality, hMetis, while being
nearly an order of magnitude faster on average. The fastest configuration
partitions the largest graph available to us with 3.3 billion edges using a
single machine in about ten minutes while cutting fewer than half as many edges
as the fastest competitor, kMetis.
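The coarsening idea described in this abstract, label propagation under a cluster-size bound, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: nodes have unit weight, edges are unweighted, a node moves only when another cluster holds strictly more of its neighbors, and the function name, the `max_size`, `rounds` and `seed` parameters, and the adjacency-dict input are all hypothetical. The contraction step that follows clustering is omitted.

```python
import random


def size_constrained_label_propagation(adj, max_size, rounds=10, seed=0):
    """Cluster nodes by label propagation, keeping every cluster <= max_size.

    adj: dict mapping node -> iterable of neighbors (symmetric, no self-loops).
    Returns a dict mapping node -> cluster label.
    """
    rng = random.Random(seed)
    label = {v: v for v in adj}  # every node starts in its own cluster
    size = {v: 1 for v in adj}   # current number of nodes per label
    nodes = list(adj)
    for _ in range(rounds):
        rng.shuffle(nodes)       # visit nodes in random order each round
        moved = False
        for v in nodes:
            # Count how many neighbors of v carry each label.
            counts = {}
            for u in adj[v]:
                counts[label[u]] = counts.get(label[u], 0) + 1
            current = label[v]
            best, best_count = current, counts.get(current, 0)
            for lab, cnt in counts.items():
                # Only consider target clusters that still have room for v.
                if lab != current and cnt > best_count and size[lab] + 1 <= max_size:
                    best, best_count = lab, cnt
            if best != current:
                size[current] -= 1
                size[best] += 1
                label[v] = best
                moved = True
        if not moved:            # converged: no node changed its label
            break
    return label
```

Because a move is accepted only if the target cluster stays within `max_size`, the size bound holds after every step; with `max_size=1` no node can move and each node keeps its own label.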
Detection of the elite structure in a virtual multiplex social system by means of a generalized K-core
Elites are subgroups of individuals within a society that have the ability
and means to influence, lead, govern, and shape societies. Members of elites
are often well connected individuals, which enables them to impose their
influence on many and to quickly gather, process, and spread information. Here
we argue that elites are not only composed of highly connected individuals, but
also of intermediaries connecting hubs to form a cohesive and structured
elite-subgroup at the core of a social network. For this purpose we present a
generalization of the K-core algorithm that allows us to identify a social core
that is composed of well-connected hubs together with their 'connectors'. We
show the validity of the idea in the framework of a virtual world defined by a
massive multiplayer online game, on which we have complete information of
various social networks. Exploiting this multiplex structure, we find that the
hubs of the generalized K-core identify those individuals that are high
social performers in terms of a series of indicators that are available in the
game. In addition, using a combined strategy which involves the generalized
K-core and the recently introduced M-core, the elites of the different
'nations' present in the game are perfectly identified as modules of the
generalized K-core. Interesting sudden shifts in the composition of the elite
cores are observed at deep levels. We show that elite detection with the
traditional K-core is not possible in a reliable way. The proposed method
might be useful in a series of more general applications, such as community
detection.
Comment: 13 figures, 3 tables, 19 pages. Accepted for publication in PLoS ON
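For reference, the traditional core that this abstract uses as its baseline can be computed by repeatedly peeling off nodes whose degree falls below the threshold. The sketch below shows only that classical procedure; it does not implement the generalized core or the connector-preserving rule proposed in the paper, and the function name `k_core` and the adjacency-dict input are assumptions for illustration.

```python
def k_core(adj, k):
    """Return the node set of the classical k-core of an undirected graph.

    adj: dict mapping node -> set of neighbors (symmetric, no self-loops).
    The k-core is the largest subgraph in which every node has degree >= k.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # Nodes already scheduled for removal, seeded with those below threshold.
    queued = {v for v in adj if degree[v] < k}
    stack = list(queued)
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in queued:
                continue          # u is being removed anyway
            degree[u] -= 1        # v's removal costs u one neighbor
            if degree[u] < k:
                queued.add(u)
                stack.append(u)
    return set(adj) - queued
```

On a triangle {0, 1, 2} with a path 2-3-4 attached, `k_core(adj, 2)` keeps only the triangle: node 4 is peeled first, which drops node 3 below the threshold as well. Intermediary nodes of this kind are exactly what plain peeling discards.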