A generalised significance test for individual communities in networks
Many empirical networks have community structure, in which nodes are densely
interconnected within each community (i.e., a group of nodes) and sparsely
connected across different communities. Like other local and meso-scale
structures of networks, communities are generally heterogeneous in various
aspects such as size, edge density, connectivity to other communities and significance.
In the present study, we propose a method to statistically test the
significance of individual communities in a given network. Compared to
previous methods, the present algorithm is unique in that it accepts different
community-detection algorithms and the corresponding quality function for
single communities. The method requires that the quality of each community
can be quantified and that community detection is performed by optimising
such a quality function summed over the communities. Various
community-detection algorithms, including modularity maximisation and graph
partitioning, meet this criterion. Our method estimates the distribution of the
quality function for randomised networks to calculate the likelihood of each
community in the given network. We illustrate our algorithm on synthetic and
empirical networks.
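
As an illustrative sketch of this kind of test (not the authors' exact algorithm), the snippet below uses greedy modularity detection, the per-community modularity contribution as the quality function, and degree-preserving double-edge swaps as the randomisation; all three of these choices are assumptions made for the example.

# Illustrative sketch: empirical p-values for individual communities.
# Assumptions (not from the paper): greedy modularity detection, the
# per-community modularity contribution as the quality function, and
# degree-preserving double-edge swaps as the randomisation.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def community_quality(G, nodes):
    # Modularity contribution of a single community (one possible quality function).
    m = G.number_of_edges()
    internal = G.subgraph(nodes).number_of_edges()
    degree_sum = sum(d for _, d in G.degree(nodes))
    return internal / m - (degree_sum / (2 * m)) ** 2

def community_p_values(G, n_rand=200, seed=0):
    observed = [community_quality(G, c) for c in greedy_modularity_communities(G)]
    null = []
    for i in range(n_rand):
        R = G.copy()
        nx.double_edge_swap(R, nswap=5 * R.number_of_edges(),
                            max_tries=100 * R.number_of_edges(), seed=seed + i)
        null.extend(community_quality(R, c) for c in greedy_modularity_communities(R))
    # p-value of a community: fraction of null qualities at least as large as observed
    return [sum(q >= obs for q in null) / len(null) for obs in observed]

G = nx.karate_club_graph()
print(community_p_values(G, n_rand=50))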
Clustering of exchange rates and their dynamics under different dependence measures
This paper proposes an improvement to the method for clustering exchange rates given by D. J. Fenn et al. in Quantitative Finance, 12(10), 2012, pp. 1493-1520. To deal with the potentially nonlinear nature of currency time series dependence, we propose two alternative similarity metrics to use instead of the one used in the aforementioned paper, which is based on Pearson correlation. Our proposed similarity metrics are based upon Kendall and distance correlations. We observe how each of the newly adapted clustering methods responds over several years of currency exchange data and find significant differences in the resulting clusters.
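
A rough sketch of the two alternative similarity metrics is given below; the sqrt(2*(1 - similarity)) conversion to a distance and the average-linkage clustering are assumptions borrowed from common practice, not necessarily the paper's exact construction, and the dcor package is one third-party option for distance correlation.

# Sketch: Kendall-tau and distance-correlation similarity matrices for a set of
# exchange-rate return series, converted to distances for hierarchical clustering.
# The sqrt(2*(1 - similarity)) transform and average linkage are assumptions.
import numpy as np
import dcor                                   # distance correlation (pip install dcor)
from scipy.stats import kendalltau
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def similarity_matrices(returns):
    # returns: array of shape (T, N), one column per currency
    n = returns.shape[1]
    kendall, dcorr = np.eye(n), np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            kendall[i, j] = kendall[j, i] = kendalltau(returns[:, i], returns[:, j])[0]
            dcorr[i, j] = dcorr[j, i] = dcor.distance_correlation(returns[:, i], returns[:, j])
    return kendall, dcorr

def cluster_from_similarity(sim, n_clusters=4):
    dist = np.sqrt(2.0 * (1.0 - sim))         # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")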
Learning Reputation in an Authorship Network
The problem of searching for experts in a given academic field is hugely
important in both industry and academia. We study exactly this issue with
respect to a database of authors and their publications. The idea is to use
Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform
topic modelling in order to find authors who have worked in a query field. We
then construct a coauthorship graph and motivate the use of influence
maximisation and a variety of graph centrality measures to obtain a ranked list
of experts. The ranked lists are further improved using a Markov Chain-based
rank aggregation approach. The complete method is readily scalable to large
datasets. To demonstrate the efficacy of the approach we report on an extensive
set of computational simulations using the Arnetminer dataset. An improvement
in mean average precision is demonstrated over the baseline case of simply
using the order of authors found by the topic models.
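
A minimal sketch of this kind of pipeline is given below. It combines an LDA-based relevance score with PageRank on the coauthorship graph; the product used to combine the two scores is an assumption for illustration, whereas the paper itself uses influence maximisation and Markov chain rank aggregation.

# Illustrative pipeline: topic-model author relevance + coauthorship PageRank.
# The product of topic relevance and centrality is an assumption; the paper
# uses influence maximisation and Markov chain rank aggregation instead.
import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def rank_experts(papers, query, n_topics=20):
    # papers: list of dicts with 'abstract' and 'authors' keys
    texts = [p["abstract"] for p in papers]
    vec = CountVectorizer(stop_words="english", max_features=5000)
    X = vec.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
    doc_topics = lda.transform(X)
    query_topics = lda.transform(vec.transform([query]))[0]
    # relevance of each paper to the query = cosine similarity in topic space
    rel = doc_topics @ query_topics / (
        np.linalg.norm(doc_topics, axis=1) * np.linalg.norm(query_topics) + 1e-12)

    # coauthorship graph weighted by number of joint papers
    G = nx.Graph()
    author_rel = {}
    for p, r in zip(papers, rel):
        for a in p["authors"]:
            author_rel[a] = author_rel.get(a, 0.0) + r
        for a in p["authors"]:
            for b in p["authors"]:
                if a < b:
                    w = G.get_edge_data(a, b, {"weight": 0})["weight"]
                    G.add_edge(a, b, weight=w + 1)
    pr = nx.pagerank(G, weight="weight") if G.number_of_edges() else {}
    return sorted(author_rel, key=lambda a: author_rel[a] * pr.get(a, 1e-6), reverse=True)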
On methods to assess the significance of community structure in networks of financial time series
We consider the problem of determining whether the community structure found
by a clustering algorithm applied to financial time series is statistically
significant, or is due to pure chance, when no information other than the
observed values and a similarity measure among time series is available. As a
subsidiary problem we also analyse the influence of the choice of similarity
measure on the accuracy of the clustering method.
We propose two raw-data-based methods for assessing the robustness of clustering
algorithms on time-dependent data linked by a relation of similarity:
one based on community scoring functions that quantify some topological
property characterising ground-truth communities, and another based on random
perturbations and quantification of the variation in the community structure.
These methodologies are well established in the realm of unweighted networks;
our contribution is versions of these methodologies properly adapted to
complete weighted networks.
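
The perturbation-based idea can be sketched as follows for a complete weighted network: perturb the edge weights, re-cluster, and measure how much the partition changes. The Gaussian weight noise, Louvain clustering and adjusted Rand index used here are assumptions for illustration, not necessarily the paper's choices.

# Sketch of perturbation-based robustness for a complete weighted network.
# Gaussian weight noise, Louvain clustering and the adjusted Rand index are
# illustrative assumptions.
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities
from sklearn.metrics import adjusted_rand_score

def labels_from_partition(nodes, communities):
    label = {}
    for k, com in enumerate(communities):
        for v in com:
            label[v] = k
    return [label[v] for v in nodes]

def perturbation_robustness(W, sigma=0.05, n_trials=50, seed=0):
    # W: symmetric (N, N) similarity matrix with entries in [0, 1]
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    rng = np.random.default_rng(seed)
    nodes = list(range(W.shape[0]))
    G0 = nx.from_numpy_array(W)
    base = labels_from_partition(nodes, louvain_communities(G0, weight="weight", seed=seed))
    scores = []
    for _ in range(n_trials):
        noise = rng.normal(0.0, sigma, size=W.shape)
        Wp = np.clip(W + (noise + noise.T) / 2.0, 0.0, 1.0)
        np.fill_diagonal(Wp, 0.0)
        Gp = nx.from_numpy_array(Wp)
        pert = labels_from_partition(nodes, louvain_communities(Gp, weight="weight", seed=seed))
        scores.append(adjusted_rand_score(base, pert))
    return float(np.mean(scores))   # close to 1 => stable community structure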
Large-scale multi-objective influence maximisation with network downscaling
Finding the most influential nodes in a network is a computationally hard
problem with several possible applications in various kinds of network-based
problems. While several methods have been proposed for tackling the influence
maximisation (IM) problem, their runtime typically scales poorly when the
network size increases. Here, we propose an original method, based on network
downscaling, that allows a multi-objective evolutionary algorithm (MOEA) to
solve the IM problem on a reduced-scale network, while preserving the relevant
properties of the original network. The downscaled solution is then upscaled to
the original network, using a mechanism based on centrality metrics such as
PageRank. Our results on eight large networks (including two with 50k
nodes) demonstrate the effectiveness of the proposed method with a more than
10-fold runtime gain compared to the time needed on the original network, and
an up to time reduction compared to CELF.
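
A minimal sketch of the downscale-solve-upscale idea is given below; the degree/PageRank-based downscaling, the greedy independent-cascade solver on the small graph, and the trivial upscaling step are all illustrative assumptions, whereas the paper itself uses a multi-objective evolutionary algorithm and a centrality-based upscaling mechanism.

# Minimal sketch of downscale -> solve -> upscale for influence maximisation.
# PageRank-based downscaling, a greedy independent-cascade (IC) solver and a
# trivial upscaling are illustrative assumptions, not the authors' MOEA pipeline.
import random
import networkx as nx

def ic_spread(G, seeds, p=0.05, runs=100):
    # Monte Carlo estimate of expected spread under the IC model
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), set(seeds)
        while frontier:
            nxt = {v for u in frontier for v in G[u]
                   if v not in active and random.random() < p}
            active |= nxt
            frontier = nxt
        total += len(active)
    return total / runs

def downscale_solve_upscale(G, k=10, keep=0.2):
    pr = nx.pagerank(G)
    # downscale: keep the top fraction of nodes by PageRank
    kept = sorted(pr, key=pr.get, reverse=True)[: int(keep * G.number_of_nodes())]
    H = G.subgraph(kept)
    # greedy IM on the small graph
    seeds = []
    for _ in range(k):
        best = max((v for v in H if v not in seeds),
                   key=lambda v: ic_spread(H, seeds + [v], runs=30))
        seeds.append(best)
    # "upscale": trivial here, since kept nodes also belong to G; a real
    # upscaling would map coarse seeds back to original nodes via centrality
    return seeds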
Evolutionary Algorithms for Community Detection in Continental-Scale High-Voltage Transmission Grids
Symmetry is a key concept in the study of power systems, not only because the admittance and Jacobian matrices used in power flow analysis are symmetrical, but also because previous studies have shown that some real-world power grids exhibit complex symmetries. In order to investigate the topological characteristics of power grids, this paper proposes the use of evolutionary algorithms for community detection using modularity density measures on networks representing supergrids, so as to discover densely connected structures. Two evolutionary approaches (generational genetic algorithm, GGA+, and modularity and improved genetic algorithm, MIGA) were applied. The results obtained on two large networks representing supergrids (the European grid and the North American grid) provide insights into both the structure of the supergrid and the topological differences between regions. Numerical and graphical results show how these evolutionary approaches clearly outperform the well-known Louvain modularity method. In particular, the average modularity obtained by GGA+ was 0.815 in the European grid and 0.827 in the North American grid. These results outperform those obtained by the MIGA and Louvain methods (0.801 and 0.766 in the European grid, and 0.813 and 0.798 in the North American grid, respectively).
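
For concreteness, the sketch below evaluates one common form of modularity density on a Louvain partition; whether this exact variant matches the measure optimised by GGA+ and MIGA is an assumption made for illustration, and the les_miserables_graph is only a stand-in for a transmission grid.

# Sketch: one common form of modularity density,
# D = sum over communities of (2*internal_edges - external_edges) / |community|,
# evaluated on a Louvain partition. Whether this variant matches the paper's
# measure is an assumption.
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

def modularity_density(G, communities):
    D = 0.0
    for com in communities:
        com = set(com)
        internal = G.subgraph(com).number_of_edges()
        external = sum(1 for u in com for v in G[u] if v not in com)
        D += (2 * internal - external) / len(com)
    return D

G = nx.les_miserables_graph()              # stand-in for a transmission grid
parts = louvain_communities(G, seed=42)
print("communities:", len(parts))
print("modularity:", modularity(G, parts))
print("modularity density:", modularity_density(G, parts))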
Configuration model for correlation matrices preserving the node strength
Correlation matrices are a major type of multivariate data. To examine
properties of a given correlation matrix, a common practice is to compare the
same quantity between the original correlation matrix and reference correlation
matrices, such as those derived from random matrix theory, that partially
preserve properties of the original matrix. We propose a model to generate such
reference correlation and covariance matrices for the given matrix. Correlation
matrices are often analysed as networks, which are heterogeneous across nodes
in terms of each node's total connectivity to the other nodes. Given this
background, the present algorithm generates random networks that preserve the
expected total connectivity of each node to the other nodes, akin to
configuration models for conventional networks. Our algorithm is derived from
the maximum entropy principle. We apply the proposed algorithm to the
measurement of clustering coefficients and to community detection, both of
which require a null model to assess the statistical significance of the
obtained results.
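
To make the role of node strength concrete, the sketch below computes the strength of each node of a correlation network and compares it against a naive null obtained from independent Gaussian series of the same length; this deliberately simple baseline is not the authors' strength-preserving maximum-entropy model, and the factor-model toy data are an assumption used only to produce heterogeneous strengths.

# Sketch: node strength of a correlation network versus a naive null model
# (sample correlations of independent Gaussian series of the same length T).
# This baseline is NOT the authors' strength-preserving maximum-entropy model;
# it only illustrates why a strength-preserving reference is needed.
import numpy as np

def node_strength(C):
    # strength of node i: sum of correlations with all other nodes
    return C.sum(axis=1) - np.diag(C)

def naive_null_strengths(N, T, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    strengths = []
    for _ in range(n_samples):
        X = rng.standard_normal((T, N))
        strengths.append(node_strength(np.corrcoef(X, rowvar=False)))
    return np.array(strengths)             # shape (n_samples, N)

rng = np.random.default_rng(1)
T, N = 500, 30
# toy data with a common factor and varying loadings -> heterogeneous strengths
factor = rng.standard_normal((T, 1))
loadings = rng.uniform(0.2, 0.9, N)
X = factor * loadings + rng.standard_normal((T, N))
C = np.corrcoef(X, rowvar=False)
print("empirical strengths:", np.round(node_strength(C), 2))
print("null mean strength:", np.round(naive_null_strengths(N, T).mean(), 2))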