Efficient computation of the Weighted Clustering Coefficient
The clustering coefficient of an unweighted network has been extensively used to quantify how tightly connected the neighborhood of a node is, and it has been widely adopted for assessing the quality of nodes in a social network. Computing the clustering coefficient is challenging because it requires counting the number of triangles in the graph. Several recent works have proposed efficient sampling, streaming, and MapReduce algorithms that overcome this computational bottleneck. The intensity of the interaction between nodes, usually represented by weights on the edges of the graph, is also an important measure of the statistical cohesiveness of a network. Various notions of weighted clustering coefficient have recently been proposed, but all of these techniques are hard to implement on large-scale graphs. In this work we show how standard sampling techniques can be used to obtain efficient estimators for the most commonly used measures of weighted clustering coefficient. Furthermore, we propose a novel graph-theoretic notion of clustering coefficient in weighted networks. © 2016, Copyright © Taylor & Francis Group, LL
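As an illustration of the sampling idea mentioned in the abstract above, the following is a minimal sketch of wedge sampling for the unweighted global clustering coefficient (the fraction of length-2 paths that are closed into triangles). It is not the paper's weighted estimator; the function names and the adjacency-dictionary input format are assumptions made for this example.

```python
import random
from itertools import combinations


def exact_transitivity(adj):
    """Exact fraction of wedges (paths a-v-b) that are closed by an edge a-b."""
    closed = 0
    wedges = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            wedges += 1
            if b in adj[a]:
                closed += 1
    return closed / wedges if wedges else 0.0


def sampled_transitivity(adj, samples, seed=0):
    """Estimate the same quantity from uniformly sampled wedges."""
    rng = random.Random(seed)
    centers = [v for v in adj if len(adj[v]) >= 2]
    if not centers:
        return 0.0
    # A node of degree d is the center of d*(d-1)/2 wedges, so choosing the
    # center with that weight makes every wedge equally likely.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
    closed = 0
    for _ in range(samples):
        v = rng.choices(centers, weights=weights, k=1)[0]
        a, b = rng.sample(sorted(adj[v]), 2)
        if b in adj[a]:
            closed += 1
    return closed / samples


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 0.
toy_graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
exact_value = exact_transitivity(toy_graph)
estimate = sampled_transitivity(toy_graph, samples=20000, seed=1)
```

The exact routine enumerates every wedge, while the sampled routine inspects a fixed number of them, which is what makes this kind of estimator attractive on large graphs.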
On methods to assess the significance of community structure in networks of financial time series
We consider the problem of determining whether the community
structure found by a clustering algorithm applied to financial
time series is statistically significant, or is due to pure chance, when
no other information than the observed values and a similarity measure
among time series are available. As a subsidiary problem we also analyse
the influence of the choice of similarity measure in the accuracy of the
clustering method.
We propose two raw-data based methods for assessing robustness of clustering
algorithms on time-dependent data linked by a relation of similarity:
one based on community scoring functions that quantify some topological
property that characterises ground-truth communities, and another
based on random perturbations and quantification of the variation
in the community structure. These methodologies are well-established in
the realm of unweighted networks; our contributions are versions of these
methodologies properly adapted to complete weighted networks. Peer Reviewed. Postprint (published version
Applications of Structural Balance in Signed Social Networks
We present measures, models and link prediction algorithms based on the
structural balance in signed social networks. Certain social networks contain,
in addition to the usual 'friend' links, 'enemy' links. These networks are
called signed social networks. A classical and major concept for signed social
networks is that of structural balance, i.e., the tendency of triangles to be
'balanced' towards including an even number of negative edges, such as
friend-friend-friend and friend-enemy-enemy triangles. In this article, we
introduce several new signed network analysis methods that exploit structural
balance for measuring partial balance, for finding communities of people based
on balance, for drawing signed social networks, and for solving the problem of
link prediction. Notably, the introduced methods are based on the signed graph
Laplacian and on the concept of signed resistance distances. We evaluate our
methods on a collection of four signed social network datasets. Comment: 37 page
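To make the notion of a balanced triangle in the abstract above concrete, here is a minimal sketch that counts balanced triangles in a small signed graph. A triangle is treated as balanced when the product of its three edge signs is positive, which is equivalent to containing an even number of negative edges. The function name and the `(u, v, sign)` edge format are assumptions for illustration; the Laplacian-based methods of the paper are not reproduced here.

```python
from itertools import combinations


def count_balanced_triangles(signed_edges):
    """Return (balanced, total) triangle counts for an undirected signed graph.

    signed_edges: iterable of (u, v, sign) with sign +1 (friend) or -1 (enemy).
    """
    sign = {}
    adj = {}
    for u, v, s in signed_edges:
        sign[frozenset((u, v))] = s
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    balanced = 0
    total = 0
    # Brute-force enumeration of node triples; fine for small examples only.
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            total += 1
            product = (sign[frozenset((u, v))]
                       * sign[frozenset((u, w))]
                       * sign[frozenset((v, w))])
            if product > 0:
                balanced += 1
    return balanced, total


# Hypothetical example: a-b-c is friend-friend-friend, b-c-d is friend-enemy-enemy.
example_edges = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("b", "d", -1), ("c", "d", -1),
]
example_counts = count_balanced_triangles(example_edges)
```

The ratio of balanced to total triangles gives one simple measure of partial balance for a network that is not perfectly balanced.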
Graph analytics on modern massively parallel systems
Graphs provide a very flexible abstraction for understanding and modeling complex systems in many fields such as physics, biology, neuroscience, engineering, and social science. Only in the last two decades, with the advent of the Big Data era, have supercomputers equipped with accelerators (i.e., Graphics Processing Units, GPUs), advanced networking, and highly parallel file systems been used to analyze graph properties such as reachability, diameter, connected components, centrality, and clustering coefficient. Today, graphs of interest may be composed of millions, sometimes billions, of nodes and edges and exhibit a highly irregular structure. As a consequence, the design of efficient and scalable graph algorithms is an extraordinary challenge due to irregular communication and memory access patterns, high synchronization costs, and lack of data locality. In the present dissertation, we start with a brief and gentle introduction to graph analytics and massively parallel systems. In particular, we present the intersection between graph analytics and parallel architectures in the current state of the art and discuss the challenges encountered when solving such problems on large-scale graphs on these architectures (Chapter 1). In Chapter 2, some preliminary definitions and graph-theoretical notions are provided together with a description of the synthetic graphs used in the literature to model real-world networks. In Chapters 3-5, we present and tackle three relevant problems in graph analysis: reachability (Chapter 3), Betweenness Centrality (Chapter 4), and clustering coefficient (Chapter 5). In detail, Chapter 3 tackles reachability problems by providing two scalable algorithms and implementations which efficiently solve st-connectivity problems on very large-scale graphs. Chapter 4 considers the problem of identifying the most relevant nodes in a network, which plays a crucial role in several applications, including transportation and communication networks, social network analysis, and biological networks. In particular, we focus on a well-known centrality metric, namely Betweenness Centrality (BC), and present two different distributed algorithms for the BC computation on unweighted and weighted graphs. For unweighted graphs, we present a new communication-efficient algorithm based on the combination of bi-dimensional (2D) decomposition and multi-level parallelism. Furthermore, new algorithms which exploit the underlying graph topology to reduce the time and space usage of betweenness centrality computations are described as well. Concerning weighted graphs, we provide a scalable algorithm based on an algebraic formulation of the problem. Finally, through comprehensive experimental results on synthetic and real-world large-scale graphs, we show that the proposed techniques are effective in practice and achieve significant speedups against state-of-the-art solutions. Chapter 5 considers the clustering coefficient problem. Like Betweenness Centrality, the clustering coefficient is a fundamental tool in network analysis, as it specifically measures how nodes tend to cluster together in a network. In the chapter, we first extend caching techniques to Remote Memory Access (RMA) operations on distributed-memory systems. The caching layer is mainly designed to avoid inter-node communication in order to achieve benefits for irregular applications similar to those of communication-avoiding algorithms. We also show how cached RMA is able to improve the performance of a new distributed asynchronous algorithm for the computation of local clustering coefficients. Finally, Chapter 6 contains a brief summary of the key contributions described in the dissertation and presents potential future directions of the work.
EEG sleep stages identification based on weighted undirected complex networks
Sleep scoring is important in sleep research because any errors in the scoring of the patient's sleep electroencephalography (EEG) recordings can cause serious problems such as incorrect diagnosis, medication errors, and misinterpretations of patient's EEG recordings. The aim of this research is to develop a new automatic method for EEG sleep stages classification based on a statistical model and weighted brain networks.
Methods
Each EEG segment is partitioned into a number of blocks using a sliding window technique. A set of statistical features is extracted from each block. As a result, a vector of features is obtained to represent each EEG segment. Then, the vector of features is mapped into a weighted undirected network. Different structural and spectral attributes of the networks are extracted and forwarded to a least square support vector machine (LS-SVM) classifier. At the same time, the networks' attributes are also thoroughly investigated. It is found that the networks' characteristics vary with their sleep stages. Each sleep stage is best represented using the key features of its network.
Results
In this paper, the proposed method is evaluated using two datasets acquired from different channels of EEG (Pz-Oz and C3-A2) according to the R&K and the AASM standards, without pre-processing the original EEG data. The results obtained by the LS-SVM are compared with those obtained by Naïve, k-nearest and a multi-class-SVM. The proposed method is also compared with other benchmark sleep stages classification methods. The comparison results demonstrate that the proposed method has an advantage in scoring sleep stages based on single channel EEG signals.
Conclusions
An average accuracy of 96.74% is obtained with the C3-A2 channel according to the AASM standard, and 96% with the Pz-Oz channel based on the R&K standard.
Tunable and Growing Network Generation Model with Community Structures
Recent years have seen a growing interest in the modeling and simulation of
social networks to understand several social phenomena. Two important classes
of networks, small-world and scale-free networks, have gained a lot of research
interest. Another important characteristic of social networks is the presence
of community structures. Many social processes such as information diffusion
and disease epidemics depend on the presence of community structures, making it
an important property for network generation models to incorporate. In this
paper, we present a tunable and growing network generation model with small
world and scale free properties as well as the presence of community
structures. The major contribution of this model is that the communities thus
created satisfy three important structural properties: connectivity within each
community follows a power law, communities have a high clustering coefficient, and
hierarchical community structures are present in the networks generated using
the proposed model. Furthermore, the model is highly robust and capable of
producing networks with a number of different topological characteristics by
varying the clustering coefficient and inter-cluster edges. Our simulation results
show that the model produces small world and scale free networks along with the
presence of communities depicting real world societies and social networks. Comment: Social Computing and Its Applications, SCA 13, Karlsruhe : Germany
(2013
Searching for network modules
When analyzing complex networks, a key target is to uncover their modular
structure, which means searching for a family of modules, namely node subsets
each spanning a subnetwork more densely connected than the average. This work
proposes a novel type of objective function for graph clustering, in the form
of a multilinear polynomial whose coefficients are determined by network
topology. It may be thought of as a potential function, to be maximized, taking
its values on fuzzy clusterings or families of fuzzy subsets of nodes over
which every node distributes a unit membership. When suitably parametrized,
this potential is shown to attain its maximum when every node concentrates all
of its unit membership on some module. The output thus is a partition, while the
original discrete optimization problem is turned into a continuous version
that allows alternative search strategies to be conceived. The instance of the problem
being a pseudo-Boolean function assigning real-valued cluster scores to node
subsets, modularity maximization is employed to exemplify a so-called quadratic
form, in that the scores of singletons and pairs also fully determine the
scores of larger clusters, while the resulting multilinear polynomial potential
function has degree 2. After considering further quadratic instances, different
from modularity and obtained by interpreting network topology in alternative
manners, a greedy local-search strategy for the continuous framework is
analytically compared with an existing greedy agglomerative procedure for the
discrete case. Overlapping is finally discussed in terms of multiple runs, i.e.
several local searches with different initializations. Comment: 10 page
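The abstract above describes modularity as a quadratic form, in which the score of a cluster is fully determined by contributions from single nodes and pairs of nodes. As a minimal sketch of that pairwise structure (using the standard Newman-Girvan modularity, not the paper's multilinear potential), the function below sums a pairwise term over all ordered node pairs that share a community. The function name and the input format are assumptions for this example.

```python
def pairwise_modularity(edges, community):
    """Newman-Girvan modularity of a partition, written as a sum over node pairs.

    edges: list of undirected edges (u, v) without self-loops.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    degree = {}
    adjacency = {}  # (i, j) -> number of edges between i and j
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacency[(u, v)] = adjacency.get((u, v), 0) + 1
        adjacency[(v, u)] = adjacency.get((v, u), 0) + 1

    nodes = list(degree)
    q = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:
                # Pairwise term: observed adjacency minus its expected value
                # under a degree-preserving random model.
                q += adjacency.get((i, j), 0) - degree[i] * degree[j] / (2 * m)
    return q / (2 * m)


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
example_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}
q_two = pairwise_modularity(example_edges, two_modules)
q_one = pairwise_modularity(example_edges, one_module)
```

Because every term involves at most two nodes, the score of any larger cluster follows from the diagonal (singleton) and off-diagonal (pair) terms alone, which is the sense in which the abstract calls the form quadratic.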