A Faster Method to Estimate Closeness Centrality Ranking
Closeness centrality is one way of measuring how central a node is in the
given network. The closeness centrality measure assigns a centrality value to
each node based on its accessibility to the whole network. In real life
applications, we are mainly interested in ranking nodes based on their
centrality values. The classical method to compute the rank of a node first
computes the closeness centrality of all nodes and then compares them to get
its rank. Its time complexity is O(n·m), where n represents the total
number of nodes and m represents the total number of edges in the network.
In the present work, we propose a heuristic method to quickly estimate the
closeness rank of a node in O(α·m) time, where α is a small constant. We
also propose an extended method using a uniform sampling technique; it
estimates the rank more accurately and has time complexity O(α·m), where
α ≪ n. This is an excellent improvement over the classical centrality
ranking method. The efficiency of the proposed methods is verified on
real-world scale-free social networks using absolute and weighted error
functions.
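For concreteness, the classical O(n·m) baseline described in this abstract can be sketched in Python (an illustrative implementation, not the authors' code): compute the closeness of every node via BFS, then obtain a node's rank by comparison.

```python
from collections import deque

def closeness(adj, s):
    """Closeness of node s: (n-1) / sum of BFS distances.
    Assumes a connected, unweighted graph given as an adjacency dict."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    total = sum(dist.values())
    return (len(adj) - 1) / total if total else 0.0

def closeness_rank(adj, s):
    """Classical ranking: one BFS per node (O(n*m) total),
    then count how many nodes beat s."""
    cs = closeness(adj, s)
    return 1 + sum(1 for u in adj if u != s and closeness(adj, u) > cs)
```

On a path graph 0-1-2 the middle node has closeness 1.0 and rank 1; the endpoints share closeness 2/3. The heuristic in the paper avoids the all-nodes loop, which is exactly where the factor n comes from.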
Application of Group Testing for Analyzing Noisy Networks
My dissertation focuses on developing scalable algorithms for analyzing large complex networks and evaluating how the results change as the network changes. Network analysis has become a ubiquitous and very effective tool in big data analysis, particularly for understanding the mechanisms of complex systems that arise in diverse disciplines such as cybersecurity [83], biology [15], sociology [5], and epidemiology [7]. However, data from real-world systems are inherently noisy because they are influenced by fluctuations in experiments, subjective interpretation of data, and limitations of computing resources. Therefore, the corresponding networks are also approximate. This research addresses the challenge of efficiently obtaining accurate results from large noisy networks.
My dissertation has four main components. The first component consists of developing efficient and scalable algorithms for centrality computations that produce reliable results on noisy networks. Two novel contributions I made in this area are a group testing [16] based algorithm for identifying high-centrality vertices, which is substantially faster than current methods, and an algorithm for computing the betweenness centrality of a specific vertex.
The second component consists of developing quantitative metrics to measure how different noise models affect the analysis results. We implemented a uniform perturbation model based on random addition/deletion of edges of a network. To quantify the stability of a network, we investigated the effect that perturbations have on the top-k ranked vertices and on the local structural properties of the top-ranked vertices.
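The uniform perturbation model and a top-k stability metric can be sketched as follows. This is a minimal illustration under my own assumptions: a single flip probability p for both additions and deletions, and Jaccard overlap of top-k sets as the stability score; the dissertation's exact formulation may differ.

```python
import random

def perturb(edges, n, p, rng):
    """Uniform noise model: flip each potential edge (delete if present,
    add if absent) independently with probability p.
    `edges` is a set of (u, v) pairs with u < v over nodes 0..n-1."""
    out = set()
    for u in range(n):
        for v in range(u + 1, n):
            present = (u, v) in edges
            if rng.random() < p:
                present = not present
            if present:
                out.add((u, v))
    return out

def topk_overlap(rank_a, rank_b, k):
    """Stability metric: Jaccard overlap of the top-k vertex sets
    of the original and perturbed rankings."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / len(a | b)
```

Running the analysis on many perturbed samples and averaging `topk_overlap` gives one concrete way to report how stable a centrality ranking is under noise.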
The third component consists of developing efficient software for network analysis. I have been part of the development of a software package, ESSENS (Extensible, Scalable Software for Evolving NetworkS) [76], that effectively supports our algorithms on large networks.
The fourth component is a literature review of the various noise models that researchers have applied to networks and the methods they have used to quantify the stability, sensitivity, robustness, and reliability of networks.
These four aspects together will lead to efficient, accurate, and highly scalable algorithms for analyzing noisy networks.
Distance-generalized Core Decomposition
The k-core of a graph is defined as the maximal subgraph in which every
vertex is connected to at least k other vertices within that subgraph. In
this work we introduce a distance-based generalization of the notion of
k-core, which we refer to as the (k,h)-core, i.e., the maximal subgraph in
which every vertex has at least k other vertices at distance at most h
within that subgraph. We study the properties of the (k,h)-core, showing
that it preserves many of the nice features of the classic core
decomposition (e.g., its connection with the notion of distance-generalized
chromatic number) and its usefulness to speed up or approximate
distance-generalized notions of dense structures, such as the h-club.
Computing the distance-generalized core decomposition over large networks
is intrinsically complex. However, by exploiting clever upper and lower
bounds we can partition the computation into a set of totally independent
subcomputations, opening the door to top-down exploration and to
multithreading, and thus achieving an efficient algorithm.
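A straightforward, non-optimized peeling sketch in Python illustrates the distance-generalized notion: a vertex's "h-degree" is the number of vertices within distance h in the surviving subgraph, and vertices below the threshold k are removed until the subgraph stabilizes. For h = 1 this reduces to the classic k-core. The paper's bounding and partitioning strategy is not reproduced here.

```python
from collections import deque

def h_degree(adj, alive, s, h):
    """Number of vertices reachable from s within distance h in the
    subgraph induced by `alive` (BFS truncated at depth h)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if dist[u] == h:
            continue  # do not expand beyond depth h
        for v in adj[u]:
            if v in alive and v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return len(dist) - 1  # exclude s itself

def kh_core(adj, k, h):
    """Naive (k,h)-core: repeatedly remove vertices with fewer than k
    others within distance h, until no vertex violates the condition."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if u in alive and h_degree(adj, alive, u, h) < k:
                alive.discard(u)
                changed = True
    return alive
```

On a path 0-1-2, `kh_core(adj, 2, 2)` keeps all three vertices (each sees two others within distance 2), while `kh_core(adj, 2, 1)`, the classic 2-core, is empty. The recomputation after each removal is what makes the naive approach expensive and motivates the paper's bounds.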
Closeness Centrality Algorithms For Multilayer Networks
Centrality measures for simple graphs are well-defined and several
main-memory algorithms exist for each. Simple graphs are not adequate for
modeling complex data sets with multiple entities and relationships. Multilayer
networks (MLNs) have been shown to be better suited, but very few
algorithms exist for centrality computation directly on MLNs. Instead,
MLNs are converted (aggregated or collapsed) to simple graphs using
Boolean AND or OR operators before computing centrality, which is not only
inefficient but also incurs a loss of structure and semantics. In this
paper, we propose algorithms that compute
structure and semantics. In this paper, we propose algorithms that compute
closeness centrality on an MLN directly using a novel decoupling-based
approach. Individual results of layers (or simple graphs) of an MLN are used
and a composition function is developed to compute the centrality for the
MLN. The challenge is to do this accurately and efficiently: since these
algorithms do not have complete information about the MLN, computing a
global measure such as closeness centrality is difficult. Hence, these
algorithms
rely on heuristics derived from intuition. The advantage is that this approach
lends itself to parallelism and is more efficient compared to the traditional
approach. We present two heuristics for composition and experimentally validate
accuracy and efficiency on a large number of synthetic and real-world graphs
with diverse characteristics.
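The decoupling approach described above computes each layer independently and then combines the per-layer results. A minimal sketch, assuming an averaging composition function (a hypothetical stand-in; the paper's actual composition heuristics may differ):

```python
from collections import deque

def layer_closeness(adj):
    """Closeness centrality of every node in one layer (a simple,
    unweighted graph given as an adjacency dict)."""
    n = len(adj)
    result = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total = sum(dist.values())
        result[s] = (n - 1) / total if total else 0.0
    return result

def compose(per_layer):
    """Hypothetical composition function: average each node's
    per-layer closeness values into one MLN-level estimate."""
    nodes = set().union(*(cc.keys() for cc in per_layer))
    return {u: sum(cc.get(u, 0.0) for cc in per_layer) / len(per_layer)
            for u in nodes}
```

Because each `layer_closeness` call touches only one layer, the per-layer passes can run in parallel, which is the efficiency advantage the abstract points to; the accuracy of the final estimate then rests entirely on how well the composition function approximates true MLN distances.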
Analysis of Protein-Protein Interaction Networks Using High Performance Scalable Tools
Protein-Protein Interaction (PPI) research currently generates an extraordinary number of publications and attracts interest from computer scientists and biologists alike because of the potential of the underlying source material. PPI networks are networks of protein complexes formed by biochemical events or electrostatic forces that serve a biological function [1]. As the analysis of protein networks grows, we gain more information about proteins, genomes, and their influence on life. Today, PPI networks are used to study diseases, improve drugs, and understand other processes in medicine and health that will eventually help mankind.
Though PPI network research is considered extremely important, there is an issue: we do not have enough people with interdisciplinary knowledge spanning both biology and computer science, and this limits our rate of progress in the field.
Most biologists who are not expert coders need a way of calculating graph measures and related information that helps them analyze networks without having to manipulate the data themselves. In this research, I test several ways of achieving such results using available frameworks and algorithms, present the results, and compare each method's efficacy.
My analysis takes place on very large datasets, where I calculate several centralities and other graph measures using different metrics, and visualize them to gain further insight. I also note the significant effect of MPI and multithreading on the results, which suggests that building scalable tools will improve the analysis immensely.