Small-World File-Sharing Communities
Web caches, content distribution networks, peer-to-peer file sharing
networks, distributed file systems, and data grids all have in common that they
involve a community of users who generate requests for shared data. In each
case, overall system performance can be improved significantly if we can first
identify and then exploit interesting structure within a community's access
patterns. To this end, we propose a novel perspective on file sharing based on
the study of the relationships that form among users based on the files in
which they are interested.
We propose a new structure that captures common user interests in data, the
data-sharing graph, and justify its utility with studies on three
data-distribution systems: a high-energy physics collaboration, the Web, and
the Kazaa peer-to-peer network. We find small-world patterns in the
data-sharing graphs of all three communities. We analyze these graphs and
propose some probable causes for these emergent small-world patterns. The
significance of small-world patterns is twofold: they provide rigorous support
for intuition and, perhaps most importantly, they suggest ways to design
mechanisms that exploit these naturally emerging patterns.
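A minimal sketch of how a data-sharing graph could be built from a request trace. The abstract only says that the graph captures common user interests in data; the rule used here (link two users when they requested at least `min_shared` common files), the function name, and the toy trace are assumptions made for illustration.

```python
from itertools import combinations


def data_sharing_graph(requests, min_shared=1):
    """Link two users when they requested at least `min_shared` common files.

    `requests` maps a user id to the set of files that user asked for.
    Returns an adjacency dict {user: set of neighbouring users}.
    The threshold rule is an assumption, not taken from the abstract.
    """
    graph = {user: set() for user in requests}
    for u, v in combinations(requests, 2):
        if len(requests[u] & requests[v]) >= min_shared:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical toy trace: four users and the files each one requested.
requests = {
    "alice": {"f1", "f2", "f3"},
    "bob": {"f2", "f3"},
    "carol": {"f3", "f4"},
    "dave": {"f5"},
}
graph = data_sharing_graph(requests, min_shared=2)
print(graph)
```

Raising `min_shared` keeps only the stronger overlaps in interest, so the graph gets sparser as the threshold grows.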
Exploring the topology of small-world networks
Complex networks have long been studied in order to understand the various real-world complex systems around us. Complex systems, such as the WWW, the movie-actor network, social networks and neural networks, are made of many non-identical elements connected by diverse interactions. The study of network topology is one of the important issues in exploring such systems, because structure always affects system function. Traditionally, these systems have been modeled as either completely ordered graphs or completely random graphs. Recently, some surprising empirical results in the field of complex networks, such as the 19-click diameter of the Web and the six degrees of separation in social networks, have revealed small-world phenomena in some large sparse networks. This finding motivates the interest in small-world networks. The objective of the project is to study the properties of small-world networks and their evolution over time via experiments on a movie-actor collaboration network; to find their distinguishing characteristics by comparing small-world networks with random networks; and to analyze the factors that result in such differences. The properties of small-world networks discussed here include small diameter, sparseness, clustering, a giant component, power-law degree distribution and short path discovery. Four existing network models are also studied in this project: the Watts-Strogatz small-world model, the Erdős-Rényi random-graph model, the scale-free model of A.-L. Barabási and the small-world model of Jon Kleinberg.
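Two of the properties listed above, clustering and short paths, can be measured directly. The sketch below is an illustration under stated assumptions, not the project's code: it builds a ring lattice, applies Watts-Strogatz style rewiring, and compares the average clustering coefficient and the average shortest-path length. The function names, graph size, rewiring probability and seed are arbitrary choices.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node links to its k nearest neighbours on a ring (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj, p, rng):
    """With probability p, replace edge (u, v) by (u, w) for a random new w."""
    n = len(adj)
    new = {u: set(vs) for u, vs in adj.items()}
    for u in range(n):
        for v in sorted(adj[u]):
            if u < v and rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in new[u]]
                if candidates:
                    w = rng.choice(candidates)
                    new[u].discard(v)
                    new[v].discard(u)
                    new[u].add(w)
                    new[w].add(u)
    return new


def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = ring_lattice(20, 4)
small_world = rewire(lattice, 0.1, random.Random(0))
for name, g in (("lattice", lattice), ("rewired", small_world)):
    print(name, clustering_coefficient(g), average_path_length(g))
```

In the Watts-Strogatz picture, a small amount of rewiring typically shortens paths while leaving clustering fairly high; a graph this small is only meant to show the mechanics.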
A Framework for Web Object Self-Preservation
We propose and develop a framework based on emergent behavior principles for the long-term preservation of digital data using the web infrastructure. We present the development of the framework called unsupervised small-world (USW), which is at the nexus of emergent behavior, graph theory, and digital preservation. The USW algorithm creates graph-based structures on the Web that are used for the preservation of web objects (WOs). Emergent behavior activities, based on Craig Reynolds’ “boids” concept, are used to preserve WOs without the need for a central archiving authority. Graph theory is extended by developing an algorithm that incrementally creates small-world graphs. Graph theory provides a foundation to discuss the vulnerability of graphs to different types of failures and attack profiles. Investigation into the robustness and resilience of USW graphs led to the development of a metric to quantify the effect of damage inflicted on a graph. The metric remains valid whether the graph is connected or not. Different USW preservation policies are explored within a simulation environment where preservation copies have to be spread across hosts. Spreading the copies across hosts helps to ensure that copies will remain available even when there is a concerted effort to remove all copies of a USW component. A moderately aggressive preservation policy is the most effective at making the best use of host and network resources.
Our efforts are directed at answering the following research questions:
1. Can web objects (WOs) be constructed to outlive the people and institutions that created them?
We have developed, analyzed, and tested through simulations the unsupervised small-world (USW) algorithm, and built a reference implementation of it. We believe the algorithm will create a connected network of WOs based on the web infrastructure (WI) that will outlive the people and institutions that created the WOs. The USW graph will outlive its creators by being robust, continuing to operate when some of its WOs are lost, and resilient, recovering when some of its WOs are lost.
2. Can we leverage aspects of naturally occurring networks and group behavior for preservation?
We used Reynolds’ tenets for “boids” to guide our analysis and development of the USW algorithm. The USW algorithm allows a WO to “explore” a portion of the USW graph before making connections to members of the graph and before making preservation copies across the “discovered” graph. Analysis and simulation show that the USW graph has average path length (L(G)) and clustering coefficient (C(G)) values comparable to those of small-world graphs. A high C(G) is important because it reflects how likely it is that a WO will be able to spread copies to other domains, thereby increasing its likelihood of long-term survival. A short L(G) is important because it means that a WO will not have to look too far to identify new candidate preservation domains, if needed. Small-world graphs occur in nature and are thus believed to be robust and resilient. The USW algorithms use these small-world graph characteristics to spread preservation copies across as many hosts as needed and possible.
USW graph creation, damage, repair and preservation have been developed and tested in a simulation and a reference implementation.
Efficient Exact and Approximate Algorithms for Computing Betweenness Centrality in Directed Graphs
Graphs are an important tool to model data in different domains, including
social networks, bioinformatics and the world wide web. Most of the networks
formed in these domains are directed graphs, where all the edges have a
direction and they are not symmetric. Betweenness centrality is an important
index widely used to analyze networks. In this paper, first given a directed
network $G$ and a vertex $r \in V(G)$, we propose a new exact algorithm to
compute the betweenness score of $r$. Our algorithm pre-computes a set
$\mathcal{RV}(r)$, which is used to prune a huge amount of computations that do
not contribute to the betweenness score of $r$. The time complexity of our exact
algorithm depends on $|\mathcal{RV}(r)|$ and it is respectively
$\Theta(|\mathcal{RV}(r)| \cdot |E(G)|)$ and
$\Theta(|\mathcal{RV}(r)| \cdot |E(G)| + |\mathcal{RV}(r)| \cdot |V(G)| \log |V(G)|)$
for unweighted graphs and weighted graphs with positive weights.
$|\mathcal{RV}(r)|$ is bounded from above by $|V(G)| - 1$ and in most cases, it
is a small constant. Then, for the cases where $\mathcal{RV}(r)$ is large, we
present a simple randomized algorithm that samples from $\mathcal{RV}(r)$ and
performs computations for only the sampled elements. We show that this
algorithm provides an $(\epsilon, \delta)$-approximation of the betweenness
score of $r$. Finally, we perform extensive experiments over several real-world
datasets from different domains for several randomly chosen vertices as well as
for the vertices with the highest betweenness scores. Our experiments reveal
that in most cases, our algorithm significantly outperforms the most efficient
existing randomized algorithms, in terms of both running time and accuracy. Our
experiments also show that our proposed algorithm computes betweenness scores
of all vertices in the sets of sizes 5, 10 and 15, much faster and more
accurately than the most efficient existing algorithms.
Comment: arXiv admin note: text overlap with arXiv:1704.0735
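For reference, the quantity being computed can be illustrated with the standard exact baseline, Brandes' algorithm, on a directed unweighted graph. This is not the pruned single-vertex algorithm proposed in the paper; it computes the score of every vertex, and the function name and toy graph are assumptions made for illustration.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for a directed, unweighted graph.

    `adj` maps every vertex to the list of its out-neighbours.
    Returns {vertex: betweenness score}, counting ordered (source, target) pairs.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical toy graph: two parallel routes from a to d, then d -> e.
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
scores = betweenness(toy)
print(scores)
```

Each source vertex costs one breadth-first search here, which is the per-source work that a pruning strategy for a single target vertex would aim to avoid.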
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
The number of triangles is a computationally expensive graph statistic which
is frequently used in complex network analysis (e.g., transitivity ratio), in
various random graph models (e.g., exponential random graph model) and in
important real world applications such as spam detection, uncovering of the
hidden thematic structure of the Web and link recommendation. Counting
triangles in graphs with millions and billions of edges requires algorithms
which run fast, use a small amount of space, provide accurate estimates of the
number of triangles and preferably are parallelizable.
In this paper we present an efficient triangle counting algorithm which can
be adapted to the semistreaming model. The key idea of our algorithm is to
combine the sampling algorithm of Tsourakakis et al. and the partitioning of
the set of vertices into a high degree and a low degree subset respectively as
in the Alon, Yuster and Zwick work treating each set appropriately. We obtain a
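running time $O\left(m + \frac{m^{3/2} \Delta \log n}{t \epsilon^2}\right)$
and an $\epsilon$ approximation (multiplicative error), where $n$ is the number
of vertices, $m$ the number of edges and $\Delta$ the maximum number of
triangles an edge is contained in.
Furthermore, we show how this algorithm can be adapted to the semistreaming
model with space usage
$O\left(m^{1/2} \log n + \frac{m^{3/2} \Delta \log n}{t \epsilon^2}\right)$
and a constant number of passes (three) over the graph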
stream. We apply our methods in various networks with several millions of edges
and we obtain excellent results. Finally, we propose a random projection based
method for triangle counting and provide a sufficient condition to obtain an
estimate with low variance.
Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models
for the Web Graph (WAW 2010)
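A rough sketch of the two ingredients that sampling-based triangle counting builds on: an exact counter and an edge-sampling estimator. The exact counter uses a standard degree ordering, which is not the high-degree/low-degree partition of the paper. The estimator keeps each edge independently with probability `p` and rescales by `1 / p**3`, in the spirit of the edge-sparsification idea of Tsourakakis et al. Function names and the toy graph are assumptions, and the input is assumed to be a simple undirected graph with comparable vertex ids.

```python
import random


def count_triangles(edges):
    """Exact count: orient each edge from lower to higher (degree, id) rank."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rank = {v: (len(adj[v]), v) for v in adj}
    # out[v] holds only the neighbours ranked above v, so each triangle
    # is found exactly once, at its lowest-ranked vertex.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    return sum(len(out[u] & out[v]) for u in out for v in out[u])


def sparsified_estimate(edges, p, rng):
    """Keep each edge with probability p, count, and rescale by 1 / p**3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


# Hypothetical toy input: the complete graph on four vertices.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
exact = count_triangles(k4)
estimate = sparsified_estimate(k4, 0.5, random.Random(1))
print(exact, estimate)
```

A triangle survives sampling only if all three of its edges do, which happens with probability `p**3`; that is why the rescaling makes the estimate unbiased. On a graph this small the estimate is very noisy.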
SpectralNET – an application for spectral graph analysis and visualization
BACKGROUND: Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that can carry different weights. SpectralNET is a flexible application for analyzing and visualizing these biological and chemical networks. RESULTS: Available both as a standalone .NET executable and as an ASP.NET web application, SpectralNET was designed specifically with the analysis of graph-theoretic metrics in mind, a computational task not easily accessible using currently available applications. Users can choose either to upload a network for analysis using a variety of input formats, or to have SpectralNET generate an idealized random network for comparison to a real-world dataset. Whichever graph-generation method is used, SpectralNET displays detailed information about each connected component of the graph, including graphs of degree distribution, clustering coefficient by degree, and average distance by degree. In addition, extensive information about the selected vertex is shown, including degree, clustering coefficient, various distance metrics, and the corresponding components of the adjacency, Laplacian, and normalized Laplacian eigenvectors. SpectralNET also displays several graph visualizations, including a linear dimensionality reduction for uploaded datasets (Principal Components Analysis) and a non-linear dimensionality reduction that provides an elegant view of global graph structure (Laplacian eigenvectors). CONCLUSION: SpectralNET provides an easily accessible means of analyzing graph-theoretic metrics for data modeling and dimensionality reduction. SpectralNET is publicly available as both a .NET application and an ASP.NET web application. Source code is available upon request.
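The adjacency, Laplacian, and normalized Laplacian matrices mentioned above can be written down in a few lines. The sketch below is not SpectralNET code; it assumes an unweighted, undirected graph given as an adjacency dict, and uses the common definitions L = D - A and N = D^(-1/2) L D^(-1/2). The function name and the toy graph are hypothetical.

```python
import math


def graph_matrices(adj):
    """Return (A, L, N): adjacency, Laplacian and normalized Laplacian.

    `adj` maps each vertex to the set of its neighbours (undirected graph).
    Rows and columns follow the sorted order of the vertices.
    """
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    deg = [len(adj[v]) for v in nodes]
    A = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for w in adj[v]:
            A[index[v]][index[w]] = 1.0
    # L = D - A, where D is the diagonal matrix of degrees.
    L = [[(deg[i] if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    # N = D^(-1/2) L D^(-1/2); entries touching isolated vertices are set to 0.
    N = [[L[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] and deg[j] else 0.0
          for j in range(n)] for i in range(n)]
    return A, L, N


# Hypothetical toy graph: a path on three vertices, 0 - 1 - 2.
path = {0: {1}, 1: {0, 2}, 2: {1}}
A, L, N = graph_matrices(path)
print(L)
```

The eigenvectors of these matrices would normally be obtained with a numerical linear algebra library; that step is left out here to keep the sketch dependency-free.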
The structure and function of complex networks
Inspired by empirical studies of networked systems such as the Internet,
social networks, and biological networks, researchers have in recent years
developed a variety of techniques and models to help us understand or predict
the behavior of these systems. Here we review developments in this field,
including such concepts as the small-world effect, degree distributions,
clustering, network correlations, random graph models, models of network growth
and preferential attachment, and dynamical processes taking place on networks.
Comment: Review article, 58 pages, 16 figures, 3 tables, 429 references,
published in SIAM Review (2003)