58 research outputs found
Observations on the Dynamic Evolution of Peer-to-Peer Networks
A fundamental theoretical challenge in peer-to-peer systems is proving statements about the evolution of the system while nodes are continuously joining and leaving. Because the system will operate for an infinite time, performance measures based on runtime are uninformative; instead, we must study the rate at which nodes consume resources to maintain the system state.
An algorithmic approach to social networks
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 109-120). Social networks consist of a set of individuals and some form of social relationship that ties the individuals together. In this thesis, we use algorithmic techniques to study three aspects of social networks: (1) we analyze the "small-world" phenomenon by examining the geographic patterns of friendships in a large-scale social network, showing how this linkage pattern can itself explain the small-world results; (2) using existing patterns of friendship in a social network and a variety of graph-theoretic techniques, we show how to predict new relationships that will form in the network in the near future; and (3) we show how to infer social connections over which information flows in a network, by examining the times at which individuals in the network exhibit certain pieces of information, or interest in certain topics. Our approach is simultaneously theoretical and data-driven, and our results are based upon real experiments on real social-network data in addition to theoretical investigations of mathematical models of social networks. By David Liben-Nowell. Ph.D
Summarizing Diverging String Sequences, with Applications to Chain-Letter Petitions
Algorithms to find optimal alignments among strings, or to find a
parsimonious summary of a collection of strings, are well studied in a variety
of contexts, addressing a wide range of interesting applications. In this
paper, we consider chain letters, which contain a growing sequence of
signatories added as the letter propagates. The unusual constellation of
features exhibited by chain letters (one-ended growth, divergence, and
mutation) makes their propagation, and thus the corresponding reconstruction
problem, both distinctive and rich. Here, inspired by these chain letters, we
formally define the problem of computing an optimal summary of a set of
diverging string sequences. From a collection of these sequences of names, with
each sequence noisily corresponding to a branch of the unknown tree
representing the letter's true dissemination, can we efficiently and accurately
reconstruct a tree? In this paper, we give efficient exact
algorithms for this summarization problem when the number of sequences is
small; for larger sets of sequences, we prove hardness and provide an efficient
heuristic algorithm. We evaluate this heuristic on synthetic data sets chosen
to emulate real chain letters, showing that our algorithm is competitive with
or better than previous approaches, and that it also comes close to finding the
true trees in these synthetic datasets. Comment: 18 pages, 6 figures. Accepted to Combinatorial Pattern Matching (CPM)
202
Do Diffusion Protocols Govern Cascade Growth?
Large cascades can develop in online social networks as people share
information with one another. Though simple reshare cascades have been studied
extensively, the full range of cascading behaviors on social media is much more
diverse. Here we study how diffusion protocols, or the social exchanges that
enable information transmission, affect cascade growth, analogous to the way
communication protocols define how information is transmitted from one point to
another. Studying 98 of the largest information cascades on Facebook, we find a
wide range of diffusion protocols - from cascading reshares of images, which
use a simple protocol of tapping a single button for propagation, to the ALS
Ice Bucket Challenge, whose diffusion protocol involved individuals creating
and posting a video, and then nominating specific others to do the same. We
find recurring classes of diffusion protocols, and identify two key
counterbalancing factors in the construction of these protocols, with
implications for a cascade's growth: the effort required to participate in the
cascade, and the social cost of staying on the sidelines. Protocols requiring
greater individual effort slow down a cascade's propagation, while those
imposing a greater social cost of not participating increase the cascade's
adoption likelihood. The predictability of transmission also varies with
protocol. But regardless of mechanism, the cascades in our analysis all have a
similar reproduction number (approximately 1.8), meaning that lower rates of
exposure can be offset with higher per-exposure rates of adoption. Last, we
show how a cascade's structure can not only differentiate these protocols, but
also be modeled through branching processes. Together, these findings provide a
framework for understanding how a wide variety of information cascades can
achieve substantial adoption across a network. Comment: ICWSM 201
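The trade-off between exposure and adoption described in this abstract can be illustrated with a simple branching process. The following is a minimal sketch, not the authors' model: it assumes every adopter exposes a fixed number of fresh people and that each exposure leads to adoption independently with the same probability. The function names and the example numbers (18 exposures at 0.1, 6 exposures at 0.3) are hypothetical and chosen only so that the product is 1.8.

```python
import random


def reproduction_number(exposures_per_adopter, adoption_prob):
    """Expected number of new adopters produced by one adopter."""
    return exposures_per_adopter * adoption_prob


def simulate_cascade(exposures_per_adopter, adoption_prob, generations, seed=0):
    """Simulate generation sizes of a branching process (illustrative assumption).

    Each adopter exposes `exposures_per_adopter` new people, and each
    exposed person adopts independently with probability `adoption_prob`.
    Overlap between audiences in a real network is ignored.
    """
    rng = random.Random(seed)
    sizes = [1]  # generation 0: the cascade's originator
    for _ in range(generations):
        exposures = sizes[-1] * exposures_per_adopter
        new_adopters = sum(1 for _ in range(exposures) if rng.random() < adoption_prob)
        sizes.append(new_adopters)
        if new_adopters == 0:
            break  # the cascade has died out
    return sizes


# Two hypothetical protocols with the same reproduction number:
# many low-yield exposures versus few high-yield exposures.
low_effort = reproduction_number(18, 0.1)
high_cost = reproduction_number(6, 0.3)
```

Under these assumptions the expected size of generation g is the reproduction number raised to the power g, so two protocols with the same product grow at the same expected rate even though their exposure and adoption rates differ.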
Geographic Routing in Social Networks
We live in a “small world,” where two arbitrary people are likely connected by a short chain of intermediate friends. With scant information about a target individual, people can successively forward a message along such a chain. Experimental studies have verified this property in real social networks, and theoretical models have been advanced to explain it. However, existing theoretical models have not been shown to capture behavior in real-world social networks. Here, we introduce a richer model relating geography and social-network friendship, in which the probability of befriending a particular person is inversely proportional to the number of closer people. In a large social network, we show that one-third of the friendships are independent of geography and the remainder exhibit the proposed relationship. Further, we prove analytically that short chains can be discovered in every network exhibiting the relationship.
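The relationship stated in this abstract, where the probability of befriending a person is inversely proportional to the number of closer people, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: people are placed at points in the plane, distance is Euclidean, and the rank of v with respect to u counts u itself as one of the "closer people" so that the nearest neighbor has rank 1. The function name `rank_based_probabilities` is hypothetical.

```python
import math


def rank_based_probabilities(positions, u):
    """Return {v: Pr[u befriends v]} with Pr proportional to 1 / rank_u(v).

    positions: dict mapping a person's name to an (x, y) coordinate.
    rank_u(v) is taken to be the number of people strictly closer to u
    than v is, counting u itself (an assumption made for this sketch).
    """
    ux, uy = positions[u]
    dist = {
        v: math.hypot(x - ux, y - uy)
        for v, (x, y) in positions.items()
        if v != u
    }
    weights = {}
    for v, d in dist.items():
        closer = sum(1 for dw in dist.values() if dw < d)
        rank = closer + 1  # +1 accounts for u itself
        weights[v] = 1.0 / rank
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Hypothetical example: three candidates at increasing distance from "u".
people = {"u": (0, 0), "a": (1, 0), "b": (2, 0), "c": (3, 0)}
probs = rank_based_probabilities(people, "u")
```

Because the weight depends on rank rather than raw distance, the same rule applies in densely and sparsely populated regions: only the number of closer people matters.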
The Danger of Testing by Selecting Controlled Subsets, with Applications to Spoken-Word Recognition
When examining the effects of a continuous variable 'x' on an outcome 'y', a researcher might choose to dichotomize on 'x', dividing the population into two sets (low 'x' and high 'x') and testing whether these two subpopulations differ with respect to 'y'. Dichotomization has long been known to incur a cost in statistical power, but there remain circumstances in which it is appealing: an experimenter might use it to control for confounding covariates through subset selection, by carefully choosing a subpopulation of Low and a corresponding subpopulation of High that are balanced with respect to a list of control variables, and then comparing the subpopulations’ 'y' values. This “divide, select, and test” approach is used in many papers throughout the psycholinguistics literature, and elsewhere. Here we show that, despite the apparent innocuousness, these methodological choices can lead to erroneous results, in two ways. First, if the balanced subsets of Low and High are selected in certain ways, it is possible to conclude a relationship between 'x' and 'y' not present in the full population. Specifically, we show that previously published conclusions drawn from this methodology (about the effect of a particular lexical property on spoken-word recognition) do not in fact appear to hold. Second, if the balanced subsets of Low and High are selected randomly, this methodology frequently fails to show a relationship between 'x' and 'y' that is present in the full population. Our work uncovers a new facet of an ongoing research effort: to identify and reveal the implicit freedoms of experimental design that can lead to false conclusions.
Fast matrix computations for pair-wise and column-wise commute times and Katz scores
We first explore methods for approximating the commute time and Katz score
between a pair of nodes. These methods are based on the approach of matrices,
moments, and quadrature developed in the numerical linear algebra community.
They rely on the Lanczos process and provide upper and lower bounds on an
estimate of the pair-wise scores. We also explore methods to approximate the
commute times and Katz scores from a node to all other nodes in the graph.
Here, our approach for the commute times is based on a variation of the
conjugate gradient algorithm, and it provides an estimate of all the diagonals
of the inverse of a matrix. Our technique for the Katz scores is based on
exploiting an empirical localization property of the Katz matrix. We adopt
algorithms used for personalized PageRank computing to these Katz scores and
theoretically show that this approach is convergent. We evaluate these methods
on 17 real world graphs ranging in size from 1000 to 1,000,000 nodes. Our
results show that our pair-wise commute time method and column-wise Katz
algorithm both have attractive theoretical properties and empirical
performance. Comment: 35 pages, journal version of
http://dx.doi.org/10.1007/978-3-642-18009-5_13 which has been submitted for
publication. Please see
http://www.cs.purdue.edu/homes/dgleich/publications/2011/codes/fast-katz/ for
supplemental code
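For readers unfamiliar with Katz scores, the column-wise quantity mentioned in this abstract (scores from one node to all others) can be sketched with a plain truncated power series. This is not the method of the paper, which relies on Lanczos-based quadrature and PageRank-style algorithms; it is a baseline illustration that assumes the standard definition, in which the score between two nodes sums the walks between them with walks of length l weighted by alpha to the power l, and assumes alpha is small enough for the series to converge. The function name `katz_column` and the adjacency-list input are hypothetical.

```python
def katz_column(adj, source, alpha, tol=1e-12, max_terms=10_000):
    """Approximate Katz scores from `source` to every node.

    adj: adjacency list, adj[u] is the list of neighbors of node u.
    alpha: attenuation factor; assumed to be below 1 / (largest
           eigenvalue of the adjacency matrix) so the series converges.
    Computes the sum over l >= 1 of alpha**l times the number of
    walks of length l starting at `source`, truncated once the
    terms fall below `tol`.
    """
    n = len(adj)
    term = [0.0] * n
    term[source] = 1.0  # walks of length 0: only the source itself
    scores = [0.0] * n
    for _ in range(max_terms):
        nxt = [0.0] * n
        for u, weight in enumerate(term):
            if weight:
                for v in adj[u]:
                    nxt[v] += alpha * weight  # extend each walk by one edge
        term = nxt
        for v in range(n):
            scores[v] += term[v]
        if max(term) < tol:
            break
    return scores


# Hypothetical example: two nodes joined by a single edge.
two_node_scores = katz_column([[1], [0]], source=0, alpha=0.5)
```

Each iteration costs time proportional to the number of edges, which is why approaches that exploit localization, as the abstract describes, are attractive on large graphs.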
Community detection in graphs
The modern science of networks has brought significant advances to our
understanding of complex systems. One of the most relevant features of graphs
representing real systems is community structure, or clustering, i.e., the
organization of vertices in clusters, with many edges joining vertices of the
same cluster and comparatively few edges joining vertices of different
clusters. Such clusters, or communities, can be considered as fairly
independent compartments of a graph, playing a role similar to that of, e.g., the
tissues or the organs in the human body. Detecting communities is of great
importance in sociology, biology and computer science, disciplines where
systems are often represented as graphs. This problem is very hard and not yet
satisfactorily solved, despite the huge effort of a large interdisciplinary
community of scientists working on it over the past few years. We will attempt
a thorough exposition of the topic, from the definition of the main elements of
the problem, to the presentation of most methods developed, with a special
focus on techniques designed by statistical physicists, from the discussion of
crucial issues like the significance of clustering and how methods should be
tested and compared against each other, to the description of applications to
real networks. Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections
expanded + minor modifications. Three figures + one table + references added.
Final version published in Physics Report