Hidden Variables in Bipartite Networks
We introduce and study random bipartite networks with hidden variables. Nodes
in these networks are characterized by hidden variables which control the
appearance of links between node pairs. We derive analytic expressions for the
degree distribution, degree correlations, the distribution of the number of
common neighbors, and the bipartite clustering coefficient in these networks.
We also establish the relationship between degrees of nodes in the original
bipartite networks and in their unipartite projections. We further demonstrate
how the hidden-variable formalism can be applied to analyze topological properties
of networks in certain bipartite network models, and verify our analytical
results in numerical simulations.
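The sketch below shows one way to sample such a network. It assumes a connection probability p(h, g) = min(1, h·g / Σg) between a top node with hidden variable h and a bottom node with hidden variable g. This kernel is a hypothetical, Chung-Lu-style choice made for illustration and is not necessarily the one analyzed in the paper. Without capping, a top node's expected degree equals h. A bottom node's expected degree equals g when the hidden variables on the two sides have equal sums.

```python
import random

def connection_prob(h, g, total_g):
    # Hypothetical kernel (an assumption, not taken from the paper):
    # link probability proportional to the product of hidden variables.
    return min(1.0, h * g / total_g)

def hidden_variable_bipartite(h_top, g_bottom, seed=None):
    """Sample links independently for every (top, bottom) pair."""
    rng = random.Random(seed)
    total_g = sum(g_bottom)
    edges = []
    for i, h in enumerate(h_top):
        for j, g in enumerate(g_bottom):
            if rng.random() < connection_prob(h, g, total_g):
                edges.append((i, j))
    return edges

def expected_top_degree(h, g_bottom):
    """Expected degree of a top node with hidden variable h."""
    total_g = sum(g_bottom)
    return sum(connection_prob(h, g, total_g) for g in g_bottom)
```

Take 50 top nodes with h = 2 and 100 bottom nodes with g = 1, so both sums are 100. Each possible link then appears with probability 0.02, and the expected degree of a top node is 2.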
When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors
Finding similar user pairs is a fundamental task in social networks, with
numerous applications in ranking and personalization tasks such as link
prediction and tie strength detection. A common manifestation of user
similarity is based upon network structure: each user is represented by a
vector that represents the user's network connections, where pairwise cosine
similarity among these vectors defines user similarity. The predominant task
for user similarity applications is to discover all similar pairs that have a
pairwise cosine similarity value larger than a given threshold. In contrast
to previous work, where the threshold is assumed to be quite close to 1, we
focus on recommendation applications where the threshold is small but still
meaningful. The all-pairs cosine similarity problem is computationally
challenging on networks with billions of edges, and especially so for settings
with a small threshold. To the best of our knowledge, there is no practical
solution for computing all user pairs above such a small threshold on large
social networks, even using the power of distributed algorithms.
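To make the task concrete, take binary connection vectors. The cosine similarity of users u and v then reduces to |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|), where N(·) is a user's neighbor set. The brute-force baseline below checks every pair; its function names are hypothetical. Its quadratic number of comparisons is exactly what becomes impractical at billions of edges.

```python
import math
from itertools import combinations

def cosine_sets(a, b):
    """Cosine similarity of two binary vectors given as neighbor sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def all_similar_pairs(neighbors, threshold):
    """Exact all-pairs search: return (u, v, sim) with sim > threshold."""
    result = []
    for u, v in combinations(sorted(neighbors), 2):
        s = cosine_sets(neighbors[u], neighbors[v])
        if s > threshold:
            result.append((u, v, s))
    return result
```

As an example, take neighbors = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {5}} with threshold 0.5. Only the pair ("a", "b") is returned, with similarity 2/3.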
Our work directly addresses this challenge by introducing a new algorithm,
WHIMP, that solves this problem efficiently in the MapReduce model. The key
insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for
approximate matrix multiplication with the SimHash random projection techniques
of Charikar. We provide a theoretical analysis of WHIMP, proving that it has
near-optimal communication costs while maintaining computation cost comparable
with the state of the art. We also empirically demonstrate WHIMP's scalability
by computing all highly similar pairs on four massive data sets, and show that
it accurately finds high-similarity pairs. In particular, WHIMP successfully
processes the entire Twitter network, which has tens of billions of edges.
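The SimHash ingredient can be illustrated on its own. In Charikar's random-hyperplane scheme, each signature bit is the sign of the vector's dot product with a random Gaussian direction. Two vectors at angle θ disagree on a bit with probability θ/π, so the fraction of disagreeing bits gives an estimate of cos θ. This sketch covers only that estimator, and its function names are hypothetical. It is not WHIMP itself, which additionally uses wedge sampling and runs in MapReduce.

```python
import math
import random

def random_planes(dim, num_bits, seed=0):
    """One random Gaussian hyperplane normal per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def simhash_signature(vec, planes):
    """Bit k is 1 if vec lies on the non-negative side of hyperplane k."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
            for plane in planes]

def estimate_cosine(sig_a, sig_b):
    """Estimate cos(theta) from the fraction of disagreeing bits (~ theta / pi)."""
    disagree = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * disagree / len(sig_a))
```

With 2000 bits, the estimate for [1, 0] and [1, 1] lands close to the true cosine of about 0.707. Identical vectors always yield identical signatures, and therefore an estimate of exactly 1.0.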
Randomness and Complexity in Networks
I start by reviewing some basic properties of random graphs. I then consider
the role of random walks in complex networks and show how they may be used to
explain why so many long tailed distributions are found in real data sets. The
key idea is that in many cases the process involves copying properties of
near neighbours in the network, and this is a type of short random walk which
in turn produces a natural preferential attachment mechanism. Applying this to
networks of fixed size, I show that copying and innovation are processes with
special mathematical properties, which include the ability to solve a simple
model exactly for any parameter values and at any time. I finish by looking at
variations of this basic model.
Comment: Survey paper based on a talk given at the workshop on "Stochastic
Networks and Internet Technology", Centro di Ricerca Matematica Ennio De
Giorgi, Matematica nelle Scienze Naturali e Sociali, Pisa, 17th to 21st
September 2007. To appear in proceedings.
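Below is a minimal simulation of copying and innovation in a population of fixed size. It rests on our own simplifying assumptions and is not necessarily the exact model solved in the paper. At each step one individual is picked at random. With probability p_innovate it adopts a brand-new artifact; otherwise it copies the artifact of an individual chosen uniformly at random. That copy can be read as a one-step random walk on a complete graph. Artifacts that are already popular are more likely to be copied, which is the preferential-attachment effect. The names copy_innovate and popularity_distribution are hypothetical.

```python
import random
from collections import Counter

def copy_innovate(n, p_innovate, steps, seed=None):
    """Fixed-size copying/innovation process; returns each individual's artifact."""
    rng = random.Random(seed)
    state = list(range(n))          # everyone starts with a distinct artifact
    next_label = n                  # label for the next brand-new artifact
    for _ in range(steps):
        i = rng.randrange(n)        # individual that updates this step
        if rng.random() < p_innovate:
            state[i] = next_label   # innovation: an artifact never seen before
            next_label += 1
        else:
            j = rng.randrange(n)    # copy from a uniformly random individual
            state[i] = state[j]
    return state

def popularity_distribution(state):
    """Map k -> number of distinct artifacts held by exactly k individuals."""
    return Counter(Counter(state).values())
```

Pure innovation (p_innovate = 1) keeps every artifact unique. Small innovation rates let a few artifacts become widely held.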