Faster Random Walks By Rewiring Online Social Networks On-The-Fly
Many online social networks feature restrictive web interfaces which only
allow the query of a user's local neighborhood through the interface. To enable
analytics over such an online social network through its restrictive web
interface, many recent efforts reuse the existing Markov Chain Monte Carlo
methods such as random walks to sample the social network and support analytics
based on the samples. The problem with such an approach, however, is the large
amount of queries often required (i.e., a long "mixing time") for a random walk
to reach a desired (stationary) sampling distribution.
In this paper, we consider a novel problem of enabling a faster random walk
over online social networks by "rewiring" the social network on-the-fly.
Specifically, we develop Modified TOpology (MTO)-Sampler which, by using only
information exposed by the restrictive web interface, constructs a "virtual"
overlay topology of the social network while performing a random walk, and
ensures that the random walk follows the modified overlay topology rather than
the original one. We show that MTO-Sampler not only provably enhances the
efficiency of sampling, but also achieves significant savings on query cost
over real-world online social networks such as Google Plus, Epinion, etc.
Comment: 15 pages, 14 figures, technical report for ICDE2013 paper. Appendix
has all the theorems' proofs; ICDE'201
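The baseline that the abstract above improves on, a plain random walk through a neighbors-only interface, can be sketched as follows. This is a minimal illustration and not the MTO-Sampler itself: the `neighbors` callable stands in for the restrictive web interface, and the name `random_walk_sample` and the query-counting cache are assumptions made for this example. On a connected, non-bipartite undirected graph, such a walk converges to a distribution proportional to node degree, and the number of distinct neighborhood queries is the cost that a long mixing time inflates.

```python
import random


def random_walk_sample(neighbors, start, steps, rng=None):
    """Walk `steps` hops from `start` and return (final_node, distinct_queries).

    `neighbors(u)` is a hypothetical stand-in for the web interface: it
    returns the list of nodes adjacent to u and is the only graph access used.
    """
    if rng is None:
        rng = random.Random(0)
    cache = {}  # each distinct node costs one interface query

    def query(u):
        if u not in cache:
            cache[u] = list(neighbors(u))
        return cache[u]

    current = start
    for _ in range(steps):
        # Move to a neighbor chosen uniformly at random.
        current = rng.choice(query(current))
    return current, len(cache)


if __name__ == "__main__":
    # Toy undirected graph used only for illustration.
    toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    node, cost = random_walk_sample(toy.__getitem__, start=0, steps=50)
    print("sampled node:", node, "distinct queries:", cost)
```

Caching repeated lookups is a design choice of the sketch: it counts only distinct neighborhood queries, which is one common way to account for query cost.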
Navigating Networks with Limited Information
We study navigation with limited information in networks and demonstrate that
many real-world networks have a structure which can be described as favoring
communication at short distance at the cost of constraining communication at
long distance. This feature, which is robust and more evident with limited than
with complete information, reflects both topological and possibly functional
design characteristics. For example, the characteristics of the networks
studied derived from a city and from the Internet are manifested through
modular network designs. We also observe that directed navigation in typical
networks requires remarkably little information on the level of individual
nodes. By studying navigation, or specific signaling, we take a complementary
approach to the common studies of information transfer devoted to broadcasting
of information in studies of virus spreading and the like.
Comment: 6 pages, 6 figures. For associated Java applet, see
http://cmol.nbi.dk/models/bit/bit.htm
Degree Ranking Using Local Information
Most real-world dynamic networks evolve very quickly over time. It is not
feasible to collect the entire network at any given time to study its
characteristics. This creates the need for local algorithms to study
various properties of the network. In the present work, we estimate the degree rank
of a node without having the entire network. The proposed methods are based on
the power law degree distribution characteristic or sampling techniques. The
proposed methods are simulated on synthetic networks, as well as on real world
social networks. The efficiency of the proposed methods is evaluated using
absolute and weighted error functions. Results show that the degree rank of a
node can be estimated with high accuracy using only samples of the
network size. The accuracy of the estimation decreases from high ranked to low
ranked nodes. We further extend the proposed methods for random networks and
validate their efficiency on synthetic random networks generated
using the Erdős-Rényi model. Results show that the proposed methods can be
efficiently used for random networks as well
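A sampling-based rank estimate of the kind the abstract above describes could look like the following sketch. It is an illustration under stated assumptions, not the authors' method: it assumes node identifiers can be sampled uniformly at random and that the network size is known, and the names `estimate_degree_rank` and `degree_of` are hypothetical. The rank of a node is estimated as one plus the scaled number of sampled nodes with strictly higher degree.

```python
import random


def estimate_degree_rank(degree_of, node_ids, target_degree, sample_size, rng=None):
    """Estimate the degree rank (1 = highest degree) of a node with `target_degree`.

    Assumes uniform sampling of node ids is possible; `degree_of(v)` is a
    hypothetical local query returning the degree of node v.
    """
    if rng is None:
        rng = random.Random(0)
    sample = rng.sample(node_ids, sample_size)  # sampling without replacement
    higher = sum(1 for v in sample if degree_of(v) > target_degree)
    # Scale the sampled count up to the full network size.
    return 1 + higher * len(node_ids) / sample_size


if __name__ == "__main__":
    # Toy degree table used only for illustration.
    degrees = {0: 3, 1: 1, 2: 2, 3: 5, 4: 2}
    ids = list(degrees)
    print(estimate_degree_rank(degrees.__getitem__, ids, target_degree=2, sample_size=3))
```

When the sample covers every node, the estimate equals the exact rank; with smaller samples it is an unbiased but noisy estimate of the number of higher-degree nodes.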
On sampling nodes in a network
Random walk is an important tool in many graph mining applications including estimating graph parameters, sampling portions of the graph, and extracting dense communities. In this paper we consider the problem of sampling nodes from a large graph according to a prescribed distribution by using random walk as the basic primitive. Our goal is to obtain algorithms that make a small number of queries to the graph but output a node that is sampled according to the prescribed distribution. Focusing on the uniform distribution case, we study the query complexity of three algorithms and show a near-tight bound expressed in terms of the parameters of the graph such as average degree and the mixing time. Both theoretically and empirically, we show that some algorithms are preferable in practice to the others. We also extend our study to the problem of sampling nodes according to some polynomial function of their degrees; this has implications for designing efficient algorithms for applications such as triangle counting
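One standard way to sample nodes uniformly with a random walk is the Metropolis-Hastings correction, sketched below. The abstract above does not name its three algorithms, so this is offered only as an illustration of the uniform-distribution case, not as one of the paper's methods; `mh_uniform_walk` and the `neighbors` callable are names assumed for the example. A proposed move from node u to neighbor v is accepted with probability min(1, deg(u)/deg(v)), which removes the degree bias of the plain walk on a connected undirected graph.

```python
import random


def mh_uniform_walk(neighbors, start, steps, rng=None):
    """Metropolis-Hastings random walk whose stationary distribution is uniform.

    `neighbors(u)` is a hypothetical query returning the list of nodes
    adjacent to u in an undirected, connected graph.
    """
    if rng is None:
        rng = random.Random(0)
    current = start
    for _ in range(steps):
        nbrs = neighbors(current)
        candidate = rng.choice(nbrs)
        # Accept with probability min(1, deg(current) / deg(candidate)).
        accept = min(1.0, len(nbrs) / len(neighbors(candidate)))
        if rng.random() < accept:
            current = candidate
        # Otherwise the walk stays at `current` for this step.
    return current


if __name__ == "__main__":
    # Star graph: the plain walk would favor the hub, the MH walk does not.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    print(mh_uniform_walk(star.__getitem__, start=0, steps=30))
```

Each step queries the degree of the candidate as well as the current node, so the correction trades extra queries per step for an unbiased target distribution.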
Advances in Learning and Understanding with Graphs through Machine Learning
Graphs have increasingly become a crucial way of representing large, complex and disparate datasets from a range of domains, including many scientific disciplines. Graphs are particularly useful at capturing complex relationships or interdependencies within or even between datasets, and enable unique insights which are not possible with other data formats. Over recent years, significant improvements in the ability of machine learning approaches to automatically learn from and identify patterns in datasets have been made.
However, due to the unique nature of graphs, and the data they are used to represent, employing machine learning with graphs has thus far proved challenging. A review of relevant literature has revealed that key challenges include issues arising with macro-scale graph learning, interpretability of machine learned representations and a failure to incorporate the temporal dimension present in many datasets. Thus, the work and contributions presented in this thesis primarily investigate how modern machine learning techniques can be adapted to tackle key graph mining tasks, with a particular focus on optimal macro-level representation, interpretability and incorporating temporal dynamics into the learning process. The majority of methods employed are novel approaches centered around attempting to use artificial neural networks in order to learn from graph datasets.
Firstly, by devising a novel graph fingerprint technique, it is demonstrated that this can successfully be applied to two different tasks, namely graph comparison and classification, whilst outperforming established baselines. Secondly, it is shown that a mapping can be found between certain topological features and graph embeddings. This, for perhaps the first time, suggests that it is possible that machines are learning something analogous to human knowledge acquisition, thus bringing interpretability to the graph embedding process. Thirdly, in exploring two new models for incorporating temporal information into the graph learning process, it is found that including such information is crucial to predictive performance in certain key tasks, such as link prediction, where state-of-the-art baselines are outperformed.
The overall contribution of this work is to provide greater insight into and explanation of the ways in which machine learning with respect to graphs is emerging as a crucial set of techniques for understanding complex datasets. This is important as these techniques can potentially be applied to a broad range of scientific disciplines. The thesis concludes with an assessment of limitations and recommendations for future research