Quick Detection of High-degree Entities in Large Directed Networks
In this paper, we address the problem of quick detection of high-degree
entities in large online social networks. Practical importance of this problem
is attested by a large number of companies that continuously collect and update
statistics about popular entities, usually using the degree of an entity as an
approximation of its popularity. We suggest a simple, efficient, and easy to
implement two-stage randomized algorithm that provides highly accurate
solutions for this problem. For instance, our algorithm needs only one thousand
API requests in order to find the top-100 most followed users in Twitter, a
network with approximately a billion registered users, with more than 90%
precision. Our algorithm significantly outperforms existing methods and serves
many different purposes, such as finding the most popular users or the most
popular interest groups in social networks. An important contribution of this
work is the analysis of the proposed algorithm using Extreme Value Theory -- a
branch of probability that studies extreme events and properties of largest
order statistics in random samples. Using this theory, we derive an accurate
prediction for the algorithm's performance and show that the number of API
requests for finding the top-k most popular entities is sublinear in the number
of entities. Moreover, we formally show that the high variability among the
entities, expressed through heavy-tailed distributions, is the reason for the
algorithm's efficiency. We quantify this phenomenon in a rigorous mathematical
way.
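The abstract does not spell out the two stages, so the following is only a minimal sketch of one plausible reading: sample some nodes at random, count how often each entity is followed within the sample, then check the true degree of the most-hit candidates. The callables `get_followees` and `get_in_degree` are hypothetical stand-ins for API requests, and `n1` and `n2` are assumed names for the per-stage request budgets.

```python
import random
from collections import Counter


def two_stage_top_k(nodes, get_followees, get_in_degree, k, n1, n2, rng=random):
    """Sketch of a two-stage randomized search for high in-degree entities.

    get_followees(u) and get_in_degree(v) are hypothetical wrappers around
    API requests; the sketch issues n1 + (at most) n2 such requests in total.
    """
    # Stage 1: sample n1 nodes uniformly and count, for every entity,
    # how many sampled nodes follow it.
    sample = rng.sample(nodes, n1)
    hits = Counter()
    for u in sample:
        hits.update(get_followees(u))

    # Stage 2: request the true in-degree of the n2 most-hit candidates
    # and return the k largest among them.
    candidates = [v for v, _ in hits.most_common(n2)]
    degrees = {v: get_in_degree(v) for v in candidates}
    return sorted(degrees, key=degrees.get, reverse=True)[:k]
```

Under a heavy-tailed degree distribution, the most popular entities collect many hits even in a small sample, which is the intuition the abstract attributes to the algorithm's efficiency.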
Cost-efficient vaccination protocols for network epidemiology
We investigate methods to vaccinate contact networks -- i.e. removing nodes
in such a way that disease spreading is hindered as much as possible -- with
respect to their cost-efficiency. Any real implementation of such protocols
would come with costs related both to the vaccination itself, and gathering of
information about the network. Disregarding this, we argue, would lead to
erroneous evaluation of vaccination protocols. We use the
susceptible-infected-recovered model -- the generic model for diseases making
patients immune upon recovery -- as our disease-spreading scenario, and analyze
outbreaks on both empirical and model networks. For different relative costs,
different protocols dominate. For high vaccination costs and low costs of
gathering information, the so-called acquaintance vaccination is the most
cost-efficient. For other parameter values, protocols designed for query-efficient
identification of the network's largest degrees are most efficient.
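Acquaintance vaccination is commonly described as picking a node at random and vaccinating one of its neighbours, also chosen at random. The sketch below follows that common description and is not taken from the paper itself; the adjacency-dictionary format and the `budget` parameter are assumptions made for illustration.

```python
import random


def acquaintance_vaccination(adjacency, budget, rng=random):
    """Return a set of nodes to vaccinate.

    adjacency maps each node to a list of its neighbours (assumed format).
    Each step picks a random node and vaccinates a random neighbour of it,
    so high-degree nodes are reached more often without knowing any degrees.
    """
    # Only nodes with at least one neighbour can nominate someone.
    nominators = [u for u in adjacency if adjacency[u]]
    # Never ask for more distinct nodes than can possibly be nominated.
    reachable = {v for u in nominators for v in adjacency[u]}
    target = min(budget, len(reachable))

    vaccinated = set()
    while len(vaccinated) < target:
        u = rng.choice(nominators)
        vaccinated.add(rng.choice(adjacency[u]))
    return vaccinated
```

Each step needs only one node's neighbour list, which is why the protocol is cheap in terms of information gathering.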
Network inference and community detection, based on covariance matrices, correlations and test statistics from arbitrary distributions
In this paper we propose methodology for inference of binary-valued adjacency
matrices from various measures of the strength of association between pairs of
network nodes, or more generally pairs of variables. This strength of
association can be quantified by sample covariance and correlation matrices,
and more generally by test-statistics and hypothesis test p-values from
arbitrary distributions. Community detection methods such as block modelling
typically require binary-valued adjacency matrices as a starting point. Hence,
a main motivation for the methodology we propose is to obtain binary-valued
adjacency matrices from such pairwise measures of strength of association
between variables. The proposed methodology is applicable to large
high-dimensional data-sets and is based on computationally efficient
algorithms. We illustrate its utility in a range of contexts and data-sets.
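To make the input and output of such a procedure concrete, here is a deliberately simple baseline: compute sample correlations and set an edge wherever the absolute correlation reaches a fixed cut-off. This is not the methodology proposed in the paper, which the abstract does not detail; the function names and the `threshold` parameter are illustrative assumptions.

```python
from math import sqrt


def correlation(x, y):
    """Sample Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def threshold_adjacency(series, threshold):
    """Binary adjacency matrix: edge i-j if |corr(series[i], series[j])| >= threshold."""
    p = len(series)
    adj = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if abs(correlation(series[i], series[j])) >= threshold:
                adj[i][j] = adj[j][i] = 1  # undirected edge, no self-loops
    return adj
```

The resulting 0/1 matrix is the kind of input that block-modelling methods expect; choosing the cut-off in a principled way is the part a fixed threshold leaves open.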
Personalized PageRank with Node-dependent Restart
Personalized PageRank is an algorithm to classify the importance of web pages
on a user-dependent basis. We introduce two generalizations of Personalized
PageRank with node-dependent restart. The first generalization is based on the
proportion of visits to nodes before the restart, whereas the second
generalization is based on the probability that a node is visited just before the
restart. In the original case of constant restart probability, the two measures
coincide. We discuss interesting particular cases of restart probabilities and
restart distributions. We show that both generalizations of Personalized
PageRank have an elegant expression connecting the so-called direct and reverse
Personalized PageRanks that yields a symmetry property of these Personalized
PageRanks.
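The two generalizations can be illustrated with a small power iteration. The sketch assumes one reading of the abstract: the first measure is the long-run fraction of visits to each node (the stationary distribution of the walk with restarts), and the second is the distribution of the node occupied at the moment a restart happens. The names `occupation` and `location`, and the list-of-lists matrix format, are labels chosen here rather than the paper's notation.

```python
def ppr_node_dependent_restart(P, restart, v, iters=1000):
    """Power iteration for a random walk whose restart probability depends on the node.

    P       : row-stochastic transition matrix as a list of lists
    restart : restart[i] is the probability of restarting when at node i
    v       : restart distribution over nodes
    Returns (occupation, location) as described in the lead-in.
    """
    n = len(P)
    pi = list(v)
    for _ in range(iters):
        # Probability mass that restarts in this step, redistributed via v.
        restart_mass = sum(pi[i] * restart[i] for i in range(n))
        pi = [
            sum(pi[i] * (1 - restart[i]) * P[i][j] for i in range(n))
            + restart_mass * v[j]
            for j in range(n)
        ]

    occupation = pi
    # Reweight by the restart probability to get where restarts occur.
    restart_mass = sum(pi[i] * restart[i] for i in range(n))
    location = [pi[i] * restart[i] / restart_mass for i in range(n)]
    return occupation, location
```

With a constant restart probability the reweighting cancels, so both lists agree, matching the statement that the two measures coincide in the original case.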
A Latent Parameter Node-Centric Model for Spatial Networks
Spatial networks, in which nodes and edges are embedded in space, play a
vital role in the study of complex systems. For example, many social networks
attach geo-location information to each user, allowing the study of not only
topological interactions between users, but spatial interactions as well. The
defining property of spatial networks is that edge distances are associated
with a cost, which may subtly influence the topology of the network. However,
the cost function over distance is rarely known, thus developing a model of
connections in spatial networks is a difficult task.
In this paper, we introduce a novel model for capturing the interaction
between spatial effects and network structure. Our approach represents a unique
combination of ideas from latent variable statistical models and spatial
network modeling. In contrast to previous work, we view the ability to form
long/short-distance connections to be dependent on the individual nodes
involved. For example, a node's specific surroundings (e.g. network structure
and node density) may make it more likely to form a long distance link than
other nodes with the same degree. To capture this information, we attach a
latent variable to each node which represents a node's spatial reach. These
variables are inferred from the network structure using a Markov Chain Monte
Carlo algorithm.
We experimentally evaluate our proposed model on four different types of
real-world spatial networks (e.g. transportation, biological, infrastructure,
and social). We apply our model to the task of link prediction and achieve up
to a 35% improvement over previous approaches in terms of the area under the
ROC curve. Additionally, we show that our model is particularly helpful for
predicting links between nodes with low degrees. In these cases, we see much
larger improvements over previous models.
Autonomous flight and remote site landing guidance research for helicopters
Automated low-altitude flight and landing in remote areas within a civilian environment are investigated, where initial cost, ongoing maintenance costs, and system productivity are important considerations. An approach has been taken which has: (1) utilized those technologies developed for military applications which are directly transferable to a civilian mission; (2) exploited and developed technology areas where new methods or concepts are required; and (3) undertaken research with the potential to lead to innovative methods or concepts required to achieve a manual and fully automatic remote area low-altitude and landing capability. The project has resulted in a definition of system operational concept that includes a sensor subsystem, a sensor fusion/feature extraction capability, and a guidance and control law concept. These subsystem concepts have been developed to sufficient depth to enable further exploration within the NASA simulation environment, and to support programs leading to the flight test