ALPINE: Active Link Prediction using Network Embedding
Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, consumer-product recommendations, and the identification of hidden interactions between actors in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network.
Often, the link status of a node pair can be queried, which can be used as additional information by the link prediction algorithm. Unfortunately, such queries can be expensive or time-consuming, mandating the careful consideration of which node pairs to query. In this paper we estimate the improvement in link prediction accuracy after querying any particular node pair, to use in an active learning setup.
Specifically, we propose ALPINE (Active Link Prediction usIng Network Embedding), the first method to achieve this for link prediction based on network embedding. To this end, we generalize the notion of V-optimality from experimental design to this setting, as well as more basic active learning heuristics originally developed in standard classification settings. Empirical results on real data show that ALPINE is scalable, and boosts link prediction accuracy with far fewer queries.
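The querying loop described above can be sketched as follows. This is an illustrative toy, not the ALPINE implementation: the scoring function here is a basic uncertainty heuristic (probability closest to 0.5) rather than ALPINE's V-optimality criterion, and the embedding and all names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 4
embedding = rng.normal(size=(n, d))  # stand-in for a learned network embedding

def link_probability(u, v):
    """Toy link score: logistic function of the embedding dot product."""
    return 1.0 / (1.0 + np.exp(-embedding[u] @ embedding[v]))

def select_query(candidate_pairs):
    """Pick the node pair whose predicted link status is most uncertain
    (probability closest to 0.5) -- a basic active learning heuristic,
    standing in for ALPINE's expected-improvement scoring."""
    return min(candidate_pairs,
               key=lambda uv: abs(link_probability(*uv) - 0.5))

candidates = [(u, v) for u in range(n) for v in range(u + 1, n)]
pair = select_query(candidates)          # pair to query next
p = link_probability(*pair)
assert 0.0 < p < 1.0                     # logistic scores lie in (0, 1)
```

In a full active learning loop, the queried pair's true link status would be added to the observed network and the embedding refit before the next query.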
S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification
This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any pair of oppositely labeled vertices. We present a theoretical estimate of the number of queries S2 needs in terms of a novel parametrization of the complexity of binary functions on graphs. We also present experimental results demonstrating the performance of S2 on both real and synthetic data. While other graph-based active learning algorithms have shown promise in practice, our algorithm is the first with both good performance and theoretical guarantees. Finally, we demonstrate the implications of the S2 algorithm for the theory of nonparametric active learning. In particular, we show that S2 achieves near minimax optimal excess risk for an important class of nonparametric classification problems.
Comment: A version of this paper appears in the Conference on Learning Theory (COLT) 201
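The bisection rule described in the abstract can be sketched directly. This is a simplified illustration of the selection step only: it assumes an unweighted graph given as an adjacency dict, and it omits the rest of the algorithm (in particular, how S2 proceeds when no oppositely labeled pair is connected).

```python
from collections import deque

def shortest_path(adj, s, t):
    """BFS shortest path from s to t in an unweighted graph; None if disconnected."""
    prev = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None

def s2_query(adj, labels):
    """Return the midpoint of the shortest among all shortest paths connecting
    oppositely labeled vertices (the 'shortest shortest path' rule of S2).
    Returns None if no such path has an unlabeled interior vertex."""
    pos = [v for v, y in labels.items() if y == 1]
    neg = [v for v, y in labels.items() if y == 0]
    best = None
    for u in pos:
        for w in neg:
            p = shortest_path(adj, u, w)
            if p and len(p) > 2 and (best is None or len(p) < len(best)):
                best = p
    return None if best is None else best[len(best) // 2]

# Toy path graph 0-1-2-3-4 with oppositely labeled endpoints:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(s2_query(adj, {0: 1, 4: 0}))  # → 2 (bisects the path between 0 and 4)
```

Repeatedly querying the midpoint halves the candidate region, which is what makes the rule label-efficient at localizing the cut between the two classes.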
Fast Learning on Graphs
We carry out a systematic study of classification problems on networked data, presenting novel techniques with good performance both in theory and in practice.
We assess the power of node classification based on class-linkage information only. In particular, we propose four new algorithms that exploit the homophilic bias (linked entities tend to belong to the same class) in different ways.
The set of algorithms we present covers diverse practical needs: some operate in an active transductive setting and others in an on-line transductive setting. A third group works within an explorative protocol, in which the vertices of an unknown graph are progressively revealed to the learner in an on-line fashion.
Within the mistake bound learning model, we provide a rigorous theoretical analysis for each of our algorithms, together with an interpretation of the obtained performance bounds. We also design adversarial strategies achieving matching lower bounds. In particular, we prove optimality for all input graphs and for all fixed regularity values of suitable labeling complexity measures. We also analyze the computational requirements of our methods, showing that our algorithms can handle very large data sets.
In the case of the on-line protocol, for which we exhibit an optimal algorithm with constant amortized time per prediction, we validate our theoretical results by carrying out experiments on real-world datasets.
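The homophilic bias the abstract refers to can be illustrated with the simplest possible baseline: predict each vertex by majority vote over its labeled neighbors. This is an illustrative baseline only, not one of the four algorithms proposed in the paper.

```python
from collections import Counter

def majority_vote(adj, labels, v):
    """Homophily baseline: predict v's class by majority vote over its
    labeled neighbors; None if no neighbor is labeled."""
    votes = Counter(labels[u] for u in adj[v] if u in labels)
    return votes.most_common(1)[0][0] if votes else None

# Toy graph: vertex 2 has two neighbors of class "A" and one of class "B".
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {0: "A", 1: "A", 3: "B"}
print(majority_vote(adj, labels, 2))  # → "A"
```

The algorithms studied in the paper exploit the same bias but with mistake-bound guarantees that such a naive vote does not provide.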
Active Nearest-Neighbor Learning in Metric Spaces
We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. MARMANN is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.
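The pool-based setting can be made concrete with a generic sketch: select, from the unlabeled pool, the point farthest from every labeled point (farthest-first traversal), so that queried labels cover the metric space. This is a common nearest-neighbor heuristic used here purely for illustration; it is not MARMANN's compression-based selection procedure.

```python
import math

def farthest_first_query(pool, labeled):
    """Return the unlabeled point maximizing its distance to the nearest
    labeled point (farthest-first traversal in a Euclidean metric space)."""
    return max((p for p in pool if p not in labeled),
               key=lambda p: min(math.dist(p, q) for q in labeled))

pool = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.5, 0.2)]
labeled = {(0.0, 0.0)}
print(farthest_first_query(pool, labeled))  # → (5.0, 5.0)
```

Any metric can be substituted for `math.dist`; the heuristic only relies on pairwise distances, which is what makes pool-based selection applicable in general metric spaces.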