Propagation Kernels
We introduce propagation kernels, a general graph-kernel framework for
efficiently measuring the similarity of structured data. Propagation kernels
are based on monitoring how information spreads through a set of given graphs.
They leverage early-stage distributions from propagation schemes such as random
walks to capture structural information encoded in node labels, attributes, and
edge information. This has two benefits. First, off-the-shelf propagation
schemes can be used to naturally construct kernels for many graph types,
including labeled, partially labeled, unlabeled, directed, and attributed
graphs. Second, by leveraging existing efficient and informative propagation
schemes, propagation kernels can be considerably faster than state-of-the-art
approaches without sacrificing predictive performance. We will also show that
if the graphs at hand have a regular structure, for instance when modeling
image or video data, one can exploit this regularity to scale the kernel
computation to large databases of graphs with thousands of nodes. We support
our contributions by exhaustive experiments on a number of real-world graphs
from a variety of application domains.
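The idea in this abstract, comparing graphs through the early-stage node distributions of a propagation scheme, can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the propagation scheme is a plain random walk on an unweighted adjacency list, node distributions are grouped by fixed-width binning (a stand-in chosen for brevity), and the names `one_hot`, `propagate`, `bin_counts`, and `propagation_kernel` are hypothetical.

```python
import math
from collections import Counter

def one_hot(labels, num_labels):
    """Turn integer node labels into initial label distributions."""
    return [[1.0 if i == lab else 0.0 for i in range(num_labels)]
            for lab in labels]

def propagate(adj, dist):
    """One random-walk step: each node averages its neighbours' distributions."""
    new = []
    for v, nbrs in enumerate(adj):
        if not nbrs:
            # Isolated node: keep its current distribution (an assumption).
            new.append(list(dist[v]))
            continue
        k = len(dist[v])
        new.append([sum(dist[u][i] for u in nbrs) / len(nbrs) for i in range(k)])
    return new

def bin_counts(dist, width):
    """Count nodes per bin, where a bin is the distribution rounded down to a grid."""
    return Counter(tuple(math.floor(p / width) for p in d) for d in dist)

def propagation_kernel(g1, g2, num_labels, t_max=2, width=0.25):
    """Sum, over propagation steps 0..t_max, the dot product of bin-count vectors."""
    (adj1, labels1), (adj2, labels2) = g1, g2
    d1, d2 = one_hot(labels1, num_labels), one_hot(labels2, num_labels)
    k = 0
    for _ in range(t_max + 1):
        c1, c2 = bin_counts(d1, width), bin_counts(d2, width)
        k += sum(c1[b] * c2[b] for b in c1)
        d1, d2 = propagate(adj1, d1), propagate(adj2, d2)
    return k

# Usage: a labeled triangle and a labeled path, each given as (adjacency list, labels).
triangle = ([[1, 2], [0, 2], [0, 1]], [0, 0, 1])
path = ([[1], [0, 2], [1]], [0, 1, 0])
similarity = propagation_kernel(triangle, path, num_labels=2, t_max=1)
```

Because the two graphs are only compared through counts of binned distributions, the cost per step is linear in the number of edges plus nodes in this sketch, which is one way to read the efficiency claim in the abstract.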
On invariant Schreier structures
Schreier graphs, which possess both a graph structure and a Schreier
structure (an edge-labeling by the generators of a group), are objects of
fundamental importance in group theory and geometry. We study the Schreier
structures with which unlabeled graphs may be endowed, with emphasis on
structures which are invariant in some sense (e.g. conjugation-invariant, or
sofic). We give proofs of a number of "folklore" results, such as that every
regular graph of even degree admits a Schreier structure, and show that, under
mild assumptions, the space of invariant Schreier structures over a given
invariant graph structure is very large, in that it contains uncountably many
ergodic measures. Our work is directly connected to the theory of invariant
random subgroups, a field which has recently attracted a great deal of
attention.
Comment: 16 pages, added references and figure, to appear in L'Enseignement
Mathématique
Asymmetry and structural information in preferential attachment graphs
Graph symmetries intervene in diverse applications, from enumeration, to
graph structure compression, to the discovery of graph dynamics (e.g., node
arrival order inference). Whereas Erdős-Rényi graphs are typically
asymmetric, real networks are highly symmetric. So a natural question is
whether preferential attachment graphs, where in each step a new node with $m$
edges is added, exhibit any symmetry. In recent work it was proved that
preferential attachment graphs are symmetric for $m = 1$, and there is some
non-negligible probability of symmetry for $m = 2$. It was conjectured that these
graphs are asymmetric when $m \geq 3$. We settle this conjecture in the
affirmative, then use it to estimate the structural entropy of the model. To do
this, we also give bounds on the number of ways that the given graph structure
could have arisen by preferential attachment. These results have further
implications for information theoretic problems of interest on preferential
attachment graphs.
Comment: 24 pages; to appear in Random Structures & Algorithms
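The growth process described in this abstract, where each step adds a new node with a fixed number of edges attached preferentially to high-degree nodes, can be sketched as follows. This is a minimal illustration and not the exact model analysed in the paper: the per-node edge count is called `m` here, the seed graph (two nodes joined by `m` parallel edges), the use of multi-edges, and the function name `preferential_attachment` are all assumptions made for brevity.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow a multigraph on nodes 0..n-1 (n >= 2); return a list of (new, old) edges."""
    rng = random.Random(seed)
    # Assumed seed graph: nodes 0 and 1 joined by m parallel edges.
    edges = [(1, 0)] * m
    # Every node appears in `endpoints` once per unit of degree, so a uniform
    # choice from this list is a degree-proportional choice of node.
    endpoints = [0, 1] * m
    for v in range(2, n):
        # Pick all m targets using the degrees from before v arrived.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for u in targets:
            edges.append((v, u))
            endpoints.extend((v, u))
    return edges

# Usage: a 10-node graph in which every arriving node brings 3 edges.
edge_list = preferential_attachment(10, 3, seed=0)
```

Each edge is stored as (newer node, older node), so the edge list records the arrival order that the unlabeled graph structure alone does not reveal.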
Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains
There has been increased interest in devising learning techniques that
combine unlabeled data with labeled data, i.e., semi-supervised learning.
However, to the best of our knowledge, no study has been performed across
various techniques and different types and amounts of labeled and unlabeled
data. Moreover, most of the published work on semi-supervised learning
techniques assumes that the labeled and unlabeled data come from the same
distribution. It is possible for the labeling process to be associated with a
selection bias such that the distributions of data points in the labeled and
unlabeled sets are different. Not correcting for such bias can result in biased
function approximation with potentially poor performance. In this paper, we
present an empirical study of various semi-supervised learning techniques on a
variety of datasets. We attempt to answer various questions such as the effect
of independence or relevance amongst features, the effect of the size of the
labeled and unlabeled sets and the effect of noise. We also investigate the
impact of sample-selection bias on the semi-supervised learning techniques
under study and implement a bivariate probit technique particularly designed to
correct for such bias.
Graph-based Semi-Supervised & Active Learning for Edge Flows
We present a graph-based semi-supervised learning (SSL) method for learning
edge flows defined on a graph. Specifically, given flow measurements on a
subset of edges, we want to predict the flows on the remaining edges. To this
end, we develop a computational framework that imposes certain constraints on
the overall flows, such as (approximate) flow conservation. These constraints
render our approach different from classical graph-based SSL for vertex labels,
which posits that tightly connected nodes share similar labels and leverages
the graph structure accordingly to extrapolate from a few vertex labels to the
unlabeled vertices. We derive bounds for our method's reconstruction error and
demonstrate its strong performance on synthetic and real-world flow networks
from transportation, physical infrastructure, and the Web. Furthermore, we
provide two active learning algorithms for selecting informative edges on which
to measure flow, which has applications for optimal sensor deployment. The
first strategy selects edges to minimize the reconstruction error bound and
works well on flows that are approximately divergence-free. The second approach
clusters the graph and selects bottleneck edges that cross cluster-boundaries,
which works well on flows with global trends.
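The flow-conservation constraint mentioned in this abstract can be made concrete with a small sketch. It is a toy illustration under stated assumptions, not the authors' method: unmeasured edge flows are filled in by coordinate descent on the sum of squared node divergences, with no regularization, and the names `divergence` and `fill_flows` are hypothetical.

```python
def divergence(num_nodes, edges, flow):
    """Net inflow at each node; an edge (u, v) carries flow[e] from u to v."""
    div = [0.0] * num_nodes
    for (u, v), f in zip(edges, flow):
        div[u] -= f
        div[v] += f
    return div

def fill_flows(num_nodes, edges, known, sweeps=100):
    """Estimate unmeasured flows by minimizing the sum of squared divergences.

    `known` maps an edge index to its measured flow; measured values stay fixed.
    """
    flow = [known.get(e, 0.0) for e in range(len(edges))]
    unknown = [e for e in range(len(edges)) if e not in known]
    for _ in range(sweeps):
        for e in unknown:
            u, v = edges[e]
            div = divergence(num_nodes, edges, flow)
            # Exact minimizer along this coordinate: changing flow[e] by d
            # changes div[u] by -d and div[v] by +d.
            flow[e] += (div[u] - div[v]) / 2.0
    return flow

# Usage: a directed triangle with a measurement on the first edge only.
edges = [(0, 1), (1, 2), (2, 0)]
estimate = fill_flows(3, edges, known={0: 2.0})
```

On a cycle, conservation forces the same flow on every edge, so the single measurement determines the other two values; on general graphs the constraint only narrows the set of plausible flows.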