Factorized Graph Representations for Semi-Supervised Learning from Sparse Data
Node classification is an important problem in graph data management. It is
commonly solved by various label propagation methods that work iteratively
starting from a few labeled seed nodes. For graphs with arbitrary
compatibilities between classes, these methods crucially depend on knowing the
compatibility matrix that must be provided by either domain experts or
heuristics. Can we instead directly estimate the correct compatibilities from a
sparsely labeled graph in a principled and scalable way? We answer this
question affirmatively and suggest a method called distant compatibility
estimation that works even on extremely sparsely labeled graphs (e.g., 1 in
10,000 nodes is labeled) in a fraction of the time it later takes to label the
remaining nodes. Our approach first creates multiple factorized graph
representations (with size independent of the graph) and then performs
estimation on these smaller graph sketches. We define algebraic amplification
as the more general idea of leveraging algebraic properties of an algorithm's
update equations to amplify sparse signals. We show that our estimator is
orders of magnitude faster than an alternative approach and that the end-to-end
classification accuracy is comparable to using gold standard compatibilities.
This makes it a cheap preprocessing step for any existing label propagation
method and removes the current dependence on heuristics.
Comment: SIGMOD 2020 (Extended version
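To make the role of the compatibility matrix concrete, here is a minimal sketch of label propagation with a given compatibility matrix. It assumes one common linearized update, F <- X + eps * W F H, and it is not the paper's distant compatibility estimation method. The function name `propagate_labels`, the damping factor `eps`, and the toy graph are all hypothetical choices made for illustration.

```python
import numpy as np

def propagate_labels(W, X, H, eps=0.1, iters=50):
    """Iterate F <- X + eps * W @ F @ H and return the arg-max class per node.

    W: (n, n) adjacency matrix, X: (n, k) seed beliefs (zero rows for
    unlabeled nodes), H: (k, k) centered compatibility matrix.
    eps is an assumed damping factor, chosen small enough to converge here.
    """
    F = X.astype(float).copy()
    for _ in range(iters):
        F = X + eps * (W @ F @ H)
    return F.argmax(axis=1)

# Toy path graph 0-1-2-3 with a single labeled seed (node 0, class 0).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.zeros((4, 2))
X[0, 0] = 1.0

# Heterophily: neighbors tend to belong to opposite classes.
H = np.array([[-1.0, 1.0],
              [1.0, -1.0]])

labels = propagate_labels(W, X, H)
print(labels)
```

With this heterophilous H, the predicted classes alternate along the path. Replacing H with its negation (homophily) would instead assign every node the seed's class, which illustrates why the result depends so strongly on knowing the compatibilities.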
Deep Learning of Representations: Looking Forward
Deep learning research aims at discovering learning algorithms that discover
multiple levels of distributed representations, with higher levels representing
more abstract concepts. Although the study of deep learning has already led to
impressive theoretical results, learning algorithms and breakthrough
experiments, several challenges lie ahead. This paper proposes to examine some
of these challenges, centering on the questions of scaling deep learning
algorithms to much larger models and datasets, reducing optimization
difficulties due to ill-conditioning or local minima, designing more efficient
and powerful inference and sampling procedures, and learning to disentangle the
factors of variation underlying the observed data. It also proposes a few
forward-looking research directions aimed at overcoming these challenges.
- …