LASAGNE: Locality And Structure Aware Graph Node Embedding
In this work we propose Lasagne, a methodology for learning locality- and
structure-aware graph node embeddings in an unsupervised way. In particular, we
show that the performance of existing random-walk based approaches depends
strongly on the structural properties of the graph, e.g., the size of the
graph, whether the graph has a flat or upward-sloping Network Community Profile
(NCP), whether the graph is expander-like, whether the classes of interest are
more k-core-like or more peripheral, etc. For larger graphs with flat NCPs that
are strongly expander-like, existing methods lead to random walks that expand
rapidly, touching many dissimilar nodes, thereby leading to lower-quality
vector representations that are less useful for downstream tasks. Rather than
relying on global random walks or neighbors within fixed hop distances, Lasagne
exploits strongly local Approximate Personalized PageRank stationary
distributions to more precisely engineer local information into node
embeddings. This leads, in particular, to more meaningful and more useful
vector representations of nodes in poorly-structured graphs. We show that
Lasagne leads to significant improvement in downstream multi-label
classification for larger graphs with flat NCPs, that it is comparable for
smaller graphs with upward-sloping NCPs, and that it is comparable to existing
methods for link prediction tasks.
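As intuition for the strongly local primitive named in this abstract, here is a minimal sketch of the push method of Andersen, Chung and Lang for approximate personalized PageRank. It is a generic sketch rather than Lasagne's actual implementation; the adjacency-dict input and the default teleportation constant alpha and tolerance eps are illustrative assumptions.

```python
from collections import defaultdict, deque

def approximate_ppr(graph, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank around `seed` via the push
    method. `graph` maps each node to a list of its neighbours. Only
    nodes near the seed are ever touched, so the result is a sparse,
    strongly local distribution."""
    p = defaultdict(float)  # approximate PPR mass accumulated so far
    r = defaultdict(float)  # residual mass still waiting to be pushed
    r[seed] = 1.0
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg = len(graph[u])
        if deg == 0 or r[u] < eps * deg:
            continue  # stale queue entry; residual at u is already small
        # Push at u: keep an alpha fraction, spread the rest to neighbours.
        p[u] += alpha * r[u]
        share = (1.0 - alpha) * r[u] / deg
        r[u] = 0.0
        for v in graph[u]:
            r[v] += share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
    return dict(p)
```

An embedding method in the spirit of Lasagne would draw each node's context from such a sparse local distribution, rather than from globally expanding random walks that quickly touch many dissimilar nodes.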
Randomisation and Derandomisation in Descriptive Complexity Theory
We study probabilistic complexity classes and questions of derandomisation
from a logical point of view. For each logic L we introduce a new logic BPL,
bounded error probabilistic L, which is defined from L in a similar way as the
complexity class BPP, bounded error probabilistic polynomial time, is defined
from PTIME. Our main focus lies on questions of derandomisation, and we prove
that there is a query which is definable in BPFO, the probabilistic version of
first-order logic, but not in Cinf, finite-variable infinitary logic with
counting. This implies that many of the standard logics of finite model theory,
like transitive closure logic and fixed-point logic, both with and without
counting, cannot be derandomised. Similarly, we present a query on ordered
structures which is definable in BPFO but not in monadic second-order logic,
and a query on additive structures which is definable in BPFO but not in FO.
The latter of these queries shows that certain uniform variants of AC0
(bounded-depth, polynomial-size circuits) cannot be derandomised. These results
are in contrast to the general belief that most standard complexity classes can
be derandomised. Finally, we note that BPIFP+C, the probabilistic version of
fixed-point logic with counting, captures the complexity class BPP, even on
unordered structures.
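For orientation, the BPP analogy can be made concrete roughly as follows. This is a sketch: the auxiliary vocabulary ρ of random relations and the 2/3 threshold follow the usual BPP convention, and the paper's exact definition may differ.

```latex
% A query Q on tau-structures is definable in BPL if there is an
% L-formula \varphi over the vocabulary tau extended by random
% relations \rho such that, for every structure A, with R drawn
% uniformly among the interpretations of \rho:
\[
  A \in Q \;\Longrightarrow\; \Pr_{R}\bigl[(A,R) \models \varphi\bigr] \ge \tfrac{2}{3},
  \qquad
  A \notin Q \;\Longrightarrow\; \Pr_{R}\bigl[(A,R) \models \varphi\bigr] \le \tfrac{1}{3}.
\]
```

Derandomising BPL then means showing that every such bounded-error query is already definable in a deterministic logic, which is exactly what the negative results above rule out for Cinf, monadic second-order logic, and FO on the respective classes of structures.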
Regular decomposition of large graphs and other structures: scalability and robustness towards missing data
A method for compression of large graphs and matrices to a block structure is
further developed. Szemerédi's regularity lemma serves as a generic motivation
for the significance of stochastic block models. Another ingredient of the
method is Rissanen's minimum description length (MDL) principle. We continue
our previous work on the subject, considering cases of missing data and the
scaling of algorithms to extremely large graphs. In this way it becomes
possible to uncover the large-scale structure of huge graphs of a certain type
using only a tiny fraction of the graph information, and to obtain a compact
representation of such graphs that is useful in computations and visualization.
Comment: Accepted for publication in: Fourth International Workshop on High
Performance Big Graph Data Management, Analysis, and Mining, December 11,
2017, Boston, U.S.
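As a concrete, simplified illustration of the MDL ingredient, the sketch below scores a candidate block partition by a two-part codelength: bits to name each node's block, plus an entropy code for the edge pattern between every pair of blocks. The encoding choices are generic textbook ones and are assumptions here, not necessarily the paper's.

```python
import math
from collections import defaultdict

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def sbm_description_length(edges, blocks):
    """Two-part MDL codelength in bits of an undirected simple graph
    under a stochastic block model: first encode each node's block
    label, then entropy-code the edge pattern of every block pair.
    `edges` is an iterable of (u, v) pairs; `blocks` maps node -> block."""
    size = defaultdict(int)
    for b in blocks.values():
        size[b] += 1
    labels = sorted(size)
    # Part 1: the block label of each node.
    bits = len(blocks) * math.log2(max(len(labels), 2))
    # Bucket the edges by the (unordered) pair of blocks they connect.
    edge_count = defaultdict(int)
    for u, v in edges:
        edge_count[tuple(sorted((blocks[u], blocks[v])))] += 1
    # Part 2: (#possible pairs) * H(edge density) bits per block pair.
    for i, a in enumerate(labels):
        for b in labels[i:]:
            pairs = (size[a] * (size[a] - 1) // 2 if a == b
                     else size[a] * size[b])
            if pairs:
                bits += pairs * binary_entropy(edge_count[(a, b)] / pairs)
    return bits
```

A regular-decomposition-style search would then look for the partition minimising this codelength; under missing data, one natural variant lets only the observed node pairs contribute to the second part.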
Combining semantic and syntactic structure for language modeling
Structured language models for speech recognition have been shown to remedy
the weaknesses of n-gram models. All current structured language models are,
however, limited in that they do not take into account dependencies between
non-headwords. We show that non-headword dependencies contribute to
significantly improved word error rates, and that a data-oriented parsing (DOP)
model trained on semantically and syntactically annotated data can exploit
these dependencies. This paper also contains the first DOP model trained by
means of a maximum likelihood reestimation procedure, which solves some of the
theoretical shortcomings of previous DOP models.
Comment: 4 pages
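To illustrate what a maximum likelihood reestimation procedure for DOP can look like, here is a minimal EM-style sketch: every sentence comes with several candidate derivations, a derivation's score is the product of the weights of the fragments it uses, and expected fragment counts are renormalised each round. The single shared root category and the precomputed derivations are deliberate simplifying assumptions; this is not the paper's actual estimator.

```python
from collections import defaultdict

def em_reestimate(corpus, iterations=20):
    """corpus: list of sentences; each sentence is a list of candidate
    derivations; each derivation is a list of fragment ids. Returns a
    probability for each fragment, assuming all fragments share a
    single root category (a deliberate simplification)."""
    fragments = {f for sent in corpus for der in sent for f in der}
    prob = {f: 1.0 / len(fragments) for f in fragments}
    for _ in range(iterations):
        expected = defaultdict(float)
        for sent in corpus:
            # Score each derivation by the product of its fragment weights.
            scores = []
            for der in sent:
                s = 1.0
                for f in der:
                    s *= prob[f]
                scores.append(s)
            total = sum(scores)
            if total == 0.0:
                continue
            # E-step: fractional counts weighted by derivation posterior.
            for der, s in zip(sent, scores):
                for f in der:
                    expected[f] += s / total
        # M-step: renormalise expected counts into new probabilities.
        norm = sum(expected.values())
        if norm == 0.0:
            break
        prob = {f: expected[f] / norm for f in fragments}
    return prob
```

Unlike estimating fragment weights by simple relative frequencies, a reestimation loop of this kind locally maximises the likelihood of the training corpus, which is the kind of theoretical repair the abstract alludes to.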