Learning Representations using Spectral-Biased Random Walks on Graphs
Several state-of-the-art neural graph embedding methods are based on short
random walks (stochastic processes) because of their ease of computation,
simplicity in capturing complex local graph properties, scalability, and
interpretability. In this work, we study how much a
probabilistic bias in this stochastic process affects the quality of the nodes
picked by the process. In particular, our biased walk, with a certain
probability, favors movement towards nodes whose neighborhoods bear a
structural resemblance to the current node's neighborhood. We succinctly
capture this neighborhood as a probability measure based on the spectrum of the
node's neighborhood subgraph, represented as a normalized Laplacian matrix. We
propose the use of a paragraph vector model with a novel Wasserstein
regularization term. We empirically evaluate our approach against several
state-of-the-art node embedding techniques on a wide variety of real-world
datasets and demonstrate that our proposed method significantly improves upon
existing methods on both link prediction and node classification tasks.
Comment: Accepted at IJCNN 2020, the International Joint Conference on Neural Networks.
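To make the core step concrete, the sketch below builds each node's spectral measure from the eigenvalues of the normalized Laplacian of its neighborhood subgraph and biases the walk toward spectrally similar neighbors. This is a minimal sketch assuming a NetworkX graph; the hop radius, histogram bin count, and bias weight alpha are illustrative choices, not the paper's settings.

```python
import numpy as np
import networkx as nx
from scipy.stats import wasserstein_distance

BINS = 10  # illustrative bin count for the spectral histogram

def spectral_measure(G, node, radius=1):
    """Histogram of the eigenvalues of the normalized Laplacian of a node's
    neighborhood subgraph, used as a probability measure on [0, 2]."""
    sub = nx.ego_graph(G, node, radius=radius)
    lam = np.linalg.eigvalsh(nx.normalized_laplacian_matrix(sub).toarray())
    hist, _ = np.histogram(lam, bins=BINS, range=(0.0, 2.0))
    return hist / max(hist.sum(), 1)

def spectral_biased_walk(G, start, length, alpha=0.8, rng=None):
    """Walk that, with probability alpha, favors neighbors whose spectral
    measure is close (in 1-Wasserstein distance) to the current node's;
    otherwise it steps uniformly at random."""
    rng = rng or np.random.default_rng()
    support = np.linspace(0.0, 2.0, BINS)  # common support of all measures
    cache = {}
    def measure(v):
        if v not in cache:
            cache[v] = spectral_measure(G, v)
        return cache[v]
    walk, cur = [start], start
    for _ in range(length - 1):
        nbrs = list(G.neighbors(cur))
        if not nbrs:
            break
        if rng.random() < alpha:
            d = np.array([wasserstein_distance(support, support,
                                               measure(cur), measure(v))
                          for v in nbrs])
            p = np.exp(-d)  # closer spectra get higher transition weight
            cur = nbrs[rng.choice(len(nbrs), p=p / p.sum())]
        else:
            cur = nbrs[rng.integers(len(nbrs))]
        walk.append(cur)
    return walk
```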
Efficient Training on Very Large Corpora via Gramian Estimation
We study the problem of learning similarity functions over very large corpora
using neural network embedding models. These models are typically trained using
SGD with sampling of random observed and unobserved pairs, with a number of
samples that grows quadratically with the corpus size, making it expensive to
scale to very large corpora. We propose new efficient methods to train these
models without having to sample unobserved pairs. Inspired by matrix
factorization, our approach relies on adding a global quadratic penalty to all
pairs of examples and expressing this term as the matrix-inner-product of two
generalized Gramians. We show that the gradient of this term can be efficiently
computed by maintaining estimates of the Gramians, and develop variance
reduction schemes to improve the quality of the estimates. We conduct
large-scale experiments that show a significant improvement in training time
and generalization quality compared to traditional sampling methods.
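To make the Gramian trick concrete, here is a minimal NumPy sketch in which the quadratic penalty over all pairs is expressed as the matrix inner product of two Gramians, with moving-average estimates refreshed from minibatches. The rate beta and the exact scaling are illustrative assumptions and do not reproduce the paper's variance-reduction schemes.

```python
import numpy as np

d = 32
G_u = np.zeros((d, d))  # running estimate of the left-tower Gramian
G_v = np.zeros((d, d))  # running estimate of the right-tower Gramian
beta = 0.9              # moving-average rate (an illustrative choice)

def update_gramians(U_batch, V_batch):
    """Refresh the Gramian estimates from one minibatch of embeddings.
    The exact Gramian would be the sum of u_i u_i^T over the whole corpus;
    a moving average of batch Gramians keeps a cheap running estimate."""
    global G_u, G_v
    G_u = beta * G_u + (1 - beta) * (U_batch.T @ U_batch) / len(U_batch)
    G_v = beta * G_v + (1 - beta) * (V_batch.T @ V_batch) / len(V_batch)

def global_penalty_and_grads(U_batch, V_batch):
    """The penalty over all pairs, sum_{i,j} (u_i . v_j)^2, equals the
    matrix inner product <G_u, G_v>; its gradient w.r.t. an embedding u_i
    is proportional to G_v u_i, so no unobserved pairs are ever sampled."""
    penalty = float(np.sum(G_u * G_v))  # <G_u, G_v> = trace(G_u^T G_v)
    grad_U = 2.0 * U_batch @ G_v
    grad_V = 2.0 * V_batch @ G_u
    return penalty, grad_U, grad_V
```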
Learning causal representations for robust domain adaptation
Domain adaptation solves the learning problem in a target domain by
leveraging the knowledge in a relevant source domain. While remarkable advances
have been made, almost all existing domain adaptation methods rely heavily on
large amounts of unlabeled target domain data for learning domain invariant
representations to achieve good generalizability on the target domain. In fact,
in many real-world applications, target domain data may not always be
available. In this paper, we study the case where target domain data is
unavailable at the training phase and only well-labeled source domain data is
available, a setting we call robust domain adaptation. To tackle this problem, under the
assumption that causal relationships between features and the class variable
are robust across domains, we propose a novel Causal AutoEncoder (CAE), which
integrates a deep autoencoder and causal structure learning into a unified model
to learn causal representations only using data from a single source domain.
Specifically, a deep autoencoder model is adopted to learn low-dimensional
representations, and a causal structure learning model is designed to separate
the low-dimensional representations into two groups: causal representations and
task-irrelevant representations. Extensive experiments on three real-world
datasets validate the effectiveness of CAE against eleven state-of-the-art
methods.
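The split-latent idea can be sketched as follows in PyTorch. This is a minimal sketch under stated assumptions: layer sizes and the split point are arbitrary, and a classification head on the "causal" block merely stands in for the paper's causal structure learning component, which is not reproduced here.

```python
import torch
import torch.nn as nn

class SplitLatentAE(nn.Module):
    """Autoencoder whose latent code is split into a 'causal' block that
    feeds the label predictor and a task-irrelevant remainder that only
    participates in reconstruction."""
    def __init__(self, in_dim=64, latent_dim=16, n_causal=8, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim))
        self.n_causal = n_causal
        # stand-in for the causal structure learning component:
        # only the causal block is allowed to predict the class
        self.classifier = nn.Linear(n_causal, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        z_causal = z[:, :self.n_causal]
        recon = self.decoder(z)            # reconstruction uses the full code
        logits = self.classifier(z_causal)  # prediction uses the causal block
        return recon, logits
```

Training would combine a reconstruction loss on (recon, x) with a classification loss on (logits, y), all from a single labeled source domain; at test time only the causal block drives the prediction, the intuition being that causal features are the ones that transfer across domains.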
The Rediscovery Hypothesis: Language Models Need to Meet Linguistics
There is an ongoing debate in the NLP community whether modern language
models contain linguistic knowledge, recovered through so-called
probes. In this paper we study whether linguistic knowledge is a
necessary condition for good performance of modern language models, a claim we
call the rediscovery hypothesis.
First, we show that language models that are significantly
compressed but perform well on their pretraining objectives retain good scores
when probed for linguistic structures. This result supports the rediscovery
hypothesis and leads to the second contribution of our paper: an
information-theoretic framework that relates language modeling objective with
linguistic information. This framework also provides a metric to measure the
impact of linguistic information on the word prediction task. We reinforce our
analytical results with various experiments on both synthetic and real
tasks.
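A standard probing setup of this kind is sketched below, under assumptions: frozen token representations and their part-of-speech tags are already extracted into arrays (reps and tags are placeholder names), and a simple linear probe from scikit-learn stands in for whatever probe family is under study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_accuracy(reps, tags, train_frac=0.8, seed=0):
    """Fit a linear probe on frozen representations (n_tokens x d) against
    token-level labels such as POS tags, and report held-out accuracy.
    A compressed LM that keeps its pretraining performance and also keeps
    this score high is evidence for the rediscovery hypothesis."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(reps))
    cut = int(train_frac * len(reps))
    train, test = idx[:cut], idx[cut:]
    probe = LogisticRegression(max_iter=1000)
    probe.fit(reps[train], tags[train])
    return probe.score(reps[test], tags[test])
```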