Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder
Human annotation for syntactic parsing is expensive, and large resources are
available only for a fraction of languages. A question we ask is whether one
can leverage abundant unlabeled texts to improve syntactic parsers, beyond just
using the texts to obtain more generalisable lexical features (i.e. beyond word
embeddings). To this end, we propose a novel latent-variable generative model
for semi-supervised syntactic dependency parsing. As exact inference is
intractable, we introduce a differentiable relaxation to obtain approximate
samples and compute gradients with respect to the parser parameters. Our method
(Differentiable Perturb-and-Parse) relies on differentiable dynamic programming
over stochastically perturbed edge scores. We demonstrate the effectiveness of
our approach with experiments on English, French, and Swedish.
Comment: Accepted at ICLR 2019
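The core trick, perturb-and-parse, can be sketched compactly: perturb the arc scores with Gumbel noise, then replace the hard argmax inside the parsing dynamic program with a continuous relaxation so gradients can flow back to the parser parameters. The JAX sketch below is illustrative only: it uses a per-word softmax over candidate heads rather than the paper's relaxed structured dynamic program, and all names and shapes are made up.

```python
import jax
import jax.numpy as jnp

def perturb_scores(key, edge_scores):
    """Perturb-and-MAP: add i.i.d. Gumbel(0, 1) noise to the arc scores."""
    return edge_scores + jax.random.gumbel(key, edge_scores.shape)

def soft_heads(perturbed_scores, temperature=1.0):
    """Differentiable relaxation: a softmax over candidate heads per word.

    The paper relaxes the argmax inside a structured dynamic program;
    this per-word softmax ignores the tree constraint and only shows
    how gradients flow through the sampling step.
    """
    return jax.nn.softmax(perturbed_scores / temperature, axis=-1)

key_scores, key_noise = jax.random.split(jax.random.PRNGKey(0))
scores = jax.random.normal(key_scores, (5, 5))  # scores[d, h]: word h heads word d
relaxed = soft_heads(perturb_scores(key_noise, scores), temperature=0.5)
# `relaxed` is a soft adjacency matrix; as temperature -> 0 it approaches
# a hard per-word argmax sample from the perturbed scores.
```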
LAST: Scalable Lattice-Based Speech Modelling in JAX
We introduce LAST, a LAttice-based Speech Transducer library in JAX. With an
emphasis on flexibility, ease-of-use, and scalability, LAST implements
differentiable weighted finite state automaton (WFSA) algorithms needed for
training & inference that scale to a large WFSA, such as a recognition lattice
over the entire utterance. Although these WFSA algorithms are well known in
the literature, new challenges arise from the performance characteristics of
modern architectures and from nuances in automatic differentiation. We describe a
suite of generally applicable techniques employed in LAST to address these
challenges, and demonstrate their effectiveness with benchmarks on TPUv3 and
V100 GPUs.
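The central primitive in this setting is a differentiable shortest-distance (forward) computation over a WFSA. The sketch below is not LAST's API; it is a minimal, generic JAX implementation of the forward algorithm for a dense automaton in the log semiring, included to show why such recurrences are directly differentiable.

```python
import jax
import jax.numpy as jnp

def forward_log_weight(transition_logits, init, final, num_steps):
    """Total log-weight of all length-`num_steps` paths through a dense WFSA.

    transition_logits: [S, S] log-weights for state i -> state j.
    init, final:       [S] log-weights for starting / ending in each state.
    logsumexp-based matrix "products" implement the log semiring, so the
    whole computation is differentiable end to end.
    """
    def step(alpha, _):
        # alpha'[j] = logsum_i alpha[i] + w(i -> j)
        alpha = jax.scipy.special.logsumexp(alpha[:, None] + transition_logits, axis=0)
        return alpha, None

    alpha, _ = jax.lax.scan(step, init, None, length=num_steps)
    return jax.scipy.special.logsumexp(alpha + final)

S = 4
w = jax.random.normal(jax.random.PRNGKey(0), (S, S))
init, final = jnp.zeros(S), jnp.zeros(S)
val, grads = jax.value_and_grad(forward_log_weight)(w, init, final, 8)
# grads[i, j] is the expected number of uses of transition i -> j under the
# path distribution, a standard identity for log-partition gradients.
```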
Simple Hardware-Efficient PCFGs with Independent Left and Right Productions
Scaling dense PCFGs to thousands of nonterminals via a low-rank
parameterization of the rule probability tensor has been shown to be beneficial
for unsupervised parsing. However, PCFGs scaled this way still perform poorly
as a language model, and even underperform similarly-sized HMMs. This work
introduces SimplePCFG, a simple PCFG formalism with independent left and
right productions. Despite imposing a stronger independence assumption than the
low-rank approach, we find that this formalism scales more effectively both as
a language model and as an unsupervised parser. As an unsupervised parser, our
simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language
model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank
PCFGs. We further introduce FlashInside, a hardware IO-aware
implementation of the inside algorithm for efficiently scaling simple PCFGs.
Comment: Accepted to Findings of EMNLP, 2023
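The efficiency of the formalism is easy to see in the inside algorithm: when the binary rule probability factors as p(A -> B C) = pL(B|A) * pR(C|A), each split point reduces to two matrix-vector products instead of a rank-3 tensor contraction. A minimal sketch under that assumption (not the paper's FlashInside kernel; names and shapes are illustrative):

```python
import jax.numpy as jnp

def inside_simple_pcfg(pL, pR, leaf):
    """Inside algorithm for binary rules that factor as
    p(A -> B C) = pL[A, B] * pR[A, C] (independent left/right children).

    pL, pR: [NT, NT] left-/right-child probabilities given the parent.
    leaf:   [n, NT]  preterminal scores for each of the n words.
    Spans are kept in a dict for clarity; the paper's FlashInside instead
    vectorizes this recursion in an IO-aware way.
    """
    n = leaf.shape[0]
    inside = {(i, i + 1): leaf[i] for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            total = jnp.zeros(pL.shape[0])
            for k in range(i + 1, j):
                # factored rules: two mat-vecs replace a rank-3 contraction
                total = total + (pL @ inside[(i, k)]) * (pR @ inside[(k, j)])
            inside[(i, j)] = total
    return inside
```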
Algorithms for Optimal Paths of One, Many, and an Infinite Number of Agents
In this dissertation, we provide efficient algorithms for modeling the behavior of a single agent, multiple agents, and a continuum of agents. For a single agent, we combine the modeling framework of optimal control with advances in optimization splitting to efficiently find optimal paths for problems in very high dimensions, thus alleviating the curse of dimensionality. For a finite number of agents, we take the framework of multi-agent reinforcement learning and use imitation learning to decentralize a centralized expert, obtaining agents that act optimally in a decentralized fashion. For a continuum of agents, we take the framework of mean-field games and use two neural networks, trained in an alternating scheme, to efficiently find optimal paths for high-dimensional and stochastic problems. These tools cover a wide variety of use cases and can be immediately deployed in practical applications.
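The abstract only states that the mean-field-game approach trains two networks in alternation, so the shape of such a loop can be sketched as a generic saddle-point update. Everything below (the objective, the parameter shapes, the learning rate) is a hypothetical stand-in, not the dissertation's actual formulation.

```python
import jax
import jax.numpy as jnp

def saddle_objective(phi_params, g_params, batch):
    """Placeholder saddle-point objective coupling the two networks.

    A toy bilinear coupling, standing in for a mean-field-game loss in
    which one network plays a value-like role and the other a
    density/path-like role.
    """
    return jnp.mean((batch @ phi_params) * (batch @ g_params))

@jax.jit
def alternating_step(phi_params, g_params, batch, lr=1e-2):
    # ascend in phi, then descend in g: one round of the alternating scheme
    d_phi = jax.grad(saddle_objective, argnums=0)(phi_params, g_params, batch)
    phi_params = phi_params + lr * d_phi
    d_g = jax.grad(saddle_objective, argnums=1)(phi_params, g_params, batch)
    g_params = g_params - lr * d_g
    return phi_params, g_params

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
phi, g = jax.random.normal(k1, (3,)), jax.random.normal(k2, (3,))
batch = jax.random.normal(k3, (16, 3))
phi, g = alternating_step(phi, g, batch)
```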
On the relationship between predictive coding and backpropagation
Artificial neural networks are often interpreted as abstract models of
biological neuronal networks, but they are typically trained using the
biologically unrealistic backpropagation algorithm and its variants. Predictive
coding has been offered as a potentially more biologically realistic
alternative to backpropagation for training neural networks. In this
manuscript, I review and extend recent work on the mathematical relationship
between predictive coding and backpropagation for training feedforward
artificial neural networks on supervised learning tasks. I discuss some
implications of these results for the interpretation of predictive coding and
deep neural networks as models of biological learning, and I describe a
repository of functions, Torch2PC, for performing predictive coding with
PyTorch neural network models.
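The gist of the correspondence can be sketched numerically: clamp the output of a feedforward network to the target, relax the hidden activities to minimize the prediction-error energy, and compare the resulting weight gradients with backpropagation's. The JAX sketch below is a generic illustration of this construction, not Torch2PC's interface; the approximate agreement it exhibits holds in the small-error regime analyzed in this line of work.

```python
import jax
import jax.numpy as jnp

def energy(params, z1, x, y):
    """Predictive-coding energy for a 2-layer net with the output clamped
    to the target: the sum of squared prediction errors at each layer."""
    W1, W2 = params
    e1 = z1 - jnp.tanh(W1 @ x)
    e2 = y - jnp.tanh(W2 @ z1)
    return 0.5 * (e1 @ e1) + 0.5 * (e2 @ e2)

def pc_infer(params, x, y, steps=200, lr=0.1):
    """Relax the hidden activity z1 to (approximately) minimize the energy."""
    z1 = jnp.tanh(params[0] @ x)  # feedforward initialization
    dE_dz1 = jax.grad(energy, argnums=1)
    for _ in range(steps):
        z1 = z1 - lr * dE_dz1(params, z1, x, y)
    return z1

def loss(params, x, y):
    W1, W2 = params
    return 0.5 * jnp.sum((y - jnp.tanh(W2 @ jnp.tanh(W1 @ x))) ** 2)

k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
params = (0.1 * jax.random.normal(k1, (4, 3)), 0.1 * jax.random.normal(k2, (2, 4)))
x, y = jax.random.normal(k3, (3,)), jax.random.normal(k4, (2,))

z1 = pc_infer(params, x, y)
pc_grads = jax.grad(energy, argnums=0)(params, z1, x, y)
bp_grads = jax.grad(loss)(params, x, y)
# With small weights/errors the two gradient pytrees are close,
# illustrating the correspondence the manuscript analyzes.
```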