Advancing State of the Art in Language Modeling
Generalization is arguably the most important goal of statistical language
modeling research. Publicly available benchmarks and papers published with
open-source code have been critical to advancing the field. However, it is
often very difficult, and sometimes even impossible, to reproduce the results
fully as reported in publications. In this paper, we propose a simple framework
that should help advance the state of the art in language modeling in terms of
generalization. We propose to publish not just the code, but also probabilities
on dev and test sets with future publications so that one can easily add the
new model into an ensemble. This has a crucial advantage: it becomes much
easier to determine whether a newly proposed model is actually complementary
to the current baseline. Thus, instead of reinventing old tricks under new
names, the scientific community can advance faster. Finally, this approach
promotes
diversity of ideas: one does not need to create an individual model that is the
new state of the art to attract attention; it will be sufficient to develop a
new model that learns patterns which other models do not. Thus, even a
suboptimal model can be found to have value. Remarkably, our approach has
yielded new state-of-the-art results across various language modeling
benchmarks, with gains of up to 10%.
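The ensembling idea above can be sketched concretely. Assuming two papers publish per-token probabilities on the same dev set (the names and values below are purely illustrative), a simple linear interpolation sweep reveals whether the new model is complementary: if some intermediate weight beats both models alone, the new model captures patterns the baseline misses.

```python
import math

def interpolate_perplexity(probs_a, probs_b, lam):
    """Perplexity of a token-level linear interpolation of two
    models' published per-token probabilities on the same eval set."""
    assert len(probs_a) == len(probs_b)
    total_log = sum(math.log(lam * pa + (1.0 - lam) * pb)
                    for pa, pb in zip(probs_a, probs_b))
    return math.exp(-total_log / len(probs_a))

# Hypothetical per-token probabilities two papers might publish
# for the same dev set (illustrative values only).
baseline  = [0.10, 0.25, 0.05, 0.40]
new_model = [0.30, 0.10, 0.20, 0.35]

# Sweep the interpolation weight: if some 0 < lam < 1 beats both
# endpoints, the models are complementary.
ppl, lam = min((interpolate_perplexity(baseline, new_model, l / 10.0),
                l / 10.0)
               for l in range(11))
```

No model needs to be re-run to perform this check; the published probability files alone suffice, which is the point of the proposed framework.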
Neural Natural Language Inference Models Enhanced with External Knowledge
Modeling natural language inference is a very challenging task. With the
availability of large annotated data, it has recently become feasible to train
complex models such as neural-network-based inference models, which have been
shown to achieve state-of-the-art performance. Although relatively large
annotated datasets exist, can machines learn all the knowledge needed to
perform natural language inference (NLI) from these data alone? If not, how
can neural-network-based NLI models benefit from external knowledge, and how
should such models be built to leverage it? In this paper, we enrich
state-of-the-art
neural natural language inference models with external knowledge. We
demonstrate that the proposed models improve neural NLI models to achieve the
state-of-the-art performance on the SNLI and MultiNLI datasets.
Comment: Accepted by ACL 201
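One common way to inject external knowledge into an NLI model is to bias the premise-hypothesis co-attention with lexical-relation features drawn from a resource such as WordNet. The sketch below is a minimal, hedged illustration of that idea, not the paper's exact formulation; the relation features and weights are assumptions for demonstration.

```python
import numpy as np

def knowledge_enriched_attention(scores, relations, w):
    """Soft alignment between premise tokens (rows) and hypothesis
    tokens (columns), with raw scores biased by external
    lexical-relation indicators (e.g. synonym/antonym flags).

    scores:    (m, n) raw co-attention scores
    relations: (m, n, k) binary relation features per token pair
    w:         (k,) learned weight per relation type
    """
    biased = scores + relations @ w               # add knowledge bias
    biased -= biased.max(axis=1, keepdims=True)   # stable softmax
    attn = np.exp(biased)
    return attn / attn.sum(axis=1, keepdims=True)
```

With a positive weight on a synonymy feature, token pairs flagged as synonyms receive more attention mass than unflagged pairs, letting the model align words the training data alone would not connect.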
Memory-Efficient Adaptive Optimization
Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for
achieving state-of-the-art performance in machine translation and language
modeling. However, these methods maintain second-order statistics for each
parameter, thus introducing significant memory overheads that restrict the size
of the model being used as well as the number of examples in a mini-batch. We
describe an effective and flexible adaptive optimization method with greatly
reduced memory overhead. Our method retains the benefits of per-parameter
adaptivity while allowing significantly larger models and batch sizes. We give
convergence guarantees for our method, and demonstrate its effectiveness in
training very large translation and language models with up to 2-fold speedups
compared to the state-of-the-art.
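The memory saving comes from replacing per-entry second-moment statistics with factored accumulators. The abstract does not spell out the algorithm, so the following is only a minimal SM3-style sketch for a single matrix parameter, with illustrative hyperparameters: row and column accumulators cover each entry, and each entry's statistic is estimated as the tightest covering value.

```python
import numpy as np

def sm3_like_step(param, grad, row_acc, col_acc, lr=0.1, eps=1e-8):
    """One adaptive step for an (m, n) matrix parameter that stores
    only O(m + n) second-moment statistics instead of O(m * n).

    row_acc: (m,) running squared-gradient statistic per row
    col_acc: (n,) running statistic per column
    """
    # Reconstruct a per-entry estimate as the tightest covering value.
    nu = np.minimum(row_acc[:, None], col_acc[None, :]) + grad ** 2
    # Fold the new estimates back into the compact accumulators.
    row_acc[:] = nu.max(axis=1)
    col_acc[:] = nu.max(axis=0)
    # Adagrad-style per-parameter scaling using the estimate.
    param -= lr * grad / (np.sqrt(nu) + eps)
    return param
```

For a 1000x1000 weight matrix this keeps 2,000 accumulator values instead of a million, which is the kind of reduction that permits larger models and mini-batches.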