6,572 research outputs found
A Continuous Relaxation of Beam Search for End-to-end Training of Neural Sequence Models
Beam search is a desirable choice of test-time decoding algorithm for neural
sequence models because it potentially avoids search errors made by simpler
greedy methods. However, typical cross entropy training procedures for these
models do not directly consider the behaviour of the final decoding method. As
a result, for cross-entropy trained models, beam decoding can sometimes yield
reduced test performance when compared with greedy decoding. In order to train
models that can more effectively make use of beam search, we propose a new
training procedure that focuses on the final loss metric (e.g. Hamming loss)
evaluated on the output of beam search. While well-defined, this "direct loss"
objective is itself discontinuous and thus difficult to optimize. Hence, in our
approach, we form a sub-differentiable surrogate objective by introducing a
novel continuous approximation of the beam search decoding procedure. In
experiments, we show that optimizing this new training objective yields
substantially better results on two sequence tasks (Named Entity Recognition
and CCG Supertagging) when compared with both cross entropy trained greedy
decoding and cross entropy trained beam decoding baselines.Comment: Updated for clarity and notational consistenc
Gradient-based Inference for Networks with Output Constraints
Practitioners apply neural networks to increasingly complex problems in
natural language processing, such as syntactic parsing and semantic role
labeling that have rich output structures. Many such structured-prediction
problems require deterministic constraints on the output values; for example,
in sequence-to-sequence syntactic parsing, we require that the sequential
outputs encode valid trees. While hidden units might capture such properties,
the network is not always able to learn such constraints from the training data
alone, and practitioners must then resort to post-processing. In this paper, we
present an inference method for neural networks that enforces deterministic
constraints on outputs without performing rule-based post-processing or
expensive discrete search. Instead, in the spirit of gradient-based training,
we enforce constraints with gradient-based inference (GBI): for each input at
test-time, we nudge continuous model weights until the network's unconstrained
inference procedure generates an output that satisfies the constraints. We
study the efficacy of GBI on three tasks with hard constraints: semantic role
labeling, syntactic parsing, and sequence transduction. In each case, the
algorithm not only satisfies constraints but improves accuracy, even when the
underlying network is state-of-the-art.Comment: AAAI 201
- …