Gradient-based Inference for Networks with Output Constraints
Practitioners apply neural networks to increasingly complex problems in
natural language processing, such as syntactic parsing and semantic role
labeling, that have rich output structures. Many such structured-prediction
problems require deterministic constraints on the output values; for example,
in sequence-to-sequence syntactic parsing, we require that the sequential
outputs encode valid trees. While hidden units might capture such properties,
the network is not always able to learn such constraints from the training data
alone, and practitioners must then resort to post-processing. In this paper, we
present an inference method for neural networks that enforces deterministic
constraints on outputs without performing rule-based post-processing or
expensive discrete search. Instead, in the spirit of gradient-based training,
we enforce constraints with gradient-based inference (GBI): for each input at
test-time, we nudge continuous model weights until the network's unconstrained
inference procedure generates an output that satisfies the constraints. We
study the efficacy of GBI on three tasks with hard constraints: semantic role
labeling, syntactic parsing, and sequence transduction. In each case, the
algorithm not only satisfies constraints but improves accuracy, even when the
underlying network is state-of-the-art. Comment: AAAI 201
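The test-time weight nudging described in this abstract can be illustrated with a deliberately tiny sketch. It is not the paper's implementation: the linear softmax classifier, the "allowed label set" constraint, the loss (the log-probability of the currently violating prediction), and every name (`gbi_decode`, `allowed`, `lr`, `max_iters`) are assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gbi_decode(W, x, allowed, lr=2.0, max_iters=50):
    """Toy gradient-based inference for one input (hypothetical helper).

    Copies the trained weights, then repeatedly lowers the log-probability
    of the current prediction while it violates the constraint, until the
    ordinary argmax decode lands in `allowed`.
    Returns (label, number_of_gradient_steps), or (None, max_iters).
    """
    W_local = W.copy()       # per-input copy; the trained weights stay untouched
    for step in range(max_iters):
        p = softmax(W_local @ x)
        y_hat = int(np.argmax(p))
        if y_hat in allowed:             # constraint satisfied: stop nudging
            return y_hat, step
        onehot = np.zeros_like(p)
        onehot[y_hat] = 1.0
        # Gradient of log p(y_hat | x) with respect to W for a linear softmax model.
        grad = np.outer(onehot - p, x)
        W_local -= lr * grad             # make the violating output less likely
    return None, max_iters

# Assumed toy data: 3 labels, 2 features; label 0 is forbidden for this input.
W = np.array([[2.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
x = np.array([1.0, 0.0])
label, steps = gbi_decode(W, x, allowed={1, 2})
```

In this toy run the unconstrained decode picks the forbidden label 0, and a single gradient step on the copied weights is enough for the plain argmax to return an allowed label. A structured task would replace the set-membership test with a constraint checker over whole output sequences.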
Decomposition Methods for Large Scale LP Decoding
When binary linear error-correcting codes are used over symmetric channels, a
relaxed version of the maximum likelihood decoding problem can be stated as a
linear program (LP). This LP decoder can be used to decode error-correcting
codes at bit-error-rates comparable to state-of-the-art belief propagation (BP)
decoders, but with significantly stronger theoretical guarantees. However, LP
decoding when implemented with standard LP solvers does not easily scale to the
block lengths of modern error correcting codes. In this paper we draw on
decomposition methods from optimization theory, specifically the Alternating
Directions Method of Multipliers (ADMM), to develop efficient distributed
algorithms for LP decoding.
The key enabling technical result is a "two-slice" characterization of the
geometry of the parity polytope, which is the convex hull of all codewords of a
single parity check code. This new characterization simplifies the
representation of points in the polytope. Using this simplification, we develop
an efficient algorithm for Euclidean norm projection onto the parity polytope.
This projection is required by ADMM and allows us to use LP decoding, with all
its theoretical guarantees, to decode large-scale error correcting codes
efficiently.
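To make the parity polytope concrete, the sketch below tests membership for small dimensions using the standard odd-set inequality description of the polytope (for every odd-sized index set S, the sum of x over S minus the sum of x outside S is at most |S| - 1, together with 0 <= x <= 1). This is an assumption-labeled illustration, not the paper's "two-slice" characterization or its projection algorithm, and the function name `in_parity_polytope` is hypothetical. The check enumerates all odd subsets, so it is only practical for very short vectors.

```python
from itertools import combinations

def in_parity_polytope(x, tol=1e-9):
    """Return True if x lies in the convex hull of even-weight binary vectors.

    Brute-force check of the box constraints and every odd-set inequality;
    exponential in len(x), intended only as a small illustration.
    """
    d = len(x)
    if any(v < -tol or v > 1 + tol for v in x):
        return False
    total = sum(x)
    for size in range(1, d + 1, 2):              # odd subset sizes only
        for S in combinations(range(d), size):
            inside = sum(x[i] for i in S)
            outside = total - inside
            if inside - outside > size - 1 + tol:
                return False
    return True
```

For example, the even-weight vertex `[1, 1, 0]` and the interior point `[0.5, 0.5, 0.5]` pass the check, while the odd-weight vectors `[1, 0, 0]` and `[1, 1, 1]` fail it.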
We present numerical results for LDPC codes of lengths more than 1000. The
waterfall region of LP decoding is seen to initiate at a slightly higher
signal-to-noise ratio than for sum-product BP; however, no error floor is
observed for LP decoding, whereas one is observed for BP. Our implementation of
LP decoding using ADMM executes as fast as our baseline sum-product BP decoder,
is fully parallelizable, and can be seen to implement a type of message-passing
with a particularly simple schedule. Comment: 35 pages, 11 figures. An early version of this work appeared at the
49th Annual Allerton Conference, September 2011. This version to appear in
IEEE Transactions on Information Theor
Exact and Approximate Methods for Machine Translation Decoding
Statistical methods have been the major force driving the advance of machine translation in recent years. Complex models are designed to improve translation performance, but the added complexity also makes decoding more challenging. In this thesis, we focus on designing exact and approximate algorithms for machine translation decoding. More specifically, we will discuss the decoding problems for phrase-based translation models and bidirectional word alignment.
The techniques explored in this thesis are Lagrangian relaxation and local search. Lagrangian relaxation based algorithms give us exact methods that have formal guarantees while being efficient in practice. We study extensions to Lagrangian relaxation that improve the convergence rate on machine translation decoding problems. The extensions include a tightening technique that adds constraints incrementally, optimality-preserving pruning to manage the size of the search space, and the use of the bounding properties of Lagrangian relaxation to develop an exact beam search algorithm. In addition to having the potential to improve translation accuracy, exact decoding deepens our understanding of the model that we are using, since it separates model errors from optimization errors.
This leads to the question of designing models that improve the translation quality. We design a syntactic phrase-based model that incorporates a dependency language model to evaluate the fluency level of the target language. By employing local search, an approximate method, to decode this richer model, we discuss the trade-off between the complexity of a model and the decoding efficiency with the model.
Exact decoding of phrase-based translation models through Lagrangian relaxation
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 69-72). This thesis describes two algorithms for exact decoding of phrase-based translation models, based on Lagrangian relaxation. Both methods recover exact solutions, with certificates of optimality, on over 99% of test examples. The first method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our methods to MOSES [6], and give precise estimates of the number and magnitude of search errors that MOSES makes. by Yin-Wen Chang. S.M
Efficient Lagrangian relaxation algorithms for exact inference in natural language tasks
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-99). For many tasks in natural language processing, finding the best solution requires a search over a large set of possible structures. Solving these combinatorial search problems exactly can be inefficient, and so researchers often use approximate techniques at the cost of model accuracy. In this thesis, we turn to Lagrangian relaxation as an alternative to approximate inference in natural language tasks. We demonstrate that Lagrangian relaxation algorithms provide efficient solutions while still maintaining formal guarantees. The approach leads to inference algorithms with the following properties: (1) the resulting algorithms are simple and efficient, building on standard combinatorial algorithms for relaxed problems; (2) the algorithms provably solve a linear programming (LP) relaxation of the original inference problem; (3) empirically, the relaxation often leads to an exact solution to the original problem. We develop Lagrangian relaxation algorithms for several important tasks in natural language processing including higher-order non-projective dependency parsing, syntactic machine translation, integrated constituency and dependency parsing, and part-of-speech tagging with inter-sentence constraints. For each of these tasks, we show that the Lagrangian relaxation algorithms are often significantly faster than exact methods while finding the exact solution with a certificate of optimality in the vast majority of examples. by Alexander M. Rush. S.M
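The Lagrangian relaxation pattern these theses rely on, including the "certificate of optimality", can be sketched on a toy problem. The sketch assumes two trivially simple subproblems over binary vectors that must agree (maximize a·y + b·z subject to y = z); in the theses the subproblems are combinatorial decoders, which are not reproduced here. The function name `dual_decompose`, the step-size schedule, and the data are all hypothetical.

```python
def dual_decompose(a, b, step0=1.0, max_iters=100):
    """Toy subgradient dual decomposition (illustrative assumptions only).

    Relaxes the agreement constraint y = z with multipliers u, solves each
    subproblem independently, and updates u by a subgradient step on the
    dual. If the two solutions agree, the shared solution is provably
    optimal for the original problem (the certificate).
    Returns (solution, certified, iterations_used).
    """
    n = len(a)
    u = [0.0] * n
    y = [0] * n
    for t in range(max_iters):
        # Each subproblem is solved on its own under the current multipliers.
        y = [1 if a[i] + u[i] > 0 else 0 for i in range(n)]
        z = [1 if b[i] - u[i] > 0 else 0 for i in range(n)]
        if y == z:
            return y, True, t            # agreement: certificate of optimality
        step = step0 / (1 + t)           # decaying step size (an assumed schedule)
        u = [u[i] - step * (y[i] - z[i]) for i in range(n)]
    return y, False, max_iters           # no certificate within the budget

solution, certified, iters = dual_decompose([2.0, -1.0], [-1.0, 3.0])
```

On this input the two subproblems initially disagree on both coordinates; after two multiplier updates they agree on `[1, 1]`, which matches the optimum of the combined objective (a + b = [1, 2]). When agreement is not reached within the iteration budget, the result carries no certificate, which is the situation the tightening and pruning extensions mentioned above are meant to address.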