Evaluation of Computational Grammar Formalisms for Indian Languages
Natural language parsing has been a central research area since the genesis of Natural Language Processing. Probabilistic parsers are being developed to make parser development easier, more accurate, and faster. In the Indian context, the question of which computational grammar formalism to use remains open. In this paper we focus on this problem and analyze different formalisms for Indian languages.
Packed rules for automatic transfer-rule induction
We present a method of encoding transfer rules in a highly efficient packed structure using contextualized constraints (Maxwell and Kaplan, 1991), an existing encoding method adopted from LFG parsing (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001). The packed representation allows us to encode O(2^n) transfer rules in a single packed structure requiring only O(n) storage space. Besides reducing space requirements, the representation also greatly reduces the time taken to load large numbers of transfer rules into memory, with very little trade-off in the time needed to unpack the rules. We include an experimental evaluation which shows a considerable reduction in space and time requirements for a large set of automatically induced transfer rules when the rules are stored in the packed representation.
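To make the space argument concrete, here is a minimal, hypothetical sketch, not the authors' LFG-based implementation: each of n positions in a packed rule holds alternative fragments guarded by context labels, so the structure itself takes O(n) storage while licensing up to 2^n full rules. All names and rule fragments below are invented for illustration.

```python
from itertools import product

# Each of the n positions stores alternative rule fragments guarded by
# context labels. The packed structure needs O(n) storage, yet with two
# alternatives per position it licenses 2^n distinct full rules, which
# unpack() enumerates on demand.
packed_rule = [
    [("c0", "NP -> DET N"), ("not c0", "NP -> N")],        # choice point 0
    [("c1", "NP' -> DET' N'"), ("not c1", "NP' -> N'")],   # choice point 1
]

def unpack(packed):
    """Yield every full rule licensed by the packed structure."""
    for combo in product(*packed):
        contexts = tuple(ctx for ctx, _ in combo)
        fragments = tuple(frag for _, frag in combo)
        yield contexts, fragments

for contexts, fragments in unpack(packed_rule):
    print(contexts, fragments)
```

Unpacking is a simple cross-product over the choice points, which is why loading the packed form is cheap and the unpacking cost stays modest, as the abstract reports.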
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance
Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Recently, several proposed debiasing methods have been shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of a performance drop when models are evaluated on in-distribution data, which contain a more diverse range of examples. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves performance on out-of-distribution datasets (e.g., a 7pp gain on the HANS dataset) while maintaining the original in-distribution accuracy.
Comment: to appear at ACL 2020
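As an illustration of what a confidence-regularization-style objective might look like, here is a minimal PyTorch sketch. It is an assumption-laden reconstruction, not the authors' exact formulation: the temperature-scaling rule and all names (confidence_regularization_loss, bias_conf, etc.) are invented here. The shared idea is that examples a bias-only model answers confidently receive softened distillation targets, so the main model gets a weaker incentive to exploit the bias while still learning from every example.

```python
import torch
import torch.nn.functional as F

def confidence_regularization_loss(student_logits, teacher_probs, bias_conf):
    """Distill the student toward teacher targets softened in proportion
    to a bias-only model's confidence on each example.

    student_logits: (batch, classes) raw scores from the main model
    teacher_probs:  (batch, classes) softmax outputs of a teacher model
    bias_conf:      (batch,) probability the bias-only model assigns to
                    the gold label, in [0, 1]
    """
    # Assumed scaling rule: temperature grows with bias confidence, so
    # strongly biased examples yield flatter (less confident) targets.
    temperature = (1.0 + bias_conf).unsqueeze(1)            # (batch, 1)
    soft_targets = F.softmax(teacher_probs.log() / temperature, dim=1)
    # KL divergence between the student's distribution and the softened targets.
    return F.kl_div(F.log_softmax(student_logits, dim=1),
                    soft_targets, reduction="batchmean")

# Example usage with random tensors standing in for real model outputs:
logits = torch.randn(4, 3)
teacher = F.softmax(torch.randn(4, 3), dim=1)
bias = torch.rand(4)
loss = confidence_regularization_loss(logits, teacher, bias)
```

Because every example still contributes a (softened) target rather than being down-weighted to zero, this style of objective aims to keep in-distribution accuracy intact while reducing reliance on biased shortcuts.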