Towards human-like compositional generalization with neural models

Abstract

The human language system exhibits systematic compositionality: the ability to produce and understand a potentially infinite number of novel linguistic expressions by systematically combining known atomic components. This systematic compositionality is central to the human ability to learn from limited data and make compositional generalizations. There has been a long-standing debate over whether this systematicity can be captured by connectionist architectures, and recent years have witnessed a resurgence of interest in the problem with the revival of neural networks. In particular, neural sequence-to-sequence models, a powerful workhorse of natural language processing (NLP), have been successfully applied to a wide range of NLP tasks. However, despite their widespread adoption, there is mounting evidence that neural sequence-to-sequence models are deficient in compositional generalization. In this thesis, we investigate how to improve the compositional generalization of neural sequence-to-sequence models in pursuit of building systems with human-like systematic compositionality. First, assuming that connectionist architectures are fundamentally incapable of acquiring systematic compositionality, which is, in contrast, an inherent property of symbolic (e.g., grammar-based) systems, we attempt to marry symbolic structure with neural models to combine the best of both worlds. We present a two-stage decoding strategy that augments neural sequence-to-sequence models (connectionist architecture) with semantic tagging (symbolic structure), in which an input utterance is first tagged with semantic symbols representing the meaning of individual words. Experimental results demonstrate that our framework improves compositional generalization for semantic parsing across datasets and model architectures.
Second, despite their superior compositional generalization, it has not yet been empirically established that symbolic models can handle the noise and complexity of natural language, as evidenced by their sub-par performance in practical applications. Tackling compositional generalization through purely architectural modifications therefore has the potential to preserve the robustness and flexibility that neural models need in order to process real language. We thus attempt to devise a neural model more competent at compositional generalization than standard sequence-to-sequence models. To this end, we design Dangle, a new neural network architecture for sequence-to-sequence modeling that learns more disentangled representations than the Transformer and thereby generalizes compositionally better. Empirical results on both semantic parsing and machine translation verify that our proposal leads to more disentangled representations and better generalization, outperforming competitive baselines and more specialized techniques. Up to this point, we assess the proposed model on synthetic benchmarks designed to isolate compositional generalization; real-world settings, however, involve both complex natural language and compositional generalization. We therefore move on to applying disentangled sequence-to-sequence models to real-world compositional generalization challenges. Before doing so, we first propose a methodology for identifying compositional patterns in real-world data and create a new machine translation benchmark that better represents practical generalization requirements than existing artificial challenges. We then introduce two key modifications to Dangle that encourage learning disentangled representations more efficiently. We evaluate the proposed model on existing real-world benchmarks and on the benchmark created in this thesis.
Experimental results demonstrate that our new architecture achieves better generalization performance across tasks and datasets and is adept at handling real-world challenges.
