66 research outputs found

    Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

    Get PDF

    Auto-Encoding Variational Neural Machine Translation

    Get PDF
    We present a deep generative model of bilingual sentence pairs for machine translation. The model generates source and target sentences jointly from a shared latent representation and is parameterised by neural networks. We perform efficient training using amortised variational inference and reparameterised gradients. Additionally, we discuss the statistical implications of joint modelling and propose an efficient approximation to maximum a posteriori decoding for fast test-time predictions. We demonstrate the effectiveness of our model in three machine translation scenarios: in-domain training, mixed-domain training, and learning from a mix of gold-standard and synthetic data. Our experiments show consistently that our joint formulation outperforms conditional modelling (i.e. standard neural machine translation) in all such scenarios

    From general language understanding to noisy text comprehension

    Get PDF
    Obtaining meaning-rich representations of social media inputs, such as Tweets (unstructured and noisy text), from general-purpose pre-trained language models has become challenging, as these inputs typically deviate from mainstream English usage. The proposed research establishes effective methods for improving the comprehension of noisy texts. For this, we propose a new generic methodology to derive a diverse set of sentence vectors combining and extracting various linguistic characteristics from latent representations of multi-layer, pre-trained language models. Further, we clearly establish how BERT, a state-of-the-art pre-trained language model, comprehends the linguistic attributes of Tweets to identify appropriate sentence representations. Five new probing tasks are developed for Tweets, which can serve as benchmark probing tasks to study noisy text comprehension. Experiments are carried out for classification accuracy by deriving the sentence vectors from GloVe-based pre-trained models and Sentence-BERT, and by using different hidden layers from the BERT model. We show that the initial and middle layers of BERT have better capability for capturing the key linguistic characteristics of noisy texts than its latter layers. With complex predictive models, we further show that the sentence vector length has lesser importance to capture linguistic information, and the proposed sentence vectors for noisy texts perform better than the existing state-of-the-art sentence vectors. © 2021 by the authors. Licensee MDPI, Basel, Switzerland

    Language Modelling with Pixels

    Full text link
    Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when we attempt to scale the number of supported languages. Tackling this bottleneck results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. This paper introduces PIXEL, the Pixel-based Encoder of Language, which suffers from neither of these issues. PIXEL is a pretrained language model that renders text as images, making it possible to transfer representations across languages based on orthographic similarity or the co-activation of pixels. PIXEL is trained to reconstruct the pixels of masked patches, instead of predicting a distribution over tokens. We pretrain the 86M parameter PIXEL model on the same English data as BERT and evaluate on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts. We find that PIXEL substantially outperforms BERT on syntactic and semantic processing tasks on scripts that are not found in the pretraining data, but PIXEL is slightly weaker than BERT when working with Latin scripts. Furthermore, we find that PIXEL is more robust to noisy text inputs than BERT, further confirming the benefits of modelling language with pixels.Comment: work in progres

    Generating CCG Categories

    Full text link
    Previous CCG supertaggers usually predict categories using multi-class classification. Despite their simplicity, internal structures of categories are usually ignored. The rich semantics inside these structures may help us to better handle relations among categories and bring more robustness into existing supertaggers. In this work, we propose to generate categories rather than classify them: each category is decomposed into a sequence of smaller atomic tags, and the tagger aims to generate the correct sequence. We show that with this finer view on categories, annotations of different categories could be shared and interactions with sentence contexts could be enhanced. The proposed category generator is able to achieve state-of-the-art tagging (95.5% accuracy) and parsing (89.8% labeled F1) performances on the standard CCGBank. Furthermore, its performances on infrequent (even unseen) categories, out-of-domain texts and low resource language give promising results on introducing generation models to the general CCG analyses.Comment: Accepted by AAAI 202

    Neural Unsupervised Domain Adaptation in NLP—A Survey

    Get PDF
    Deep neural networks excel at learning from labeled data and achieve state-of-the-art results on a wide array of Natural Language Processing tasks. In contrast, learning from unlabeled data, especially under domain shift, remains a challenge. Motivated by the latest advances, in this survey we review neural unsupervised domain adaptation techniques which do not require labeled target domain data. This is a more challenging yet a more widely applicable setup. We outline methods, from early approaches in traditional non-neural methods to pre-trained model transfer. We also revisit the notion of domain, and we uncover a bias in the type of Natural Language Processing tasks which received most attention. Lastly, we outline future directions, particularly the broader need for out-of-distribution generalization of future intelligent NLP

    Investigating Entity Knowledge in BERT with Simple Neural End-To-End Entity Linking

    Full text link
    A typical architecture for end-to-end entity linking systems consists of three steps: mention detection, candidate generation and entity disambiguation. In this study we investigate the following questions: (a) Can all those steps be learned jointly with a model for contextualized text-representations, i.e. BERT (Devlin et al., 2019)? (b) How much entity knowledge is already contained in pretrained BERT? (c) Does additional entity knowledge improve BERT's performance in downstream tasks? To this end, we propose an extreme simplification of the entity linking setup that works surprisingly well: simply cast it as a per token classification over the entire entity vocabulary (over 700K classes in our case). We show on an entity linking benchmark that (i) this model improves the entity representations over plain BERT, (ii) that it outperforms entity linking architectures that optimize the tasks separately and (iii) that it only comes second to the current state-of-the-art that does mention detection and entity disambiguation jointly. Additionally, we investigate the usefulness of entity-aware token-representations in the text-understanding benchmark GLUE, as well as the question answering benchmarks SQUAD V2 and SWAG and also the EN-DE WMT14 machine translation benchmark. To our surprise, we find that most of those benchmarks do not benefit from additional entity knowledge, except for a task with very small training data, the RTE task in GLUE, which improves by 2%.Comment: Published at CoNLL 201

    Robust input representations for low-resource information extraction

    Get PDF
    Recent advances in the field of natural language processing were achieved with deep learning models. This led to a wide range of new research questions concerning the stability of such large-scale systems and their applicability beyond well-studied tasks and datasets, such as information extraction in non-standard domains and languages, in particular, in low-resource environments. In this work, we address these challenges and make important contributions across fields such as representation learning and transfer learning by proposing novel model architectures and training strategies to overcome existing limitations, including a lack of training resources, domain mismatches and language barriers. In particular, we propose solutions to close the domain gap between representation models by, e.g., domain-adaptive pre-training or our novel meta-embedding architecture for creating a joint representations of multiple embedding methods. Our broad set of experiments demonstrates state-of-the-art performance of our methods for various sequence tagging and classification tasks and highlight their robustness in challenging low-resource settings across languages and domains.Die jüngsten Fortschritte auf dem Gebiet der Verarbeitung natürlicher Sprache wurden mit Deep-Learning-Modellen erzielt. Dies führte zu einer Vielzahl neuer Forschungsfragen bezüglich der Stabilität solcher großen Systeme und ihrer Anwendbarkeit über gut untersuchte Aufgaben und Datensätze hinaus, wie z. B. die Informationsextraktion für Nicht-Standardsprachen, aber auch Textdomänen und Aufgaben, für die selbst im Englischen nur wenige Trainingsdaten zur Verfügung stehen. In dieser Arbeit gehen wir auf diese Herausforderungen ein und leisten wichtige Beiträge in Bereichen wie Repräsentationslernen und Transferlernen, indem wir neuartige Modellarchitekturen und Trainingsstrategien vorschlagen, um bestehende Beschränkungen zu überwinden, darunter fehlende Trainingsressourcen, ungesehene Domänen und Sprachbarrieren. Insbesondere schlagen wir Lösungen vor, um die Domänenlücke zwischen Repräsentationsmodellen zu schließen, z.B. durch domänenadaptives Vortrainieren oder unsere neuartige Meta-Embedding-Architektur zur Erstellung einer gemeinsamen Repräsentation mehrerer Embeddingmethoden. Unsere umfassende Evaluierung demonstriert die Leistungsfähigkeit unserer Methoden für verschiedene Klassifizierungsaufgaben auf Word und Satzebene und unterstreicht ihre Robustheit in anspruchsvollen, ressourcenarmen Umgebungen in verschiedenen Sprachen und Domänen
    • …
    corecore