
    Semi-supervised sequence tagging with bidirectional language models

    Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers. Comment: To appear in ACL 2017.
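
    The core idea is easy to sketch. Below is a minimal, hypothetical PyTorch module (the class and argument names are illustrative, not the paper's TagLM code): contextual embeddings produced by a frozen, pretrained bidirectional language model are concatenated with ordinary word embeddings before the supervised tagging RNN.

```python
# Sketch of the general idea (hypothetical names): concatenate frozen
# bidirectional-LM context embeddings with standard word embeddings
# before the supervised sequence-tagging RNN.
import torch
import torch.nn as nn

class LMAugmentedTagger(nn.Module):
    def __init__(self, vocab_size, word_dim, lm_dim, hidden_dim, num_tags):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # The task RNN sees [word embedding ; biLM context embedding] per token.
        self.rnn = nn.LSTM(word_dim + lm_dim, hidden_dim,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids, lm_embeddings):
        # lm_embeddings: (batch, seq_len, lm_dim), precomputed by a frozen,
        # pretrained bidirectional language model.
        x = torch.cat([self.word_emb(token_ids), lm_embeddings], dim=-1)
        h, _ = self.rnn(x)
        return self.proj(h)  # per-token tag scores (feed to a CRF or softmax)
```

    Only the tagger above is trained on the labeled data; the language model stays fixed, which is what makes the approach semi-supervised.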

    Efficient Hierarchical Domain Adaptation for Pretrained Language Models

    The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is rarely used during training. Transferring their knowledge to a target domain is typically done by continuing training in-domain. In this paper, we introduce a method to permit domain adaptation to many diverse domains using a computationally efficient adapter approach. Our method is based on the observation that textual domains are partially overlapping, and we represent domains as a hierarchical tree structure where each node in the tree is associated with a set of adapter weights. When combined with a frozen pretrained language model, this approach enables parameter sharing among related domains, while avoiding negative interference between unrelated ones. Experimental results with GPT-2 and a large fraction of the 100 most represented websites in C4 show across-the-board improvements in-domain. We additionally provide an inference-time algorithm for a held-out domain and show that averaging over multiple paths through the tree enables further gains in generalization, while adding only a marginal cost to inference. Comment: NAACL 2022 accepted paper, camera-ready version.
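
    A rough sketch of how such a hierarchy can be wired, assuming residual bottleneck adapters and illustrative class names (this is not the authors' released code): each tree node owns a small adapter, and a domain applies every adapter on its root-to-leaf path on top of the frozen pretrained model, so sibling domains share their ancestors' parameters. For a held-out domain, one could run several plausible paths and average the results, in the spirit of the paper's multi-path inference.

```python
# Illustrative sketch (not the authors' implementation): each tree node owns
# a small adapter; a domain uses the adapters on its root-to-leaf path, so
# related domains share ancestor adapters while unrelated ones stay separate.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small residual bottleneck adapter applied to frozen PLM hidden states."""
    def __init__(self, hidden_dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))

class DomainTreeNode:
    """A node in the domain hierarchy; each node owns one adapter."""
    def __init__(self, name, hidden_dim, parent=None):
        self.name = name
        self.parent = parent
        self.adapter = Adapter(hidden_dim)

    def path_adapters(self):
        # Collect the adapters from the root down to this node.
        node, path = self, []
        while node is not None:
            path.append(node.adapter)
            node = node.parent
        return list(reversed(path))

def adapt(hidden_states, node):
    # Compose the frozen PLM's hidden states with every adapter on the path;
    # shared ancestors give parameter sharing among related domains.
    for adapter in node.path_adapters():
        hidden_states = adapter(hidden_states)
    return hidden_states
```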

    AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

    Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance on new domains without extra training. We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results. We explore various approaches for choosing which adapters to combine, such as text clustering and semantic similarity. We find that using clustering leads to the most competitive results on novel domains. Comment: Accepted at EACL 2023; camera-ready version.
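
    The averaging step itself is simple to illustrate. The sketch below, with hypothetical names, averages the parameters of a set of domain adapters in weight space; deciding which adapters enter the soup (e.g., via text clustering of the novel domain) is a separate selection step and is not shown.

```python
# Minimal sketch of weight-space averaging (the "soup"): average the
# parameters of the selected domain adapters. Adapter selection for the
# novel domain happens beforehand and is omitted here.
import torch

def average_adapters(adapter_state_dicts):
    """Average a list of adapter state dicts with identical keys and shapes."""
    averaged = {}
    for key in adapter_state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in adapter_state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged

# Hypothetical usage: combine adapters trained on related training domains,
# then load the averaged weights into a fresh adapter for the novel domain.
# soup = average_adapters([news_adapter.state_dict(),
#                          reviews_adapter.state_dict(),
#                          legal_adapter.state_dict()])
# novel_domain_adapter.load_state_dict(soup)
```

    Because only weights are averaged, no further gradient updates are needed at test time, which is what keeps the method training-free for new domains.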

    Combining epidemiology with basic biology of sand flies, parasites, and hosts to inform leishmaniasis transmission dynamics and control.

    Quantitation of the nonlinear heterogeneities in Leishmania parasites, sand fly vectors, and mammalian host relationships provides insights to better understand leishmanial transmission epidemiology towards improving its control. The parasite manipulates the sand fly via production of promastigote secretory gel (PSG), leading to the "blocked sand fly" phenotype, persistent feeding attempts, and feeding on multiple hosts. PSG is injected into the mammalian host with the parasite and promotes the establishment of infection. Animal models demonstrate that sand flies with the highest parasite loads and percentage of metacyclic promastigotes transmit more parasites with greater frequency, resulting in higher-load infections that are more likely to be both symptomatic and efficient reservoirs. The existence of mammalian and sand fly "super-spreaders" provides a biological basis for the spatial and temporal clustering of clinical leishmanial disease. Sand fly blood-feeding behavior will determine the efficacies of indoor residual spraying, topical insecticides, and bed nets. Interventions need to have sufficient coverage to include transmission hot spots, especially in the absence of field tools to assess infectiousness. Interventions that reduce sand fly densities without eliminating them could have negative consequences, for example, by interfering with the partial immunity conferred by exposure to sand fly saliva. A deeper understanding of both sand fly and host biology and behavior is essential to ensuring the effectiveness of vector interventions.