Semi-supervised sequence tagging with bidirectional language models
Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data
and task-specific gazetteers.
Comment: To appear in ACL 2017
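A minimal PyTorch sketch of the approach this abstract describes: contextual embeddings from a frozen, pre-trained bidirectional language model are concatenated with ordinary token embeddings before the sequence tagger's own recurrent layer. The class name, dimensions, and plain linear output layer are illustrative assumptions, not the authors' released architecture.

    import torch
    import torch.nn as nn

    class BiLMTagger(nn.Module):
        def __init__(self, vocab_size, num_tags, word_dim=100, lm_dim=512, hidden=200):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, word_dim)
            # The task RNN sees [word embedding ; biLM embedding] per token.
            self.rnn = nn.LSTM(word_dim + lm_dim, hidden,
                               bidirectional=True, batch_first=True)
            self.proj = nn.Linear(2 * hidden, num_tags)

        def forward(self, token_ids, lm_embeddings):
            # lm_embeddings: (batch, seq_len, lm_dim), pre-computed by the biLM;
            # detach() keeps the language model frozen during supervised training.
            x = torch.cat([self.word_emb(token_ids), lm_embeddings.detach()], dim=-1)
            h, _ = self.rnn(x)
            return self.proj(h)  # per-token tag scores (decode with softmax or a CRF)

Here lm_embeddings would stand for the concatenated forward and backward language model states for each token, so only the tagger's parameters are updated on the labeled data.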
Efficient Hierarchical Domain Adaptation for Pretrained Language Models
The remarkable success of large language models has been driven by dense
models trained on massive unlabeled, unstructured corpora. These corpora
typically contain text from diverse, heterogeneous sources, but information
about the source of the text is rarely used during training. Transferring their
knowledge to a target domain is typically done by continuing training
in-domain. In this paper, we introduce a method to permit domain adaptation to
many diverse domains using a computationally efficient adapter approach. Our
method is based on the observation that textual domains are partially
overlapping, and we represent domains as a hierarchical tree structure where
each node in the tree is associated with a set of adapter weights. When
combined with a frozen pretrained language model, this approach enables
parameter sharing among related domains, while avoiding negative interference
between unrelated ones. Experimental results with GPT-2 and a large fraction of
the 100 most represented websites in C4 show across-the-board improvements
in-domain. We additionally provide an inference-time algorithm for a held-out domain and show that averaging over multiple paths through the tree enables further gains in generalization, while adding only a marginal cost to inference.
Comment: NAACL 2022 accepted paper, camera-ready version
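A minimal sketch of the hierarchical-adapter idea, assuming residual bottleneck adapters and a hand-written tree: each tree node owns a small adapter, a domain activates the adapters along its root-to-leaf path, and a held-out domain averages the hidden states produced by several candidate paths. The tree layout, adapter shape, and uniform averaging rule are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        def __init__(self, dim=768, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)

        def forward(self, h):
            return h + self.up(torch.relu(self.down(h)))  # residual bottleneck

    # One adapter per tree node; a domain corresponds to a root-to-leaf path.
    tree = {
        "root": Adapter(),
        "root/web": Adapter(),
        "root/web/news": Adapter(),
        "root/web/forums": Adapter(),
    }

    def apply_path(h, path):
        # Compose the adapters along a single root-to-leaf path.
        for node in path:
            h = tree[node](h)
        return h

    def heldout_forward(h, candidate_paths):
        # Held-out domain: average hidden states over several plausible paths.
        return torch.stack([apply_path(h, p) for p in candidate_paths]).mean(dim=0)

    h = torch.randn(1, 10, 768)  # hidden states from the frozen PLM (batch, seq, dim)
    out = heldout_forward(h, [
        ["root", "root/web", "root/web/news"],
        ["root", "root/web", "root/web/forums"],
    ])

Because ancestors such as "root/web" are shared, related domains reuse parameters while unrelated subtrees stay separate, which is the parameter-sharing behavior the abstract describes.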
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
Pretrained language models (PLMs) are trained on massive corpora, but often
need to specialize to specific domains. A parameter-efficient adaptation method is to train an adapter for each domain on the language modeling task. This yields good in-domain scores but can be impractical for domain- or
resource-restricted settings. A solution is to use a related-domain adapter for
the novel domain at test time. In this paper, we introduce AdapterSoup, an
approach that performs weight-space averaging of adapters trained on different
domains. Our approach is embarrassingly parallel: first, we train a set of
domain-specific adapters; then, for each novel domain, we determine which
adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance on new domains
without extra training. We also explore weight averaging of adapters trained on
the same domain with different hyper-parameters, and show that it preserves the
performance of a PLM on new domains while obtaining strong in-domain results.
We explore various approaches for choosing which adapters to combine, such as
text clustering and semantic similarity. We find that using clustering leads to
the most competitive results on novel domains.
Comment: Accepted at EACL 2023; camera-ready version
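A minimal sketch of the weight-space averaging step, assuming all domain adapters share an identical architecture: the "soup" is the uniform mean of the selected adapters' parameters, and selection here uses cosine similarity of domain embeddings as a stand-in for the text-clustering choice the abstract reports as strongest. All names, the embedding source, and the uniform weighting are illustrative assumptions.

    import torch

    def adapter_soup(state_dicts):
        # Uniform weight-space average over adapters with identical keys/shapes.
        return {k: torch.stack([sd[k] for sd in state_dicts]).mean(dim=0)
                for k in state_dicts[0]}

    def select_adapters(novel_emb, domain_embs, k=3):
        # Rank training domains by cosine similarity to the novel domain.
        sims = {d: torch.cosine_similarity(novel_emb, e, dim=0).item()
                for d, e in domain_embs.items()}
        return sorted(sims, key=sims.get, reverse=True)[:k]

    # Hypothetical usage: choose nearby domains, average their adapters,
    # then load the averaged weights into the frozen PLM's adapter slot.
    domain_embs = {"news": torch.randn(64), "forums": torch.randn(64),
                   "legal": torch.randn(64)}
    chosen = select_adapters(torch.randn(64), domain_embs, k=2)
    # souped = adapter_soup([adapters[d].state_dict() for d in chosen])
    # plm.adapter.load_state_dict(souped)

No extra training happens at test time, which matches the "embarrassingly parallel" claim: adapters are trained once per domain and only combined afterwards.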
Combining epidemiology with basic biology of sand flies, parasites, and hosts to inform leishmaniasis transmission dynamics and control.
Quantifying the nonlinear heterogeneities in the relationships among Leishmania parasites, sand fly vectors, and mammalian hosts provides insights that improve understanding of leishmanial transmission epidemiology and inform its control. The parasite manipulates the sand fly via production of promastigote secretory gel (PSG), leading to the "blocked sand fly" phenotype, persistent feeding attempts, and feeding on multiple hosts. PSG is injected into the mammalian host with the parasite and promotes the establishment of infection. Animal models demonstrate that sand flies with the highest parasite loads and percentages of metacyclic promastigotes transmit more parasites with greater frequency, resulting in higher-load infections that are more likely to be both symptomatic and efficient reservoirs. The existence of mammalian and sand fly "super-spreaders" provides a biological basis for the spatial and temporal clustering of clinical leishmanial disease. Sand fly blood-feeding behavior will determine the efficacies of indoor residual spraying, topical insecticides, and bed nets. Interventions need to have sufficient coverage to include transmission hot spots, especially in the absence of field tools to assess infectiousness. Interventions that reduce sand fly densities without eliminating them could have negative consequences, for example by interfering with the partial immunity conferred by exposure to sand fly saliva. A deeper understanding of both sand fly and host biology and behavior is essential to ensuring the effectiveness of vector interventions.