205 research outputs found

    AugCSE: contrastive sentence embedding with diverse augmentations

    Data augmentation techniques have been proven useful in many applications in NLP fields. Most augmentations are task-specific and cannot be used as a general-purpose tool. In our work, we present AugCSE, a unified framework that utilizes diverse sets of data augmentations to achieve a better, general-purpose sentence embedding model. Building upon the latest sentence embedding models, our approach uses a simple antagonistic discriminator that differentiates the augmentation types. With a finetuning objective borrowed from domain adaptation, we show that diverse augmentations, which often lead to conflicting contrastive signals, can be tamed to produce a better and more robust sentence representation. Our methods achieve state-of-the-art results on downstream transfer tasks and perform competitively on semantic textual similarity tasks, using only unsupervised data.
    University of California, Berkeley. https://aclanthology.org/2022.aacl-main.30/ (first author draft)
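    The contrastive side of the objective described above can be sketched as an InfoNCE loss over augmented sentence pairs. The toy 2-d embeddings below and the omission of the discriminator term are our own simplifications, not the paper's exact objective:

```python
# Minimal InfoNCE sketch: pull an augmented positive close to its anchor,
# push negatives away. Embeddings here are tiny hand-made vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.05):
    """Negative log-softmax of the positive pair's similarity."""
    logits = [cosine(anchor, positive) / temperature] + [
        cosine(anchor, n) / temperature for n in negatives
    ]
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

anchor = [1.0, 0.0]
positive = [0.9, 0.1]                      # embedding of an augmented view
negatives = [[-1.0, 0.2], [0.0, 1.0]]
loss = info_nce(anchor, positive, negatives)
```

    A well-aligned positive yields a near-zero loss; swapping in an unrelated vector as the positive makes the loss large.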

    Multi-grained Evidence Inference for Multi-choice Reading Comprehension

    Multi-choice Machine Reading Comprehension (MRC) is a major and challenging task in which machines must answer questions according to provided options. Answers in multi-choice MRC cannot be directly extracted from the given passages, so machines must essentially be capable of reasoning from accurately extracted evidence. However, the critical evidence may be as simple as a single word or phrase, yet it is hidden in a redundant, noisy passage spanning multiple linguistic levels, from phrase and fragment to sentence and the entire passage. We thus propose Multi-grained evidence inferencer (Mugen), a general-purpose model enhancement that integrates multi-grained evidence comprehensively to address this limitation. Mugen extracts three granularities of evidence, coarse-, middle-, and fine-grained, and integrates the evidence with the original passages, achieving significant and consistent performance improvements on four multi-choice MRC benchmarks.
    Comment: Accepted by TASLP 2023, vol. 31, pp. 3896-390
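    The three granularities described above can be illustrated with a toy extractor. The string-matching helper below is our own simplification; Mugen actually operates on learned representations, not literal substrings:

```python
# Toy illustration of fine (phrase), middle (sentence), and coarse (passage)
# evidence granularities for a key phrase inside a passage.
def extract_granularities(passage, key_phrase):
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    fine = key_phrase if key_phrase in passage else None
    middle = next((s for s in sentences if key_phrase in s), None)
    coarse = passage  # the whole passage is the coarsest evidence
    return {"fine": fine, "middle": middle, "coarse": coarse}

ev = extract_granularities("The pan is hot. Water boils fast.", "hot")
```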

    FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning

    LLMs have demonstrated great capabilities in various NLP tasks. Different entities can further improve the performance of those LLMs on their specific downstream tasks by fine-tuning them. When several entities have similar tasks of interest, but their data cannot be shared because of privacy concerns and regulations, federated learning (FL) is a mainstream solution for leveraging the data of different entities. However, fine-tuning LLMs in federated settings still lacks adequate support from existing FL frameworks, because it must deal with optimizing the consumption of significant communication and computational resources, data preparation for different tasks, and distinct information protection demands. This paper first discusses these challenges of federated LLM fine-tuning and then introduces our package FS-LLM as the main contribution, which consists of the following components: (1) we build an end-to-end benchmarking pipeline, automating the processes of dataset preprocessing, federated fine-tuning execution, and performance evaluation of federated LLM fine-tuning; (2) we provide comprehensive implementations of federated parameter-efficient fine-tuning algorithms and versatile programming interfaces for future extension in FL scenarios with low communication and computation costs, even without access to the full model; (3) we adopt several accelerating and resource-efficient operators for fine-tuning LLMs with limited resources, along with flexible, pluggable sub-routines for interdisciplinary study. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings, which also yields valuable insights into federated LLM fine-tuning for the research community. To facilitate further research and adoption, we release FS-LLM at https://github.com/alibaba/FederatedScope/tree/llm.
    Comment: Source code: https://github.com/alibaba/FederatedScope/tree/ll
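    Component (2) centers on federated parameter-efficient fine-tuning, where clients exchange only small adapter weights rather than the full model. A minimal sketch of the server-side averaging step, assuming LoRA-style adapters represented as plain dicts (not FS-LLM's actual API):

```python
# Federated averaging over adapter weights only: each client fine-tunes a
# small adapter locally; the server averages just those parameters, which
# keeps communication cost far below shipping full LLM weights.
def fedavg_adapters(client_adapters):
    """Average each adapter parameter across all clients."""
    n = len(client_adapters)
    keys = client_adapters[0].keys()
    return {k: sum(c[k] for c in client_adapters) / n for k in keys}

clients = [
    {"lora_A": 1.0, "lora_B": 3.0},  # client 1's trained adapter (toy scalars)
    {"lora_A": 3.0, "lora_B": 1.0},  # client 2's trained adapter
]
global_adapter = fedavg_adapters(clients)
```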

    CoLT5: Faster Long-Range Transformers with Conditional Computation

    Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also because feedforward and projection layers are applied to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
    Comment: Added CoDA reference and minor edits to clarify routing
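    The conditional-computation idea can be sketched as routing a top-k subset of tokens through a heavier branch. The scorer, branch "widths", and routing rule below are stand-ins of our own, not CoLT5's learned routing:

```python
# Toy conditional computation: a score per token decides which tokens get
# the expensive path; everyone else takes a cheap one.
def route_tokens(scores, k):
    """Return the indices of the k highest-scoring tokens (heavy path)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def conditional_ffn(tokens, scores, k):
    heavy = route_tokens(scores, k)
    out = []
    for i, t in enumerate(tokens):
        if i in heavy:
            out.append(t * 4.0)   # stand-in for a wide feedforward block
        else:
            out.append(t * 1.0)   # stand-in for the light branch
    return out

result = conditional_ffn([1.0, 2.0, 3.0, 4.0], scores=[0.1, 0.9, 0.2, 0.8], k=2)
```

    Only tokens 1 and 3 (the two highest scores) pay for the heavy branch; the cost of the wide block scales with k instead of sequence length.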

    Causal Reasoning of Entities and Events in Procedural Texts

    Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plausibility and entity states. We show that most language models, including GPT-3, perform close to chance at .35 F1, lagging far behind humans at .87 F1. We boost model performance to .59 F1 by creatively representing events as code while prompting language models pretrained on code. By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1. Our findings indicate not only the challenge that CREPE poses for language models, but also the efficacy of code-like prompting combined with chain-of-thought prompting for multihop event reasoning.
    Comment: In Findings of EACL 202
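    A hypothetical illustration of the code-like prompting idea: entity states become variable assignments and the event query becomes a completion target, so a code-pretrained LM can reason over them. The template below is our assumption, not the paper's exact prompt format:

```python
# Build a code-shaped prompt from entity states and a candidate event.
# The LM is expected to complete the final `plausible = ` line.
def build_prompt(entity_states, event):
    lines = ["# procedural context as code"]
    for entity, state in entity_states:
        lines.append(f"{entity}.state = {state!r}")
    lines.append("# Q: is the following event plausible?")
    lines.append(f"event = {event!r}")
    lines.append("plausible = ")
    return "\n".join(lines)

prompt = build_prompt([("pan", "hot")], "touching the pan burns you")
```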

    Leveraging Feedback in Conversational Question Answering Systems

    172 p. The goal of this thesis is to exploit the interaction that deployed systems have with humans, using human feedback as a learning and adaptation signal for those systems. We focus on the domain shift that conversational systems undergo once deployed. To this end, we study the case of explicit binary feedback, as it is the easiest feedback signal for humans to provide. To improve systems after deployment, we first build DoQA, a dataset of question-answering conversations. It contains 2,437 dialogues collected via crowdsourcing. Compared to previous work, DoQA reflects real information needs, and its conversations are more natural and coherent. After creating the dataset, we design an algorithm called feedback-weighted learning (FWL), which is able to improve a pretrained supervised system using only binary feedback. Finally, we study the limits of this algorithm in cases where the collected feedback is noisy, and adapt FWL to cope with the noisy scenario. The negative results we obtain in this setting show the challenge of modeling noisy feedback from users, which remains an open research question.
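    The core idea behind feedback-weighted learning (FWL), using binary user feedback as a training signal, can be sketched as reweighting per-example losses. The specific weights below are illustrative assumptions of ours; the thesis derives its weighting scheme differently:

```python
# Toy feedback-weighted objective: examples the user approved (feedback 1)
# reinforce the model; rejected examples (feedback 0) push against it.
def feedback_weighted_loss(losses, feedback, pos_weight=1.0, neg_weight=-0.5):
    """feedback[i] is 1 (approved) or 0 (rejected); returns the mean weighted loss."""
    total = 0.0
    for loss, fb in zip(losses, feedback):
        weight = pos_weight if fb == 1 else neg_weight
        total += weight * loss
    return total / len(losses)

mixed = feedback_weighted_loss([1.0, 2.0], [1, 0])    # one approval, one rejection
all_pos = feedback_weighted_loss([1.0, 2.0], [1, 1])  # both approved
```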

    AutoMix: Automatically Mixing Language Models

    Large language models (LLMs) are now available in various sizes and configurations from cloud API providers. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present AutoMix, an approach that strategically routes queries to larger LMs based on the approximate correctness of outputs from a smaller LM. Central to AutoMix is a few-shot self-verification mechanism, which estimates the reliability of its own outputs without requiring training. Given that verifications can be noisy, we employ a meta-verifier in AutoMix to refine the accuracy of these assessments. Our experiments using LLAMA2-13/70B on five context-grounded reasoning datasets demonstrate that AutoMix surpasses established baselines, improving the incremental benefit per cost by up to 89%. Our code and data are available at https://github.com/automix-llm/automix.
    Comment: The first two authors contributed equally. Work started and was partly done during Aman's internship at Google. This version adds results on mixing 3 models, and will be presented at the workshop on robustness of zero/few-shot learning in foundation models, NeurIPS 202
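    The routing strategy described above can be sketched as a threshold on the small model's self-verified confidence. The models, verifier, and threshold below are placeholders of ours, and the real AutoMix additionally refines the verification with a meta-verifier:

```python
# Toy AutoMix-style routing: answer with the small model; if the estimated
# correctness of that answer is low, escalate the query to the large model.
def automix_route(query, small_model, large_model, verifier, threshold=0.7):
    answer = small_model(query)
    confidence = verifier(query, answer)   # approximate correctness in [0, 1]
    if confidence >= threshold:
        return answer, "small"
    return large_model(query), "large"

# Stand-in models and verifier for illustration only.
small = lambda q: "42"
large = lambda q: "forty-two, with sources"
verifier = lambda q, a: 0.9 if "easy" in q else 0.3

easy = automix_route("easy question", small, large, verifier)
hard = automix_route("hard question", small, large, verifier)
```

    Cheap queries stay on the small model; only low-confidence cases pay the cost of the large one, which is where the cost-per-benefit gain comes from.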

    Machine Reading Comprehension using Case-based Reasoning

    We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC) builds on the hypothesis that contextualized answers to similar questions share semantic similarities with each other. Given a target question, CBR-MRC retrieves a set of similar questions from a memory of observed cases and predicts an answer by selecting the span in the target context that is most similar to the contextualized representations of answers in the retrieved cases. The semi-parametric nature of our approach allows CBR-MRC to attribute a prediction to the specific set of cases used during inference, making it a desirable choice for building reliable and debuggable QA systems. We show that CBR-MRC achieves high test accuracy comparable with large reader models, outperforming baselines by 11.5 and 8.4 EM on NaturalQuestions and NewsQA, respectively. Further, we also demonstrate the ability of CBR-MRC to identify not just the correct answer tokens but also the span with the most relevant supporting evidence. Lastly, we observe that contexts for certain question types show higher lexical diversity than others, and find CBR-MRC to be robust to these variations while performance with fully-parametric methods drops.
    Comment: 9 pages, 2 figures
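    The span-selection step can be sketched with a toy similarity function: retrieve answers from similar past cases and pick the candidate span closest to them. Token overlap below stands in for the contextualized representations CBR-MRC actually compares:

```python
# Case-based span selection: score each candidate span in the target context
# by its similarity to the answers of retrieved similar cases.
def overlap(a, b):
    """Jaccard overlap between lowercase token sets (toy similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def cbr_select_span(candidate_spans, retrieved_answers):
    def score(span):
        return sum(overlap(span, ans) for ans in retrieved_answers)
    return max(candidate_spans, key=score)

span = cbr_select_span(
    ["Paris", "the capital of France", "Europe"],
    ["Paris", "the city Paris"],  # answers from retrieved similar cases
)
```

    Because the prediction is a max over retrieved cases, each answer can be traced back to the specific cases that supported it, which is the debuggability property the abstract highlights.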