137 research outputs found

    SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences

    Distilling supervision signal from a long sequence to make predictions is a challenging task in machine learning, especially when not all elements in the input sequence contribute equally to the desired output. In this paper, we propose SpanDrop, a simple and effective data augmentation technique that helps models identify the true supervision signal in a long sequence with very few examples. By directly manipulating the input sequence, SpanDrop randomly ablates parts of the sequence at a time and asks the model to perform the same task, emulating counterfactual learning and achieving input attribution. Based on theoretical analysis of its properties, we also propose a variant of SpanDrop based on the beta-Bernoulli distribution, which yields diverse augmented sequences while providing a learning objective that is more consistent with the original dataset. We demonstrate the effectiveness of SpanDrop on a set of carefully designed toy tasks, as well as various natural language processing tasks that require reasoning over long sequences to arrive at the correct answer, and show that it helps models improve performance both when data is scarce and when it is abundant.
    Comment: Peng Qi and Guangtao Wang contributed equally
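The random ablation described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `span_drop` and `beta_span_drop`, the parameters `drop_rate`, `alpha`, and `beta`, and the rule of always keeping at least one span are assumptions made here for illustration. The beta-Bernoulli variant is read as drawing one drop rate per sequence from a Beta distribution before applying independent per-span Bernoulli drops.

```python
import random


def span_drop(spans, drop_rate, rng):
    """Drop each span independently with probability drop_rate (assumed scheme).

    Keeping at least one span is an assumption of this sketch, so that the
    augmented sequence is never empty.
    """
    if not spans:
        return []
    kept = [s for s in spans if rng.random() >= drop_rate]
    return kept if kept else [rng.choice(spans)]


def beta_span_drop(spans, alpha, beta, rng):
    """Beta-Bernoulli variant (assumed reading of the abstract).

    A single drop rate is sampled per sequence from Beta(alpha, beta),
    then spans are dropped independently at that rate.
    """
    drop_rate = rng.betavariate(alpha, beta)
    return span_drop(spans, drop_rate, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    sentences = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical spans, e.g. sentences
    print(span_drop(sentences, 0.3, rng))
    print(beta_span_drop(sentences, 1.0, 3.0, rng))
```

Each call produces a different shortened copy of the sequence with the original span order preserved; in training, the model would be asked to perform the same task on these copies.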

    Mapping the tail fiber as the receptor binding protein responsible for differential host specificity of Pseudomonas aeruginosa bacteriophages PaP1 and JG004.

    The first step in bacteriophage infection is recognition of and binding to the host receptor, which is mediated by the phage receptor binding protein (RBP). Different RBPs can lead to differential host specificity. In many bacteriophages, such as Escherichia coli and Lactococcal phages, RBPs have been identified as the tail fiber or protruding baseplate proteins. However, the tail fiber-dependent host specificity in Pseudomonas aeruginosa phages has not been well studied. This study aimed to identify and investigate the binding specificity of the RBP of P. aeruginosa phages PaP1 and JG004. These two phages share high DNA sequence homology but exhibit different host specificities. A spontaneous mutant phage was isolated and exhibited a broader host range than the parental phage JG004. Sequencing of its putative tail fiber and baseplate region indicated a single point mutation in ORF84 (a putative tail fiber gene), which resulted in the replacement of a positively charged lysine (K) by an uncharged asparagine (N). We further demonstrated that the replacement of the tail fiber gene (ORF69) of PaP1 with the corresponding gene from phage JG004 resulted in a recombinant phage that displayed altered host specificity. Our study revealed the tail fiber-dependent host specificity in P. aeruginosa phages and provided an effective tool for its alteration. These contributions may have potential value in phage therapy.

    Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents

    Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidence. In this paper, we propose an effective and interpretable Select, Answer and Explain (SAE) system to solve the multi-document RC problem. Our system first filters out answer-unrelated documents and thus reduces the amount of distracting information. This is achieved by a document classifier trained with a novel pairwise learning-to-rank loss. The selected answer-related documents are then input to a model that jointly predicts the answer and supporting sentences. The model is optimized with a multi-task learning objective at both the token level for answer prediction and the sentence level for supporting-sentence prediction, together with an attention-based interaction between these two tasks. Evaluated on HotpotQA, a challenging multi-hop RC data set, the proposed SAE system achieves competitive performance in the distractor setting compared to other existing systems on the leaderboard.
    Comment: Accepted to AAAI 202

    The progenitors of type Ia supernovae in the semidetached binaries with red giant donors

    Context. The companions of the exploding carbon-oxygen white dwarfs (CO WDs) that produce type Ia supernovae (SNe Ia) are still not conclusively confirmed. A red-giant (RG) star has been suggested as the mass donor of the exploding WD, a scenario known as the symbiotic channel. However, previous studies of this channel gave a relatively low rate of SNe Ia. Aims. We aim to systematically investigate the parameter space, Galactic rates and delay time distributions of SNe Ia from the symbiotic channel by employing a revised mass-transfer prescription. Methods. We adopted an integrated mass-transfer prescription to calculate the mass-transfer process from an RG star onto the WD. In this prescription, the mass-transfer rate varies with the local material states. Results. We evolved a large number of WD+RG systems, and found that the parameter space of WD+RG systems for producing SNe Ia is significantly enlarged. This channel could produce SNe Ia with intermediate and old ages, contributing at most 5% of all SNe Ia in the Galaxy. Our model increases the SN Ia rate from this channel by a factor of 5. We suggest that the symbiotic systems RS Oph and T CrB are strong candidates for the progenitors of SNe Ia.
    Comment: 8 pages, 6 figures