SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences
Distilling supervision signal from a long sequence to make predictions is a
challenging task in machine learning, especially when not all elements in the
input sequence contribute equally to the desired output. In this paper, we
propose SpanDrop, a simple and effective data augmentation technique that helps
models identify the true supervision signal in a long sequence with very few
examples. By directly manipulating the input sequence, SpanDrop randomly
ablates parts of the sequence at a time and asks the model to perform the same
task to emulate counterfactual learning and achieve input attribution. Based on
theoretical analysis of its properties, we also propose a variant of SpanDrop
based on the beta-Bernoulli distribution, which yields diverse augmented
sequences while providing a learning objective that is more consistent with the
original dataset. We demonstrate the effectiveness of SpanDrop on a set of
carefully designed toy tasks, as well as various natural language processing
tasks that require reasoning over long sequences to arrive at the correct
answer, and show that it helps models improve performance both when data is
scarce and when it is abundant. Comment: Peng Qi and Guangtao Wang contributed equally
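The ablation step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the rule of keeping at least one span, and the mean/concentration parameterization of the Beta distribution are all assumptions made for illustration. The first function drops each span independently with a fixed probability; the second first samples a per-sequence drop rate from a Beta distribution and then applies the same Bernoulli dropping.

```python
import random


def span_drop(spans, drop_prob, rng):
    """Drop each span independently with probability drop_prob (order preserved)."""
    kept = [s for s in spans if rng.random() >= drop_prob]
    # Assumption for this sketch: never return an empty sequence.
    if not kept:
        kept = [rng.choice(spans)]
    return kept


def beta_span_drop(spans, mean_drop_prob, concentration, rng):
    """Sample a per-sequence drop rate from a Beta distribution, then drop spans.

    The (mean, concentration) parameterization is a hypothetical choice:
    a = concentration * mean, b = concentration * (1 - mean).
    """
    a = concentration * mean_drop_prob
    b = concentration * (1.0 - mean_drop_prob)
    p = rng.betavariate(a, b)  # drop rate shared by all spans of this sequence
    return span_drop(spans, p, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    sentences = ["s1", "s2", "s3", "s4", "s5"]
    print(span_drop(sentences, 0.2, rng))
    print(beta_span_drop(sentences, 0.2, 10.0, rng))
```

Each call yields a different ablated copy of the same input, which is then paired with the original label during training.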
Mapping the tail fiber as the receptor binding protein responsible for differential host specificity of Pseudomonas aeruginosa bacteriophages PaP1 and JG004.
The first step in bacteriophage infection is recognition and binding to the host receptor, which is mediated by the phage receptor binding protein (RBP). Different RBPs can lead to differential host specificity. In many bacteriophages, such as Escherichia coli and lactococcal phages, RBPs have been identified as the tail fiber or protruding baseplate proteins. However, the tail fiber-dependent host specificity in Pseudomonas aeruginosa phages has not been well studied. This study aimed to identify and investigate the binding specificity of the RBP of P. aeruginosa phages PaP1 and JG004. These two phages share high DNA sequence homology but exhibit different host specificities. A spontaneous mutant phage was isolated and exhibited a broader host range than the parental phage JG004. Sequencing of its putative tail fiber and baseplate region indicated a single point mutation in ORF84 (a putative tail fiber gene), which resulted in the replacement of a positively charged lysine (K) by an uncharged asparagine (N). We further demonstrated that the replacement of the tail fiber gene (ORF69) of PaP1 with the corresponding gene from phage JG004 resulted in a recombinant phage that displayed altered host specificity. Our study revealed the tail fiber-dependent host specificity in P. aeruginosa phages and provided an effective tool for its alteration. These contributions may have potential value in phage therapy.
Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents
Interpretable multi-hop reading comprehension (RC) over multiple documents is
a challenging problem because it demands reasoning over multiple information
sources and explaining the answer prediction by providing supporting evidence.
In this paper, we propose an effective and interpretable Select, Answer and
Explain (SAE) system to solve the multi-document RC problem. Our system first
filters out answer-unrelated documents and thus reduces the amount of
distracting information. This is achieved by a document classifier trained with
a novel pairwise learning-to-rank loss. The selected answer-related documents
are then input to a model to jointly predict the answer and supporting
sentences. The model is optimized with a multi-task learning objective at both
the token level for answer prediction and the sentence level for supporting-sentence
prediction, together with an attention-based interaction between these two
tasks. Evaluated on HotpotQA, a challenging multi-hop RC data set, the proposed
SAE system achieves top competitive performance in the distractor setting compared
to other existing systems on the leaderboard. Comment: Accepted to AAAI 202
The progenitors of type Ia supernovae in the semidetached binaries with red giant donors
Context. The companions of the exploding carbon-oxygen white dwarfs (CO WDs)
for producing type Ia supernovae (SNe Ia) are still not conclusively confirmed.
A red-giant (RG) star has been suggested to be the mass donor of the exploding
WD, known as the symbiotic channel. However, previous studies on this
channel gave a relatively low rate of SNe Ia. Aims. We aim to systematically
investigate the parameter space, Galactic rates and delay time distributions of
SNe Ia from the symbiotic channel by employing a revised mass-transfer
prescription. Methods. We adopted an integrated mass-transfer prescription to
calculate the mass-transfer process from an RG star onto the WD. In this
prescription, the mass-transfer rate varies with the local material states.
Results. We evolved a large number of WD+RG systems, and found that the
parameter space of WD+RG systems for producing SNe Ia is significantly
enlarged. This channel could produce SNe Ia with intermediate and old ages,
contributing to at most 5% of all SNe Ia in the Galaxy. Our model increases the
SN Ia rate from this channel by a factor of 5. We suggest that the symbiotic
systems RS Oph and T CrB are strong candidates for the progenitors of SNe Ia. Comment: 8 pages, 6 figures