Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
The sequence-to-sequence (seq2seq) task aims at generating the target
sequence based on a given input source sequence. Traditionally, most seq2seq
tasks are addressed with the encoder-decoder framework, which requires an
encoder to encode the source sequence and a decoder to generate the target
text. Recently, a number of new approaches have emerged that apply decoder-only
language models directly to the seq2seq task. Despite the significant
advancements in applying language models to the seq2seq task, there is still a
lack of thorough analysis on the effectiveness of the decoder-only language
model architecture. This paper aims to address this gap by conducting a
detailed comparison between the encoder-decoder architecture and the
decoder-only language model framework through the analysis of a regularized
encoder-decoder structure. This structure is designed to replicate all
behaviors in the classical decoder-only language model but has an encoder and a
decoder, making it easier to compare with the classical encoder-decoder
structure. Based on the analysis, we unveil the attention degeneration problem
in the language model, namely, as the generation step number grows, less and
less attention is focused on the source sequence. To give a quantitative
understanding of this problem, we conduct a theoretical sensitivity analysis of
the attention output with respect to the source input. Grounded in our
analysis, we propose a novel partial attention language model to solve the
attention degeneration problem. Experimental results on machine translation,
summarization, and data-to-text generation tasks support our analysis and
demonstrate the effectiveness of our proposed model.
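As a rough illustration of the partial-attention idea described above (a minimal sketch of our own, not the paper's exact formulation), the snippet below normalizes attention over the source and over the generated prefix in two separate softmaxes, so the source's share of the attention output cannot shrink as the generated prefix grows. The function name, the fixed mixing weight `gate`, and the tensor shapes are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def partial_attention(q, k_src, v_src, k_tgt, v_tgt, gate=0.5):
    """q: (d,) query at the current step; k_*/v_*: (n, d) keys/values."""
    d = q.shape[-1]
    # Separate softmaxes: the source block and the generated prefix are
    # normalized independently, so attention mass on the source is preserved.
    a_src = F.softmax(q @ k_src.T / d ** 0.5, dim=-1)
    a_tgt = F.softmax(q @ k_tgt.T / d ** 0.5, dim=-1)
    # A fixed mixing weight guarantees the source a constant share of the
    # output; a learned gate would replace this constant in a real model.
    return gate * (a_src @ v_src) + (1 - gate) * (a_tgt @ v_tgt)

# Toy usage: the source's share stays at `gate` no matter how long the
# generated prefix (here 50 tokens) grows.
d = 16
out = partial_attention(torch.randn(d),
                        torch.randn(5, d), torch.randn(5, d),
                        torch.randn(50, d), torch.randn(50, d))
print(out.shape)  # torch.Size([16])
```

By contrast, a decoder-only model applies a single softmax over the concatenated source-plus-prefix sequence, so the source's share can decay with prefix length; that decay is the degeneration the paper analyzes.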
A Sequence-to-Sequence&Set Model for Text-to-Table Generation
Recently, the text-to-table generation task has attracted increasing
attention due to its wide applications. The dominant approach formalizes this
task as sequence-to-sequence generation and serializes each table into a token
sequence during training by concatenating all rows in top-down order. However,
this approach suffers from two serious defects: 1) the predefined order
introduces an erroneous bias during training, which heavily penalizes shifts in
the order between rows; 2) error propagation becomes severe when the model
outputs a long token sequence. In this paper, we first conduct a preliminary
study demonstrating that the generation of most rows is order-insensitive.
Furthermore, we propose a novel sequence-to-sequence&set
text-to-table generation model. Specifically, in addition to a text encoder
encoding the input text, our model is equipped with a table header generator to
first output a table header, i.e., the first row of the table, in the manner of
sequence generation. Then we use a table body generator with learnable row
embeddings and column embeddings to generate a set of table body rows in
parallel. In particular, to deal with the lack of correspondence between
generated table body rows and their targets during training, we propose a
target assignment strategy based on bipartite matching between the first cells
of the generated table body rows and those of the targets. Experimental results
show that our model significantly surpasses the baselines, achieving
state-of-the-art performance on commonly used datasets.
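The target assignment step can be made concrete with a small sketch: generated rows are matched to target rows by a Hungarian assignment over a cost computed from their first cells. The character-overlap cost and the function names here are illustrative assumptions; the abstract only specifies that matching is based on first cells, not this particular cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def first_cell_cost(pred_cell: str, tgt_cell: str) -> float:
    """Toy cost: 1 - character-level Jaccard overlap of the two cells."""
    a, b = set(pred_cell), set(tgt_cell)
    return 1.0 - len(a & b) / max(len(a | b), 1)

def assign_targets(pred_first_cells, tgt_first_cells):
    # Build the full pairwise cost matrix, then solve the bipartite
    # matching with the Hungarian algorithm.
    cost = np.array([[first_cell_cost(p, t) for t in tgt_first_cells]
                     for p in pred_first_cells])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

# Each generated row is then trained against the target row it is matched to.
print(assign_targets(["Alice", "Bob"], ["Bob", "Alice"]))  # [(0, 1), (1, 0)]
```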
PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling
Target proteins that lack accessible binding pockets and conformational
stability have posed increasing challenges for drug development. Induced
proximity strategies, such as PROTACs and molecular glues, have thus gained
attention as pharmacological alternatives, but still require small molecule
docking at binding pockets for targeted protein degradation (TPD). The
computational design of protein-based binders presents unique opportunities to
access undruggable targets, but has often relied on stable 3D structures or
predictions for effective binder generation. Recently, we have leveraged the
expressive latent spaces of protein language models (pLMs) for the
prioritization of peptide binders from sequence alone, which we have then fused
to E3 ubiquitin ligase domains, creating a CRISPR-analogous TPD system for
target proteins. However, our methods rely on training discriminator models for
ranking heuristically- or unconditionally-derived guide peptides by their
target-binding capability. In this work, we introduce PepMLM, a purely target
sequence-conditioned de novo generator of linear peptide binders. By employing
a novel masking strategy that uniquely positions cognate peptide sequences at
the terminus of target protein sequences, PepMLM tasks the state-of-the-art
ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities
matching or improving upon previously-validated peptide-protein sequence pairs.
After successful in silico benchmarking with AlphaFold-Multimer, we
experimentally verify PepMLM's efficacy via fusion of model-derived peptides to
E3 ubiquitin ligase domains, demonstrating endogenous degradation of target
substrates in cellular models. In total, PepMLM enables the generative design
of candidate binders to any target protein, without the requirement of target
structure, empowering downstream programmable proteome editing applications.
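A minimal sketch of the masking setup described above, using the public ESM-2 checkpoints on Hugging Face: the binder region is appended to the end of the target sequence as mask tokens and the masked language model fills it in. The checkpoint choice, toy sequence, binder length, and greedy argmax decoding are all assumptions for illustration; PepMLM fine-tunes ESM-2, so the base checkpoint here will not reproduce its results.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "facebook/esm2_t6_8M_UR50D"  # small public ESM-2 checkpoint for a quick demo
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name).eval()

target = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy target protein sequence
binder_len = 8
# The candidate binder is fully masked at the terminus of the target sequence.
masked = target + tok.mask_token * binder_len

inputs = tok(masked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedily fill each masked position with its highest-scoring amino acid.
mask_pos = (inputs.input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
pred_ids = logits[0, mask_pos].argmax(dim=-1)
print(tok.decode(pred_ids).replace(" ", ""))  # greedy candidate binder
```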
AbODE: Ab Initio Antibody Design using Conjoined ODEs
Antibodies are Y-shaped proteins that neutralize pathogens and constitute the
core of our adaptive immune system. De novo generation of new antibodies that
target specific antigens holds the key to accelerating vaccine discovery.
However, this co-design of the amino acid sequence and the 3D structure
subsumes and accentuates some central challenges from multiple tasks, including
protein folding (sequence to structure), inverse folding (structure to
sequence), and docking (binding). We strive to surmount these challenges with a
new generative model AbODE that extends graph PDEs to accommodate both
contextual information and external interactions. Unlike existing approaches,
AbODE uses a single round of full-shot decoding and elicits continuous
differential attention that encapsulates and evolves with latent interactions
within the antibody as well as those involving the antigen. We unravel
fundamental connections between AbODE and temporal networks as well as
graph-matching networks. The proposed model significantly outperforms existing
methods on standard metrics across benchmarks.
Comment: Accepted at ICML 2023
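To make the graph-ODE ingredient concrete, here is a generic sketch (our own construction, not AbODE's actual parameterization): node states on the joint antibody-antigen graph evolve continuously under a learned message-passing vector field, integrated here with explicit Euler steps. The module names, tanh fields, and random toy adjacency are all assumptions.

```python
import torch
import torch.nn as nn

class GraphODEField(nn.Module):
    """dx/dt for node states under a simple message-passing vector field."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)  # message from each neighbor pair
        self.upd = nn.Linear(2 * dim, dim)  # update from aggregated messages

    def forward(self, x, adj):
        # x: (n, d) node states; adj: (n, n) 0/1 adjacency of the graph
        n = x.shape[0]
        pairs = torch.cat([x.unsqueeze(1).expand(n, n, -1),
                           x.unsqueeze(0).expand(n, n, -1)], dim=-1)
        m = (adj.unsqueeze(-1) * torch.tanh(self.msg(pairs))).sum(dim=1)
        return torch.tanh(self.upd(torch.cat([x, m], dim=-1)))

def integrate(field, x0, adj, t1=1.0, steps=20):
    """Explicit Euler; an adaptive solver (e.g. torchdiffeq) is typical in practice."""
    x, dt = x0, t1 / steps
    for _ in range(steps):
        x = x + dt * field(x, adj)
    return x

n, d = 6, 32
x = integrate(GraphODEField(d), torch.randn(n, d), (torch.rand(n, n) < 0.5).float())
print(x.shape)  # torch.Size([6, 32])
```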