40 research outputs found

    Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

    No full text
    Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequences, such as biological sequence analysis, may fall short of meeting these assumptions, precluding exploration of these models. To address this challenge, we present a new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR). Our mechanism scales linearly rather than quadratically in the number of tokens in the sequence, is characterized by sub-quadratic space complexity and does not incorporate any sparsity pattern priors. Furthermore, it provides strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence. It is also backwards-compatible with pre-trained regular Transformers. We demonstrate its effectiveness on the challenging task of protein sequence modeling and provide detailed theoretical analysis of its space and time complexity.
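
    The linear scaling described in this abstract comes from reordering the attention computation: once queries and keys pass through a suitable random-feature map, attention(Q, K, V) can be computed as phi(Q) (phi(K)^T V) up to row normalization, so the L x L attention matrix is never materialized. The sketch below (names such as linear_attention and phi are illustrative, and the feature map is a placeholder ReLU map rather than the paper's orthogonal random features) shows only that reordering, not the FAVOR mechanism itself:

        import numpy as np

        def linear_attention(Q_f, K_f, V):
            """Attention in O(L * m * d) time given feature-mapped queries and keys.

            Q_f, K_f: [L, m] feature-mapped queries/keys; V: [L, d] values.
            Computing K_f^T V first (an [m, d] matrix) means the L x L attention
            matrix is never formed, which is where the quadratic cost comes from.
            """
            kv = K_f.T @ V                       # [m, d]
            normalizer = Q_f @ K_f.sum(axis=0)   # [L], row sums of the implicit attention matrix
            return (Q_f @ kv) / normalizer[:, None]

        # Toy usage with a stand-in positive feature map (not the FAVOR construction).
        rng = np.random.default_rng(0)
        L, d, m = 1024, 64, 128
        W = rng.normal(size=(d, m)) / np.sqrt(d)
        phi = lambda X: np.maximum(X @ W, 0.0) + 1e-6   # positivity keeps the normalizer well-defined
        Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
        out = linear_attention(phi(Q), phi(K), V)        # [L, d], computed without an L x L matrix
        print(out.shape)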

    Rethinking Attention with Performers

    No full text
    We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can also be used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and investigate optimal attention-kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel-prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing effectiveness of the novel attention-learning paradigm leveraged by Performers.
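
    The positive random features behind FAVOR+ have a compact closed form: for omega ~ N(0, I_d), exp(x^T y) = E[exp(omega^T x - ||x||^2 / 2) * exp(omega^T y - ||y||^2 / 2)], so mapping queries and keys through phi(x) = m^(-1/2) exp(Wx - ||x||^2 / 2) gives an unbiased, strictly positive estimate of the softmax kernel. A minimal sketch of that estimator follows (function names are illustrative; the orthogonalization of W and the full attention computation are omitted):

        import numpy as np

        def positive_random_features(X, W):
            """Positive random features for the softmax kernel (a sketch, not the full FAVOR+ pipeline).

            X: [L, d] queries or keys; W: [m, d] with rows drawn from N(0, I_d)
            (the paper additionally orthogonalizes the rows to reduce variance).
            phi(x) = exp(Wx - ||x||^2 / 2) / sqrt(m), so E[phi(x)^T phi(y)] = exp(x^T y).
            """
            m = W.shape[0]
            sq_norm = np.sum(X ** 2, axis=-1, keepdims=True) / 2.0
            return np.exp(X @ W.T - sq_norm) / np.sqrt(m)

        rng = np.random.default_rng(1)
        d, m = 16, 4096
        # Small-norm toy vectors; in attention, queries/keys would also be rescaled by
        # d**-0.25 so that phi(q)^T phi(k) estimates exp(q^T k / sqrt(d)).
        x = 0.2 * rng.normal(size=(1, d))
        y = 0.2 * rng.normal(size=(1, d))
        W = rng.normal(size=(m, d))

        exact = np.exp((x @ y.T)[0, 0])          # the softmax kernel SM(x, y) = exp(x^T y)
        approx = (positive_random_features(x, W) @ positive_random_features(y, W).T)[0, 0]
        print(exact, approx)                      # the estimator is unbiased, so these should be close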

    No full text
    ABSTRACT. Objective. To determine whether anti-peptidylarginine deiminase type 4 (PAD4) antibodies were present in first-degree relatives (FDR) of patients with rheumatoid arthritis (RA) in 2 indigenous North American populations with high prevalence of RA. Methods. Participants were recruited from 2 indigenous populations in Canada and the United States, including patients with RA (probands), their unaffected FDR, and healthy unrelated controls. Sera were tested for the presence of anti-PAD4 antibodies, anticyclic citrullinated peptide (anti-CCP) antibodies, and rheumatoid factor (RF). HLA-DRB1 subtyping was performed and participants were classified according to number of shared-epitope alleles present. Results. Antibodies to PAD4 were detected in 24 of 82 (29.3%) probands; 2 of 147 (1.4%) relatives; and no controls (p < 0.0001). Anti-CCP was present in 39/144 (27.1%) of the relatives, and there was no overlap between positivity for anti-CCP and PAD4 in the relatives. In RA patients, anti-PAD4 antibodies were associated with disease duration (p = 0.0082) and anti-CCP antibodies (p = 0.008), but not smoking or shared-epitope alleles. Conclusion. Despite a significant prevalence of anti-CCP in FDR, anti-PAD4 antibodies were almost exclusively found in established RA. The prevalence of anti-PAD4 antibodies in RA is similar to the prevalence described in other populations, and these autoantibodies are associated with disease duration and anti-CCP antibodies.