4,045 research outputs found

    Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

    Full text link
    Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires to expand the alignment matrix to a tensor, which results in memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called "Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies by a novel compatibility function composed of dot-product and additive attentions, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional mask to each head (subspace), so the memory and computation can be distributed to multiple heads, each with sequential information encoded independently. The experiments show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or competitive performance on nine NLP benchmarks with compelling memory- and time-efficiency

    Prediction of the burial status of transmembrane residues of helical membrane proteins

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Helical membrane proteins (HMPs) play a crucial role in diverse cellular processes, yet it still remains extremely difficult to determine their structures by experimental techniques. Given this situation, it is highly desirable to develop sequence-based computational methods for predicting structural characteristics of HMPs.</p> <p>Results</p> <p>We have developed TMX (TransMembrane eXposure), a novel method for predicting the burial status (i.e. buried in the protein structure vs. exposed to the membrane) of transmembrane (TM) residues of HMPs. TMX derives positional scores of TM residues based on their profiles and conservation indices. Then, a support vector classifier is used for predicting their burial status. Its prediction accuracy is 78.71% on a benchmark data set, representing considerable improvements over 68.67% and 71.06% of previously proposed methods. Importantly, unlike the previous methods, TMX automatically yields confidence scores for the predictions made. In addition, a feature selection incorporated in TMX reveals interesting insights into the structural organization of HMPs.</p> <p>Conclusion</p> <p>A novel computational method, TMX, has been developed for predicting the burial status of TM residues of HMPs. Its prediction accuracy is much higher than that of previously proposed methods. It will be useful in elucidating structural characteristics of HMPs as an inexpensive, auxiliary tool. A web server for TMX is established at http://service.bioinformatik.uni-saarland.de/tmx and freely available to academic users, along with the data set used.</p

    RBF-TSS: Identification of Transcription Start Site in Human Using Radial Basis Functions Network and Oligonucleotide Positional Frequencies

    Get PDF
    Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. Many recently published methods have achieved high identification accuracy of TSS. However, models providing more accurate modeling of promoters and TSS are needed. A novel identification method for identifying transcription start sites that improves the accuracy of TSS recognition for recently published methods is proposed. This method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is proposed and employed as a classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the Receiver Operator Characteristic curve (auROC) of 94.75% and 95.08% respectively, providing increased performance over existing TSS prediction methods

    Ribosome signatures aid bacterial translation initiation site identification

    Get PDF
    Background: While methods for annotation of genes are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. Results: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by the presence of structural and sequence-based features next to N-terminal peptide evidence. Finally, our model identifies 61 novel genes previously undiscovered in the Salmonella enterica genome. Conclusions: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites
    • …
    corecore