4,102 research outputs found

Convolutional LSTM Networks for Subcellular Localization of Proteins

Author: A Graves
A Höglund
A Prlić
C Magnan
G Dahl
HY Xiong
LJP Maaten Van Der
M Schuster
MCF Thomsen
O Emanuelsson
P Baldi
P Lena Di
S Briesemeister
S Henikoff
S Hochreiter
SF Altschul
T Blum
T Goldberg
T Petersen
Y Bengio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins given only the protein sequence with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve the performance by introducing convolutional filters and experiment with an attention mechanism which lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanisms and show how they can be used to extract biologically relevant knowledge from the LSTM networks.
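As a rough illustration of the attention idea mentioned in the abstract above, the sketch below pools per-position hidden vectors into one context vector using softmax weights. It is not the paper's architecture: the dot-product scoring, the `query` vector, and the toy `states` values are all assumptions made for illustration, and a real model would learn the query and produce the states with an LSTM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each position's hidden vector by its dot product with a query.

    hidden_states: list of equal-length vectors, one per sequence position.
    query: hypothetical learned vector (fixed by hand here).
    Returns (weights, context), where context is the weighted sum of states.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy hidden states for a three-residue sequence (illustrative values only).
states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
weights, context = attention_pool(states, query=[1.0, 1.0])
```

The `weights` list is what a visualization of attention would plot along the sequence: positions with larger weights contribute more to the pooled representation.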

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Online Research Database In Technology

De novo structural modeling and computational sequence analysis of a bacteriocin protein isolated from Rhizobium leguminosarum bv. viciae strain LC-31

Author: Butt AM
Haq F
Khan IB
Tong Y
Publication venue: 'African Journals Online (AJOL)'
Publication date: 01/10/2013

Bacteriocins produced by different groups of bacteria are ribosomally synthesized peptides or proteins with antimicrobial and specific antagonistic bacterial interaction activity. Rhizobium leguminosarum is a Gram-negative soil bacterium which plays an important role in nitrogen fixation in leguminous plants. Bacteriocins produced by different strains of R. leguminosarum are known to impart antagonistic effects on other closely related strains. Recently, a bacteriocin gene was isolated from R. leguminosarum bv. viciae strain LC-31. Our study was aimed at computational proteomic analysis and 3D structural modeling of the novel bacteriocin protein encoded by this gene. Different bioinformatics tools and machine learning techniques were used for protein structural classification. De novo protein modeling was performed using the I-TASSER server. The final model obtained was assessed by PROCHECK and DFIRE2, which confirmed that the final model is reliable. Until complete biochemical and structural data of the bacteriocin protein produced by R. leguminosarum bv. viciae strain LC-31 are determined by experimental means, this model can serve as a valuable reference for characterizing this multifunctional protein.
Key words: Bacteriocin, rhizobium, protein modelling, nodulation, symbiosis, nitrogen fixation

AJOL - African Journals Online

Machine learning-guided directed evolution for protein engineering

Author: Arnold Frances H.
Wu Zachary
Yang Kevin K.
Publication date: 19/04/2019

Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.
Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolutio
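The selection step described in the abstract above (learn from all measured variants, then pick sequences likely to be improved) can be sketched as a single round of model fitting and ranking. This is a minimal sketch under stated assumptions, not the review's method: the additive per-position model is a deliberately crude stand-in for a real sequence-function model, and the sequences, fitness values, and function names are hypothetical.

```python
from collections import defaultdict


def fit_additive_model(measured):
    """Estimate each (position, residue) effect as its mean fitness offset.

    measured: dict mapping sequence -> measured fitness (hypothetical data).
    """
    mean_fitness = sum(measured.values()) / len(measured)
    sums, counts = defaultdict(float), defaultdict(int)
    for seq, fitness in measured.items():
        for pos, aa in enumerate(seq):
            sums[(pos, aa)] += fitness - mean_fitness
            counts[(pos, aa)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean_fitness, effects


def predict(model, seq):
    """Predicted fitness: global mean plus the sum of known residue effects."""
    mean_fitness, effects = model
    return mean_fitness + sum(
        effects.get((pos, aa), 0.0) for pos, aa in enumerate(seq)
    )


def select_next_round(model, candidates, measured, k):
    """Rank unmeasured candidates by predicted fitness and keep the top k."""
    unmeasured = [s for s in candidates if s not in measured]
    return sorted(unmeasured, key=lambda s: predict(model, s), reverse=True)[:k]


# Toy three-residue variants with made-up fitness values.
measured = {"AAA": 1.0, "ACA": 2.0, "AAD": 1.5, "GAA": 0.5}
candidates = ["ACD", "GCA", "GAD", "AAA"]
model = fit_additive_model(measured)
picks = select_next_round(model, candidates, measured, k=2)
```

In a full campaign the chosen `picks` would be measured experimentally, added to `measured`, and the model refit before the next round.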

arXiv.org e-Print Archive

Caltech Authors

The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function

Author: A Armon
A Bateman
A Godzik
A Passerini
A Pierleoni
A Stark
AE Todd
AT Laurie
B Rost
B Rost
BA Shoemaker
C Notredame
CA Innis
CA Orengo
CA Wilson
CE Stebbins
CJ Jeffery
CJ Jeffery
CP Ponting
CT Porter
D Brown
D Desveaux
D Devos
D Pal
D Petrey
D Petrey
E Krissinel
E Reynolds
EP Gianchandani
F Corpet
F Ferron
F Zhou
Fran Lewitter
G Theissen
GJ Bartlett
GJ Kleywegt
GL Holliday
H Nakashima
HL Schubert
HM Berman
IM Wallace
J Hawkins
J Thompson
JA Barker
JB Bard
JC Whisstock
JG Henikoff
JM Thornton
JS Sodhi
JW Torrance
JZ Wang
K Goyal
K Hofmann
K Karplus
K Nakai
L Holm
L Jaroszewski
L Shapiro
L Wang
LJ Jensen
M Babor
M Gruber
M Linial
M Lippi
M Nayal
M Remm
Marco Punta
MJ Hartshorn
O Emanuelsson
O Lichtarge
OA Bateman
OC Redfern
P Puntervoll
PD Thomas
R Apweiler
R Kolodny
R Nair
R Nair
R Nair
RA Laskowski
RL Tatusov
S Altschul
S Shazman
SG Lee
T Gabaldon
TA Binkowski
TJ Hubbard
TK Attwood
VA Ivanisenko
W Humphrey
W Tian
Y Ofran
Y Ye
Yanay Ofran
Publication venue: Public Library of Science
Publication date: 01/10/2008

Crossref

Directory of Open Access Journals

PubMed Central

Evaluation of secretion prediction highlights differing approaches needed for oomycete and fungal effectors

Author: Amselem
Andersen
Bendtsen
Bendtsen
Bringans
Brown
Cantu
Choo
de Wit
Dean
Dean
Dodds
Duplessis
Emanuelsson
Emanuelsson
Emanuelsson
Floudas
Galagan
Galagan
Godfrey
Goffeau
Guida
Guyon
Haas
Hane
Hane
Horton
Kamoun
Kemen
Klee
Kloppholz
Klosterman
Krogh
Käll
Kämper
Links
Liu
Lo Presti
Lowe
Lévesque
Ma
Machida
Manning
Martin
Martinez
Meijer
Melhem
Menne
Min
Morin
Nemri
Nielsen
Nielsen
Nielsen
O'Connell
Ohm
Ohm
Paper
Petersen
Petre
Poppe
Raffaele
Raffaele
Reid
Ridout
Rouxel
Saunders
Schornack
Spanu
Sperschneider
Stajich
Testa
Torto
Tyler
Urban
Von Heijne
Wang
Wawra
Wicker
Wiemann
Yang
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2015

© 2015 Sperschneider, Williams, Hane, Singh and Taylor.

The steadily increasing number of sequenced fungal and oomycete genomes has enabled detailed studies of how these eukaryotic microbes infect plants and cause devastating losses in food crops. During infection, fungal and oomycete pathogens secrete effector molecules which manipulate host plant cell processes to the pathogen's advantage. Proteinaceous effectors are synthesized intracellularly and must be externalized to interact with host cells. Computational prediction of secreted proteins from genomic sequences is an important technique to narrow down the candidate effector repertoire for subsequent experimental validation. In this study, we benchmark secretion prediction tools on experimentally validated fungal and oomycete effectors. We observe that for a set of fungal SwissProt protein sequences, SignalP 4 and the neural network predictors of SignalP 3 (D-score) and SignalP 2 perform best. For effector prediction in particular, the use of a sensitive method can be desirable to obtain the most complete candidate effector set. We show that the neural network predictors of SignalP 2 and 3, as well as TargetP, were the most sensitive tools for fungal effector secretion prediction, whereas the hidden Markov model predictors of SignalP 2 and 3 were the most sensitive tools for oomycete effectors. Thus, previous versions of SignalP retain value for oomycete effector prediction, as the current version, SignalP 4, was unable to reliably predict the signal peptide of the oomycete Crinkler effectors in the test set. Our assessment of subcellular localization predictors shows that cytoplasmic effectors are often predicted as not extracellular. This limits the reliability of secretion predictions that depend on these tools. We present our assessment with a view to informing future pathogenomics studies and suggest revised pipelines for secretion prediction to obtain optimal effector predictions in fungi and oomycetes.

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

espace@Curtin

Identification And Functional Characterization Of Plant Small Secreted Proteins During Arbuscular Mycorrhizal Symbiosis

Author: Hu Xiaoli
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2021

Plant small secreted proteins (SSPs) are sequences of 50 to 250 amino acids which are transported out of cells to fulfill multiple functions related to plant growth and development and response to various stresses. With the development of more accurate and affordable genome sequencing technology, an increasing number of SSPs have been predicted using diverse computational tools based on machine learning. Although experimentally validated plant SSPs are still limited, some studies have reported that plant SSPs can be induced and involved in mutualistic relationships between plants and microbes. In Chapter I, known SSPs and their functions in various plant species are reviewed. Additionally, current computational tools and experimental methods that have been widely applied to identify plant SSPs are summarized. A new, robust, and integrated pipeline to discover plant SSPs is proposed. Furthermore, strategies for elucidating the biological functions of SSPs in plants are discussed in Chapter I. Chapter II presents predicted SSPs from 60 plant species and elucidates the evolutionary convergence of changes in SSP sequences. Furthermore, the expression of SSPs induced by arbuscular mycorrhizal fungi (AMF), which corresponds to the convergent ability of different plants to form mutualistic associations with AMF, is explored. Overall, this study provides insights into the functions of plant SSPs during symbiosis between plants and fungi.

University of Tennessee, Knoxville: Trace

Deep Learning for Genomics: A Concise Overview

Author: Wang Haohan
Yue Tianwei
Publication date: 08/05/2018

Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning since we are expecting from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, as well as pointing out potential opportunities and obstacles for future genomics applications.
Comment: Invited chapter for Springer Book: Handbook of Deep Learning Application

arXiv.org e-Print Archive

PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding

Author: Liu Runcheng
Lu Jiarui
Ma Chang
Tang Jian
Xu Minghao
Zhang Yangtian
Zhang Zuobai
Zhu Zhaocheng
Publication date: 19/09/2022

We are now witnessing significant progress of deep learning methods in a variety of tasks (or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field. In this paper, we propose such a benchmark called PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. We evaluate different types of sequence-based methods for each task including traditional feature engineering approaches, different sequence encoding methods as well as large-scale pre-trained protein language models. In addition, we also investigate the performance of these methods under the multi-task learning setting. Experimental results show that large-scale pre-trained protein language models achieve the best performance for most individual tasks, and jointly training multiple tasks further boosts the performance. The datasets and source codes of this benchmark are all available at https://github.com/DeepGraphLearning/PEER_Benchmark
Comment: Accepted by NeurIPS 2022 Dataset and Benchmark Track. arXiv v2: source code released; arXiv v1: release all benchmark result

arXiv.org e-Print Archive