8,768 research outputs found

Automated methods of predicting the function of biological sequences using GO and BLAST

Author: Baumann Ute
Brown Alfred L
Jones Craig E
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: With the exponential increase in genomic sequence data there is a need to develop automated approaches to deducing the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process evaluating competing designs of biological function predictors. We utilise the Gene Ontology, GO, a directed acyclic graph of functional terms, to annotate sequences with functional information describing their biological context. Initially we examine the effect on accuracy scores of increasing the allowed distance between predicted terms and a test set of curator-assigned terms. Next we evaluate several annotator methods using accuracy benchmarking. Given an unannotated sequence we use the Basic Local Alignment Search Tool, BLAST, to find similar sequences that have already been assigned GO terms by curators. A number of methods were developed that utilise terms associated with the best five matching sequences. These methods were compared against a benchmark method of simply using terms associated with the best BLAST-matched sequence (best BLAST approach). RESULTS: The precision and recall of estimates increase rapidly as the distance permitted between a predicted term and a correct term assignment increases. Accuracy benchmarking allows a comparison of annotation methods. A covering graph approach performs poorly, except where the term assignment rate is high. A term distance concordance approach has a similar accuracy to the best BLAST approach, demonstrating lower precision but higher recall. However, a discriminant function method has higher precision and recall than the best BLAST approach and other methods shown here. CONCLUSION: Allowing term predictions to be counted correct if closely related to a correct term decreases the reliability of the accuracy score. As such we recommend using accuracy measures that require exact matching of predicted terms with curator-assigned terms. Furthermore, we conclude that competing designs of BLAST-based GO term annotators can be effectively compared using an accuracy benchmarking approach. The most accurate annotation method was developed using data mining techniques. As such we recommend that designers of term annotators utilise accuracy benchmarking and data mining to ensure newly developed annotators are of high quality.
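
The exact-match accuracy measure recommended in this abstract can be illustrated with a minimal sketch. It is not the authors' benchmarking code: the function name `precision_recall` and the GO term identifiers below are hypothetical, chosen only to show how precision and recall behave when a predicted term counts as correct only if it is identical to a curator-assigned term.

```python
from typing import Iterable, Tuple


def precision_recall(predicted: Iterable[str], curated: Iterable[str]) -> Tuple[float, float]:
    """Exact-match precision and recall for the GO terms of one sequence.

    A predicted term is correct only if the same term ID appears in the
    curator-assigned set; related (parent/child) terms earn no credit.
    """
    pred, gold = set(predicted), set(curated)
    true_positives = len(pred & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical term sets, for illustration only.
predicted_terms = ["GO:0003677", "GO:0005524", "GO:0016301"]
curated_terms = ["GO:0003677", "GO:0016301", "GO:0004672", "GO:0006468"]

p, r = precision_recall(predicted_terms, curated_terms)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Two of the three predicted terms are curator-assigned (precision 2/3) and two of the four curator-assigned terms were predicted (recall 2/4). A distance-tolerant variant would also accept nearby terms in the GO graph, which is the relaxation the abstract argues makes the score less reliable.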

Adelaide Research & Scholarship

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network

Author: Cao Renzhi
Chan Leong
Chen Zhangxin
Freitas Colton
Jiang Haiqing
Sun Miao
Publication date: 01/10/2017

With the development of next-generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from them because of the limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge in closing the gap between the huge number of protein sequences and the known functions. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and we build a neural machine translation model based on recurrent neural networks to translate "ProLan" into "GOLan". We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and we also evaluated its performance on selected proteins whose functions were released after the CAFA competition. The good performance on the training and testing datasets demonstrates that the proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to it.

arXiv.org e-Print Archive

Directory of Open Access Journals

SIFTER search: a web server for accurate phylogeny-based protein function prediction.

Author: Brenner Steven E
Luo Kevin R
Sahraeian Sayed M
Publication venue: eScholarship, University of California
Publication date: 01/01/2015

We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

CiteSeerX

PubMed Central

eScholarship - University of California

Automated data integration for developmental biological research

Author: Sternberg Paul W.
Zhong Weiwei
Publication venue: The Company of Biologists
Publication date: 15/09/2007

In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research.

Caltech Authors

The Phyre2 web portal for protein modeling, prediction and analysis

Author: A González-Pérez
A Lobley
A Marchler-Bauer
A Roy
AA Canutescu
BR Jefferys
C Mao
Christopher M Yates
CM Yates
CT Porter
DT Jones
EV Koonin
G Fucile
IA Adzhubei
IW Davis
J Moult
J Söding
JA Capra
JJ Ward
K Arnold
LA Kelley
Lawrence A Kelley
M Higurashi
M Källberg
M Remmert
Mark N Wass
Michael J E Sternberg
MN Wass
N Siew
Ngak-Leng Sim
P Rotkiewicz
P Schmidtke
R Arjun
S Raman
SF Altschul
Stefans Mezulis
TE Lewis
X Wei
Publication venue: Springer
Publication date: 01/05/2015

Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large numbers of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.

Crossref

ZENODO

PubMed Central

Kent Academic Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Spiral - Imperial College Digital Repository

Transcriptome analysis of Taenia solium cysticerci using Open Reading Frame ESTs (ORESTES)

Author: Almeida Carolina R.
Bayer-Santos Ethel
Davila Alberto M. R.
Dias-Neto Emmanuel
Ferreira Henrique B.
Grisard Edmundo C.
Maia Antônio A.
Ojopi Elida P. B.
Rodrigues Juliana B.
Rotava Gianinna
Sincero Thaís C. M.
Sperandio Maísa M.
Stoco Patricia H.
Tyler Kevin M.
Wagner Glauber
Zaha Arnaldo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: Human infection by the pork tapeworm Taenia solium affects more than 50 million people worldwide, particularly in underdeveloped and developing countries. Cysticercosis, which arises from larval encystation, can be life threatening and difficult to treat. Here, we investigate for the first time the transcriptome of the clinically relevant cysticerci larval form. Results: Using the ORESTES method, a total of 1,520 high-quality Expressed Sequence Tags (ESTs) were generated from 20 ORESTES cDNA mini-libraries. Their analysis revealed fragments of genes with promising applications, including 51 ESTs matching antigens previously described in other species, as well as 113 sequences representing proteins with potential extracellular localization, with obvious applications for immune-diagnosis or vaccine development. Conclusion: The set of sequences described here will contribute to deciphering the expression profile of this important parasite and will be informative for the genome assembly and annotation, as well as for studies of intra- and inter-specific sequence variability. Genes of interest for developing new diagnostic and therapeutic tools are described and discussed.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

University of East Anglia digital repository

MoKCa database - mutations of kinases in cancer

Author: Alfarano
Altschul
Berman
Braconi Quintaje
Bruford
Burnworth
Chatr-aryamontri
Christopher J. Richardson
Clifford
Costas Mitsopoulous
Daley
Diella
Fernández
Finn
Flicek
Forbes
Frances M. G. Pearl
Gene Ontology Consortium
Greenman
Hanahan
Hulo
Kaminker
Kerrien
Koorstra
Lappalainen
Laurence H. Pearl
Letunic
Manning
Marketa Zvelebil
Mishra
Ng
O'Brien
Ortutay
Pagel
Pearl
Qiong Gao
Sawyers
Sjöblom
Stark
Torkamani
UniProt Consortium
Vastrik
Velankar
Wheeler
Wood
Yeats
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2009

Members of the protein kinase family are amongst the most commonly mutated genes in human cancer, and both mutated and activated protein kinases have proved to be tractable targets for the development of new anticancer therapies. The MoKCa database (Mutations of Kinases in Cancer, http://strubiol.icr.ac.uk/extra/mokca) has been developed to structurally and functionally annotate, and where possible predict, the phenotypic consequences of mutations in protein kinases implicated in cancer. Somatic mutation data from tumours and tumour cell lines have been mapped onto the crystal structures of the affected protein domains. Positions of the mutated amino acids are highlighted on a sequence-based domain pictogram, as well as on a 3D image of the protein structure, and in a molecular graphics package integrated for interactive viewing. The data associated with each mutation are presented in the Web interface, along with expert annotation of the detailed molecular functional implications of the mutation. Proteins are linked to functional annotation resources and are annotated with structural and functional features such as domains and phosphorylation sites. MoKCa aims to provide assessments available from multiple sources and algorithms for each potential cancer-associated mutation, and to present these together in a consistent and coherent fashion to facilitate authoritative annotation by cancer biologists and structural biologists directly involved in the generation and analysis of new mutational data.

Crossref

PubMed Central

Institute of Cancer Research Repository

Sussex Research Online