1,213 research outputs found

A new approach to assess and predict the functional roles of proteins across all known structures

Author: A Bairoch
A Medrano-Soto
A Preumont
AG Murzin
AS Juncker
B Rost
BH Dessailly
C Radauer
CF Schaefer
D Devos
D Lee
D Pal
D Petrey
D Yarullina
Elchin S. Julfayev
EM Marcotte
F Pazos
H Takahashi
HM Berman
I Friedberg
I Levin
J Benach
JS Richardson
JU Bowie
L Aravind
L Jaroszewski
L Xie
M Ashburner
M Chruszcz
M Kanehisa
M Levitt
P Yue
PD Karp
R Nair
R Rentzsch
RA Laskowski
RD Finn
RE Schapire
RL Marsden
RM Ward
Ryan J. McLaughlin
S Singh
SF Altschul
SK Burley
TC Terwilliger
VA McKusick
William A. McLaughlin
Yi-Ping Tao
YYA Godzik
Publication venue: Springer Netherlands
Publication date: 01/01/2011

The three-dimensional atomic structures of proteins provide information regarding their function, and codified relationships between structure and function enable the assessment of function from structure. In the current study, a new data mining tool was implemented that checks current Gene Ontology (GO) annotations and predicts new ones across all the protein structures available in the Protein Data Bank (PDB). The tool overcomes some of the challenges of utilizing large amounts of protein annotation and measurement information to form correspondences between protein structure and function. Protein attributes were extracted from the Structural Biology Knowledgebase and open source biological databases. A given protein’s functional annotations were inferred from the presence or absence of a given set of attributes. The results show that attributes derived from the three-dimensional structures of proteins improved predictions over those made using only attributes derived from the primary amino acid sequence. Some predictions reflected known but not completely documented GO annotations. For example, predictions for the GO term for copper ion binding reflected information, recorded in a ligand interaction database, that a copper ion was known to interact with the protein. Other predictions were novel and require further experimental validation. These include predictions for proteins labeled as unknown function in the PDB. Two examples are a role in the regulation of transcription for the protein AF1396 from Archaeoglobus fulgidus and a role in RNA metabolism for the protein psuG from Thermotoga maritima.

Crossref

Springer - Publisher Connector

PubMed Central

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci.

Author: Barnes MR
Cabrera CP
John CR
Munroe PB
Nicholls HL
Watson DS
Publication venue: Frontiers Media SA
Publication date: 01/01/2020

Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS - the ability to detect genetic association by linkage disequilibrium (LD) - is also its limitation. Whilst ever-increasing study sizes and improved design have augmented the power of GWAS to detect effects, differentiating causal variants or genes from other highly correlated genes associated by LD remains the real challenge. This has severely hindered the biological insights and clinical translation of GWAS findings. Although thousands of disease susceptibility loci have been reported, causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in their complexity, ranging from relatively simple logistic regression approaches to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models, i.e., neural networks. Paired with functional validation, these methods show important promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review investigates the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on the prioritization of complex disease-associated loci. Overall, we explore the contributions ML has made towards reaching the GWAS end-game with consequent wide-ranging translational impact.

UCL Discovery

Oxford University Research Archive

Queen Mary Research Online

BioEve Search: A Novel Framework to Facilitate Interactive Literature Search

Author: Ahmed Syed Toufeeq
Davulcu Hasan
Nair Radhika
Tikves Sukru
Zhao Zhongming
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2012

Background. Advances in computational and biological methods over the last two decades have remarkably changed the scale of biomedical research, bringing unprecedented growth in both the production of biomedical data and the amount of published literature discussing it. An automated extraction system coupled with a cognitive search and navigation service over these document collections would not only save time and effort, but also pave the way to discovering hitherto unknown information implicitly conveyed in the texts. Results. We developed a novel framework (named “BioEve”) that seamlessly integrates Faceted Search (Information Retrieval) with an Information Extraction module to provide an interactive search experience for researchers in the life sciences. It enables guided, step-by-step refinement of search queries by suggesting concepts and entities (like genes, drugs, and diseases) to quickly filter and modify the search direction, thereby facilitating an enriched paradigm in which the user can discover related concepts and keywords to search while seeking information. Conclusions. The BioEve Search framework makes it easier to enable scalable interactive search over large collections of textual articles and to discover knowledge hidden in thousands of biomedical literature articles.

Crossref

Directory of Open Access Journals

PubMed Central

What Can Machine Learning Approaches in Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis?

Author: Duddy William
Duguez Stephanie
Giannakopoulos George
Morris Andrew P.
Vasilopoulou Christina
Publication venue: MDPI AG
Publication date: 01/11/2020

Ulster University's Research Portal

The University of Manchester - Institutional Repository

Hierarchical ensemble methods for protein function prediction

Author: G. Valentini
Publication venue: Hindawi Limited
Publication date: 01/01/2014

Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research.

AIR Universita degli studi di Milano

Directory of Open Access Journals

Non-homology-based prediction of gene functions in maize (Zea mays ssp. mays)

Author: Dai Xiuru
Li Pinghua
Liang Zhikai
Schnable James
Tu Xiaoyu
Xu Zheng
Zhong Silin
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2020

Advances in genome sequencing and annotation have eased the difficulty of identifying new gene sequences. Predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions. As a result, homology is widely used for gene function prediction. This means functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non-homology gene features. Among the eight supervised classification algorithms evaluated, random forest-based prediction consistently provided the most accurate gene function prediction. Non-homology-based functional annotation provides complementary strengths to homology-based annotation, with higher average performance in Biological Process GO terms, the domain where homology-based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology-based functional annotation is highest. GO prediction models trained with homology-based annotations were able to successfully predict annotations from a manually curated “gold standard” GO annotation set. Non-homology-based functional annotation based on machine learning may ultimately prove useful both as a method to assign predicted functions to orphan genes that lack functionally characterized homologs, and to identify and correct functional annotation errors that were propagated through homology-based functional annotations.

DigitalCommons@University of Nebraska