4,647 research outputs found

Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach

Author: A Enright
A Valencia
Alfonso Valencia
Beatriz García-Jiménez
C Alfarano
C Drummond
CM Bishop
Cv Mering
D Juan
David Juan
DE Rumelhart
E Frank
E Morett
EA León
Eduardo Andrés-León
EM Marcotte
F Pazos
G Butland
GF Cooper
GH John
GI Webb
H Hermjakob
Iakes Ezkurdia
IH Witten
IM Keseler
J Wu
JG Cleary
L Breiman
L Salwinski
LJ Lu
M Arifuzzaman
M Kanehisa
M Pellegrini
M Sahami
N Friedman
P Bowers
R Hoffmann
RC Edgar
RR Bouckaert
SF Altschul
Shin-Han Shiu
T Dandekar
T Sato
Y Freund
Y Qi
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Molecular biology is currently facing the challenging task of functionally characterizing the proteome. The large number of possible protein-protein interactions and complexes, the variety of environmental conditions and cellular states in which these interactions can be reorganized, and the multiple ways in which a protein can influence the function of others require the development of experimental and computational approaches to analyze and predict functional associations between proteins as part of their activity in the interactome. Methodology/Principal Findings: We have studied the possibility of constructing a classifier to combine the output of several protein interaction prediction methods. The AODE (Averaged One-Dependence Estimators) machine learning algorithm is a suitable choice in this case: it provides better results than the individual prediction methods and performs better than the other alternative methods tested in this experimental setup. To illustrate the potential use of this new AODE-based Predictor of Protein InterActions (APPIA) when analyzing high-throughput experimental data, we show how it helps to filter the results of published high-throughput proteomic studies, ranking functionally related pairs in a significant way. Availability: All the predictions of the individual methods and of the combined APPIA predictor, together with the datasets of functional associations used, are available at http://ecid.bioinfo.cnio.es/. Conclusions: We propose a strategy that integrates the main current computational techniques used to predict functional associations into a unified classifier system, specifically focusing on the evaluation of poorly characterized protein pairs. We selected the AODE classifier as the appropriate tool to perform this task. AODE is particularly useful for extracting valuable information from large, unbalanced and heterogeneous data sets. The combination of the information provided by five interaction prediction methods with some simple sequence features in APPIA is useful in establishing reliability values and helps to prioritize functional interactions that can be further characterized experimentally.
This work was funded by the BioSapiens (grant number LSHG-CT-2003-503265) and the Experimental Network for Functional Integration (ENFIN) Networks of Excellence (contract number LSHG-CT-2005-518254), by Consolider BSC (grant number CSD2007-00050) and by the project “Functions for gene sets” from the Spanish Ministry of Education and Science (BIO2007-66855). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

PubMed Central

Universidad Carlos III de Madrid e-Archivo

Transient protein-protein interface prediction: datasets, features, algorithms, and the RAD-T predictor

Author: Bogdan Istrate
Calem J Bendell
Michael Zhao
Paul T Cernek
Robert A Murgita
Samuel Khan
Sergiu Picioreanu
Shalon Liu
Tristan Aumentado-Armstrong
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: Transient protein-protein interactions (PPIs), which underlie most biological processes, are a prime target for therapeutic development. Immense progress has been made towards computational prediction of PPIs using methods such as protein docking and sequence analysis. However, docking generally requires high-resolution structures of both binding partners, and sequence analysis requires that a significant number of recurrent patterns exist for the identification of a potential binding site. Researchers have turned to machine learning to overcome some of the other methods’ restrictions by generalising interface sites with sets of descriptive features. Best practices for dataset generation, features, and learning algorithms have not yet been identified or agreed upon, and an analysis of the overall efficacy of machine learning-based PPI predictors is due, in order to highlight potential areas for improvement. RESULTS: The presence of unknown interaction sites in the testing set, a result of limited knowledge about protein interactions, dramatically reduces prediction accuracy. Greater accuracy in labelling the data by enforcing higher interface site rates per domain resulted in an average 44% improvement across multiple machine learning algorithms. A set of 10 biologically unrelated proteins that were consistently predicted with high accuracy emerged through our analysis. We identify seven features with the most predictive power over multiple datasets and machine learning algorithms. Through our analysis, we created a new predictor, RAD-T, that outperforms existing non-structurally specializing machine learning protein interface predictors, with an average 59% increase in MCC score on a dataset with a high number of interactions. CONCLUSION: Current methods of evaluating machine learning-based PPI predictors tend to undervalue their performance, which may be artificially decreased by the presence of unidentified interaction sites. Changes to predictors’ training sets will be integral to the future progress of interface prediction by machine learning methods. We reveal the need for a larger test set of well-studied proteins or domain-specific scoring algorithms to compensate for poor interaction site identification on proteins in general.
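
Note on the MCC score cited in this abstract: the Matthews correlation coefficient summarises a binary confusion matrix as a single value between -1 and 1. The sketch below is only an illustration of the standard formula, not code from the RAD-T paper; the function name `mcc` and the example counts are assumptions made for this note.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts: 10 residues are scored as false positives.
baseline = mcc(tp=40, tn=40, fp=10, fn=10)

# If 5 of those "false positives" were in fact unannotated interface
# residues, relabelling them as true positives raises the score.
relabelled = mcc(tp=45, tn=40, fp=5, fn=10)
```

With these invented counts, `relabelled` is larger than `baseline`, which mirrors the abstract's point that unidentified interaction sites in a test set can depress a predictor's measured performance.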

Springer - Publisher Connector

PubMed Central

Land subsidence susceptibility mapping in South Korea using machine learning algorithms

Author: Ahmad BB
Bui DT
Chapi K
Chen W
Khosravi K
Panahi M
Pradhan B
Saro L
Shahabi H
Shirzadi A
Publication venue: MDPI AG
Publication date: 01/01/2018

In this study, land subsidence susceptibility was assessed for a study area in South Korea using four machine learning models: Bayesian Logistic Regression (BLR), Support Vector Machine (SVM), Logistic Model Tree (LMT) and Alternate Decision Tree (ADTree). Eight conditioning factors, identified as the most important influences on land subsidence in the Jeong-am area, were used in the modelling: slope angle, distance to drift, drift density, geology, distance to lineament, lineament density, land use and rock-mass rating (RMR). About 24 previously recorded land subsidence occurrences were surveyed and split into a training dataset (70% of the data) and a validation dataset (30% of the data) for the modelling process. Each model generated a land subsidence susceptibility map (LSSM). The maps were verified using statistical indices, the area under the receiver operating characteristic curve (AUROC), and success rate (SR) and prediction rate (PR) curves. The results indicated that the BLR model produced an LSSM with higher accuracy and reliability than the other applied models, although the other models also gave reasonable results.
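
Note on the AUROC measure cited in this abstract: it can be read as the probability that a randomly chosen positive case (here, a subsidence location) receives a higher susceptibility score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. It is an illustration only, not the authors' workflow; the function name `auroc` and the scores are assumptions made for this note.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    has the higher score; ties count as half. Illustrative helper only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up susceptibility scores for two subsidence cells (label 1)
# and two stable cells (label 0).
example = auroc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small validation set; rank-based implementations are the usual choice for larger data.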

Multidisciplinary Digital Publishing Institute

OPUS - University of Technology Sydney

Directory of Open Access Journals

Universiti Teknologi Malaysia Institutional Repository

The era of big data: Genome-scale modelling meets machine learning

Author: Antonakoudis A
Barbosa R
Kontoravdi K
Kotidis P
Publication venue: Elsevier BV
Publication date: 08/10/2020

With omics data being generated at an unprecedented rate, genome-scale modelling has become pivotal in its organisation and analysis. However, machine learning methods have been gaining ground in cases where knowledge is insufficient to represent the mechanisms underlying such data, or as a means of data curation prior to attempting mechanistic modelling. We discuss the latest advances in genome-scale modelling and the development of optimisation algorithms for network and error reduction, intracellular constraining and applications to strain design. We further review applications of supervised and unsupervised machine learning methods to omics datasets from microbial and mammalian cell systems and present efforts to harness the potential of both modelling approaches through hybrid modelling.

Spiral - Imperial College Digital Repository

Identifying metabolic enzymes with multiple types of association evidence

Author: Chen Lifeng
Church George M
Freund Yoav
Kharchenko Peter
Vitkup Dennis
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which cannot be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. RESULTS: We present a novel method for identifying genes encoding a specific metabolic function based on the local structure of the metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate the predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained by combining all the data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as the top candidate in 43% of cases. CONCLUSION: We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.

Harvard University - DASH

Springer - Publisher Connector

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

High-Throughput, Time-Resolved Mechanical Phenotyping of Prostate Cancer Cells

Author: Belotti Yuri
Conneely Michael
Huang Tianjun
McGloin David
McKenna Stephen
Nabi Ghulam
Tolomeo Serenella
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2019

Worldwide, prostate cancer sits behind only lung cancer as the most commonly diagnosed form of the disease in men. Even the best diagnostic standards lack precision, presenting issues with false positives and unneeded surgical intervention for patients. This lack of clear-cut early diagnostic tools is a significant problem. We present a microfluidic platform, the Time-Resolved Hydrodynamic Stretcher (TR-HS), which allows the investigation of the dynamic mechanical response of thousands of cells per second to a non-destructive stress. The TR-HS integrates high-speed imaging and computer vision to automatically detect and track single cells suspended in a fluid and enables cell classification based on their mechanical properties. We demonstrate the discrimination of healthy and cancerous prostate cell lines based on the whole-cell, time-resolved mechanical response to a hydrodynamic load. Additionally, we implement a finite element method (FEM) model to characterise the forces responsible for the cell deformation in our device. Finally, we report the classification of the two different cell groups based on their time-resolved roundness using a decision tree classifier. This approach introduces a modality for high-throughput assessments of cellular suspensions and may represent a viable application for the development of innovative diagnostic devices.

Aberdeen University Research

Directory of Open Access Journals

OPUS - University of Technology Sydney

University of Dundee Online Publications

Markov Models of Amino Acid Substitution to Study Proteins with Intrinsically Disordered Regions

Author: Anisimova Maria
Szalkowski Adam M.
Publication venue: Public Library of Science
Publication date: 01/01/2011

Intrinsically disordered proteins (IDPs) or proteins with disordered regions (IDRs) do not have a well-defined tertiary structure, but perform a multitude of functions, often relying on their native disorder to achieve binding flexibility by changing to alternative conformations. Intrinsic disorder is frequently found in all three kingdoms of life, and may occur in short stretches or span whole proteins. To date, most studies contrasting ordered and disordered proteins have focused on simple summary statistics. Here, we propose an evolutionary approach to study IDPs, and contrast patterns specific to ordered protein regions and the corresponding IDRs. Two empirical Markov models of amino acid substitutions were estimated, based on a large set of multiple sequence alignments with experimentally verified annotations of disordered regions from the DisProt database of IDPs. We applied new methods to detect differences in Markovian evolution and evolutionary rates between IDRs and the corresponding ordered protein regions. Further, we investigated the distribution of IDPs among functional categories, biochemical pathways and their preponderance to contain tandem repeats. Disorder prediction using a phylogenetic Hidden Markov Model based on our matrices showed a performance similar to other disorder predictors.

Public Library of Science (PLOS)

Repository for Publications and Research Data

Directory of Open Access Journals

PubMed Central

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

Aerospace Medicine and Biology, a continuing bibliography with indexes

This bibliography lists 365 reports, articles and other documents introduced into the NASA scientific and technical information system in October 1984.

NASA Technical Reports Server