Search CORE

1,142 research outputs found

Exploring the potential of 3D Zernike descriptors and SVM for protein\u2013protein interface prediction

Author: Daberdaku Sebastian
Ferrari Carlo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

Abstract Background The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. Results In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). Conclusions The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Padova

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

Directory of Open Access Journals

eScholarship - University of California

Applications of Support Vector Machines as a Robust tool in High Throughput Virtual Screening

Author: Kulkarni B. D.
Tambe S. S.
Vyas Renu
Publication venue: International Journal for Computational Biology (IJCB)
Publication date: 21/04/2014
Field of study

Chemical space is enormously huge but not all of it is pertinent for the drug designing. Virtual screening methods act as knowledge-based filters to discover the coveted novel lead molecules possessing desired pharmacological properties. Support Vector Machines (SVM) is a reliable virtual screening tool for prioritizing molecules with the required biological activity and minimum toxicity. It has to its credit inherent advantages such as support for noisy data mainly coming from varied high-throughput biological assays, high sensitivity, specificity, prediction accuracy and reduction in false positives. SVM-based classification methods can efficiently discriminate inhibitors from non-inhibitors, actives from inactives, toxic from non-toxic and promiscuous from non-promiscuous molecules. As the principles of drug design are also applicable for agrochemicals, SVM methods are being applied for virtual screening for pesticides too. The current review discusses the basic kernels and models used for binary discrimination and also features used for developing SVM-based scoring functions, which will enhance our understanding of molecular interactions. SVM modeling has also been compared by many researchers with other statistical methods such as Artificial Neural Networks, k-nearest neighbour (kNN), decision trees, partial least squares, etc. Such studies have also been discussed in this review. Moreover, a case study involving the use of SVM method for screening molecules for cancer therapy has been carried out and the preliminary results presented here indicate that the SVM is an excellent classifier for screening the molecules

International Journal for Computational Biology (IJCB)

Prediction of DNA-binding propensity of proteins by the ball-histogram method using automatic template search

Author: A Bhattacharyya
A Szabóová
A Szilágyi
Andrea Szabóová
CJC Burges
CO Pabo
DH Ohlendorf
DW Hosmer
EW Stawiski
Filip Železný
G Nimrod
J Moreland
Jakub Tolar
L Breiman
N Bhardwaj
N Lavrač
Ondřej Kuželka
R Caruana
R Sathyapriya
S Ahmad
S Jones
S Jones
T Cathomen
T Hastie
Y Mandel-Gutfreund
Y Tsuchiya
Publication venue: BioMed Central
Publication date: 01/01/2012
Field of study

We contribute a novel, ball-histogram approach to DNA-binding propensity prediction of proteins. Unlike state-of-the-art methods based on constructing an ad-hoc set of features describing physicochemical properties of the proteins, the ball-histogram technique enables a systematic, Monte-Carlo exploration of the spatial distribution of amino acids complying with automatically selected properties. This exploration yields a model for the prediction of DNA binding propensity. We validate our method in prediction experiments, improving on state-of-the-art accuracies. Moreover, our method also provides interpretable features involving spatial distributions of selected amino acids

Springer - Publisher Connector

Directory of Open Access Journals

Identification of protein functions using a machine-learning approach based on sequence-derived properties

Author: Lee Bum Ju
Oh Hae Seok
Oh Young Joon
Ryu Keun Ho
Shin Moon Sun
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. Results A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. Conclusion We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new <it>PNPRD </it>features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.</p

Directory of Open Access Journals

Transcriptome-wide functional characterization reveals novel relationships among differentially expressed transcripts in developing soybean embryos

Author: Andrew Schneider
Delasa Aghamirzaie
Dhruv Batra
Eva Collakova
Lenwood S. Heath
Ruth Grene
Publication venue: Springer Nature
Publication date: 01/01/2015
Field of study

Sense and antisense transcripts and primers chosen for validation of RNA-Seq-based expression level changes. Sense and antisense transcripts are shown with the corresponding annotation, primer pairs used for qPCR, time points of differential expression, and notes on the presence of additional melt curve peaks. (PPTX 39 kb

Springer - Publisher Connector

FigShare