649 research outputs found

Prediction of aptamer-protein interacting pairs using an ensemble classifier in combination with various protein sequence attributes

Author: Chengjin Zhang
Lina Zhang
Qing Song
Rui Gao
Runtao Yang
Publication venue: Springer Nature
Publication date: 01/01/2016

The ranked feature list given by the Relief algorithm. Within the list, a feature with a smaller index is more important for aptamer-protein interacting pair prediction. This ranked list is used to establish the optimal feature set in the IFS procedure. (XLS 56.5 kb)
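
The way a ranked list feeds a feature-selection loop can be sketched as follows, assuming IFS here means incremental feature selection, where features are added one at a time in ranked order and the best-scoring prefix is kept. This is only an illustration, not the authors' implementation: the `score_subset` callback, the feature names, and the toy scores are hypothetical stand-ins for a cross-validated classifier score.

```python
from typing import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_subset: Callable[[Sequence[str]], float],
) -> tuple[list[str], float]:
    """Grow the feature set in ranked order and keep the best-scoring prefix."""
    best_subset: list[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        candidate = list(ranked_features[:k])  # top-k features of the ranking
        score = score_subset(candidate)
        if score > best_score:  # strict '>' keeps the smallest best prefix
            best_subset, best_score = candidate, score
    return best_subset, best_score


# Hypothetical ranking (most important first) and hypothetical scores per prefix length.
ranked = ["f3", "f1", "f7", "f2"]
toy_scores = {1: 0.70, 2: 0.78, 3: 0.81, 4: 0.79}

subset, score = incremental_feature_selection(
    ranked, lambda feats: toy_scores[len(feats)]
)
```

In practice `score_subset` would train and evaluate a classifier on the candidate features; the toy lookup table only keeps the sketch self-contained.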

Springer - Publisher Connector

FigShare

Playing hide and seek on the genomic playground: unveiling biological function from literature

Author: Van Landeghem Sofie
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2012

Ghent University Academic Bibliography

AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications

Author: Liu W
Ma C
Ormerod JT
Yang JYH
Yang P
Zomaya AY
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019

© 2018 IEEE. Class labels are required for supervised learning but may be corrupted or missing in various applications. In binary classification, for example, when only a subset of positive instances is labeled whereas the remaining are unlabeled, positive-unlabeled (PU) learning is required to model from both positive and unlabeled data. Similarly, when class labels are corrupted by mislabeled instances, methods are needed for learning in the presence of class label noise (LN). Here we propose adaptive sampling (AdaSampling), a framework for both PU learning and learning with class LN. By iteratively estimating the class mislabeling probability with an adaptive sampling procedure, the proposed method progressively reduces the risk of selecting mislabeled instances for model training and subsequently constructs highly generalizable models even when a large proportion of mislabeled instances is present in the data. We demonstrate the utilities of proposed methods using simulation and benchmark data, and compare them to alternative approaches that are commonly used for PU learning and/or learning with LN. We then introduce two novel bioinformatics applications where AdaSampling is used to: 1) identify kinase-substrates from mass spectrometry-based phosphoproteomics data and 2) predict transcription factor target genes by integrating various next-generation sequencing data
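
The iterative idea in this abstract can be illustrated with a heavily simplified sketch for the positive-unlabeled case. It is not the published AdaSampling code: the one-dimensional data, the centroid-based base classifier, the iteration count, and all function names are assumptions made for illustration. Each round samples presumed negatives from the unlabeled set with weights equal to the current estimate of being negative, refits the classifier, and updates those estimates.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def adasampling_pu(positives, unlabeled, n_iter=20, seed=0):
    """Toy adaptive-sampling loop for PU learning on 1-D features."""
    rng = random.Random(seed)
    mu_pos = sum(positives) / len(positives)
    # Start by treating every unlabeled point as a negative.
    prob_neg = [1.0] * len(unlabeled)
    prob_pos = [0.0] * len(unlabeled)
    for _ in range(n_iter):
        # Sample presumed negatives, favouring points that look negative so far.
        sampled = rng.choices(unlabeled, weights=prob_neg, k=len(positives))
        mu_neg = sum(sampled) / len(sampled)
        # Toy base classifier (an assumption): logistic score along the
        # direction between the two class centroids.
        midpoint = (mu_pos + mu_neg) / 2.0
        slope = mu_pos - mu_neg
        prob_pos = [sigmoid(slope * (x - midpoint)) for x in unlabeled]
        prob_neg = [1.0 - p for p in prob_pos]
    return prob_pos


# Hypothetical data: the first three unlabeled points are hidden positives.
labeled_pos = [1.8, 2.0, 2.2, 1.9, 2.1]
unlabeled = [2.0, 1.7, 2.3, -2.0, -1.8, -2.2, -1.9, -2.1]
scores = adasampling_pu(labeled_pos, unlabeled)
```

A real application would swap the centroid model for a proper classifier and, as the abstract notes, the same loop extends to label noise by estimating mislabeling probabilities for both classes.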

OPUS - University of Technology Sydney

IDENTIFYING MOLECULAR FUNCTIONS OF DYNEIN MOTOR PROTEINS USING EXTREME GRADIENT BOOSTING ALGORITHM WITH MACHINE LEARNING

Author: Ghulam Ali
Maher Zulfikar Ahmed
Saba Erum
Sikander Rahu
Talpur Dhani Bux
Talpur Mir Sajjad Hussain
Tunio Saima
Publication venue: Karakoram International University Gilgit, Pakistan
Publication date: 29/11/2022

Most cytoplasmic proteins and vesicles are actively transported primarily by dynein motor proteins, which are the cause of muscle contraction. Moreover, determining how dyneins are used in cells will rely on structural knowledge. Cytoskeletal motor proteins have different molecular roles and structures, and they belong to three superfamilies: dynamin, actin and myosin. A number of human diseases, such as Charcot-Charcot-Dystrophy and kidney disease, can be attributed to loss of function of specific molecular motor proteins. It is crucial to create a precise model to identify dynein motor proteins in order to aid scientists in understanding their molecular role and designing therapeutic targets based on their influence on human disease. Therefore, an accurate and efficient computational methodology is highly desired, especially one using cutting-edge machine learning methods. In this article, we propose a machine learning-based prediction method for the superfamily of cytoskeletal motor proteins, based on extreme gradient boosting (XGBoost). We obtain the initial feature set, All, by extracting protein features from the sequence and from evolutionary data of the amino acid residues, named BLOSUM62. Our XGBoost model achieves an accuracy score of 0.8676%, a precision score of 0.8768%, a sensitivity score of 0.760%, a specificity score of 0.9752% and an MCC score of 0.7536%. Our method has demonstrated substantial improvements on many of the evaluation parameters compared to other state-of-the-art methods. This study offers an effective model for the classification of dynein proteins and lays a foundation for further research to improve the efficiency of protein functional classification
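
For readers unfamiliar with the evaluation measures listed in this abstract, the sketch below shows how accuracy, precision, sensitivity, specificity and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The counts are hypothetical and unrelated to the study's data; the function name is also an assumption.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes each ratio's denominator is non-zero; MCC falls back to 0.0
    when its denominator vanishes.
    """
    total = tp + fp + tn + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

All of these measures except MCC lie between 0 and 1 (MCC lies between -1 and 1), so they are usually reported either as fractions or as percentages, not both.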

Journal of Mountain Area Research (Karakoram International University, Gilgit, Pakistan)

Rice_Phospho 1.0: a new rice-specific SVM predictor for protein phosphorylation sites

Author: A Palmeri
AH Gandomi
B Petersen
BR Chitteti
CR Ingrell
GK Agrawal
H He
H Nakagami
HD Huang
J Gao
J Gao
JC Obenauer
JH Kim
JL Heazlewood
K Chen
KC Chou
L Breiman
LM Iakoucheva
M Hall
M Sikic
MM Aziz
N Blom
N Blom
P Han
R Kumar
S Que
SW Chang
V Neduva
X Chen
XW Chen
XW Zhao
Y Ban
Y Ke
Y Xue
Y Xue
YZ Chen
Z Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/07/2015

Experimentally-determined or computationally-predicted protein phosphorylation sites for distinctive species are becoming increasingly common. In this paper, we compare the predictive performance of a novel classification algorithm with different encoding schemes to develop a rice-specific protein phosphorylation site predictor. Our results imply that the combination of Amino acid occurrence Frequency with Composition of K-Spaced Amino Acid Pairs (AF-CKSAAP) provides the best description of relevant sequence features that surround a phosphorylation site. A support vector machine (SVM) using AF-CKSAAP achieves the best performance in classifying rice protein phosphorylation sites when compared to the other algorithms. We have used SVM with AF-CKSAAP to construct a rice-specific protein phosphorylation site predictor, Rice-Phospho 1.0 (http://bioinformatics.fafu.edu.cn/rice-phospho1.0). We measure the Accuracy (ACC) and Matthews Correlation Coefficient (MCC) of Rice-Phospho 1.0 to be 82.0% and 0.64, significantly higher than those measures for other predictors such as Scansite, Musite, PlantPhos and PhosphoRice. Rice-Phospho 1.0 also successfully predicted the experimentally identified phosphorylation sites in LOC-Os03g51600.1, a protein sequence which did not appear in the training dataset. In summary, Rice-Phospho 1.0 outputs reliable predictions of protein phosphorylation sites in rice, and will serve as a useful tool to the community
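
The AF-CKSAAP encoding named in this abstract can be sketched roughly as below. This is one plausible reading, not the authors' code: the window sequence, the maximum gap `k_max`, the normalisation by the number of pairs, and the function names are all assumptions. Amino acid frequency (AF) contributes 20 values, and each gap k contributes 400 values, one per ordered residue pair separated by k positions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_frequency(seq: str) -> list[float]:
    """Fraction of each of the 20 standard residues in the window."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def cksaap(seq: str, k_max: int = 2) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Assumes len(seq) > k_max + 1 so that every gap has at least one pair.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        n_pairs = len(seq) - k - 1
        counts = dict.fromkeys(PAIRS, 0)
        for i in range(n_pairs):
            pair = seq[i] + seq[i + k + 1]  # residues separated by k positions
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / n_pairs for p in PAIRS)
    return features


# Hypothetical sequence window around a candidate site.
window = "MKRSPSAEL"
vector = amino_acid_frequency(window) + cksaap(window, k_max=2)
```

With `k_max=2` the combined vector has 20 + 3 × 400 = 1220 entries, which would then be passed to a classifier such as an SVM.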

University of Essex Research Repository

Crossref

PubMed Central

Selected Works in Bioinformatics

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021

This book consists of nine chapters covering a variety of bioinformatics subjects, ranging from database resources for protein allergens, unravelling genetic determinants of complex disorders, characterization and prediction of regulatory motifs, computational methods for identifying the best classifiers and key disease genes in large-scale transcriptomic and proteomic experiments, functional characterization of inherently unfolded proteins/regions, protein interaction networks and flexible protein-protein docking. The computational algorithms are in general presented in a way that is accessible to advanced undergraduate students, graduate students and researchers in molecular biology and genetics. The book should also serve as stepping stones for mathematicians, biostatisticians, and computational scientists to cross their academic boundaries into the dynamic and ever-expanding field of bioinformatics

Directory of Open Access Books (DOAB)

Computational approaches in high-throughput proteomics data analysis

Author: Lahesmaa-Korpinen Anna-Maria
Publication venue: 'University of Helsinki Libraries'
Publication date: 29/06/2012

Proteins are key components in biological systems as they mediate the signaling responsible for information processing in a cell and organism. In biomedical research, one goal is to elucidate the mechanisms of cellular signal transduction pathways to identify possible defects that cause disease. Advancements in technologies such as mass spectrometry and flow cytometry enable the measurement of multiple proteins from a system. Proteomics, or the large-scale study of proteins of a system, thus plays an important role in biomedical research. The analysis of all high-throughput proteomics data requires the use of advanced computational methods. Thus, the combination of bioinformatics and proteomics has become an important part in research of signal transduction pathways. The main objective in this study was to develop and apply computational methods for the preprocessing, analysis and interpretation of high-throughput proteomics data. The methods focused on data from tandem mass spectrometry and single cell flow cytometry, and integration of proteomics data with gene expression microarray data and information from various biological databases. Overall, the methods developed and applied in this study have led to new ways of management and preprocessing of proteomics data. Additionally, the available tools have successfully been used to help interpret biomedical data and to facilitate analysis of data that would have been cumbersome to do without the use of computational methods.

Proteins play an important role in biological systems because they coordinate various processes in cells and organisms. One goal of biomedical research is to shed light on cellular signaling pathways and the changes in their function associated with various diseases, so that such changes could be corrected. Proteomics is the large-scale study of proteins from a cell, tissue or organism. Proteomics methods such as mass spectrometry and flow cytometry are central methods in biomedical research, allowing multiple proteins to be measured simultaneously from a sample. Today's advanced proteomics measurement technologies produce large data sets and require computational methods for data analysis. Bioinformatics methods have therefore become an important part of proteomics analysis and signaling pathway research. The main objective of this study was to develop and apply efficient computational methods for the preprocessing, analysis and interpretation of large-scale proteomics data. In this study, a preprocessing method for mass spectrometry data and an automated analysis method for flow cytometry data were developed. Protein-level information was combined with measurements of gene transcription levels and with existing information extracted from biological databases. The dissertation shows that computational methods play a central role in the management, preprocessing and analysis of proteomics data. The analysis methods developed in the study considerably advance the broader use and understanding of biomedical information

Helsingin yliopiston digitaalinen arkisto

Prediction of DNA-Binding Proteins and their Binding Sites

Author: Pokhrel Pujan
Publication venue: ScholarWorks@UNO
Publication date: 01/05/2018

DNA-binding proteins play an important role in various essential biological processes such as DNA replication, recombination, repair, gene transcription, and expression. The identification of DNA-binding proteins and the residues involved in the contacts is important for understanding the DNA-binding mechanism in proteins. Moreover, it has been reported in the literature that mutations of some DNA-binding residues on proteins are associated with some diseases. The identification of these proteins and their binding mechanism generally requires experimental techniques, which makes large-scale study extremely difficult. Thus, the prediction of DNA-binding proteins and their binding sites from sequences alone is one of the most challenging problems in the field of genome annotation. Since the start of the human genome project, many attempts have been made to solve the problem with different approaches, but the accuracy of these methods is still not sufficient for large-scale annotation of proteins. Rather than relying solely on the existing machine learning techniques, I sought to combine them using a novel “stacking technique” and used problem-specific architectures to solve the problem with better accuracy than the existing methods. This thesis presents a possible solution to the DNA-binding proteins prediction problem which performs better than the state-of-the-art approaches

University of New Orleans