5 research outputs found

PDNAsite: identification of DNA-binding site from protein sequence by incorporating spatial and sequence context

Author: A Bochkarev
AN Bullock
AP Bradley
B Liu
C Yan
CA BDavey
CC Chang
CO Pabo
EW Stawiski
H Tjong
HM Berman
IB Kuznetsov
J Wu
JA Swets
KL Griffith
L Wang
M Ptashne
M Radlinska
M Terribilini
MY Gutfreund
N Bhardwaj
NM Luscombe
P Ozbek
QW Dong
R Liu
R Xu
RD Kornberg
S Ahmad
S Hwang
SY Ho
T Li
W Kabsch
X Ma
X Zhao
Y Ofran
YC Chen
Z Yuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches use only the sequence context of the target residue for prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove redundancies in the feature space. Finally, a predictor (PDNAsite) was developed by integrating a support vector machine (SVM) classifier with ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining them yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that interactions between neighbouring binding sites are important for protein-DNA recognition and binding ability. A comparison between the proposed PDNAsite method and existing methods indicates that PDNAsite outperforms most of them and is a useful tool for DNA-binding site identification. A web server for the predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely available to the biological research community.
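The "sequence context" of a target residue can be illustrated with a sliding window over the protein sequence. The sketch below is only an illustration under stated assumptions: the window half-width, the `X` padding character, the one-hot encoding, and the function names `sequence_context` and `one_hot_window` are hypothetical choices, not details taken from the abstract, and the spatial context (which requires 3D coordinates) is not covered.

```python
# Illustrative sketch only: window size, padding and one-hot encoding are
# assumptions, not the feature set described by the PDNAsite authors.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_context(seq, i, half_width=2, pad="X"):
    """Return the residues in a window centred on position i, padded at both ends."""
    padded = pad * half_width + seq + pad * half_width
    return padded[i : i + 2 * half_width + 1]


def one_hot_window(window):
    """Concatenate one 20-dim one-hot vector per residue; padding maps to all zeros."""
    vec = []
    for aa in window:
        vec.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return vec


if __name__ == "__main__":
    seq = "MKTAY"  # toy sequence, not from any dataset
    for i in range(len(seq)):
        print(i, sequence_context(seq, i))
```

Each residue then yields one fixed-length feature vector, which is the kind of input a per-residue classifier such as an SVM expects.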

The Hong Kong Polytechnic University Pao Yue-kong Library

Crossref

PolyU Institutional Repository

PubMed Central

Aston Publications Explorer

Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory

Author: Baozhong Zhu
Haipeng Zhao
Hongjie Wu
Tengsheng Jiang
Zhiming Cui
Publication venue: AIMS Press
Publication date: 01/01/2024

DNA-protein binding is crucial for the normal development and function of organisms. Accurately identifying DNA-protein binding sites matters for disease prevention and for the development of innovative approaches to disease treatment. In the present study, we introduce a precise and robust identifier for DNA-protein binding residues. For protein representation, we combine the evolutionary information of the protein, represented by its position-specific scoring matrix, with the spatial information of the protein's secondary structure, enriching the overall informational content. This approach first employs a combination of Bi-directional Long Short-Term Memory and a Transformer encoder to jointly extract the interdependencies among residues within the protein sequence. Subsequently, convolutional operations are applied to the resulting feature matrix to capture local features of the residues. Experimental results on the benchmark dataset demonstrate that our method is competitive with contemporary classifiers. Specifically, our method achieved a Matthews correlation coefficient (MCC) of 0.349, specificity (SP) of 96.50%, sensitivity (SN) of 44.03% and accuracy (ACC) of 94.59% on the PDNA-41 dataset.

Directory of Open Access Journals

Probabilistic Models to Detect Important Sites in Proteins

Author: Dang Truong Khanh Linh
Publication date: 24/09/2020

Georg-August-University Göttingen

Classification of Tuberculosis DNA Based on K-Mer Using Support Vector Machine (SVM) and Variable Neighborhood Search (VNS)

Author: Anshori Mochammad
Dr. Eng. Ahmad Afif Supianto, S.Si., M.Kom.
Wayan Firdaus Mahmudy, S.Si., M.T., Ph.D.
Publication date: 08/08/2018

Tuberculosis is a disease caused by Mycobacterium tuberculosis and is one of the top 10 causes of death worldwide. More accurate detection is therefore needed so that appropriate treatment can be given. Detection sometimes fails because the disease resembles other lung diseases. This study applies machine learning algorithms to detect tuberculosis using DNA data, since all organisms have a DNA structure. The method used is a support vector machine (SVM) optimized with variable neighborhood search (VNS). SVM is used for classification and VNS is used to optimize the SVM parameters. SVM was chosen because it generalizes well. Before being used as input to the SVM, the DNA data must first be preprocessed using k-mers to extract DNA substrings, which are then converted into numeric data; dimensionality reduction is then applied because of the large number of features. SVM performance depends on choosing the right parameters, so it is optimized with VNS; the VNS used is a modified variant, nested RVNS. The best k-mer in this study has k = 5. The final results after optimization are accuracy = 0.995708, precision = 0.995765, recall = 0.995708, F-measure = 0.995557, and MCC = 0.992659. This accuracy is better than the value before optimization, 0.927039. Nested RVNS runs 2.5 times faster than basic VNS in searching for the optimal SVM parameters.
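The k-mer preprocessing step described above (extract overlapping DNA substrings, then convert them to numeric data) can be sketched as a count vector over all possible k-mers. This is a minimal illustration under assumptions: the function names `kmer_counts` and `kmer_vector`, the `ACGT` alphabet and the fixed lexicographic vocabulary are hypothetical, and the study's actual encoding and dimensionality reduction are not specified in the abstract.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))


def kmer_vector(seq, k, alphabet="ACGT"):
    """Map a sequence to a fixed-length count vector over all |alphabet|**k k-mers."""
    counts = kmer_counts(seq, k)
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]  # lexicographic order
    return [counts.get(kmer, 0) for kmer in vocab]


if __name__ == "__main__":
    toy = "ACGTAC"  # toy sequence, not real tuberculosis data
    print(kmer_counts(toy, 2))
    print(len(kmer_vector(toy, 5)))  # number of features for k = 5
```

With k = 5 the vector already has 4^5 = 1024 entries, which is consistent with the abstract's remark that the large feature count calls for dimensionality reduction.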

bkg

Information theoretical approaches for the identification of potentially cooperating transcription factors

Author: Meckbach Cornelia
Publication date: 21/06/2019

Georg-August-University Göttingen