DTi2Vec: Drug-target interaction prediction using network embedding and ensemble learning.

Author: Albaradei Somayah
Bajic Vladimir B
Essack Magbubah
Gao Xin
Gojobori Takashi
Olayan Rawan S
Thafar Maha A
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/09/2021

Drug-target interaction (DTI) prediction is a crucial step in drug discovery and repositioning because, done right, it reduces experimental validation costs. Thus, developing in-silico methods to predict potential DTIs has become a competitive research niche, with one of its main focuses being improving prediction accuracy. Using machine learning (ML) models for this task, specifically network-based approaches, is effective and has shown great advantages over other computational methods. However, ML model development involves upstream hand-crafted feature extraction and other processes that impact prediction accuracy. Thus, network-based representation learning techniques that provide automated feature extraction, combined with traditional ML classifiers handling the downstream link prediction task, may be a better-suited paradigm. Here, we present such a method, DTi2Vec, which identifies DTIs using network representation learning and ensemble learning techniques. DTi2Vec constructs a heterogeneous network and then automatically generates features for each drug and target using a node embedding technique. DTi2Vec demonstrated its ability in drug-target link prediction compared to several state-of-the-art network-based methods, using four benchmark datasets and large-scale data compiled from DrugBank. DTi2Vec showed a statistically significant increase in prediction performance in terms of AUPR. We verified the novel predicted DTIs using several databases and the scientific literature. DTi2Vec is a simple yet effective method that provides high DTI prediction performance while being scalable and computationally efficient, making it a powerful drug repositioning tool.
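
As a rough illustration of the feature-generation step described in this abstract, the sketch below turns per-node embeddings into a single feature vector for a drug-target pair. The abstract does not say how DTi2Vec combines the two vectors; the Hadamard product and concatenation used here are common choices in link prediction and are assumptions, as are the toy embedding values and the names `pair_features`, `drug_emb`, and `target_emb`.

```python
from typing import Dict, List


def pair_features(drug_vec: List[float], target_vec: List[float],
                  op: str = "hadamard") -> List[float]:
    """Combine two node embeddings into one drug-target pair feature vector."""
    if op == "concat":
        # Concatenation keeps both vectors intact; dimensions may differ.
        return list(drug_vec) + list(target_vec)
    if len(drug_vec) != len(target_vec):
        raise ValueError("embeddings must share a dimension")
    if op == "hadamard":
        # Element-wise product of the two embeddings.
        return [d * t for d, t in zip(drug_vec, target_vec)]
    raise ValueError(f"unknown operator: {op}")


# Toy 3-dimensional embeddings (hypothetical values, not from the paper).
drug_emb: Dict[str, List[float]] = {"D1": [0.5, -1.0, 2.0]}
target_emb: Dict[str, List[float]] = {"T1": [2.0, 0.5, 0.25]}

x_hadamard = pair_features(drug_emb["D1"], target_emb["T1"])
x_concat = pair_features(drug_emb["D1"], target_emb["T1"], op="concat")
```

Vectors like these, labelled 1 for known interactions and 0 for sampled non-interactions, could then be passed to any standard classifier for the downstream link prediction task.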

The Jackson Laboratory: The Mouseion at the JAXlibrary

Directory of Open Access Journals

PubMed Central

Deciphering the Preference and Predicting the Viability of Circular Permutations in Proteins

Author: Darren R. Flower
Jenn-Kang Hwang
Li-Fen Wang
Ping-Chiang Lyu
Tian Dai
Wei-Cheng Lo
Yen-Yi Liu
Publication venue: Public Library of Science
Publication date: 16/02/2012

Circular permutation (CP) refers to situations in which the termini of a protein are relocated to other positions in the structure. CP occurs naturally and has been artificially created to study protein function, stability and folding. Recently, CP has been increasingly applied to engineer enzyme structure and function, and to create bifunctional fusion proteins unachievable by tandem fusion. CP is a complicated and expensive technique. An intrinsic difficulty in its application lies in the fact that not every position in a protein is amenable to creating a viable permutant. To examine the preferences of CP and develop CP viability prediction methods, we carried out comprehensive analyses of the sequence, structural, and dynamical properties of known CP sites using a variety of statistics and simulation methods, such as bootstrap aggregating, permutation tests and molecular dynamics simulations. CP particularly favors Gly, Pro, Asp and Asn. Positions preferred by CP lie within coils, loops, turns, and at residues that are exposed to solvent, weakly hydrogen-bonded, environmentally unpacked, or flexible. Disfavored positions include Cys, bulky hydrophobic residues, and residues located within helices or near the protein's core. These results fostered the development of an effective viable CP site prediction system, which combined four machine learning methods, namely artificial neural networks, the support vector machine, a random forest, and a hierarchical feature integration procedure developed in this work. As assessed by using the dihydrofolate reductase dataset as the independent evaluation dataset, this prediction system achieved an AUC of 0.9. Large-scale predictions have been performed for nine thousand representative protein structures; several new potential applications of CP were thus identified. Many unreported preferences of CP are revealed in this study. The developed system is the best CP viability prediction method currently available. This work will facilitate the application of CP in research and biotechnology.
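
For readers unfamiliar with the AUC figure quoted in this abstract: assuming it denotes the area under the ROC curve (its usual meaning), it equals the probability that a randomly chosen viable site is scored above a randomly chosen non-viable one, with ties counted as one half. A minimal sketch of that computation follows; the labels and scores are made-up toy values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise positive/negative comparisons."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data: 1 = viable CP site, 0 = non-viable (hypothetical values).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; rank-based formulas give the same value more cheaply on large datasets.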

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Radiogenomics Framework for Associating Medical Image Features with Tumour Genetic Characteristics

Author: Xia Tian
Publication venue: Journal of the Faculty of Engineering and Architecture of Gazi University
Publication date: 01/01/2023

Significant progress has been made in the understanding of human cancers at the molecular genetics level, providing new insights into their underlying pathophysiology. This progress has enabled the subclassification of the disease and the development of targeted therapies that address specific biological pathways. However, obtaining genetic information remains invasive and costly. Medical imaging is a non-invasive technique that captures important visual characteristics (i.e., image features) of abnormalities and plays an important role in routine clinical practice. Advancements in computerised medical image analysis have enabled quantitative approaches to extract image features that can reflect tumour genetic characteristics, leading to the emergence of ‘radiogenomics’. Radiogenomics investigates the relationships between medical imaging features and tumour molecular characteristics, and enables the derivation of imaging surrogates (radiogenomics features) for genetic biomarkers that can provide alternative approaches to non-invasive and accurate cancer diagnosis. This thesis presents a new framework that combines several novel methods for radiogenomics analysis that associates medical image features with tumour genetic characteristics, with the main objectives being: i) a comprehensive characterisation of tumour image features that reflect underlying genetic information; ii) a method that identifies radiogenomics features encoding common pathophysiological information across different diseases, overcoming the dependence on large annotated datasets; and iii) a method that quantifies radiogenomics features from multi-modal imaging data and accounts for unique information encoded in tumour heterogeneity sub-regions. The presented radiogenomics methods advance radiogenomics analysis and contribute to improving research in computerised medical image analysis.

Sydney eScholarship

Merging Ligand-Based and Structure-Based Methods in Drug Discovery: An Overview of Combined Virtual Screening Approaches

Author: Gibert Enric
Herrero Enric
Luque Garriga F. Xavier
López Manel
Vázquez Javier
Publication venue: MDPI AG
Publication date: 22/10/2020

Virtual screening (VS) is an outstanding cornerstone in the drug discovery pipeline. A variety of computational approaches, which are generally classified as ligand-based (LB) and structure-based (SB) techniques, exploit key structural and physicochemical properties of ligands and targets to enable the screening of virtual libraries in the search for active compounds. Though LB and SB methods have found widespread application in the discovery of novel drug-like candidates, their complementary natures have stimulated continued efforts toward the development of hybrid strategies that combine LB and SB techniques, integrating them in a holistic computational framework that exploits the available information of both ligand and target to enhance the success of drug discovery projects. In this review, we analyze the main strategies and concepts that have emerged in recent years for defining hybrid LB + SB computational schemes in VS studies. In particular, attention is focused on the combination of molecular similarity and docking, illustrated with selected applications taken from the literature.
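
To make the idea of combining molecular similarity with docking concrete, the sketch below ranks a tiny compound library by Tanimoto similarity to a query (the ligand-based signal) and by docking score (the structure-based signal), then merges the two rankings by summing ranks. Rank-sum fusion is an assumption chosen for illustration, not a scheme taken from the review; the fingerprints, docking scores, compound names, and the helpers `tanimoto` and `ranks` are all hypothetical.

```python
from typing import Dict, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def ranks(values: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Map each compound to its 1-based rank (1 = best)."""
    ordered = sorted(values, key=lambda name: values[name],
                     reverse=higher_is_better)
    return {name: i + 1 for i, name in enumerate(ordered)}


# Hypothetical on-bit fingerprints for a query ligand and three compounds.
query_fp = {1, 4, 7, 9}
library_fp = {
    "cpd_a": {1, 4, 7, 8},
    "cpd_b": {2, 3, 9},
    "cpd_c": {1, 4, 7, 9, 12},
}
# Hypothetical docking scores; more negative is assumed to be better.
docking_score = {"cpd_a": -7.5, "cpd_b": -6.0, "cpd_c": -9.1}

similarity = {name: tanimoto(query_fp, fp) for name, fp in library_fp.items()}
lb_rank = ranks(similarity, higher_is_better=True)
sb_rank = ranks(docking_score, higher_is_better=False)

# Consensus ordering: smallest combined rank first.
consensus = sorted(library_fp, key=lambda name: lb_rank[name] + sb_rank[name])
```

Working on ranks rather than raw values sidesteps the fact that similarity coefficients and docking scores live on unrelated scales.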

Diposit Digital de la Universitat de Barcelona

TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation

Author: Bin Lin
Hao Li
Junwu Zhang
Li Yuan
Liuzhenghao Lv
Yonghong Tian
Yu-Chian Chen Calvin
Zongying Lin
Publication date: 26/02/2024

Designing protein sequences with specific biological functions and structural stability is crucial in biology and chemistry. Generative models have already demonstrated their capabilities for reliable protein design. However, previous models are limited to the unconditional generation of protein sequences and lack the controllable generation ability that is vital to biological tasks. In this work, we propose TaxDiff, a taxonomic-guided diffusion model for controllable protein sequence generation that combines biological species information with the generative capabilities of diffusion models to generate structurally stable proteins within the sequence space. Specifically, taxonomic control information is inserted into each layer of the transformer block to achieve fine-grained control. The combination of global and local attention ensures the sequence consistency and structural foldability of taxonomic-specific proteins. Extensive experiments demonstrate that TaxDiff consistently achieves better performance on multiple protein sequence generation benchmarks in both taxonomic-guided controllable generation and unconditional generation. Remarkably, the sequences generated by TaxDiff even surpass those produced by direct-structure-generation models in terms of confidence based on predicted structures, and require only a quarter of the time needed by diffusion-based models. The code for generating proteins and training new versions of TaxDiff is available at: https://github.com/Linzy19/TaxDiff

arXiv.org e-Print Archive