Scalable Similarity Search for Molecular Descriptors

Author: A Leach
AM Bender
B Chen
D Vida
J Chen
M Keiser
M Kotera
R Nasr
R Sawada
R Todeschini
TG Kristensen
Publication date: 09/08/2017

Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While efficient indexing structures exist for searching databases of binary vectors, solutions for more general integer vectors are in their infancy. In this paper we present a time- and space-efficient index for the problem, which we call the succinct intervals-splitting tree algorithm for molecular descriptors (SITAd). Our approach extends efficient methods for binary-vector databases and uses ideas from succinct data structures. Our experiments, on a large database of over 40 million compounds, show that SITAd significantly outperforms alternative approaches in practice.
Comment: To appear in the Proceedings of SISAP'1
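
For orientation, the search problem described in the abstract can be sketched as a brute-force baseline. This is not SITAd itself: the abstract does not name the similarity measure or describe the index, so the sketch assumes generalized Jaccard similarity (sum of element-wise minima over sum of element-wise maxima) and a plain linear scan. The function names `generalized_jaccard` and `linear_scan` are hypothetical.

```python
from typing import List, Sequence, Tuple


def generalized_jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Similarity of two non-negative integer vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical.
    return 1.0 if den == 0 else num / den


def linear_scan(
    database: List[Sequence[int]], query: Sequence[int], threshold: float
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for every descriptor at or above threshold.

    A naive O(n * d) baseline; an index such as the one the abstract
    describes would aim to avoid touching every vector.
    """
    hits = []
    for idx, vec in enumerate(database):
        sim = generalized_jaccard(vec, query)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits


# Toy descriptors, invented for illustration only.
database = [[1, 2, 0], [0, 0, 3], [1, 2, 1]]
hits = linear_scan(database, query=[1, 2, 0], threshold=0.7)
```

With these toy vectors, the first and third descriptors pass the 0.7 threshold and the second does not.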

arXiv.org e-Print Archive

Crossref

DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

Author: Keum Jongsoo
Lee Ingoo
Nam Hojung
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 05/11/2018

Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to be insufficiently informative to predict DTIs accurately. Thus, in this study, we employ a convolutional neural network (CNN) on raw protein sequences to capture local residue patterns participating in DTIs. With a CNN on protein sequences, our model performs better than previous protein descriptor-based models. In addition, our model performs better than the previous deep learning model for massive prediction of DTIs. By examining the pooled convolution results, we found that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches.
Comment: 26 pages, 7 figures
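
The idea of convolving over a raw protein sequence and inspecting the pooled result can be illustrated with a minimal sketch. It is not the DeepConv-DTI architecture: the abstract gives no layer sizes or training details, so the sketch assumes a one-hot encoding over the 20 standard amino acids, a single hand-set filter instead of learned weights, and global max pooling. The names `one_hot` and `conv1d_max_pool` and the example sequence are hypothetical.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def one_hot(seq: str) -> List[List[float]]:
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == ch else 0.0 for aa in AMINO_ACIDS] for ch in seq]


def conv1d_max_pool(
    encoded: List[List[float]], kernel: List[List[float]]
) -> Tuple[float, int]:
    """Slide one filter along the sequence, then apply global max pooling.

    Returns the pooled activation and the window start that produced it,
    which is the position one would inspect for a local residue pattern.
    """
    width = len(kernel)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the filter")
    best_score, best_pos = float("-inf"), -1
    for start in range(len(encoded) - width + 1):
        score = sum(
            kernel[k][j] * encoded[start + k][j]
            for k in range(width)
            for j in range(len(AMINO_ACIDS))
        )
        if score > best_score:
            best_score, best_pos = score, start
    return best_score, best_pos


# Hand-set filter that responds to the tripeptide "GKS" (illustrative only;
# a trained model would learn its filter weights from data).
kernel = one_hot("GKS")
score, position = conv1d_max_pool(one_hot("MAGKSTL"), kernel)
```

Here the pooled activation peaks at the window starting at index 2, where "GKS" occurs. Reading off that position is a toy analogue of examining pooled convolution results to locate candidate binding regions.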

arXiv.org e-Print Archive

Directory of Open Access Journals