
    Scalable Similarity Search for Molecular Descriptors

    Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While efficient indexing structures exist for searching databases of binary vectors, solutions for more general integer vectors are still in their infancy. In this paper, we present a time- and space-efficient index for this problem, which we call the succinct intervals-splitting tree algorithm for molecular descriptors (SITAd). Our approach extends efficient methods for binary-vector databases and uses ideas from succinct data structures. Our experiments on a large database of over 40 million compounds show that SITAd significantly outperforms alternative approaches in practice. Comment: To appear in the Proceedings of SISAP'1
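    The abstract does not state the similarity measure or describe SITAd's internals. Purely as a point of reference, the sketch below assumes the generalized Jaccard (min/max) similarity, one common choice for integer vectors, and implements a brute-force baseline with a simple L1-norm filter. It is not SITAd. Because the sum of element-wise minima is at most min(|x|, |y|) and the sum of element-wise maxima is at least max(|x|, |y|), a similarity of at least eps forces eps·|q| ≤ |y| ≤ |q|/eps. Database vectors outside that norm range can therefore be skipped without computing the similarity. All function names are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def generalized_jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Assumed similarity: sum of element-wise minima over sum of element-wise maxima."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return 1.0 if den == 0 else num / den  # two all-zero vectors count as identical


def build_index(db: List[Sequence[int]]) -> List[Tuple[int, int]]:
    """Hypothetical index: (L1 norm, id) pairs sorted by norm, used only for pruning."""
    return sorted((sum(v), i) for i, v in enumerate(db))


def search(db: List[Sequence[int]], index: List[Tuple[int, int]],
           query: Sequence[int], eps: float) -> List[int]:
    """Return ids of database vectors with similarity >= eps (0 < eps <= 1)."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must be in (0, 1]")
    q = sum(query)
    lo, hi = eps * q, q / eps  # norm range implied by the min/max bound
    start = bisect.bisect_left(index, (lo, -1))  # first entry with norm >= lo
    hits = []
    for norm, i in index[start:]:
        if norm > hi:
            break  # index is sorted by norm, so no later entry can qualify
        if generalized_jaccard(query, db[i]) >= eps:
            hits.append(i)
    return sorted(hits)
```

For example, with `db = [[1, 0, 2], [1, 1, 2], [0, 3, 0], [2, 0, 4]]`, the query `[1, 0, 2]` with `eps = 0.6` returns `[0, 1]`. The vector `[2, 0, 4]` is pruned by its norm alone (6 > 3 / 0.6). A real index over 40 million compounds would need a far more compact structure than this sorted list.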

    DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

    Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico DTI prediction approaches. Several computational studies have shown that conventional protein descriptors are not informative enough for accurate DTI prediction. In this study, we therefore apply a convolutional neural network (CNN) to raw protein sequences to capture local residue patterns involved in DTIs. Using the CNN on protein sequences, our model outperforms previous protein descriptor-based models. It also outperforms the previous deep learning model for large-scale DTI prediction. By examining the pooled convolution results, we found that our model can detect protein binding sites relevant to DTIs. In conclusion, by detecting local residue patterns in target proteins, our model successfully enriches the protein features derived from a raw protein sequence, yielding better predictions than previous approaches. Comment: 26 pages, 7 figures
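    To make the idea of "convolution on protein sequences" and "pooled convolution results" concrete, the toy sketch below slides one filter over a sequence and takes a global max pool, whose argmax points at the window that activated the filter most strongly. This is not the DeepConv-DTI architecture. One-hot encoding, the single hand-set filter, and the motif-matching kernel are assumptions made for illustration, whereas the real model learns its filters from data. All names are hypothetical.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[List[float]]:
    """Encode each residue as a 20-dim one-hot row; unknown residues stay all zero."""
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def motif_kernel(motif: str) -> List[List[float]]:
    """Hypothetical hand-set filter that scores 1 per position matching `motif`."""
    return one_hot(motif)


def conv1d_relu(x: List[List[float]], kernel: List[List[float]],
                bias: float = 0.0) -> List[float]:
    """Valid 1D convolution (cross-correlation) of one filter along the sequence, then ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = bias
        for offset in range(k):
            for c, w in enumerate(kernel[offset]):
                s += w * x[start + offset][c]
        out.append(max(0.0, s))
    return out


def global_max_pool(acts: List[float]) -> Tuple[float, int]:
    """Return the maximum activation and the start position of its window."""
    best = max(range(len(acts)), key=lambda i: acts[i])
    return acts[best], best
```

For the sequence `"MAGHDKLL"` and a filter built from `"HDK"`, the activations are `[0.0, 0.0, 0.0, 3.0, 0.0, 0.0]` and the max pool returns `(3.0, 3)`, locating the motif at position 3. Inspecting which windows drive the pooled maxima is, in spirit, how pooled convolution outputs can hint at candidate binding regions.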