12,890 research outputs found
A Multiple Classifier System Identifies Novel Cannabinoid CB2 Receptor Ligands
Drugs have become an essential part of our lives due to their ability to improve people's
health and quality of life. However, for many diseases approved drugs are not yet available,
or existing drugs have undesirable side effects, driving the pharmaceutical industry to
discover new drugs and active compounds. Drug development is an expensive
process, which typically starts with the detection of candidate molecules (screening) for an
identified protein target. To this end, the use of high-performance screening techniques has
become critical to mitigating the high costs. Consequently, the popularity of
computer-based screening (often called virtual screening or in silico screening) has rapidly
increased during the last decade. A wide variety of Machine Learning (ML) techniques has
been used in conjunction with chemical structure and physicochemical properties for
screening purposes including (i) simple classifiers, (ii) ensemble methods, and more recently
(iii) Multiple Classifier Systems (MCS). In this work, we apply an MCS for virtual screening
(D2-MCS) using circular fingerprints. We applied our technique to a dataset of cannabinoid
CB2 ligands obtained from the ChEMBL database. The HTS collection of Enamine
(1,834,362 compounds) was virtually screened to identify 48,432 potential active molecules
using D2-MCS. This list was subsequently clustered based on circular fingerprints, and from
each cluster the most active compound was retained. From these, the top 60 were kept,
and 21 novel compounds were purchased. Experimental validation confirmed six highly
active hits (>50% displacement at 10 μM and subsequent Ki determination) and an
additional five medium active hits (>25% displacement at 10 μM). D2-MCS hence provided a
hit rate of 29% for highly active compounds and an overall hit rate of 52%.
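The post-screening triage described above (cluster the predicted actives by circular-fingerprint similarity, keep the most active compound per cluster) can be sketched as follows. This is a simplified illustration, not the authors' code: fingerprints are modeled as sets of "on" bit indices, the greedy Butina-style clustering and its threshold are assumptions, and the compound names and activity scores are invented stand-ins for D2-MCS predictions.

```python
# Hypothetical sketch: cluster predicted actives by fingerprint
# similarity (greedy, Butina-style) and keep the highest-scoring
# compound of each cluster. Fingerprints are sets of "on" bit indices.

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two bit-index sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def cluster_and_pick(compounds, threshold=0.6):
    """Greedy clustering: each compound joins the first cluster whose
    representative is within `threshold`; return the top-scoring
    member of each cluster."""
    clusters = []  # each cluster: list of (name, fp, score)
    for name, fp, score in sorted(compounds, key=lambda c: -c[2]):
        for cluster in clusters:
            if tanimoto(fp, cluster[0][1]) >= threshold:
                cluster.append((name, fp, score))
                break
        else:
            clusters.append([(name, fp, score)])
    # members arrive in descending score order, so the first element
    # of each cluster is its most active compound
    return [cluster[0] for cluster in clusters]

toy = [
    ("cpd_A", {1, 2, 3, 4}, 0.91),
    ("cpd_B", {1, 2, 3, 5}, 0.85),  # similar to cpd_A -> same cluster
    ("cpd_C", {7, 8, 9}, 0.72),     # dissimilar -> new cluster
]
picks = cluster_and_pick(toy)
print([name for name, _, _ in picks])  # ['cpd_A', 'cpd_C']
```

In practice this step would run on real circular fingerprints (e.g. Morgan bit vectors) for the 48,432 predicted actives, with the similarity threshold tuned to the fingerprint radius and length.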
DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
Identification of drug-target interactions (DTIs) plays a key role in drug
discovery. The high cost and labor-intensive nature of in vitro and in vivo
experiments have highlighted the importance of in silico-based DTI prediction
approaches. In several computational models, conventional protein descriptors
have been shown to be insufficiently informative for accurate DTI prediction. Thus, in this
study, we employ a convolutional neural network (CNN) on raw protein sequences
to capture local residue patterns participating in DTIs. With CNN on protein
sequences, our model performs better than previous protein descriptor-based
models. In addition, our model performs better than the previous deep learning
model for massive prediction of DTIs. By examining the pooled convolution
results, we found that our model can detect binding sites of proteins for DTIs.
In conclusion, our prediction model for detecting local residue patterns of
target proteins successfully enriches the protein features of a raw protein
sequence, yielding better prediction results than previous approaches.
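The central idea (a 1-D convolution scanning a raw protein sequence for local residue patterns, followed by pooling) can be illustrated with a minimal pure-Python sketch. This is not the DeepConv-DTI implementation: real models learn many filters jointly with an interaction classifier, whereas here a single hand-set filter stands in for a learned one, and the toy sequences and motif are invented.

```python
# Minimal illustration: one-hot encode a protein sequence, slide a
# window-sized filter over it (1-D convolution), then max-pool. A high
# pooled value means the filter's local residue pattern occurs somewhere.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq):
    """Encode a protein sequence as a list of 20-dim one-hot vectors."""
    vecs = []
    for aa in seq:
        v = [0.0] * 20
        v[AA_INDEX[aa]] = 1.0
        vecs.append(v)
    return vecs

def conv1d_max(seq, kernel):
    """Convolve the kernel across the sequence and max-pool the scores."""
    x = one_hot(seq)
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = sum(kernel[j][c] * x[i + j][c]
                for j in range(k) for c in range(20))
        scores.append(s)
    return max(scores)

def motif_kernel(motif):
    """Hand-set filter that responds maximally to an exact motif."""
    kern = []
    for aa in motif:
        row = [0.0] * 20
        row[AA_INDEX[aa]] = 1.0
        kern.append(row)
    return kern

k = motif_kernel("GST")          # toy filter for the tripeptide G-S-T
print(conv1d_max("MKGSTLL", k))  # 3.0: motif present
print(conv1d_max("MKLLAAL", k))  # 0.0: motif absent
```

A trained model replaces the hand-set filter with many learned ones, so the pooled activations become a feature vector over local residue patterns rather than a single match score.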
Systems analysis of host-parasite interactions.
Parasitic diseases caused by protozoan pathogens lead to hundreds of thousands of deaths per year in addition to substantial suffering and socioeconomic decline for millions of people worldwide. The lack of effective vaccines coupled with the widespread emergence of drug-resistant parasites necessitates that the research community take an active role in understanding host-parasite infection biology in order to develop improved therapeutics. Recent advances in next-generation sequencing and the rapid development of publicly accessible genomic databases for many human pathogens have facilitated the application of systems biology to the study of host-parasite interactions. Over the past decade, these technologies have led to the discovery of many important biological processes governing parasitic disease. The integration and interpretation of high-throughput -omic data will undoubtedly generate extraordinary insight into host-parasite interaction networks essential to navigate the intricacies of these complex systems. As systems analysis continues to build the foundation for our understanding of host-parasite biology, this will provide the framework necessary to drive drug discovery research forward and accelerate the development of new antiparasitic therapies.
Massive NGS data analysis reveals hundreds of potential novel gene fusions in human cell lines
Background:
Gene fusions derive from chromosomal rearrangements and the resulting chimeric transcripts are often endowed with oncogenic potential. Furthermore, they serve as diagnostic tools for the clinical classification of cancer subgroups with different prognosis and, in some cases, they can provide specific drug targets. So far, many efforts have been carried out to study gene fusion events occurring in tumor samples. In recent years, the availability of a comprehensive Next Generation Sequencing dataset for all the existing human tumor cell lines has provided the opportunity to further investigate these data in order to identify novel and still uncharacterized gene fusion events.
Results:
In our work, we have extensively reanalyzed 935 paired-end RNA-seq experiments downloaded from "The Cancer Cell Line Encyclopedia" repository, aiming to identify novel putative cell line-specific gene fusion events in human malignancies. The bioinformatics analysis was performed by running four different gene fusion detection algorithms. The results were further prioritized by a Bayesian classifier that performs an in silico validation. The collection of fusion events supported by all of the prediction tools results in a robust set of ~1,700 in silico predicted novel candidates suitable for downstream analyses. Given the large amount of data and information produced, the computational results have been systematized in a database named LiGeA. The database can be browsed through a dynamic and interactive web portal, further integrated with validated data from other well-known repositories. Taking advantage of the intuitive query forms, users can easily access, navigate, filter and select putative gene fusions for further validation and study. They can also find suitable experimental models for a given fusion of interest.
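The consensus step described above (keeping only fusion events supported by all four callers) amounts to a set intersection over the callers' outputs. A toy sketch, with invented caller names and fusion pairs; real pipelines also reconcile gene symbols and breakpoint coordinates before intersecting:

```python
# Illustrative consensus filter: keep fusions reported by every caller.
# Caller names and fusion pairs are made up for this sketch.

calls = {
    "caller_1": {("BCR", "ABL1"), ("GENE_X", "GENE_Y"), ("A", "B")},
    "caller_2": {("BCR", "ABL1"), ("GENE_X", "GENE_Y")},
    "caller_3": {("BCR", "ABL1"), ("GENE_X", "GENE_Y"), ("C", "D")},
    "caller_4": {("BCR", "ABL1"), ("GENE_X", "GENE_Y")},
}

# a fusion survives only if all four tools agree on it
consensus = set.intersection(*calls.values())
print(sorted(consensus))  # [('BCR', 'ABL1'), ('GENE_X', 'GENE_Y')]
```

Requiring agreement from every tool trades recall for precision, which is why the surviving ~1,700 candidates are described as a robust set for downstream analysis.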
Conclusions:
We believe that the LiGeA resource represents not only the first compendium of both known and putative novel gene fusion events across the catalog of human malignant cell lines, but also a handy starting point for wet-lab biologists who wish to investigate novel cancer biomarkers and specific drug targets.
PolyTB: a genomic variation map for Mycobacterium tuberculosis
Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole-genome information from clinical isolates of the M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and compute-intensive to process and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.
Search algorithms as a framework for the optimization of drug combinations
Combination therapies are often needed for effective clinical outcomes in the
management of complex diseases, but presently they are generally based on
empirical clinical experience. Here we suggest a novel application of search
algorithms, originally developed for digital communication, modified to
optimize combinations of therapeutic interventions. In biological experiments
measuring the restoration of age-related decline in heart function and
exercise capacity in Drosophila melanogaster, we found that search algorithms
correctly identified optimal combinations of four drugs with only one third of
the tests performed in a fully factorial search. In experiments identifying
combinations of three doses of up to six drugs for selective killing of human
cancer cells, search algorithms resulted in a highly significant enrichment of
selective combinations compared with random searches. In simulations using a
network model of cell death, we found that the search algorithms identified the
optimal combinations of 6-9 interventions in 80-90% of tests, compared with
15-30% for an equivalent random search. These findings suggest that modified
search algorithms from information theory have the potential to enhance the
discovery of novel therapeutic drug combinations. This report also helps to
frame a biomedical problem that will benefit from an interdisciplinary effort
and suggests a general strategy for its solution.
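The idea of finding a good dose combination with far fewer measurements than a full factorial grid can be sketched with a simple coordinate search. This is not the authors' communication-theory-derived algorithm: the three dose levels, the number of drugs, and the toy response surface below are all invented for illustration.

```python
# Sketch: optimize a combination of 4 drugs, each at dose 0/1/2
# (none/low/high), by sweeping one drug at a time while holding the
# others fixed. The toy response rewards low doses of drugs 0 and 2,
# penalizes drug 1, and penalizes total overdosing.

import itertools

def toy_response(d):
    """Hypothetical response surface for a 4-drug combination."""
    return (2 * min(d[0], 1) + 2 * min(d[2], 1)
            - d[1] - 0.5 * max(sum(d) - 2, 0))

def coordinate_search(n_drugs=4, doses=(0, 1, 2), sweeps=3):
    combo = [0] * n_drugs
    evals = 0
    for _ in range(sweeps):
        for i in range(n_drugs):
            # try every dose for drug i, keep the best, count the tests
            best = max(doses, key=lambda dose:
                       toy_response(combo[:i] + [dose] + combo[i + 1:]))
            combo[i] = best
            evals += len(doses)
    return tuple(combo), evals

best_combo, n_evals = coordinate_search()
full_factorial = len(list(itertools.product((0, 1, 2), repeat=4)))
print(best_combo, n_evals, full_factorial)  # (1, 0, 1, 0) 36 81
```

Even this naive scheme needs 36 tests against 81 for the full grid; the paper's point is that better-designed search codes shrink the experimental budget further while remaining robust to noisy biological measurements.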
An Ensemble Model of QSAR Tools for Regulatory Risk Assessment
Quantitative structure-activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model includes an adjustable cut-off parameter that allows the desired trade-off between model sensitivity and specificity to be selected. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals).
Leave-one-out cross-validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8% and 80.4%, and balanced accuracy: 80.6% and 80.8%) and the highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. This feature provides additional control to regulators in grading a chemical based on the severity of the toxic endpoint under study.
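One hedged way to picture the ensemble idea is a naive-Bayes combination of the individual tools' binary calls, with a movable cut-off on the consensus probability. The tool names, sensitivities and specificities below are illustrative assumptions, not the paper's measured values or its exact Bayesian formulation:

```python
# Sketch of a Bayesian consensus over QSAR tools: each tool's binary
# call updates the odds that the chemical is a carcinogen, weighted by
# that tool's (assumed) sensitivity and specificity. A tunable cut-off
# on the posterior trades sensitivity against specificity.

def consensus_probability(calls, performance, prior=0.5):
    """calls: {tool: 1 (toxic) or 0 (non-toxic)};
    performance: {tool: (sensitivity, specificity)};
    assumes tools err independently (naive Bayes)."""
    odds = prior / (1 - prior)
    for tool, call in calls.items():
        sens, spec = performance[tool]
        if call == 1:            # likelihood ratio of a positive call
            odds *= sens / (1 - spec)
        else:                    # likelihood ratio of a negative call
            odds *= (1 - sens) / spec
    return odds / (1 + odds)

# illustrative tool performances, not the paper's values
perf = {"tool_A": (0.80, 0.70), "tool_B": (0.75, 0.85),
        "tool_C": (0.60, 0.90)}

p = consensus_probability({"tool_A": 1, "tool_B": 1, "tool_C": 0}, perf)
# Lowering the cut-off flags more chemicals (fewer false negatives,
# more false positives); raising it does the opposite.
for cutoff in (0.3, 0.5, 0.9):
    print(cutoff, "toxic" if p >= cutoff else "non-toxic")
```

A regulator worried mainly about false negatives would run with a low cut-off; the paper's ROC analysis is essentially a sweep over this parameter.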