
    A Multiple Classifier System Identifies Novel Cannabinoid CB2 Receptor Ligands

    Drugs have become an essential part of our lives because they improve people's health and quality of life. However, for many diseases approved drugs are not yet available, or existing drugs have undesirable side effects, so the pharmaceutical industry strives to discover new drugs and active compounds. Drug development is an expensive process that typically starts with the detection of candidate molecules (screening) for an identified protein target. High-performance screening techniques have therefore become critical for mitigating these costs, and the popularity of computer-based screening (often called virtual screening or in-silico screening) has increased rapidly during the last decade. A wide variety of Machine Learning (ML) techniques have been used in conjunction with chemical structure and physicochemical properties for screening purposes, including (i) simple classifiers, (ii) ensemble methods, and more recently (iii) Multiple Classifier Systems (MCS). In this work, we apply an MCS for virtual screening (D2-MCS) using circular fingerprints. We applied our technique to a dataset of cannabinoid CB2 ligands obtained from the ChEMBL database. The HTS collection of Enamine (1,834,362 compounds) was virtually screened with D2-MCS to identify 48,432 potentially active molecules. This list was subsequently clustered based on circular fingerprints, and the most active compound from each cluster was retained. From these, the top 60 were kept, and 21 novel compounds were purchased. Experimental validation confirmed six highly active hits (>50% displacement at 10 μM and subsequent Ki determination) and an additional five medium active hits (>25% displacement at 10 μM). D2-MCS hence provided a hit rate of 29% for highly active compounds and an overall hit rate of 52%.
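    The two hit rates quoted in this abstract follow directly from the counts it reports (21 purchased compounds, six highly active hits, five medium active hits). A minimal sketch of that arithmetic, where the function and variable names are illustrative and not taken from the paper:

```python
def hit_rate_percent(hits: int, tested: int) -> float:
    """Return the share of tested compounds that were hits, in percent."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * hits / tested


# Counts as reported in the abstract.
purchased = 21
highly_active = 6
medium_active = 5

high_rate = hit_rate_percent(highly_active, purchased)
overall_rate = hit_rate_percent(highly_active + medium_active, purchased)

print(f"highly active: {high_rate:.0f}%")
print(f"overall:       {overall_rate:.0f}%")
```

    The overall rate counts both highly and medium active compounds, that is, 11 of the 21 purchased.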

    Assessment of metabolomic and proteomic biomarkers in detection and prognosis of progression of renal function in chronic kidney disease

    Chronic kidney disease (CKD) is part of a number of systemic and renal diseases and may reach epidemic proportions over the next decade. Efforts have been made to improve diagnosis and management of CKD. We hypothesised that combining metabolomic and proteomic approaches could generate a more systemic and complete view of the disease mechanisms. To test this approach, we examined samples from a cohort of 49 patients representing different stages of CKD. Urine samples were analysed for proteomic changes using capillary electrophoresis-mass spectrometry, and urine and plasma samples for metabolomic changes using different mass spectrometry-based techniques. The training set included 20 CKD patients selected according to their estimated glomerular filtration rate (eGFR) at mild (59.9±16.5 mL/min/1.73 m²; n = 10) or advanced (8.9±4.5 mL/min/1.73 m²; n = 10) CKD, and the remaining 29 patients were left for the test set. We identified a panel of 76 statistically significant metabolites and peptides that correlated with CKD in the training set. We combined these biomarkers in different classifiers and then performed correlation analyses with eGFR at baseline and at follow-up after 2.8±0.8 years in the test set. A classifier based solely on plasma metabolite biomarkers correlated significantly with the loss of kidney function in the test set at baseline and follow-up (ρ = −0.8031; p<0.0001 and ρ = −0.6009; p = 0.0019, respectively). Similarly, a urinary metabolite biomarker-based classifier revealed a significant association with kidney function (ρ = −0.6557; p = 0.0001 and ρ = −0.6574; p = 0.0005). A classifier utilising 46 identified urinary peptide biomarkers performed statistically equivalently to the urinary and plasma metabolite classifiers (ρ = −0.7752; p<0.0001 and ρ = −0.8400; p<0.0001). The combination of both urinary proteomic and urinary and plasma metabolic biomarkers did not improve the correlation with eGFR. In conclusion, we found excellent association of plasma and urinary metabolites and urinary peptides with kidney function and disease progression, but no added value in combining the different biomarker data.
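    The ρ values in this abstract relate classifier scores to eGFR. The abstract does not say which correlation coefficient was used; the sketch below assumes a Spearman rank correlation, a common choice for ρ, and runs it on invented numbers (not study data) in which higher classifier scores go with lower eGFR. All names are illustrative.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example values, NOT data from the study.
classifier_scores = [0.9, 0.7, 0.6, 0.3, 0.1]
egfr_values = [10.0, 22.0, 35.0, 58.0, 75.0]

rho = spearman_rho(classifier_scores, egfr_values)
print(f"rho = {rho:.4f}")
```

    A negative ρ, as reported throughout the abstract, means that a higher classifier score goes with lower kidney function.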

    Evaluation of classical machine learning techniques towards urban sound recognition embedded systems

    Automatic urban sound classification is a desirable capability for urban monitoring systems, allowing real-time monitoring of urban environments and recognition of events. Current embedded systems provide enough computational power to perform real-time urban audio recognition. Using such devices for edge computation when they act as nodes of Wireless Sensor Networks (WSN) drastically reduces the required bandwidth. In this paper, we evaluate classical Machine Learning (ML) techniques for urban sound classification on embedded devices with respect to accuracy and execution time. This evaluation provides a realistic estimate of what can be expected when performing urban sound classification on such constrained devices. In addition, a cascade approach is proposed that combines ML techniques by exploiting features of current embedded devices such as pipelined or multi-threaded execution. The accuracy of this approach is similar to that of traditional solutions, while offering more flexibility to prioritize accuracy or timing.
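    The abstract does not describe how the cascade is built. One common way to trade accuracy against execution time is to run a cheap classifier first and only fall through to a costlier one when confidence is low; the sketch below illustrates that idea only and should not be read as the paper's design. The stage functions, labels, features, and threshold are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A stage maps a feature vector to (label, confidence in [0, 1]).
Prediction = Tuple[str, float]
Classifier = Callable[[Sequence[float]], Prediction]


def cascade_predict(
    features: Sequence[float],
    stages: List[Classifier],
    threshold: float,
) -> Tuple[str, int]:
    """Run stages in order; stop at the first one whose confidence
    reaches the threshold. Returns (label, number of stages used)."""
    label, used = "", 0
    for index, stage in enumerate(stages, start=1):
        label, confidence = stage(features)
        used = index
        if confidence >= threshold:
            break
    return label, used


def fast_stage(features: Sequence[float]) -> Prediction:
    """Hypothetical cheap classifier: a single energy threshold."""
    energy = sum(v * v for v in features)
    return ("siren", 0.9) if energy > 4.0 else ("street_music", 0.55)


def slow_stage(features: Sequence[float]) -> Prediction:
    """Hypothetical costlier classifier, stubbed with a peak check."""
    peak = max(abs(v) for v in features)
    return ("car_horn", 0.95) if peak > 1.0 else ("street_music", 0.8)


stages = [fast_stage, slow_stage]
print(cascade_predict([2.0, 1.0], stages, threshold=0.8))  # stops at stage 1
print(cascade_predict([0.5, 0.5], stages, threshold=0.8))  # needs stage 2
```

    In this sketch, lowering the threshold favours timing (more inputs are settled by the cheap stage), while raising it favours accuracy.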

    Transcription Factor-DNA Binding Via Machine Learning Ensembles

    We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF the ensemble starts with position weight matrices (PWMs) for the motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace a machine classifier is built for identifying the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm locating binding sites across the genome. For target gene identification this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. The top motif outperformed the five component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross-species database (Tompa et al., 2005) we match the best performer without much human intervention. The method also improved performance on mammalian TFs. The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that can perform consistently better for more TFs. The TF gene target identification component (Problem 1 above) is useful in constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information.

    Evidence-Based Detection of Pancreatic Canc

    This study is an effort to develop a tool for early detection of pancreatic cancer using evidential reasoning. An evidential reasoning model predicts the likelihood of an individual developing pancreatic cancer by processing the outputs of a Support Vector Classifier (SVC) together with other input factors such as smoking history, drinking history, sequencing reads, biopsy location, and family and personal health history. Certain features of the genomic data, along with the mutated gene sequences of pancreatic cancer patients, were obtained from the National Cancer Institute (NIH) Genomic Data Commons (GDC). This data was used to train the SVC. A prediction accuracy of ~85% with a ROC AUC of 83.4% was achieved. Synthetic data was assembled in different combinations to evaluate the behaviour of the evidential reasoning model, and the resulting variations in the belief interval of developing pancreatic cancer were observed. When the model is given an input of high smoking history and family history of cancer, the belief interval for pancreatic cancer increases, as does the support for the machine learning model's prediction. Likewise, a decrease in the quantity of genetic material and an irregularity in the cellular structure near the pancreas increase support for the machine learning classifier's prediction of pancreatic cancer. This evidence-based approach is an attempt to diagnose pancreatic cancer at a premalignant stage. Future work includes using real sequencing reads as well as accurate habits and real medical and family histories of individuals to increase the efficiency of the evidential reasoning model. Next steps also involve trying out different machine learning models to observe their performance on the dataset considered in this study.

    Machine learning techniques applied to multiband spectrum sensing in cognitive radios

    In this work, three specific machine learning techniques (neural networks, expectation maximization and k-means) are applied to a multiband spectrum sensing technique for cognitive radios. Each is used as a classifier operating on the approximation coefficients from a Multiresolution Analysis in order to detect the presence of one or multiple primary users in a wideband spectrum. The methods were tested on simulated and real signals and showed good performance. The results show that these three methods are effective options for detecting primary user transmission on the multiband spectrum. These methodologies work for 99% of cases under simulated signals with an SNR higher than 0 dB and are feasible in the case of real signals. This research received funding from the Mexican National Council of Science and Technology (CONACYT), Grant (no. 490180). This work was also supported by the Program for Professional Development Teacher (PRODEP).