7,574 research outputs found

    Random forests with random projections of the output space for high dimensional multi-label classification

    Full text link
    We adapt the idea of random projections applied to the output space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting computational complexity and accuracy of predictions. We also show that random output space projections may be used in order to reach different bias-variance tradeoffs, over a broad panel of benchmark problems, and that this may lead to improved accuracy while reducing significantly the computational burden of the learning stage

    Drug–target interaction prediction with Bipartite Local Models and hubness-aware regression

    Get PDF
    Computational prediction of drug–target interactions is an essential task with various applications in the pharmaceutical industry, such as adverse effect prediction or drug repositioning. Recently, expert systems based on machine learning have been applied to drug–target interaction prediction. Although hubness-aware machine learning techniques are among the most promising approaches, their potential to enhance drug–target interaction prediction methods has not been exploited yet. In this paper, we extend the Bipartite Local Model (BLM), one of the most prominent interaction prediction methods. In particular, we use BLM with a hubness-aware regression technique, ECkNN. We represent drugs and targets in the similarity space with rich set of features (i.e., chemical, genomic and interaction features), and build a projection-based ensemble of BLMs. In order to assist reproducibility of our work as well as comparison to published results, we perform experiments on widely used publicly available drug-target interaction datasets. The results show that our approach outperforms state-of-the-art drug-target prediction techniques. Additionally, we demonstrate the feasibility of predictions from the point of view of applications

    Drug Target Interaction Prediction Using Machine Learning Techniques – A Review

    Get PDF
    Drug discovery is a key process, given the rising and ubiquitous demand for medication to stay in good shape right through the course of one’s life. Drugs are small molecules that inhibit or activate the function of a protein, offering patients a host of therapeutic benefits. Drug design is the inventive process of finding new medication, based on targets or proteins. Identifying new drugs is a process that involves time and money. This is where computer-aided drug design helps cut time and costs. Drug design needs drug targets that are a protein and a drug compound, with which the interaction between a drug and a target is established. Interaction, in this context, refers to the process of discovering protein binding sites, which are protein pockets that bind with drugs. Pockets are regions on a protein macromolecule that bind to drug molecules. Researchers have been at work trying to determine new Drug Target Interactions (DTI) that predict whether or not a given drug molecule will bind to a target. Machine learning (ML) techniques help establish the interaction between drugs and their targets, using computer-aided drug design. This paper aims to explore ML techniques better for DTI prediction and boost future research. Qualitative and quantitative analyses of ML techniques show that several have been applied to predict DTIs, employing a range of classifiers. Though DTI prediction improves with negative drug target pairs (DTP), the lack of true negative DTPs has led to the use a particular dataset of drugs and targets. Using dynamic DTPs improves DTI prediction. Little attention has so far been paid to developing a new classifier for DTI classification, and there is, unquestionably, a need for better ones

    Prediction of protein-protein interaction sites using an ensemble method

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved.</p> <p>Results</p> <p>In this paper, an ensemble method is proposed, which combines bootstrap resampling technique, SVM-based fusion classifiers and weighted voting strategy, to overcome the imbalanced problem and effectively utilize a wide variety of features. We evaluate the ensemble classifier using a dataset extracted from 99 polypeptide chains with 10-fold cross validation, and get a AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than that of the existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and structural information respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by effectively classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method to identify interaction sites from the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites.</p> <p>Conclusion</p> <p>Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with resampling technique can alleviate the imbalanced problem and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.</p

    Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics.

    Get PDF
    The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included

    Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery

    Full text link
    Recent research in representation learning utilizes large databases of proteins or molecules to acquire knowledge of drug and protein structures through unsupervised learning techniques. These pre-trained representations have proven to significantly enhance the accuracy of subsequent tasks, such as predicting the affinity between drugs and target proteins. In this study, we demonstrate that by incorporating knowledge graphs from diverse sources and modalities into the sequences or SMILES representation, we can further enrich the representation and achieve state-of-the-art results on established benchmark datasets. We provide preprocessed and integrated data obtained from 7 public sources, which encompass over 30M triples. Additionally, we make available the pre-trained models based on this data, along with the reported outcomes of their performance on three widely-used benchmark datasets for drug-target binding affinity prediction found in the Therapeutic Data Commons (TDC) benchmarks. Additionally, we make the source code for training models on benchmark datasets publicly available. Our objective in releasing these pre-trained models, accompanied by clean data for model pretraining and benchmark results, is to encourage research in knowledge-enhanced representation learning
    • …
    corecore