12 research outputs found

    ProMateus—an open research approach to protein-binding sites analysis

    The development of bioinformatic tools by individual labs results in an abundance of parallel programs for the same task. For example, binding site regions between interacting proteins are identified using ProMate, WHISCY, PPI-Pred, PINUP and others. All servers first identify unique properties of binding sites and then incorporate them into a predictor. The resulting prediction would presumably improve if the most suitable parameters from each of those predictors were incorporated into one server. However, because of the variation in methods and databases, this is currently not feasible. Here, the protein-binding site prediction server is extended into a general protein-binding sites research tool, ProMateus. This web tool, based on ProMate's infrastructure, enables the user to easily explore and incorporate new features and databases, and provides an evaluation of the benefit of individual features and their combination within a set framework. This transforms individual research into a community exercise, bringing out the best from all users for optimized predictions. The analysis is demonstrated on a database of protein-protein and protein-DNA interactions. This approach is fundamentally different from that used in generating meta-servers. The implications of the open-research approach are discussed. ProMateus is available at http://bip.weizmann.ac.il/promate

    Comparison of Classifier Fusion Methods for Predicting Response to Anti HIV-1 Therapy

    BACKGROUND: Analysis of the viral genome for drug resistance mutations is state-of-the-art for guiding treatment selection for human immunodeficiency virus type 1 (HIV-1)-infected patients. These mutations alter the structure of viral target proteins and reduce or in the worst case completely inhibit the effect of antiretroviral compounds while maintaining the ability for effective replication. Modern anti-HIV-1 regimens comprise multiple drugs in order to prevent or at least delay the development of resistance mutations. However, commonly used HIV-1 genotype interpretation systems provide only classifications for single drugs. The EuResist initiative has collected data from about 18,500 patients to train three classifiers for predicting response to combination antiretroviral therapy, given the viral genotype and further information. In this work we compare different classifier fusion methods for combining the individual classifiers. PRINCIPAL FINDINGS: The individual classifiers yielded similar performance, and all the combination approaches considered performed equally well. The gain in performance due to combining methods did not reach statistical significance compared to the single best individual classifier on the complete training set. However, on smaller training set sizes (200 to 1,600 instances compared to 2,700) the combination significantly outperformed the individual classifiers (p<0.01; paired one-sided Wilcoxon test). Together with a consistent reduction of the standard deviation compared to the individual prediction engines this shows a more robust behavior of the combined system. Moreover, using the combined system we were able to identify a class of therapy courses that led to a consistent underestimation (about 0.05 AUC) of the system performance. Discovery of these therapy courses is a further hint for the robustness of the combined system. CONCLUSION: The combined EuResist prediction engine is freely available at http://engine.euresist.org
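
The simplest fusion method of the kind compared in this abstract is a mean combiner, which averages the scores of the individual classifiers and evaluates the result by AUC. The following is a minimal sketch of that idea, not the EuResist implementation: the function names `auc` and `mean_combiner` are hypothetical, and the labels and scores are invented toy values, not data from the study.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_combiner(*score_lists):
    """Fuse several classifiers by averaging their scores per instance."""
    return [mean(column) for column in zip(*score_lists)]


# Toy data (assumed for illustration): 1 = therapy success, 0 = failure.
labels = [1, 1, 1, 0, 0, 0]
clf_a = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]
clf_b = [0.6, 0.8, 0.3, 0.4, 0.5, 0.1]
clf_c = [0.5, 0.7, 0.8, 0.6, 0.3, 0.4]

combined = mean_combiner(clf_a, clf_b, clf_c)

for name, scores in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("mean", combined)]:
    print(f"{name}: AUC = {auc(labels, scores):.3f}")
```

On these toy scores the averaged prediction ranks every success above every failure even though each individual classifier makes at least one ranking error. That shows how averaging can cancel uncorrelated errors, but it says nothing about the size of the effect on real data.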

    Selecting anti-HIV therapies based on a variety of genomic and clinical factors

    Motivation: Optimizing HIV therapies is crucial since the virus rapidly develops mutations to evade drug pressure. Recent studies have shown that genotypic information might not be sufficient for the design of therapies and that other clinical and demographic factors may play a role in therapy failure. This study is designed to assess the improvement in prediction achieved when such information is taken into account. We use these factors to generate a prediction engine using a variety of machine learning methods and to determine which clinical conditions are most misleading in terms of predicting the outcome of a therapy. Results: Three different machine learning techniques were used: generative–discriminative method, regression with derived evolutionary features, and regression with a mixture of effects. All three methods performed similarly, with an area under the receiver operating characteristic curve (AUC) of 0.77. A set of three similar engines limited to genotypic information only achieved an AUC of 0.75. A straightforward combination of the three engines consistently improves the prediction, with significantly better prediction when the full set of features is employed. The combined engine improves on predictions obtained from an online state-of-the-art resistance interpretation system. Moreover, engines tend to disagree more on the outcome of failing therapies than on that of successful ones. Careful analysis of the differences between the engines revealed those mutations and drugs most closely associated with uncertainty of the therapy outcome. Availability: The combined prediction engine will be available from July 2008, see http://engine.euresist.org

    Learning curves for the individual classifiers, the mean combiner, and the combination on feature level.

    The figure shows the development of the mean AUC on the test set depending on the amount of available training data for the individual classifiers, the mean combiner, and the combination on the feature level using the minimal feature set. Error bars indicate the standard deviation on 10 repetitions.

    Summary of the EuResist Integrated Database (release 11/2007) and training and test set.

    The table displays the number of Patients, Sequences, VL measurements, and Therapies for the complete EuResist Integrated Database (EIDB) and the set of therapies that could be labeled with the definition. 469 of the sequences associated with all labeled therapies belong to historic genotypes and are not directly associated with a therapy change. Moreover, detailed information on training set and test set (comprising labeled therapies with an associated sequence) is given.

    Results for the individual classifiers on training set and test set.

    The table displays the performance, measured in AUC and Accuracy, achieved by the individual classifiers on the training set (using 10-fold cross validation; standard deviation in brackets) and the test set using different feature sets.