    The Evaluation Of Molecular Similarity And Molecular Diversity Methods Using Biological Activity Data

    This paper reviews the techniques available for quantifying the effectiveness of methods for molecular similarity and molecular diversity, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative, and rather different criteria are needed depending on the type of data available.

    Similarity-based virtual screening using 2D fingerprints

    This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought. Group fusion involves combining the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available.
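
    As an illustration of the two ideas named in this abstract, the sketch below computes the Tanimoto coefficient for fingerprints stored as sets of "on" bit positions, then ranks a database by group fusion over several reference structures. The function names, the toy fingerprints, and the choice of the maximum as the fusion rule are assumptions made for this example, not details taken from the paper.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared bits / (bits in a + bits in b - shared bits)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def group_fusion(references, database, rule=max):
    """Score each database molecule by fusing its similarities to every reference.

    `rule` combines the per-reference similarities; `max` is an assumption here.
    """
    scores = {
        name: rule(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy fingerprints (sets of bit positions that are switched on).
references = [{1, 2, 3, 4}, {3, 4, 5, 6}]
database = {"mol_a": {1, 2, 3, 4, 7}, "mol_b": {5, 6, 8}, "mol_c": {9, 10}}
ranking = group_fusion(references, database)
```

    Passing a different callable as `rule` (for example `sum`) swaps the fusion rule without touching the similarity code.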

    Inductive queries for a drug designing robot scientist

    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure-Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.

    Discovering Valuable Items from Massive Data

    Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multi-armed bandits, active search and the knapsack problem. We present an algorithm, GP-Select, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-Select uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GP-Select to be able to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order of magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-Select and apply it to three real-world case studies of industrial relevance: (1) refreshing a repository of prices in a Global Distribution System for the travel industry, (2) identifying diverse, binding-affine peptides in a vaccine design task and (3) maximizing clicks in a web-scale recommender system by recommending items to users.
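
    The exploration/exploitation trade-off described in this abstract can be sketched with a small Gaussian process and an upper-confidence-bound selection rule. This is not the authors' GP-Select implementation: the RBF kernel, the `beta` weight, the noise level, the function names and the toy data are all assumptions for illustration, and the diversity extension and the fast model updates are not shown.

```python
import numpy as np


def rbf_kernel(X, length_scale=0.2):
    """Similarity between items as an RBF kernel over their feature vectors."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(K, idx, y, noise=1e-3):
    """Posterior mean and standard deviation for all items given observed ones."""
    prior_var = np.diag(K).copy()
    if not idx:
        return np.zeros(K.shape[0]), np.sqrt(prior_var)
    K_oo = K[np.ix_(idx, idx)] + noise * np.eye(len(idx))
    K_ao = K[:, idx]
    mean = K_ao @ np.linalg.solve(K_oo, np.asarray(y, dtype=float))
    var = prior_var - np.sum(K_ao * np.linalg.solve(K_oo, K_ao.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def select_under_budget(features, reveal, costs, budget, beta=4.0):
    """Greedily pick affordable items with the highest mean + sqrt(beta) * std."""
    K = rbf_kernel(features)
    available = np.ones(len(costs), dtype=bool)
    chosen, values, spent = [], [], 0.0
    while True:
        mean, std = gp_posterior(K, chosen, values)
        affordable = available & (costs <= budget - spent)
        score = np.where(affordable, mean + np.sqrt(beta) * std, -np.inf)
        best = int(np.argmax(score))
        if not np.isfinite(score[best]):
            break  # nothing left that fits in the remaining budget
        chosen.append(best)
        values.append(float(reveal(best)))  # utility is revealed only on selection
        available[best] = False
        spent += float(costs[best])
    return chosen, spent


# Hypothetical toy problem: 20 items on a line, unit costs, hidden smooth utility.
features = np.linspace(0.0, 1.0, 20)[:, None]
hidden_utility = np.sin(3.0 * features[:, 0])
costs = np.ones(20)
chosen, spent = select_under_budget(
    features, lambda i: hidden_utility[i], costs, budget=5.0
)
```

    A larger `beta` favours items whose value is still uncertain; a smaller one favours items the model already predicts to be valuable.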

    Natural selection and genetic variation in a promising Chagas disease drug target: Trypanosoma cruzi trans-sialidase

    Rational drug design is a powerful method in which new and innovative therapeutics are designed from knowledge of the biological target, with the aim of providing more efficacious and responsible treatments. Understanding aspects of the targeted biological agent is important for optimizing drug design and for preemptively designing drugs that slow or avoid resistance. Chagas disease, which is endemic to South and Central America and Mexico, is caused by Trypanosoma cruzi, a protozoan parasite known to consist of six separate genetic clusters or DTUs (discrete typing units). Current Chagas disease therapeutics are problematic, and there is a widespread call for new ones. Many researchers are working to use rational drug design to develop Chagas drugs, and one potential target that receives a lot of attention is the T. cruzi trans-sialidase protein. Trans-sialidase is a nuclear gene that has been shown to be associated with virulence. In T. cruzi, trans-sialidase (TcTS) codes for a protein that catalyzes the transfer of sialic acid from the mammalian host, coating the parasite's surface membrane to avoid immune detection. Variation in disease pathology depends somewhat on T. cruzi DTU, and there is also considerable genetic variation within DTUs. However, the role of TcTS in pathology variation among and within DTUs is not well understood despite numerous studies of TcTS. These previous studies include determining the crystal structure of TcTS as well as the TS protein structure in other trypanosomes, where the enzyme is often inactive. However, no study has examined the role of natural selection in genetic variation in TcTS. In order to understand the role of natural selection in TcTS DNA sequence and protein variation, we sequenced 540 bp of the TcTS gene from 48 insect vectors. Because all 48 sequences had multiple polymorphic bases, we examined cloned sequences from two of the insect vectors. The data are analyzed to understand the role of natural selection in shaping genetic variation in TcTS and interpreted in light of the possible role of TcTS as a drug target.

    Enhancing the effectiveness of ligand-based virtual screening using data fusion

    Data fusion is being increasingly used to combine the outputs of different types of sensor. This paper reviews the application of the approach to ligand-based virtual screening, where the sensors to be combined are functions that score molecules in a database on their likelihood of exhibiting some required biological activity. Much of the literature to date involves the combination of multiple similarity searches, although there is also increasing interest in the combination of multiple machine learning techniques. Both approaches are reviewed here, focusing on the extent to which fusion can improve the effectiveness of searching when compared with a single screening mechanism, and on the reasons that have been suggested for the observed performance enhancement.
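
    One simple way to combine several scoring functions, as discussed in this abstract, is to convert each function's scores to ranks and sum the ranks per molecule. The sketch below assumes that rule; the function names and the toy scores are hypothetical, and the review itself covers a wider range of fusion rules.

```python
def ranks(scores):
    """Map each molecule to its rank under one scoring function (1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {mol: position for position, mol in enumerate(ordered, start=1)}


def fuse_by_rank_sum(score_tables):
    """Order molecules by the sum of their ranks across all scoring functions."""
    per_function = [ranks(table) for table in score_tables]
    molecules = list(score_tables[0])
    totals = {mol: sum(r[mol] for r in per_function) for mol in molecules}
    # Lower total rank is better; the name breaks ties deterministically.
    return sorted(molecules, key=lambda mol: (totals[mol], mol))


# Hypothetical outputs of two "sensors" scoring the same three molecules.
similarity_scores = {"m1": 0.9, "m2": 0.4, "m3": 0.7}
model_scores = {"m1": 0.2, "m2": 0.5, "m3": 0.8}
fused = fuse_by_rank_sum([similarity_scores, model_scores])
```

    Working on ranks rather than raw scores avoids having to put scoring functions with different scales on a common footing.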

    Patterns and rates of viral evolution in HIV-1 subtype B infected females and males.

    Biological sex differences affect the course of HIV infection: untreated women have lower viral loads than their male counterparts but, for a given viral load, a higher rate of progression to AIDS. However, the vast majority of data on viral evolution, a process that is clearly impacted by host immunity and could be impacted by sex differences, has been derived from men. We conducted an intensive analysis of HIV-1 gag and env-gp120 evolution over the first 6-11 years of infection in 8 Women's Interagency HIV Study (WIHS) participants who had not received combination antiretroviral therapy (ART). This was compared to similar data previously collected from men, with both groups infected with HIV-1 subtype B. Early virus populations in men and women were generally homogeneous, with no differences in diversity between sexes. No differences in ensuing nucleotide substitution rates were found between the female and male cohorts studied herein. As previously reported for men, time to peak diversity in env-gp120 in women was positively associated with time to CD4+ cell count below 200 (P = 0.017), and the number of predicted N-linked glycosylation sites generally increased over time, followed by a plateau or decline, with the majority of changes localized to the V1-V2 region. These findings strongly suggest that the sex differences in HIV-1 disease progression attributed to immune system composition and sensitivities are not revealed by, nor do they impact, global patterns of viral evolution, which proceeds similarly in women and men.