    Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures

    This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings produced by each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters sharing a non-trivial common substructure. The methods that employ adjustable parameters were tested to determine the stability of each parameter across datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively to generate chemical clusterings. They also indicate that the CAST and Yin–Chen methods, recently proposed for clustering gene expression patterns, may prove effective for clustering 2D chemical structures.
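    To make "fingerprint-based similarity" concrete, here is a minimal sketch, assuming fingerprints are stored as sets of on-bit indices and compared with the Tanimoto coefficient. The clustering routine is a Butina-style sphere-exclusion variant that re-counts neighbours among the still-unassigned structures at each step. It is a commonly used swapped-in technique and is not claimed to be one of the methods the paper compares (it is neither CAST nor Yin–Chen). The names `tanimoto`, `butina_cluster` and the toy bit sets are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(a & b) / union


def butina_cluster(fps: List[Set[int]], threshold: float) -> List[List[int]]:
    """Sphere-exclusion (Butina-style) clustering.

    Two structures are neighbours when their Tanimoto similarity >= threshold.
    Returns clusters as lists of indices, centroid first.
    """
    n = len(fps)
    neighbours = [
        {j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters: List[List[int]] = []
    while unassigned:
        # Centroid: the unassigned item with the most unassigned neighbours
        # (max() keeps the first maximum, so ties go to the lowest index).
        centroid = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = [centroid] + sorted(neighbours[centroid] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


# Toy fingerprints (hypothetical bit sets, not real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}]
print(butina_cluster(fps, 0.5))  # [[0, 1, 2], [3, 4]]
```

    The threshold plays the role of an adjustable parameter of the kind whose stability the abstract mentions. Raising it to 0.7 here splits the second group into singletons.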

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985–2002. Four major research areas are discussed, each involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980–2002. These citations appeared in 411 different journals and involved 910 different citing organizations from 54 different countries, demonstrating the widespread impact of the Group's work.

    Constraining the Number of Positive Responses in Adaptive, Non-Adaptive, and Two-Stage Group Testing

    Group testing is a well-known search problem that consists of detecting the defective members of a set of objects O by performing tests on properly chosen subsets (pools) of O. In classical group testing the goal is to find all defectives using as few tests as possible. We consider a variant of classical group testing in which one aims not only to minimize the total number of tests but also to reduce the number of tests involving defective elements. The rationale behind this search model is that in many practical applications the devices used for the tests deteriorate through exposure to, or interaction with, the defective elements. In this paper we consider adaptive, non-adaptive and two-stage group testing. For all three scenarios, we derive upper and lower bounds on the number of "yes" responses that must be admitted by any strategy performing at most a given number t of tests. In particular, for the adaptive case we provide an algorithm whose number of "yes" responses exceeds the lower bound by a small constant. Interestingly, this bound can also be attained asymptotically by our two-stage algorithm, a phenomenon analogous to one occurring in classical group testing. For the non-adaptive scenario we give almost matching upper and lower bounds on the number of "yes" responses; in particular, we give two constructions that achieve the same asymptotic bound, one of which is explicit. The bounds for the non-adaptive and two-stage cases follow from bounds on the optimal sizes of new variants of d-cover free families and (p,d)-cover free families introduced in this paper, which we believe may also be of interest in other contexts.
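    The two quantities this abstract trades off, total tests and "yes" responses, can be made concrete by instrumenting an adaptive strategy. The sketch below uses classical recursive binary splitting, which is a standard textbook strategy and not the paper's algorithm. It assumes the usual model in which a pool is positive if and only if it contains at least one defective. The names `binary_splitting` and `run_test` are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def binary_splitting(
    items: List[int], is_defective: Callable[[int], bool]
) -> Tuple[Set[int], int, int]:
    """Adaptive binary splitting; returns (defectives found, total tests, "yes" responses)."""
    tests = 0
    yes = 0
    found: Set[int] = set()

    def run_test(pool: List[int]) -> bool:
        # A pool tests positive iff it contains at least one defective.
        nonlocal tests, yes
        tests += 1
        positive = any(is_defective(x) for x in pool)
        if positive:
            yes += 1
        return positive

    def search(pool: List[int]) -> None:
        if not run_test(pool):
            return  # negative pool: every member is non-defective
        if len(pool) == 1:
            found.add(pool[0])
            return
        mid = len(pool) // 2
        # Naive: both halves are tested, even when a negative first half
        # already implies that the second half is positive.
        search(pool[:mid])
        search(pool[mid:])

    if items:
        search(items)
    return found, tests, yes


defectives = {5}
print(binary_splitting(list(range(8)), lambda x: x in defectives))  # ({5}, 7, 4)
```

    Every positive pool on the path to a defective costs one "yes" response. This is the kind of exposure the variant studied in the paper seeks to limit, alongside the total test count.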

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

    Learning Immune-Defectives Graph through Group Tests

    This paper deals with an abstraction of a unified problem of drug discovery and pathogen identification. Pathogen identification involves identifying disease-causing biomolecules. Drug discovery involves finding chemical compounds, called lead compounds, that bind to pathogenic proteins and eventually inhibit their function. In this paper, the lead compounds are abstracted as inhibitors, the pathogenic proteins as defectives, and the mixture of "ineffective" chemical compounds and non-pathogenic proteins as normal items. A defective may be immune to the presence of an inhibitor in a test. Thus, a test containing a defective is positive if and only if it does not contain that defective's "associated" inhibitor. The goal of this paper is to identify the defectives, the inhibitors, and their "associations" with high probability, or in other words, to learn the Immune Defectives Graph (IDG) efficiently through group tests. We propose a probabilistic non-adaptive pooling design, a probabilistic two-stage adaptive pooling design, and decoding algorithms for learning the IDG. For the two-stage adaptive pooling design, we show that the number of tests required to guarantee recovery of the inhibitors, defectives, and their associations with high probability (i.e., the upper bound on the sample complexity) exceeds the proposed lower bound by a multiplicative factor logarithmic in the number of items. For the non-adaptive pooling design, too, we show that the upper bound exceeds the proposed lower bound by at most a multiplicative factor logarithmic in the number of items. Comment: Double column, 17 pages. Updated with tighter lower bounds and other minor edits.
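    The test-outcome rule stated above can be sketched as a small function. This rests on an assumed interpretation: each defective maps to the set of inhibitors associated with it, and a pool containing several defectives is positive if at least one of them appears without any of its associated inhibitors. The names `pool_outcome`, `idg`, `d1`, `i1`, and the rest are hypothetical.

```python
from typing import Dict, Set


def pool_outcome(pool: Set[str], associations: Dict[str, Set[str]]) -> bool:
    """Outcome of one group test under the IDG model (as interpreted here).

    associations maps each defective to the inhibitors it is NOT immune to.
    The test is positive iff some defective in the pool has none of its
    associated inhibitors in the same pool.
    """
    for defective, inhibitors in associations.items():
        if defective in pool and not (inhibitors & pool):
            return True
    return False


# Hypothetical IDG: d1 is inhibited by i1, d2 by i2; n1 is a normal item.
idg = {"d1": {"i1"}, "d2": {"i2"}}
print(pool_outcome({"d1", "n1"}, idg))        # True: d1 unmasked
print(pool_outcome({"d1", "i1"}, idg))        # False: d1 masked by i1
print(pool_outcome({"d1", "i2"}, idg))        # True: d1 is immune to i2
print(pool_outcome({"d1", "i1", "d2"}, idg))  # True: d2 unmasked
```

    A decoding algorithm for the IDG would have to invert this rule from many such outcomes. The probabilistic pooling designs in the abstract choose which pools to test so that this inversion succeeds with high probability.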

    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.