
    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals and involved 910 different citing organizations from 54 different countries, demonstrating the widespread impact of the Group's work.

    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets.

    The Evaluation Of Molecular Similarity And Molecular Diversity Methods Using Biological Activity Data

    This paper reviews the techniques available for quantifying the effectiveness of methods for molecular similarity and molecular diversity, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative, and rather different criteria are needed depending on the type of data available.

    Scaffold searching: automated identification of similar ring systems for the design of combinatorial libraries

    Rigid ring systems can be used to position receptor-binding functional groups in 3D space, and they thus play an increasingly important role in the design of combinatorial libraries. This paper discusses the use of shape-similarity methods to identify ring systems that are structurally similar to, and aligned with, a user-defined target ring system. These systems can be used as alternative scaffolds for the construction of a combinatorial library.

    Exploration of Reaction Pathways and Chemical Transformation Networks

    For the investigation of chemical reaction networks, the identification of all relevant intermediates and elementary reactions is mandatory. Many algorithmic approaches exist that perform explorations efficiently and automatically. These approaches differ in their application range, the level of completeness of the exploration, and the amount of heuristics and human intervention required. Here, we describe and compare the different approaches based on these criteria. Future directions leveraging the strengths of chemical heuristics, human interaction, and physical rigor are discussed. Comment: 48 pages, 4 figures

    Exploration of the High Entropy Alloy Space as a Constraint Satisfaction Problem

    High Entropy Alloys (HEAs), Multi-principal Component Alloys (MCAs), or Compositionally Complex Alloys (CCAs) are alloys that contain multiple principal alloying elements. While many HEAs have been shown to have unique properties, their discovery has largely relied on costly and time-consuming trial-and-error approaches, with only an infinitesimally small fraction of the entire possible composition space having been explored. In this work, the exploration of the HEA composition space is framed as a Continuous Constraint Satisfaction Problem (CCSP) and solved using a novel Constraint Satisfaction Algorithm (CSA) for the rapid and robust exploration of alloy thermodynamic spaces. The algorithm is used to discover regions in the HEA Composition-Temperature space that satisfy desired phase constitution requirements. The algorithm is demonstrated against a new (TCHEA1) CALPHAD HEA thermodynamic database. The database is first validated by comparing phase stability predictions against experiments, and then the CSA is deployed and tested against design tasks that consist of identifying not only single-phase solid-solution regions in ternary, quaternary and quinary composition spaces but also regions that are likely to yield precipitation-strengthened HEAs. Comment: 14 pages, 13 figures

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
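
    The covering-hypersphere idea in the abstract above can be illustrated with a generic coarse-then-fine range search. This is only a minimal sketch under stated assumptions, not the implementation used in Ammolite, MICA, or esFragBag: the function names (`build_cover`, `range_search`, `brute_force`), the greedy clustering, and the Euclidean metric are all hypothetical choices made for illustration. The coarse step keeps only clusters whose center lies within `radius + cluster_radius` of the query, which by the triangle inequality cannot discard a true hit.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_cover(points, cluster_radius, dist=euclidean):
    """Greedily cover the data with balls of radius cluster_radius.

    Each point joins the first existing center within cluster_radius,
    or becomes a new center. Returns a list of (center, members) pairs.
    """
    clusters = []
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(clusters, query, radius, cluster_radius, dist=euclidean):
    """Return all points within radius of query using coarse-then-fine search."""
    hits = []
    for center, members in clusters:
        # Coarse step: a member within `radius` of the query implies the
        # center is within radius + cluster_radius (triangle inequality).
        if dist(query, center) <= radius + cluster_radius:
            # Fine step: check the members of the surviving cluster.
            hits.extend(m for m in members if dist(query, m) <= radius)
    return hits


def brute_force(points, query, radius, dist=euclidean):
    """Reference linear scan, used only to check the sketch."""
    return [p for p in points if dist(query, p) <= radius]
```

    In this sketch the coarse step costs one distance computation per cluster, so its work grows with the number of covering balls rather than with the number of data points; how much the fine step then saves depends on how tightly the data cluster.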