1,775 research outputs found

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.

    Nanostructure determination from the pair distribution function: A parametric study of the INVERT approach

    We present a detailed study of the mechanism by which the INVERT method [Phys. Rev. Lett. 104, 125501] guides structure refinement of disordered materials. We present a number of different possible implementations of the central algorithm and explore the question of algorithm weighting. Our analysis includes quantification of the relative contributions of variance and fit-to-data terms during structure refinement, which leads us to study the roles of density fluctuations and configurational jamming in the RMC fitting process. We present a parametric study of the pair distribution function solution space for C60, a-Si and a-SiO2, which serves to highlight the difficulties faced in developing a transferable weighting scheme. Comment: 15 pages, 7 figures, formatted for JPCM (RMC issue)

    Novel pharmacophore clustering methods for protein binding site comparison

    Proteins perform diverse functions within cells. Some of the functions depend on the protein being involved in a protein complex, interacting with other proteins or with other entities (ligands) through specific binding sites on their surface. Comparison of protein binding sites has potential benefits in many research fields, including drug promiscuity studies, polypharmacology and immunology. While multiple methods have been proposed for comparing binding sites, they tend to focus on comparing very similar proteins and have only been developed for small specific datasets or very targeted applications. None of these methods make use of the powerful representation afforded by 3D complex-based pharmacophores. A pharmacophore model provides a description of a binding site, consisting of a group of chemical features arranged in three-dimensional space, that can be used to represent biological activities. Two different pharmacophore comparison and clustering methods based on the Iterative Closest Point (ICP) algorithm are proposed: a 3-dimensional ICP pharmacophore clustering method, and an N-dimensional ICP pharmacophore clustering method. These methods are complemented by a series of data pre-processing methods for input data preparation. The implementation of the methods takes computational representations (pharmacophores) of single molecule or protein complexes as input and produces distance matrices that can be visualised as dendrograms. The methods integrate both alignment-dependent and alignment-independent concepts. Both clustering methods were successfully evaluated using a 31 globulin-binding steroid dataset and a 41 antibody-antigen dataset, and were able to handle a larger dataset of 159 protein homodimers. For the steroid dataset, the resulting classification of ligands shows good correspondence with a classification based on binding affinity. For the antibody-antigen dataset, the classification of antigens reflected both antigen type and binding antibody. The applications to homodimers demonstrated the ability of both clustering methods to handle a larger dataset, and the possibility to visualise N-D pairwise comparisons using structural superposition of binding sites.
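
To illustrate how a pairwise distance matrix of the kind described in this abstract can be turned into a dendrogram, here is a minimal sketch of single-linkage agglomerative clustering. The linkage criterion, the function name `single_linkage`, and the toy matrix are assumptions made for illustration; the abstract does not state which clustering scheme the authors used.

```python
from typing import List, Sequence, Tuple

Merge = Tuple[Tuple[int, ...], Tuple[int, ...], float]


def single_linkage(dist: Sequence[Sequence[float]]) -> List[Merge]:
    """Return the merge history of single-linkage clustering.

    Each entry is (members_a, members_b, distance): the two clusters that
    were joined and the distance at which they joined. Read in order, the
    entries describe a dendrogram from the leaves upwards.
    Single linkage is an assumption; it is not taken from the abstract.
    """
    n = len(dist)
    if any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square")

    # Active clusters, keyed by the smallest original index they contain.
    clusters = {i: (i,) for i in range(n)}
    merges: List[Merge] = []

    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    return merges


if __name__ == "__main__":
    # Toy matrix for three items placed at 0, 1 and 5 on a line.
    toy = [
        [0.0, 1.0, 5.0],
        [1.0, 0.0, 4.0],
        [5.0, 4.0, 0.0],
    ]
    for members_a, members_b, height in single_linkage(toy):
        print(members_a, members_b, height)
```

The merge heights are what a dendrogram plots on its vertical axis, so the returned list is enough to draw one with any plotting tool.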

    Three dimensional shape comparison of flexible proteins using the local-diameter descriptor

    Background: Techniques for inferring the functions of proteins by comparing their shape similarity have been receiving a lot of attention. Proteins are functional units and their shape flexibility plays an essential role in various biological processes. Several shape descriptors have demonstrated the capability of protein shape comparison by treating proteins as rigid bodies, but this may give rise to an incorrect comparison of flexible protein shapes. Results: We introduce an efficient approach for comparing flexible protein shapes by adapting a local diameter (LD) descriptor. The LD descriptor, developed recently to handle skeleton based shape deformations [1], is adapted in this work to capture the invariant properties of shape deformations caused by the motion of the protein backbone. Every sampled point on the protein surface is assigned a value measuring the diameter of the 3D shape in the neighborhood of that point. The LD descriptor is built in the form of a one dimensional histogram from the distribution of the diameter values. The histogram based shape representation reduces the shape comparison problem of the flexible protein to a simple distance calculation between 1D feature vectors. Experimental results indicate how the LD descriptor accurately treats the protein shape deformation. In addition, we use the LD descriptor for protein shape retrieval and compare its effectiveness to that of conventional shape descriptors. A sensitivity-specificity plot shows that the LD descriptor performs much better than the conventional shape descriptors in terms of consistency over a family of proteins and discernibility across families of different proteins. Conclusion: Our study provides an effective technique for comparing the shape of flexible proteins. The experimental results demonstrate the insensitivity of the LD descriptor to protein shape deformation. The proposed method will be potentially useful for molecule retrieval with similar shapes and rapid structure retrieval for proteins. The demos and supplemental materials are available on https://engineering.purdue.edu/PRECISE/LDD.
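
The histogram step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-point local diameter values have already been computed, and the bin count, the diameter range, the L1 distance, and the names `ld_histogram` and `l1_distance` are all assumptions.

```python
from typing import List, Sequence


def ld_histogram(
    diameters: Sequence[float],
    bins: int = 8,
    max_diameter: float = 40.0,
) -> List[float]:
    """Build a normalised 1D histogram from local diameter values.

    `bins` and `max_diameter` are illustrative defaults, not values
    taken from the abstract.
    """
    if not diameters:
        raise ValueError("need at least one diameter value")
    if bins <= 0 or max_diameter <= 0:
        raise ValueError("bins and max_diameter must be positive")

    width = max_diameter / bins
    counts = [0] * bins
    for d in diameters:
        # Values outside [0, max_diameter) are clamped into the end bins.
        idx = min(max(int(d / width), 0), bins - 1)
        counts[idx] += 1

    total = len(diameters)
    return [c / total for c in counts]


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences (one possible histogram distance)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    # Made-up diameter samples for two hypothetical shapes.
    shape_a = ld_histogram([3.0, 4.5, 12.0, 13.5, 30.0])
    shape_b = ld_histogram([3.5, 5.0, 11.0, 14.0, 31.0])
    print(l1_distance(shape_a, shape_b))
```

Because each histogram is normalised to sum to 1, shapes sampled with different numbers of surface points remain comparable.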

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state-of-the-art over benchmark datasets. The models have the ability to predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together, and with existing tools, the contributions made will aid in the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Information retrieval and mining in high dimensional databases

    This dissertation is composed of two parts. In the first part, we present a framework for finding information (more precisely, active patterns) in three dimensional (3D) graphs. Each node in a graph is an undecomposable or atomic unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small number (specified by the user) of edit operations in the patterns or in the graph. (When a pattern appears in a graph only after the graph has been modified, we call that appearance an approximate occurrence.) The edit operations include relabeling a node, deleting a node and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications of them in scientific data mining. First, we apply the method to locating frequently occurring motifs in two families of proteins pertaining to RNA-directed DNA Polymerase and Thymidylate Synthase, and use the motifs to classify the proteins. Then we apply the method to clustering chemical compounds pertaining to aromatic, bicyclic alkanes and photosynthesis. Experimental results indicate the good performance of our algorithms and high recall and precision rates for both classification and clustering. We also extend our algorithms for processing a class of similarity queries in databases of 3D graphs. In the second part of the dissertation, we present an index structure, called MetricMap, that takes a set of objects and a distance metric and then maps those objects to a k-dimensional pseudo-Euclidean space in such a way that the distances among objects are approximately preserved. Our approach employs sampling and the calculation of eigenvalues and eigenvectors. The index structure is a useful tool for clustering and visualization in data intensive applications, because it replaces expensive distance calculations by sum-of-square calculations. This can make clustering in large databases with expensive distance metrics practical. We compare the index structure with another data mining index structure, FastMap, proposed by Faloutsos and Lin, according to two criteria: relative error and clustering accuracy. For relative error, we show that (i) FastMap gives a lower relative error than MetricMap for Euclidean distances, (ii) MetricMap gives a lower relative error than FastMap for non-Euclidean distances (i.e., general distance metrics), and (iii) combining the two reduces the error yet further. A similar result is obtained when comparing the accuracy of clustering. These results hold for different data sizes. The main qualitative conclusion is that these two index structures capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations can be done in minutes. We have implemented the proposed algorithms and the MetricMap index structure into a toolkit. This toolkit will be useful for data mining, visualization, and approximate retrieval in scientific, multimedia and high dimensional databases.
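
As a small illustration of distance-preserving mapping, the sketch below computes a single FastMap coordinate, the baseline this abstract compares against. It uses the standard cosine-law projection onto the line through two pivot objects. It is not the MetricMap method, whose details the abstract does not give. The function name `fastmap_coordinate` and the explicit pivot arguments are assumptions; FastMap itself picks pivots heuristically.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fastmap_coordinate(
    objects: Sequence[T],
    dist: Callable[[T, T], float],
    pivot_a: int,
    pivot_b: int,
) -> List[float]:
    """Project every object onto the line through two pivot objects.

    For object i the coordinate is
        (d(a, i)^2 + d(a, b)^2 - d(b, i)^2) / (2 * d(a, b)),
    which follows from the law of cosines. Only the distance function is
    needed; the objects themselves need not be vectors.
    """
    a, b = objects[pivot_a], objects[pivot_b]
    d_ab = dist(a, b)
    if d_ab == 0:
        raise ValueError("pivot objects must be at non-zero distance")

    coords = []
    for obj in objects:
        d_ai = dist(a, obj)
        d_bi = dist(b, obj)
        coords.append((d_ai ** 2 + d_ab ** 2 - d_bi ** 2) / (2 * d_ab))
    return coords


if __name__ == "__main__":
    # Toy data: three points on a line, with absolute difference as distance.
    points = [0.0, 3.0, 10.0]
    print(fastmap_coordinate(points, lambda x, y: abs(x - y), 0, 2))
```

Once objects have coordinates like these, the distance between two objects can be approximated from coordinate differences rather than by calling the original, possibly expensive, distance function again.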

    Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity

    Background: Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within. Results: Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of resolution in the space of protein structures, and its ability to make new predictions. Conclusions: Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.

    Computational Analysis of 3D Protein Structures

    Ph.D., Doctor of Philosophy