Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures
This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings produced by each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters sharing a non-trivial common substructure. The methods that employ adjustable parameters were tested to determine the stability of each parameter across datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively to generate chemical clusterings; they also suggest that the CAST and Yin–Chen methods, recently proposed for clustering gene expression patterns, may prove effective for clustering 2D chemical structures.
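To make the fingerprint-based side of this comparison concrete, here is a minimal sketch of one common formulation of CAST (Cluster Affinity Search Technique) run over Tanimoto similarities. It assumes fingerprints are given as Python sets of on-bit indices; the threshold `t`, the tie-breaking rules, the iteration cap and the example fingerprints are illustrative assumptions, and this is not the implementation evaluated in the paper.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def cast(sim: list[list[float]], t: float, max_iter: int = 100) -> list[list[int]]:
    """Simplified CAST over a symmetric similarity matrix with sim[i][i] == 1 and 0 < t < 1."""
    n = len(sim)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        cluster = set()
        aff = [0.0] * n  # aff[x] = sum of sim[x][y] over y in the open cluster
        for _ in range(max_iter):
            changed = False
            # ADD: take the unassigned element of highest affinity while it meets the threshold
            while True:
                cands = [u for u in unassigned if u not in cluster]
                if not cands:
                    break
                u = max(cands, key=lambda x: (aff[x], -x))
                if cluster and aff[u] < t * len(cluster):
                    break
                cluster.add(u)
                for x in range(n):
                    aff[x] += sim[x][u]
                changed = True
            # REMOVE: drop the member of lowest affinity while it falls below the threshold
            while cluster:
                v = min(cluster, key=lambda x: (aff[x], x))
                if aff[v] >= t * len(cluster):
                    break
                cluster.remove(v)
                for x in range(n):
                    aff[x] -= sim[x][v]
                changed = True
            if not changed:
                break
        clusters.append(sorted(cluster))
        unassigned -= cluster
    return clusters


# Hypothetical fingerprints: two pairs of near-identical structures.
fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}),
       frozenset({10, 11, 12}), frozenset({10, 11, 13})]
sim = [[tanimoto(a, b) for b in fps] for a in fps]
print(cast(sim, t=0.4))  # [[0, 1], [2, 3]]
```

Raising `t` makes clusters tighter: with `t=0.9` every structure in this toy set ends up in its own cluster, which is the kind of parameter sensitivity the paper examines.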
An optimized TOPS+ comparison method for enhanced TOPS models
This article has been made available through the Brunel Open Access Publishing Fund.
Background
Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can compare protein structures very quickly, their results can lack biological significance. We have previously described the basic mechanisms of our novel structure comparison method based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved through parameter optimization; we call the resulting optimized method the advanced TOPS+ comparison method (advTOPS+).
Results
We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model. It treats loops as secondary structure elements (SSEs) in addition to helices and strands, represents ligands as first-class objects, and describes interactions between SSEs, and between SSEs and ligands, as incoming and outgoing arcs, annotating each SSE with the interaction direction and type. Benchmarking results from an all-against-all pairwise comparison of a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance of our TOPS+ comparison method in terms of SCOP classification at the superfamily level.
Conclusions
Our advanced TOPS+ comparison method performs better than our basic TOPS+ method on the PDB40 dataset [4], achieving 90 percent accuracy for the SCOP alpha+beta class, a 6 percent increase in accuracy over the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software availability: the TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.
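The abstract does not spell out the advTOPS+ scoring scheme, so the following is only a generic illustration of comparing string-encoded structures all-against-all and scoring the result against a classification. It uses a plain Levenshtein distance over hypothetical SSE strings (`H` helix, `E` strand, `L` loop; no ligand or arc annotations) and a nearest-neighbour accuracy against hypothetical superfamily labels. Both the encoding and this definition of accuracy are assumptions, not the paper's method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def nearest_neighbour_accuracy(strings: list[str], labels: list[str]) -> float:
    """Fraction of queries whose closest other string (ties -> lowest index) shares the query's label."""
    n = len(strings)
    correct = 0
    for i in range(n):
        best = min((j for j in range(n) if j != i),
                   key=lambda j: (edit_distance(strings[i], strings[j]), j))
        correct += labels[best] == labels[i]
    return correct / n


# Hypothetical SSE strings and superfamily labels, for illustration only.
sse = ["HLELEL", "HLELE", "EEHHL", "EEHH"]
superfamily = ["a", "a", "b", "b"]
print(nearest_neighbour_accuracy(sse, superfamily))  # 1.0
```

A real TOPS+-style comparison would replace the unit costs with scores that reward matching SSE types, arc directions and ligand annotations; that scoring is where parameter optimization of the kind described above would act.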
Heuristics-Guided Exploration of Reaction Mechanisms
For the investigation of chemical reaction networks, the efficient and
accurate determination of all relevant intermediates and elementary reactions
is mandatory. The complexity of such a network may grow rapidly, in particular
if reactive species are involved that might cause a myriad of side reactions.
Without automation, a complete investigation of complex reaction mechanisms is
tedious and possibly unfeasible. Therefore, only the expected dominant reaction
paths of a chemical reaction network (e.g., a catalytic cycle or an enzymatic
cascade) are usually explored in practice. Here, we present a computational
protocol that constructs such networks in a parallelized and automated manner.
Molecular structures of reactive complexes are generated based on heuristic
rules derived from conceptual electronic-structure theory and subsequently
optimized by quantum chemical methods to produce stable intermediates of an
emerging reaction network. Pairs of intermediates in this network that might be
related by an elementary reaction according to some structural similarity
measure are then automatically detected and subjected to an automated search
for the connecting transition state. The results are visualized as an
automatically generated network graph that provides a comprehensive picture of
the mechanism of a complex chemical process and greatly facilitates the
analysis of the whole network. We apply our protocol to the
Schrock dinitrogen-fixation catalyst to study alternative pathways of catalytic
ammonia production.
Comment: 27 pages, 9 figures
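The abstract leaves the pair-detection criterion open ("some structural similarity measure"). As a hypothetical stand-in, the sketch below represents each intermediate as a set of bonds between consistently numbered atoms and flags a pair for a transition-state search when the two bond sets differ by at least one and at most `max_changes` bonds. The bond representation, the function names and the cut-off are assumptions for illustration only, not the protocol's actual measure.

```python
from itertools import combinations

Bond = tuple[int, int]


def bonds(*pairs: Bond) -> frozenset[Bond]:
    """Normalise bonds to (low, high) atom-index order so (1, 0) and (0, 1) compare equal."""
    return frozenset((min(a, b), max(a, b)) for a, b in pairs)


def candidate_pairs(intermediates: dict[str, frozenset[Bond]],
                    max_changes: int = 2) -> list[tuple[str, str]]:
    """Return name pairs whose bond sets differ by 1..max_changes bonds (formed plus broken)."""
    pairs = []
    for (name_a, bonds_a), (name_b, bonds_b) in combinations(sorted(intermediates.items()), 2):
        changes = len(bonds_a ^ bonds_b)  # bonds present in exactly one of the two structures
        if 0 < changes <= max_changes:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical intermediates sharing one atom numbering.
network = {
    "A": bonds((0, 1), (1, 2)),
    "B": bonds((1, 0), (0, 2)),          # one bond broken, one formed relative to A
    "C": bonds((0, 3), (1, 3), (2, 3)),
}
print(candidate_pairs(network))  # [('A', 'B')]
```

Each returned pair would become a candidate edge of the reaction network, to be confirmed or discarded by the subsequent transition-state search.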
Stochastic Constraint Programming
To model combinatorial decision problems involving uncertainty and
probability, we introduce stochastic constraint programming. Stochastic
constraint programs contain both decision variables (which we can set) and
stochastic variables (which follow a probability distribution). They combine the
best features of traditional constraint satisfaction, stochastic
integer programming, and stochastic satisfiability. We give a semantics for
stochastic constraint programs, and propose a number of complete algorithms and
approximation procedures. Finally, we discuss a number of extensions of
stochastic constraint programming to relax various assumptions like the
independence between stochastic variables, and compare with other approaches
for decision making under uncertainty.
Comment: Proceedings of the 15th European Conference on Artificial Intelligence
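As a minimal illustration of the semantics in the simplest, one-stage case (all decision variables are set before independent stochastic variables are observed), the sketch below enumerates every decision assignment and every scenario and keeps the assignments under which a chance constraint holds with probability at least a threshold `theta`. This exhaustive search is a toy stand-in, not one of the paper's algorithms; multi-stage programs, whose solutions are policies rather than single assignments, are not covered, and the function and parameter names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable


def satisfying_decisions(
    decision_domains: list[list[int]],
    stochastic_dists: list[dict[int, Fraction]],
    constraint: Callable[[tuple, tuple], bool],
    theta: Fraction,
) -> list[tuple]:
    """One-stage chance-constrained search by full enumeration (independent stochastic variables assumed)."""
    solutions = []
    for decisions in product(*decision_domains):
        p_satisfied = Fraction(0)
        for scenario in product(*(dist.items() for dist in stochastic_dists)):
            values = tuple(value for value, _ in scenario)
            prob = Fraction(1)
            for _, p in scenario:
                prob *= p  # independence: scenario probability is the product of marginals
            if constraint(decisions, values):
                p_satisfied += prob
        if p_satisfied >= theta:
            solutions.append(decisions)
    return solutions


# Decision x in {0..3}; stochastic s uniform on {1..4}; require x + s >= 5 with probability >= 1/2.
uniform = {s: Fraction(1, 4) for s in range(1, 5)}
print(satisfying_decisions([[0, 1, 2, 3]], [uniform],
                           lambda x, s: x[0] + s[0] >= 5, Fraction(1, 2)))
# [(2,), (3,)]
```

Exact `Fraction` arithmetic avoids spurious threshold failures from floating-point rounding; the cost is exponential in the number of variables, which is why complete algorithms and approximation procedures matter.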
High-Throughput Inference of Protein-Protein Interaction Sites from Unassigned NMR Data by Analyzing Arrangements Induced By Quadratic Forms on 3-Manifolds
We cast the problem of identifying protein-protein interfaces, using only unassigned NMR spectra, as a geometric clustering problem. Identifying protein-protein interfaces is critical to understanding inter- and intra-cellular communication, and NMR allows protein interactions to be studied in solution. However, NMR studies of a protein complex are often very time-consuming, mainly because of the bottleneck in assigning the chemical shifts, even when the apo structures of the constituent proteins are known. We study whether it is possible to identify the interface region of a protein complex in a high-throughput manner, using only unassigned chemical shift and residual dipolar coupling (RDC) data. We introduce a geometric optimization problem in which we must cluster the cells of an arrangement on the boundary of a 3-manifold. The arrangement is induced by a spherical quadratic form, which in turn is parameterized by SO(3) × R^2. We show that this formalism derives directly from the physics of RDCs. We present an optimal algorithm for this problem that runs in O(n^3 log n) time for an n-residue protein. We then use this clustering algorithm as a subroutine in a practical algorithm for identifying the interface region of a protein complex from unassigned NMR data. We present the results of our algorithm on NMR data for 7 proteins from 5 protein complexes and show that our approach is useful for high-throughput applications in which we seek to rapidly identify the interface region of a protein complex.
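The SO(3) × R^2 parameterization can be made concrete with one common form of the residual dipolar coupling: for a unit bond vector v, D = D_max · vᵀ S v, where the alignment tensor S is symmetric and traceless, so it is fixed by a rotation R (three Euler angles) and two principal values, the third being minus their sum. The sketch below evaluates that spherical quadratic form in plain Python. The ZYZ Euler convention, the lumped constant `d_max` (absorbing convention-dependent factors) and the function names are assumptions, and the code does not reproduce the paper's arrangement-clustering algorithm.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Matrix) -> Matrix:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(t: float) -> Matrix:
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(t: float) -> Matrix:
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def alignment_tensor(alpha: float, beta: float, gamma: float, sxx: float, syy: float) -> Matrix:
    """S = R diag(sxx, syy, -(sxx + syy)) R^T with R = Rz(alpha) Ry(beta) Rz(gamma) (ZYZ, assumed)."""
    r = matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))
    d = [[sxx, 0.0, 0.0], [0.0, syy, 0.0], [0.0, 0.0, -(sxx + syy)]]
    return matmul(matmul(r, d), transpose(r))


def rdc(v, tensor: Matrix, d_max: float = 1.0) -> float:
    """Quadratic form d_max * u^T S u, where u is v normalised onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    u = [c / norm for c in v]
    return d_max * sum(u[i] * tensor[i][j] * u[j] for i in range(3) for j in range(3))


# Rotating the frame by 90 degrees about z moves the sxx principal axis onto y.
s = alignment_tensor(math.pi / 2, 0.0, 0.0, sxx=1.0, syy=0.0)
print(round(rdc((0.0, 1.0, 0.0), s), 6))  # 1.0
```

Because S is traceless, the couplings predicted for bond vectors along any three orthogonal axes sum to zero, a quick sanity check on any fitted tensor.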
Protein Structure Determination Using Chemical Shifts
In this PhD thesis, a novel method to determine protein structures using
chemical shifts is presented.
Comment: Univ Copenhagen PhD thesis (2014) in Biochemistry