    A novel method for comparing topological models of protein structures enhanced with ligand information

    This article is available open access through the publisher's website. Copyright © 2008 The Authors. We introduce TOPS+ strings, a highly abstract string-based model of protein topology that permits efficient structure comparison and can optionally represent ligand information. In this model, we treat loops as secondary structure elements (SSEs) alongside helices and strands; in addition, we represent ligands as first-class objects. Interactions between SSEs, and between SSEs and ligands, are described by incoming/outgoing arcs and ligand arcs, respectively, and SSEs are annotated with arc interaction direction and type. We are able to abstract away from the ligands themselves, giving a model characterized by a regular grammar rather than the context-sensitive grammar of the original TOPS model. Our TOPS+ strings model is descriptive enough to obtain biologically meaningful results, and it permits fast string-based structure matching and comparison while avoiding the NP-completeness (Non-deterministic Polynomial time) issues associated with graph problems. Because of its compact and abstract string-based representation of protein structure, which records both topological and biochemical information, including the functionally important loop regions, our structure comparison method is computationally more efficient than BLAST, CLUSTALW, SSAP and TOPS in identifying distantly related proteins. The accuracy of our comparison method is comparable with that of TOPS. We have also demonstrated that our TOPS+ strings method outperforms the TOPS method for ligand-dependent protein structures and provides biologically meaningful results. Availability: The TOPS+ strings comparison server is available from http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/topsplus.html. University of Glasgow.
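    To make string-based comparison concrete, here is a minimal Python sketch. The token syntax is a hypothetical stand-in, not the published TOPS+ grammar: each SSE is one whitespace-separated token whose first letter gives its type (E strand, H helix, L loop), followed by an optional direction (+/-) and an optional ":"-prefixed arc annotation (for example :P parallel, :A anti-parallel, :G ligand arc). The score is an ordinary weighted edit distance chosen for illustration; the costs (0.5 for same-type SSEs with different annotations, 1 otherwise) are assumptions.

```python
def sse_type(token: str) -> str:
    """Hypothetical convention: the first character is the SSE type (E, H or L)."""
    return token[0]


def substitution_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    if sse_type(a) == sse_type(b):
        return 0.5  # same SSE type, different direction or arc annotation
    return 1.0      # different SSE types


def weighted_edit_distance(s: list[str], t: list[str]) -> float:
    """Dynamic-programming edit distance over SSE tokens (insert/delete cost 1)."""
    n, m = len(s), len(t)
    # dp[i][j] is the cheapest way to turn s[:i] into t[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)
    for j in range(1, m + 1):
        dp[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # delete s[i-1]
                dp[i][j - 1] + 1.0,  # insert t[j-1]
                dp[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
    return dp[n][m]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus the distance normalized by the longer string."""
    s, t = a.split(), b.split()
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(s, t) / longest


if __name__ == "__main__":
    # Two five-SSE strings that differ only in the arc annotation of the last strand.
    print(similarity("E+ L H-:G L E-:A", "E+ L H-:G L E-:P"))
```

    Because the strings are plain token sequences, the comparison runs in time proportional to the product of their lengths, which is the practical payoff of a regular-grammar model over graph matching.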

    An optimized TOPS+ comparison method for enhanced TOPS models

    This article has been made available through the Brunel Open Access Publishing Fund. Background: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we discussed the basic mechanisms of our novel structure comparison method based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization; we call the resulting optimized method the advanced TOPS+ comparison method (advTOPS+). Results: We have developed the TOPS+ string model as an improvement on the TOPS [1-3] graph model by treating loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first-class objects, describing interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, and annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance of our TOPS+ comparison method in terms of SCOP classification at the superfamily level. Conclusions: Our advanced TOPS+ comparison performs better on the PDB40 dataset [4] than our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta, a 6 percent increase in accuracy over the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.
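    The abstract does not say which parameters were optimized or exactly how superfamily-level accuracy was computed. One plausible reading, sketched below, is nearest-neighbour classification over the all-against-all similarity matrix (a structure counts as correct when its best-scoring other structure shares its SCOP superfamily), combined with an exhaustive grid search over scoring parameters. The parameter name `weight`, the toy scorer and the labels are hypothetical placeholders, not the advTOPS+ parameters.

```python
from itertools import product
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]


def nearest_neighbour_accuracy(sim: Matrix, labels: Sequence[str]) -> float:
    """Fraction of structures whose best-scoring *other* structure has the same label."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two structures")
    correct = 0
    for i in range(n):
        best_j = max((j for j in range(n) if j != i), key=lambda j: sim[i][j])
        if labels[best_j] == labels[i]:
            correct += 1
    return correct / n


def grid_search(
    grid: dict[str, list[float]],
    build_matrix: Callable[..., Matrix],
    labels: Sequence[str],
) -> tuple[dict[str, float], float]:
    """Try every parameter combination and keep the first one with the highest accuracy."""
    names = list(grid)
    best_params: dict[str, float] = {}
    best_acc = -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        acc = nearest_neighbour_accuracy(build_matrix(**params), labels)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


# Toy example: four structures from two (hypothetical) superfamilies.
LABELS = ["a", "a", "b", "b"]
BASE = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]


def toy_matrix(weight: float) -> list[list[float]]:
    """Hypothetical scorer: blends the base scores with their inverse using `weight`."""
    return [[weight * x + (1.0 - weight) * (1.0 - x) for x in row] for row in BASE]


best_params, best_acc = grid_search({"weight": [0.0, 0.25, 0.75, 1.0]}, toy_matrix, LABELS)
```

    In this toy setting any weight above 0.5 preserves the base ranking, so the search settles on the first such value. A real optimization would swap `toy_matrix` for the actual comparison scorer run over the benchmark set.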

    Ligand-guided homology modeling drives identification of novel histamine H3 receptor ligands

    In this study, we report a ligand-guided homology modeling approach that allows the analysis of relevant binding-site residue conformations and led to the identification of two novel histamine H3 receptor ligands with binding affinities in the nanomolar range. The newly developed method exploits an essential charge interaction characteristic of aminergic G-protein-coupled receptors to rank 3D receptor models suitable for discovering novel compounds through virtual screening.
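    The abstract does not specify how the charge interaction is scored. The sketch below assumes it is the salt bridge between a ligand's protonated amine and the conserved aspartate in transmembrane helix 3 (Asp3.32) shared by aminergic GPCRs, and that models are ranked by the shortest nitrogen-oxygen distance, with an illustrative 4.0 Å cutoff. `ReceptorModel`, `rank_models` and all coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class ReceptorModel:
    name: str
    ligand_amine_n: Point     # protonated amine nitrogen of the docked ligand
    asp_oxygens: list[Point]  # carboxylate oxygens of the conserved Asp (assumed Asp3.32)


def charge_interaction_distance(model: ReceptorModel) -> float:
    """Shortest N...O distance (in angstroms) between the ligand amine and the Asp carboxylate."""
    return min(math.dist(model.ligand_amine_n, o) for o in model.asp_oxygens)


def rank_models(models: list[ReceptorModel], cutoff: float = 4.0) -> list[tuple[str, float, bool]]:
    """Sort models by distance; the flag marks a salt bridge within `cutoff` angstroms."""
    scored = sorted((charge_interaction_distance(m), m.name) for m in models)
    return [(name, d, d <= cutoff) for d, name in scored]


ranking = rank_models([
    ReceptorModel("model_B", (0.0, 0.0, 0.0), [(6.0, 0.0, 0.0), (0.0, 7.0, 0.0)]),
    ReceptorModel("model_A", (1.0, 1.0, 1.0), [(3.8, 1.0, 1.0), (1.0, 4.5, 1.0)]),
])
for name, dist, has_salt_bridge in ranking:
    print(f"{name}: {dist:.2f} A, salt bridge: {has_salt_bridge}")
```

    Models that keep the charge interaction intact rise to the top and would be the ones carried forward to virtual screening.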

    TOPS++FATCAT: Fast flexible structural alignment using constraints derived from TOPS+ Strings Model

    Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make the results of such analyses easier to understand. Results: We developed the TOPS++FATCAT algorithm, which uses an intuitive description of protein structures, as captured in the popular TOPS diagrams, to limit the search space of aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. TOPS++FATCAT is more than an order of magnitude faster than FATCAT, at minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information on the parallel and anti-parallel hydrogen-bond patterns between beta-strand SSEs (secondary structure elements). We show that TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications in the TOPS diagrams and can be corrected by developing more precise secondary structure element definitions. Software Availability: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from http://fatcat.burnham.org/TOPS/. Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to that of sequence alignment, opening up the possibility of interactive protein structure similarity searches.
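    The core idea of pruning the AFP search space with SSE correspondences can be sketched as follows. This is not the TOPS++FATCAT code. It assumes that the TOPS+ string comparison yields a list of matched SSE pairs given as residue ranges, and that an AFP is admissible only when both of its fragments lie inside one matched pair. The fragment length of 8 and all names are illustrative assumptions.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start: int  # first residue index (0-based, inclusive)
    end: int    # one past the last residue index (exclusive)


def fragment_starts(seg: Segment, frag_len: int) -> range:
    """Start positions of all fragments of length frag_len that fit inside seg."""
    return range(seg.start, seg.end - frag_len + 1)


def constrained_afps(sse_pairs: list[tuple[Segment, Segment]], frag_len: int = 8) -> set[tuple[int, int]]:
    """AFP start pairs (i, j) restricted to SSEs matched by the string alignment."""
    allowed: set[tuple[int, int]] = set()
    for seg_a, seg_b in sse_pairs:
        for i in fragment_starts(seg_a, frag_len):
            for j in fragment_starts(seg_b, frag_len):
                allowed.add((i, j))
    return allowed


def unconstrained_afp_count(len_a: int, len_b: int, frag_len: int = 8) -> int:
    """Number of AFP start pairs when every fragment of A may pair with every fragment of B."""
    return max(0, len_a - frag_len + 1) * max(0, len_b - frag_len + 1)


# Hypothetical SSE correspondences for two proteins of 120 and 110 residues.
pairs = [(Segment(10, 22), Segment(5, 17)), (Segment(40, 50), Segment(38, 48))]
afps = constrained_afps(pairs)
print(len(afps), "of", unconstrained_afp_count(120, 110), "fragment pairs remain")
```

    Here the candidate set shrinks from 11,639 fragment pairs to 34, which illustrates where an order-of-magnitude speed-up can come from; the cost is that a wrong SSE correspondence excludes the correct AFPs, consistent with the abstract's observation that errors trace back to SSE definitions.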

    Identification of compounds with a suitable property profile from chemical structure databases

    The introduction of digital chemical compound libraries creates the need to find compounds in them that fit a desired profile. The problem is of particular interest in drug design, where replacing resource-intensive experiments with computational methods would yield significant savings in time and cost. Although the limitations of current computational methods make it impossible in the foreseeable future to move the entire drug development process onto computers, the situation is different for large molecular databases. An in silico method that works within a known error margin, discarding some suitable compounds and wrongly flagging others as active, can still substantially enrich a data set in attractive compounds. This allows computational methods to be used in the less demanding stages of drug development, such as finding lead compounds or drug candidates. This approach is known as virtual screening; today it is a vast and rapidly developing research area comprising several paradigms and numerous individual methods. The present thesis takes a closer look at some of them and evaluates their performance in the course of several projects. The results of the thesis include computational models to estimate the HIV protease inhibitory activity and cytotoxicity of certain types of compounds; several prospective ligands for HIV protease and reverse transcriptase; a dataset of compounds pre-filtered with pharmacokinetic filters, which provides a convenient starting point for subsequent projects; and a new virtual screening method.
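    The claim that an imperfect screen can still concentrate a data set is usually quantified with an enrichment factor: the rate of actives among the selected compounds divided by the rate of actives in the whole library. The thesis abstract does not say which metric it uses; the sketch below only illustrates this standard definition with made-up numbers.

```python
def enrichment_factor(selected_actives: int, selected: int,
                      total_actives: int, total: int) -> float:
    """EF = (hit rate among selected compounds) / (hit rate in the whole library)."""
    if selected <= 0 or total_actives <= 0 or total <= 0:
        raise ValueError("selected, total_actives and total must be positive")
    return (selected_actives / selected) / (total_actives / total)


# Hypothetical screen: 100,000 compounds containing 100 actives;
# the top 1,000 ranked compounds contain 40 of those actives.
ef = enrichment_factor(selected_actives=40, selected=1_000, total_actives=100, total=100_000)
recall = 40 / 100  # fraction of actives retained; the other 60 were discarded by mistake
print(f"EF = {ef:.1f}, actives retained = {recall:.0%}")
```

    Even though this hypothetical screen loses most of the actives, the retained subset is about 40 times richer in actives than the original library, which is why such methods pay off at the lead-finding stage.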

    Towards an Efficient Discovery of the Topological Representative Subgraphs

    With the emergence of graph databases, the task of frequent subgraph discovery has been extensively addressed. Although the approaches proposed in the literature have made this task feasible, the number of discovered frequent subgraphs is still too high for them to be used efficiently in any further exploration. Feature selection for graph data is a way to reduce the number of frequent subgraphs based on exact or approximate structural similarity. However, current structural similarity strategies are not efficient enough for many real-world applications, and the combinatorial nature of graphs makes them computationally very costly. To select a smaller yet structurally non-redundant set of subgraphs, we propose a novel approach that mines the top-k topological representative subgraphs among the frequent ones. Our approach can detect hidden structural similarities, such as shared subgraph density or diameter, that existing approaches are unable to detect. In addition, it can easily be extended with any user-defined structural or topological attributes, depending on the properties sought. Empirical studies on real and synthetic graph datasets show that our approach is fast and scalable.
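    As an illustration of choosing representatives by topological attributes rather than by exact structure, the sketch below computes two of the attributes named above, density and diameter, for small undirected graphs and then picks k mutually distant subgraphs with a greedy max-min (farthest-point) heuristic. That selection rule is an assumption made for this example, not necessarily the authors' top-k criterion.

```python
import math
from collections import deque

Graph = dict[int, set[int]]


def from_edges(edges: list[tuple[int, int]]) -> Graph:
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def density(g: Graph) -> float:
    """Edges present divided by edges possible in a simple undirected graph."""
    n = len(g)
    m = sum(len(nbrs) for nbrs in g.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def eccentricity(g: Graph, src: int) -> int:
    """Longest shortest-path length from src, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(g: Graph) -> int:
    """Assumes g is connected, as mined frequent subgraphs usually are."""
    return max(eccentricity(g, v) for v in g)


def select_representatives(graphs: list[Graph], k: int) -> list[int]:
    """Greedy max-min selection on (density, diameter) vectors; returns graph indices."""
    if not graphs:
        return []
    vecs = [(density(g), float(diameter(g))) for g in graphs]
    chosen = [0]
    while len(chosen) < min(k, len(graphs)):
        rest = [i for i in range(len(graphs)) if i not in chosen]
        # pick the graph whose nearest already-chosen graph is farthest away
        chosen.append(max(rest, key=lambda i: min(math.dist(vecs[i], vecs[c]) for c in chosen)))
    return chosen


triangle = from_edges([(0, 1), (1, 2), (2, 0)])
path4 = from_edges([(0, 1), (1, 2), (2, 3)])
k4 = from_edges([(a, b) for a in range(4) for b in range(a + 1, 4)])
star = from_edges([(0, 1), (0, 2), (0, 3)])
print(select_representatives([triangle, path4, k4, star], k=2))
```

    The triangle and the 4-clique are not isomorphic, yet both have density 1 and diameter 1, so this attribute space treats them as redundant: the kind of similarity that exact structural matching does not capture.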

    Evolutionary and Functional Relationships in the Truncated Hemoglobin Family

    Predicting function from sequence is an important goal in current biological research, and although broad functional assignment is possible when a protein is assigned to a family, predicting functional specificity accurately is not straightforward. If function is provided by key structural properties, and the relevant properties can be computed using the sequence as the starting point, it should in principle be possible to predict function in detail. The truncated hemoglobin family is an interesting benchmark because of its ubiquity, its sequence diversity in the context of a conserved fold, and the number of characterized members. The functions of these proteins are tightly related to O2 affinity and reactivity, as determined by the association and dissociation rate constants, both of which can be predicted and analyzed using in silico tools. In the present work we applied a strategy that combines homology modeling with molecular-based energy calculations to predict and analyze the function of all known truncated hemoglobins in an evolutionary context. Our results show that truncated hemoglobins present conserved family features, but that their structure is flexible enough to allow the switch from high to low affinity in a few evolutionary steps. Most proteins display moderate to high oxygen affinities and multiple ligand migration paths, which, apart from some minor trends, show heterogeneous distributions throughout the phylogenetic tree, again suggesting fast functional adaptation. Our data not only deepen our understanding of the structural basis governing ligand affinity but also highlight some interesting functional evolutionary trends.

    Affiliation: Bustamante, Juan Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química, Física de los Materiales, Medioambiente y Energía. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química, Física de los Materiales, Medioambiente y Energía; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Inorgánica, Analítica y Química Física; Argentina
    Affiliation: Radusky, Leandro Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Biológica; Argentina
    Affiliation: Boechi, Leonardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo; Argentina
    Affiliation: Estrin, Dario Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química, Física de los Materiales, Medioambiente y Energía. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química, Física de los Materiales, Medioambiente y Energía; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Química Inorgánica, Analítica y Química Física; Argentina
    Affiliation: Ten Have, Arjen. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Marti, Marcelo Adrian. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Cálculo; Argentina
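    In the truncated hemoglobin abstract above, the link between the two rate constants and O2 affinity is the usual equilibrium relation K = k_on / k_off (equivalently K_d = k_off / k_on). The numbers below are invented for illustration; they show how a change in a single rate constant can move a protein from high to low affinity.

```python
def association_constant(k_on: float, k_off: float) -> float:
    """K = k_on / k_off; with k_on in µM^-1 s^-1 and k_off in s^-1, K is in µM^-1."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return k_on / k_off


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_d = k_off / k_on (in µM), the reciprocal of K."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


# Hypothetical rate constants: a 100-fold faster O2 release
# lowers the affinity 100-fold even though k_on stays the same.
high = association_constant(k_on=20.0, k_off=0.2)
low = association_constant(k_on=20.0, k_off=20.0)
print(f"K(high) = {high:.1f} µM^-1, K(low) = {low:.1f} µM^-1, ratio = {high / low:.0f}")
```

    Because affinity is a ratio, structural changes that alter only the ligand exit rate (for example, by opening or closing a migration path) are enough to shift it by orders of magnitude.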