
    p3d – Python module for structural bioinformatics

    Background: High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge-based approaches. Developing such tools requires a robust interface that gives easy access to the structural data. For this, the Python scripting language is the optimal choice, since its philosophy emphasizes writing understandable source code. Results: p3d is an object-oriented Python module that adds a simple yet powerful interface to the Python interpreter for processing and analysing three-dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data through the implementation of a binary space partitioning (BSP) tree, b) set theory, and c) functions that combine a and b and accept human-readable search queries rather than complex computer language. Together, these features facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. Conclusion: p3d is the perfect tool for quickly developing structural bioinformatics tools in the Python scripting language.
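
    The two ingredients named above, spatial partitioning and set operations, can be illustrated in plain Python. The sketch below is not p3d's API: the `Atom` record and the `build` and `within` functions are hypothetical names, and the tree is a k-d tree (a BSP tree whose splitting planes are axis-aligned), which may differ from the partitioning scheme p3d actually uses.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Atom:
    # Hypothetical minimal atom record; p3d's own classes differ.
    serial: int
    name: str
    resname: str
    xyz: Tuple[float, float, float]


@dataclass
class Node:
    atom: Atom
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(atoms: List[Atom], depth: int = 0) -> Optional[Node]:
    """Recursively split space with axis-aligned planes (a k-d tree, one kind of BSP tree)."""
    if not atoms:
        return None
    axis = depth % 3
    ordered = sorted(atoms, key=lambda a: a.xyz[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def within(node: Optional[Node], center, radius: float,
           found: Optional[Set[Atom]] = None) -> Set[Atom]:
    """Collect atoms within `radius` of `center`, skipping subtrees the sphere cannot reach."""
    if found is None:
        found = set()
    if node is None:
        return found
    if math.dist(node.atom.xyz, center) <= radius:
        found.add(node.atom)
    diff = center[node.axis] - node.atom.xyz[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    within(near, center, radius, found)
    if abs(diff) <= radius:  # the query sphere crosses the splitting plane
        within(far, center, radius, found)
    return found


atoms = [
    Atom(1, "ZN", "ZN", (0.0, 0.0, 0.0)),
    Atom(2, "NE2", "HIS", (2.0, 0.0, 0.0)),
    Atom(3, "SG", "CYS", (0.0, 2.2, 0.0)),
    Atom(4, "CA", "GLY", (8.0, 8.0, 8.0)),
]
tree = build(atoms)
zinc = atoms[0]
shell = within(tree, zinc.xyz, 3.0) - {zinc}           # spatial query
histidines = {a for a in atoms if a.resname == "HIS"}  # attribute-based selection
print(sorted(a.name for a in shell & histidines))      # set intersection -> ['NE2']
```

    Because the query returns a plain set, spatial and attribute-based selections combine with ordinary set operators, which mirrors the a-plus-b combination the abstract describes.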

    MetalPDB: a database of metal sites in biological macromolecular structures

    We present here MetalPDB (freely accessible a

    Identification of Key Residues for pH Dependent Activation of Violaxanthin De-Epoxidase from Arabidopsis thaliana

    Plants are often exposed to saturating light conditions, which can lead to oxidative stress. The carotenoid zeaxanthin, synthesized from violaxanthin by Violaxanthin De-Epoxidase (VDE), plays a major role in protection from excess illumination. VDE activation is triggered by a pH reduction in the thylakoid lumen that occurs under saturating light. In this work, the mechanism of VDE activation was investigated at the molecular level using multi-conformer continuum electrostatic calculations, site-directed mutagenesis and molecular dynamics. The pKa values of residues of inactive VDE were determined to identify target residues that could be implicated in activation. Five such target residues were examined more closely by site-directed mutagenesis: variants of four residues (D98, D117, H168 and D206) reduced enzymatic activity, indicating a role in VDE activation, whereas D86 mutants showed no alteration. Analysis of the VDE sequence showed that the four putative activation residues are all conserved in plants but not in diatoms, explaining why VDE in these algae is already activated at higher pH. Molecular dynamics showed that the VDE structure was coherent at pH 7, with little water penetrating the hydrophobic barrel. Simulations with the candidate residues locked in their protonated state instead showed increased water penetration into the barrel and rupture of the H121–Y214 hydrogen bond at the end of the barrel, which is essential for VDE activation. These results suggest that VDE activation relies on a robust and redundant network in which the four residues identified in this study play a major role.
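
    Why pKa values identify candidate pH switches can be made concrete with the Henderson–Hasselbalch relation, which gives the protonated fraction of an isolated titratable site as 1 / (1 + 10^(pH - pKa)). This is a simplification: multi-conformer continuum electrostatics, as used in the study, also accounts for coupling between sites and conformers, which the single-site formula ignores. The function names, site labels and pKa values below are hypothetical and are not results from the paper; pH 5 is only an illustrative acidic value.

```python
def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson–Hasselbalch: fraction of an isolated site carrying its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def switching_sites(pkas: dict, ph_high: float = 7.0, ph_low: float = 5.0,
                    threshold: float = 0.5) -> list:
    """Sites whose protonated fraction rises by at least `threshold` when the pH drops."""
    return sorted(
        site for site, pka in pkas.items()
        if fraction_protonated(pka, ph_low) - fraction_protonated(pka, ph_high) >= threshold
    )


# Hypothetical pKa values, for illustration only.
example_pkas = {"site_A": 3.5, "site_B": 6.0, "site_C": 6.5, "site_D": 9.0}
print(round(fraction_protonated(6.0, 7.0), 2))  # 0.09
print(round(fraction_protonated(6.0, 5.0), 2))  # 0.91
print(switching_sites(example_pkas))            # ['site_B', 'site_C']
```

    The largest change occurs for pKa values lying between the two pH values; sites with much lower or much higher pKa stay in the same protonation state in both conditions.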

    Evolutionary Dynamics on Protein Bi-stability Landscapes Can Potentially Resolve Adaptive Conflicts

    Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be particularly advantageous if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space and identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. Including such proteins enhances phenotype connectivity between neutral networks in sequence space. Examining the sequence-space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. Under relaxed selection pressure, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Natl Acad Sci USA 2009, 106:21149–21154) that connect two sequences folding uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, we discuss new testable predictions for future studies on protein bi-stability and evolution.
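
    The notions of an exactly bi-stable sequence and its single-mutation neighborhood can be illustrated with a toy HP-style contact model: a sequence of hydrophobic (H) and polar (P) residues scores -1 for every H-H contact in a structure, and a sequence is exactly bi-stable between two structures when both score the same. This is only a sketch under stated assumptions: the two contact maps are hypothetical, only two structures are compared (a real analysis must also check that both are lower in energy than every competing fold), and it is not the authors' biophysical model.

```python
from typing import Iterator, List, Tuple

Contacts = List[Tuple[int, int]]

# Hypothetical contact maps (residue index pairs) for two alternative folds of an 8-mer.
FOLD_A: Contacts = [(0, 3), (4, 7)]
FOLD_B: Contacts = [(0, 7), (2, 5)]


def energy(seq: str, contacts: Contacts) -> float:
    """HP-style energy: -1 for each contact between two hydrophobic (H) residues."""
    return sum(-1.0 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def stability_gap(seq: str, fold_a: Contacts, fold_b: Contacts) -> float:
    """Absolute energy difference between the two folds; 0 means exactly bi-stable."""
    return abs(energy(seq, fold_a) - energy(seq, fold_b))


def single_mutants(seq: str, alphabet: str = "HP") -> Iterator[str]:
    """All sequences one substitution away from `seq`."""
    for i, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]


bridge = "HPHHHHPH"
print(stability_gap(bridge, FOLD_A, FOLD_B))  # 0.0
print(sorted(stability_gap(m, FOLD_A, FOLD_B) for m in single_mutants(bridge)))
# [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
```

    In this tiny example, mutations at positions that contribute to only one fold's contacts break the tie, while the others leave it intact.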

    Early-detection and classification of live bacteria using time-lapse coherent imaging and deep learning

    We present a computational live bacteria detection system that periodically captures coherent microscopy images of bacterial growth inside a 60 mm diameter agar plate and analyzes these time-lapsed holograms using deep neural networks for rapid detection of bacterial growth and classification of the corresponding species. The performance of our system was demonstrated by rapid detection of Escherichia coli and total coliform bacteria (i.e., Klebsiella aerogenes and Klebsiella pneumoniae subsp. pneumoniae) in water samples. These results were confirmed against gold-standard culture-based results, shortening the detection time of bacterial growth by >12 h compared with Environmental Protection Agency (EPA)-approved analytical methods. Our experiments further confirmed that this method successfully detects 90% of bacterial colonies within 7–10 h (and >95% within 12 h) with a precision of 99.2–100%, and correctly identifies their species in 7.6–12 h with 80% accuracy. Using pre-incubation of samples in growth media, our system achieved a limit of detection (LOD) of ~1 colony-forming unit (CFU)/L within 9 h of total test time. This computational bacteria detection and classification platform is highly cost-effective (~$0.6 per test) and high-throughput, with a scanning speed of 24 cm²/min over the entire plate surface, making it highly suitable for integration with the existing analytical methods currently used for bacteria detection on agar plates. Powered by deep learning, this automated and cost-effective live bacteria detection platform can be transformative for a wide range of applications in microbiology, significantly reducing detection time and automating colony identification, without labeling or the need for an expert.
    Comment: 24 pages, 6 figures
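
    The stated scanning speed translates into a per-plate scan time with simple arithmetic. The sketch below assumes the entire circular area of the 60 mm plate is imaged at a constant areal speed; the function name is hypothetical.

```python
import math


def full_plate_scan_minutes(diameter_mm: float, speed_cm2_per_min: float) -> float:
    """Time to scan a circular plate of the given diameter at a constant areal speed."""
    radius_cm = diameter_mm / 10.0 / 2.0  # 60 mm -> 3 cm
    area_cm2 = math.pi * radius_cm ** 2   # about 28.3 cm^2 for a 60 mm plate
    return area_cm2 / speed_cm2_per_min


print(round(full_plate_scan_minutes(60, 24), 2))  # 1.18
```

    Under this assumption a full-plate pass takes a little over a minute, which is short compared with the hours-scale growth being tracked.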

    Computational aspects of NMR in structural biology

    Large-Scale Analysis of Protein-Ligand Binding Sites using the Binding MOAD Database.

    Current structure-based drug design (SBDD) methods require an understanding of general trends in protein-ligand interactions. Informative descriptors of ligand-binding sites provide powerful heuristics to improve SBDD methods designed to infer function from protein structure. These descriptors must have a solid statistical foundation for assessing general trends in large sets of protein-ligand complexes. This dissertation focuses on mining the Binding MOAD database of highly curated protein-ligand complexes to determine frequently observed patterns of binding-site composition. An extension to Binding MOAD’s framework is developed to store structural details of binding sites and facilitate large-scale analysis. The thesis uses this framework to address three topics. It first describes a strategy for determining the over-representation of amino acids within ligand-binding sites, comparing residue-propensity trends for binding sites of biologically relevant ligands with those of spurious molecules with no known function. To determine the significance of these trends and to provide guidelines for residue-propensity studies, the effect of data-set size on the variation in propensity values is evaluated. Next, binding-site residue propensities are applied to improve the performance of a geometry-based binding-site prediction algorithm. Propensity-based scores are found to perform comparably to the native score in ranking correct predictions. For large proteins, propensity-based and consensus scores improve scoring success. Finally, current protein-ligand scoring functions are evaluated using a new criterion: the ability to discern biologically relevant ligands from “opportunistic binders,” molecules present in crystal structures because of their high concentrations in the crystallization medium. Four different scoring functions are evaluated against a diverse benchmark set.
All are found to perform well at ranking biologically relevant sites over spurious ones, and all perform best when penalties for ligand torsional strain are included. The final chapter describes a structural alignment method, termed HwRMSD, which can align proteins of very low sequence homology based on their structural similarity using a weighted structure superposition. The overall aims of the dissertation are to collect high-quality binding-site composition data within the largest available set of protein-ligand complexes and to evaluate appropriate applications of these data to emerging methods for computational proteomics.
    Ph.D. thesis, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91400/1/nickolay_1.pd
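
    Residue propensity in binding-site studies is commonly defined as the frequency of an amino acid among binding-site residues divided by its frequency in a background set (here, all residues of the same proteins); values above 1 indicate over-representation. The sketch assumes this common definition, whose normalization may differ from the dissertation's; the function name and residue counts are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def residue_propensity(site_residues: Iterable[str],
                       background_residues: Iterable[str]) -> Dict[str, float]:
    """Frequency of each residue type in binding sites relative to its background frequency."""
    site = Counter(site_residues)
    background = Counter(background_residues)
    n_site = sum(site.values())
    n_background = sum(background.values())
    return {
        aa: (count / n_site) / (background[aa] / n_background)
        for aa, count in site.items()
        if background[aa] > 0  # undefined for residue types absent from the background
    }


# Made-up counts: 4 binding-site residues drawn from a 20-residue background.
site = ["HIS", "HIS", "CYS", "GLY"]
background = ["HIS"] * 2 + ["CYS"] * 2 + ["GLY"] * 12 + ["ALA"] * 4
print({aa: round(p, 2) for aa, p in residue_propensity(site, background).items()})
# {'HIS': 5.0, 'CYS': 2.5, 'GLY': 0.42}
```

    With so few residues the ratios are noisy, which is why the effect of data-set size on propensity variation matters.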

    Natural Language Processing and Temporal Information Extraction in Emergency Department Triage Notes

    Electronic patient records, including the Emergency Department (ED) Triage Note (TN), provide a rich source of textual information. Processing clinical texts to create important pieces of structured information would be useful to clinicians treating patients, clinicians in training, and researchers and practitioners in biosurveillance. This work applies natural language processing (NLP) and information extraction (IE) techniques to the TN genre of text. In particular, it presents the Triage Note Temporal Information Extraction System (TN-TIES), which combines a shallow parser, machine-learned classifiers, and handwritten rules to identify, extract, and interpret temporal information in TNs in preparation for automatically creating a timeline of events leading up to a patient's visit to the ED. The success of TN-TIES suggests that NLP and IE techniques are appropriate for this genre and that automatic production of a timeline of TN events is a realistic application.
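
    Of the three TN-TIES components, the handwritten rules are the easiest to illustrate. The sketch below uses a single hypothetical regular expression to find durations such as "x2 days" or "for 6 hrs" and normalizes them to hours, a form that could place an event on a timeline relative to the ED visit. The pattern, unit table and function name are assumptions for illustration, not the rules used in TN-TIES; the shallow parser and machine-learned classifiers are not modeled.

```python
import re
from typing import List, Tuple

# Hypothetical rule: optional "x" or "for", a number, then an hour/day/week unit.
DURATION = re.compile(r"\b(?:x\s*|for\s+)?(\d+)\s*(hours?|hrs?|days?|weeks?|wks?)\b",
                      re.IGNORECASE)
HOURS_PER_UNIT = {"h": 1, "d": 24, "w": 168}  # keyed by the unit's first letter


def extract_durations(note: str) -> List[Tuple[str, int]]:
    """Return (matched text, duration in hours) for each duration expression in a note."""
    results = []
    for match in DURATION.finditer(note):
        amount = int(match.group(1))
        unit = match.group(2).lower()
        results.append((match.group(0), amount * HOURS_PER_UNIT[unit[0]]))
    return results


note = "Pt c/o abd pain x2 days, vomiting for 6 hrs."
print(extract_durations(note))  # [('x2 days', 48), ('for 6 hrs', 6)]
```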

    A component-based approach for the interactive visual analysis of results from numerical simulations

    Component-based approaches are increasingly studied and used for the effective development of applications in software engineering. On the one hand, they offer developers a clear architecture; on the other, they separate the various functional parts, particularly in interactive scientific visualization applications. Modeling such applications should describe the behavior of each component and the global actions of the system. Moreover, interactions between components are expressed through communication schemes that can be very complex, for example allowing messages to be lost to gain performance. This thesis describes the ComSA model (Component-based approach for Scientific Applications), which relies on a component-based approach dedicated to interactive and dynamic scientific visualization applications, and its formalization in strict Colored FIFO Nets (sCFN). The main contributions of this thesis are threefold. First, it defines a set of tools to model component behaviors and the various communication policies within the application. Second, it defines properties that guarantee the application starts properly, by analyzing and detecting deadlocks, which ensures liveness throughout the application's execution. Finally, it studies the dynamic reconfiguration of visual analytics applications by adding or removing a component on the fly without stopping the whole application; this reconfiguration minimizes the number of unavailable services.
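
    One communication policy mentioned above, accepting message loss in exchange for performance, can be sketched as a bounded FIFO channel that discards the oldest message when a new one arrives and the buffer is full, so a slow consumer such as a renderer keeps receiving recent data. This is a hypothetical illustration: `LossyChannel` is not part of ComSA, and the sketch does not model the sCFN formalism.

```python
from collections import deque
from typing import Any, Optional


class LossyChannel:
    """Bounded FIFO channel that drops the oldest message when full (hypothetical policy)."""

    def __init__(self, capacity: int) -> None:
        self._queue: deque = deque(maxlen=capacity)
        self.dropped = 0

    def send(self, message: Any) -> None:
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # the append below evicts the oldest message
        self._queue.append(message)  # a full deque with maxlen discards from the left

    def receive(self) -> Optional[Any]:
        """Return the oldest buffered message, or None if the channel is empty."""
        return self._queue.popleft() if self._queue else None


channel = LossyChannel(capacity=2)
for frame in range(5):  # a producer emitting simulation frames faster than they are consumed
    channel.send(frame)
print(channel.receive(), channel.receive(), channel.dropped)  # 3 4 3
```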