109 research outputs found

    Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations

    Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-Complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structure scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm to use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in progressive fashion. The use of a dynamic weighted tree allows errors in the early alignment stages to be corrected in the subsequent stages. The other two algorithms utilize sequence knowledge-bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of the multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm

    Development of Computer-Aided Molecular Design Methods for Bioengineering Applications

    Computer-aided molecular design (CAMD) offers a methodology for rational product design. The CAMD procedure consists of pre-design, design and post-design phases. CAMD was used to address two bioengineering problems: design of excipients for lyophilized protein formulations and design of ionic liquids for use in bioseparations. Protein stability remains a major concern during protein drug development. Lyophilization, or freeze-drying, is often sought to improve chemical stability. However, lyophilization can result in protein aggregation. Excipients, or additives, are included to stabilize proteins in lyophilized formulations. CAMD was used to rationally select or design excipients for lyophilized protein formulations. The use of solvents to aid separation is common in chemical processes. Ionic liquids offer a class of molecules with tunable properties that can be altered to find optimal solvents for a given application. CAMD was used to design ionic liquids for extractive distillation and in situ extractive fermentation processes. The pre-design phase involves experimental data gathering and problem formulation. When available, data was obtained from literature sources. For excipient design, data of percent protein monomer remaining post-lyophilization was measured for a variety of protein-excipient combinations. In problem formulation, the objective was to minimize the difference between the properties of the designed molecule and the target property values. Problem formulations resulted in either mixed-integer linear programs (MILPs) or mixed-integer non-linear programs (MINLPs). The design phase consists of the forward problem and the reverse problem. In the forward problem, linear quantitative structure-property relationships (QSPRs) were developed using connectivity indices. Chiral connectivity indices were used for excipient property models to improve fit and incorporate three-dimensional structural information. 
Descriptor selection methods were employed to find models that minimized Mallows' Cp statistic, obtaining models with good fit while avoiding overfitting. Cross-validation was performed to assess predictive capabilities. Model development was also performed to develop group contribution models and non-linear QSPRs. A UNIFAC model was developed to predict the thermodynamic properties of ionic liquids. In the reverse problem of the design phase, molecules were proposed with optimal property values. Deterministic methods were used to design ionic liquid entrainers for azeotropic distillation. Tabu search, a stochastic optimization method, was applied to both ionic liquid and excipient design to provide novel molecular candidates. Tabu search was also compared to a genetic algorithm for CAMD applications. Tuning was performed using a test case to determine parameter values for both methods. After tuning, both stochastic methods were used with design cases to provide optimal excipient stabilizers for lyophilized protein formulations. Results suggested that the genetic algorithm provided a faster time to solution while tabu search provided quality solutions more consistently. The post-design phase provides solution analysis and verification. Process simulation was used to evaluate the energy requirements of azeotropic separations using designed ionic liquids. Results demonstrated that less energy was required than in processes using conventional entrainers or ionic liquids that were not optimally designed. Molecular simulation was used to guide protein formulation design and may prove to be a useful tool in post-design verification. Finally, prediction intervals were used for properties predicted from linear QSPRs to quantify the prediction error in the CAMD solutions. Overlapping prediction intervals indicate solutions with statistically similar property values.
Prediction interval analysis showed that tabu search returns many results with statistically similar property values in the design of carbohydrate glass formers for lyophilized protein formulations. The best solutions from tabu search and the genetic algorithm were shown to be statistically similar for all design cases considered. Overall the CAMD method developed here provides a comprehensive framework for the design of novel molecules for bioengineering approaches
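
The reverse problem described in this abstract (propose a molecule whose predicted property is as close as possible to a target value) can be illustrated with a small tabu search. This is only a sketch under stated assumptions, not the dissertation's implementation: the group names, the contribution values in `CONTRIB`, the limit `MAX_PER_GROUP`, and the target of 12.3 are all hypothetical, and a simple linear group-contribution model stands in for the QSPRs.

```python
from collections import deque

# Hypothetical group contributions to some property (illustrative numbers only).
CONTRIB = {"CH3": 1.5, "CH2": 0.9, "OH": 4.2, "NH2": 3.1}
GROUPS = list(CONTRIB)
MAX_PER_GROUP = 6  # assumed structural limit on each group count


def predict(counts):
    """Linear group-contribution estimate of the property."""
    return sum(CONTRIB[g] * n for g, n in zip(GROUPS, counts))


def objective(counts, target):
    """Absolute difference between predicted and target property."""
    return abs(predict(counts) - target)


def neighbors(counts):
    """Yield candidates that differ by +/-1 in exactly one group count."""
    for i in range(len(counts)):
        for step in (-1, 1):
            n = counts[i] + step
            if 0 <= n <= MAX_PER_GROUP:
                yield counts[:i] + (n,) + counts[i + 1:]


def tabu_search(target, start, iterations=200, tenure=10):
    """Move to the best non-tabu neighbor each step, even if it is worse."""
    current = best = start
    tabu = deque([start], maxlen=tenure)  # recently visited solutions
    for _ in range(iterations):
        candidates = [c for c in neighbors(current) if c not in tabu]
        if not candidates:
            break
        current = min(candidates, key=lambda c: objective(c, target))
        tabu.append(current)
        if objective(current, target) < objective(best, target):
            best = current
    return best


best = tabu_search(target=12.3, start=(0, 0, 0, 0))
print(dict(zip(GROUPS, best)), round(predict(best), 2))
```

Keeping a short memory of visited solutions lets the search accept a worse neighbor and so leave a local optimum, which is the property that distinguishes tabu search from plain greedy descent.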

    Lower-energy conformers search of TPP-1 polypeptide via hybrid particle swarm optimization and genetic algorithm

    Low-energy conformation search on biological macromolecules remains a challenge in biochemical experiments and theoretical studies. Finding efficient approaches to minimize the energy of peptide structures is critically needed for researchers either studying peptide-protein interactions or designing peptide drugs. In this study, we aim to develop a heuristic-based algorithm to efficiently minimize the energy of a promising PD-L1 inhibiting polypeptide, TPP-1, and build its low-energy conformer pool to advance its subsequent structure optimization and molecular docking studies. Through our study, we find that, using backbone dihedral angles as the decision variables, both particle swarm optimization (PSO) and a genetic algorithm (GA) can outperform other existing heuristic approaches in optimizing the structure of Met-enkephalin, a benchmarking pentapeptide for evaluating the efficiency of conformation optimizers. Using the established algorithm pipeline, the hybrid of PSO and GA minimized the TPP-1 structure efficiently, and a low-energy pool was built with an acceptable computational cost (a couple of days using a single laptop). Remarkably, the efficiency of hybrid PSO-GA is hundreds-fold higher than that of conventional molecular dynamics simulations running under the force field. Meanwhile, the stereo-chemical quality of the minimized structures was validated using a Ramachandran plot. In summary, hybrid PSO-GA minimizes the TPP-1 structure efficiently and yields a low-energy conformer pool within a reasonably short time period. Overall, our approach can be extended to biochemical research to speed up peptide conformation determination and hence can facilitate peptide-involved drug development
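
The idea of hybridizing PSO and GA over backbone dihedral angles can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `toy_energy` is a made-up periodic function standing in for a real force-field energy, and the swarm size, inertia and acceleration coefficients, crossover scheme, and mutation width are all assumptions chosen for readability.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical rugged, periodic stand-in for a force-field energy (radians)."""
    return sum(1 - math.cos(a - 1.0) + 0.3 * (1 - math.cos(3 * a)) for a in angles)


def wrap(a):
    """Map an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def hybrid_pso_ga(n_angles=10, n_particles=30, iterations=200, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_angles for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    gbest = min(pbest, key=toy_energy)[:]    # best position seen by the swarm
    for _ in range(iterations):
        # PSO step: pull each particle toward its own best and the swarm best.
        for i in range(n_particles):
            for d in range(n_angles):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = wrap(pos[i][d] + vel[i][d])
            if toy_energy(pos[i]) < toy_energy(pbest[i]):
                pbest[i] = pos[i][:]
        # GA step: replace the worse half with mutated offspring of the better half.
        order = sorted(range(n_particles), key=lambda i: toy_energy(pos[i]))
        half = n_particles // 2
        for loser in order[half:]:
            mom, dad = rng.sample(order[:half], 2)
            cut = rng.randrange(1, n_angles)           # one-point crossover
            child = pos[mom][:cut] + pos[dad][cut:]
            k = rng.randrange(n_angles)                # mutate a single dihedral
            child[k] = wrap(child[k] + rng.gauss(0.0, 0.3))
            pos[loser] = child
            vel[loser] = [0.0] * n_angles
            if toy_energy(child) < toy_energy(pbest[loser]):
                pbest[loser] = child[:]
        gbest = min(pbest + [gbest], key=toy_energy)[:]
    return gbest, toy_energy(gbest)


angles, energy = hybrid_pso_ga()
print(f"lowest toy energy found: {energy:.4f}")
```

In this sketch the PSO update provides directed movement toward known good conformations, while the GA step recombines and perturbs angles to keep the population diverse; a real application would replace `toy_energy` with an energy evaluation from a molecular mechanics package.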

    Bionano-Interfaces through Peptide Design

    The clinical success of restoring bone and tooth function through implants critically depends on the maintenance of an infection-free, integrated interface between the host tissue and the biomaterial surface. Surgical site infections, defined as infections occurring within one year of surgery, occur in approximately 160,000-300,000 cases in the US annually. Antibiotics are the conventional treatment for the prevention of infections. They are becoming ineffective due to bacterial antibiotic resistance arising from their widespread use. There is an urgent need both to combat bacterial drug resistance through new antimicrobial agents and to limit the spread of drug resistance by limiting their delivery to the implant site. This work aims to reduce surgical site infections from implants by designing chimeric antimicrobial peptides that integrate a novel and effective delivery method. In recent years, antimicrobial peptides (AMPs) have attracted interest as natural sources for new antimicrobial agents. By being part of the immune system in all life forms, they are examples of antibacterial agents with successfully maintained efficacy across evolutionary time. Both natural and synthetic AMPs show significant promise for solving the antibiotic resistance problems. In this work, AMP1 and AMP2 were shown (Chapter 4) to be active against three different strains of pathogens. In the literature, these peptides have been shown to be effective against multi-drug resistant bacteria. However, their effective delivery to the implantation site limits their clinical use. In recent years, different groups adapted covalent chemistry-based or non-specific physical adsorption methods for antimicrobial peptide coatings on implant surfaces. Many of these procedures use harsh chemical conditions requiring multiple reaction steps. Furthermore, none of these methods allow the orientation control of these molecules on the surfaces, which is an essential consideration for biomolecules.
In the last few decades, solid binding peptides have attracted high interest due to their material specificity and self-assembly properties. These peptides offer robust surface adsorption and assembly in diverse applications. In this work, a design method for chimeric antimicrobial peptides that can self-assemble and self-orient onto biomaterial surfaces was demonstrated. Three specific aims address this two-fold strategy of self-assembly and self-orientation: 1) Develop classification and design methods using rough set theory and genetic algorithm search to customize antibacterial peptides; 2) Develop chimeric peptides by designing spacer sequences to improve the activity of antimicrobial peptides on titanium surfaces; 3) Verify the approach as an enabling technology by expanding the chimeric design approach to other biomaterials. In Aim 1, a peptide classification tool was developed because the selection of an antimicrobial peptide for an application was difficult among the thousands of peptide sequences available. A rule-based rough set theory classification algorithm was developed to group antimicrobial peptides by chemical properties. This is the first time that rough set theory has been applied to peptide activity analysis. The classification method on benchmark data sets resulted in low false discovery rates. The novel rough set theory method was combined with a novel genetic algorithm search, resulting in a method for customizing active antibacterial peptides using sequence-based relationships. Inspired by the fact that spacer sequences play critical roles between functional protein domains, in Aim 2, chimeric peptides were designed to combine solid binding functionality with antimicrobial functionality. To improve how these functions worked together in the same peptide sequence, new spacer sequences were engineered.
The rough set theory method from Aim 1 was used to find structure-based relationships to discover new spacer sequences which improved the antimicrobial activity of the chimeric peptides. In Aim 3, the proposed approach is demonstrated as an enabling technology. Calcium phosphate was tested, verifying the modularity of the chimeric antimicrobial self-assembling peptide approach. Other chimeric peptides were designed for the common biomaterials zirconia and urethane polymer. Finally, an antimicrobial peptide was engineered for a dental adhesive system toward applying spacer design concepts to optimize the antimicrobial activity

    Predicting structural determinants and Ligand poses in proteins involved in neurological diseases: bioinformatics and molecular simulation studies

    Part I presents the computational tools used in this work: the comparative modeling and molecular docking approaches along with molecular dynamics. Part II presents structural predictions of Ca2+-binding domains in Ca2+-gated channels. A detailed description of the structure and function of these proteins can be found in the following Chapters. Chapter 4 focuses on human large conductance Ca2+- and voltage-gated potassium channel (hBKCa). Bioinformatics approaches and MD simulations were used to construct models of two domains important for Ca2+ binding and channel gating, namely the Regulator of Conductance for K+ (RCK1) and the so-called calcium bowl. The relevance of these models for interpreting the available molecular biology data is then discussed. Chapter 5 deals with bestrophins, a recently discovered family of Cl- channels. Bestrophins feature a well conserved Asp-rich tract in their C-terminal part, which is homologous to Ca2+-binding motifs in the calcium bowl of hBKCa. Based on these considerations, we constructed homology models of the human bestrophin-1 Asp-rich domain. MD simulations and free energy calculations were used to identify Asp and Glu residues binding Ca2+ and to predict effects of their mutations to Ala. My work, performed in collaboration with C. Anselmi (SISSA/ISAS), was complemented by free energy calculations carried out by F. Pietrucci (SISSA/ISAS). Selected mutations were investigated by electrophysiological experiments performed by Prof. A. Menini, J. Rievaj, F. W. Grillo, and A. Boccaccio (SISSA/ISAS). The model of the Asp-rich domain was then validated against experimental results. Part III is devoted to the prion protein. In this Part, Chapter 6 presents in vitro studies of D18scFv anti-prion effects performed by groups of Prof. C. Zurzolo (Institut Pasteur, Paris, France), Prof. G. Legname (SISSA/ISAS), L. Zentilin and M. Giacca (ICGEB, Trieste, Italy) and by Prof. S. B.
Prusiner (Institute for Neurodegenerative Diseases, University of California San Francisco, U.S.A.) and structural prediction of a complex between the small antibody fragment (D18scFv) and PrPC. The complex was modeled using bioinformatics approaches. Initially, the D18scFv fragment alone was modeled based on a similar antibody-fragment template and then docked with prion protein. Based on this, interactions relevant for the recognition between the two proteins and for the mechanism of action of D18scFv are discussed. Chapter 7 describes a computational protocol for the design of ligands targeting cavity-less proteins, like most proteins involved in neurodegenerative diseases. Molecular docking methods are combined with MD simulations and free energy calculations using the metadynamics method [33, 34] to gain insight into ligand binding to such proteins, in our case to prion protein. We focused on a compound showing antiprion activity in vitro. Ligand-target interactions and ligand binding affinity obtained with our approach are compared with the available NMR data [35] and the experimental dissociation constant [35]. Besides myself, two other students and one postdoc were involved in this work, namely S. Bongarzone, G. Rossetti and X. Biarnes (SISSA/ISAS). Finally, the conclusions are drawn in the last Chapter. The thesis closes with the List of publications and with the Acknowledgments

    Development of genetic algorithm for optimisation of predicted membrane protein structures

    Due to the inherent problems with their structural elucidation in the laboratory, the computational prediction of membrane protein structure is an essential step toward understanding the function of these leading targets for drug discovery. In this work, the development of a genetic algorithm technique is described that is able to generate predictive 3D structures of membrane proteins in an ab initio fashion that possess high stability and similarity to the native structure. This is accomplished through optimisation of the distances between TM regions and the end-on rotation of each TM helix. The starting point for the genetic algorithm is the model of general TM region arrangement predicted using the TMRelate program. From these approximate starting coordinates, the TMBuilder program is used to generate the helical backbone 3D coordinates. The amino acid side chains are constructed using the MaxSprout algorithm. The genetic algorithm is designed to represent a TM protein structure by encoding each alpha carbon atom starting position, the starting atom of the initial residue of each helix, and operates by manipulating these starting positions. To evaluate each predicted structure, the SwissPDBViewer software (incorporating the GROMOS force field software) is employed to calculate the free potential energy. For the first time, a GA has been successfully applied to the problem of predicting membrane protein structure. Comparison between newly predicted structures (tests) and the native structure (control) indicates that the developed GA approach represents an efficient and fast method for refinement of predicted TM protein structures. Further enhancement of the performance of the GA allows the TMGA system to generate predictive structures with comparable energetic stability and reasonable structural similarity to the native structure

    Proteins synthesized in tobacco mosaic virus infected protoplasts

    The study described here concerns the proteins synthesized as a result of tobacco mosaic virus (TMV) multiplication in tobacco protoplasts and in cowpea protoplasts. The identification of proteins involved in the TMV infection, for instance in the virus RNA replication, helps to elucidate the infection process in the plant cell. Not only virus coded proteins, but possibly also host coded proteins may play a part in the TMV multiplication. Research on proteins encoded by the TMV RNA, carried out in cell-free protein synthesizing systems, has revealed that five polypeptides are synthesized under the direction of TMV (subgenomic) mRNAs (see table 1.2., chapter 1). Whether the polypeptides, synthesized in vitro with TMV RNA as messenger, are of functional significance for the TMV infection may only be determined by means of investigating TMV infected leaves and protoplasts. The TMV multiplication runs synchronously in all protoplasts that are infected. Therefore, proteins synthesized in small amounts upon infection may thus be detected. The search for proteins synthesized in protoplasts as a result of TMV infection has long been hindered by the fact that various factors in the cultivation of the tobacco plants may adversely influence the quality of the protoplasts. The cultivation of the tobacco plants (Nicotiana tabacum L. cv. Samsun, Samsun NN and Xanthi nc) could be standardized, however, as described in chapter 2. When the tobacco plants were cultivated in this way, at least 50 % of the tobacco protoplasts could be infected with TMV and 70 % or more of the protoplasts survived the subsequent incubation period of 36 hours. This could be achieved every time the protoplasts were isolated.
The intensity and quality of the light, the way of watering, the age of the tobacco plants and of the leaf from which the protoplasts are isolated, among others, appeared to affect the quality of the protoplasts (chapter 3.). The proteins synthesized upon TMV infection have to be distinguished among a great variety of host proteins. For this reason it is important to determine the incorporation of radioactive amino acids into protein synthesized as a result of TMV multiplication, in comparison with the incorporation into host proteins that are formed independently from the virus infection. Therefore the specific activity of TMV coat protein (cpm/mg protein) and of the proteins of the 27,000 x g supernatant fraction, synthesized in infected tobacco protoplasts, were compared. It appeared that the specific activity of TMV coat protein was at least four times higher than that of the proteins in the 27,000 x g supernatant (chapter 4.). The proteins synthesized as a result of TMV multiplication were studied not only in tobacco protoplasts, but also in protoplasts from the primary leaves of cowpea (Vigna unguiculata (L.) Walp. var. 'Blackeye Early Ramshorn'). The method used for the infection of tobacco protoplasts with TMV was not suitable for the infection of cowpea protoplasts with TMV. Best results were obtained when both protoplasts and virus were incubated in the presence of poly-D-lysine for 7.5 min before infection. The protoplasts were pre-incubated in 0.1 M potassium phosphate buffer (pH 5.4) at 0°C, at a concentration of 4 x 10^5 protoplasts/ml and 0.75 μg poly-D-lysine/ml. TMV was pre-incubated in the same buffer at room temperature at a concentration of 2 μg TMV/ml and 2 μg poly-D-lysine/ml. During infection the cowpea protoplasts were incubated together with TMV and poly-D-lysine at a concentration of 2 x 10^5 protoplasts/ml, 1 μg TMV/ml and 1 μg poly-D-lysine/ml, for 7.5 min, in the buffer mentioned above at 0°C.
In this way 50 to 70 % of the cowpea protoplasts could be infected with TMV. The course of TMV synthesis in cowpea protoplasts was comparable with that in tobacco protoplasts. The TMV multiplication in cowpea protoplasts was preceded, however, by a period of 16 hours during which the increase of TMV is slight, while the TMV multiplication in tobacco protoplasts was preceded by a lag period of 8 hours. A possible explanation is that a much smaller amount of TMV particles penetrates into cowpea protoplasts during inoculation and/or starts to multiply than is the case in tobacco protoplasts (chapter 5.). The proteins of TMV infected and mock-infected protoplasts were thereupon analysed by means of SDS-polyacrylamide slab gel electrophoresis and the polypeptide patterns were visualized by autoradiography. Ten polypeptides, synthesized as a result of TMV multiplication, were distinguished in polypeptide patterns of proteins from infected tobacco protoplasts. The molecular weights were estimated to be 260,000, 240,000, 170,000, 116,500, 96,000, 90,000, 82,000, 72,000, 30,000 and 17,500 (coat protein). Polypeptides of similar molecular weight were absent or were present to a much lesser extent in polypeptide patterns of proteins from mock-infected tobacco protoplasts. Many polypeptides were observed because the detection capacity was improved by means of subcellular fractionation of the protoplast homogenates. The polypeptides of molecular weight 170,000, 116,500, 72,000 and coat protein were present in the 31,000 x g supernatant fraction and the pellet fractions as well. The polypeptide of molecular weight 30,000 was present exclusively in the pellet fractions. The other polypeptides were observed exclusively in polypeptide patterns of protein of the 31,000 x g supernatant fraction (see table 6.1., chapter 6.). Eight polypeptides were observed which were synthesized as a result of TMV multiplication in cowpea protoplasts.
The molecular weights of the polypeptides were approximately 150,000, 116,500, 86,000, 72,000, 17,500 (coat protein), 16,000, 14,000 and 10,000. Polypeptides of similar molecular weight were absent or present to a far lesser extent in polypeptide patterns of proteins from mock-infected cowpea protoplasts. The polypeptides of molecular weight 116,500, 72,000 and coat protein were present in the 31,000 x g pellet and 31,000 x g supernatant. The other polypeptides were present exclusively in the 31,000 x g supernatant (table 7.1., chapter 7.). It was assumed that the TMV coded polypeptides are similar in different hosts and, on the other hand, that the host polypeptides synthesized upon TMV infection differ from host to host. When the TMV specific polypeptides synthesized in infected tobacco protoplasts were compared with the specific polypeptides synthesized in TMV infected cowpea protoplasts, it appeared that only the polypeptides of molecular weight 116,500, 72,000 and coat protein are of similar size in both hosts (table 7.2., chapter 7). This is an indication that not only the polypeptide of 116,500 daltons and coat protein are TMV coded polypeptides, but that also the polypeptide of 72,000 daltons is encoded in the TMV RNA. It has not been reported that a polypeptide of this size is observed when TMV RNAs are translated in cell-free protein synthesizing systems. A polypeptide of 170,000 daltons is synthesized in vitro under the direction of the TMV RNA. It appeared that the polypeptide synthesized in TMV infected tobacco leaves has a slightly lower electrophoretic mobility than the product of 170,000 daltons synthesized in vitro from TMV RNA as messenger. A polypeptide of similar electrophoretic mobility was present to a lesser extent in mock-infected tobacco protoplasts. Furthermore, a polypeptide of 170,000 daltons was not observed in TMV infected cowpea protoplasts.
For these reasons it is likely that the polypeptide of 170,000 daltons, synthesized in TMV infected tobacco protoplasts, is encoded in the genome of tobacco or is encoded in the TMV RNA, but then the polypeptide has no functional significance in the TMV multiplication process. Further, the polypeptide of 30,000 was observed only in TMV infected tobacco protoplasts, whereas a polypeptide of similar molecular weight was shown to be synthesized in vitro from a TMV subgenomic mRNA. The polypeptide of 30,000 daltons was detected exclusively in the polypeptide patterns of protein from the pellet fractions of TMV infected tobacco protoplasts. Polypeptide patterns of protein from corresponding fractions of cowpea protoplasts had a predominant, grey background. Due to this the polypeptide of 30,000 daltons may not be distinguished in TMV infected cowpea protoplasts, whereas the polypeptide of 30,000 daltons synthesized in TMV infected tobacco protoplasts can in fact be a polypeptide coded by TMV RNA. The other polypeptides synthesized in infected tobacco protoplasts or cowpea protoplasts as a result of TMV multiplication are presumably synthesized under the genome of tobacco or cowpea respectively. Finally, it was attempted to examine in what way the polypeptides of 116,500 and 72,000 are involved in the TMV infection process. Both polypeptides were shown to be present in the 31,000 x g pellet of TMV infected tobacco and cowpea protoplasts. It was studied whether virus specific polypeptides of similar molecular weight can be observed in RNA-dependent RNA polymerase preparations isolated from the 31,000 x g pellet fraction of cowpea leaves infected with the cowpea strain of TMV (C-TMV). The RNA-dependent RNA polymerase preparations were isolated by extraction of the 31,000 x g pellet fraction and were further purified by means of subsequent DEAE-BioGel column chromatography and glycerol gradient centrifugation.
The purification procedure used was the same procedure as described for the isolation of RNA-dependent RNA polymerase from cowpea leaves infected with cowpea mosaic virus (CPMV). Four specific polypeptides of molecular weight 98,000, 90,000, 72,000 and 46,000 were distinguished in RNA-dependent RNA polymerase preparations from C-TMV infected cowpea leaves after glycerol gradient purification. A polypeptide of molecular weight 116,500 was not observed. Polypeptides of molecular weights 72,000 and 46,000 were not found and those of molecular weights 98,000 and 90,000 were distinguished to a lesser extent in polypeptide patterns of preparations isolated in exactly the same way from mock-inoculated cowpea leaves. RNA-dependent RNA polymerase activity was also observed in preparations isolated from mock-inoculated cowpea leaves. The specific activity (cpm/mg protein) of the preparation from mock-inoculated leaves was one sixth of the specific activity of the RNA-dependent RNA polymerase preparations from C-TMV infected cowpea leaves. The RNA-dependent RNA polymerase activity in C-TMV infected cowpea leaves might therefore be attributed to the increase of one or several polypeptides present already before inoculation. Since it was thought that the polypeptide of 72,000 daltons is a TMV coded polypeptide, it was examined which specific polypeptides are present in RNA-dependent RNA polymerase preparations isolated in a similar way from CPMV infected cowpea leaves. It appeared that, in addition to CPMV specific polypeptides, the polypeptides of molecular weight 98,000 and 90,000 were also observed in RNA-dependent RNA polymerase preparations from CPMV infected leaves. The polypeptides of 72,000 and 46,000 daltons were distinguished only in preparations isolated from C-TMV infected cowpea leaves. These results suggest that the polypeptide of 72,000 daltons is involved in the synthesis of TMV RNA (chapter 8.).

    Molecular signatures (unique proteins and conserved indels) that are specific for the epsilon proteobacteria (Campylobacterales)

    BACKGROUND: The epsilon proteobacteria, which include many important human pathogens, are presently recognized solely on the basis of their branching in rRNA trees. No unique molecular or biochemical characteristics specific for this group are known. RESULTS: Comparative analyses of proteins in the genomes of Wolinella succinogenes DSM 1740 and Campylobacter jejuni RM1221 against all available sequences have identified a large number of proteins that are unique to various epsilon proteobacteria (Campylobacterales), but whose homologs are not detected in other organisms. Of these proteins, 49 are uniquely found in nearly all sequenced epsilon proteobacteria (viz. Helicobacter pylori (26695 and J99), H. hepaticus, C. jejuni (NCTC 11168, RM1221, HB93-13, 84-25, CF93-6, 260.94, 11168 and 81-176), C. lari, C. coli, C. upsaliensis, C. fetus, W. succinogenes DSM 1740 and Thiomicrospira denitrificans ATCC 33889), 11 are unique for the Wolinella and Helicobacter species (i.e. Helicobacteraceae family) and many others are specific for either some or all of the species within the Campylobacter genus. The primary sequences of many of these proteins are highly conserved and provide novel resources for diagnostics and therapeutics. We also report four conserved indels (i.e. inserts or deletions) in widely distributed proteins (viz. B subunit of excinuclease ABC, phenylalanyl-tRNA synthetase, RNA polymerase β'-subunit and FtsH protein) that are specific for either all epsilon proteobacteria or different subgroups. In addition, a rare genetic event that caused fusion of the genes for the largest subunits of RNA polymerase (rpoB and rpoC) in Wolinella and Helicobacter is also described. The inter-relationships amongst Campylobacterales as deduced from these molecular signatures are in accordance with the phylogenetic trees based on the 16S rRNA and concatenated sequences for nine conserved proteins.
CONCLUSION: These molecular signatures provide novel tools for identifying and circumscribing species from the Campylobacterales order and its subgroups in molecular terms. Although sequence information for these signatures is presently limited to Campylobacterales species, it is likely that many of them will also be found in other epsilon proteobacteria. Functional studies on these proteins and conserved indels should reveal novel biochemical or physiological characteristics that are unique to these groups of epsilon proteobacteria

    Modeling Techniques for the High-Resolution Interpretation of Cryo-Electron Microscopy Reconstructions

    Get PDF
    Essential biological processes are governed by organized, dynamic interactions between multiple biomolecular systems. Complexes are formed to enable the biological function and are disassembled once the process is completed. Examples of such processes include the translation of messenger RNA into protein by the ribosome, the folding of proteins by chaperonins, and the entry of viruses into host cells. Understanding these fundamental processes by characterizing the molecular mechanisms that enable them would allow better design of therapies and drugs. Such molecular mechanisms may be revealed through the structural elucidation of the biomolecular assemblies at the core of these processes. Various experimental techniques may be applied to investigate the molecular architecture of biomolecular assemblies. High-resolution techniques, such as X-ray crystallography, may solve the atomic structure of the system, but are typically constrained to biomolecules of reduced flexibility and dimensions. In particular, X-ray crystallography requires the sample to form a three-dimensional (3D) crystal lattice, which is technically difficult, if not impossible, to obtain, especially for large, dynamic systems. Often these techniques solve the structure of the different constituent components within the assembly, but encounter difficulties when investigating the entire system. On the other hand, imaging techniques, such as cryo-electron microscopy (cryo-EM), are able to depict large systems in a near-native environment, without requiring the formation of crystals. The structures solved by cryo-EM cover a wide range of resolutions, from a very low level of detail where only the overall shape of the system is visible, to high resolutions that approach, but do not yet reach, atomic level of detail.
In this dissertation, several modeling methods are introduced to either integrate cryo-EM datasets with structural data from X-ray crystallography, or to directly interpret the cryo-EM reconstruction. Such computational techniques were developed with the goal of creating an atomic model for the cryo-EM data. The low-resolution reconstructions lack the level of detail to permit a direct atomic interpretation, i.e. one cannot reliably locate the atoms or amino-acid residues within the structure obtained by cryo-EM. Therefore, one needs to consider additional information, for example structural data from other sources such as X-ray crystallography, in order to enable such a high-resolution interpretation. Modeling techniques are thus developed to integrate the structural data from the different biophysical sources, examples including the work described in manuscripts I and II of this dissertation. At intermediate and high resolution, cryo-EM reconstructions depict consistent 3D folds such as tubular features, which in general correspond to alpha-helices. Such features can be annotated and later used to build the atomic model of the system, an alternative route described in manuscript III. Three manuscripts are presented as part of the PhD dissertation, each introducing a computational technique that facilitates the interpretation of cryo-EM reconstructions. The first manuscript is an application paper that describes a heuristic to generate the atomic model for the protein envelope of the Rift Valley fever virus. The second manuscript introduces the evolutionary tabu search strategies to enable the integration of multiple component atomic structures with the cryo-EM map of their assembly. Finally, the third manuscript further develops the latter technique and applies it to annotate consistent 3D patterns in intermediate-resolution cryo-EM reconstructions.
The first manuscript, titled An assembly model for Rift Valley fever virus, was submitted for publication in the Journal of Molecular Biology. The cryo-EM structure of the Rift Valley fever virus was previously solved at 27 Å resolution by Dr. Freiberg and collaborators. This reconstruction shows the overall shape of the virus envelope, yet the reduced level of detail prevents a direct atomic interpretation. High-resolution structures are not yet available for the entire virus nor for the two different component glycoproteins that form its envelope. However, homology models may be generated for these glycoproteins based on similar structures that are available at atomic resolution. The manuscript presents the steps required to identify an atomic model of the entire virus envelope, based on the low-resolution cryo-EM map of the envelope and the homology models of the two glycoproteins. Starting with the results of the exhaustive search to place the two glycoproteins, the model is built iteratively by running multiple multi-body refinements to hierarchically generate models for the different regions of the envelope. The generated atomic model is supported by prior knowledge regarding virus biology and contains valuable information about the molecular architecture of the system. It provides the basis for further investigations seeking to reveal different processes in which the virus is involved, such as assembly or fusion. The second manuscript was recently published in the Journal of Structural Biology (doi:10.1016/j.jsb.2009.12.028) under the title Evolutionary tabu search strategies for the simultaneous registration of multiple atomic structures in cryo-EM reconstructions. This manuscript introduces the evolutionary tabu search strategies applied to enable a multi-body registration. This technique is a hybrid approach that combines a genetic algorithm with a tabu search strategy to promote the proper exploration of the high-dimensional search space.
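The idea of pairing a genetic algorithm with a tabu list can be illustrated with a deliberately small sketch. It is not the implementation used in the dissertation or in Sculptor: the 2D pose, the `score` function with its hypothetical target, and all parameter values are assumptions chosen only to show how tabu regions push repeated evolutionary runs away from areas that were already explored.

```python
import random

def score(pose, target=(3.0, -2.0)):
    """Toy fitness (assumption): higher is better, peaking at a hypothetical target pose."""
    return -((pose[0] - target[0]) ** 2 + (pose[1] - target[1]) ** 2)

def in_tabu(pose, tabu, radius):
    """Return True if the pose lies within `radius` of any tabu centre."""
    return any((pose[0] - t[0]) ** 2 + (pose[1] - t[1]) ** 2 < radius ** 2
               for t in tabu)

def evolutionary_tabu_search(seed=0, pop_size=20, generations=60, restarts=3,
                             sigma=0.5, tabu_radius=0.5, bounds=(-10.0, 10.0)):
    rng = random.Random(seed)
    tabu = []    # centres of regions that earlier runs already converged to
    best = None
    for _ in range(restarts):
        # Start each run from random poses sampled outside all tabu regions.
        pop = []
        while len(pop) < pop_size:
            p = (rng.uniform(*bounds), rng.uniform(*bounds))
            if not in_tabu(p, tabu, tabu_radius):
                pop.append(p)
        for _ in range(generations):
            # Selection: keep the better half of the population as parents.
            pop.sort(key=score, reverse=True)
            parents = pop[: pop_size // 2]
            children = []
            while len(children) < pop_size - len(parents):
                a, b = rng.sample(parents, 2)
                # Crossover (midpoint of two parents) followed by Gaussian mutation.
                child = ((a[0] + b[0]) / 2 + rng.gauss(0, sigma),
                         (a[1] + b[1]) / 2 + rng.gauss(0, sigma))
                # Tabu rule: discard offspring that fall into explored regions.
                if not in_tabu(child, tabu, tabu_radius):
                    children.append(child)
            pop = parents + children
        run_best = max(pop, key=score)
        if best is None or score(run_best) > score(best):
            best = run_best
        tabu.append(run_best)   # forbid this neighbourhood in later runs
    return best, tabu

best_pose, tabu_centres = evolutionary_tabu_search(seed=0)
```

In this sketch the tabu list only grows between restarts, so later runs are forced to settle elsewhere; a real registration would score poses against the cryo-EM density and search over rotations as well as translations.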
As in the case of the Rift Valley fever virus, it is common that the structure of a large multi-component assembly is available at low resolution from cryo-EM, while high-resolution structures are solved for the different components but are lacking for the entire system. Evolutionary tabu search strategies enable the building of an atomic model for the entire system by considering the different components simultaneously. Such registration indirectly introduces spatial constraints, as all components need to be placed within the assembly, enabling proper docking into the low-resolution map of the entire assembly. Along with the method description, the manuscript covers the validation, presenting the benefit of the technique in both synthetic and experimental test cases. This approach successfully docked multiple components at resolutions as low as 40 Å. The third manuscript is entitled Evolutionary Bidirectional Expansion for the Annotation of Alpha Helices in Electron Cryo-Microscopy Reconstructions and was submitted for publication in the Journal of Structural Biology. The modeling approach described in this manuscript applies the evolutionary tabu search strategies in combination with the bidirectional expansion to annotate secondary structure elements in intermediate-resolution cryo-EM reconstructions. In particular, secondary structure elements such as alpha helices show consistent patterns in cryo-EM data, and are visible as rod-like patterns of high density. The evolutionary tabu search strategy is applied to identify the placement of the different alpha helices, while the bidirectional expansion characterizes their length and curvature. The manuscript presents the validation of the approach at resolutions ranging between 6 and 14 Å, a level of detail where alpha helices are visible. Up to a resolution of 12 Å, the method achieves sensitivities of 70-100% as estimated in experimental test cases, i.e. 70-100% of the alpha-helices were correctly predicted automatically in the experimental data. The three manuscripts presented in this PhD dissertation cover different computational methods for the integration and interpretation of cryo-EM reconstructions. The methods were developed in the molecular modeling software Sculptor (http://sculptor.biomachina.org) and are available to the scientific community interested in the multi-resolution modeling of cryo-EM data. The work spans a wide range of resolutions, covering multi-body refinement and registration at low resolution along with annotation of consistent patterns at high resolution. Such methods are essential for the modeling of cryo-EM data, and may be applied in other fields where similar spatial problems are encountered, such as medical imaging.