338 research outputs found

    Template Based Modeling and Structural Refinement of Protein-Protein Interactions.

    Determining protein structures from sequence is a fundamental problem in molecular biology, as protein structure is essential to understanding protein function. In this study, I developed one of the first fully automated pipelines for template based quaternary structure prediction starting from sequence. Two critical steps for template based modeling are identifying the correct homologous structures by threading which generates sequence to structure alignments and refining the initial threading template coordinates closer to the native conformation. I developed SPRING (single-chain-based prediction of interactions and geometries), a monomer threading to dimer template mapping program, which was compared to the dimer co-threading program, COTH, using 1838 non homologous target complex structures. SPRING’s similarity score outperformed COTH in the first place ranking of templates, correctly identifying 798 and 527 interfaces respectively. More importantly the results were found to be complementary and the programs could be combined in a consensus based threading program showing a 5.1% improvement compared to SPRING. Template based modeling requires a structural analog being present in the PDB. A full search of the PDB, using threading and structural alignment, revealed that only 48.7% of the PDB has a suitable template whereas only 39.4% of the PDB has templates that can be identified by threading. In order to circumvent this, I included intramolecular domain-domain interfaces into the PDB library to boost template recognition of protein dimers; the merging of the two classes of interfaces improved recognition of heterodimers by 40% using benchmark settings. Next the template based assembly of protein complexes pipeline, TACOS, was created. The pipeline combines threading templates and domain knowledge from the PDB into a knowledge based energy score. 
The energy score is integrated into a Monte Carlo sampling simulation that drives the initial template closer to the native topology. The full pipeline was benchmarked using 350 non homologous structures and compared to two state of the art programs for dimeric structure prediction: ZDOCK and MODELLER. On average, the global and interface structures of TACOS models are of better quality than those of the models generated by MODELLER and ZDOCK.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/135847/1/bgovi_1.pd

    New Methods to Improve Protein Structure Modeling

    Proteins are considered the central compounds necessary for life, as they govern many life processes by performing the most essential biological and chemical functions in every living cell. Understanding protein structures and functions will lead to significant advances in life science and biology. Such knowledge is vital for fields such as drug development and synthetic biofuel production. Most proteins fold into definite shapes, which are the most stable states they can adopt. Because protein structure provides important insight into protein function, many research efforts have been devoted to determining the 3-dimensional structure of a protein from its sequence. The experimental methods for determining 3-dimensional protein structure are often time-consuming, costly, and for some proteins not feasible at all. Accordingly, recent research efforts focus more and more on computational approaches to predicting 3-dimensional protein structures. Template-based modeling is considered one of the most accurate protein structure prediction methods. The success of template-based modeling relies on correctly identifying one or a few experimentally determined protein structures as structural templates that are likely to resemble the structure of the target sequence, as well as on accurately producing a sequence alignment that maps the residues in the target sequence to those in the template. In this work, we aim to improve template-based protein structure modeling by identifying the most appropriate templates more reliably and by aligning the target and template sequences more precisely. Firstly, we investigate employing an inter-residue contact score to measure how favorably a target sequence fits the folding topology of a given template.
Secondly, we design a multi-objective alignment algorithm that extends the well-known Needleman-Wunsch algorithm to obtain the complete set of Pareto-optimal alignments. We then use protein sequence and structural information as objectives and generate the complete Pareto-optimal front of alignments between the target sequence and the template. The alignments obtained enable one to analyze the trade-offs between the potentially conflicting objectives. These approaches improve the accuracy of template-based protein structure modeling.
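
To make the alignment idea above concrete, here is a minimal sketch in Python. It is not the thesis's multi-objective algorithm: the first function is the classic single-objective Needleman-Wunsch recurrence, and the second is a generic Pareto filter over hypothetical (sequence score, structure score) pairs. The scoring parameters (`match`, `mismatch`, `gap`) and all function names are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of sequences a and b (linear gap cost)."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def pareto_front(points):
    """Keep the points that no other point dominates (larger is better)."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(q[k] >= p[k] for k in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
    # Hypothetical (sequence score, structure score) pairs for four alignments.
    print(pareto_front([(3, 1), (1, 3), (2, 2), (1, 1)]))
```

A multi-objective variant would have to keep a set of non-dominated score vectors in each dynamic-programming cell rather than a single maximum; the filter above only illustrates the dominance test such a set relies on.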

    Application of coevolution-based methods and deep learning for structure prediction of protein complexes

    The three-dimensional structures of proteins play a critical role in determining their biological functions and interactions. Experimental determination of protein and protein complex structures can be expensive and difficult. Computational prediction of protein and protein complex structures has therefore been an open challenge for decades. Recent advances in computational structure prediction techniques have resulted in increasingly accurate protein structure predictions. These techniques include methods that leverage information about coevolving residues to predict residue interactions and that apply deep learning techniques to enable better prediction of residue contacts and protein structures. Prior to the work outlined in this thesis, coevolution-based methods and deep learning had been shown to improve the prediction of single protein domains or single protein chains. Most proteins in living organisms do not function on their own but interact with other proteins either through transient interactions or by forming stable protein complexes. Knowledge of protein complex structures can be useful for biological and disease research, drug discovery and protein engineering. Unfortunately, a large number of protein complexes do not have experimental structures or close homolog structures that can be used as templates. In this thesis, methods previously developed and applied to the de novo prediction of single protein domains or protein monomer chains were modified and leveraged for the prediction of protein heterodimer and homodimer complexes. A number of coevolution-based tools and deep learning methods are explored for the purpose of predicting inter-chain and intra-chain residue contacts in protein dimers. These contacts are combined with existing protein docking methods to explore the prediction of homodimers and heterodimers. 
Overall, the work in this thesis demonstrates the promise of leveraging coevolution and deep learning for the prediction of protein complexes, shows improvements in protein complex prediction tasks achieved using coevolution-based methods and deep learning methods, and identifies remaining challenges in protein complex prediction.

    Transcription factor DNA binding- and nucleosome formation energies determined by high performance fluorescence anisotropy

    Protein-DNA binding is the core of transcriptional regulation, the process which controls the flow of information stored in an organism’s genome to react to its environment and to maintain its functionality. The initial event of gene expression is the binding of a transcription factor (TF) to its target site. These binding events are integrated over several binding sites and TFs, by which a fine-tuned regulation can be achieved. The number, combination and strengths of the different binding sites encode the desired gene expression level and the plasticity of the regulated gene. Considerable effort has been devoted to identifying the specific DNA sequences bound by different TFs. For more than two decades, it was thought that mutations at each position in this sequence independently contribute to the binding probability of a TF. This binding preference has therefore been described through position weight matrices (PWMs). PWMs describe the binding preference of a TF towards its target sites by assuming that each nucleotide position contributes independently to the total specificity (linearity assumption). However, current research has shown that this simplified view lacks a significant part of the information needed to precisely describe the binding preference of a TF. It was also shown that most of the information missing from the PWM is encoded in dinucleotide mutations. Two questions are important in this regard: (1) What information about TF-DNA interaction are we missing, and are currently employed methods able to provide it? and (2) What is a comprehensive description of non-linearity that is based on biophysical properties rather than on abstract probabilities? One important aspect is the three-dimensional configuration of the DNA strand (DNA shape), which is known to affect TF binding to a varying degree.
Through recent work by the group of Remo Rohs, it is possible to predict shape parameters (features) from a DNA sequence and investigate to which degree they influence binding for any given set of measurements. The first aim of this thesis is therefore to determine non-linearity in TF-DNA interaction and investigate the influence of DNA shape on it. Protein-DNA interactions have been studied with a variety of methods using structural biology (NMR, crystallography, cryo-EM) or quantitative methods (EMSA, DNA binding arrays, ChIPSeq, B1H, SELEX, MITOMI, Simile-Seq). Most of these quantitative methods to measure TF-DNA interactions, however, are not very sensitive to weak binders due to the stringent washing steps or cutoffs they employ. In particular, sequences with two positions differing from the consensus can be very weakly bound, so a sensitive method is needed to investigate non-linearity. The method called High Performance Fluorescence Anisotropy (HiP-FA, recently developed in our lab) provides the necessary sensitivity. Using HiP-FA, I determined the affinities of 13 TFs from the Drosophila melanogaster segmentation network and found most of them to contain a significant non-linearity in their specificity. The binding energies of the TFs correlated significantly with certain DNA shape features, suggesting shape readout by the TFs. These results could be confirmed in existing structural biology data. Besides the influence of information directly encoded in the DNA sequence, the binding of a TF in the genome is most influenced by DNA accessibility. This property is a result of the genomic DNA being wrapped around histone octamers, forming nucleosomes. Since the underlying sequence can also influence the binding of the histone complex to the DNA, a natural question to ask is which features of the DNA sequence are the major determinants of histone-DNA interaction.
Attempts to address this question used existing methods which were either MNase-based, and are therefore prone to the enzyme's intrinsic cutting bias, or based on dialysis and/or EMSA readout, and consequently have a low throughput and can only be automated to a small degree. This leads to a limited set of measurements, which are usually based on a single measurement point instead of a complete titration curve. The second aim of my thesis is therefore to develop an in vitro assay to determine free energies of nucleosome formation which overcomes the limitations of existing methods. Using the sensitive FA-microscopy setup, I developed an automated assay to determine the free energy of nucleosome formation in a competitive titration. In contrast to existing methods, the throughput of the assay allows for full competitor titration curves. By measuring the free binding energies of 42 sequences, I showed that GC-content is the factor contributing most to the free energy. The relationship between these quantities is non-monotonic, with an optimal GC-content of 49 percent. The results provided in this thesis give insight into the nature of non-linearity in TF-DNA interactions and highlight the DNA shape readout therein. Methodological advances developed in this work can be used as a foundation to investigate other kinds of molecular interactions, making use of the high sensitivity of FA-based microscopy.
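
The linearity assumption behind PWMs described in this abstract can be illustrated with a short Python sketch. The matrix values, the function names, and the "measured" energy in the usage lines are all hypothetical; nothing here is taken from the thesis's data or code. Under the assumption, the binding energy of a site is simply the sum of independent per-position contributions, and any non-linearity shows up as the residual between a measured energy and that sum.

```python
# Hypothetical per-position energy contributions (arbitrary units),
# one dictionary per position of a 3-bp site; 0.0 marks the consensus base.
EXAMPLE_PWM = [
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 1.5},
    {"A": 1.0, "C": 0.0, "G": 1.1, "T": 0.9},
    {"A": 0.7, "C": 1.3, "G": 0.0, "T": 1.4},
]


def pwm_energy(pwm, site):
    """Additive (linear) energy: sum of independent per-position terms."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM positions")
    return sum(column[base] for column, base in zip(pwm, site))


def nonadditive_residual(pwm, site, measured_energy):
    """Part of a measured energy that the additive model does not explain."""
    return measured_energy - pwm_energy(pwm, site)


if __name__ == "__main__":
    print(pwm_energy(EXAMPLE_PWM, "ACG"))  # consensus site
    print(pwm_energy(EXAMPLE_PWM, "CAG"))  # double mutant under additivity
    # 3.0 is a made-up measurement used only to show the residual.
    print(nonadditive_residual(EXAMPLE_PWM, "CAG", 3.0))
```

A residual that is consistently non-zero for double mutants is the kind of signal a dinucleotide term would be introduced to capture.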

    Graph theory-based sequence descriptors as remote homology predictors

    Indexed in: Scopus.
    Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Although a new set of graph theory-derived sequence/structural descriptors has been gaining relevance in the detection of remote homology, these descriptors have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success in identifying remote homologs. We also highlight the tendency to integrate AF features/measures with AB ones, either in the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and other less known ones, we present our own with the graphical–numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
    https://www.mdpi.com/2218-273X/10/1/2
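
As a rough illustration of what "alignment-free" means in this abstract, the Python sketch below compares two sequences by their k-mer composition instead of aligning them. This is a generic and widely used AF approach, not one of the graph theory-derived descriptors the review covers, and it does not reproduce MARCH-INSIDE, TI2BioP, ProtDCal, or SeqDivA. The function names and the choice of `k=2` are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq, k=2):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two k-mer count vectors (0.0 if either is empty)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[kmer] * counts_b[kmer] for kmer in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    a = kmer_counts("MKVLAAGIVGL")
    b = kmer_counts("MKVLSAGIVAL")
    print(cosine_similarity(a, b))
```

Because no alignment is computed, the comparison costs time linear in the sequence lengths, which is one reason AF measures are attractive for large families and for whole proteomes.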

    Characterisation of the structure and self-assembly of a small cyclic peptide: an analysis using NMR spectroscopy, diffusion and heteronuclear relaxation measurements

    Pseudodesmin A, a member of the viscosin group, is a cyclic lipodepsipeptide (CLP) consisting of an oligopeptide that is cyclised through a lactone bond between its C-terminal carboxyl group and the hydroxyl group of a threonine side chain, and a 3-hydroxydecanoic acid moiety bonded to the N-terminal end of the peptide. It is found to self-assemble in apolar organic solvents, which is reminiscent of its expected ability to form ion pores in cellular membranes. The goal of this dissertation is to investigate the self-assembly in organic solvents and the structure of the assemblies formed, mainly by translational diffusion and heteronuclear relaxation NMR measurements. After covering some theory and background concerning diffusion, translational diffusion NMR, NMR relaxation, and CLPs, the elucidation of the pseudodesmin A conformation, by both X-ray diffraction and NMR, will be discussed. Next, the self-assembly is studied by translational diffusion measurements at different concentrations in chloroform and acetonitrile/chloroform mixtures. Using these results, a model is proposed for the supramolecular assembly that explains the selective self-assembly in non-polar environments, the limitless nature of the assembly and the biological function of the ion pore. This model, confirmed by the detection of intermolecular rOe contacts, encompasses a side-by-side aggregation of the amphipathic monomer alpha-helical units followed by a stacking of these aggregates (along a direction closely parallel to the alpha-helix) into cylindrically shaped structures. Heteronuclear 13C-alpha relaxation is then used to gain more insight into the organisation of the supramolecular structure and to confirm some aspects of the proposed model.
The dependence of the 13C-alpha R1 and R2 relaxation rate constants on the 13C-1H bond vector orientation within a molecular structure is used to obtain information concerning the rotational diffusion properties of the assembly and the orientation of the monomer within the supramolecular assembly. This is used to derive the shape and average dimensions of the supramolecular assembly, which confirm the model of cylindrical assemblies that grow in only one dimension, nearly parallel to the helix structure. Finally, a theoretical simulation is performed to assess the impact of the self-association equilibria on the relaxation rate constants.

    Bioprospecting for fungal-based biocontrol agents

    The research objective of this project was to investigate how a virus infecting the fungus can improve its effectiveness as a pesticide. This mycovirus-induced hypervirulence was investigated using microbiological and genomic techniques to characterise the molecular interactions between the virus and the fungal host so that an improved mycopesticide can be deployed commercially. Prior to this work, it was reported that members of a newly proposed virus family called the Polymycoviridae can confer mild hypervirulence on their fungal host. In this work, attempts to cure a B. bassiana isolate (ATHUM 4946) which harbours a polymycovirus-3 (BbPmV-3) were successful, and I established and confirmed two isogenic lines, one virus-free and one virus-infected. Furthermore, BbPmV-3 has six genomic dsRNA segments and its complete sequence is reported here. Phylogenetic analysis of RNA-dependent RNA polymerase (RdRP) protein sequences revealed that BbPmV-3 is closely related to BbPmV-2 but not BbPmV-1. Consequently, examining the effects of BbPmV-3 and BbPmV-1 on their respective hosts revealed similar phenotypic effects including increased pigmentation, sporulation, and radial growth. However, this polymycovirus-mediated effect on growth is dependent on the carbon and nitrogen sources available to the host fungus. When sucrose is replaced by lactose, trehalose, glucose, or glycerol, both BbPmV-3 and BbPmV-1 increase growth of ATHUM 4946 and EABb 92/11-Dm respectively, whereas these effects were reversed on maltose and fructose. Similarly, both BbPmV-3 and BbPmV-1 decrease growth of ATHUM 4946 and EABb 92/11-Dm when sodium nitrate is replaced by sodium nitrite, potassium nitrate, or ammonium nitrate. To this end, the hypervirulent effect was tested on Tenebrio molitor, where a virus-infected EABb 92/11-Dm line demonstrated an increased mortality rate when compared to the commercial B. bassiana ATCC 74040, B. bassiana GHA, and the virus-free isogenic line.
Furthermore, gene expression data from five timepoints of the two isogenic lines were used in a candidate pathway approach, investigating key pathways known to affect resistance to stresses and carbon uptake. Secretion of organic acids during growth can change the pH of the growth medium, creating a toxic environment that causes stress and death. Consequently, genes involved in stress tolerance, such as those for heat shock proteins, trehalose and mannitol biosynthesis, and calcium homeostasis, were upregulated in the virus-infected isolate. Likewise, genes involved in carbon uptake, such as the BbAGT1 and BbJen1 transporters, were upregulated in the virus-infected isolate. Equally, here we demonstrate that BbPmV-1 drives the up-regulation of the nirA gene, which is linked to nitrate uptake and/or assimilation, and of secondary metabolites such as Tenellin, Beauvericin and Bassianolide. These results reveal a symbiotic relationship between BbPmV-1 and its fungal host. To conclude, these data present a crucial first step in characterising how mycopesticides can be improved to deliver better and safer pest management.

    Studies on the relationships between oligonucleotide probe properties and hybridization signal intensities

    Microarray technology is a commonly used tool in biomedical research for assessing global gene expression, surveying DNA sequence variations, and studying alternative gene splicing. Given the wide range of applications of this technology, comprehensive understanding of its underlying mechanisms is of importance. The focus of this work is on contributions from microarray probe properties (probe secondary structure: ΔGss, probe-target binding energy: ΔG, probe-target mismatch) to the signal intensity. The benefits of incorporating or ignoring these properties in the process of microarray probe design and selection, as well as in microarray data preprocessing and analysis, are reported. Four related studies are described in this thesis. In the first, probe secondary structure was found to account for up to 3% of all variation on Affymetrix microarrays. In the second, a dinucleotide affinity model was developed and found to enhance the detection of differentially expressed genes when implemented as a background correction procedure in GeneChip preprocessing algorithms. This model is consistent with physical models of binding affinity of the probe-target pair, which depends on the nearest-neighbor stacking interactions in addition to base-pairing. In the remaining studies, the importance of incorporating biophysical factors in both the design and the analysis of microarrays is examined: ‘percent bound’, predicted by equilibrium models of hybridization, is a useful factor in predicting and assessing the behavior of long oligonucleotide probes. However, a universal probe-property-independent three-parameter Langmuir model has also been tested, and this simple model has been shown to be as effective as, or more effective than, complex, computationally expensive models developed for microarray target concentration estimation.
The simple, platform-independent model can equal or even outperform models that explicitly incorporate probe properties, such as the model incorporating probe percent bound developed in Chapter Three. This suggests that with a “spiked-in” concentration series targeting as few as 5-10 genes, reliable estimation of target concentration can be achieved for the entire microarray.
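
The abstract does not give the parameterisation of its three-parameter Langmuir model, so the Python sketch below assumes one common form: intensity = background + saturation * c / (c + K). The parameter names (`background`, `saturation`, `k_half`) and the numeric values are hypothetical. In practice the three parameters would be fitted to the spiked-in concentration series, and the inverse function would then convert any probe intensity into a concentration estimate.

```python
def langmuir_intensity(conc, background, saturation, k_half):
    """Signal predicted by the assumed three-parameter Langmuir isotherm."""
    return background + saturation * conc / (conc + k_half)


def estimate_concentration(intensity, background, saturation, k_half):
    """Invert the isotherm to recover target concentration from intensity."""
    signal = intensity - background
    if not 0 <= signal < saturation:
        raise ValueError("intensity lies outside the invertible range of the model")
    return k_half * signal / (saturation - signal)


if __name__ == "__main__":
    # Hypothetical parameters, as if fitted to a spiked-in series.
    params = dict(background=100.0, saturation=1000.0, k_half=10.0)
    observed = langmuir_intensity(10.0, **params)
    print(observed)
    print(estimate_concentration(observed, **params))
```

The inversion fails close to saturation, where large changes in concentration produce almost no change in intensity; estimates in that range should be treated with caution.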