2,307 research outputs found

    Functional Sites in Structure and Sequence: Protein Active Sites and miRNA Target Recognition

    The number of protein three-dimensional structures is increasing steeply, and structural genomics projects aim to solve the structures of all proteins as a means to understanding function. In the first part of my thesis, I developed a method for comparing local structural patterns (e.g. enzyme active sites) that provides a reliable statistical measure to distinguish meaningful matches from noise. The method is complementary to structural alignment: it can confirm functional similarities suggested by an overall similar structure, and it also detects functional similarities between different folds. An easy-to-use interface for the functional annotation of protein structures is available on the Internet (http://pints.embl.de). In the second part of my thesis, I present a computational screen for microRNA (miRNA) targets in Drosophila. miRNAs are short RNAs that inhibit translation of target messenger RNAs in animals by binding to complementary sites in their 3′ untranslated regions. Target predictions were urgently needed, as targets were known for only three of the more than 700 miRNAs. Of my predictions, six were validated experimentally and others are likely to be functional, making the results a useful resource for miRNA research. The screen extended miRNA function to pathway control, nervous system development and regulation of metabolism. It also revealed that one miRNA typically regulates several targets, and that one gene is likely to be targeted by several miRNAs.
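The target screen rests on miRNAs pairing with complementary sites in 3′ untranslated regions. The Python below is a minimal, hypothetical sketch of that pairing idea only. It is not the screen used in the thesis, whose scoring criteria are not given here. It finds perfect Watson-Crick matches in a UTR to miRNA nucleotides 2-8, the region commonly called the seed. The function names, the 2-8 window and the example UTR are assumptions chosen for illustration.

```python
# Hypothetical sketch: locate perfect seed-complementary sites in a 3' UTR.
# Illustrates seed pairing only; real target screens add further criteria.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A/C/G/U)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> list[int]:
    """Return 0-based UTR positions pairing with miRNA nucleotides start..end (1-based, inclusive)."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    site = reverse_complement(seed)            # sequence the UTR must contain
    utr = utr.upper().replace("T", "U")        # accept DNA-style input too
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)          # allow overlapping matches
    return hits


# Example with the let-7a mature sequence and a made-up UTR fragment.
print(seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAAAACUACCUCGG"))
```

For let-7a, nucleotides 2-8 are GAGGUAG, so the sketch searches the UTR for CUACCUC. Scanning a real UTR collection would simply loop this function over the sequences.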

    Introduction to Protein Structure Prediction

    This chapter gives a gentle introduction to the problem of protein three-dimensional structure prediction. It focuses on how to make structural sense of a single input sequence of unknown structure, called the 'query' or 'target' sequence. We give an overview of the different classes of modelling techniques, notably template-based and template-free. We also discuss how structural predictions are validated within the global community, and elaborate on the extent to which predicted structures may be trusted and used in practice. Finally, we discuss whether the concept of a single fold pertaining to a protein structure is sustainable given recent insights. In short, we conclude that the general protein three-dimensional structure prediction problem remains unsolved, especially if we desire quantitative predictions. However, if a homologous structural template is available in the PDB, models of reasonable to high accuracy may be generated.
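The abstract mentions community-wide validation of predictions but does not name a metric. One widely used score in CASP-style assessments is GDT_TS. It averages, over the cutoffs 1, 2, 4 and 8 Å, the fraction of residues that lie within each cutoff of their reference positions. The sketch below is simplified and hypothetical. It assumes the model has already been superposed on the reference and takes per-residue deviations as input, whereas the full GDT procedure searches over many superpositions. The name `gdt_ts_fixed` is an illustrative assumption and does not come from the chapter.

```python
def gdt_ts_fixed(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0-100) for a model already superposed on its reference.

    `distances` holds per-residue deviations in Angstrom (e.g. C-alpha distances).
    Simplification: no search over alternative superpositions is performed.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    # Count residues whose deviation is within each cutoff.
    counts = [sum(1 for d in distances if d <= c) for c in cutoffs]
    # Average of the per-cutoff fractions, expressed as a percentage.
    return 100.0 * sum(counts) / (len(distances) * len(cutoffs))


print(gdt_ts_fixed([0.5, 1.5, 3.0, 6.0, 10.0]))  # 50.0
```

In this example 1, 2, 3 and 4 of the 5 residues fall within the four cutoffs. The mean fraction is therefore 0.5, which gives a score of 50.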

    Template Based Modeling and Structural Refinement of Protein-Protein Interactions.

    Determining protein structures from sequence is a fundamental problem in molecular biology, as protein structure is essential to understanding protein function. In this study, I developed one of the first fully automated pipelines for template-based quaternary structure prediction starting from sequence. Template-based modeling involves two critical steps. The first is identifying the correct homologous structures by threading, which generates sequence-to-structure alignments. The second is refining the initial threading template coordinates toward the native conformation. I developed SPRING (single-chain-based prediction of interactions and geometries), a program that maps monomer threading results onto dimer templates. I compared it with the dimer co-threading program COTH on 1,838 non-homologous target complex structures. In first-place template ranking, SPRING’s similarity score outperformed COTH, correctly identifying 798 interfaces versus 527. More importantly, the results were complementary, and the two programs could be combined into a consensus-based threading program that showed a 5.1% improvement over SPRING. Template-based modeling requires a structural analog to be present in the PDB. A full search of the PDB using threading and structural alignment revealed that only 48.7% of the PDB has a suitable template, and only 39.4% has templates that can be identified by threading. To circumvent this limitation, I added intramolecular domain-domain interfaces to the PDB library to boost template recognition of protein dimers. Merging the two classes of interfaces improved recognition of heterodimers by 40% under benchmark settings. Next, I created TACOS, a pipeline for the template-based assembly of protein complexes. The pipeline combines threading templates and domain knowledge from the PDB into a knowledge-based energy score.
The energy score is integrated into a Monte Carlo sampling simulation that drives the initial template closer to the native topology. The full pipeline was benchmarked on 350 non-homologous structures and compared with two state-of-the-art programs for dimeric structure prediction, ZDOCK and MODELLER. On average, TACOS models had better global and interface structure quality than the models generated by MODELLER and ZDOCK.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/135847/1/bgovi_1.pd
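TACOS drives templates toward the native topology with Monte Carlo sampling under a knowledge-based energy. The abstract does not describe the pipeline's actual moves or energy terms. The sketch below therefore shows only the generic Metropolis scheme that such sampling commonly uses. The random-displacement move, the step size, the temperature and the toy quadratic `toy_energy` are all hypothetical stand-ins and are not the TACOS energy function.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: accept downhill moves; accept uphill ones with prob exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def monte_carlo(start, energy, step=0.5, n_steps=5000, temperature=0.1, seed=0):
    """Randomly perturb all coordinates and return the lowest-energy state visited."""
    rng = random.Random(seed)
    current, e_current = list(start), energy(start)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        # Hypothetical move: displace every coordinate uniformly within +/- step.
        trial = [x + rng.uniform(-step, step) for x in current]
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, temperature, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


# Toy usage: a made-up energy whose minimum sits at a made-up "native" point.
NATIVE = [1.0, -2.0, 0.5]


def toy_energy(coords):
    return sum((x - n) ** 2 for x, n in zip(coords, NATIVE))


best, e_best = monte_carlo([5.0, 5.0, 5.0], toy_energy)
```

Keeping the best state seen, rather than only the final one, is a common design choice when the sampler is used for refinement instead of for estimating equilibrium averages.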

    Optimizing Data Selection for Contact Prediction in Proteins

    Proteins are essential to life across all organisms. They act as enzymes, antibodies, molecular transporters and structural elements, among other important roles. Their ability to interact selectively with specific molecules is what makes them important. Understanding these interactions can provide many advantages in fields such as drug design and metabolic engineering. Current methods for predicting protein interactions attempt to fit the structures of two proteins together geometrically. They generate a large number of candidate configurations and then discriminate the correct pose from the rest. Given the large search space, approaches to reduce the complexity are often employed. Identifying a contact point between the two partner proteins provides a useful constraint: if at least one contact can be predicted among a small set of possibilities (e.g. 100), the search space is significantly reduced. A machine learning predictor for this task can be built from structural and evolutionary information about the interacting proteins. The evolutionary measures are computed over a large number of homologous sequences, which can be filtered and ordered in many different ways. Accordingly, a machine learning solution was developed that focused on measuring the effects that different homolog arrangements have on the final prediction.
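The abstract frames success as getting at least one true inter-protein contact into a short list of candidates, for example the top 100. The sketch below shows how that criterion could be checked for one complex, given predicted scores for residue pairs. The function name, the (residue in protein A, residue in protein B) pair convention and the example scores are hypothetical. They are not taken from the thesis.

```python
def top_k_hit(scores: dict[tuple[int, int], float],
              true_contacts: set[tuple[int, int]],
              k: int = 100) -> bool:
    """Return True if any true contact is among the k highest-scoring residue pairs.

    Keys are hypothetical (residue index in protein A, residue index in protein B) pairs.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return any(pair in true_contacts for pair in ranked)


# Made-up predictor output for three candidate pairs.
example_scores = {(12, 40): 0.91, (15, 38): 0.75, (3, 7): 0.10}
print(top_k_hit(example_scores, {(3, 7)}, k=2))  # False: the true pair is ranked third
print(top_k_hit(example_scores, {(3, 7)}, k=3))  # True
```

Averaging this Boolean over a benchmark set gives the fraction of complexes for which the docking search could be constrained. That fraction is one way to compare predictors trained on different homolog arrangements.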

    Investigating Hfq-mRNA Interactions in Bacteria

    Small regulatory RNAs (sRNAs) are essential for bacteria to thrive in diverse environments, and they also play a key role in virulence [11]. Trans-encoded sRNAs affect the stability and/or translation of their target mRNAs through complementary base-pairing. The base-pairing interaction is imperfect and requires the action of an RNA-binding protein, Hfq. Hfq facilitates these RNA-RNA interactions by stabilizing duplex formation, aiding structural rearrangements, increasing the rate of structural opening, and/or increasing the rate of annealing [18-21]. Hfq has two well-characterized binding surfaces. The proximal surface binds AU-rich stretches typical of sRNAs, and the distal surface binds (ARN)x motifs typically found in target mRNAs [30, 33, 36]. Studies of Hfq-RNA interactions focused largely on sRNAs until the more recent discovery of an (ARN)x motif within the 5′ UTR of target mRNAs [36, 37]. This motif is important for Hfq-mRNA binding and is required for the regulation of a couple of well-known target mRNAs. These findings led us to characterize the motif further in the work described in this thesis. We performed bioinformatic and in vitro analyses to investigate the prevalence, location, structural contexts, and Hfq binding of (ARN)x motifs in known target mRNAs. We found that the known targets contain single-stranded (ARN)x sequences in their 5′ UTRs that bind to Hfq. Two predominant structural contexts of the single-stranded (ARN)x motifs emerged. The motifs were either flanked by stem-loop structures or located within the loop of an internal bulge, multi-branch junction or hairpin. The key features of the motifs were then used as a bioinformatic tool on a genome-wide scale to identify mRNAs that might bind to Hfq. We found that 21% of mRNAs have a suitable (ARN)x motif and are therefore likely to bind Hfq.
Messages that bind to Hfq may be novel sRNA targets, so we investigated this possibility using an in vivo reporter assay. We found that 63% of the mRNAs tested are regulated by a specific sRNA. The novel targets are involved in pathways including iron salvage, biofilm formation, and amino acid metabolism. Overall, we defined key features of (ARN)x motifs and used them to predict novel target mRNAs in E. coli. This approach is efficient, effective and adaptable to other bacterial species.
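An (ARN)x motif is a run of triplets, each made of an A, then a purine (R = A or G), then any nucleotide (N). A minimal, hypothetical sketch of a sequence-only scan for such runs in a 5′ UTR is shown below. The minimum of three repeats and the function name are assumptions. The sketch also ignores the single-strandedness and structural-context features that the thesis found to be key, so it would over-predict compared with the approach described above.

```python
import re


def find_arn_motifs(utr: str, min_repeats: int = 3) -> list[tuple[int, str]]:
    """Return (0-based start, sequence) for runs of >= min_repeats consecutive ARN triplets.

    Sequence-only scan: no secondary-structure (single-strandedness) filter is applied.
    """
    seq = utr.upper().replace("T", "U")
    # A, then a purine (A or G), then any nucleotide; greedy, non-overlapping scan.
    pattern = re.compile(r"(?:A[AG][ACGU]){%d,}" % min_repeats)
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


print(find_arn_motifs("CCAAUAGCAACGG"))  # [(2, 'AAUAGCAAC')]
```

Because `re.finditer` reports non-overlapping matches from the leftmost start, a motif that is only longer in a shifted reading frame could be reported in a shorter frame. A fuller tool would have to handle that case and then check each hit against a predicted secondary structure.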

    Exploration of the Disambiguation of Amino Acid Types to Chi-1 Rotamer Types in Protein Structure Prediction and Design

    A protein’s global fold provides insight into its function; however, functional specificity often lies in sidechain orientation. Thus, determining rotamer conformations is often crucial for protein structure/function prediction and design. For all residue types other than glycine and alanine, chi-1 rotamers occupy a small number of discrete states. Herein, we explore the possibility of describing evolution from the perspective of sidechain structure rather than the traditional twenty amino acid types. We hypothesize that this perspective is more informative for understanding evolutionary relationships. To test this hypothesis, we investigate its use in two areas. The first is substitution matrices for sequence alignment in fold recognition. The second is computational protein design, with a specific focus on beta-sheet environments, where previous studies have considered amino acid types alone. Throughout this study, we also propose the concept of the “chi-1 rotamer sequence”, which describes the chi-1 rotamer composition of a protein. We also present attempts to predict these sequences, and real-valued torsion angles, from amino acid sequence information. First, we describe our development of log-odds scoring matrices for sequence alignment. Log-odds substitution matrices are widely used in sequence alignment for their ability to determine evolutionary relationships between proteins. Traditionally, these matrices are constructed from databases of sequence information, which accounts for their power in discovering distant or weak homologs. Weak homologs, typically those that share low sequence identity (< 30%), are often difficult to identify using basic amino acid sequence alignment alone. Protein threading approaches have addressed this issue. However, many of them include sequence-based information or profiles guided by amino acid-based substitution matrices, namely BLOSUM62.
Here, we generated a structure-based substitution matrix derived from TM-align structural alignments. It captures both the sequence mutation rate within folds of the same protein family and the chi-1 rotamer that represents each amino acid. These rotamer substitution matrices (ROTSUMs) discover new homologs in the PDB, and produce improved alignments, that traditional substitution matrices based solely on sequence information cannot identify. Several tools and algorithms for estimating rotamer torsion angles have been developed, but they typically require backbone coordinates and/or experimental data to guide the prediction. Herein, we developed a fragment-based algorithm, Rot1Pred, to determine the chi-1 state at each position of a given amino acid sequence, yielding a chi-1 rotamer sequence. This approach matches fragments of the query sequence to sequence-structure fragment pairs in the PDB to predict the query’s sidechain structure. Real-valued torsion angles were also predicted and compared against SCWRL4. Overall and for most amino acid types, Rot1Pred predicts chi-1 torsion angles significantly closer to native angles than SCWRL4 when evaluated on I-TASSER-generated model backbones. Finally, we developed and explored chi-1-rotamer-based statistical potentials and evolutionary profiles for de novo computational protein design. Previous analyses have aimed to describe energetically the preferences of amino acid types in beta-sheet environments, such as parallel vs. antiparallel packing or N- and C-terminal beta-strand capping. Those analyses were performed with amino acid types, with no explicit rotamer representation in their scoring functions. In our study, we construct statistical functions that describe chi-1 rotamer preferences in these environments and illustrate their improvement over previous methods.
These specialized knowledge-based energy functions generated sequences with low sequence identity to their input structures, yet the I-TASSER-predicted models of those sequences are structurally similar to the inputs.
PhD, Chemical Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145951/1/jarrettj_1.pd
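Log-odds substitution matrices score each symbol pair as log2(observed pair frequency / frequency expected by chance). The sketch below follows that general recipe, using the BLOSUM-style expectation q_a² for identical pairs and 2·q_a·q_b otherwise, with scores in half-bits. It is an assumption-laden illustration and not the ROTSUM procedure. The abstract says ROTSUMs come from TM-align structural alignments but does not give the counting or scaling details. The symbols could be amino acids or hypothetical residue-plus-chi-1 labels such as "L_g-". The toy pair list is made up.

```python
import math
from collections import Counter


def log_odds_matrix(pairs, scale=2.0):
    """Build a symmetric log-odds matrix from aligned symbol pairs.

    score(a, b) = round(scale * log2(p_ab / e_ab)), where e_ab = q_a^2 if a == b
    else 2 * q_a * q_b (BLOSUM-style). Pairs never observed are simply absent.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in pairs)   # unordered pairs
    total_pairs = sum(pair_counts.values())
    # Background frequencies: each aligned pair contributes one count per symbol.
    single = Counter()
    for (a, b), n in pair_counts.items():
        single[a] += n
        single[b] += n
    total_single = 2 * total_pairs
    q = {s: c / total_single for s, c in single.items()}
    matrix = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total_pairs
        expected = q[a] * q[b] if a == b else 2 * q[a] * q[b]
        score = round(scale * math.log2(p_ab / expected))
        matrix[(a, b)] = matrix[(b, a)] = score
    return matrix


toy_pairs = [("A", "A")] * 6 + [("A", "B")] * 2 + [("B", "B")] * 2
print(log_odds_matrix(toy_pairs))
```

In the toy data, B–B pairs occur more often than their background frequency predicts, so they receive a positive score. A–B pairs occur less often than predicted, so they are penalized.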

    RSEARCH: Finding homologs of single structured RNA sequences

    BACKGROUND: For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure. RESULTS: We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website. CONCLUSION: RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences
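RSEARCH is built on a local alignment algorithm, and it adds RIBOSUM base-pair scoring on top. As a point of reference, the sketch below shows the classic Smith-Waterman local alignment recurrence on primary sequence alone, which is the kind of search that BLAST and SSEARCH approximate or implement. The match and mismatch scores and the linear gap penalty are arbitrary illustrative values. The sketch has no base-pair or secondary-structure terms, so it is not RSEARCH's algorithm.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]   # h[i][j]: best local score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 term lets an alignment restart anywhere, which makes it local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman("GGCUAGC", "AAGCUAAA"))  # 8: the shared "GCUA"
```

Adding RNA structure to this scheme requires scoring pairs of columns jointly, which is what RSEARCH's base-pair substitution matrices are for. It is also one reason the abstract describes the program as slow.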

    Protein structure recognition: from eigenvector analysis to structural threading method

    In this work, we try to understand the protein folding problem by treating the pairwise hydrophobic interaction as the dominant interaction in the folding process. We found a strong correlation between the amino acid sequence and the corresponding native structure of the protein. This dissertation discusses applications of this correlation, including domain partitioning and a new structural threading method, together with the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem and discuss essential knowledge and progress from other research groups. This part covers interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between the amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, so it can be used to partition protein structures automatically into folding domains. In the third part, we discuss a threading method based on the correlation between the amino acid sequence and the dominant eigenvector of the structure's contact matrix. A mathematically straightforward iteration scheme provides a self-consistent, optimal global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a blind-test prediction case. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
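The threading method is built on the dominant eigenvector of a protein's contact matrix. The pure-Python sketch below shows one way to compute that vector from residue coordinates. The 7.5 Å cutoff, the exclusion of the diagonal and the use of power iteration are assumptions, because the abstract does not state how contacts or eigenvectors were defined. The iteration multiplies by M + I rather than M. This shift leaves the eigenvectors unchanged and prevents oscillation when M has an eigenvalue of equal magnitude and opposite sign, as chain-like contact maps do.

```python
import math


def contact_matrix(coords, cutoff=7.5):
    """Symmetric 0/1 contact matrix: 1.0 where residues i != j lie within `cutoff` Angstrom."""
    n = len(coords)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                c[i][j] = c[j][i] = 1.0
    return c


def dominant_eigenvector(m, iterations=500, tol=1e-10):
    """Unit-norm eigenvector of the largest eigenvalue of a symmetric non-negative matrix."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (M + I): same eigenvectors, but a unique largest-magnitude eigenvalue.
        w = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(x - y) for x, y in zip(w, v)) < tol:
            return w
        v = w
    return v


# Toy chain: 5 residues on a line, 3.8 Angstrom apart, so only neighbours are in contact.
chain = [(3.8 * k, 0.0, 0.0) for k in range(5)]
vec = dominant_eigenvector(contact_matrix(chain))
```

For this toy chain, the vector peaks at the central residue and is symmetric about it. Buried, highly connected residues tend to carry large components, which is the property that a sequence-to-eigenvector threading scheme can exploit.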