Exploration of the Disambiguation of Amino Acid Types to Chi-1 Rotamer Types in Protein Structure Prediction and Design

Abstract

A protein’s global fold provides insight into its function; however, functional specificity is often determined by sidechain orientation. Determining rotamer conformations is therefore often crucial in protein structure/function prediction and design. For all amino acid types other than glycine and alanine, chi-1 rotamers occupy a small number of discrete states. Herein, we explore the possibility of describing evolution from the perspective of sidechain structure rather than the traditional twenty amino acid types. To validate our hypothesis that this perspective is more crucial to our understanding of evolutionary relationships, we investigate its use in evolutionary substitution matrices for sequence alignment in fold recognition and in computational protein design, with a specific focus on designing beta sheet environments, where previous studies have considered amino acid types alone. Throughout this study, we also propose the concept of the “chi-1 rotamer sequence”, which describes the chi-1 rotamer composition of a protein. We also present attempts to predict these sequences and real-value torsion angles from amino acid sequence information. First, we describe our development of log-odds scoring matrices for sequence alignments. Log-odds substitution matrices are widely used in sequence alignments for their ability to determine evolutionary relationships between proteins. Traditionally, databases of sequence information guide the construction of these matrices, which illustrates their power in discovering distant or weak homologs. Weak homologs, typically those that share low sequence identity (< 30%), are often difficult to identify using basic amino acid sequence alignment alone. While protein threading approaches have addressed this issue, many of these approaches include sequence-based information or profiles guided by amino acid-based substitution matrices, namely BLOSUM62.
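As a concrete illustration of the two ideas above, the following Python sketch bins a chi-1 angle into one of three discrete states and computes a generic log-odds score, s(a, b) = log2(q_ab / (p_a p_b)), over an alphabet of combined amino acid and rotamer letters. It is a minimal sketch under stated assumptions, not the procedure used in this work: the 120-degree bin boundaries, the state labels, the letter format, the pseudocount, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def chi1_state(angle_deg):
    """Bin a chi-1 angle in degrees into one of three 120-degree wells.

    Assumption: wells are centred near +60, 180 and -60 degrees; the labels
    are illustrative, not the notation of the original work.
    """
    a = angle_deg % 360.0
    if a < 120.0:
        return "g+60"
    if a < 240.0:
        return "t180"
    return "g-60"


def rotamer_letter(aa, chi1_deg):
    """Combine a one-letter amino acid code with its chi-1 state.

    Glycine and alanine have no chi-1 angle, so they keep their plain letter.
    """
    if aa in ("G", "A") or chi1_deg is None:
        return aa
    return f"{aa}:{chi1_state(chi1_deg)}"


def log_odds(aligned_pairs, pseudocount=1.0):
    """Return {(a, b): log2(q_ab / (p_a * p_b))} from aligned letter pairs.

    Each aligned pair is counted in both orders so the matrix is symmetric.
    The pseudocount (an assumption) keeps unseen pairs from giving log(0).
    """
    counts = Counter()
    for a, b in aligned_pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    letters = sorted({x for pair in counts for x in pair})
    # Observed pair frequencies q_ab, smoothed by the pseudocount.
    total = sum(counts.values()) + pseudocount * len(letters) ** 2
    q = {(a, b): (counts[(a, b)] + pseudocount) / total
         for a in letters for b in letters}
    # Background frequency p_a is the marginal of q over its partner letter.
    p = {a: sum(q[(a, b)] for b in letters) for a in letters}
    return {(a, b): math.log2(q[(a, b)] / (p[a] * p[b]))
            for a in letters for b in letters}


if __name__ == "__main__":
    # Toy aligned pairs; real input would come from structural alignments.
    x = rotamer_letter("S", 180.0)
    y = rotamer_letter("S", -60.0)
    pairs = [(x, x)] * 8 + [(x, y)] * 2 + [(y, y)] * 4
    for key, score in sorted(log_odds(pairs).items()):
        print(key, round(score, 3))
```

In this toy setup the same amino acid in two different chi-1 states is treated as two distinct letters, so the resulting matrix can score a rotamer change even when the residue type is conserved.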
Here, we generated a structure-based substitution matrix, derived from TM-align structural alignments, that captures both the sequence mutation rate within folds of the same protein family and the chi-1 rotamer that represents each amino acid. These rotamer substitution matrices (ROTSUMs) identify new homologs and yield improved alignments in the PDB that traditional substitution matrices, based solely on sequence information, cannot. Certain tools and algorithms to estimate rotamer torsion angles have been developed, but they typically require knowledge of backbone coordinates and/or experimental data to guide the prediction. Herein, we developed a fragment-based algorithm, Rot1Pred, to determine the chi-1 state at each position of a given amino acid sequence, yielding a chi-1 rotamer sequence. This approach matches fragments of the query sequence to sequence-structure fragment pairs in the PDB to predict the query’s sidechain structure information. Real-value torsion angles were also predicted and compared against SCWRL4. Results show that, overall and for most amino acid types, Rot1Pred calculates chi-1 torsion angles significantly closer to native angles than SCWRL4 does when evaluated on I-TASSER-generated model backbones. Finally, we developed and explored chi-1-rotamer-based statistical potentials and evolutionary profiles constructed for de novo computational protein design. Previous analyses that aim to energetically describe the preference of amino acid types in beta sheet environments (parallel vs. antiparallel packing, or N- and C-terminal beta strand capping) have been performed with amino acid types, but no explicit rotamer representation is given in their scoring functions. In our study, we construct statistical functions that describe chi-1 rotamer preferences in these environments and illustrate their improvement over previous methods.
These specialized knowledge-based energy functions have generated sequences whose I-TASSER-predicted models are structurally similar to their input structures yet share low sequence identity with them.

PhD
Chemical Biology
University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145951/1/jarrettj_1.pd
