7,510 research outputs found

    Analysis of Three-Dimensional Protein Images

    A fundamental goal of research in molecular biology is to understand protein structure. Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as alpha-helices and beta-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction. Comment: See http://www.jair.org/ for any accompanying file

    Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families

    In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds, whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such a transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest that some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models. Comment: 13 pages, 7 figures, 2 tables (a new subsection added

    Crystal Structure of the Cysteine-Rich Domain of Mannose Receptor Complexed with a Sulfated Carbohydrate Ligand

    The macrophage and epithelial cell mannose receptor (MR) binds carbohydrates on foreign and host molecules. Two portions of MR recognize carbohydrates: tandemly arranged C-type lectin domains facilitate carbohydrate-dependent macrophage uptake of infectious organisms, and the NH2-terminal cysteine-rich domain (Cys-MR) binds to sulfated glycoproteins including pituitary hormones. To elucidate the mechanism of sulfated carbohydrate recognition, we determined crystal structures of Cys-MR alone and complexed with 4-sulfated-N-acetylgalactosamine at 1.7 and 2.2 Å resolution, respectively. Cys-MR folds into an approximately three-fold symmetric β-trefoil shape resembling fibroblast growth factor. The sulfate portions of 4-sulfated-N-acetylgalactosamine and an unidentified ligand found in the native crystals bind in a neutral pocket in the third lobe. We use the structures to rationalize the carbohydrate binding specificities of Cys-MR and compare the recognition properties of Cys-MR with other β-trefoil proteins

    Nucleolin stabilizes G-quadruplex structures folded by the LTR promoter and silences HIV-1 viral transcription

    Folding of the LTR promoter into dynamic G-quadruplex conformations has been shown to suppress its transcriptional activity in HIV-1. Here we sought to identify the proteins that control the folding of this region of the proviral genome by inducing/stabilizing G-quadruplex structures. Electrophoretic mobility shift assays and pull-down experiments coupled with mass spectrometric analysis revealed that the cellular protein nucleolin is able to specifically recognize G-quadruplex structures present in the LTR promoter. Nucleolin recognized with high affinity and specificity the majority, but not all, of the possible G-quadruplexes folded by this sequence. In addition, it displayed greater binding preference towards DNA than RNA G-quadruplexes, thus indicating two levels of selectivity based on the sequence and nature of the target. The interaction translated into stabilization of the LTR G-quadruplexes and increased promoter silencing activity; in contrast, disruption of nucleolin binding in cells by both siRNAs and a nucleolin binding aptamer greatly increased LTR promoter activity. These data indicate that nucleolin possesses a specific and regulated activity toward the HIV-1 LTR promoter, which is mediated by G-quadruplexes. These observations provide new essential insights into viral transcription and a possible low mutagenic target for antiretroviral therapy

    Biochemical and Structural Analyses of Budding Yeast Telomere Associated CST Complex

    Telomeres are specialized protein-DNA complexes that compose the natural termini of linear chromosomes. Telomeres protect chromosome ends from deleterious degradation and fusion events and ensure the complete replication of chromosomes. In Saccharomyces cerevisiae, Cdc13, Stn1 and Ten1 are essential for both chromosome capping and telomere length homeostasis. These three proteins have been proposed to fulfill their roles at chromosome termini as a telomere-dedicated RPA (Replication Protein A, including Rpa70, Rpa32 and Rpa14) complex on the basis of several parallels with the conventional RPA. However, no direct evidence has been provided for this hypothesis. Here I provide the first direct evidence, based on our crystal structures. Structural and functional analyses of Candida albicans Stn1-Ten1 revealed striking similarities with Rpa32-Rpa14 and critical roles for these proteins in suppressing aberrant telomerase activities at telomeres. Together, these results demonstrated that Stn1-Ten1 is an Rpa32-Rpa14-like complex at telomeres. However, the relationship between Cdc13 and Rpa70 remained unclear. The crystal structures of multiple OB (oligonucleotide/oligosaccharide binding)-folds at the N- and C-terminal ends of Cdc13 established an Rpa70-like domain organization, although the structures of Cdc13 OB-folds are significantly different from their Rpa70 counterparts. Furthermore, our structural and biochemical analyses revealed unexpected Cdc13 dimerization by either the N- or C-terminal OB-fold and showed that homodimerization is probably a conserved feature of all Cdc13s. We also uncovered the versatility of Cdc13 dimerization in mediating interaction with different targets. The structural characterization of the interaction between the Cdc13 N-terminal OB-fold and Pol1, the catalytic subunit of DNA polymerase α, demonstrated a role for N-terminal dimerization in Pol1-binding. The discovery of Candida spp. Cdc13 dimerization through its OB4 domain revealed its important role in high affinity telomere DNA binding. Collectively, our findings provided novel insights into the mechanisms and evolution of Cdc13. Additionally, we have shown Cdc13’s role in regulating telomere synthesis by interacting with the telomerase subunit Est1. The interaction involves the second OB-fold in addition to the previously recognized recruitment domain of Cdc13. The finding significantly furthered our understanding of the synthesis of the leading and lagging strands of chromosomes and the essential role of Cdc13 in solving the end-replication problem. Ph.D., Chemical Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91405/1/jiasun_1.pd

    Rules extraction from neural networks applied to the prediction and recognition of prokaryotic promoters

    Promoters are DNA sequences located upstream of the gene region and play a central role in gene expression. Computational techniques show good accuracy in gene prediction but are less successful in predicting promoters, primarily because of the high number of false positives that reflect characteristics of the promoter sequences. Many machine learning methods have been used to address this issue. Neural Networks (NN) have been successfully used in this field because of their ability to recognize imprecise and incomplete patterns characteristic of promoter sequences. In this paper, NN was used to predict and recognize promoter sequences in two data sets: (i) one based on nucleotide sequence information and (ii) another based on stability sequence information. The accuracy was approximately 80% for simulation (i) and 68% for simulation (ii). In the rules extracted, biological consensus motifs were important parts of the NN learning process in both simulations
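The abstract above does not say how nucleotide sequences were presented to the network. As a purely illustrative sketch, one common choice is one-hot encoding, where each base becomes a four-element indicator vector. The function name `one_hot_encode`, the `ACGT` ordering, and the all-zero treatment of unknown bases are assumptions made for this example, not details taken from the paper.

```python
# Hypothetical illustration: one-hot encoding of a DNA sequence as a flat
# input vector for a neural network. The encoding scheme is an assumption;
# the abstract does not specify the authors' input representation.

NUCLEOTIDES = "ACGT"  # assumed column order of the indicator vector


def one_hot_encode(sequence):
    """Return a flat list with four 0/1 entries per base of `sequence`.

    Bases outside A, C, G, T (for example 'N') map to four zeros.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        encoded.extend(row)
    return encoded


if __name__ == "__main__":
    # A six-base motif yields a 24-element input vector.
    print(one_hot_encode("TATAAT"))
```

A fixed-length window around a candidate site would be encoded this way and fed to the input layer; the window length is another choice the abstract leaves open.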

    Ab initio RNA folding

    RNA molecules are essential cellular machines performing a wide variety of functions for which a specific three-dimensional structure is required. Over the last several years, experimental determination of RNA structures through X-ray crystallography and NMR seems to have reached a plateau in the number of structures resolved each year, but as more and more RNA sequences are being discovered, the need for structure prediction tools to complement experimental data is strong. Theoretical approaches to RNA folding have been developed since the late nineties when the first algorithms for secondary structure prediction appeared. Over the last 10 years a number of prediction methods for 3D structures have been developed, first based on bioinformatics and data-mining, and more recently based on a coarse-grained physical representation of the systems. In this review we present the challenges of RNA structure prediction and the main ideas behind bioinformatic approaches and physics-based approaches. We focus on the description of the more recent physics-based phenomenological models and on how they are built to include the specificity of the interactions of RNA bases, whose role is critical in folding. Through examples from different models, we point out the strengths of physics-based approaches, which are able not only to predict equilibrium structures, but also to investigate dynamical and thermodynamical behavior, and the open challenges to include more key interactions ruling RNA folding. Comment: 28 pages, 18 figure
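The abstract above mentions algorithms for secondary structure prediction without naming one. As a minimal sketch of the idea, the following uses Nussinov-style base-pair maximization, a classic textbook dynamic program chosen here for illustration; it is not claimed to be a method covered in the review. The function name `max_base_pairs`, the inclusion of G-U wobble pairs, and the minimum hairpin loop of 3 unpaired bases are assumptions.

```python
# Illustrative sketch: Nussinov-style dynamic programming that counts the
# maximum number of nested base pairs in an RNA sequence. This ignores
# energies and pseudoknots; it only demonstrates the recursion.

# Allowed pairs: Watson-Crick plus G-U wobble (an assumption for this example).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in `seq`.

    `min_loop` is the minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

The recursion runs in cubic time in the sequence length. Energy-based and physics-based models such as those the review discusses replace the simple pair count with far richer scoring.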

    Highly Accurate Fragment Library for Protein Fold Recognition

    Proteins play a crucial role in living organisms as they perform many vital tasks in every living cell. Knowledge of protein folding has a deep impact on understanding the heterogeneity and molecular functions of proteins. Such information leads to crucial advances in drug design and disease understanding. Fold recognition is a key step in the protein structure discovery process, especially when traditional computational methods fail to yield convincing structural homologies. In this work, we present a new protein fold recognition approach using machine learning and data mining methodologies. First, we identify a protein structural fragment library (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues as the structural “keywords” that can effectively distinguish between major protein folds. We first apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large set of high-quality, non-homologous protein structures available in the PDB. We analyze the impact of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures into the major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary with ~400 4- to 20-residue Frag-K fragments is capable of classifying major SCOP folds with high accuracy. Then, based on Frag-K, we design a novel deep learning architecture, called DeepFrag-k, which identifies fold-discriminative features to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the first stage employs a multimodal Deep Belief Network (DBN) to predict the potential structural fragments given a sequence, represented as a fragment vector, and the second stage uses a deep convolutional neural network (CNN) to classify the fragment vectors into the corresponding folds. Our results show that DeepFrag-k yields 92.98% accuracy in predicting the top-100 most popular fragments, which can be used to generate discriminative fragment feature vectors to improve protein fold recognition

    Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. Importance: To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available

    Structural Alignment of RNAs Using Profile-csHMMs and Its Application to RNA Homology Search: Overview and New Results

    Systematic research on noncoding RNAs (ncRNAs) has revealed that many ncRNAs are actively involved in various biological networks. Therefore, in order to fully understand the mechanisms of these networks, it is crucial to understand the roles of ncRNAs. Unfortunately, the annotation of ncRNA genes that give rise to functional RNA molecules has begun only recently, and it is far from being complete. Considering the huge amount of genome sequence data, we need efficient computational methods for finding ncRNA genes. One effective way of finding ncRNA genes is to look for regions that are similar to known ncRNA genes. As many ncRNAs have well-conserved secondary structures, we need statistical models that can represent such structures for this purpose. In this paper, we propose a new method for representing RNA sequence profiles and finding structural alignment of RNAs based on profile context-sensitive hidden Markov models (profile-csHMMs). Unlike existing models, the proposed approach can handle any kind of RNA secondary structures, including pseudoknots. We show that profile-csHMMs can provide an effective framework for the computational analysis of RNAs and the identification of ncRNA genes