
    Multiple structure alignment and consensus identification for proteins

    Background: An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo)protein that captures common substructures present in the given proteins. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then iteratively computes a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic: it computes an approximation to the optimal alignment, which minimizes the sum of the pairwise distances between the consensus and the transformed proteins.
    Results: Experimental results show that the algorithm converges rapidly and generates consensus structures that are visually similar to the input proteins. A comparison with other coordinate-based alignment algorithms (MAMMOTH and MATT) shows that the proposed algorithm is competitive in terms of speed and of the sizes of the conserved regions discovered in an extensive benchmark dataset derived from the HOMSTRAD and SABmark databases.
    The algorithm has been implemented in C++ and can be downloaded from the project's web page. Alternatively, the algorithm can be used via a web server, which allows protein structures to be aligned by uploading files from a local disk or by downloading protein data from the RCSB Protein Data Bank.
    Conclusions: An algorithm is presented to compute a multiple structure alignment for a set of proteins, together with their consensus structure. Experimental results show its effectiveness in terms of both alignment quality and computational cost.
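The alternating scheme described in this abstract (superimpose each protein on the current consensus, then update the consensus) can be illustrated with a short Python sketch. It is not the authors' C++ implementation and rests on assumptions that the abstract does not state: every protein has the same number of Cα atoms in a known one-to-one residue correspondence, each superposition is a least-squares Kabsch fit, and the consensus is the residue-wise mean of the transformed proteins. Because it minimizes squared distances rather than the plain sum of distances, it only approximates the stated objective. The function names `kabsch` and `consensus_align` are hypothetical.

```python
import numpy as np


def kabsch(P, Q):
    """Least-squares rigid fit: return (R, t) minimizing sum_i ||R @ P[i] + t - Q[i]||^2."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1), not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t


def consensus_align(proteins, max_iter=50, tol=1e-10):
    """Alternate rigid superposition onto the consensus and consensus re-estimation.

    Assumes every entry of `proteins` is an (n, 3) array of C-alpha coordinates
    with residues already in one-to-one correspondence (hypothetical simplification).
    """
    consensus = proteins[0].copy()             # start from the first protein
    aligned = [p.copy() for p in proteins]
    prev_score = np.inf
    for _ in range(max_iter):
        # Step 1: superimpose each protein on the current consensus.
        for k, p in enumerate(proteins):
            R, t = kabsch(p, consensus)
            aligned[k] = p @ R.T + t
        # Step 2: the residue-wise mean minimizes squared deviations for fixed poses.
        consensus = np.mean(aligned, axis=0)
        score = sum(float(np.sum((a - consensus) ** 2)) for a in aligned)
        if prev_score - score < tol:           # no further improvement
            break
        prev_score = score
    return consensus, aligned, score
```

Neither step can increase the summed squared deviation (the Kabsch fit is optimal for a fixed consensus, and the mean is optimal for fixed poses), which is why a loop of this kind tends to settle within a few iterations. Aligning proteins of different lengths would also require a residue-correspondence step that this sketch omits.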

    ConSole: using modularity of contact maps to locate solenoid domains in protein structures.

    Background: Periodic proteins, characterized by multiple repeats of short motifs, form an interesting and seldom-studied group. Because their sequences are often extremely divergent, such motifs are detected and analyzed more reliably at the structural level. Yet few algorithms have been developed for detecting and analyzing the structures of periodic proteins.
    Results: ConSole recognizes modularity in protein contact maps, allowing precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole achieves higher recognition accuracy than Raphael, the only other publicly available solenoid structure detection tool. As a further application of ConSole, we show how detecting solenoid repeats in structures can improve sequence-based recognition of these motifs and reveal subtle irregularities in repeat length in three solenoid protein families.
    Conclusions: The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units within a structure. ConSole is available as a web-based, interactive server and can be downloaded from http://console.sanfordburnham.org
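The abstract does not describe how ConSole measures modularity, so that algorithm is not reproduced here. As a hypothetical illustration of why contact maps expose solenoid repeats at all, the sketch below builds a Cα contact map (the 8 Å cutoff is a common convention, assumed here) and reads a repeat length off the first densely populated off-diagonal band. The heuristic and the names `contact_map` and `repeat_length_estimate` are assumptions, not ConSole's method.

```python
import numpy as np


def contact_map(ca, cutoff=8.0):
    """Boolean residue-residue contact map from an (n, 3) array of C-alpha coordinates."""
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    return dist < cutoff


def repeat_length_estimate(cmap, min_sep=6, min_fraction=0.5):
    """Hypothetical heuristic: centre of the first off-diagonal band of contacts.

    In a regular solenoid, residue i touches residue i + L in the next turn,
    so the diagonals near offset L (the repeat length) are densely populated.
    """
    n = cmap.shape[0]
    band = []
    for k in range(min_sep, n):               # skip short-range (local) contacts
        if cmap.diagonal(k).mean() >= min_fraction:
            band.append(k)
        elif band:
            break                              # end of the first contiguous band
    if not band:
        return None
    return (band[0] + band[-1]) / 2
```

On an idealized helical solenoid with 20 residues per turn, neighbouring offsets around 20 also fall under the cutoff, so the sketch reports the centre of that band rather than its first diagonal. Real structures with irregular repeats need a more robust criterion.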

    Consensus mutagenesis reveals that non-helical regions influence thermal stability of horseradish peroxidase

    The enzyme horseradish peroxidase has many uses in biotechnology, but a stabilized derivative would have even wider applicability. To enhance thermal stability, we applied consensus mutagenesis (used successfully with other proteins) to recombinant horseradish peroxidase and generated five single-site mutants. Unexpectedly, these mutations had greater effects on steady-state kinetics than on thermal stability. Only two mutants (T102A, T110V) marginally exceeded the wild type's thermal stability (4% and 10% gain in half-life at 50 °C, respectively); the others (Q106R, Q107D, I180F) were less stable than wild type. The stability of a combination mutant carrying all five substitutions matched that of Q106R, the least stable single mutant. These results were perplexing: the Class III plant peroxidases display wide differences in thermal stability, yet the consensus mutations failed to reflect these natural variations. We examined the sequence content of Class III peroxidases to determine whether there are identifiable molecular reasons for the observed stability differences. Bioinformatic analysis validated our choice of sites and mutations and generated an archetypal peroxidase sequence for comparison with extant sequences. It seems that both genetic variation and differences in protein stability are confined to non-helical regions, owing to the presence of a highly conserved alpha-helical structural scaffold in these enzymes.
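Consensus mutagenesis replaces a residue in the target protein with the residue most common at that position among its homologues. The Python sketch below shows that selection step on a toy alignment. The majority rule, the `min_freq` threshold and the function names are assumptions for illustration; the abstract does not give the study's exact site-selection criteria, and the sequences are made up, not peroxidase data.

```python
from collections import Counter


def consensus_column(column):
    """Most frequent non-gap residue in a column and its frequency among non-gap residues."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return "-", 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def consensus_mutations(target, alignment, min_freq=0.5):
    """List substitutions (e.g. 'T3A') that would move `target` towards the consensus.

    `target` is one gapped row of `alignment` (equal-length strings); positions are
    numbered along the ungapped target sequence, as in mutation names like T102A.
    """
    mutations = []
    position = 0
    for col_index, residue in enumerate(target):
        if residue == "-":
            continue                           # gap in the target: nothing to mutate
        position += 1
        column = [row[col_index] for row in alignment]
        cons, freq = consensus_column(column)
        if cons != residue and freq >= min_freq:
            mutations.append(f"{residue}{position}{cons}")
    return mutations
```

For example, with the toy alignment `["MKTAY", "MKAAY", "MRAAY", "MKAAF"]` and target `"MKTAY"`, only the third position disagrees with a majority consensus, giving `["T3A"]`.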

    Large-scale analysis of influenza A virus nucleoprotein sequence conservation reveals potential drug-target sites

    The nucleoprotein (NP) of the influenza A virus encapsidates the viral RNA and participates in the infectious life cycle of the virus. The aims of this study were to determine the degree of conservation of NP across all virus subtypes and hosts and to identify conserved binding sites that may be utilised as potential drug-target sites. The analysis of conservation based on 4430 amino acid sequences identified high conservation in known functional regions as well as novel highly conserved sites. Highly variable clusters identified on the surface of NP may be associated with adaptation to different hosts and avoidance of the host immune defence. Ligand-binding potential overlapping with high conservation was found in the tail-loop binding site and near the putative RNA-binding region. The results provide a basis for developing antivirals that may be universally effective and have a reduced potential to induce resistance through mutation.
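The abstract does not say which conservation score the study used. One common choice is the Shannon entropy of each alignment column: 0 bits when every sequence carries the same residue, rising as residues diversify. The Python sketch below computes that profile over an alignment of equal-length strings, ignoring gaps; the entropy threshold in `conserved_sites` and all function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("nan")                    # all-gap column: conservation undefined
    n = len(residues)
    return sum(-(k / n) * math.log2(k / n) for k in Counter(residues).values())


def conservation_profile(alignment):
    """Entropy for every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]


def conserved_sites(alignment, max_entropy=0.5):
    """1-based column indices whose entropy is at or below `max_entropy`."""
    return [i + 1 for i, h in enumerate(conservation_profile(alignment)) if h <= max_entropy]
```

Low-entropy columns mark candidate conserved sites; a study like this one would then intersect them with predicted ligand-binding pockets on the structure, a step not shown here.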

    Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

    The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of genetic activity in cells. Under its influence, the prevailing assumption until recently was that genes are essentially repositories of protein-coding information and that proteins are responsible for most of the important biological functions in all cells. Meanwhile, the importance of RNA remained rather obscure, and RNA was mainly viewed as a passive intermediary bridging DNA and protein. Apart from classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered rare. However, this view has changed dramatically during the last decade, as systematic screening of various genomes has identified a myriad of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. Many ncRNAs are now known to play important roles in various biological processes. Because RNAs can interact with other RNAs and with DNA in a sequence-specific manner, they are especially well suited to tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs), which regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs), which take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The known roles of ncRNAs are truly diverse; they include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], to name just a few.
It is now even claimed that ncRNAs dominate the genomic output of higher organisms such as mammals, and it has been suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As evidence accumulates, ncRNAs, long neglected, are receiving increasing attention. Researchers have begun to realize that the vast majority of the genome, once regarded as “junk” mainly because it was poorly understood, may in fact hold the key to some of the best-kept secrets of life, such as the mechanisms of alternative splicing and the control of epigenetic variation [27]. The full range and extent of the roles of ncRNAs are not yet clear, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47].

    Solution structure of a repeated unit of the ABA-1 nematode polyprotein allergen of Ascaris reveals a novel fold and two discrete lipid-binding sites

    Parasitic nematode worms cause serious health problems in humans and other animals. They can induce allergic-type immune responses, which can be harmful but may at the same time protect against infection. Allergens are proteins that trigger allergic reactions, and these parasites produce a type that is confined to nematodes: the nematode polyprotein allergens (NPAs). These are synthesized as large precursor proteins comprising repeating units of similar amino acid sequence, which are subsequently cleaved into multiple copies of the allergen protein. NPAs bind small lipids such as fatty acids and retinol (vitamin A) and probably transport these sensitive and insoluble compounds between the worms' tissues. Nematodes cannot synthesize these lipids, so NPAs may also be crucial for extracting nutrients from their hosts. They may also be involved in altering immune responses by controlling the lipids through which immune and inflammatory cells communicate. We describe the molecular structure of one unit of an NPA, the well-known ABA-1 allergen of Ascaris, and find its structure to be of a type not previously found in lipid-binding proteins; we also describe the unusual sites where lipids bind within this structure.

    Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families

    In protein sequence space, natural proteins form clusters of families characterized by their unique native folds, whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide is either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions, as well as variable-length insertions, to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such a transition between natural-like and random sequences in the sequence subspaces of 15 domain families with various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues at a few well-conserved sites respectively enhanced or diminished natural-like pattern formation over the entire sequence. In most families, these key sites included ligand-binding sites. These results suggest that selective pressure on a few key residues, such as those required for ligand-binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical standpoint, the present results highlight the essential role of long-range effects, which are absent from conventional sequence models, in precisely defining protein families. Comment: 13 pages, 7 figures, 2 tables (a new subsection added)
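To make the idea of Monte Carlo sampling in sequence space concrete, the Python sketch below runs single-site Metropolis sampling on a toy Potts model. It is not the paper's model: there are no insertions, and instead of fields and couplings fitted to a real alignment, the hypothetical helper `make_consensus_potts` sets a field `h0` on each consensus residue and a coupling `j0` between every pair of consensus residues. The sequence length, alphabet size `q`, parameter values and inverse temperatures `beta` are all assumptions. The sketch only contrasts two regimes: at high `beta` the sampled sequences stay close to the consensus, at low `beta` they drift towards random identity of about 1/q.

```python
import numpy as np


def make_consensus_potts(consensus, q, h0=0.5, j0=0.2):
    """Toy Potts model whose fields and couplings favour one consensus sequence."""
    L = len(consensus)
    h = np.zeros((L, q))
    h[np.arange(L), consensus] = h0
    J = np.zeros((L, L, q, q))                 # symmetric: J[i, j, a, b] == J[j, i, b, a]
    for i in range(L):
        for j in range(L):
            if i != j:
                J[i, j, consensus[i], consensus[j]] = j0
    return h, J


def metropolis_identity(consensus, h, J, beta, n_steps=10000, seed=0):
    """Mean identity to the consensus over the second half of a Metropolis run.

    Energy: E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].
    """
    rng = np.random.default_rng(seed)
    L, q = h.shape
    seq = rng.integers(0, q, size=L)           # start from a random sequence
    sites = np.arange(L)
    identities = []
    for step in range(n_steps):
        i = rng.integers(L)
        a = seq[i]
        b = (a + rng.integers(1, q)) % q       # propose a different letter at site i
        # Energy change for s_i: a -> b (J[i, i] is zero, so j == i adds nothing).
        dE = -(h[i, b] - h[i, a]) - (J[i, sites, b, seq] - J[i, sites, a, seq]).sum()
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            seq[i] = b
        if step >= n_steps // 2:               # discard the first half as burn-in
            identities.append(np.mean(seq == consensus))
    return float(np.mean(identities))
```

Scanning `beta` between the two extremes and plotting the returned identity is one simple way to look for an abrupt change of the kind the abstract describes. Whether this toy model is as cooperative as the fitted family models has not been checked here.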