ProRepeat: an integrated repository for studying amino acid tandem repeats in proteins
ProRepeat (http://prorepeat.bioinformatics.nl/) is an integrated curated repository and analysis platform for in-depth research on the biological characteristics of amino acid tandem repeats. ProRepeat collects repeats from all proteins included in the UniProt knowledgebase, together with 85 completely sequenced eukaryotic proteomes contained within the RefSeq collection. It contains non-redundant perfect tandem repeats, approximate tandem repeats and simple, low-complexity sequences, covering the majority of the amino acid tandem repeat patterns found in proteins. The ProRepeat web interface allows querying the repeat database using repeat characteristics such as repeat unit and length, number of repetitions of the repeat unit and position of the repeat in the protein. Users can also search for repeats by the characteristics of repeat-containing proteins, such as entry ID, protein description, sequence length, gene name and taxon. ProRepeat offers powerful analysis tools for finding biologically interesting properties of repeats, such as the strong position bias of leucine repeats in the N-terminus of eukaryotic protein sequences, the differences in repeat abundance among proteomes, the functional classification of repeat-containing proteins and the GC content constraints of the repeats’ corresponding codons.
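To make the notion of a perfect tandem repeat concrete, the following is a minimal sketch of how such repeats could be located in a single protein sequence. It is not ProRepeat's actual detection method, which the abstract does not describe; the function name, the `max_unit` and `min_copies` thresholds and the example sequence are all assumptions chosen for illustration.

```python
def find_perfect_tandem_repeats(seq, max_unit=3, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`.

    Illustrative only: thresholds are arbitrary assumptions, positions are
    0-based, and approximate repeats are not handled.
    """
    hits = []
    for unit_len in range(1, max_unit + 1):
        i = 0
        while i + unit_len <= len(seq):
            unit = seq[i:i + unit_len]
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "LL"), so each run is reported once, by its shortest unit.
            if unit in (unit + unit)[1:-1]:
                i += 1
                continue
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # continue after the repeat
            else:
                i += 1
    return hits


# Hypothetical sequence with a leucine run near the N-terminus and an AQ repeat.
repeats = find_perfect_tandem_repeats("MLLLLLAQAQAQAGK")
```

Each hit records the repeat unit, its start position and the number of repetitions, which correspond to the kinds of repeat characteristics the web interface is said to expose for querying.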
Proteome sequence features carry signatures of the environmental niche of prokaryotes
Background: Prokaryotic environmental adaptations occur at different levels within cells to ensure the preservation of genome integrity, proper protein folding and function as well as membrane fluidity. Although specific composition and structure of cellular components suitable for the variety of extreme conditions has already been postulated, a systematic study describing such adaptations has not yet been performed. We therefore explored whether the environmental niche of a prokaryote could be deduced from the sequence of its proteome. Finally, we aimed to find the precise differences between proteome sequences of prokaryotes from different environments. Results: We analyzed the proteomes of 192 prokaryotes from different habitats. We collected detailed information about the optimal growth conditions of each microorganism. Furthermore, we selected 42 physico-chemical properties of amino acids and computed their values for each proteome. Further, on the same set of features we applied two fundamentally different machine learning methods, Support Vector Machines and Random Forests, to successfully classify between bacteria and archaea, halophiles and non-halophiles, as well as mesophiles, thermophiles and mesothermophiles. Finally, we performed feature selection by using Random Forests. Conclusions: To our knowledge, this is the first time that three different classification cases (domain of life, halophilicity and thermophilicity) of proteome adaptation are successfully performed with the same set of 42 features. The characteristic features of a specific adaptation constitute a signature that may help in understanding the mechanisms of adaptation to extreme environments.
A Systematic Survey of Mini-Proteins in Bacteria and Archaea
BACKGROUND: Mini-proteins, defined as polypeptides containing no more than 100 amino acids, are ubiquitous in prokaryotes and eukaryotes. They play significant roles in various biological processes, and their regulatory functions are gradually attracting the attention of scientists. However, the functions of the majority of mini-proteins are still largely unknown due to the constraints of experimental methods and bioinformatic analysis. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we extracted a total of 180,879 mini-proteins from the annotations of 532 sequenced genomes, including 491 strains of Bacteria and 41 strains of Archaea. The average proportion of mini-proteins among all genomic proteins is approximately 10.99%, but different strains exhibit remarkable fluctuations. These mini-proteins display two notable characteristics. First, the majority are species-specific proteins with an average proportion of 58.79% among six representative phyla. Second, an even larger proportion (70.03% among all strains) consists of hypothetical proteins. However, a fraction of highly conserved hypothetical proteins potentially play crucial roles in organisms. Among mini-proteins with known functions, it seems that regulatory and metabolic proteins are more abundant than essential structural proteins. Furthermore, domains in mini-proteins seem to have greater distributions in Bacteria than Eukarya. Analysis of the evolutionary progression of these domains reveals that they have diverged to new patterns from a single ancestor. CONCLUSIONS/SIGNIFICANCE: Mini-proteins are ubiquitous in bacterial and archaeal species and play significant roles in various functions. The number of mini-proteins in each genome displays remarkable fluctuation, likely resulting from the differential selective pressures that reflect the respective life-styles of the organisms. The answers to many questions surrounding mini-proteins remain elusive and need to be resolved experimentally.
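The length cutoff in the definition above translates directly into a filter over an annotated proteome. The sketch below shows one way the per-genome proportion of mini-proteins could be computed; the function name, the dictionary input format and the toy sequences are assumptions, and the survey's actual extraction pipeline is not described in the abstract.

```python
MINI_PROTEIN_MAX_LEN = 100  # "no more than 100 amino acids", per the abstract


def mini_protein_fraction(proteins):
    """Return (ids of mini-proteins, their fraction of all proteins).

    `proteins` maps a protein identifier to its amino acid sequence;
    this input format is an assumption made for the example.
    """
    mini_ids = [pid for pid, seq in proteins.items()
                if len(seq) <= MINI_PROTEIN_MAX_LEN]
    fraction = len(mini_ids) / len(proteins) if proteins else 0.0
    return mini_ids, fraction


# Toy proteome with made-up identifiers and placeholder sequences.
toy_proteome = {
    "p1": "M" * 50,
    "p2": "M" * 100,   # exactly at the cutoff, still counted as a mini-protein
    "p3": "M" * 101,
    "p4": "M" * 300,
}
mini_ids, fraction = mini_protein_fraction(toy_proteome)
```

Applied genome by genome, the same calculation would yield the per-strain proportions whose fluctuation the abstract reports.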
Algorithms for learning parsimonious context trees
Parsimonious context trees, PCTs, provide a sparse parameterization of conditional probability distributions. They are particularly powerful for modeling context-specific independencies in sequential discrete data. Learning PCTs from data is computationally hard due to the combinatorial explosion of the space of model structures as the number of predictor variables grows. Under the score-and-search paradigm, the fastest algorithm for finding an optimal PCT, prior to the present work, is based on dynamic programming. While the algorithm can handle small instances fast, it becomes infeasible already when there are half a dozen four-state predictor variables. Here, we show that common scoring functions enable the use of new algorithmic ideas, which can significantly expedite the dynamic programming algorithm on typical data. Specifically, we introduce a memoization technique, which exploits regularities within the predictor variables by equating different contexts associated with the same data subset, and a bound-and-prune technique, which exploits regularities within the response variable by pruning parts of the search space based on score upper bounds. On real-world data from recent applications of PCTs within computational biology the ideas are shown to reduce the traversed search space and the computation time by several orders of magnitude in typical cases. Peer reviewed.
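The memoization idea can be illustrated with a small sketch. It is a simplified toy, not the authors' algorithm: it assumes a fixed-depth tree in which each inner node partitions the alphabet into blocks, uses a hypothetical penalized log-likelihood as the leaf score, and omits bound-and-prune entirely. The one point it aims to show is that the best subtree score depends only on the data subset reaching a node and on the depth, so different contexts that select the same rows can share a single cached result.

```python
import math
from collections import Counter


def partitions(items):
    """Yield every set partition of a list of symbols."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part  # `first` in a block of its own
        for i in range(len(part)):  # or merged into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def leaf_score(rows, data, penalty):
    """Hypothetical score: response log-likelihood minus a per-leaf penalty."""
    counts = Counter(data[r][-1] for r in rows)
    n = len(rows)
    return sum(c * math.log(c / n) for c in counts.values()) - penalty


def best_score(rows, depth, data, alphabet, penalty, cache, stats):
    """Best score of a subtree for the data subset `rows` at `depth`."""
    key = (rows, depth)  # contexts with equal data subsets share this key
    if cache is not None and key in cache:
        stats["hits"] += 1
        return cache[key]
    stats["solved"] += 1
    if depth == len(data[0]) - 1:  # all predictors consumed: this is a leaf
        score = leaf_score(rows, data, penalty)
    else:
        score = -math.inf
        for partition in partitions(list(alphabet)):
            total = 0.0
            for block in partition:
                subset = frozenset(r for r in rows if data[r][depth] in block)
                total += best_score(subset, depth + 1, data, alphabet,
                                    penalty, cache, stats)
            score = max(score, total)
    if cache is not None:
        cache[key] = score
    return score


# Toy data: (predictor 1, predictor 2, response); the values are made up.
DATA = [("A", "A", "0"), ("A", "B", "1"), ("A", "A", "0"),
        ("A", "B", "1"), ("A", "B", "1")]


def run(use_cache):
    stats = {"solved": 0, "hits": 0}
    cache = {} if use_cache else None
    rows = frozenset(range(len(DATA)))
    return best_score(rows, 0, DATA, "AB", 1.0, cache, stats), stats


memo_score, memo_stats = run(use_cache=True)
plain_score, plain_stats = run(use_cache=False)
```

In the toy data the first predictor is always `A`, so the contexts "A" and "A or B" select the same rows and the second lookup is served from the cache. Both runs return the same score, but the memoized one solves fewer subproblems.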
A Chaperonin Subunit with Unique Structures Is Essential for Folding of a Specific Substrate
Type I chaperonins are large, double-ring complexes present in bacteria (GroEL),
mitochondria (Hsp60), and chloroplasts (Cpn60), which are involved in mediating
the folding of newly synthesized, translocated, or stress-denatured proteins. In
Escherichia coli, GroEL comprises 14 identical subunits and
has been exquisitely optimized to fold its broad range of substrates. However,
multiple Cpn60 subunits with different expression profiles have evolved in
chloroplasts. Here, we show that, in Arabidopsis thaliana, the
minor subunit Cpn60β4 forms a heterooligomeric Cpn60 complex with
Cpn60α1 and Cpn60β1–β3 and is specifically required for the
folding of NdhH, a subunit of the chloroplast NADH dehydrogenase-like complex
(NDH). Other Cpn60β subunits cannot complement the function of Cpn60β4.
Furthermore, the unique C-terminus of Cpn60β4 is required for the full
activity of the unique Cpn60 complex containing Cpn60β4 for folding of NdhH.
Our findings suggest that this unusual kind of subunit enables the Cpn60 complex
to assist the folding of some particular substrates, whereas other dominant
Cpn60 subunits maintain a housekeeping chaperonin function by facilitating the
folding of other obligate substrates.
Differential expression of HSPA1 and HSPA2 proteins in human tissues; tissue microarray-based immunohistochemical study
In the present study we determined the expression pattern of HSPA1 and HSPA2 proteins in various normal human tissues by tissue-microarray-based immunohistochemical analysis. Both proteins belong to the HSPA (HSP70) family of heat shock proteins. HSPA2 is encoded by the gene originally defined as testis-specific, while HSPA1 is encoded by the stress-inducible genes (HSPA1A and HSPA1B). Our study revealed that both proteins are expressed in only some of the 24 tissues examined. HSPA2 was detected in adrenal gland, bronchus, cerebellum, cerebrum, colon, esophagus, kidney, skin, small intestine, stomach and testis, but not in adipose tissue, bladder, breast, cardiac muscle, diaphragm, liver, lung, lymph node, pancreas, prostate, skeletal muscle, spleen or thyroid. Expression of HSPA1 was detected in adrenal gland, bladder, breast, bronchus, cardiac muscle, esophagus, kidney, prostate and skin, but not in the other tissues examined. Moreover, HSPA2 and HSPA1 proteins were found to be expressed in a cell-type-specific manner. The most pronounced cell-type expression pattern was found for HSPA2 protein. In the stratified squamous epithelia of the skin and esophagus, as well as in the ciliated pseudostratified columnar epithelium lining the respiratory tract, the HSPA2-positive cells were located in the basal layer. In the colon, small intestine and bronchus epithelia HSPA2 was detected in goblet cells. In the adrenal gland cortex HSPA2 expression was limited to cells of the zona reticularis. The presented results clearly show that certain human tissues constitutively express varying levels of HSPA1 and HSPA2 proteins in a highly differentiated way. Thus, our study can help in designing experimental models suitable for studying cell- and tissue-type-specific functional differences between HSPA2 and HSPA1 proteins in human tissues.
Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes
Background: The sizes of proteins are relevant to their biochemical structure and to their biological function. The statistical distribution of protein lengths across a diverse set of taxa can provide hints about the evolution of proteomes. Results: Using the full genomic sequences of over 1,302 prokaryotic and 140 eukaryotic species, two datasets containing 1.2 and 6.1 million proteins were generated and analyzed statistically. The lengthwise distribution of proteins can be roughly described with a gamma type or log-normal model, depending on the species. However, the shape parameter of the gamma model does not have a fixed value of 2, as previously suggested, but varies between 1.5 and 3 in different species. A gamma model with unrestricted shape parameter best described the distributions in ~48% of the species, whereas the log-normal distribution better described the observed protein sizes in 42% of the species. The gamma restricted function and the sum of exponentials distribution gave a better fit in only ~5% of the species. Eukaryotic proteins have an average size of 472 aa, whereas bacterial (320 aa) and archaeal (283 aa) proteins are significantly smaller (33-40% on average). Average protein sizes in different phylogenetic groups were: Alveolata (628 aa), Amoebozoa (533 aa), Fornicata (543 aa), Placozoa (453 aa), Eumetazoa (486 aa), Fungi (487 aa), Stramenopila (486 aa), Viridiplantae (392 aa). Amino acid composition is biased according to protein size. Protein length correlated negatively with %C, %M, %K, %F, %R, %W, %Y and positively with %D, %E, %Q, %S and %T. Prokaryotic proteins had a different protein size bias for %E, %G, %K and %M as compared to eukaryotes. Conclusions: Mathematical modeling of protein length empirical distributions can be used to assess the quality of small ORF annotation in genomic releases (detection of too many false positive small ORFs). There is a negative correlation between average protein size and total number of proteins among eukaryotes but not in prokaryotes. The %GC content is positively correlated to total protein number and protein size in prokaryotes but not in eukaryotes. Small proteins have a different amino acid bias than larger proteins. Compared to prokaryotic species, the evolution of eukaryotic proteomes was characterized by increased protein number (massive gene duplication) and substantial changes of protein size (domain addition/subtraction).
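As an illustration of how a gamma model and a log-normal model could be compared on a set of protein lengths, here is a minimal sketch. It makes several assumptions that are not taken from the paper: the gamma parameters are estimated by the method of moments instead of whatever fitting procedure the authors used, the log-normal parameters are the maximum-likelihood estimates on log lengths, and the two models are compared by plain log-likelihood. All function names are hypothetical.

```python
import math
import statistics


def fit_gamma_moments(lengths):
    """Method-of-moments estimates (shape, scale) of a gamma distribution."""
    mean = statistics.fmean(lengths)
    var = statistics.pvariance(lengths)
    return mean * mean / var, var / mean


def gamma_loglik(lengths, shape, scale):
    """Log-likelihood of the lengths under Gamma(shape, scale)."""
    const = shape * math.log(scale) + math.lgamma(shape)
    return sum((shape - 1) * math.log(x) - x / scale - const for x in lengths)


def fit_lognormal(lengths):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal distribution."""
    logs = [math.log(x) for x in lengths]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_loglik(lengths, mu, sigma):
    """Log-likelihood of the lengths under LogNormal(mu, sigma)."""
    return sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in lengths
    )


def compare_fits(lengths):
    """Return the fitted gamma shape and the better-scoring model's name."""
    shape, scale = fit_gamma_moments(lengths)
    mu, sigma = fit_lognormal(lengths)
    gamma_ll = gamma_loglik(lengths, shape, scale)
    lognormal_ll = lognormal_loglik(lengths, mu, sigma)
    best = "gamma" if gamma_ll > lognormal_ll else "log-normal"
    return shape, best
```

Calling `compare_fits` on the list of protein lengths of one proteome would return an estimated shape parameter, which can be checked against the 1.5 to 3 range reported above, together with the name of the model that scores higher for that species.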
Crystal Structures of the ATPase Domains of Four Human Hsp70 Isoforms: HSPA1L/Hsp70-hom, HSPA2/Hsp70-2, HSPA6/Hsp70B', and HSPA5/BiP/GRP78
The 70-kDa heat shock proteins (Hsp70) are chaperones with central roles in processes that involve polypeptide remodeling events. Hsp70 proteins consist of two major functional domains: an N-terminal nucleotide binding domain (NBD) with ATPase activity, and a C-terminal substrate binding domain (SBD). We present the first crystal structures of four human Hsp70 isoforms, those of the NBDs of HSPA1L, HSPA2, HSPA5 and HSPA6. As previously observed with Hsp70 family members, all four proteins crystallized in a closed cleft conformation, although a slight cleft opening through rotation of subdomain IIB was observed for the HSPA5-ADP complex. The structures presented here support the view that the NBDs of human Hsp70 function by conserved mechanisms and contribute little to isoform specificity, which instead is brought about by the SBDs and by accessory proteins. This article can also be viewed as an enhanced version in which the text of the article is integrated with interactive 3D representations and animated transitions. Please note that a web plugin is required to access this enhanced functionality. Instructions for the installation and use of the web plugin are available in Text S1.
The HSP70 modulator MAL3-101 inhibits Merkel cell carcinoma
Merkel Cell Carcinoma (MCC) is a rare and highly aggressive neuroendocrine skin cancer for which no effective treatment is available. MCC represents a human cancer with the best experimental evidence for a causal role of a polyoma virus. Large T antigens (LTA) encoded by polyoma viruses are oncoproteins, which are thought to require support of cellular heat shock protein 70 (HSP70) to exert their transforming activity. Here we evaluated the capability of MAL3-101, a synthetic HSP70 inhibitor, to limit proliferation and survival of various MCC cell lines. Remarkably, MAL3-101 treatment resulted in considerable apoptosis in 5 out of 7 MCC cell lines. While this effect was not associated with the viral status of the MCC cells, quantitative mRNA expression analysis of the known HSP70 isoforms revealed a significant correlation between MAL3-101 sensitivity and HSC70 expression, the most prominent isoform in all cell lines. Moreover, MAL3-101 also exhibited in vivo antitumor activity in an MCC xenograft model, suggesting that this substance or related compounds are potential therapeutics for the treatment of MCC in the future. © 2014 Adam et al.