300 research outputs found
Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes
Background: Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput (HTP) phosphoproteomic data in the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast).
Results: In all, 187 HTP phosphoproteomic datasets were filtered, compiled, and studied along with two low-throughput (LTP) compendia. Estimates of the number of phosphoproteins and p-sites were inferred by two methods: Capture-Recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and other confounding factors. We estimate that in total, 13 000, 11 000, and 3000 phosphoproteins and 230 000, 156 000, and 40 000 p-sites exist in human, mouse, and yeast, respectively, whereas estimates for Arabidopsis were not as reliable.
Conclusions: Most of the phosphoproteins have been discovered for human, mouse, and yeast, while the dataset for Arabidopsis is still far from complete. The datasets for p-sites are not as close to saturation as those for phosphoproteins. Integration of the LTP data suggests that current HTP phosphoproteomics appears to be capable of capturing 70% to 95% of total phosphoproteins, but only 40% to 60% of total p-sites.
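The Capture-Recapture idea named in the abstract above can be illustrated with a minimal sketch. The abstract does not say which estimator the authors used, so the two-sample Lincoln-Petersen estimator and its Chapman correction are assumptions here, and the function names and toy identifiers are hypothetical.

```python
def lincoln_petersen(first: set, second: set) -> float:
    """Two-sample capture-recapture estimate of the total population size.

    N ~ n1 * n2 / m, where n1 and n2 are the sizes of two independent
    samples and m is the number of items seen in both.
    """
    overlap = len(first & second)
    if overlap == 0:
        raise ValueError("no overlap between samples; estimate is undefined")
    return len(first) * len(second) / overlap


def chapman(first: set, second: set) -> float:
    """Chapman's bias-corrected variant; defined even when the overlap is zero."""
    overlap = len(first & second)
    return (len(first) + 1) * (len(second) + 1) / (overlap + 1) - 1


# Toy data: integers stand in for phosphoprotein identifiers
# reported by two hypothetical, independent datasets.
dataset_a = set(range(0, 60))    # 60 identifiers
dataset_b = set(range(40, 100))  # 60 identifiers, 20 shared with dataset_a

estimate = lincoln_petersen(dataset_a, dataset_b)
corrected = chapman(dataset_a, dataset_b)
```

The smaller the overlap between two datasets relative to their sizes, the larger the estimated total, which is why noise (false identifications that never overlap) inflates such estimates unless it is corrected for.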
Comparative analysis of the core proteomes among the Pseudomonas major evolutionary groups reveals species-specific adaptations for Pseudomonas aeruginosa and Pseudomonas chlororaphis
The Pseudomonas genus includes many species living in diverse environments and hosts. It is important to understand which are the major evolutionary groups and which genomic/proteomic components they share or are unique to each. Towards this goal, we analyzed 494 complete Pseudomonas proteomes and identified 297 core-orthologues. The subsequent phylogenomic analysis revealed two well-defined species (Pseudomonas aeruginosa and Pseudomonas chlororaphis) and four wider phylogenetic groups (Pseudomonas fluorescens, Pseudomonas stutzeri, Pseudomonas syringae, Pseudomonas putida) with a sufficient number of proteomes. As expected, the genus-level core proteome was highly enriched for proteins involved in metabolism, translation, and transcription. In addition, between 39% and 70% of the core proteins in each group had a significant presence in each of the other groups. Group-specific core proteins were also identified, with P. aeruginosa having the highest number of these and P. fluorescens having none. We identified several P. aeruginosa-specific core proteins (such as CntL, CntM, PlcB, Acp1, MucE, SrfA, Tse1, Tsi2, Tse3, and EsrC) that are known to play an important role in its pathogenicity. Finally, a holin family bacteriocin and a mitomycin-like biosynthetic protein were found to be core-specific for P. chlororaphis and we hypothesize that these proteins may confer a competitive advantage against other root-colonizers.
The Evolution of Protein Interaction Networks in Regulatory Proteins
Interactions between proteins are essential for intracellular communication. They
form complex networks which have become an important source for functional
analysis of proteins. Combining phylogenies with network analysis, we investigate
the evolutionary history of interaction networks from the bHLH, NR and bZIP
transcription-factor families. The bHLH and NR networks show a hub-like structure
with varying γ values. Mutation and gene duplication play an important role
in adding and removing interactions. We conclude that in several of the protein
families that we have studied, networks have primarily arisen by the development of
heterodimerizing transcription factors, from an ancestral gene which interacts with
any of the newly emerging proteins but also homodimerizes.
Spectral Analysis of Protein-Protein Interactions in Drosophila melanogaster
Within a case study on the protein-protein interaction network (PIN) of
Drosophila melanogaster we investigate the relation between the network's
spectral properties and its structural features such as the prevalence of
specific subgraphs or duplicate nodes as a result of its evolutionary history.
The discrete part of the spectral density shows fingerprints of the PIN's
topological features including a preference for loop structures. Duplicate
nodes are another prominent feature of PINs and we discuss their representation
in the PIN's spectrum as well as their biological implications.
Comment: 9 pages RevTeX including 8 figures
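One way to see why duplicate nodes can leave a mark in a network's spectrum is a standard linear-algebra observation, sketched below on a toy graph. This is not taken from the paper: the graph, the function names, and the choice of the adjacency matrix are all assumptions made for illustration. If two nodes have exactly the same neighbours, their rows in the adjacency matrix are identical, so the vector that is +1 on one node and -1 on the other is mapped to zero, which means 0 is an eigenvalue.

```python
def adjacency_matrix(edges, n):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1
    return a


def mat_vec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


# Toy network: nodes 0 and 1 are "duplicates" (both interact with
# nodes 2 and 3 only, and not with each other); nodes 2 and 3 also interact.
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
a = adjacency_matrix(edges, 4)

# The difference vector of the two duplicate nodes.
v = [1, -1, 0, 0]

# A v is the zero vector, so v is an eigenvector with eigenvalue 0.
result = mat_vec(a, v)
```

Each additional pair of exact duplicates adds another independent vector of this kind, so such nodes contribute to a peak at a single eigenvalue rather than to the continuous part of the spectrum.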
Reduction/oxidation-phosphorylation control of DNA binding in the bZIP dimerization network
BACKGROUND: bZIPs are transcription factors that are found throughout the eukarya from fungi to flowering plants and mammals. They contain highly conserved basic region (BR) and leucine zipper (LZ) domains and often function as environmental sensors. Specifically, bZIPs frequently have a role in mediating the response to oxidative stress, a crucial environmental signal that needs to be transduced to the gene regulatory network. RESULTS: Based on sequence comparisons and experimental data on a number of important bZIP transcription factors, we predict which bZIPs are under redox control and which are regulated via protein phosphorylation. By integrating genomic, phylogenetic and functional data from the literature, we then propose a link between oxidative stress and the choice of interaction partners for the bZIP proteins. CONCLUSION: This integration permits the bZIP dimerization network to be interpreted in functional terms, especially in the context of the role of bZIP proteins in the response to environmental stress. This analysis demonstrates the importance of abiotic factors in shaping regulatory networks.
The Remarkable Evolutionary Plasticity of Coronaviruses by Mutation and Recombination: Insights for the COVID-19 Pandemic and the Future Evolutionary Paths of SARS-CoV-2.
Coronaviruses (CoVs) constitute a large and diverse subfamily of positive-sense single-stranded RNA viruses. They are found in many mammals and birds and have great importance for the health of humans and farm animals. The current SARS-CoV-2 pandemic, as well as many previous epidemics in humans that were of zoonotic origin, highlights the importance of studying the evolution of the entire CoV subfamily in order to understand how novel strains emerge and which molecular processes affect their adaptation, transmissibility, host/tissue tropism, and pathogenicity. In this review, we focus on studies over the last two years that reveal the impact of point mutations, insertions/deletions, and intratypic/intertypic homologous and non-homologous recombination events on the evolution of CoVs. We discuss whether the next generations of CoV vaccines should be directed against other CoV proteins in addition to or instead of spike. Based on the observed patterns of molecular evolution for the entire subfamily, we discuss five scenarios for the future evolutionary path of SARS-CoV-2 and the COVID-19 pandemic. Finally, within this evolutionary context, we discuss the recently emerged Omicron (B.1.1.529) VoC.
An exploration of alternative visualisations of the basic helix-loop-helix protein interaction network
Background: Alternative representations of biochemical networks emphasise different aspects of the data and contribute to the understanding of complex biological systems. In this study we present a variety of automated methods for visualisation of a protein-protein interaction network, using the basic helix-loop-helix ( bHLH) family of transcription factors as an example.
Results: Network representations that arrange nodes ( proteins) according to either continuous or discrete information are investigated, revealing the existence of protein sub-families and the retention of interactions following gene duplication events. Methods of network visualisation in conjunction with a phylogenetic tree are presented, highlighting the evolutionary relationships between proteins, and clarifying the context of network hubs and interaction clusters. Finally, an optimisation technique is used to create a three-dimensional layout of the phylogenetic tree upon which the protein-protein interactions may be projected.
Conclusion: We show that by incorporating secondary genomic, functional or phylogenetic information into network visualisation, it is possible to move beyond simple layout algorithms based on network topology towards more biologically meaningful representations. These new visualisations can give structure to complex networks and will greatly help in interpreting their evolutionary origins and functional implications. Three open source software packages (InterView, TVi and OptiMage) implementing our methods are available.
The neighborhood of the Spike gene is a hotspot for modular intertypic homologous and non-homologous recombination in Coronavirus genomes.
Coronaviruses (CoVs) have very large RNA viral genomes with a distinct genomic architecture of core and accessory open reading frames (ORFs). It is of utmost importance to understand their patterns and limits of homologous and non-homologous recombination, because such events may affect the emergence of novel CoV strains, alter their host range, infection rate, tissue tropism, pathogenicity, and their ability to escape vaccination programs. Intratypic recombination among closely related CoVs of the same subgenus has often been reported; however, the patterns and limits of genomic exchange between more distantly related CoV lineages (intertypic recombination) need further investigation. Here, we report computational/evolutionary analyses that clearly demonstrate a substantial ability for CoVs of different subgenera to recombine. Furthermore, we show that CoVs can obtain, through non-homologous recombination, accessory ORFs from core ORFs, exchange accessory ORFs with different CoV genera, with other viruses (i.e., toroviruses, influenza C/D, reoviruses, rotaviruses, astroviruses) and even with hosts. Intriguingly, most of these radical events result from double-crossovers surrounding the Spike ORF, thus highlighting both the instability and mobile nature of this genomic region. While many such events have often occurred during the evolution of various CoVs, the genomic architecture of the relatively young SARS-CoV/SARS-CoV-2 lineage so far appears to be stable.
The Complex Evolutionary History of Aminoacyl-tRNA Synthetases
Aminoacyl-tRNA synthetases (AARSs) are a superfamily of enzymes responsible for the faithful translation of the genetic code and have lately become a prominent target for synthetic biologists. Our large-scale analysis of >2500 prokaryotic genomes reveals the complex evolutionary history of these enzymes and their paralogs, in which horizontal gene transfer played an important role. These results show that a widespread belief in the evolutionary stability of this superfamily is misconceived. Although AlaRS, GlyRS, LeuRS, IleRS, ValRS are the most stable members of the family, GluRS, LysRS and CysRS often have paralogs, whereas AsnRS, GlnRS, PylRS and SepRS are often absent from many genomes. In the course of this analysis, highly conserved protein motifs and domains within each of the AARS loci were identified and used to build a web-based computational tool for the genome-wide detection of AARS coding sequences. This is based on hidden Markov models (HMMs) and is available together with a cognate database that may be used for specific analyses. The bioinformatics tools that we have developed may also help to identify new antibiotic agents and targets using these essential enzymes. These tools also may help to identify organisms with alternative pathways that are involved in maintaining the fidelity of the genetic code.
A comparative analysis of transcriptomics of newly diagnosed multiple myeloma: exploring drug repurposing
Multiple myeloma (MM) is an incurable malignant plasma cell disorder characterized by the infiltration of clonal plasma cells in the bone marrow compartment. Gene Expression Profiling (GEP) has emerged as a powerful investigation tool in modern myeloma research enabling the dissection of the molecular background of MM and allowing the identification of gene products that could potentially serve as targets for therapeutic intervention. In this study we investigated shared transcriptomic abnormalities across newly diagnosed multiple myeloma (NDMM) patient cohorts. In total, publicly available transcriptomic data of 7 studies from CD138+ cells from 281 NDMM patients and 44 healthy individuals were integrated and analyzed. Overall, we identified 28 genes that were consistently differentially expressed (DE) between NDMM patients and healthy donors (HD) across various studies. Of those, 9 genes were over/under-expressed in more than 75% of NDMM patients. In addition, we identified 4 genes (MT1F, PURPL, LINC01239 and LINC01480) that were not previously considered to participate in MM pathogenesis. Meanwhile, by mining three drug databases (ChEMBL, IUPHAR/BPS and DrugBank) we identified 31 FDA-approved and 144 experimental drugs that target 8 of these 28 over/under-expressed MM genes. Taken together, our study offers new insights into MM pathogenesis and, importantly, it reveals potential new treatment options that need to be further investigated in future studies.
- …