
    Common low complexity regions for SARS-CoV-2 and human proteomes as potential multidirectional risk factor in vaccine development

    Background: The rapid spread of COVID-19 demands an immediate response from the scientific community. Appropriate countermeasures require a thoughtful and educated choice of viral targets (epitopes). Several articles discuss such choices in the SARS-CoV-2 proteome; others focus on phylogenetic traits and the history of the Coronaviridae genome/proteome. However, none consider viral protein low complexity regions (LCRs). Recently we created the first methods that are able to compare such fragments. Results: We show that five LCRs in three proteins (nsp3, S and N) encoded by the SARS-CoV-2 genome are highly similar to regions from the human proteome. As many as 21 predicted T-cell epitopes and 27 predicted B-cell epitopes overlap with the five SARS-CoV-2 LCRs similar to human proteins. Interestingly, replication proteins encoded in the central part of the viral RNA are devoid of LCRs. Conclusions: Similarity of SARS-CoV-2 LCRs to human proteins may have implications for the ability of the virus to counteract immune defenses. Vaccines targeting LCRs may be ineffective or, alternatively, lead to the development of autoimmune diseases. These findings are crucial to the selection of new epitopes for drugs or vaccines, which should omit such regions.

    Insights from analyses of low complexity regions with canonical methods for protein sequence comparison

    Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions frequently occur in proteins and can play an important role in their functions. However, scientists mainly focus on regions characterized by a high diversity of amino acid composition. Similarity between regions of protein sequences frequently reflects functional similarity between them. In this article, we discuss the strengths and weaknesses of the similarity analysis of low complexity regions using BLAST, HHblits and CD-HIT. These methods are considered the gold standard in protein similarity analysis and were designed for the comparison of high complexity regions. However, we lack specialized methods for comparing low complexity regions. Therefore, we investigated the existing methods in order to understand how they can be applied to such regions. Our results are supported by an exploratory study and a discussion of the amino acid composition and biological roles of selected examples. We show that existing methods need improvements to efficiently search for similar low complexity regions. We suggest features that have to be re-designed specifically for comparing low complexity regions: scoring matrix, multiple sequence alignment, e-value, local alignment and clustering based on a set of representative sequences. The results of this analysis can be used either to improve existing methods or to create new methods for the similarity analysis of low complexity regions.
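The definition above (a fragment composed of only a few types of amino acids) can be made concrete with a small sketch. The code below is an illustration only, not the method of the article or of BLAST, HHblits or CD-HIT: it assumes Shannon entropy of the residue composition as the complexity measure, and the function names, the window length of 12 and the threshold of 1.5 bits are hypothetical choices.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the amino acid composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def low_entropy_windows(sequence: str, window: int = 12, threshold: float = 1.5):
    """Return 0-based start positions of windows whose entropy is below threshold.

    The window length and threshold are illustrative assumptions, not values
    taken from the article.
    """
    return [
        start
        for start in range(len(sequence) - window + 1)
        if shannon_entropy(sequence[start:start + window]) < threshold
    ]
```

A window made of a single residue type has entropy 0 and is flagged, while a window of 12 distinct residues has entropy log2(12) and is not.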

    Providing Molecular Characterization for Unexplained Adverse Drug Reactions: Podium Abstract

    Podium Abstract at MedInfo 2019, Lyon, France. Mining large drug-oriented knowledge graphs enables the prediction of Adverse Drug Reactions (ADRs). Indeed, these graphs encompass knowledge elements about the molecular mechanisms of drugs (e.g. drug targets, Gene Ontology annotations, gene variations, pathways). However, only a few works have explored these graphs further in search of mechanistic explanations for this type of event. We assume that features documenting molecular mechanisms that take part in the prediction are particularly interesting, since they may provide novel knowledge about the mechanism underlying an ADR. We propose to explore PGxLOD, a knowledge graph built around drugs and the pharmacogenomic processes in which they are involved, through the lens of several ADR datasets, each focusing on a particular type of ADR. In particular, we propose to use features resulting from the exploration of PGxLOD in a prediction task where the best predictive features will be considered as potential elements of explanation.

    PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins

    Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them are stand-alone applications, and currently there is no web-based tool that allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results of the search on a query sequence can be obtained. By developing the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs, with additional information included to aid the interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/
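As a rough illustration of what taking the union or intersection of per-tool results and computing amino acid frequencies could look like, here is a minimal sketch. It is not PlaToLoCo's implementation or API: the tool names, the `(start, end)` region format (1-based, inclusive) and all function names are assumptions made for this example.

```python
from collections import Counter


def positions(regions):
    """Expand (start, end) regions (1-based, inclusive) into a set of residue positions."""
    return {p for start, end in regions for p in range(start, end + 1)}


def combine(predictions, mode="union"):
    """Combine per-tool LCR predictions into one set of residue positions.

    `predictions` maps a (hypothetical) tool name to a list of (start, end) regions.
    """
    sets = [positions(regions) for regions in predictions.values()]
    if not sets:
        return set()
    if mode == "union":
        return set.union(*sets)
    if mode == "intersection":
        return set.intersection(*sets)
    raise ValueError("mode must be 'union' or 'intersection'")


def amino_acid_frequencies(sequence):
    """Relative frequency of each amino acid in the sequence."""
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


# Hypothetical output of two tools on the same query sequence.
example = {"tool_a": [(1, 5)], "tool_b": [(4, 8)]}
consensus = combine(example, mode="intersection")
```

The intersection keeps only residues that every tool marks as low complexity, which is the stricter choice; the union keeps residues marked by at least one tool.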

    Quantitative Conformational Analysis of Functionally Important Electrostatic Interactions in the Intrinsically Disordered Region of Delta Subunit of Bacterial RNA Polymerase

    Electrostatic interactions play important roles in the functional mechanisms exploited by intrinsically disordered proteins (IDPs). The atomic resolution description of long-range and local structural propensities that can both be crucial for the function of highly charged IDPs presents significant experimental challenges. Here, we investigate the conformational behavior of the δ subunit of RNA polymerase from Bacillus subtilis whose unfolded domain is highly charged, with 7 positively charged amino acids followed by 51 acidic amino acids. Using a specifically designed analytical strategy, we identify transient contacts between the two regions using a combination of NMR paramagnetic relaxation enhancements, residual dipolar couplings (RDCs), chemical shifts, and small-angle scattering. This strategy allows the resolution of long-range and local ensemble averaged structural contributions to the experimental RDCs, and reveals that the negatively charged segment folds back onto the positively charged strand, compacting the conformational sampling of the protein while remaining highly flexible in solution. Mutation of the positively charged region abrogates the long-range contact, leaving the disordered domain in an extended conformation, possibly due to local repulsion of like-charges along the chain. Remarkably, in vitro studies show that this mutation also has a significant effect on transcription activity, and results in diminished cell fitness of the mutated bacteria in vivo. This study highlights the importance of accurately describing electrostatic interactions for understanding the functional mechanisms of IDPs.

    Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

    The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages of the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories, affecting all downstream analyses. As a case study, we provide examples from the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise awareness within the community of database users, and to alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.