Limitations of current high-throughput sequencing technologies lead to biased expression estimates of endogenous retroviral elements
Human endogenous retroviruses (HERVs), the remnants of ancient germline retroviral integrations, comprise almost 8% of the human genome. The elucidation of their biological roles is hampered by our inability to link HERV mRNA and protein production with specific HERV loci. To solve the riddle of integration-specific HERV RNA expression, several bioinformatics approaches have been proposed; however, no single approach seems to yield optimal results, owing to the repetitiveness of HERV integrations. Existing bioinformatics pipelines have been evaluated against real-world datasets whose true expression profiles are unknown, so the accuracy of widely used approaches remains unclear. Here, we simulated mRNA production from specific HERV integrations to evaluate second- and third-generation sequencing technologies, along with widely used bioinformatic approaches, and to estimate their accuracy in describing integration-specific expression. We demonstrate that, while a HERV-family approach offers accurate results, per-integration analyses of HERV expression suffer from substantial expression bias, which is only partially mitigated by algorithms developed for calculating per-integration HERV expression and is more pronounced in recent integrations. Hence, this bias could lead to erroneous biological inferences. Finally, we demonstrate the merits of accurate long-read high-throughput sequencing technologies in resolving per-locus HERV expression.
Larger mammalian body size leads to lower retroviral activity
Retroviruses have been infecting mammals for at least 100 million years, leaving descendants in host genomes known as endogenous retroviruses (ERVs). The abundance of ERVs is partly determined by their mode of replication, but it has also been suggested that host life history traits could enhance or suppress their activity. We show that larger-bodied species have lower levels of ERV activity by reconstructing the rate of ERV integration across 38 mammalian species. Body size explains 37% of the variance in ERV integration rate over the last 10 million years, after controlling for confounding by other life history traits. Furthermore, body size also explains 68% of the variance in the mean age of ERVs per genome. These results indicate that body size limits the number of recently replicating ERVs because of their detrimental effects on the host. To explore the possible mechanistic links between body size and ERV integration, we built a mathematical model, which shows that ERV abundance is favored by smaller body size and higher horizontal transmission rates. We argue that, because retroviral integration is tumorigenic, the negative correlation between body size and ERV numbers results from the need to reduce cancer risk, assuming that this risk scales positively with body size. Our model is also consistent with the empirical observation that the lifetime risk of cancer is relatively invariant among mammals regardless of body size, known as Peto's paradox, and indicates that larger-bodied mammals may have evolved mechanisms to limit ERV activity.
Nanopore sequencing and full genome de novo assembly of human cytomegalovirus TB40/E reveals clonal diversity and structural variations.
BACKGROUND: Human cytomegalovirus (HCMV) has a double-stranded DNA genome of approximately 235 kbp that is structurally complex, including extended GC-rich repeated regions. Genomic recombination events are frequent in HCMV cultures but have also been observed in vivo. Thus, assembling whole HCMV genomes from technologies that produce reads shorter than 500 bp is technically challenging. Here we improved the reconstruction of full HCMV genomes by means of a hybrid, de novo genome-assembly bioinformatics pipeline applied to data generated with the recently released MinION MkI B sequencer from Oxford Nanopore Technologies. RESULTS: The MinION run of the HCMV (strain TB40/E) library produced ~47,000 reads from a single R9 flowcell, with ~100× average read depth across the virus genome. We developed a novel, self-correcting bioinformatics algorithm to assemble the pooled HCMV genomes in three stages. In the first stage, long contigs (N50 = 21,892) of lower accuracy were reconstructed. In the second stage, short contigs (N50 = 5686) of higher accuracy were assembled, while in the final stage the high-quality contigs served as templates for correcting the longer contigs, resulting in a high-accuracy, full genome assembly (N50 = 41,056). We were able to reconstruct a single representative haplotype without employing any scaffolding steps. The majority (98.8%) of the genomic features from the reference strain were accurately annotated on this full genome construct. Our method also allowed the detection of multiple alternative sub-genomic fragments and non-canonical structures, suggesting rearrangement events between the unique (UL/US) and the repeated (T/IRL/S) genomic regions. CONCLUSIONS: Third-generation high-throughput sequencing technologies can accurately reconstruct full-length HCMV genomes, including their low-complexity and highly repetitive regions. Full-length HCMV genomes could prove crucial in understanding the genetic determinants and viral evolution underpinning drug resistance, virulence and pathogenesis.
Satellite Earth Observation Data in Epidemiological Modeling of Malaria, Dengue and West Nile Virus: A Scoping Review
Earth Observation (EO) data can be leveraged to estimate environmental variables that influence the transmission cycle of the pathogens that cause mosquito-borne diseases (MBDs). The aim of this scoping review is to examine the state of the art and identify knowledge gaps in the latest methods that have used satellite EO data in epidemiological models of malaria, dengue and West Nile Virus (WNV). In total, 43 scientific papers met the inclusion criteria and were included in this review. Researchers have examined a wide variety of methodologies, ranging from statistical models to machine learning algorithms. A number of studies used models and EO data that seemed promising and were claimed to be easily replicable in different geographic contexts, enabling the implementation of systems at regional and national scales. There is a need to further leverage powerful new modeling approaches, such as artificial intelligence and ensemble modeling, and to explore new and enhanced EO sensors for the analysis of big satellite data, in order to develop accurate epidemiological models and help reduce the burden of MBDs.
Persistence of frequently transmitted drug-resistant HIV-1 variants can be explained by high viral replication capacity
Background: In approximately 10% of newly diagnosed individuals in Europe, HIV-1 variants harboring transmitted drug resistance mutations (TDRM) are detected. Some TDRM have been shown to revert to wild type, whereas others persist in the absence of therapy. To understand the mechanisms underlying persistence, we investigated the in vivo evolution of frequently transmitted HIV-1 variants and their impact on in vitro replicative capacity. Results: We selected 31 individuals infected with HIV-1 harboring frequently observed TDRM, such as M41L or K103N in reverse transcriptase (RT) or M46L in protease. In all of these samples, polymorphisms at non-TDRM positions were present at baseline (median protease: 5, RT: 6). Extensive analysis of the viral evolution of protease and RT demonstrated that the majority of TDRM (51/55) persisted in plasma for at least one year and up to eight years. D…
Potential for diagnosis of infectious disease from the 100,000 Genomes Project Metagenomic Dataset: Recommendations for reporting results
The identification of a microbiological infection is usually the outcome of a diagnostic investigation, a complex process initiated by clinical suspicion. With the emergence of high-throughput sequencing (HTS) technologies, metagenomic analysis has demonstrated the ability to identify microbial DNA/RNA in a diverse range of clinical samples (1). Metagenomic analysis of whole human genomes at the clinical/research interface bypasses the steps of clinical scrutiny and targeted testing, and has the potential to generate unexpected findings relating to infectious and sometimes transmissible disease. There is no doubt that microbial findings that may have a significant impact on the treatment of a patient and their close contacts should be reported to those with clinical responsibility for the sample-donating patient. However, there are no clear recommendations on how such findings, which are incidental or outside the scope of the original investigation, should be handled. Here we aim to provide an informed protocol for the management of incidental microbial findings as part of the 100,000 Genomes Project, which may have broader application in this emerging field. As with any other clinical information, we aim to prioritise the reporting of data that are most likely to benefit the patient and their close contacts. We also set out to minimise the risks, costs and potential anxiety associated with reporting results that are unlikely to be of clinical significance. Our recommendations aim to support the practice of microbial metagenomics by providing a simplified pathway that can be applied to reporting the identification of potential pathogens from metagenomic datasets. Given that the target for UK sequenced human genomes over the next 5 years has been set at 5 million, and that the field of metagenomics is rapidly evolving, the guidance will be reviewed regularly and will likely be adapted as experience develops.
Upregulation of Human Endogenous Retroviruses in Bronchoalveolar Lavage Fluid of COVID-19 Patients
Severe COVID-19 pneumonia has been associated with the development of intense inflammatory responses during the course of SARS-CoV-2 infection. Given that human endogenous retroviruses (HERVs) are known to be activated during, and to participate in, inflammatory processes, we examined whether HERV dysregulation signatures are present in COVID-19 patients. By comparing transcriptomes of bronchoalveolar lavage fluid (BALF) from COVID-19 patients and healthy controls, and of peripheral blood mononuclear cells (PBMCs) from patients and controls, we have shown that HERVs are intensely dysregulated in the BALF of COVID-19 patients compared with that of healthy controls, but not in PBMCs. In particular, upregulated expression of specific HERV families was detected in BALF samples of COVID-19 patients, with HERV-FRD being the most highly upregulated of the families analyzed. In addition, we compared the expression of HERVs in human bronchial epithelial cells (HBECs) with and without senescence induction in an oncogene-induced senescence model, in order to quantify changes in HERV expression in bronchial cells during cellular senescence. This apparent difference in HERV dysregulation between PBMCs and BALF warrants further study of the involvement of HERVs in inflammatory pathogenetic mechanisms, as well as exploration of HERVs as potential biomarkers of disease progression. Furthermore, the increased expression of HERVs in senescent HBECs compared with non-induced HBECs provides a potential link to the increased COVID-19 severity and mortality in aged populations.
SARS-CoV-2 Molecular Transmission Clusters and Containment Measures in Ten European Regions during the First Pandemic Wave
Background: The spatiotemporal profiling of molecular transmission clusters (MTCs) using viral genomic data can effectively identify transmission networks in order to inform public health actions targeting SARS-CoV-2 spread. Methods: We used whole-genome SARS-CoV-2 sequences derived from ten European regions belonging to eight countries to perform phylogenetic and phylodynamic analyses. We developed dedicated bioinformatics pipelines to identify regional MTCs and to assess demographic factors potentially associated with their formation. Results: The number and size of MTCs varied, ranging from small household clusters identified in all regions to a super-spreading event found in Uusimaa-FI. In different regions, specific age groups were more likely to belong to MTCs. Clustered sequences from the 50–100 years old (y.o.) age groups increased in all regions two weeks after the establishment of the lockdown, whereas those from the 0–19 y.o. age group decreased only in regions where school closures were combined with a lockdown. Conclusions: The spatiotemporal profiling of SARS-CoV-2 MTCs can be a useful tool to monitor the effectiveness of interventions and to reveal cryptic transmissions that have not been identified through contact tracing.