50 research outputs found

    Seeking an optimal variant calling pipeline for medical genetics

    Accurate and comprehensive variant discovery is extremely important for rare disease diagnostics using next-generation sequencing (NGS) methods. In recent years, a plethora of methods have been developed for short variant calling from NGS data, and the most recent tools make extensive use of machine learning algorithms for both variant discovery and filtering. In our study, we systematically evaluated the performance of different pipelines for short variant calling in the human genome. To perform this comparison, we collected a large dataset of both “gold standard” samples (provided by the Genome In A Bottle (GIAB) consortium) and in-house whole-exome sequencing (WES) and whole-genome sequencing (WGS) datasets (a total of 20 different datasets were used). We tested all combinations of 4 popular short read aligners (BWA, Bowtie2, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Freebayes, Clair3, DeepVariant, Genome Analysis ToolKit (GATK), Octopus, Strelka2). We also used several different tools for preprocessing of short reads. Our analysis showed negligible effects of adapter trimming on the accuracy of short variant calling. Among read aligners, Bowtie2 performed significantly worse than other tools, suggesting it should not be used for medical variant calling. For pipelines based on BWA, Isaac, and Novoalign, the accuracy of variant discovery depended mostly on the variant caller and not the read aligner. DeepVariant consistently showed the best performance and the greatest robustness compared to all other tested variant callers. We also compared the consistency of variant calls in GIAB and non-GIAB samples. With a few important caveats, the best-performing tools showed little evidence of overfitting. Taken together, our study showed that modern strategies for NGS data analysis allow for high accuracy of genetic variant discovery within coding regions of the human genome. However, there is still a need for development of new library preparation and variant calling methods to enhance variant discovery in the challenging regions of the human genome.
    Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 202

    Transcriptomic profiling of Escherichia coli K-12 in response to a compendium of stressors

    Environmental perturbations impact multiple cellular traits, including gene expression. Bacteria respond to these stressful situations through complex gene interaction networks, thereby inducing stress tolerance and survival of cells. In this paper, we study the response mechanisms of E. coli when exposed to different environmental stressors via differential expression and co-expression analysis. Gene co-expression networks were generated and analyzed via Weighted Gene Co-expression Network Analysis (WGCNA). Based on the gene co-expression networks, genes with similar expression profiles were clustered into modules. The modules were analyzed to identify hub genes and enrichment of biological processes and transcription factors. In addition, we studied the link between transcription factors and their differentially regulated targets to understand the regulatory mechanisms involved. These networks validate known gene interactions and provide new insights into genes mediating transcriptional regulation in specific stress environments, thus allowing for in silico hypothesis generation.

    Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage

    The advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole-exome (WES) and whole-genome (WGS) sequencing, are often debated. WES has dominated large-scale resequencing projects because of its lower cost and easier data storage and processing. Rapid development of third-generation sequencing methods and novel exome sequencing kits predicates the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study, we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons, which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified the ~500 kb region of the human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified the main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.

    Genome-wide fitness analysis identifies genes required for in vitro growth and macrophage infection by African and global epidemic pathovariants of Salmonella enterica Enteritidis

    Salmonella enterica Enteritidis is the second most common serovar associated with invasive non-typhoidal Salmonella (iNTS) disease in sub-Saharan Africa. Previously, genomic and phylogenetic characterization of S. enterica Enteritidis isolates from the human bloodstream led to the discovery of the Central/Eastern African clade (CEAC) and West African clade, which were distinct from the gastroenteritis-associated global epidemic clade (GEC). The African S. enterica Enteritidis clades have unique genetic signatures that include genomic degradation, novel prophage repertoires and multi-drug resistance, but the molecular basis for the enhanced propensity of African S. enterica Enteritidis to cause bloodstream infection is poorly understood. We used transposon insertion sequencing (TIS) to identify the genetic determinants of the GEC representative strain P125109 and the CEAC representative strain D7795 for growth in three in vitro conditions (LB or minimal NonSPI2 and InSPI2 growth media), and for survival and replication in RAW 264.7 murine macrophages. We identified 207 in vitro-required genes that were common to both S. enterica Enteritidis strains and also required by S. enterica Typhimurium, S. enterica Typhi and Escherichia coli, and 63 genes that were only required by individual S. enterica Enteritidis strains. Similar types of genes were required by both P125109 and D7795 for optimal growth in particular media. Screening the transposon libraries during macrophage infection identified 177 P125109 and 201 D7795 genes that contribute to bacterial survival and replication in mammalian cells. The majority of these genes have proven roles in Salmonella virulence. Our analysis uncovered candidate strain-specific macrophage fitness genes that could encode novel Salmonella virulence factors.

    Acute response to pathogens in the early human placenta at single-cell resolution

    The placenta is a selective maternal-fetal barrier that provides nourishment and protection from infections. However, certain pathogens can attach to and even cross the placenta, causing pregnancy complications with potential lifelong impacts on the child's health. Here, we profiled at the single-cell level the placental responses to three pathogens associated with intrauterine complications: Plasmodium falciparum, Listeria monocytogenes, and Toxoplasma gondii. We found that upon exposure to the pathogens, all placental lineages trigger inflammatory responses that may compromise placental function. Additionally, we characterized the responses of fetal macrophages known as Hofbauer cells (HBCs) to each pathogen and propose that they are the probable niche for T. gondii. Finally, we revealed how P. falciparum adapts to the placental microenvironment by modulating protein export into the host erythrocyte and nutrient uptake pathways. Altogether, we have defined the cellular networks and signaling pathways mediating acute placental inflammatory responses that could contribute to pregnancy complications.

    Niche-specific profiling reveals transcriptional adaptations required for the cytosolic lifestyle of Salmonella enterica

    Salmonella enterica serovar Typhimurium (S. Typhimurium) is a zoonotic pathogen that causes diarrheal disease in humans and animals. During salmonellosis, S. Typhimurium colonizes epithelial cells lining the gastrointestinal tract. S. Typhimurium has an unusual lifestyle in epithelial cells that begins within an endocytic-derived Salmonella-containing vacuole (SCV), followed by escape into the cytosol, epithelial cell lysis and bacterial release. The cytosol is a more permissive environment than the SCV and supports rapid bacterial growth. The physicochemical conditions encountered by S. Typhimurium within the cytosol, and the bacterial genes required for cytosolic colonization, remain unknown. Here we have exploited the parallel colonization strategies of S. Typhimurium in epithelial cells to decipher the two niche-specific bacterial virulence programs. By combining a population-based RNA-seq approach with single-cell microscopic analysis, we identified bacterial genes/sRNAs with cytosol-specific or vacuole-specific expression signatures. Using these genes/sRNAs as environmental biosensors, we determined that Salmonella is exposed to iron and manganese deprivation and oxidative stress in the cytosol, and to zinc and magnesium deprivation in the SCV. Furthermore, iron availability, as well as entC, fepB, soxS and sitA-mntH, was critical for optimal S. Typhimurium replication in the cytosol. Virulence genes that are typically associated with extracellular bacteria, namely Salmonella pathogenicity island 1 (SPI1) and SPI4, had a cytosol-specific expression profile. Our study reveals that the cytosolic and vacuolar S. Typhimurium virulence gene programs are unique to, and tailored for, residence within distinct intracellular compartments. Therefore, this archetypical vacuole-adapted pathogen requires extensive transcriptional reprogramming to successfully colonize the mammalian cytosol.
    Author Summary: Intracellular pathogens either reside within a membrane-bound vacuole or are free-living in the cytosol, and their virulence programs are tailored towards survival within a particular intracellular compartment. Some bacterial pathogens (such as Salmonella enterica) can successfully colonize both intracellular niches, but how they do so is unclear. Here we have exploited the parallel intracellular lifestyles of S. enterica in epithelial cells to identify the niche-specific bacterial expression profiles and environmental cues encountered by S. enterica. We have also discovered bacterial genes that are required for colonization of the cytosol, but not the vacuole. Our results advance our understanding of pathogen adaptation to alternative replication niches and highlight an emerging concept in the field of bacteria-host cell interactions.

    Salmonella enterica serovar Typhimurium ST313 sublineage 2.2 has emerged in Malawi with a characteristic gene expression signature and a fitness advantage

    Invasive non-typhoidal Salmonella (iNTS) disease is a serious bloodstream infection that targets immune-compromised individuals, and causes significant mortality in sub-Saharan Africa. Salmonella enterica serovar Typhimurium ST313 causes the majority of iNTS in Malawi. We performed an intensive comparative genomic analysis of 608 S. Typhimurium ST313 isolates dating between 1996 and 2018 from Blantyre, Malawi. We discovered that following the arrival of the well-characterized S. Typhimurium ST313 lineage 2 in 1999, two multidrug-resistant variants emerged in Malawi in 2006 and 2008, designated sublineages 2.2 and 2.3, respectively. The majority of S. Typhimurium isolates from human bloodstream infections in Malawi now belong to sublineages 2.2 or 2.3. To understand the emergence of the prevalent ST313 sublineage 2.2, we studied two representative strains, D23580 (lineage 2) and D37712 (sublineage 2.2). The chromosomes of ST313 lineage 2 and sublineage 2.2 differed only by 29 SNPs/small indels and a 3 kb deletion of a Gifsy-2 prophage region including the sseI pseudogene. Lineage 2 and sublineage 2.2 had distinctive plasmid profiles. The transcriptome was investigated in 15 infection-relevant in vitro conditions and within macrophages. During growth in physiological conditions that do not usually trigger S. Typhimurium SPI2 gene expression, the SPI2 genes of D37712 were transcriptionally active. We identified down-regulation of flagellar genes in D37712 compared with D23580. Following phenotypic confirmation of transcriptomic differences, we discovered that sublineage 2.2 had increased fitness compared with lineage 2 during mixed growth in minimal media. We speculate that this competitive advantage is contributing to the emergence of sublineage 2.2 in Malawi.

    Salmonella enterica serovar Typhimurium ST313 sublineage 2.2 has emerged in Malawi with a characteristic gene expression signature and a fitness advantage

    Invasive non-typhoidal Salmonella (iNTS) disease is a serious bloodstream infection that targets immune-compromised individuals, and causes significant mortality in sub-Saharan Africa. Salmonella enterica serovar Typhimurium ST313 causes the majority of iNTS in Malawi, and we performed an intensive comparative genomic analysis of 608 isolates obtained from fever surveillance at the Queen Elizabeth Hospital, Blantyre between 1996 and 2018. We discovered that following the upsurge of the well-characterised S. Typhimurium ST313 lineage 2 from 1999 onwards, two new multidrug-resistant sublineages, designated 2.2 and 2.3, emerged in Malawi in 2006 and 2008, respectively. The majority of S. Typhimurium isolates from human bloodstream infections in Malawi now belong to sublineage 2.2 or 2.3. To identify factors that characterised the emergence of the prevalent ST313 sublineage 2.2, we performed genomic and functional analysis of two representative strains, D23580 (lineage 2) and D37712 (sublineage 2.2). Comparative genomic analysis showed that the chromosomes of ST313 lineage 2 and sublineage 2.2 were broadly similar, only differing by 29 SNPs and small indels and a 3 kb deletion in the Gifsy-2 prophage region that spanned the sseI pseudogene. Lineage 2 and sublineage 2.2 have unique plasmid profiles that were verified by long read sequencing. The transcriptome was initially explored in 15 infection-relevant conditions and within macrophages. Differential gene expression was subsequently investigated in depth in the four most important in vitro growth conditions. We identified up-regulation of SPI2 genes in non-inducing conditions, and down-regulation of flagellar genes in D37712, compared to D23580. Following phenotypic confirmation of transcriptional differences, we discovered that sublineage 2.2 had increased fitness compared with lineage 2 during mixed growth in minimal media. We speculate that this competitive advantage is contributing to the continuing presence of sublineage 2.2 in Malawi.