85 research outputs found

    Seeking an optimal variant calling pipeline for medical genetics

    Accurate and comprehensive variant discovery is extremely important for rare disease diagnostics using next-generation sequencing (NGS) methods. In recent years, a plethora of methods have been developed for short variant calling from NGS data, and the most recent tools make extensive use of machine learning algorithms for both variant discovery and filtering. In our study, we systematically evaluated the performance of different pipelines for short variant calling in the human genome. To perform this comparison, we collected a large set of both “gold standard” data (provided by the Genome In A Bottle (GIAB) consortium) and in-house whole-exome sequencing (WES) and whole-genome sequencing (WGS) data (20 datasets in total). We tested all combinations of 4 popular short read aligners (BWA, Bowtie2, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Freebayes, Clair3, DeepVariant, Genome Analysis ToolKit (GATK), Octopus, Strelka2). We also used several different tools for preprocessing of short reads. Our analysis showed negligible effects of adapter trimming on the accuracy of short variant calling. Among read aligners, Bowtie2 performed significantly worse than the other tools, suggesting it should not be used for medical variant calling. For pipelines based on BWA, Isaac, and Novoalign, the accuracy of variant discovery depended mostly on the variant caller and not the read aligner. DeepVariant consistently showed the best performance and the greatest robustness of all tested variant callers. We also compared the consistency of variant calls in GIAB and non-GIAB samples. With a few important caveats, the best-performing tools showed little evidence of overfitting. Taken together, our study showed that modern strategies for NGS data analysis allow for high accuracy of genetic variant discovery within coding regions of the human genome. However, there is still a need for the development of new library preparation and variant calling methods to enhance variant discovery in the challenging regions of the human genome.
    Book of abstract: 4th Belgrade Bioinformatics Conference, June 19-23, 202

    Transcriptomic profiling of Escherichia coli K-12 in response to a compendium of stressors

    Environmental perturbations impact multiple cellular traits, including gene expression. Bacteria respond to these stressful situations through complex gene interaction networks, thereby inducing stress tolerance and survival of cells. In this paper, we study the response mechanisms of E. coli when exposed to different environmental stressors via differential expression and co-expression analysis. Gene co-expression networks were generated and analyzed via Weighted Gene Co-expression Network Analysis (WGCNA). Based on the gene co-expression networks, genes with similar expression profiles were clustered into modules. The modules were analyzed to identify hub genes and enrichment of biological processes and transcription factors. In addition, we studied the link between transcription factors and their differentially regulated targets to understand the regulatory mechanisms involved. These networks validate known gene interactions and provide new insights into genes mediating transcriptional regulation in specific stress environments, thus allowing for in silico hypothesis generation.

    Multiscale spatial mapping of cell populations across anatomical sites in healthy human skin and basal cell carcinoma

    © 2024 National Academy of Sciences. All rights reserved.
    Our understanding of how human skin cells differ according to anatomical site and tumour formation is limited. To address this, we have created a multiscale spatial atlas of healthy skin and basal cell carcinoma (BCC), incorporating in vivo optical coherence tomography, single-cell RNA sequencing, spatial global transcriptional profiling, and in situ sequencing. Computational spatial deconvolution and projection revealed the localisation of distinct cell populations to specific tissue contexts. Although cell populations were conserved between healthy anatomical sites and in BCC, mesenchymal cell populations including fibroblasts and pericytes retained signatures of developmental origin. Spatial profiling and in silico lineage tracing support a hair follicle origin for BCC and demonstrate that cancer-associated fibroblasts are an expansion of a POSTN+ subpopulation associated with hair follicles in healthy skin. RGS5+ pericytes are also expanded in BCC, suggesting a role in vascular remodelling. We propose that the identity of mesenchymal cell populations is regulated by signals emanating from adjacent structures and that these signals are repurposed to promote the expansion of skin cancer stroma. The resource we have created is publicly available in an interactive format for the research community.

    Acute response to pathogens in the early human placenta at single-cell resolution

    The placenta is a selective maternal-fetal barrier that provides nourishment and protection from infections. However, certain pathogens can attach to and even cross the placenta, causing pregnancy complications with potential lifelong impacts on the child's health. Here, we profiled at the single-cell level the placental responses to three pathogens associated with intrauterine complications: Plasmodium falciparum, Listeria monocytogenes, and Toxoplasma gondii. We found that upon exposure to the pathogens, all placental lineages trigger inflammatory responses that may compromise placental function. Additionally, we characterized the responses of fetal macrophages known as Hofbauer cells (HBCs) to each pathogen and propose that they are the probable niche for T. gondii. Finally, we revealed how P. falciparum adapts to the placental microenvironment by modulating protein export into the host erythrocyte and nutrient uptake pathways. Altogether, we have defined the cellular networks and signaling pathways mediating acute placental inflammatory responses that could contribute to pregnancy complications.


    Genome-wide fitness analysis identifies genes required for in vitro growth and macrophage infection by African and global epidemic pathovariants of Salmonella enterica Enteritidis

    Salmonella enterica Enteritidis is the second most common serovar associated with invasive non-typhoidal Salmonella (iNTS) disease in sub-Saharan Africa. Previously, genomic and phylogenetic characterization of S. enterica Enteritidis isolates from the human bloodstream led to the discovery of the Central/Eastern African clade (CEAC) and West African clade, which were distinct from the gastroenteritis-associated global epidemic clade (GEC). The African S. enterica Enteritidis clades have unique genetic signatures that include genomic degradation, novel prophage repertoires and multi-drug resistance, but the molecular basis for the enhanced propensity of African S. enterica Enteritidis to cause bloodstream infection is poorly understood. We used transposon insertion sequencing (TIS) to identify the genetic determinants of the GEC representative strain P125109 and the CEAC representative strain D7795 for growth in three in vitro conditions (LB or minimal NonSPI2 and InSPI2 growth media), and for survival and replication in RAW 264.7 murine macrophages. We identified 207 in vitro-required genes that were common to both S. enterica Enteritidis strains and also required by S. enterica Typhimurium, S. enterica Typhi and Escherichia coli, and 63 genes that were only required by individual S. enterica Enteritidis strains. Similar types of genes were required by both P125109 and D7795 for optimal growth in particular media. Screening the transposon libraries during macrophage infection identified 177 P125109 and 201 D7795 genes that contribute to bacterial survival and replication in mammalian cells. The majority of these genes have proven roles in Salmonella virulence. Our analysis uncovered candidate strain-specific macrophage fitness genes that could encode novel Salmonella virulence factors.

    Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage

    The advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole-exome (WES) and whole-genome (WGS) sequencing, are often debated. WES has dominated large-scale resequencing projects because of its lower cost and easier data storage and processing. The rapid development of third-generation sequencing methods and novel exome sequencing kits creates the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study, we developed a set of statistical tools to systematically assess the coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons, which did not account for the mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution of the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from the mappability limitations of short reads and exome probe design rather than sequence composition. We also identified an approximately 500 kb region of the human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified the main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.

    Niche-specific profiling reveals transcriptional adaptations required for the cytosolic lifestyle of Salmonella enterica

    Abstract: Salmonella enterica serovar Typhimurium (S. Typhimurium) is a zoonotic pathogen that causes diarrheal disease in humans and animals. During salmonellosis, S. Typhimurium colonizes epithelial cells lining the gastrointestinal tract. S. Typhimurium has an unusual lifestyle in epithelial cells that begins within an endocytic-derived Salmonella-containing vacuole (SCV), followed by escape into the cytosol, epithelial cell lysis and bacterial release. The cytosol is a more permissive environment than the SCV and supports rapid bacterial growth. The physicochemical conditions encountered by S. Typhimurium within the cytosol, and the bacterial genes required for cytosolic colonization, remain unknown. Here we have exploited the parallel colonization strategies of S. Typhimurium in epithelial cells to decipher the two niche-specific bacterial virulence programs. By combining a population-based RNA-seq approach with single-cell microscopic analysis, we identified bacterial genes/sRNAs with cytosol-specific or vacuole-specific expression signatures. Using these genes/sRNAs as environmental biosensors, we determined that Salmonella is exposed to iron and manganese deprivation and oxidative stress in the cytosol, and to zinc and magnesium deprivation in the SCV. Furthermore, iron availability was critical for optimal S. Typhimurium replication in the cytosol, as were entC, fepB, soxS and sitA-mntH. Virulence genes that are typically associated with extracellular bacteria, namely Salmonella pathogenicity island 1 (SPI1) and SPI4, had a cytosol-specific expression profile. Our study reveals that the cytosolic and vacuolar S. Typhimurium virulence gene programs are unique to, and tailored for, residence within distinct intracellular compartments. Therefore, this archetypical vacuole-adapted pathogen requires extensive transcriptional reprogramming to successfully colonize the mammalian cytosol.
    Author Summary: Intracellular pathogens either reside within a membrane-bound vacuole or are free-living in the cytosol, and their virulence programs are tailored towards survival within a particular intracellular compartment. Some bacterial pathogens (such as Salmonella enterica) can successfully colonize both intracellular niches, but how they do so is unclear. Here we have exploited the parallel intracellular lifestyles of S. enterica in epithelial cells to identify the niche-specific bacterial expression profiles and environmental cues encountered by S. enterica. We have also discovered bacterial genes that are required for colonization of the cytosol, but not the vacuole. Our results advance our understanding of pathogen adaptation to alternative replication niches and highlight an emerging concept in the field of bacteria-host cell interactions.

    How to sequence 10,000 bacterial genomes and retain your sanity: an accessible, efficient and global approach

    Non-typhoidal Salmonella (NTS) are typically associated with enterocolitis and linked to the industrialisation of food production. In recent years, NTS has been associated with invasive disease (iNTS disease), causing an estimated 77,000 deaths each year worldwide; 80% of mortality occurs in sub-Saharan Africa. New clades of S. Typhimurium and S. Enteritidis have been identified, which are characterised by genomic degradation, altered prophage repertoires and novel multidrug-resistant plasmids. To understand how these clades are contributing to the burden and severity of iNTS disease, it is crucial to expand genome-based surveillance to cover more countries, and to incorporate historical isolates to generate an evolutionary timeline of the development of iNTS. We developed and validated a robust and inexpensive method for large-scale collection and sequencing of bacterial genomes. The “10,000 Salmonella genomes” project established a worldwide research collaboration to generate information relevant to the epidemiology, drug resistance and virulence factors of Salmonellae using a whole-genome sequencing approach. By streamlining collection of isolates and developing an efficient logistics pipeline, we gathered 10,419 clinical and environmental isolates from collections in low and middle-income countries within six months. Genome sequences are now available for isolates from 51 countries/territories dating from 1949 to 2017, with ~80% representing African and Latin-American datasets. Our method can be applied to other large sample collections that require maximisation of resources within a limited timeframe. Detailed genome analyses are in progress and it is hoped that the resulting data will contribute to public health control strategies in low and middle-income countries.
