9 research outputs found

    Inferring bona fide transfrags in RNA-Seq derived-transcriptome assemblies of non-model organisms

    Background: De novo transcriptome assembly of short transcribed fragments (transfrags) produced from sequencing-by-synthesis technologies often results in redundant datasets with differing levels of unassembled, partially assembled or mis-assembled transcripts. Post-assembly processing intended to reduce redundancy typically involves reassembly or clustering of assembled sequences. However, these approaches are mostly based on common word heuristics and often create clusters of biologically unrelated sequences, resulting in loss of unique transfrag annotations and propagation of mis-assemblies. Results: Here, we propose a structured framework that consists of a few steps in a pipeline architecture for Inferring Functionally Relevant Assembly-derived Transcripts (IFRAT). IFRAT combines 1) removal of identical subsequences, 2) error-tolerant CDS prediction, 3) identification of coding potential, and 4) complementation of BLAST with a multiple domain architecture annotation that reduces non-specific domain annotation. We demonstrate that independent of the assembler, IFRAT selects bona fide transfrags (with CDS and coding potential) from the transcriptome assembly of a model organism without relying on post-assembly clustering or reassembly. The robustness of IFRAT is inferred on RNA-Seq data of Neurospora crassa assembled using de Bruijn graph-based assemblers, in single (Trinity and Oases-25) and multiple (Oases-Merge and additive or pooled) k-mer modes. Single k-mer assemblies contained fewer transfrags compared to the multiple k-mer assemblies. However, Trinity identified a comparable number of predicted coding sequences and gene loci to the Oases pooled assembly. IFRAT selects bona fide transfrags representing over 94% of cumulative BLAST-derived functional annotations of the unfiltered assemblies. Between 4% and 6% are lost when orphan transfrags are excluded, and this represents only a tiny fraction of annotation derived from functional transference by sequence similarity. The median length of bona fide transfrags ranged from 1.5 kb (Trinity) to 2 kb (Oases), which is consistent with the average coding sequence length in fungi. The fraction of transfrags that could be associated with gene ontology terms ranged from 33% to 50%, which is also high for domain-based annotation. We showed that unselected transfrags were mostly truncated and represent sequences from intronic, untranslated (5′ and 3′) regions and non-coding gene loci. Conclusions: IFRAT simplifies post-assembly processing, providing a reference transcriptome enriched with functionally relevant assembly-derived transcripts for non-model organisms. Department of Science and Technology National Research Foundation South African Research Chair Initiative. Web of Science

    A glance at quality score: implication for de novo transcriptome reconstruction of Illumina reads

    Downstream analyses of short reads from next-generation sequencing platforms are often preceded by a pre-processing step that removes uncalled and wrongly called bases. Standard approaches rely on the associated base quality scores to retain a read, or a portion of it, when the score is above a predefined threshold. Without a reference, it is difficult to differentiate sequencing error from biological variation using a quality score alone. The effects of quality score based trimming have not been systematically studied in de novo transcriptome assembly. Using RNA-Seq data produced from Illumina, we teased out the effects of quality score based filtering or trimming on de novo transcriptome reconstruction. We showed that assemblies produced from reads subjected to different quality score thresholds contain truncated and missing transfrags when compared to those from untrimmed reads. Our data support the view that de novo assembly of untrimmed data is challenging for de Bruijn graph assemblers. However, our results indicate that comparing the assemblies from untrimmed and trimmed read subsets can suggest appropriate filtering parameters and enables selection of the optimum de novo transcriptome assembly in non-model organisms. South African Research Chair Initiative National Research Foundation of South Africa

    Chromosomal-level assembly of the Asian seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that spans ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics. Web of Science
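
The contig and scaffold N50 values quoted in this abstract follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the total assembly. The block below is a minimal sketch of that calculation, assuming only a plain list of sequence lengths; the function name `n50` and the toy lengths are illustrative and are not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Unreachable for non-empty input with positive lengths.
    raise AssertionError("cumulative length never reached half of the total")


# Hypothetical toy contig lengths (arbitrary units), not data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembled sequence, which is why the abstract pairs it with genome coverage and mapping evidence.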

    Poly(ADP-ribose) polymerase 9 mediates early protection against Mycobacterium tuberculosis infection by regulating type I IFN production

    The ADP ribosyltransferases (PARPs 1-17) regulate diverse cellular processes, including DNA damage repair. PARPs are classified on the basis of their ability to catalyze poly-ADP-ribosylation (PARylation) or mono-ADP-ribosylation (MARylation). Although PARP9 mRNA expression is significantly increased in progressive tuberculosis (TB) in humans, its participation in host immunity to TB is unknown. Here, we show that PARP9 mRNA encoding the MARylating PARP9 enzyme was upregulated during TB in humans and mice and provide evidence of a critical modulatory role for PARP9 in DNA damage, cyclic GMP-AMP synthase (cGAS) expression, and type I IFN production during TB. Thus, Parp9-deficient mice were susceptible to Mycobacterium tuberculosis infection and exhibited increased TB disease, cGAS and 2′3′-cyclic GMP-AMP (cGAMP) expression, and type I IFN production, along with upregulation of complement and coagulation pathways. Enhanced M. tuberculosis susceptibility is type I IFN dependent, as blockade of IFN α receptor (IFNAR) signaling reversed the enhanced susceptibility of Parp9-/- mice. Thus, in sharp contrast to PARP9 enhancement of type I IFN production in viral infections, this member of the MAR family plays a protective role by limiting type I IFN responses during TB.

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that spans ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.

    Cytomegalovirus infection is a risk factor for tuberculosis disease in infants.

    Immune activation is associated with increased risk of tuberculosis (TB) disease in infants. We performed a case-control analysis to identify drivers of immune activation and disease risk. Among 49 infants who developed TB disease over the first 2 years of life, and 129 healthy matched controls, we found the cytomegalovirus-stimulated (CMV-stimulated) IFN-γ response to be associated with CD8+ T cell activation (Spearman's rho, P = 6 × 10⁻⁸). A CMV-specific IFN-γ response was also associated with increased risk of developing TB disease (conditional logistic regression; P = 0.043; OR, 2.2; 95% CI, 1.02-4.83) and shorter time to TB diagnosis (Log Rank Mantel-Cox, P = 0.037). CMV+ infants who developed TB disease had lower expression of NK cell-associated gene signatures and a lower frequency of CD3-CD4-CD8- lymphocytes. We identified transcriptional signatures predictive of TB disease risk among CMV ELISpot-positive (area under the receiver operating characteristic [AUROC], 0.98; accuracy, 92.57%) and -negative (AUROC, 0.9; accuracy, 79.3%) infants; the CMV- signature was validated in an independent infant study (AUROC, 0.71; accuracy, 63.9%). A 16-gene signature that previously identified adolescents at risk of developing TB disease did not accurately classify case and control infants in this study. Understanding the microbial drivers of T cell activation, such as CMV, could guide new strategies for prevention of TB disease in infants.

    RISK6, a 6-gene transcriptomic signature of TB disease risk, diagnosis and treatment response

    Improved tuberculosis diagnostics and tools for monitoring treatment response are urgently needed. We developed a robust and simple, PCR-based host-blood transcriptomic signature, RISK6, for multiple applications: identifying individuals at risk of incident disease, as a screening test for subclinical or clinical tuberculosis, and for monitoring tuberculosis treatment. RISK6 utility was validated by blind prediction using quantitative real-time (qRT) PCR in seven independent cohorts. Prognostic performance significantly exceeded that of previous signatures discovered in the same cohort. Performance for diagnosing subclinical and clinical disease in HIV-uninfected and HIV-infected persons, assessed by area under the receiver-operating characteristic curve, exceeded 85%. As a screening test for tuberculosis, the sensitivity at 90% specificity met or approached the benchmarks set out in World Health Organization target product profiles for non-sputum-based tests. RISK6 scores correlated with lung immunopathology activity, measured by positron emission tomography, and tracked treatment response, demonstrating utility as a treatment response biomarker, while predicting treatment failure prior to treatment initiation. Performance of the test in capillary blood samples collected by finger-prick was noninferior to venous blood collected in PAXgene tubes. These results support incorporation of RISK6 into rapid, capillary blood-based point-of-care PCR devices for prospective assessment in field studies.

    Predictive performance of interferon-gamma release assays and the tuberculin skin test for incident tuberculosis: an individual participant data meta-analysis

    Background: Evidence on the comparative performance of purified protein derivative tuberculin skin tests (TST) and interferon-gamma release assays (IGRA) for predicting incident active tuberculosis (TB) remains conflicting. We conducted an individual participant data meta-analysis to directly compare the predictive performance for incident TB disease between TST and IGRA to inform policy. Methods: We searched Medline and Embase from 1 January 2002 to 4 September 2020, as well as studies that were included in previous systematic reviews. We included prospective longitudinal studies in which participants received both TST and IGRA and estimated performance as hazard ratios (HR) for the development of all diagnoses of TB in participants with dichotomised positive test results compared to negative results, using different thresholds of positivity for TST. Secondary analyses included an evaluation of the impact of background TB incidence. We also estimated the sensitivity and specificity for predicting TB. We explored heterogeneity through pre-defined sub-group analyses (e.g. country-level TB incidence). Publication bias was assessed using funnel plots and Egger's test. This review is registered with PROSPERO, CRD42020205667. Findings: We obtained data from 13 studies out of 40 that were considered eligible (N = 32,034 participants: 36% from countries with TB incidence rate ≥100 per 100,000 population). All reported data on TST and QuantiFERON Gold in-Tube (QFT-GIT). The point estimate for the TST was highest with higher cut-offs for positivity and particularly when stratified by bacillus Calmette–Guérin vaccine (BCG) status (15 mm if BCG vaccinated and 5 mm if not [TST5/15 mm]) at 2.88 (95% CI 1.69–4.90). The pooled HR for QFT-GIT was higher than for TST at 4.15 (95% CI 1.97–8.75). The difference was large in countries with TB incidence rate <100 per 100,000 population (HR 10.38, 95% CI 4.17–25.87 for QFT-GIT vs. HR 5.36, 95% CI 3.82–7.51 for TST5/15 mm) but much of this difference was driven by a single study (HR 5.13, 95% CI 3.58–7.35 for TST5/15 mm vs. 7.18, 95% CI 4.48–11.51 for QFT-GIT, when excluding the study, in which all 19 TB cases had positive QFT-GIT results). The comparative performance was similar in the higher burden countries (HR 1.61, 95% CI 1.23–2.10 for QFT-GIT vs. HR 1.72, 95% CI 0.98–3.01 for TST5/15 mm). The predictive performance of both tests was higher in countries with TB incidence rate <100 per 100,000 population. In the lower TB incidence countries, the specificity of TST (76% for TST5/15 mm) and QFT-GIT (74%) for predicting active TB approached the minimum World Health Organization target (≥75%), but the sensitivity was below the target of ≥75% (63% for TST5/15 mm and 65% for QFT-GIT). The absolute differences in positive and negative predictive values between TST15 mm and QFT-GIT were small (positive predictive values 2.74% vs. 2.46%; negative predictive values 99.42% vs. 99.52% in low-incidence countries). Egger's test did not show evidence of publication bias (p = 0.74 for TST15 mm and p = 0.68 for QFT-GIT). Interpretation: IGRA appears to have higher predictive performance than the TST in low TB incidence countries, but the difference was driven by a single study. Any advantage in clinical performance may be small, given the numerically similar positive and negative predictive values. Both IGRA and TST had lower performance in countries with high TB incidence. Test choice should be contextual and made considering operational and likely clinical impact of test results. Funding: YH, IA, and MXR were supported by the National Institute for Health and Care Research (NIHR), United Kingdom (RP-PG-0217-20009). MQ was supported by the Medical Research Council [MC_UU_00004/07].
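
The abstract reports sensitivity, specificity and predictive values side by side. As a rough illustration of how these quantities relate, the sketch below applies the standard Bayes relationship between them. It is a simplification: the study derives its estimates from longitudinal follow-up data, whereas this sketch assumes a single fixed proportion of people who go on to develop TB. The function name `predictive_values` and the 1% proportion are assumptions chosen for illustration and do not come from the paper.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a binary test.

    `prevalence` is the proportion of the tested population that truly
    has (or will develop) the condition; all inputs are fractions.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")

    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive value undefined: no positive or no negative results")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as quoted for QFT-GIT in lower-incidence
# countries; the 1% proportion developing TB is a hypothetical assumption.
ppv, npv = predictive_values(sensitivity=0.65, specificity=0.74, prevalence=0.01)
print(f"PPV = {ppv:.2%}, NPV = {npv:.2%}")
```

When the outcome is rare, the positive predictive value stays low even for a test with moderate sensitivity and specificity, which is consistent with the abstract's point that the two tests differ little in absolute predictive values.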

    Predictive performance of interferon-gamma release assays and the tuberculin skin test for incident tuberculosis: an individual participant data meta-analysis Research in context

    Summary: Background: Evidence on the comparative performance of purified protein derivative tuberculin skin tests (TST) and interferon-gamma release assays (IGRA) for predicting incident active tuberculosis (TB) remains conflicting. We conducted an individual participant data meta-analysis to directly compare the predictive performance for incident TB disease between TST and IGRA to inform policy. Methods: We searched Medline and Embase from 1 January 2002 to 4 September 2020, as well as studies that were included in previous systematic reviews. We included prospective longitudinal studies in which participants received both TST and IGRA and estimated performance as hazard ratios (HR) for the development of all diagnoses of TB in participants with dichotomised positive test results compared to negative results, using different thresholds of positivity for TST. Secondary analyses included an evaluation of the impact of background TB incidence. We also estimated the sensitivity and specificity for predicting TB. We explored heterogeneity through pre-defined sub-group analyses (e.g. country-level TB incidence). Publication bias was assessed using funnel plots and Egger's test. This review is registered with PROSPERO, CRD42020205667. Findings: We obtained data from 13 studies out of 40 that were considered eligible (N = 32,034 participants: 36% from countries with TB incidence rate ≥100 per 100,000 population). All reported data on TST and QuantiFERON Gold in-Tube (QFT-GIT). The point estimate for the TST was highest with higher cut-offs for positivity and particularly when stratified by bacillus Calmette–Guérin vaccine (BCG) status (15 mm if BCG vaccinated and 5 mm if not [TST5/15 mm]) at 2.88 (95% CI 1.69–4.90). The pooled HR for QFT-GIT was higher than for TST at 4.15 (95% CI 1.97–8.75). The difference was large in countries with TB incidence rate <100 per 100,000 population (HR 10.38, 95% CI 4.17–25.87 for QFT-GIT vs. HR 5.36, 95% CI 3.82–7.51 for TST5/15 mm) but much of this difference was driven by a single study (HR 5.13, 95% CI 3.58–7.35 for TST5/15 mm vs. 7.18, 95% CI 4.48–11.51 for QFT-GIT, when excluding the study, in which all 19 TB cases had positive QFT-GIT results). The comparative performance was similar in the higher burden countries (HR 1.61, 95% CI 1.23–2.10 for QFT-GIT vs. HR 1.72, 95% CI 0.98–3.01 for TST5/15 mm). The predictive performance of both tests was higher in countries with TB incidence rate <100 per 100,000 population. In the lower TB incidence countries, the specificity of TST (76% for TST5/15 mm) and QFT-GIT (74%) for predicting active TB approached the minimum World Health Organization target (≥75%), but the sensitivity was below the target of ≥75% (63% for TST5/15 mm and 65% for QFT-GIT). The absolute differences in positive and negative predictive values between TST15 mm and QFT-GIT were small (positive predictive values 2.74% vs. 2.46%; negative predictive values 99.42% vs. 99.52% in low-incidence countries). Egger's test did not show evidence of publication bias (p = 0.74 for TST15 mm and p = 0.68 for QFT-GIT). Interpretation: IGRA appears to have higher predictive performance than the TST in low TB incidence countries, but the difference was driven by a single study. Any advantage in clinical performance may be small, given the numerically similar positive and negative predictive values. Both IGRA and TST had lower performance in countries with high TB incidence. Test choice should be contextual and made considering operational and likely clinical impact of test results. Funding: YH, IA, and MXR were supported by the National Institute for Health and Care Research (NIHR), United Kingdom (RP-PG-0217-20009). MQ was supported by the Medical Research Council [MC_UU_00004/07].