12 research outputs found

    DISCLOSE : DISsection of CLusters Obtained by SEries of transcriptome data using functional annotations and putative transcription factor binding sites

    Background: A typical step in the analysis of gene expression data is the determination of clusters of genes that exhibit similar expression patterns. Researchers are confronted with a seemingly arbitrary choice among numerous algorithms for cluster analysis. Results: We developed an exploratory application that benchmarks the results of clustering methods using functional annotations. In addition, our program integrates a de novo DNA motif discovery algorithm that identifies overrepresented DNA binding sites, indicative of sites of transcriptional control, in the upstream DNA sequences of the genes in each cluster. We evaluated the program by comparing the original results of a time-course experiment with the findings of our application. Conclusion: DISCLOSE assists researchers in the prokaryotic research community in systematically evaluating the results of applying a range of clustering algorithms to transcriptome data. Several performance measures allow researchers to quickly and comprehensively determine the best-suited clustering approach for a given dataset.
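
The abstract does not state which statistic DISCLOSE uses to score clusters against functional annotations. A common choice, assumed here purely for illustration, is a hypergeometric test for over-representation of one functional category within a cluster, relative to all annotated genes. The function names and the toy annotations below are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn genes)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def cluster_enrichment(cluster, annotations, category):
    """Upper-tail p-value that `category` is over-represented in `cluster` (hypothetical helper).

    cluster: set of gene IDs assigned to one cluster
    annotations: dict mapping every background gene ID to its set of functional categories
    """
    background = set(annotations)
    N = len(background)
    K = sum(1 for g in background if category in annotations[g])
    members = cluster & background
    n = len(members)
    k = sum(1 for g in members if category in annotations[g])
    return hypergeom_sf(k, N, K, n)


# Toy data: 10 genes, 3 of them annotated "ribosome"; the cluster holds exactly those 3.
annotations = {f"g{i}": ({"ribosome"} if i < 3 else {"other"}) for i in range(10)}
p = cluster_enrichment({"g0", "g1", "g2"}, annotations, "ribosome")
```

Here `p` equals 1/120: the chance of drawing all three annotated genes when picking three of ten at random. A low p-value flags a cluster whose members share a function more often than chance would predict.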

    A predicted physicochemically distinct sub-proteome associated with the intracellular organelle of the anammox bacterium Kuenenia stuttgartiensis

    Medema MH, Zhou M, van Hijum SAFT, et al. A predicted physicochemically distinct sub-proteome associated with the intracellular organelle of the anammox bacterium Kuenenia stuttgartiensis. BMC Genomics. 2010;11(1):299. Background: Anaerobic ammonium-oxidizing (anammox) bacteria perform a key step in global nitrogen cycling. These bacteria use an organelle to oxidize ammonia anaerobically to nitrogen (N2) and thereby contribute ~50% of the nitrogen in the atmosphere. It is currently unknown which proteins constitute the organellar proteome and how anammox bacteria specifically target organellar and cell-envelope proteins to their correct final destinations. Experimental approaches are complicated by the absence of pure cultures and of genetic accessibility. However, the genome of the anammox bacterium Candidatus "Kuenenia stuttgartiensis" has recently been sequenced. Here, we use these genome data to predict the organellar sub-proteome and to address the molecular basis of protein sorting in anammox bacteria. Results: Two training sets, representing organellar (30 proteins) and cell-envelope (59 proteins) proteins, were constructed based on previous experimental evidence and comparative genomics. Random forest (RF) classifiers trained on these two sets distinguished organellar from cell-envelope proteins with ~89% accuracy, using 400 features: the frequencies of all possible combinations of two adjacent amino acids (20 × 20 dipeptides). A physicochemically distinct organellar sub-proteome containing 562 proteins was predicted with the best RF classifier. This set included almost all catabolic and respiratory factors encoded in the genome; apparently, the cytoplasmic membrane performs no catabolic functions. We predict that the Tat-translocation system is located exclusively in the organellar membrane, whereas the Sec-translocation system is located in both the organellar and cytoplasmic membranes. Canonical signal peptides were predicted and validated experimentally, but a specific (N- or C-terminal) signal that could be used for protein targeting to the organelle remained elusive.
Conclusions: A physicochemically distinct organellar sub-proteome was predicted from the genome of the anammox bacterium K. stuttgartiensis. This result provides strong in silico support for the existing experimental evidence of an organelle in this bacterium, and is an important step towards unravelling a geochemically relevant case of cytoplasmic differentiation in bacteria. The predicted dual location of the Sec-translocation system and the apparent absence of a specific N- or C-terminal signal in the organellar proteins suggest that additional chaperones, acting on an as-yet unknown property of the targeted proteins, may be necessary.
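
The 400 classifier features are the relative frequencies of the 20 × 20 possible pairs of adjacent amino acids (dipeptides). A minimal Python sketch of computing such a feature vector for one protein sequence follows. The function name and the choice to normalize by the number of valid adjacent pairs are assumptions, since the abstract does not say how the frequencies were normalized. The resulting vectors could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`, which is not shown here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 20 x 20 = 400 features


def dipeptide_frequencies(seq: str) -> list[float]:
    """Return the 400 relative frequencies of adjacent amino-acid pairs in `seq`.

    Pairs containing non-standard residues (e.g. X or U) are skipped; the
    frequencies are normalized by the number of valid pairs (an assumption).
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


features = dipeptide_frequencies("MKKL")  # pairs MK, KK and KL, each with frequency 1/3
```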

    Supervised Lowess normalization of comparative genome hybridization data – application to lactococcal strain comparisons

    Background: Array-based comparative genome hybridization (aCGH) is commonly used to determine the genomic content of bacterial strains. Since prokaryotes generally have less conserved genome sequences than eukaryotes, sequence divergence between the genes of the genomes used in an aCGH experiment hampers the determination of genome variations (e.g. deletions). Current normalization methods do not take into account sequence divergence between the target DNA and the microarray features, and therefore cannot distinguish whether a difference in signal is due to systematic errors in the data or to sequence divergence. Results: We present supervised Lowess (S-Lowess), an application of the subset Lowess normalization method. By using a predicted subset of array features with minimal sequence divergence between the analyzed strains for normalization, we remove systematic errors from dual-dye aCGH data in two steps: (1) determination of a subset of conserved genes (likely conserved genes, LCG); and (2) subset Lowess normalization using the LCG. Subset Lowess determines the correction factors for systematic errors from the subset of array features and normalizes all array features using these correction factors. The performance of S-Lowess was assessed on aCGH experiments in which differentially labeled genomic DNA fragments of Lactococcus lactis IL1403 and L. lactis MG1363 were hybridized to IL1403 DNA microarrays. Since both genomes are sequenced and the gene deletions are known, the success rate of different aCGH normalization methods in detecting these deletions in the MG1363 genome could be determined. S-Lowess detects 97% of the deletions, whereas other aCGH normalization methods detect at most 60%. Conclusion: S-Lowess is implemented in a user-friendly web tool. We demonstrate that it outperforms existing normalization methods and maximizes the detection of genomic variation (e.g. deletions) in microbial aCGH data.
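
A minimal Python sketch of the subset Lowess idea, assuming M = log2(red/green) and A = 0.5 * log2(red * green), with positive M corresponding to the red (IL1403) channel. The trend of M against A is fitted only on the subset of likely conserved genes and then subtracted from every feature, so a divergent or deleted gene keeps its offset instead of being "normalized away". For brevity this uses a single tricube-weighted local linear fit, without Lowess's robustness iterations or a grid-based variant; the function names, the `frac` default and the toy data are assumptions, not the S-Lowess web tool's code.

```python
import math


def ma_values(red, green):
    """Convert two-channel intensities to M = log2(R/G) and A = 0.5 * log2(R*G)."""
    M = [math.log2(r / g) for r, g in zip(red, green)]
    A = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return M, A


def local_linear_fit(x0, xs, ys, frac):
    """Tricube-weighted linear regression of ys on xs, evaluated at x0.

    This is one Lowess smoothing step; robustness iterations are omitted.
    """
    k = min(len(xs), max(2, math.ceil(frac * len(xs))))  # neighbours in the window
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] * (1 + 1e-9) + 1e-12  # bandwidth just beyond the k-th neighbour
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    xm = sum(wi * x for wi, x in zip(w, xs)) / sw
    ym = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ym + slope * (x0 - xm)


def subset_lowess_normalize(M, A, subset_idx, frac=0.4):
    """Fit the M-vs-A trend on the subset only, then subtract it from ALL features."""
    xs = [A[i] for i in subset_idx]
    ys = [M[i] for i in subset_idx]
    return [m - local_linear_fit(a, xs, ys, frac) for m, a in zip(M, A)]


# Toy example: a linear dye bias M = 0.5 + 0.1*A on every feature, plus a +3 offset
# on feature 4 (e.g. a gene deleted in the green-labelled strain), which is
# excluded from the conserved subset.
A = [float(a) for a in range(1, 11)]
M = [0.5 + 0.1 * a for a in A]
M[4] += 3.0
subset = [i for i in range(10) if i != 4]
normalized = subset_lowess_normalize(M, A, subset)
```

After normalization the subset features sit at M ≈ 0 while feature 4 retains its offset of about 3. Fitting on all features instead would let divergent genes pull the curve and partly hide real deletions, which is the problem the subset approach is meant to avoid.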

    Transmissible Mycobacterium tuberculosis Strains Share Genetic Markers and Immune Phenotypes

    Successful transmission of tuberculosis depends on the interplay of human behavior, host immune responses, and Mycobacterium tuberculosis virulence factors. Previous studies have focused on identifying host risk factors associated with increased transmission, but the contribution of specific genetic variations in the mycobacterial strains themselves is still unknown. This study was funded by the Portuguese Foundation for Science and Technology (FCT) (SFRH/BD/33902/2009 [H.N.-G.]).

    Transmissible Mycobacterium tuberculosis Strains Share Genetic Markers and Immune Phenotypes

    RATIONALE: Successful transmission of tuberculosis depends on the interplay of human behavior, host immune responses and Mycobacterium tuberculosis virulence factors. Previous studies have focused on identifying host risk factors associated with increased transmission, while the contribution of specific genetic variations in the mycobacterial strains themselves is still unknown. OBJECTIVES: To identify mycobacterial genetic markers associated with increased transmissibility, and to examine whether these markers lead to altered in vitro immune responses. METHODS: Using a comprehensive (n = 10,389) tuberculosis registry and strain collection in the Netherlands, we identified 100 M. tuberculosis strains that were either least or most likely to be transmitted after controlling for host factors. We subjected these strains to whole genome sequencing and evolutionary convergence analysis, and repeated this analysis in an independent validation cohort. A subset of the original strains was used in functional immunological experiments measuring in vitro cytokine production and neutrophil responses to strains with or without the identified mutations associated with increased transmissibility. MEASUREMENTS AND MAIN RESULTS: We identified the loci espE, PE-PGRS56, Rv0197, Rv2813-2814c and Rv2815-2816c as targets of convergent evolution among transmissible strains. We validated four of these regions in an independent set of strains, and demonstrated that mutations in these targets affected in vitro monocyte and T-cell cytokine production, neutrophil reactive oxygen species release and apoptosis. CONCLUSIONS: This study identifies genetic markers of convergent evolution of M. tuberculosis towards enhanced transmissibility in vivo that are associated with altered immune responses in vitro.

    A predictive signature gene set for discriminating active from latent tuberculosis in Warao Amerindian children

    BACKGROUND: Tuberculosis (TB) continues to take a heavy toll of disease and death among children worldwide. The diagnosis of childhood TB is hampered by the paucibacillary nature of the disease and the difficulty of obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of the pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling to children with TB have been published so far. RESULTS: We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed significant discriminatory value for TB vs. LTBI in all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. To identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes suggests a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly, with no LTBI subjects misclassified as TB (100% specificity).
CONCLUSIONS: Our data justify further exploration of our signature set as biomarkers for potential childhood TB diagnosis. Because different biomarkers are identified in ethnically distinct cohorts, our results show that it is important to cross-validate newly identified markers in all available cohorts.
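
The "78% of TB cases correct, 100% specificity" result corresponds to a sensitivity of 0.78 and a specificity of 1.0 for TB vs. LTBI. A minimal Python sketch of these measures, and of a prediction error taken as the plain misclassification rate, is shown below; treating "prediction error" this way is an assumption (the abstract does not say whether it was cross-validated), and the labels are toy values, not study data.

```python
def sensitivity_specificity(y_true, y_pred, positive="TB"):
    """Return (sensitivity, specificity), treating `positive` as the case label.

    Sensitivity = TP / (TP + FN): fraction of true cases called positive.
    Specificity = TN / (TN + FP): fraction of non-cases called negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def prediction_error(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels: one TB case missed, no LTBI subject called TB.
y_true = ["TB", "TB", "TB", "TB", "LTBI", "LTBI"]
y_pred = ["TB", "TB", "TB", "LTBI", "LTBI", "LTBI"]
sens, spec = sensitivity_specificity(y_true, y_pred)
err = prediction_error(y_true, y_pred)
```

In this toy case sensitivity is 0.75 and specificity is 1.0: a classifier can reach 100% specificity while still missing some TB cases, which is the trade-off the decision tree result describes.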

    Supervised Lowess normalization of comparative genome hybridization data – application to lactococcal strain comparisons-1

    1363 signals) and red (positive M values; IL1403 signals) channels. A: non-normalized data. B: grid-based Lowess normalization. C: S-Lowess normalization based on the LCG set obtained from the comparison of IL1403 amplicon sequences to the ORFs of three strains. D: S-Lowess normalization with a stringent LCG set (99% identity over 100 bp). Copyright information: Taken from "Supervised Lowess normalization of comparative genome hybridization data – application to lactococcal strain comparisons", http://www.biomedcentral.com/1471-2105/9/93, BMC Bioinformatics 2008;9:93. Published online 11 Feb 2008. PMCID: PMC2275246.

    Supervised Lowess normalization of comparative genome hybridization data – application to lactococcal strain comparisons-3

    Ree strains. The R² values indicate the quality of the regression curve fit (higher is better).
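
Assuming the caption's fit-quality values are R² (the coefficient of determination) of the regression curve, they can be computed as 1 - SS_res / SS_tot, as in the minimal Python sketch below. The function name is hypothetical, and the caption does not state exactly how the values were calculated.

```python
def r_squared(y, y_fit):
    """Coefficient of determination: 1 - SS_res / SS_tot (1.0 means a perfect fit)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_fit))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

A curve that predicts every point exactly scores 1.0, while a flat line at the mean of the data scores 0.0, which is why higher values indicate a better fit.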

    Supervised Lowess normalization of comparative genome hybridization data – application to lactococcal strain comparisons-4

    Array dataset with the LCGs. If LCG prediction is selected for phase 1, the user must upload the microarray feature sequences and select one or more genomes (three in this study). The optimal parameters for selecting LCGs from a BLAT sequence comparison of array features against multiple reporter genomes are difficult to predict. Therefore, selection of an LCG set is facilitated by cycling through combinations of at most two of the following parameters: (i) alignment length cutoff, (ii) E-value cutoff, (iii) percentage nucleotide identity cutoff, (iv) maximum number of hits within the same genome (to account for paralogous genes or duplicated genome fragments), and (v) minimum number of hits across genomes (to account for gene conservation in multiple genome sequences). Array feature sequences meeting the criteria (here: a significant BLAT hit, i.e. one hit over at least 100 bp with at least 80% nucleotide identity, in at least two of the three genomes) are marked as LCGs and added to the conserved array feature list. In phase 2, the LCGs are used to normalize an uploaded aCGH microarray dataset. Phase 2 produces a normalized dataset and diagnostic plots.
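
The phase 1 LCG filter and the two-parameter sweep described in this caption can be sketched in Python as follows. The `Hit` record, the function names, the genome labels and the example hits are all hypothetical; a real pipeline would first parse BLAT's PSL output into such records. The defaults mirror the quoted criteria (at least 100 bp, at least 80% identity, support in at least two genomes); reading "one hit" as at most one significant hit per genome is an assumption, and no E-value cutoff is applied unless one is given.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass
class Hit:
    """One BLAT alignment of an array feature against a reporter genome (hypothetical record)."""
    feature: str
    genome: str
    aln_len: int     # alignment length in bp
    identity: float  # percent nucleotide identity
    evalue: float


def select_lcgs(hits, min_len=100, min_identity=80.0, max_evalue: Optional[float] = None,
                max_hits_per_genome=1, min_genomes=2):
    """Return the set of array features marked as likely conserved genes (LCGs)."""
    passing = defaultdict(lambda: defaultdict(int))  # feature -> genome -> significant hits
    for h in hits:
        if h.aln_len < min_len or h.identity < min_identity:
            continue
        if max_evalue is not None and h.evalue > max_evalue:
            continue
        passing[h.feature][h.genome] += 1
    lcgs = set()
    for feature, per_genome in passing.items():
        # A genome supports the feature only if it has at least one, but not too many,
        # significant hits (too many suggests paralogs or duplicated fragments).
        supporting = sum(1 for n in per_genome.values() if 1 <= n <= max_hits_per_genome)
        if supporting >= min_genomes:
            lcgs.add(feature)
    return lcgs


def sweep_two_parameters(hits, lengths, identities):
    """LCG set size for each (min_len, min_identity) combination, to help choose cutoffs."""
    return {(length, ident): len(select_lcgs(hits, min_len=length, min_identity=ident))
            for length, ident in product(lengths, identities)}


hits = [
    Hit("f1", "genome1", 300, 99.0, 1e-50),
    Hit("f1", "genome2", 280, 92.0, 1e-40),
    Hit("f2", "genome1", 300, 99.0, 1e-50),  # conserved in only one genome
    Hit("f3", "genome1", 150, 85.0, 1e-20),
    Hit("f3", "genome2", 150, 84.0, 1e-20),
    Hit("f3", "genome2", 140, 83.0, 1e-18),  # second hit in the same genome: possible paralog
]
lcgs = select_lcgs(hits)
sizes = sweep_two_parameters(hits, [100, 200], [80.0, 95.0])
```

With the defaults only `f1` qualifies: `f2` is supported by a single genome and `f3` is rejected by the paralog filter. The sweep makes visible how quickly the LCG set shrinks as cutoffs tighten, which is the kind of information needed to choose parameters when the optimum is hard to predict.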
