
    rnaSeqMap: a Bioconductor package for RNA sequencing data exploration

    BACKGROUND: The throughput of commercially available sequencers has increased significantly in recent years, to the point where measuring RNA expression by depth of coverage has become feasible even for the largest genomes. The development of software tools is closely following this progress in sequencing hardware. In particular, RNA sequencing software can be grouped into genome browsers, exon junction tools, and statistical tools operating on counts of reads in predefined regions. The library rnaSeqMap, freely available via Bioconductor, is RNA sequencing software that is independent of any sequencing hardware platform. It is based upon standard Bioconductor infrastructure for sequencing data and includes several novel features focused on a deeper understanding of coverage expression profiles and the discovery of novel transcribed regions. RESULTS: rnaSeqMap is a toolbox for analyses that may be performed with the use of gene annotations or, alternatively, in an unsupervised mode on any genomic region to find novel or non-standard transcripts. The data back-end may be a MySQL database or a set of files in standard BAM format. The processing in R can be run on a machine without any particular hardware requirements and scales linearly with the number of genomic loci and the number of samples analyzed. The main features of rnaSeqMap include coverage operations, discovery of irreducible regions of high expression, significance search, and splicing analyses at nucleotide resolution. CONCLUSIONS: This software may be used for a range of RNA sequencing applications by building customized analysis pipelines. Its applicability and precision are expected to increase in parallel with the progress of genome coverage in sequencers.
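As a rough illustration of calling expressed regions from a per-nucleotide coverage vector, the Python sketch below reports runs where coverage stays at or above a threshold for a minimum number of positions. This is a plain threshold rule chosen for illustration; it is not rnaSeqMap's irreducible-region algorithm, and the function name, threshold, and coverage values are hypothetical.

```python
from typing import List, Optional, Tuple


def high_coverage_regions(coverage: List[int], threshold: int,
                          min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals where coverage stays
    >= threshold for at least min_len consecutive positions.
    Hypothetical helper: a simple threshold rule, not rnaSeqMap's method."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, depth in enumerate(coverage):
        if depth >= threshold:
            if start is None:
                start = pos  # a new run of high coverage begins
        elif start is not None:
            if pos - start >= min_len:
                regions.append((start, pos))
            start = None  # the run ended below the threshold
    # Close a run that extends to the end of the vector.
    if start is not None and len(coverage) - start >= min_len:
        regions.append((start, len(coverage)))
    return regions


cov = [0, 2, 5, 7, 6, 1, 0, 4, 9, 9, 3, 0]  # hypothetical coverage depths
print(high_coverage_regions(cov, threshold=4, min_len=2))  # [(2, 5), (7, 10)]
```

Raising `min_len` suppresses short spikes; a real pipeline would also have to choose the threshold relative to sequencing depth.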

    Asexual expansion of Toxoplasma gondii merozoites is distinct from tachyzoites and entails expression of non-overlapping gene families to attach, invade, and replicate within feline enterocytes

    © 2015 Hehl et al.; licensee BioMed Central. Background: The apicomplexan parasite Toxoplasma gondii is cosmopolitan in nature, largely as a result of its highly flexible life cycle. Felids are its only definitive hosts, and a wide range of mammals and birds serve as intermediate hosts. The latent bradyzoite stage is orally infectious in all warm-blooded vertebrates and establishes chronic, transmissible infections. When bradyzoites are ingested by felids, they transform into merozoites in enterocytes and expand asexually as part of their coccidian life cycle. In all other intermediate hosts, however, bradyzoites differentiate exclusively into tachyzoites and disseminate extraintestinally to many cell types. Both merozoites and tachyzoites undergo rapid asexual population expansion, yet possess different effector fates with respect to the cells and tissues they develop in and the subsequent stages they differentiate into. Results: To determine whether merozoites utilize distinct suites of genes to attach, invade, and replicate within feline enterocytes, we performed comparative transcriptional profiling on purified tachyzoites and merozoites, using high-throughput RNA-Seq to compare the two transcriptomes. In total, 8323 genes were annotated with sequence reads across the two asexually replicating stages of the parasite life cycle. Metabolism was similar between the two replicating stages. However, significant stage-specific expression differences were measured, with 312 transcripts exclusive to merozoites versus 453 exclusive to tachyzoites. Genes coding for 177 predicted secreted proteins and 64 membrane-associated proteins were annotated as merozoite-specific. The vast majority of known dense-granule (GRA), microneme (MIC), and rhoptry (ROP) genes were not expressed in merozoites. In contrast, a large set of surface proteins (SRS) was expressed exclusively in merozoites.
Conclusions: The distinct expression profiles of merozoites and tachyzoites reveal significant additional complexity within the T. gondii life cycle, demonstrating that merozoites are distinct asexual dividing stages which are uniquely adapted to their niche and biological purpose.
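Counting stage-exclusive transcripts rests on deciding whether a gene is expressed in each stage. A minimal sketch, assuming a simple hypothetical read-count cutoff; the abstract does not state the criterion used, real analyses would normally work on normalized expression values, and the gene names and counts below are invented.

```python
from typing import Dict, Set, Tuple


def stage_exclusive(counts: Dict[str, Tuple[int, int]],
                    min_reads: int) -> Tuple[Set[str], Set[str]]:
    """counts maps gene -> (merozoite_reads, tachyzoite_reads).
    A gene counts as 'expressed' in a stage when its reads >= min_reads
    (hypothetical cutoff). Returns (merozoite-only, tachyzoite-only) genes."""
    mero_only: Set[str] = set()
    tachy_only: Set[str] = set()
    for gene, (mero, tachy) in counts.items():
        in_mero = mero >= min_reads
        in_tachy = tachy >= min_reads
        if in_mero and not in_tachy:
            mero_only.add(gene)
        elif in_tachy and not in_mero:
            tachy_only.add(gene)
        # Genes expressed in both stages, or in neither, are not exclusive.
    return mero_only, tachy_only


example = {"geneA": (120, 0), "geneB": (3, 85), "geneC": (40, 55), "geneD": (0, 1)}
print(stage_exclusive(example, min_reads=10))  # ({'geneA'}, {'geneB'})
```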

    AnGeLi: A Tool for the Analysis of Gene Lists from Fission Yeast

    Genome-wide assays and screens typically result in large lists of genes or proteins. Enrichments of functional or other biological properties within such lists can provide valuable insights and testable hypotheses. Systematically detecting these enrichments can be challenging and time-consuming, because the relevant data to compare against query gene lists are spread over many different sources. We have developed AnGeLi (Analysis of Gene Lists), an intuitive, integrated web-tool for comprehensive and customized interrogation of gene lists from the fission yeast, Schizosaccharomyces pombe. AnGeLi searches for significant enrichments among multiple qualitative and quantitative information sources, including gene and phenotype ontologies, genetic and protein interactions, numerous features of genes, transcripts, translation, and proteins such as copy numbers, chromosomal positions, genetic diversity, RNA polymerase II and ribosome occupancy, localization, conservation, half-lives, domains, and molecular weight among others, as well as diverse sets of genes that are co-regulated or lead to the same phenotypes when mutated. AnGeLi uses robust statistics which can be tailored to specific needs. It also provides the option to upload user-defined gene sets to compare against the query list. Through an integrated data submission form, AnGeLi encourages the community to contribute additional curated gene lists to further increase the usefulness of this resource and to get the most from the ever-increasing number of large-scale experiments. AnGeLi offers a rigorous yet flexible statistical analysis platform for rich insights into functional enrichments and biological context for query gene lists, thus providing a powerful exploratory tool through which S. pombe researchers can uncover fresh perspectives and unexpected connections from genomic data. AnGeLi is freely available at: www.bahlerlab.info/AnGeLi
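Enrichment of a qualitative annotation within a gene list is often scored with a one-sided hypergeometric test. The abstract does not specify which tests AnGeLi applies to each data source, so the sketch below is a generic illustration, not AnGeLi's implementation; the function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided enrichment p-value P(X >= k), where X ~ Hypergeometric:
    N genes in the background, K of them carry the annotation,
    n genes in the query list, k annotated genes observed in the list."""
    upper = min(K, n)  # X cannot exceed the annotated set or the list size
    total = comb(N, n)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical query: 12 of 100 listed genes carry an annotation held by
# 200 of 5000 background genes.
print(hypergeom_enrichment_p(5000, 200, 100, 12))
```

Exact integer arithmetic via `math.comb` avoids underflow in the intermediate terms; a tool testing many annotations at once would also need multiple-testing correction.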

    Identifying differential exon splicing using linear models and correlation coefficients

    Background: Since the introduction of Affymetrix exon arrays, a number of tools have been developed for their analysis. However, these can be expensive or have several pre-installation requirements. This led us to develop a workflow for analysing differential splicing using freely available software packages that are already widely used for gene expression analysis. The workflow uses the packages in the standard installation of R and Bioconductor (BiocLite) to identify differential splicing. We use the splice index method with the LIMMA framework. The main drawback of this approach is that it relies on accurate estimates of gene expression from the probe-level data. Methods such as RMA and PLIER may misestimate gene expression when a large proportion of exons are spliced. We therefore present the novel concept of a gene correlation coefficient calculated using only the probeset expression pattern within a gene. We show that genes with lower correlation coefficients are likely to be differentially spliced. Results: The LIMMA approach was used to identify several tissue-specific transcripts and splicing events that are supported by previous experimental studies. Filtering the data is necessary to reduce the false positive rate, particularly removing exons and genes that are not expressed in all samples as well as cross-hybridising probesets. The LIMMA approach ranked genes containing single or few differentially spliced exons much higher than genes containing several differentially spliced exons. On the other hand, we found the gene correlation coefficient approach to be better for identifying genes with a large number of differentially spliced exons. Conclusion: We show that LIMMA can be used to identify differential exon splicing from Affymetrix exon array data.
Although further work is needed to develop the use of correlation coefficients into a complete analysis approach, the preliminary results demonstrate their usefulness for identifying differentially spliced genes. The two approaches are complementary, as they can potentially identify different subsets of genes (single/few spliced exons vs. large transcript structure differences).
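Both quantities in this workflow can be sketched in a few lines of Python. The splice index is taken here as log2(exon intensity / gene intensity) per sample, and the group comparison is a plain difference of means rather than LIMMA's moderated statistics. The gene correlation coefficient is assumed to be the mean pairwise Pearson correlation between a gene's probesets across samples; the abstract does not give the exact formula, so treat this as a hypothetical reading, and all values below as invented.

```python
import math
from itertools import combinations
from statistics import mean
from typing import List, Sequence


def splice_index(exon: Sequence[float], gene: Sequence[float]) -> List[float]:
    """Per-sample splice index log2(exon / gene) from linear-scale intensities."""
    return [math.log2(e / g) for e, g in zip(exon, gene)]


def si_difference(group1: Sequence[float], group2: Sequence[float]) -> float:
    """Difference in mean splice index between two groups (not a moderated t)."""
    return mean(group1) - mean(group2)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for a constant vector."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gene_correlation(probesets: List[Sequence[float]]) -> float:
    """Assumed definition: mean pairwise Pearson correlation of a gene's
    probeset expression profiles across samples (needs >= 2 probesets)."""
    return mean(pearson(a, b) for a, b in combinations(probesets, 2))


si_a = splice_index([800.0, 820.0], [400.0, 410.0])  # exon relatively high
si_b = splice_index([200.0, 205.0], [400.0, 410.0])  # exon relatively low
print(si_difference(si_a, si_b))  # 2.0
print(gene_correlation([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]))
```

In the last line one probeset runs against the other two, pulling the coefficient down to about -0.33; under this reading, that is the kind of pattern that would flag a gene as possibly differentially spliced.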

    Exon Array Analysis of Head and Neck Cancers Identifies a Hypoxia Related Splice Variant of LAMA3 Associated with a Poor Prognosis

    The identification of alternatively spliced transcript variants specific to particular biological processes in tumours should increase our understanding of cancer. Hypoxia is an important factor in cancer biology, and associated splice variants may present new markers to help with planning treatment. A method was developed to analyse alternative splicing in exon array data, using probeset multiplicity to identify genes with changes in expression across their loci, and a combination of the splicing index and a new metric based on the variation of reliability-weighted fold changes to detect changes in the splicing patterns. The approach was validated on a cancer/normal sample dataset in which alternative splicing events had been confirmed using RT-PCR. We then analysed ten head and neck squamous cell carcinomas using exon arrays and identified differentially expressed splice variants in five samples with high versus five with low levels of hypoxia-associated genes. The analysis identified a splice variant of LAMA3 (Laminin α 3), LAMA3-A, known to be involved in tumour cell invasion and progression. The full-length transcript of the gene (LAMA3-B) did not appear to be hypoxia-associated. The results were confirmed using qualitative RT-PCR. In a series of 59 prospectively collected head and neck tumours, expression of LAMA3-A had prognostic significance whereas LAMA3-B did not. This work illustrates the potential for alternatively spliced transcripts to act as biomarkers of disease prognosis with improved specificity for particular tissues or conditions over assays which do not discriminate between splice variants.
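The abstract names a metric based on the variation of reliability-weighted fold changes without giving its formula. One possible reading, offered purely as an assumption and not as the paper's metric, is a weighted variance of per-probeset log2 fold changes within a gene, with hypothetical reliability scores as weights: a high value means the probesets of one gene disagree, as alternative splicing would produce.

```python
from typing import Sequence


def weighted_fc_variance(log2_fc: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Weighted variance of per-probeset log2 fold changes within one gene.
    weights are hypothetical reliability scores (higher = more trusted);
    this formula is an illustrative assumption, not the published metric."""
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")
    # Reliability-weighted mean fold change for the gene.
    w_mean = sum(w * x for w, x in zip(weights, log2_fc)) / total_w
    # Weighted spread of probeset fold changes around that mean.
    return sum(w * (x - w_mean) ** 2 for w, x in zip(weights, log2_fc)) / total_w


print(weighted_fc_variance([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))  # 0.0: probesets agree
print(weighted_fc_variance([0.0, 2.0], [3.0, 1.0]))            # 0.75
```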

    Stable reference genes for the measurement of transcript abundance during larval caste development in the honeybee

    Many genes are differentially regulated by caste development in the honeybee. Identifying and understanding these differences is key to discovering the mechanisms underlying this process. Identifying these gene expression differences requires robust methods to measure transcript abundance. RT-qPCR is currently the gold standard to measure gene expression, but it requires stable reference genes to compare gene expression changes. Such reference genes have not been established for honeybee caste development. Here, we identify and test potential reference genes that have stable expression throughout larval development in both female castes. In this study, 15 candidate reference genes were examined to identify the most stable reference genes. Three algorithms (geNorm, BestKeeper and NormFinder) were used to rank the candidate reference genes based on their stability between the castes throughout larval development. Of these genes, Ndufa8 (the orthologue of a component of complex I of the mitochondrial electron transport chain) and Pros54 (orthologous to a component of the 26S proteasome) were identified as the most stable. When these two genes were used to normalise expression of two target genes (previously found to be differentially expressed between queen and worker larvae by microarray analysis), they detected differential expression more accurately than two previously used reference genes (awd and RpL12). The identification of these novel reference genes will benefit future studies of caste development in the honeybee.
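geNorm ranks candidate reference genes by a stability value M: for each gene, the average pairwise variation (the standard deviation of log2 expression ratios across samples) with every other candidate, where a lower M means more stable expression. The sketch below computes M for a single pass only (geNorm additionally removes the least stable gene stepwise), uses the sample standard deviation, and runs on hypothetical genes and quantities rather than the study's data.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def genorm_m(expr: Dict[str, List[float]]) -> Dict[str, float]:
    """expr maps gene -> linear-scale relative quantities over the same samples.
    M_j = mean over other genes k of stdev(log2(expr_j / expr_k)).
    Requires at least two genes and at least two samples."""
    m: Dict[str, float] = {}
    for j in expr:
        pairwise = []
        for k in expr:
            if k == j:
                continue
            # Pairwise variation V_jk: spread of the log2 ratio across samples.
            ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise.append(stdev(ratios))
        m[j] = mean(pairwise)
    return m


data = {  # hypothetical relative quantities in three samples
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],  # tracks geneA exactly, so V_AB = 0
    "geneC": [1.0, 1.0, 1.0],
}
print(genorm_m(data))  # {'geneA': 0.5, 'geneB': 0.5, 'geneC': 1.0}
```

Because M is relative to the other candidates, two genes that co-vary closely both score well; geNorm's stepwise exclusion of the worst gene and its recomputation of M is what produces the final ranking.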

    The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

    BACKGROUND: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses. RESULTS: A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, and different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (1107 tumours) from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics. CONCLUSION: Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets, leading to new biological findings with increased statistical power.
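Because a multiplicative bias becomes an additive shift on the log scale, subtracting each gene's mean within each dataset removes it. A minimal sketch, assuming log2-transformed intensities and per-gene centring within each dataset (the paper's exact preprocessing may differ); the dataset names and values are hypothetical.

```python
import math
from typing import Dict, List


def mean_center_batches(
        batches: Dict[str, List[List[float]]]) -> Dict[str, List[List[float]]]:
    """batches maps dataset name -> matrix of linear-scale intensities
    laid out as [gene][sample]. Each gene is log2-transformed and centred to
    mean 0 within its own dataset, so a per-dataset multiplicative bias
    (an additive constant after log2) cancels out."""
    centred: Dict[str, List[List[float]]] = {}
    for name, matrix in batches.items():
        out = []
        for row in matrix:
            logs = [math.log2(v) for v in row]
            mu = sum(logs) / len(logs)  # per-gene, per-dataset mean
            out.append([x - mu for x in logs])
        centred[name] = out
    return centred


# The same gene measured in two datasets; d2 carries a 4x multiplicative bias.
result = mean_center_batches({"d1": [[2.0, 8.0]], "d2": [[8.0, 32.0]]})
print(result)  # {'d1': [[-1.0, 1.0]], 'd2': [[-1.0, 1.0]]}
```

Centring also removes genuine differences in mean expression between datasets, so it suits comparisons within a combined cohort rather than absolute comparisons between cohorts.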