
    Exon level integration of proteomics and microarray data

    Background: Previous studies comparing quantitative proteomics and microarray data have generally found poor correspondence between the two. We hypothesised that this might in part be because the different assays were targeting different parts of the expressed genome and might therefore be subject to confounding effects from processes such as alternative splicing. Results: Using a genome database as a platform for integration, we combined quantitative protein mass spectrometry with Affymetrix Exon array data at the level of individual exons. We found significantly higher degrees of correlation than have previously been observed (r = 0.808). The study was performed using cell lines in equilibrium to reduce a major potential source of biological variation, allowing the analysis to focus on the data integration methods and establish their performance. Conclusion: We conclude that part of the variation observed when integrating microarray and proteomics data may occur as a consequence both of the data analysis and of the high granularity to which studies have until recently been limited. The approach opens up, for the first time, the possibility of considering combined microarray and proteomics datasets at the level of individual exons and isoforms, which is important given the high proportion of alternative splicing observed in the human genome.
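
The exon-level comparison can be illustrated with a minimal sketch, assuming that both platforms' measurements have already been mapped to shared exon identifiers (the job the genome database does in the study) and that the reported r is Pearson's correlation. The exon IDs and values are hypothetical; the sketch only joins the two measurement sets on exon ID and correlates them, and it is not the authors' pipeline.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


def exon_level_correlation(protein_by_exon, array_by_exon):
    """Correlate two exon-keyed measurement sets over the exons they share.

    Both arguments map a (hypothetical) exon ID to a quantitative value,
    e.g. protein abundance and exon array signal.
    """
    shared = sorted(protein_by_exon.keys() & array_by_exon.keys())
    xs = [protein_by_exon[e] for e in shared]
    ys = [array_by_exon[e] for e in shared]
    return pearson_r(xs, ys), shared


# Toy data: E4 and E5 are measured on only one platform and are dropped.
protein = {"E1": 1.0, "E2": 2.0, "E3": 3.0, "E4": 5.0}
array = {"E1": 2.0, "E2": 4.0, "E3": 6.0, "E5": 1.0}
r, exons = exon_level_correlation(protein, array)
print(exons, round(r, 3))
```

Joining on exon ID before correlating is the point of the design: values are only compared when both assays target the same part of the transcript.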

    rnaSeqMap: a Bioconductor package for RNA sequencing data exploration

    BACKGROUND: The throughput of commercially available sequencers has recently increased significantly, to the point where measuring RNA expression by depth of coverage has become feasible even for the largest genomes. The development of software tools constantly follows this progress in sequencing hardware. RNA sequencing software can broadly be divided into genome browsers, exon junction tools and statistical tools operating on counts of reads in predefined regions. The library rnaSeqMap, freely available via Bioconductor, is RNA sequencing software that is independent of any sequencing hardware platform. It is based on standard Bioconductor infrastructure for sequencing data and includes several novel features focused on a deeper understanding of coverage expression profiles and the discovery of novel transcribed regions. RESULTS: rnaSeqMap is a toolbox for analyses that may be performed using gene annotations or, alternatively, in an unsupervised mode on any genomic region to find novel or non-standard transcripts. The data back-end may be a MySQL database or a set of files in standard BAM format. The processing in R can be run on a machine without any particular hardware requirements, and it scales linearly with the number of genomic loci and the number of samples analyzed. The main features of rnaSeqMap include coverage operations, discovery of irreducible regions of high expression, significance searches and splicing analyses with nucleotide granularity. CONCLUSIONS: This software may be used for a range of RNA sequencing applications by building customized analysis pipelines. Its applicability and precision are expected to increase in parallel with progress in the genome coverage achieved by sequencers.
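
A rough illustration of a coverage operation followed by a search for high-expression regions is sketched below. It is plain Python, not rnaSeqMap's R API: reads are assumed to be half-open (start, end) intervals, and a region is simply a maximal run whose depth stays at or above a threshold for a minimum length. This simpler threshold-and-run-length rule is swapped in for illustration and is not rnaSeqMap's irreducible-region algorithm; the function names and parameters are hypothetical.

```python
def coverage(reads, region_start, region_end):
    """Per-base read depth over [region_start, region_end).

    reads: iterable of half-open (start, end) genomic intervals.
    Uses a difference array, so cost is linear in reads plus region length.
    """
    length = region_end - region_start
    diff = [0] * (length + 1)
    for start, end in reads:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # skip reads that do not overlap the region
            diff[s] += 1
            diff[e] -= 1
    depths, depth = [], 0
    for i in range(length):
        depth += diff[i]
        depths.append(depth)
    return depths


def high_coverage_regions(depths, min_depth, min_length, offset=0):
    """Maximal runs with depth >= min_depth lasting at least min_length bases.

    Returns half-open (start, end) intervals shifted by offset back to
    genomic coordinates.
    """
    regions = []
    run_start = None
    for i, d in enumerate(depths):
        if d >= min_depth:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_length:
                regions.append((offset + run_start, offset + i))
            run_start = None
    # Close a run that extends to the end of the region.
    if run_start is not None and len(depths) - run_start >= min_length:
        regions.append((offset + run_start, offset + len(depths)))
    return regions


reads = [(100, 110), (105, 115), (106, 108), (130, 140)]
depths = coverage(reads, 100, 150)
print(high_coverage_regions(depths, min_depth=2, min_length=3, offset=100))
```

With these reads over [100, 150), a depth threshold of 2 and a minimum length of 3 yield the single region (105, 110); lowering the threshold to 1 with a minimum length of 5 yields (100, 115) and (130, 140).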

    Experimental Comparison and Evaluation of the Affymetrix Exon and U133Plus2 GeneChip Arrays

    Affymetrix exon arrays offer scientists the only solution for exon-level expression profiling at the whole-genome scale on a single array. These arrays feature a new chip design with no mismatch probes and a radically new random-primed protocol that generates sense DNA targets along the entire length of the transcript. These changes, together with a limited number of validation experiments and virtually no experimental data rigorously addressing the comparability of all-exon arrays with conventional 3' arrays, have resulted in a natural reluctance to replace conventional expression arrays with the new all-exon platform. Using commercially available Affymetrix arrays, we assess the performance of the Human Exon 1.0 ST (HuEx) and U133 Plus 2.0 (U133Plus2) platforms directly through a series of 'spike-in' hybridizations containing 25 transcripts in the presence of a fixed eukaryotic background. Specifically, we compare the expression measures from HuEx and U133Plus2 arrays to evaluate their precision, as well as the specificity and sensitivity with which they detect differential expression. This study presents an experimental comparison and systematic cross-validation of Affymetrix exon arrays and establishes high comparability of expression changes and probe performance characteristics between conventional and exon Affymetrix arrays. In addition, this study offers a reliable benchmark dataset for comparing competing exon expression measures, for selecting methods suitable for mapping exon array measures to the wealth of previously generated microarray data, and for developing more advanced methods for exon- and transcript-level expression summarization.
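
Because a spike-in design fixes which transcripts truly change, each platform's differential-expression calls can be scored against that known truth set. The following is a minimal sketch of a generic sensitivity and specificity calculation, assuming the calls have already been made; the feature names are hypothetical, and this is not presented as the study's exact evaluation procedure.

```python
def detection_metrics(called, spiked, all_features):
    """Sensitivity and specificity of differential-expression calls.

    called: features declared differentially expressed by a method.
    spiked: features known to change (the spike-in truth set).
    all_features: every feature that was tested.
    """
    called, spiked, universe = set(called), set(spiked), set(all_features)
    tp = len(called & spiked)             # spiked and detected
    fn = len(spiked - called)             # spiked but missed
    fp = len(called - spiked)             # detected but not spiked
    tn = len(universe - called - spiked)  # neither spiked nor detected
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


features = [f"f{i}" for i in range(10)]
spiked = ["f0", "f1", "f2", "f3"]
called = ["f0", "f1", "f2", "f7"]
print(detection_metrics(called, spiked, features))
```

Here three of the four spiked features are detected (sensitivity 0.75), and one of the six unchanged features is falsely called (specificity 5/6).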

    Exon Array Analysis of Head and Neck Cancers Identifies a Hypoxia Related Splice Variant of LAMA3 Associated with a Poor Prognosis

    The identification of alternatively spliced transcript variants specific to particular biological processes in tumours should increase our understanding of cancer. Hypoxia is an important factor in cancer biology, and associated splice variants may present new markers to help with treatment planning. A method was developed to analyse alternative splicing in exon array data, using probeset multiplicity to identify genes with changes in expression across their loci, and a combination of the splicing index and a new metric based on the variation of reliability-weighted fold changes to detect changes in splicing patterns. The approach was validated on a cancer/normal sample dataset in which alternative splicing events had been confirmed using RT-PCR. We then analysed ten head and neck squamous cell carcinomas using exon arrays and identified splice variants differentially expressed between five samples with high and five with low expression of hypoxia-associated genes. The analysis identified a splice variant of LAMA3 (Laminin α3), LAMA3-A, known to be involved in tumour cell invasion and progression. The full-length transcript of the gene (LAMA3-B) did not appear to be hypoxia-associated. The results were confirmed using qualitative RT-PCR. In a series of 59 prospectively collected head and neck tumours, expression of LAMA3-A had prognostic significance, whereas that of LAMA3-B did not. This work illustrates the potential for alternatively spliced transcripts to act as biomarkers of disease prognosis, with improved specificity for particular tissues or conditions over assays that do not discriminate between splice variants.
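
The splicing index compares an exon's signal, normalized to its gene-level signal, between two groups. The sketch below uses one common formulation, SI = mean_A(log2 exon/gene) - mean_B(log2 exon/gene), on hypothetical linear-scale intensities. It is an assumption that this matches the paper's variant, and the sketch does not implement the probeset-multiplicity step or the reliability-weighted fold-change metric.

```python
import math
from statistics import mean


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Splicing index (SI) of one exon between groups A and B.

    Each argument is a list of linear-scale intensities, one per sample.
    The exon signal is normalized to the gene-level signal in each sample,
    log2-transformed, averaged within each group, and the group means are
    subtracted: SI = mean_A(log2 exon/gene) - mean_B(log2 exon/gene).
    """
    ni_a = [math.log2(e / g) for e, g in zip(exon_a, gene_a)]
    ni_b = [math.log2(e / g) for e, g in zip(exon_b, gene_b)]
    return mean(ni_a) - mean(ni_b)


# The gene is equally expressed in both groups, but the exon is relatively
# four times more included in group A, so SI = log2(4) = 2.
si = splicing_index(exon_a=[200, 220], gene_a=[100, 110],
                    exon_b=[50, 55], gene_b=[100, 110])
print(si)
```

Normalizing by the gene-level signal is what separates a change in exon inclusion from a change in overall gene expression: a positive SI means the exon is relatively more included in group A.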

    Identifying differential exon splicing using linear models and correlation coefficients

    Background: With the availability of Affymetrix exon arrays, a number of tools have been developed to enable their analysis. These, however, can be expensive or have several pre-installation requirements. This led us to develop a workflow for analysing differential splicing using freely available software packages that are already widely used for gene expression analysis. The workflow uses the packages in the standard installation of R and Bioconductor (BiocLite) to identify differential splicing. We use the splice index method with the LIMMA framework. The main drawback of this approach is that it relies on accurate estimates of gene expression from the probe-level data; methods such as RMA and PLIER may misestimate gene expression when a large proportion of exons are spliced. We therefore present the novel concept of a gene correlation coefficient calculated using only the probeset expression pattern within a gene. We show that genes with lower correlation coefficients are likely to be differentially spliced. Results: The LIMMA approach was used to identify several tissue-specific transcripts and splicing events that are supported by previous experimental studies. Filtering the data, particularly removing cross-hybridising probesets and exons and genes that are not expressed in all samples, is necessary to reduce the false positive rate. The LIMMA approach ranked genes containing single or few differentially spliced exons much higher than genes containing several differentially spliced exons. On the other hand, we found the gene correlation coefficient approach better for identifying genes with a large number of differentially spliced exons. Conclusion: We show that LIMMA can be used to identify differential exon splicing from Affymetrix exon array data. Though further work would be necessary to develop the use of correlation coefficients into a complete analysis approach, the preliminary results demonstrate their usefulness for identifying differentially spliced genes. The two approaches are complementary, as they can potentially identify different subsets of genes (single/few spliced exons vs. large transcript structure differences).

    SwissPKcdw - A clinical data warehouse for the optimization of pediatric dosing regimens.

    Clinical trials have been performed mainly in adults, and accordingly the necessary information is lacking for pediatric patients, especially regarding dosage recommendations for approved drugs. This gap could be filled with results from pharmacokinetic (PK) modeling based on data collected in daily clinical routine. To make these data accessible and usable for research, the Swiss Pharmacokinetics Clinical Data Warehouse (SwissPKcdw) project has been set up, comprising a clinical data warehouse (CDW) and the regulatory framework for transferring and using data within it. Embedded in the secure BioMedIT network, the CDW can connect to various data providers and researchers so that they can collaborate on the data securely. Owing to its modularity, partially containerized deployment and open-source software, each of its components can be extended, modified and reused for similar projects that require integrated data management, data analysis and web tools in a secure scientific data and information technology (IT) environment. Here, we describe a collaborative and interprofessional effort between several partners from medical health care and academia to implement this infrastructure. Furthermore, we describe a real-world use case in which blood samples from pediatric patients were analyzed for the presence of genetic polymorphisms, and the results were aggregated and further analyzed together with the health-related patient data in the SwissPKcdw.

    The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

    BACKGROUND: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses. RESULTS: A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, and different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (n = 1107) from six previously published studies. Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics. CONCLUSION: Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets, leading to new biological findings with increased statistical power.
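
Batch mean-centering can be sketched as follows, assuming it is applied per gene within each dataset on log2-transformed intensities. On the log scale a multiplicative bias becomes an additive offset, so subtracting each gene's within-dataset mean removes it. The data layout and function name are hypothetical. Centering also removes any genuine difference in a gene's mean expression between datasets, so it works best when the combined datasets have broadly comparable sample composition.

```python
import math
from statistics import mean


def batch_mean_center(datasets):
    """Per-gene, per-dataset mean-centering of log2 intensities.

    datasets: {dataset_name: {gene: [linear-scale intensities, one per sample]}}
    Returns the same nesting with log2 values shifted so that every gene has
    mean 0 within each dataset. A multiplicative bias c on a dataset becomes
    an additive log2(c) offset, which the subtraction removes.
    """
    centred = {}
    for name, genes in datasets.items():
        centred[name] = {}
        for gene, values in genes.items():
            logs = [math.log2(v) for v in values]
            m = mean(logs)
            centred[name][gene] = [x - m for x in logs]
    return centred


# Dataset B is dataset A with a 2-fold multiplicative bias (hypothetical data).
data = {
    "A": {"GENE1": [100.0, 400.0]},
    "B": {"GENE1": [200.0, 800.0]},
}
result = batch_mean_center(data)
print(result["A"]["GENE1"], result["B"]["GENE1"])
```

After centering, both datasets give approximately [-1.0, 1.0] for GENE1, so the 2-fold batch bias no longer separates them.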

    Implementation of exon arrays: alternative splicing during T-cell proliferation as determined by whole genome analysis

    Background: The contribution of alternative splicing and isoform expression to cellular response is emerging as an area of considerable interest, and the newly developed exon arrays allow these processes to be studied systematically. We use this pilot study to report on the feasibility of implementing exon arrays to replace the 3' in vitro transcription expression arrays in our laboratory. One of the most widely studied models of cellular response is T-cell activation by exogenous stimulation. Microarray studies have contributed to our understanding of key pathways activated during T-cell stimulation. To evaluate the exon arrays, we use this system to examine whole-genome transcription and the alternate exon usage events that are regulated during lymphocyte proliferation. Results: Peripheral blood mononuclear cells from healthy donors were activated using phytohemagglutinin, IL2 and ionomycin and harvested at five time points over a 7-day period. Flow cytometry was used to measure cell cycle events, and the Affymetrix exon array platform was used to identify changes in gene expression and alternate exon usage. Gene expression changes were detected in a total of 2105 transcripts, and alternate exon usage was identified in 472 transcript clusters; 263 transcripts showed both differential expression and alternate exon usage over time. Gene ontology enrichment analysis showed a broader range of biological processes for the differentially expressed genes, including cell cycle, cell division, cell proliferation, chromosome segregation, cell death, component organization and biogenesis, and metabolic process ontologies. Enrichments for alternate exon usage were in metabolism and in component organization and biogenesis. We focus on alternate exon usage changes in the transcripts of the spliceosome complex. The real-time PCR validation rates were 86% for transcript expression and 71% for alternate exon usage. Conclusions: This study illustrates that exon array technology has the potential to provide information on both transcript expression and isoform usage with very little increase in expense.