363 research outputs found

    Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor

    The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge due to the vast amounts of data and the large variety of pre-processing and filtering steps employed before the actual analysis is carried out. To guarantee a firm basis for methodological development where results with new methods are compared with previous results it is crucial to ensure that all analyses are completely reproducible for other researchers. We here give a detailed workflow on how to perform reproducible analysis of the GeneChip Human Exon 1.0 ST Array at probe and probeset level solely in R/Bioconductor, choosing packages based on their simplicity of use. To exemplify the use of the proposed workflow we analyse differential splicing and differential gene expression in a publicly available dataset using various statistical methods. We believe this study will provide other researchers with an easy way of accessing gene expression data at different annotation levels and with the sufficient details needed for developing their own tools for reproducible analysis of the GeneChip Human Exon 1.0 ST Array

    Unsupervised assessment of microarray data quality using a Gaussian mixture model

    Background: Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny. Results: We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternate quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach. Conclusion: This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations.

    Rare and common epilepsies converge on a shared gene regulatory network providing opportunities for novel antiepileptic drug discovery

    Background The relationship between monogenic and polygenic forms of epilepsy is poorly understood, and the extent to which the genetic and acquired epilepsies share common pathways is unclear. Here, we use an integrated systems-level analysis of brain gene expression data to identify molecular networks disrupted in epilepsy. Results We identify a co-expression network of 320 genes (M30), which is significantly enriched for non-synonymous de novo mutations ascertained from patients with monogenic epilepsy, and for common variants associated with polygenic epilepsy. The genes in M30 network are expressed widely in the human brain under tight developmental control, and encode physically interacting proteins involved in synaptic processes. The most highly connected proteins within M30 network are preferentially disrupted by deleterious de novo mutations for monogenic epilepsy, in line with the centrality-lethality hypothesis. Analysis of M30 expression revealed consistent down-regulation in the epileptic brain in heterogeneous forms of epilepsy including human temporal lobe epilepsy, a mouse model of acquired temporal lobe epilepsy, and a mouse model of monogenic Dravet (SCN1A) disease. These results suggest functional disruption of M30 via gene mutation or altered expression as a convergent mechanism regulating susceptibility to epilepsy broadly. Using the large collection of drug-induced gene expression data from Connectivity Map, several drugs were predicted to preferentially restore the down-regulation of M30 in epilepsy toward health, most notably valproic acid, whose effect on M30 expression was replicated in neurons. Conclusions Taken together, our results suggest targeting the expression of M30 as a potential new therapeutic strategy in epilepsy

    Comparison of Nasal Epithelial Smoking-Induced Gene Expression on Affymetrix Exon 1.0 and Gene 1.0 ST Arrays

    We have previously defined the impact of tobacco smoking on nasal epithelium gene expression using Affymetrix Exon 1.0 ST arrays. In this paper, we compared the performance of the Affymetrix GeneChip Human Gene 1.0 ST array with the Human Exon 1.0 ST array for detecting nasal smoking-related gene expression changes. RNA collected from the nasal epithelium of five current smokers and five never smokers was hybridized to both arrays. While the intersample correlation within each array platform was relatively higher in the Gene array than that in the Exon array, the majority of the genes most changed by smoking were tightly correlated between platforms. Although neither array dataset was powered to detect differentially expressed genes (DEGs) at a false discovery rate (FDR) <0.05, we identified more DEGs than expected by chance using the Gene ST array. These findings suggest that while both platforms show a high degree of correlation for detecting smoking-induced differential gene expression changes, the Gene ST array may be a more cost-effective platform in a clinical setting for gene-level genomewide expression profiling and an effective tool for exploring the host response to cigarette smoking and other inhaled toxins

    Molecular gene expression and genome wide profiling in tamoxifen-resistant breast cancer.

    PhD. Oestrogen receptor positive (ER+) breast cancers (BC) are heterogeneous in both their clinical behaviour and response to therapy. The ER and Progesterone (PgR) are currently the best predictors of response to the anti-oestrogen tamoxifen, yet up to 40% of ER+ breast cancer will relapse despite tamoxifen treatment. New prognostic biomarkers and further biological understanding of tamoxifen resistance (TR) are required. There has been an explosion of greater understanding since the arrival of cutting-edge gene and genomic profiling technology. The two major aims of this research are to develop stable gene signatures that are effective at distinguishing 'prognostic' groups and, when tested directly for response to tamoxifen, a set of 'predictive' markers. In order to establish cellular pathways responsible for TR, tissue at relapse while on tamoxifen is preferred. However, in practice, this is difficult to obtain. Hence, in this study, I have established TR derivatives of breast cancer cell lines, T47D and ZR75-1, and analysed their gene-expression by microarray. MAGEA2 and EGLN3 were 4.0 and 3.8 fold upregulated respectively in TR cell lines. For MAGEA2- and EGLN3-overexpressing lines, the proliferation and growth rates in tamoxifen-containing media were significantly higher (p-value <0.001 and p<0.05, respectively) than for control cells. I have investigated possible downstream targets for each protein which may contribute to the mechanism of resistance. Immunohistochemistry validation was performed on a cohort of 196 tamoxifen-treated primary breast tumour tissues: MAGEA2 and EGLN3 were found to be valuable predictive (Positive predictive value of 89%, and 85%, with high sensitivity 38% and 42% respectively) biomarkers for TR in primary breast tumours. In the human breast tumour arm of this study, 25 frozen samples with known response to tamoxifen were analysed on both SNP6.0 and expression EXON arrays. The integrated analysis suggested that 5 genes (OPCML, OR10G7, SNF1LK2, PALM and ZBTB-16) are good predictors of TR, with high negative predictor values (68%, 71%, 59% and 73% respectively for the last 4 genes). Significant regions of copy number variation (CNV) were identified at chromosomes 8q24, 17q21-22 and 11q23-25. The application of this high-resolution approach should lead to a better understanding of the roles of complex genetic alterations in TR.

    Exon level integration of proteomics and microarray data

    Background: Previous studies comparing quantitative proteomics and microarray data have generally found poor correspondence between the two. We hypothesised that this might in part be because the different assays were targeting different parts of the expressed genome and might therefore be subjected to confounding effects from processes such as alternative splicing. Results: Using a genome database as a platform for integration, we combined quantitative protein mass spectrometry with Affymetrix Exon array data at the level of individual exons. We found significantly higher degrees of correlation than have been previously observed (r = 0.808). The study was performed using cell lines in equilibrium in order to reduce a major potential source of biological variation, thus allowing the analysis to focus on the data integration methods in order to establish their performance. Conclusion: We conclude that part of the variation observed when integrating microarray and proteomics data may occur as a consequence both of the data analysis and of the high granularity to which studies have until recently been limited. The approach opens up the possibility for the first time of considering combined microarray and proteomics datasets at the level of individual exons and isoforms, important given the high proportion of alternative splicing observed in the human genome.

    Splice variants as novel targets in pancreatic ductal adenocarcinoma

    The study was funded by the MolDiagPaCa European Union Framework Programme and CR-UK Programme grant A12008 from CR-UK (C. Chelala, T. Crnogorac-Jurcevic, and N.R. Lemoine). Italian Cancer Genome Project – Ministry of University [FIRB RBAP10AHJB]; Associazione Italiana Ricerca Cancro [grant number: 12182]; FP7 European Community Grant Cam-Pac [no: 602783]; Italian Ministry of Health [FIMPCUP_J33G13000210001]. The funders were not involved in the design of the study, collection, analysis, and interpretation of data and in writing of the manuscript. We thank Tracy Chaplin-Perkins for help with running the Affymetrix experiments

    a comparative assessment

    Background: The analysis of differential splicing (DS) is crucial for understanding physiological processes in cells and organs. In particular, aberrant transcripts are known to be involved in various diseases including cancer. Exon arrays are a widely used technique for studying DS. Over the last decade a variety of algorithms for the detection of DS events from exon arrays have been developed. However, no comprehensive, comparative evaluation including sensitivity to the most important data features has been conducted so far. To this end, we created multiple data sets based on simulated data to assess strengths and weaknesses of seven published methods as well as a newly developed method, KLAS. Additionally, we evaluated all methods on two cancer data sets that comprised RT-PCR validated results. Results: Our studies indicated ARH as the most robust method when integrating the results over all scenarios and data sets. Nevertheless, special cases or requirements favor other methods. While FIRMA was highly sensitive according to experimental data, SplicingCompass, MIDAS and ANOSVA showed high specificity throughout the scenarios. On experimental data ARH, FIRMA, MIDAS, and KLAS performed best. Conclusions: Each method shows different characteristics regarding sensitivity, specificity, interference to certain data settings and robustness over multiple data sets. While some methods can be considered as generally good choices over all data sets and scenarios, other methods show heterogeneous prediction quality on the different data sets. The adequate method has to be chosen carefully and with a defined study aim in mind.

    Discovery and Validation of Molecular Biomarkers for Colorectal Adenomas and Cancer with Application to Blood Testing

    BACKGROUND & AIMS: Colorectal cancer incidence and deaths are reduced by the detection and removal of early-stage, treatable neoplasia but we lack proven biomarkers sensitive for both cancer and pre-invasive adenomas. The aims of this study were to determine if adenomas and cancers exhibit characteristic patterns of biomarker expression and to explore whether a tissue-discovered (and validated) biomarker is differentially expressed in the plasma of patients with colorectal adenomas or cancer. METHODS: Candidate RNA biomarkers were identified by oligonucleotide microarray analysis of colorectal specimens (222 normal, 29 adenoma, 161 adenocarcinoma and 50 colitis) and validated in a previously untested cohort of 68 colorectal specimens using a custom-designed oligonucleotide microarray. One validated biomarker, KIAA1199, was assayed using qRT-PCR on plasma extracted RNA from 20 colonoscopy-confirmed healthy controls, 20 patients with adenoma, and 20 with cancer. RESULTS: Genome-wide analysis uncovered reproducible gene expression signatures for both adenomas and cancers compared to controls. 386/489 (79%) of the adenoma and 439/529 (83%) of the adenocarcinoma biomarkers were validated in independent tissues. We also identified genes differentially expressed in adenomas compared to cancer. KIAA1199 was selected for further analysis based on consistent up-regulation in neoplasia, previous studies and its interest as an uncharacterized gene. Plasma KIAA1199 RNA levels were significantly higher in patients with either cancer or adenoma (31/40) compared to neoplasia-free controls (6/20). CONCLUSIONS: Colorectal neoplasia exhibits characteristic patterns of gene expression. KIAA1199 is differentially expressed in neoplastic tissues and KIAA1199 transcripts are more abundant in the plasma of patients with either cancer or adenoma compared to controls