303 research outputs found

    Transcript-level annotation of Affymetrix probesets improves the interpretation of gene expression data

    Background: The wide use of Affymetrix microarray in broadened fields of biological research has made the probeset annotation an important issue. Standard Affymetrix probeset annotation is at gene level, i.e. a probeset is precisely linked to a gene, and probeset intensity is interpreted as gene expression. The increased knowledge that one gene may have multiple transcript variants clearly brings up the necessity of updating this gene-level annotation to a refined transcript-level. Results: Through performing rigorous alignments of the Affymetrix probe sequences against a comprehensive pool of currently available transcript sequences, and further linking the probesets to the International Protein Index, we generated transcript-level or protein-level annotation tables for two popular Affymetrix expression arrays, Mouse Genome 430A 2.0 Array and Human Genome U133A Array. Application of our new annotations in re-examining existing expression data sets shows increased expression consistency among synonymous probesets and strengthened expression correlation between interacting proteins. Conclusion: By refining the standard Affymetrix annotation of microarray probesets from the gene level to the transcript level and protein level, one can achieve a more reliable interpretation of their experimental data, which may lead to discovery of more profound regulatory mechanism.

    Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor

    The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge due to the vast amounts of data and the large variety of pre-processing and filtering steps employed before the actual analysis is carried out. To guarantee a firm basis for methodological development where results with new methods are compared with previous results it is crucial to ensure that all analyses are completely reproducible for other researchers. We here give a detailed workflow on how to perform reproducible analysis of the GeneChip Human Exon 1.0 ST Array at probe and probeset level solely in R/Bioconductor, choosing packages based on their simplicity of use. To exemplify the use of the proposed workflow we analyse differential splicing and differential gene expression in a publicly available dataset using various statistical methods. We believe this study will provide other researchers with an easy way of accessing gene expression data at different annotation levels and with the sufficient details needed for developing their own tools for reproducible analysis of the GeneChip Human Exon 1.0 ST Array

    Integrating data from heterogeneous DNA microarray platforms

    DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprehends a set of tasks that range from the re-annotation of the features used on gene expression, to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprehends a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases, such as The Cancer Genome Atlas and Gene Expression Omnibus. The authors thank the FCT Strategic Project of UID/BIO/04469/2013 unit, the project RECI/BBBEBI/0179/2012 (FCOMP-01-0124-FEDER-027462) and the project “BioInd - Biotechnology and Bioengineering for improved Industrial and Agro-Food processes”, REF. NORTE-07-0124FEDER-000028, co-funded by the Programa Operacional Regional do Norte (ON.2 O Novo Norte), QREN, FEDER.

    SplicerAV: a tool for mining microarray expression data for changes in RNA processing

    Background: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. Results: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene and isoform specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which recognized a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors, with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. Conclusions: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform specific mRNA changes directly associated with cancer survival.

    Consistent annotation of gene expression arrays

    Background: Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases. Results: We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor. Conclusions: Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.

    GATExplorer: Genomic and Transcriptomic Explorer; mapping expression probes to gene loci, transcripts, exons and ncRNAs

    Background: Genome-wide expression studies have developed exponentially in recent years as a result of extensive use of microarray technology. However, expression signals are typically calculated using the assignment of "probesets" to genes, without addressing the problem of "gene" definition or proper consideration of the location of the measuring probes in the context of the currently known genomes and transcriptomes. Moreover, as our knowledge of metazoan genomes improves, the number of both protein-coding and noncoding genes, as well as their associated isoforms, continues to increase. Consequently, there is a need for new databases that combine genomic and transcriptomic information and provide updated mapping of expression probes to current genomic annotations. Results: GATExplorer (Genomic and Transcriptomic Explorer) is a database and web platform that integrates a gene loci browser with nucleotide level mappings of oligo probes from expression microarrays. It allows interactive exploration of gene loci, transcripts and exons of human, mouse and rat genomes, and shows the specific location of all mappable Affymetrix microarray probes and their respective expression levels in a broad set of biological samples. The web site allows visualization of probes in their genomic context together with any associated protein-coding or noncoding transcripts. In the case of all-exon arrays, this provides a means by which the expression of the individual exons within a gene can be compared, thereby facilitating the identification and analysis of alternatively spliced exons. The application integrates data from four major source databases: Ensembl, RNAdb, Affymetrix and GeneAtlas; and it provides the users with a series of files and packages (R CDFs) to analyze particular query expression datasets. The maps cover both the widely used Affymetrix GeneChip microarrays based on 3' expression (e.g. human HG U133 series) and the all-exon expression microarrays (Gene 1.0 and Exon 1.0). Conclusions: GATExplorer is an integrated database that combines genomic/transcriptomic visualization with nucleotide-level probe mapping. By considering expression at the nucleotide level rather than the gene level, it shows that the arrays detect expression signals from entities that most researchers do not contemplate or discriminate. This approach provides the means to undertake a higher resolution analysis of microarray data and potentially extract considerably more detailed and biologically accurate information from existing and future microarray experiments.

    Genome-wide gene expression profiling of stress response in a spinal cord clip compression injury model.

    Background: The aneurysm clip impact-compression model of spinal cord injury (SCI) is a standard injury model in animals that closely mimics the primary mechanism of most human injuries: acute impact and persisting compression. Its histo-pathological and behavioural outcomes are extensively similar to human SCI. To understand the distinct molecular events underlying this injury model we analyzed global mRNA abundance changes during the acute, subacute and chronic stages of a moderate to severe injury to the rat spinal cord. Results: Time-series expression analyses resulted in clustering of the majority of deregulated transcripts into eight statistically significant expression profiles. Systematic application of Gene Ontology (GO) enrichment pathway analysis allowed inference of biological processes participating in SCI pathology. Temporal analysis identified events specific to and common between acute, subacute and chronic time-points. Processes common to all phases of injury include blood coagulation, cellular extravasation, leukocyte cell-cell adhesion, the integrin-mediated signaling pathway, cytokine production and secretion, neutrophil chemotaxis, phagocytosis, response to hypoxia and reactive oxygen species, angiogenesis, apoptosis, inflammatory processes and ossification. Importantly, various elements of adaptive and induced innate immune responses not only span the acute and subacute phases, but also persist throughout the chronic phase of SCI. Induced innate responses, such as Toll-like receptor signaling, are more active during the acute phase but persist throughout the chronic phase. However, adaptive immune response processes such as B and T cell activation, proliferation, and migration, T cell differentiation, B and T cell receptor-mediated signaling, and B cell- and immunoglobulin-mediated immune response become more significant during the chronic phase. Conclusions: This analysis showed that, surprisingly, the diverse series of molecular events that occur in the acute and subacute stages persist into the chronic stage of SCI. The strong agreement between our results and previous findings suggests that our analytical approach will be useful in revealing other biological processes and genes contributing to SCI pathology.

    Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean

    Background: The nutritional and economic value of many crops is effectively a function of seed protein and oil content. Insight into the genetic and molecular control mechanisms involved in the deposition of these constituents in the developing seed is needed to guide crop improvement. A quantitative trait locus (QTL) on Linkage Group I (LG I) of soybean (Glycine max (L.) Merrill) has a striking effect on seed protein content. Results: A soybean near-isogenic line (NIL) pair contrasting in seed protein and differing in an introgressed genomic segment containing the LG I protein QTL was used as a resource to demarcate the QTL region and to study variation in transcript abundance in developing seed. The LG I QTL region was delineated to less than 8.4 Mbp of genomic sequence on chromosome 20. Using Affymetrix® Soy GeneChip and high-throughput Illumina® whole transcriptome sequencing platforms, 13 genes displaying significant seed transcript accumulation differences between NILs were identified that mapped to the 8.4 Mbp LG I protein QTL region. Conclusions: This study identifies gene candidates at the LG I protein QTL for potential involvement in the regulation of protein content in the soybean seed. The results demonstrate the power of complementary approaches to characterize contrasting NILs and provide genome-wide transcriptome insight towards understanding seed biology and the soybean genome.

    Transcript-Specific Expression Profiles Derived from Sequence-Based Analysis of Standard Microarrays

    Background: Alternative mRNA processing mechanisms lead to multiple transcripts (i.e. splice isoforms) of a given gene which may have distinct biological functions. Microarrays like Affymetrix GeneChips measure mRNA expression of genes using sets of nucleotide probes. Until recently probe sets were not designed for transcript specificity. Nevertheless, the reanalysis of established microarray data using newly defined transcript-specific probe sets may provide information about expression levels of specific transcripts. Methodology/Principal Findings: In the present study alignment of probe sequences of the Affymetrix microarray HGU133A with Ensembl transcript sequences was performed to define transcript-specific probe sets. Out of a total of 247,965 perfect match probes, 95,008 were designated "transcript-specific", i.e. showing complete sequence alignment, no cross-hybridization, and transcript-, not only gene-specificity. These probes were grouped into 7,941 transcript-specific probe sets and 15,619 gene-specific probe sets, respectively. The former were used to differentiate 445 alternative transcripts of 215 genes. For selected transcripts, predicted by this analysis to be differentially expressed in the human kidney, confirmatory real-time RT-PCR experiments were performed. First, the expression of two specific transcripts of the genes PPM1A (PP2CA_HUMAN and P35813) and PLG (PLMN_HUMAN and Q5TEH5) in human kidneys was determined by the transcript-specific array analysis and confirmed by real-time RT-PCR. Secondly, disease-specific differential expression of single transcripts of PLG and ABCA1 (ABCA1_HUMAN and Q5VYS0_HUMAN) was computed from the available array data sets and confirmed by transcript-specific real-time RT-PCR. Conclusions: Transcript-specific analysis of microarray experiments can be employed to study gene-regulation on the transcript level using conventional microarray data. In this study, predictions based on sufficient probe set size and fold change are confirmed by independent mean
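The probe classification described in this abstract (perfect-match probes sorted by whether they hit one transcript, several transcripts of one gene, or several genes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `classify_probes`, the input shapes, and the toy identifiers are all assumptions made for illustration, and real analyses would start from actual alignment output rather than a hand-written list.

```python
from collections import defaultdict


def classify_probes(alignments, transcript_to_gene):
    """Label each probe by the specificity of its perfect-match hits.

    alignments: iterable of (probe_id, transcript_id) pairs (hypothetical input).
    transcript_to_gene: dict mapping transcript_id -> gene_id.
    """
    # Collect the set of transcripts each probe aligns to.
    hits = defaultdict(set)
    for probe_id, transcript_id in alignments:
        hits[probe_id].add(transcript_id)

    labels = {}
    for probe_id, transcripts in hits.items():
        genes = {transcript_to_gene[t] for t in transcripts}
        if len(genes) > 1:
            # Hits transcripts of more than one gene.
            labels[probe_id] = "cross-hybridizing"
        elif len(transcripts) == 1:
            # Hits exactly one transcript.
            labels[probe_id] = "transcript-specific"
        else:
            # Hits several transcripts, all from the same gene.
            labels[probe_id] = "gene-specific"
    return labels


# Toy data: T1 and T2 are isoforms of gene G1, T3 belongs to gene G2.
transcript_to_gene = {"T1": "G1", "T2": "G1", "T3": "G2"}
alignments = [
    ("p1", "T1"),
    ("p2", "T1"), ("p2", "T2"),
    ("p3", "T2"), ("p3", "T3"),
]
labels = classify_probes(alignments, transcript_to_gene)
```

One simplification to keep in mind: a probe hitting the only known transcript of a gene is labelled "transcript-specific" here, although it cannot distinguish isoforms.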

    Exon Array Analysis of Head and Neck Cancers Identifies a Hypoxia Related Splice Variant of LAMA3 Associated with a Poor Prognosis

    The identification of alternatively spliced transcript variants specific to particular biological processes in tumours should increase our understanding of cancer. Hypoxia is an important factor in cancer biology, and associated splice variants may present new markers to help with planning treatment. A method was developed to analyse alternative splicing in exon array data, using probeset multiplicity to identify genes with changes in expression across their loci, and a combination of the splicing index and a new metric based on the variation of reliability weighted fold changes to detect changes in the splicing patterns. The approach was validated on a cancer/normal sample dataset in which alternative splicing events had been confirmed using RT-PCR. We then analysed ten head and neck squamous cell carcinomas using exon arrays and identified differentially expressed splice variants in five samples with high versus five with low levels of hypoxia-associated genes. The analysis identified a splice variant of LAMA3 (Laminin α 3), LAMA3-A, known to be involved in tumour cell invasion and progression. The full-length transcript of the gene (LAMA3-B) did not appear to be hypoxia-associated. The results were confirmed using qualitative RT-PCR. In a series of 59 prospectively collected head and neck tumours, expression of LAMA3-A had prognostic significance whereas LAMA3-B did not. This work illustrates the potential for alternatively spliced transcripts to act as biomarkers of disease prognosis with improved specificity for particular tissues or conditions over assays which do not discriminate between splice variants.
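The splicing index mentioned in this abstract is commonly defined as the between-group difference in gene-normalised exon signal on a log2 scale. The sketch below shows that common formulation only; it is an assumption that this matches the variant used in the paper, it does not cover the authors' new reliability-weighted metric, and the function name and toy intensities are invented for illustration.

```python
import math


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Splicing index of one exon between sample groups A and B.

    Each argument is a list of linear-scale intensities, one per sample.
    The exon signal is normalised by the gene-level signal of the same
    sample, and the group means of the log2 ratios are subtracted.
    """
    def mean_log_ratio(exon, gene):
        ratios = [math.log2(e / g) for e, g in zip(exon, gene)]
        return sum(ratios) / len(ratios)

    return mean_log_ratio(exon_a, gene_a) - mean_log_ratio(exon_b, gene_b)


# Toy intensities: the exon is over-represented relative to its gene in
# group A (ratio 2) and under-represented in group B (ratio 0.5).
si = splicing_index(
    exon_a=[200.0, 400.0], gene_a=[100.0, 200.0],
    exon_b=[100.0, 50.0], gene_b=[200.0, 100.0],
)
```

A value near zero indicates that the exon tracks overall gene expression in both groups; large positive or negative values flag candidate differential exon usage.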