34 research outputs found

    Integrating data from heterogeneous DNA microarray platforms

    DNA microarrays are among the most widely used technologies for measuring gene expression. However, there are several distinct microarray platforms from different manufacturers, each with its own measurement protocol, and the resulting data can hardly be compared or directly integrated. Integrating data from multiple sources aims to improve the reliability of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from re-annotating the features used for gene expression to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, comprising a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus. The authors thank the FCT Strategic Project of UID/BIO/04469/2013 unit, the project RECI/BBBEBI/0179/2012 (FCOMP-01-0124-FEDER-027462) and the project "BioInd - Biotechnology and Bioengineering for improved Industrial and Agro-Food processes", REF. NORTE-07-0124FEDER-000028, co-funded by the Programa Operacional Regional do Norte (ON.2 O Novo Norte), QREN, FEDER.

    Using surveys of Affymetrix GeneChips to study antisense expression.

    We have used large surveys of Affymetrix GeneChip data in the public domain to conduct a study of antisense expression across diverse conditions. We derive correlations between groups of probes which map uniquely to the same exon in the antisense direction. When there are no probes assigned to an exon in the sense direction, we find that many of the antisense groups fail to detect a coherent block of transcription. Only a minority of these groups contain coherent blocks of antisense expression suggesting transcription. We also derive correlations between groups of probes which map uniquely to the same exon in both the sense and antisense directions. In some of these cases the locations of sense probes overlap with the antisense probes, and the sense and antisense probe intensities are correlated with each other. This configuration suggests the existence of a Natural Antisense Transcript (NAT) pair. We find that the majority of such NAT pairs detected by GeneChips are formed by a transcript of an established gene and either an EST or an mRNA. Determining the exact antisense regulatory mechanism indicated by the correlation of sense probes with antisense probes requires further investigation for every particular case of interest. However, the analysis of microarray data has proved to be a good method to reconfirm known NATs, to discover new ones, and to notice possible problems in the annotation of antisense transcripts.
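The abstract above speaks of correlations within groups of probes mapping to the same exon. As a rough illustration only, the sketch below scores a probe group by its mean pairwise Pearson correlation across arrays. The function names and the use of a mean pairwise score are assumptions made for this example; the abstract does not state how the authors computed or thresholded coherence.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A flat profile carries no correlation information.
        return 0.0
    return cov / (sd_x * sd_y)


def mean_pairwise_correlation(probe_profiles):
    """Average Pearson correlation over all probe pairs in one group.

    probe_profiles: list of per-probe intensity lists, one value per array.
    This coherence score is a hypothetical stand-in, not the authors' metric.
    """
    pairs = list(combinations(probe_profiles, 2))
    if not pairs:
        raise ValueError("need at least two probes in a group")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)
```

A group whose probes rise and fall together across arrays scores close to 1, while a group of unrelated probes scores close to 0.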

    Long non-coding RNA deregulation in tongue squamous cell carcinoma

    Background. Deregulation of tumorigenic long non-coding RNA (lncRNA) has been reported in several malignancies. However, there is still no comprehensive study on tongue squamous cell carcinoma (SCC). Methods. Functional reannotation of human lncRNA was carried out with ncFANs. Real-time quantitative PCR was used to validate the identified lncRNAs. Results. Using the functional annotation algorithm from ncFANs, 8 differentially expressed lncRNAs were identified. Lnc-PPP2R4-5, lnc-SPRR2D-1, lnc-MAN1A2-1, lnc-FAM46A-1, lnc-MBL2-4:1, and lnc-MBL2-4:3 were upregulated in the microdissected tongue SCC tissues. In comparison, lnc-AL355149.1-1 and lnc-STXBP5-1 showed significant downregulation. A high level of lnc-MBL2-4:3 was significantly associated with node-positive tongue SCC patients. Further, patients with advanced T-stage showed a further reduction of lnc-AL355149.1-1 in the tumor tissues. Treatment of tongue SCC cells with 5-fluorouracil and paclitaxel can reverse the expression patterns observed in the tongue SCC tissues. Further, changes in lnc-MBL2-4:3 and lnc-AL355149.1-1 expression levels were noticed in the cisplatin-resistant tongue SCC cells. Conclusions. Our results demonstrate that functional reannotation allows us to identify novel lncRNAs using existing gene expression array datasets. The association of lncRNA with the T-stage and nodal status of tongue SCC patients suggests that lncRNA deregulation is involved in the pathogenesis of tongue SCC.

    Transcript-level annotation of Affymetrix probesets improves the interpretation of gene expression data

    Background. The wide use of Affymetrix microarrays across a broadening range of biological research fields has made probeset annotation an important issue. Standard Affymetrix probeset annotation is at the gene level, i.e. a probeset is precisely linked to a gene, and probeset intensity is interpreted as gene expression. The growing knowledge that one gene may have multiple transcript variants makes it necessary to update this gene-level annotation to a refined transcript level. Results. By performing rigorous alignments of the Affymetrix probe sequences against a comprehensive pool of currently available transcript sequences, and further linking the probesets to the International Protein Index, we generated transcript-level or protein-level annotation tables for two popular Affymetrix expression arrays, the Mouse Genome 430A 2.0 Array and the Human Genome U133A Array. Applying our new annotations to re-examine existing expression data sets shows increased expression consistency among synonymous probesets and strengthened expression correlation between interacting proteins. Conclusion. By refining the standard Affymetrix annotation of microarray probesets from the gene level to the transcript level and protein level, one can achieve a more reliable interpretation of experimental data, which may lead to the discovery of more profound regulatory mechanisms.

    Consistent annotation of gene expression arrays

    Background. Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases. Results. We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor. Conclusions. Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.

    Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: High-resolution annotation for microarrays

    BACKGROUND: Extracting biological information from high-density Affymetrix arrays is a multi-step process that begins with the accurate annotation of microarray probes. Shortfalls in the original Affymetrix probe annotation have been described; however, few studies have provided rigorous solutions for routine data analysis. RESULTS: Using AceView, a comprehensive human transcript database, we have reannotated the probes by matching them to RNA transcripts instead of genes. Based on this transcript-level annotation, a new probe set definition was created in which every probe in a probe set maps to a common set of AceView gene transcripts. In addition, using artificial data sets we identified that a minimal probe set size of 4 is necessary for reliable statistical summarization. We further demonstrate that applying the new probe set definition can detect specific transcript variants contributing to differential expression and that it also improves cross-platform concordance. CONCLUSION: We conclude that our transcript-level reannotation and redefinition of probe sets complement the original Affymetrix design. Redefinitions introduce probe sets whose sizes may not support reliable statistical summarization; therefore, we advocate using our transcript-level mapping redefinition in a secondary analysis step rather than as a replacement. Knowing which specific transcripts are differentially expressed is important to properly design probe/primer pairs for validation purposes. For convenience, we have created custom chip-description-files (CDFs) and annotation files for our new probe set definitions that are compatible with Bioconductor, Affymetrix Expression Console or third-party software.
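The probe set redefinition described above can be pictured as grouping probes by the exact set of transcripts they match, then checking each group against the minimal size of 4 reported in the abstract. The sketch below is only an illustration under that reading: the input dictionary, probe and transcript identifiers, and function names are hypothetical, and the real pipeline (sequence matching against AceView, CDF generation) is not reproduced.

```python
from collections import defaultdict

# Minimal probe set size the abstract reports as necessary for
# reliable statistical summarization.
MIN_PROBES = 4


def redefine_probe_sets(probe_to_transcripts):
    """Group probes so that every probe in a set maps to the same transcripts.

    probe_to_transcripts: dict of probe id -> iterable of transcript ids
    (a hypothetical input format). Probes with no match are dropped.
    """
    groups = defaultdict(list)
    for probe, transcripts in probe_to_transcripts.items():
        key = frozenset(transcripts)
        if key:
            groups[key].append(probe)
    return dict(groups)


def split_by_size(groups, min_probes=MIN_PROBES):
    """Separate probe sets that meet the size threshold from those that do not."""
    reliable = {k: v for k, v in groups.items() if len(v) >= min_probes}
    undersized = {k: v for k, v in groups.items() if len(v) < min_probes}
    return reliable, undersized
```

Keeping the undersized sets in a separate dictionary, rather than discarding them, mirrors the abstract's advice to use the redefinition as a secondary analysis step.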

    Analysis of the pluripotency gene expression program in early embryos and embryonic stem cells

    Pluripotency is the ability of a cell to differentiate into any cell type. It forms during early embryonic development in mammals, and its emergence is associated with reprogramming of gene expression at a global level. The process by which pluripotency arises naturally is still not fully understood. To gain new insight into the events that lead to the emergence of pluripotency in mammals, we studied changes in gene expression during the oocyte-to-zygote transition in the mouse. In this model system, the fertilized egg undergoes reprogramming that leads to the formation of pluripotent blastomeres. These blastomeres give rise to the embryo itself. The aim of my diploma thesis was to analyze the activation of transcription during early development and to develop a method for monitoring gene expression in oocytes, early embryos, and embryonic stem cells. The method uses quantitative PCR and can measure the expression of up to 48 selected genes that serve as markers for maternal degradation, activation of the pluripotency program, and differentiation into germ lineages. We further show that our system monitors transcriptome dynamics during the oocyte-to-zygote transition and that the results obtained are comparable with data measured by other methods. In addition, our bioinformatic approach allowed us to identify novel oocyte-specific and zygotic non-coding RNAs. Keywords: pluripotency,... Department of Cell Biology, Faculty of Science

    Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis

    Background. Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Results. Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences from integrated datasets. Conclusion. Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.
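Of the methods compared above, mean-centering is the simplest to show in a few lines: each gene's mean is subtracted separately within each platform, so both platforms end up centred on zero for that gene. The sketch below illustrates only this baseline, which the abstract reports was outperformed by ComBat and XPN; the nested-dictionary data layout and all names are assumptions chosen for the example.

```python
from statistics import mean


def mean_center_by_platform(expr, platform_of):
    """Subtract each gene's per-platform mean from its samples.

    expr: dict of gene -> dict of sample -> log-intensity (hypothetical layout)
    platform_of: dict of sample -> platform label, e.g. "affymetrix"
    Returns a new nested dict with the same shape as expr.
    """
    centered = {}
    for gene, values in expr.items():
        # Collect this gene's values separately for each platform.
        by_platform = {}
        for sample, value in values.items():
            by_platform.setdefault(platform_of[sample], []).append(value)
        platform_mean = {p: mean(v) for p, v in by_platform.items()}
        centered[gene] = {
            sample: value - platform_mean[platform_of[sample]]
            for sample, value in values.items()
        }
    return centered
```

Because the subtraction uses only the platform label, any real biological difference that happens to coincide with platform is removed along with the technical offset, which is one reason more elaborate methods exist.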

    Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data

    Background. The Affymetrix GeneChip is a widely used gene expression profiling platform. Since the chips were originally designed, the genome databases and gene definitions have been considerably updated. Thus, more accurate interpretation of microarray data requires parallel updating of the specificity of GeneChip probes. We propose a new probe remapping protocol, using the zebrafish GeneChips as an example, by removing nonspecific probes, and grouping the probes into transcript level probe sets using an integrated zebrafish genome annotation. This genome annotation is based on combining transcript information from multiple databases. This new remapping protocol, especially the new genome annotation, is shown here to be an important factor in improving the interpretation of gene expression microarray data. Results. Transcript data from the RefSeq, GenBank and Ensembl databases were downloaded from the UCSC genome browser, and integrated to generate a combined zebrafish genome annotation. Affymetrix probes were filtered and remapped according to the new annotation. The influence of transcript collection and gene definition methods was tested using two microarray data sets. Compared to remapping using a single database, this new remapping protocol results in up to 20% more probes being retained in the remapping, leading to approximately 1,000 more genes being detected. The differentially expressed gene lists are consequently increased by up to 30%. We are also able to detect up to three times more alternative splicing events. A small number of the bioinformatics predictions were confirmed using real-time PCR validation. Conclusions. By combining gene definitions from multiple databases, it is possible to greatly increase the numbers of genes and splice variants that can be detected in microarray gene expression experiments.

    Transcriptome Analysis of the Arabidopsis Megaspore Mother Cell Uncovers the Importance of RNA Helicases for Plant Germline Development

    Germ line specification is a crucial step in the life cycle of all organisms. For sexual plant reproduction, the megaspore mother cell (MMC) is of crucial importance: it marks the first cell of the plant “germline” lineage that gets committed to undergo meiosis. One of the meiotic products, the functional megaspore, subsequently gives rise to the haploid, multicellular female gametophyte that harbours the female gametes. The MMC is formed by selection and differentiation of a single somatic, sub-epidermal cell in the ovule. The transcriptional network underlying MMC specification and differentiation is largely unknown. We provide the first transcriptome analysis of an MMC using the model plant Arabidopsis thaliana with a combination of laser-assisted microdissection and microarray hybridizations. Statistical analyses identified an over-representation of translational regulation control pathways and a significant enrichment of DEAD/DEAH-box helicases in the MMC transcriptome, paralleling important features of the animal germline. Analysis of two independent T-DNA insertion lines suggests an important role of an enriched helicase, MNEME (MEM), in MMC differentiation and the restriction of the germline fate to only one cell per ovule primordium. In heterozygous mem mutants, additional enlarged MMC-like cells, which sometimes initiate female gametophyte development, were observed at higher frequencies than in the wild type. This closely resembles the phenotype of mutants affected in the small RNA and DNA-methylation pathways important for epigenetic regulation. Importantly, the mem phenotype shows features of apospory, as female gametophytes initiate from two non-sister cells in these mutants. Moreover, in mem gametophytic nuclei, both higher order chromatin structure and the distribution of LIKE HETEROCHROMATIN PROTEIN1 were affected, indicating epigenetic perturbations. 
    In summary, the MMC transcriptome sets the stage for future functional characterization as illustrated by the identification of MEM, a novel gene involved in the restriction of germline fate.