    Alternative mapping of probes to genes for Affymetrix chips

    Background: Short oligonucleotide arrays have several probes measuring the expression level of each target transcript. Therefore, the selection of probes is a key component for the quality of measurements. However, once probes have been selected and synthesized on an array, it is still possible to re-evaluate the results using an updated mapping of probes to genes, taking into account the latest biological knowledge available.
    Methods: We investigated how probes found on recent commercial microarrays for human genes (Affymetrix HG-U133A) matched a recent curated collection of human transcripts: the NCBI RefSeq database. We also built mappings and used them in place of the original probe-to-gene associations provided by the manufacturer of the arrays.
    Results: In a large number of cases, 36%, the probes matching a reference sequence were consistent with the grouping of probes by the manufacturer of the chips. For the remaining cases there were discrepancies, and we show how these can affect the analysis of data.
    Conclusions: While the probes on Affymetrix arrays remain the same for several years, the biological knowledge concerning the genomic sequences evolves rapidly. Using up-to-date knowledge can apparently change the outcome of an analysis.

    An Ultra-High-Density, Transcript-Based, Genetic Map of Lettuce.

    We have generated an ultra-high-density genetic map for lettuce, an economically important member of the Compositae, consisting of 12,842 unigenes (13,943 markers) mapped in 3696 genetic bins distributed over nine chromosomal linkage groups. Genomic DNA was hybridized to a custom Affymetrix oligonucleotide array containing 6.4 million features representing 35,628 unigenes of Lactuca spp. Segregation of single-position polymorphisms was analyzed using 213 F7:8 recombinant inbred lines that had been generated by crossing cultivated Lactuca sativa cv. Salinas and L. serriola acc. US96UC23, the wild progenitor species of L. sativa. The high level of replication of each allele in the recombinant inbred lines was exploited to identify single-position polymorphisms that were assigned to parental haplotypes. Marker information has been made available using GBrowse to facilitate access to the map. This map has been anchored to the previously published integrated map of lettuce, providing candidate genes for multiple phenotypes. The high density of markers achieved in this ultradense map allowed syntenic studies between lettuce and Vitis vinifera as well as other plant species.

    Consistent annotation of gene expression arrays

    Background: Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases.
    Results: We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor.
    Conclusions: Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.

    Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips

    Background: Affymetrix GeneChips are a popular platform for performing whole-genome experiments on the transcriptome. There is a range of different calibration steps, and users are presented with choices of different background subtractions, normalisations and expression measures. We wished to establish which of the calibration steps resulted in the biggest uncertainty in the sets of genes reported to be differentially expressed.
    Results: Our results indicate that the sets of genes identified as being most significantly differentially expressed, as estimated by the z-score of fold change, are relatively insensitive to the choice of background subtraction and normalisation. However, the contents of the gene list are most sensitive to the choice of expression measure. This is irrespective of whether the experiment uses a rat, mouse or human chip and whether the chip definition is made using probe mappings from Unigene, RefSeq, Entrez Gene or the original Affymetrix definitions. It is also irrespective of whether both Present and Absent, or just Present, Calls from the MAS5 algorithm are used to filter genelists, and this conclusion holds for genes of differing intensities. We also reach the same conclusion after assigning genes to be differentially expressed using t-statistics, although this approach results in a large number of false positives in the sets of genes identified due to the small numbers of replicates typically used in microarray experiments.
    Conclusion: The major calibration uncertainty that biologists need to consider when analysing Affymetrix data is how their multiple probe values are condensed into one expression measure.

    Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: High-resolution annotation for microarrays

    BACKGROUND: Extracting biological information from high-density Affymetrix arrays is a multi-step process that begins with the accurate annotation of microarray probes. Shortfalls in the original Affymetrix probe annotation have been described; however, few studies have provided rigorous solutions for routine data analysis. RESULTS: Using AceView, a comprehensive human transcript database, we have reannotated the probes by matching them to RNA transcripts instead of genes. Based on this transcript-level annotation, a new probe set definition was created in which every probe in a probe set maps to a common set of AceView gene transcripts. In addition, using artificial data sets we identified that a minimal probe set size of 4 is necessary for reliable statistical summarization. We further demonstrate that applying the new probe set definition can detect specific transcript variants contributing to differential expression and that it also improves cross-platform concordance. CONCLUSION: We conclude that our transcript-level reannotation and redefinition of probe sets complement the original Affymetrix design. Redefinitions introduce probe sets whose sizes may not support reliable statistical summarization; therefore, we advocate using our transcript-level mapping redefinition in a secondary analysis step rather than as a replacement. Knowing which specific transcripts are differentially expressed is important to properly design probe/primer pairs for validation purposes. For convenience, we have created custom chip-description-files (CDFs) and annotation files for our new probe set definitions that are compatible with Bioconductor, Affymetrix Expression Console or third-party software.
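
    The grouping rule described in this abstract (every probe in a probe set maps to a common set of transcripts, with a minimal size of 4 for reliable summarization) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the dictionary input format, and the toy probe and transcript identifiers are all hypothetical, and the probe-to-transcript alignment step itself is assumed to have been done already.

```python
from collections import defaultdict

# Minimal probe set size reported in the abstract for reliable summarization.
MIN_PROBES = 4


def redefine_probe_sets(probe_to_transcripts):
    """Group probes that map to exactly the same set of transcripts.

    probe_to_transcripts: dict of probe id -> set of transcript ids
    (hypothetical input format; the alignment step is assumed done).
    """
    groups = defaultdict(list)
    for probe, transcripts in probe_to_transcripts.items():
        if transcripts:  # probes matching no transcript are left out
            groups[frozenset(transcripts)].append(probe)
    return {key: sorted(probes) for key, probes in groups.items()}


def split_by_size(probe_sets, min_probes=MIN_PROBES):
    """Separate probe sets that meet the size threshold from smaller ones."""
    reliable = {k: v for k, v in probe_sets.items() if len(v) >= min_probes}
    small = {k: v for k, v in probe_sets.items() if len(v) < min_probes}
    return reliable, small


# Toy data with made-up identifiers.
toy = {
    "p1": {"txA"}, "p2": {"txA"}, "p3": {"txA"}, "p4": {"txA"},
    "p5": {"txA", "txB"}, "p6": {"txA", "txB"},
    "p7": set(),
}
reliable, small = split_by_size(redefine_probe_sets(toy))
```

    In this toy case the four probes specific to txA form a set that meets the threshold, while the two probes shared by txA and txB form a set that is too small, which is the situation the abstract cautions about.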

    Analysis of circadian pattern reveals tissue-specific alternative transcription in leptin signaling pathway

    Background: It has been previously reported that most mammalian genes display a circadian oscillation in their baseline expression. Consequently, the phase and amplitude of each component of a signal transduction cascade have downstream consequences.
    Results: We report our analysis of alternative transcripts in the leptin signaling pathway, which is responsible for the systemic regulation of macronutrient storage and energy balance. We focused on the circadian expression pattern of a critical component of the leptin signaling system, suppressor of cytokine signaling 3 (SOCS3). On an Affymetrix GeneChip 430A2 microarray, this gene is represented by three probe sets targeting different regions within the 3' end of the last exon. We demonstrate that in murine brown adipose tissue two downstream 3' probe sets experience circadian baseline oscillation in counter-phase to the upstream probe set. Such differences in expression patterns are a telltale sign of alternative splicing within the last exon of SOCS3. In contrast, all three probe sets oscillated in a common phase in murine liver and white adipose tissue. This suggests that the regulation of SOCS3 expression in brown fat is tissue specific. Another component of the signaling pathway, Janus kinase (JAK), is directly regulated by SOCS and has alternative transcript probe sets oscillating in counter-phase in a white adipose tissue specific manner.
    Conclusion: We hypothesize that differential oscillation of alternative transcripts may provide a mechanism to maintain steady levels of expression in spite of circadian baseline variation.

    Using GWAS Data to Identify Copy Number Variants Contributing to Common Complex Diseases

    Copy number variants (CNVs) account for more polymorphic base pairs in the human genome than do single nucleotide polymorphisms (SNPs). CNVs encompass genes as well as noncoding DNA, making these polymorphisms good candidates for functional variation. Consequently, most modern genome-wide association studies test CNVs along with SNPs, after inferring copy number status from the data generated by high-throughput genotyping platforms. Here we give an overview of CNV genomics in humans, highlighting patterns that inform methods for identifying CNVs. We describe how genotyping signals are used to identify CNVs and provide an overview of existing statistical models and methods used to infer location and carrier status from such data, especially the most commonly used methods exploring hybridization intensity. We compare the power of such methods with the alternative method of using tag SNPs to identify CNV carriers. As such methods are only powerful when applied to common CNVs, we describe two alternative approaches that can be informative for identifying rare CNVs contributing to disease risk. We focus particularly on methods identifying de novo CNVs and show that such methods can be more powerful than case-control designs. Finally, we present some recommendations for identifying CNVs contributing to common complex disorders.
    Comment: Published at http://dx.doi.org/10.1214/09-STS304 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)