
    Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations

    BACKGROUND: Microarrays measure the binding of nucleotide sequences to a set of sequence-specific probes. This information is combined with annotation specifying the relationship between probes and targets and used to make inferences about transcript and, ultimately, gene expression. In some situations a probe can hybridize to more than one transcript; in others, multiple probes target a single sequence. These 'multiply targeted' probes can introduce non-independence between measured expression levels. RESULTS: An analysis of these relationships for Affymetrix arrays considered both the extent and the influence of exact matches between probe and transcript sequences. For the popular HG-U133A array, approximately half of the probesets were found to interact in this way. Both real and simulated expression datasets were used to examine how these interactions influence the expression signal. Multiple targeting was found to increase signal strength for the affected probesets, but its major effect was to significantly increase their correlation, even when only a single probe from a probeset was involved. By building a network of probe-probeset-transcript relationships, it is possible to identify families of interacting probesets. More than 10% of the families contain members annotated to different genes or even different UniGene clusters. Within a family, a mixture of genuine biological and artefactual correlations can occur. CONCLUSION: Multiple targeting is not only prevalent but also significant. The ability of probesets to hybridize to more than one gene product can lead to false positives when analysing gene expression. Comprehensive annotation describing multiple targeting is required when interpreting array data.
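
    The family-building step can be illustrated as finding connected components in a graph that links probesets through shared transcripts. The sketch below is a simplification, not the authors' pipeline: it assumes a hypothetical input that maps each probeset ID to the set of transcript IDs matched exactly by at least one of its probes, and it collapses the probe level of the probe-probeset-transcript network into that mapping.

```python
from collections import defaultdict


def probeset_families(hits):
    """Group probesets into families of interacting probesets.

    hits: dict mapping a probeset ID to the set of transcript IDs that at
    least one of its probes matches exactly (hypothetical input format).
    Returns a sorted list of families, each a sorted list of probeset IDs.
    """
    # Union-find over probeset IDs.
    parent = {ps: ps for ps in hits}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Probesets that hit the same transcript end up in the same family.
    by_transcript = defaultdict(list)
    for ps, transcripts in hits.items():
        for tx in transcripts:
            by_transcript[tx].append(ps)
    for members in by_transcript.values():
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(list)
    for ps in hits:
        groups[find(ps)].append(ps)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical example: A and B share T1, B and C share T2, D stands alone.
hits = {"A": {"T1"}, "B": {"T1", "T2"}, "C": {"T2"}, "D": {"T3"}}
print(probeset_families(hits))  # [['A', 'B', 'C'], ['D']]
```

    Union-find keeps the grouping close to linear in the number of probeset-transcript links; note that A and C land in one family even though they share no transcript directly.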

    PLANdbAffy: probe-level annotation database for Affymetrix expression microarrays

    Standard Affymetrix technology evaluates gene expression by measuring the intensity of mRNA hybridization to a panel of 25-mer oligonucleotide probes and summarizing the probe signal intensities with a robust averaging method. However, in many cases the signal intensity of a probe does not correlate with gene expression. This can be due to hybridization of the probe to a transcript of another gene, mapping of the probe to an intron, alternative splicing, single nucleotide polymorphisms (SNPs), and other causes. We have developed a database, PLANdbAffy (available at http://affymetrix2.bioinf.fbb.msu.ru), that contains the results of aligning probe sequences from five Affymetrix expression microarrays to the human genome. We identified the probes that match transcript-coding regions in the correct orientation. For each such probe alignment region, we determined the mRNA and EST sequences that contain the probe sequence. The textual part of the database interface summarizes the sequences that cover each probe alignment region and the SNPs located inside it. The graphical part is implemented as custom tracks for the UCSC Genome Browser, which lets users draw on all the data the browser offers.
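
    The idea of checking which transcripts contain a probe "in the correct orientation" can be sketched as an exact substring search. This is only an illustration under stated assumptions: the database itself aligns probes to the genome, and whether a transcript should contain the probe sequence or its reverse complement depends on the array's target strandedness, so the hypothetical `probe_is_antisense` flag must be set from the array documentation.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matching_transcripts(probe, transcripts, probe_is_antisense=True):
    """Return sorted IDs of transcripts that contain the probe's target site.

    transcripts: dict mapping transcript ID to sequence (hypothetical input).
    probe_is_antisense (an assumption the caller must check): if True, the
    transcript must contain the reverse complement of the probe; if False,
    it must contain the probe sequence itself. Only exact matches count.
    """
    site = reverse_complement(probe) if probe_is_antisense else probe
    site = site.upper()
    return sorted(tx_id for tx_id, seq in transcripts.items()
                  if site in seq.upper())


# Toy sequences; real Affymetrix probes are 25-mers.
transcripts = {"t1": "GGCGTTAA", "t2": "AACGAACG"}
print(matching_transcripts("AACG", transcripts))         # ['t1']
print(matching_transcripts("AACG", transcripts, False))  # ['t2']
```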

    Jetset: selecting the optimal microarray probe set to represent a gene

    Background: Interpretation of gene expression microarrays requires a mapping from probe set to gene. On many Affymetrix gene expression microarrays, a given gene may be detected by multiple probe sets, which may deliver inconsistent or even contradictory measurements. Therefore, obtaining an unambiguous expression estimate of a pre-specified gene can be a nontrivial but essential task. Results: We developed scoring methods to assess each probe set for specificity, splice isoform coverage, and robustness against transcript degradation. We used these scores to select a single representative probe set for each gene, thus creating a simple one-to-one mapping between gene and probe set. To test this method, we evaluated concordance between protein measurements and gene expression values, and between sets of genes whose expression is known to be correlated. For both test cases, we identified genes that were nominally detected by multiple probe sets, and we found that the probe set chosen by our method showed stronger concordance. Conclusions: This method provides a simple, unambiguous mapping to allow assessment of the expression levels of specific genes of interest.
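
    The selection step, keeping one probe set per gene based on its scores, can be sketched as follows. The row layout and the rule of combining the three scores by multiplication are hypothetical choices made for illustration; they are not necessarily how the published method combines its scores.

```python
def best_probeset_per_gene(rows):
    """Pick one representative probe set per gene.

    rows: iterable of (gene, probeset, specificity, coverage, robustness)
    tuples with scores in [0, 1] (hypothetical format). The three scores
    are multiplied into one overall score purely for illustration.
    """
    best = {}
    for gene, probeset, spec, cov, robust in rows:
        overall = spec * cov * robust
        # Keep the highest-scoring probe set; ties go to the first one seen.
        if gene not in best or overall > best[gene][1]:
            best[gene] = (probeset, overall)
    return {gene: ps for gene, (ps, _) in best.items()}


# Hypothetical genes, probe sets, and scores.
rows = [
    ("GENE_A", "ps1", 0.9, 1.0, 0.8),  # overall 0.72
    ("GENE_A", "ps2", 0.5, 1.0, 0.9),  # overall 0.45
    ("GENE_B", "ps3", 1.0, 0.9, 0.9),  # overall 0.81
]
print(best_probeset_per_gene(rows))  # {'GENE_A': 'ps1', 'GENE_B': 'ps3'}
```

    Whatever the scoring rule, the output is the one-to-one gene-to-probe-set mapping the abstract describes.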

    Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data

    RNA sequencing has become the gold standard for whole-transcriptome gene expression quantification. Multiple algorithms have been developed to derive gene counts from sequencing reads. While a number of benchmarking studies have been conducted, the question remains how accurately individual methods quantify gene expression levels from RNA-sequencing reads. We performed an independent benchmarking study using RNA-sequencing data from the well-established MAQCA and MAQCB reference samples. RNA-sequencing reads were processed using five workflows (Tophat-HTSeq, Tophat-Cufflinks, STAR-HTSeq, Kallisto and Salmon), and the resulting gene expression measurements were compared with expression data generated by wet-lab validated qPCR assays for all protein-coding genes. All methods showed high gene expression correlations with the qPCR data. When comparing gene expression fold changes between the MAQCA and MAQCB samples, about 85% of the genes showed consistent results between RNA-sequencing and qPCR data. Notably, each method revealed a small but specific gene set with inconsistent expression measurements. A significant proportion of these method-specific inconsistent genes were reproducibly identified in independent datasets. These genes were typically shorter, had fewer exons, and were expressed at lower levels than genes with consistent expression measurements. We propose that careful validation is warranted when evaluating RNA-seq-based expression profiles for this specific gene set.
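
    The fold-change comparison can be sketched as computing per-gene log2 fold changes between two samples and counting how often RNA-seq and qPCR agree. Both the pseudocount and the "same direction" consistency rule below are assumptions for illustration; the study's own criterion for a consistent gene may differ.

```python
import math


def log2_fold_changes(expr_a, expr_b, pseudocount=1.0):
    """Per-gene log2(B / A); the pseudocount (an assumption) avoids log(0)."""
    return {g: math.log2((expr_b[g] + pseudocount) / (expr_a[g] + pseudocount))
            for g in expr_a if g in expr_b}


def consistent_fraction(fc_rnaseq, fc_qpcr):
    """Fraction of shared genes whose fold changes point the same way.

    'Same direction' is a hypothetical consistency rule used for this sketch.
    """
    shared = [g for g in fc_rnaseq if g in fc_qpcr]
    if not shared:
        return float("nan")
    agree = sum(1 for g in shared if (fc_rnaseq[g] > 0) == (fc_qpcr[g] > 0))
    return agree / len(shared)


# Hypothetical counts for samples A and B, and hypothetical qPCR log2 FCs.
rnaseq_a = {"g1": 10, "g2": 100, "g3": 50}
rnaseq_b = {"g1": 40, "g2": 20, "g3": 60}
qpcr_fc = {"g1": 1.8, "g2": -2.1, "g3": -0.3}

fc = log2_fold_changes(rnaseq_a, rnaseq_b)
print(consistent_fraction(fc, qpcr_fc))  # g3 disagrees, so 2 of 3 genes agree
```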

    Cross-hybridization modeling on Affymetrix exon arrays

    Motivation: Microarray designs have become increasingly probe-rich, enabling the targeting of specific features such as individual exons or single nucleotide polymorphisms. These arrays have the potential to yield quantitative, high-throughput estimates of transcript abundance, but these estimates are currently affected by biases due to cross-hybridization, in which probes hybridize to off-target transcripts.

    Optimization of the BLASTN substitution matrix for prediction of non-specific DNA microarray hybridization

    DNA microarray measurements are susceptible to error caused by non-specific hybridization between a probe and a target (cross-hybridization) or between two targets (bulk-hybridization). Search algorithms such as BLASTN can quickly identify potentially hybridizing sequences. We set out to improve BLASTN accuracy by modifying the substitution matrix and gap penalties. We generated gene expression microarray data for samples in which 1% or 10% of the target mass was an exogenous spike of known sequence. We found that the 10% spike induced 2-fold intensity changes in 3% of the probes, two-thirds of which were decreases in intensity likely caused by bulk-hybridization. These changes were correlated with the similarity between the spike and probe sequences. Interestingly, even very weak similarities tended to induce a change in probe intensity with the 10% spike. Using these data, we optimized the BLASTN substitution matrix to more accurately identify probes susceptible to non-specific hybridization with the spike. Relative to the default substitution matrix, the optimized matrix features a decreased score for A–T base pairs relative to G–C base pairs, resulting in a 5–15% increase in the area under the ROC curve for identifying affected probes. This optimized matrix may be useful in the design of microarray probes and in other BLASTN-based searches for hybridization partners.
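
    The effect of scoring A–T pairs below G–C pairs can be illustrated with a small Smith-Waterman local alignment whose match score depends on the base. This is not BLASTN and does not reproduce the optimized matrix: the score values (A/T match 1, G/C match 2, mismatch -3, linear gap -5) are hypothetical. In a same-strand alignment of probe and target, an A or T match corresponds to an A–T pair in the duplex, and a G or C match to a G–C pair.

```python
def match_score(a, b, at_match=1, gc_match=2, mismatch=-3):
    """Hypothetical base-specific substitution score.

    A/T matches (A-T pairs in the duplex) score lower than G/C matches.
    """
    if a != b:
        return mismatch
    return gc_match if a in "GC" else at_match


def local_alignment_score(s, t, gap=-5):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            cur[j] = max(0,
                         prev[j - 1] + match_score(s[i - 1], t[j - 1]),
                         prev[j] + gap,      # gap in t
                         cur[j - 1] + gap)   # gap in s
            best = max(best, cur[j])
        prev = cur
    return best


# Equal-length exact matches: the GC-rich pair scores twice the AT-rich one.
print(local_alignment_score("GCGCGC", "GCGCGC"))  # 12
print(local_alignment_score("ATATAT", "ATATAT"))  # 6
```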

    Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis

    Background: Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately, many of these studies are underpowered, and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Results: Using Affymetrix and Illumina data from the MicroArray Quality Control (MAQC) project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in minimising inter-platform variance. In particular, we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences in integrated datasets. Conclusion: Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.
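
    Of the integration methods compared, mean-centering is the simplest, and a sketch makes clear why it is a baseline: it removes each platform's per-gene offset but leaves scale differences untouched. The nested-dictionary layout is a hypothetical choice. ComBat (available, for example, in the Bioconductor sva package) and XPN are considerably more involved and are not shown.

```python
from statistics import mean


def mean_center_by_platform(data):
    """Per-gene, per-platform mean-centering.

    data: dict platform -> dict gene -> list of log-scale intensities
    (hypothetical layout). Each gene's values are shifted so that their
    mean within each platform is zero; the spread is not rescaled.
    """
    return {
        platform: {gene: [v - mean(vals) for v in vals]
                   for gene, vals in genes.items()}
        for platform, genes in data.items()
    }


# Hypothetical log2 intensities for one gene measured on two platforms.
data = {"affy": {"g1": [8.0, 10.0]}, "illumina": {"g1": [5.0, 7.0, 9.0]}}
print(mean_center_by_platform(data))
# {'affy': {'g1': [-1.0, 1.0]}, 'illumina': {'g1': [-2.0, 0.0, 2.0]}}
```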

    Sequencing and analysis of the transcriptome of Dalbulus maidis

    Auchenorrhyncha (leafhoppers, locally called chicharritas or cotorritas) are exclusively phytophagous insects that can cause significant economic damage to crops. One of the diseases they transmit is corn stunt disease (achaparramiento del maíz), potentially one of the most serious diseases of maize, capable of causing partial or total yield losses in affected areas. In Argentina, Dalbulus maidis (Hemiptera: Auchenorrhyncha) is the only vector known in the field to transmit Spiroplasma kunkelii, the causal pathogen of corn stunt. Given its importance as an agricultural pest, the transcriptome of every stage of this insect's life cycle was sequenced (eggs, five nymphal instars, and two adult samples). A pool of insects was used to capture as many expressed genes as possible. Because no genomic information is available for Dalbulus maidis, a de novo assembly was performed. Assemblies produced with three programs were compared: VELVET OASES, ABySS, and Trinity. They were evaluated using assembly metrics (N50, contig length) and coverage measures (CEG, BUSCO). Based on these analyses, developmental genes were searched for in the VELVET OASES and Trinity assemblies. The total percentage of genes found was higher for the Trinity assembly. In light of these results, the remaining samples were assembled with Trinity, yielding very good metric and coverage values. The transcriptomes were also compared with published proteomes as a measure of homology between species. In this work, different de novo assembly methods were compared, and the one best suited to our data and experiments was selected.
    Affiliation: Palacio, Victorio Gabriel. Universidad Nacional del Noroeste de la Provincia de Buenos Aires
    Affiliation: Lavore, Andrés. Universidad Nacional del Noroeste de la Provincia de Buenos Aires
    Affiliation: Catalano, María Inés. Universidad Nacional del Noroeste de la Provincia de Buenos Aires
    Affiliation: Rivera Pomar, Rolando. Universidad Nacional del Noroeste de la Provincia de Buenos Aire
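
    N50, one of the assembly metrics used to compare the VELVET OASES, ABySS, and Trinity assemblies, is the largest contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch of the standard definition, using hypothetical contig lengths:

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold >= 50% of bases."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths: total 1500; 500 + 400 = 900 >= 750.
print(n50([100, 200, 300, 400, 500]))  # 400
```

    Because N50 rewards long contigs regardless of correctness, it is usually read together with completeness measures such as the CEG and BUSCO coverage mentioned above.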