Normalized Affymetrix expression data are biased by G-quadruplex formation

Author: Altman
Andrew P. Harrison
Barrett
Bolstad
Burge
Cambon
Do
Dudoit
Eisen
Farhat N. Memon
Geller
Gellert
Giorgi
Graham J. G. Upton
Hammond
Harris
Hochreiter
Hubbell
Hugh P. Shanahan
Irizarry
Irizarry
Iwamoto
Kittleson
Langdon
Li
Memon
Memon
Naef
Patterson
Ringnér
Ryan
Sen
Stalteri
Upton
Upton
Walton
Wu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2011

Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, where inter-probe guanines form Hoogsteen hydrogen bonds, which probes with G-stacks are capable of forming. We demonstrate that for a specific microarray data set using the Human HG-U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines and two, FARMS and PLIER, minimized the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. The benefit of eliminating these probes from any analysis of such affected data sets outweighs the resulting increase of noise in the signal. © 2011 The Author(s)
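The probe-level criterion described in this abstract can be illustrated with a minimal sketch. It assumes only the definition given above (a G-stack is a run of four or more guanines); the function names and the example sequences are hypothetical and are not taken from the paper.

```python
import re

# Assumption: a "G-stack" is a run of four or more consecutive guanines,
# as defined in the abstract. Everything else here is illustrative.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(probe_sequence: str) -> bool:
    """Return True if the probe sequence contains a run of >= 4 guanines."""
    return G_STACK.search(probe_sequence.upper()) is not None


def filter_probe_set(probes: dict[str, str]) -> dict[str, str]:
    """Drop G-stack probes from a mapping of probe id -> sequence."""
    return {pid: seq for pid, seq in probes.items() if not has_g_stack(seq)}


# Hypothetical 25-mer probes, invented for this example.
example = {
    "probe_1": "ATCGGGGTACCATGCATGCATGCAT",  # contains GGGG
    "probe_2": "ATCGGGTACCATGCATGCATGCATG",  # longest G run is GGG
}
kept = filter_probe_set(example)
```

Here `kept` retains only `probe_2`. In practice the filtered probe set would then be passed to whichever normalization pipeline is in use.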

University of Essex Research Repository

CiteSeerX

Royal Holloway Research Online

Crossref

Royal Holloway - Pure

PubMed Central

Listen to genes : dealing with microarray data in the frequency domain

Author: A Claridge-Chang
AN Stepanova
AN Stepanova
B-R Kim
Diego Di Bernardo
Dongyun Yi
H Guo
H Ueda
HG McWatters
IP Androulakis
J Fan
J Fan
J Qian
JCW Locke
JH Wu
Jianfeng Feng
MJ Yanovsky
MR Doyle
N Dojer
P DHaeseleer
PO Lim
PT Spellman
R Balasubramaniyan
R Cristi
Ritesh Krishna
S Kim
S Wichert
Shuixia Guo
SL Harmer
SX Guo
U Alon
Vicky Buchanan-Wollaston
W Pan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/04/2009

Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes normalization, clustering and network analysis of genes. Methodology: Genes are normalized using an error-model-based uniform normalization method aimed at identifying and estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality, along with partial Granger causality, is applied in both the time and frequency domains, to selected genes as well as to all genes, to reveal the interesting networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000 genes observed at 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene circuit and a new global circuit showing a hierarchical structure to determine the initiators of leaf senescence. Conclusions: We use a totally data-driven approach to form biological hypotheses. Clustering using the power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domains using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray data, such methods can be useful tools in uncovering hidden biological interactions. We show our method in a step-by-step manner with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of potential interest to Arabidopsis researchers
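The abstract clusters genes by their power spectrum density. As a rough sketch of the feature-extraction step only (not the authors' clustering or Granger-causality method), the periodogram of one expression time series can be computed with a naive discrete Fourier transform. The function names and the synthetic 22-point series are assumptions made for illustration.

```python
import cmath
import math


def periodogram(series: list[float]) -> list[float]:
    """Power at frequency indices 0..n//2 of the mean-centred series."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    power = []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT coefficient at frequency index k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


def dominant_frequency_index(series: list[float]) -> int:
    """Index (>= 1) of the strongest periodic component."""
    power = periodogram(series)
    return max(range(1, len(power)), key=lambda k: power[k])


# Synthetic example: 22 time points containing exactly two full cycles.
synthetic = [math.sin(2 * math.pi * t / 11) for t in range(22)]
peak = dominant_frequency_index(synthetic)
```

For the synthetic series the peak falls at frequency index 2, that is, two cycles over the 22 samples. Genes could then be grouped by comparing their periodogram vectors with any standard clustering method.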

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Warwick Research Archives Portal Repository

Discussion of: Treelets--An adaptive multi-scale basis for sparse unordered data

Author: Bühlmann Peter
Meinshausen Nicolai
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2008

This is a discussion of the paper "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481] by Ann B. Lee, Boaz Nadler and Larry Wasserman. In this paper the authors defined a new type of dimension reduction algorithm, namely, the treelet algorithm. The treelet method has the merit of being completely data driven, and its decomposition is easier to interpret as compared to PCR. It is suitable in certain situations, but it also has its own limitations. I will discuss both the strengths and the weaknesses of this method when applied to microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/08-AOAS137E in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

maigesPack: A Computational Environment for Microarray Data Analysis

Author: Esteves Gustavo H.
Hirata Jr Roberto
Publication date: 11/11/2015

Microarray technology is still an important way to assess gene expression in molecular biology, mainly because it measures expression profiles for thousands of genes simultaneously, which makes this technology a good option for some studies focused on systems biology. One of its main problems is the complexity of the experimental procedure, which presents several sources of variability and hinders statistical modeling. So far, there is no standard protocol for the generation and evaluation of microarray data. To ease the analysis process, this paper presents an R package, named maigesPack, that helps with data organization. Besides that, it makes the data analysis process more robust, reliable and reproducible. Also, maigesPack aggregates several data analysis procedures reported in the literature, for instance: cluster analysis, differential expression, supervised classifiers, relevance networks and functional classification of gene groups or gene networks

arXiv.org e-Print Archive

CiteSeerX

Study of meta-analysis strategies for network inference using information-theoretic approaches

Author: Bellot Pujalte Pau
Bontempi Gianluca
Haibe-Kains Benjamin
Meyer Patrick E.
Pham Ngoc C.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has been accumulated in the public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore naturally become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than the traditional approaches focused on individual datasets, which typically suffer from some experimental bias and a small number of samples. To date, there are mainly two strategies for the problem of interest: the first one ("data merging") merges all datasets together and then infers a GRN, whereas the other ("networks ensemble") infers GRNs from every dataset separately and then aggregates them using some ensemble rules (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches is lacking. In this paper, we evaluate the performances of the various meta-analysis approaches mentioned above with a systematic set of experiments based on in silico benchmarks. Furthermore, we present a new meta-analysis approach for inferring GRNs from multiple studies. Our proposed approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating matrices of the pairwise measures from every dataset, followed by extracting the network from the meta-matrix.
Peer Reviewed. Postprint (author's final draft)
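The two-step idea in this abstract (aggregate per-dataset pairwise matrices, then extract a network from the meta-matrix) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the pairwise measure is absolute Pearson correlation, the aggregation rule is a plain mean, the network is extracted by a fixed threshold, and all names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_matrix(dataset: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Step 1a: absolute correlation for every gene pair in one dataset."""
    return {
        (g1, g2): abs(pearson(dataset[g1], dataset[g2]))
        for g1, g2 in combinations(sorted(dataset), 2)
    }


def meta_network(datasets, threshold=0.8):
    """Step 1b: average the matrices; step 2: keep pairs above the threshold.

    Assumes every dataset measures the same set of genes.
    """
    matrices = [pairwise_matrix(d) for d in datasets]
    meta = {
        pair: sum(m[pair] for m in matrices) / len(matrices)
        for pair in matrices[0]
    }
    return {pair for pair, score in meta.items() if score >= threshold}


# Hypothetical expression values for three genes in two studies.
study_1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, 1, -1]}
study_2 = {"A": [5, 3, 4, 1], "B": [10, 6, 8, 2], "C": [0, 1, 1, 0]}
edges = meta_network([study_1, study_2])
```

In this toy example only the pair A and B survives the threshold, because it is strongly correlated in both studies. A mutual-information measure or a rank-based aggregation rule could be swapped in without changing the overall structure.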

University of Toronto Research Repository

Crossref

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals

DI-fusion

Profound effect of profiling platform and normalization strategy on detection of differentially expressed microRNAs

Author: Kaiser Sebastian
Meyer Swanhild U.
Pfaffl Michael W.
Thirion Christian
Wagner Carola
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2012

Adequate normalization minimizes the effects of systematic technical variations and is a prerequisite for getting meaningful biological changes. However, there is inconsistency about miRNA normalization performances and recommendations. Thus, we investigated the impact of seven different normalization methods (reference gene index, global geometric mean, quantile, invariant selection, loess, loessM, and generalized procrustes analysis) on intra- and inter-platform performance of two distinct and commonly used miRNA profiling platforms. We included data from miRNA profiling analyses derived from a hybridization-based platform (Agilent Technologies) and an RT-qPCR platform (Applied Biosystems). Furthermore, we validated a subset of miRNAs by individual RT-qPCR assays. Our analyses incorporated data from the effect of differentiation and tumor necrosis factor alpha treatment on primary human skeletal muscle cells and a murine skeletal muscle cell line. Distinct normalization methods differed in their impact on (i) standard deviations, (ii) the area under the receiver operating characteristic (ROC) curve, (iii) the similarity of differential expression. Loess, loessM, and quantile analysis were most effective in minimizing standard deviations on the Agilent and TLDA platform. Moreover, loess, loessM, invariant selection and generalized procrustes analysis increased the area under the ROC curve, a measure for the statistical performance of a test. The Jaccard index revealed that inter-platform concordance of differential expression tended to be increased by loess, loessM, quantile, and GPA normalization of AGL and TLDA data as well as RGI normalization of TLDA data. 
We recommend the application of loess, or loessM, and GPA normalization for miRNA Agilent arrays and qPCR cards as these normalization approaches were shown to (i) effectively reduce standard deviations, (ii) increase sensitivity and accuracy of differential miRNA expression detection as well as (iii) increase inter-platform concordance. Results showed the successful adoption of loessM and generalized procrustes analysis for one-color miRNA profiling experiments
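Two of the quantities named in this abstract, quantile normalization and the Jaccard index, are simple enough to sketch. The code below is a minimal illustration, assuming no tied values and equal-length samples; it is not the pipeline used in the study, and the miRNA names and numbers are invented for the example.

```python
def quantile_normalize(samples: list[list[float]]) -> list[list[float]]:
    """Give every sample the same distribution (the rank-wise mean).

    Assumes equal-length samples without tied values.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n)
    ]
    normalized = []
    for sample in samples:
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized


def jaccard_index(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union.

    By convention here, two empty sets give 1.0.
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical intensities for three miRNAs in two samples.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])

# Hypothetical sets of differentially expressed miRNAs on two platforms.
array_hits = {"miR-1", "miR-133a", "miR-206"}
qpcr_hits = {"miR-1", "miR-206", "miR-21"}
concordance = jaccard_index(array_hits, qpcr_hits)
```

After normalization both samples share the values 1.5, 3.5 and 5.5, assigned by rank, and the two hit lists have a Jaccard index of 0.5 (two shared miRNAs out of four distinct ones).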

Open Access LMU

Optimal classifier selection and negative bias in error rate estimation: An empirical study on high-dimensional prediction

Author: Boulesteix Anne-Laure
Strobl Carolin
Publication date: 01/01/2009

In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data), since such analyses are particularly exposed to this kind of bias. In our study we consider a total of 124 variants of classifiers (possibly including variable selection or tuning steps) within a cross-validation evaluation scheme. The classifiers are applied to original and modified real microarray data sets, some of which are obtained by randomly permuting the class labels to mimic non-informative predictors while preserving their correlation structure. We then assess the minimal misclassification rate over the different variants of classifiers in order to quantify the bias arising when the optimal classifier is selected a posteriori in a data-driven manner. The bias resulting from the parameter tuning (including gene selection parameters as a special case) and the bias resulting from the choice of the classification method are examined both separately and jointly. We conclude that the strategy to present only the optimal result is not acceptable, and suggest alternative approaches for properly reporting classification accuracy
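The selection bias described in this abstract can be illustrated with a small simulation. It rests on strong simplifying assumptions that are not from the paper: labels are non-informative, so every classifier variant has a true error rate of 0.5, and the variants' estimated error rates are treated as independent, whereas real variants evaluated on the same data are correlated. The count of 124 variants is taken from the abstract; the sample size of 50 and all names are hypothetical.

```python
import random


def simulated_error_rate(n_samples: int, rng: random.Random) -> float:
    """Estimated error of one variant on non-informative data.

    Each prediction is wrong with probability 0.5.
    """
    errors = sum(rng.random() < 0.5 for _ in range(n_samples))
    return errors / n_samples


def min_error_over_variants(n_variants: int, n_samples: int, rng: random.Random) -> float:
    """Error reported when only the best-looking variant is presented."""
    return min(simulated_error_rate(n_samples, rng) for _ in range(n_variants))


rng = random.Random(0)  # fixed seed so the run is reproducible
n_repeats = 200

# Honest reporting: one pre-specified variant per repeat.
single = [simulated_error_rate(50, rng) for _ in range(n_repeats)]
# Selective reporting: the minimum over 124 variants per repeat.
best_of_124 = [min_error_over_variants(124, 50, rng) for _ in range(n_repeats)]

mean_single = sum(single) / n_repeats
mean_best = sum(best_of_124) / n_repeats
```

Comparing `mean_best` with `mean_single` shows the optimistic bias: the average minimum is well below the true error rate of 0.5 even though no variant has any predictive value.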

Springer - Publisher Connector

Directory of Open Access Journals

Open Access LMU

PubMed Central

Starr: Simple Tiling Array Analysis of Affymetrix ChIP-chip data

Author: Tresch Achim
Zacher Benedikt
Publication date: 01/01/2009

Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an assay for DNA-protein binding or post-translational chromatin/histone modifications. As with all high-throughput technologies, it requires thorough bioinformatic processing of the data, for which there is no standard yet. The primary goal is the reliable identification and localization of genomic regions that bind a specific protein. The second step comprises comparison of binding profiles of functionally related proteins, or of binding profiles of the same protein in different genetic backgrounds or environmental conditions. Ultimately, one would like to gain a mechanistic understanding of the effects of DNA binding events on gene expression. We present a free, open-source R package Starr that, in combination with the package Ringo, facilitates the comparative analysis of ChIP-chip data across experiments and across different microarray platforms. Core features are data import, quality assessment, normalization and visualization of the data, and the detection of ChIP-enriched genomic regions. The use of common Bioconductor classes ensures compatibility with other R packages. Most importantly, Starr provides methods for the integration of complementary genomics data, e.g., it enables systematic investigation of the relation between gene expression and DNA binding

arXiv.org e-Print Archive

CiteSeerX