Study of meta-analysis strategies for network inference using information-theoretic approaches
Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has accumulated in public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than traditional approaches focused on individual datasets, which typically suffer from experimental bias and small sample sizes.
To date, there are mainly two strategies for this problem: the first ("data merging") merges all datasets together and then infers a single GRN, whereas the other ("networks ensemble") infers a GRN from each dataset separately and then aggregates them using an ensemble rule (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches has been lacking.
In this paper, we evaluate the performance of the meta-analysis approaches mentioned above with a systematic set of experiments based on in silico benchmarks. Furthermore, we present a new meta-analysis approach for inferring GRNs from multiple studies. Our proposed approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating the matrices of pairwise measures from every dataset, then extracting the network from the resulting meta-matrix.
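A minimal sketch of the two-step matrix-aggregation idea described above. The rank-average aggregation rule, the fixed threshold, and all function names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def aggregate_pairwise_matrices(matrices):
    """Average rank-transformed pairwise scores across datasets.

    Each matrix holds a pairwise measure (e.g. correlation or mutual
    information) computed on one dataset; rank-transforming first makes
    the scores comparable across studies.
    """
    ranked = []
    for m in matrices:
        flat = m.ravel()
        ranks = flat.argsort().argsort().astype(float) / (flat.size - 1)
        ranked.append(ranks.reshape(m.shape))
    return np.mean(ranked, axis=0)

def extract_network(meta_matrix, threshold=0.9):
    """Keep an edge wherever the aggregated score exceeds the threshold."""
    adj = meta_matrix >= threshold
    np.fill_diagonal(adj, False)
    return adj

# Example: three studies measuring the same 100 genes
rng = np.random.default_rng(0)
studies = [np.abs(np.corrcoef(rng.normal(size=(20, 100)), rowvar=False))
           for _ in range(3)]
meta = aggregate_pairwise_matrices(studies)
network = extract_network(meta)
```

In practice the aggregation rule (mean, ranksum, weightsum) and the network-extraction step would be chosen to match the pairwise measure being used.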
A comparative study of different strategies of batch effect removal in microarray data: a case study of three datasets
Batch effects refer to the systematic non-biological variability introduced by experimental design and sample processing in microarray experiments. They are a common issue in microarray data and, if ignored, can bias the analysis. Many batch effect removal methods have been developed. Previous comparative work has focused on their effectiveness at removing batch effects and their impact on downstream classification analysis. The most common type of analysis for microarray data, however, is differential expression (DE) analysis, which identifies markers significantly associated with the outcome of interest, yet no study has examined the impact of these methods on downstream DE analysis. In this project, we investigated the performance of five popular batch effect removal methods (mean-centering, ComBat_p, ComBat_n, SVA, and ratio-based methods) on batch effect reduction and their impact on DE analysis, using three experimental datasets with different sources of batch effects. We found that the performance of these methods is data-dependent: the simple mean-centering method performed reasonably well in all three datasets, but more complicated algorithms such as ComBat can be unstable on certain datasets and should be applied with caution. Given a new dataset, we recommend either using the mean-centering method or, if possible, carefully investigating a few different batch removal methods and choosing the one that is best for the data. This study has important public health significance because better handling of batch effects in microarray data can reduce biased results and lead to improved biomarker identification.
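As a concrete illustration of the mean-centering method discussed above, here is a minimal sketch; the genes-by-samples matrix layout and all names are assumptions for illustration, not taken from the study:

```python
import numpy as np

def mean_center_batches(expr, batches):
    """Remove batch effects by centering each gene within each batch.

    expr    : genes x samples expression matrix
    batches : per-sample batch labels, length = number of samples
    """
    corrected = expr.astype(float).copy()
    for b in np.unique(batches):
        idx = batches == b
        # Subtract each gene's mean within the batch, removing the
        # batch-specific offset for that gene.
        corrected[:, idx] -= corrected[:, idx].mean(axis=1, keepdims=True)
    # Restore the overall gene means so values stay on the original scale.
    corrected += expr.mean(axis=1, keepdims=True)
    return corrected

# Toy example: 1,000 genes, 12 samples in two batches, second batch shifted
rng = np.random.default_rng(1)
expr = rng.normal(size=(1000, 12)) + np.r_[np.zeros(6), np.ones(6)]
batches = np.array(["A"] * 6 + ["B"] * 6)
adjusted = mean_center_batches(expr, batches)
```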
New statistical tools for microarray data and comparison with existing tools
Microarray technologies have gained tremendous interest from researchers in recent years. The problem we are interested in is how to combine two microarray data sets that have systematic batch differences. The motivation for combining them is that the combined data set contains more samples and therefore gives improved statistical power. This dissertation covers two topics in microarray batch adjustment. The first topic is the visualization of paired High Dimension Low Sample Size (HDLSS) data. We propose two interesting directions: the Canonical Parallel and the Canonical Orthogonal Directions (CPD & COD). This pair of directions gives an insightful 2-d parallel view for understanding paired HDLSS data sets, and the CPD can be used for adjusting batch differences. An application to the NCI60 cell lines data shows good performance of this method. The second topic is a comparison of three commonly used batch adjustment methods: the Support Vector Machine (SVM), Distance Weighted Discrimination (DWD), and Prediction Analysis of Microarray (PAM). We show that SVM has some serious problems for HDLSS data, and that DWD is much more robust than PAM under the Unbalanced Subgroup Model. The mathematical studies in this dissertation are in the area of HDLSS asymptotics, in the sense that the sample sizes are fixed and the dimension (the number of genes) goes to infinity. Hall et al. (2004) studied the geometric structure of the data when the dimension is high; here we study the geometric structure under more complicated models. For the first topic, we give conditions for the consistency and the strong inconsistency of the CPD under the Linear Shift Model, which reflects the effects of systematic biases and random measurement errors. For the second topic, we compare PAM and DWD under the Unbalanced Subgroup Model: both methods are biased as the dimension goes to infinity, but DWD is shown to be consistently more robust than PAM, and we derive the quantitative bias of each. Keywords: Microarray Batch Adjustment, Principal Component Analysis, Exploratory Data Analysis, High Dimension Low Sample Size Data Analysis, Data Discrimination Methods, Distance Weighted Discrimination, Support Vector Machine, Prediction Analysis of Microarray, High Dimension Asymptotics
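A hedged sketch of the direction-based batch adjustment idea behind methods like DWD and the CPD: fitting those directions requires their own optimizations, so this illustration substitutes the simple normalized mean-difference direction between batches. All names are illustrative, and this is not the dissertation's procedure:

```python
import numpy as np

def adjust_along_direction(expr, batches):
    """Shift each batch along a separating direction so batch means align.

    expr    : samples x genes data matrix
    batches : per-sample batch labels (exactly two batches assumed)
    """
    labels = np.unique(batches)
    assert len(labels) == 2, "sketch handles exactly two batches"
    m0 = expr[batches == labels[0]].mean(axis=0)
    m1 = expr[batches == labels[1]].mean(axis=0)
    # Unit separating direction; a stand-in for a fitted DWD/CPD direction.
    w = (m1 - m0) / np.linalg.norm(m1 - m0)
    adjusted = expr.astype(float).copy()
    for lab in labels:
        idx = batches == lab
        # Remove the batch's mean component along w, leaving structure
        # orthogonal to w untouched.
        mean_proj = (adjusted[idx] @ w).mean()
        adjusted[idx] -= mean_proj * w
    return adjusted

# Toy example: two batches of 10 samples, second batch systematically shifted
rng = np.random.default_rng(4)
expr = np.vstack([rng.normal(0, 1, (10, 200)),
                  rng.normal(1, 1, (10, 200))])
batches = np.array(["A"] * 10 + ["B"] * 10)
adjusted = adjust_along_direction(expr, batches)
```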
Effect of pooling samples on the efficiency of comparative studies using microarrays
Many biomedical experiments are carried out by pooling individual biological samples. However, pooling can hide biological variance and give false confidence about the significance of the data. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the efficiency of sample pooling, and approximate formulas have been provided for power and sample size calculations. It is desirable to have exact formulas for these calculations and to check the approximate results against the exact ones; we show that the difference between them can be large. In this study, we quantitatively characterize the effect of pooling samples on the efficiency of microarray experiments for detecting differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replication. The formulas can be used to determine the total numbers of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which a pooled design becomes preferable to a non-pooled design can then be derived, given the unit cost of a microarray and that of a biological subject. This paper thus provides guidance on sample pooling and cost effectiveness. The formulation is outlined in the context of microarray comparative studies, but it is applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
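To make the trade-off concrete: the paper derives exact formulas, but the sketch below uses only the standard normal approximation under the usual additive model (measurement = mean + biological effect + technical error), where pooling r subjects per array divides the biological variance by r. The parameter values and function name are illustrative:

```python
from math import sqrt
from scipy.stats import norm

def pooled_design_power(delta, sigma_b, sigma_e, n_arrays, pool_size,
                        alpha=0.001):
    """Approximate two-sided power to detect a mean difference `delta`
    between two classes, each measured on `n_arrays` arrays with
    `pool_size` subjects pooled per array. Normal approximation only;
    not the paper's exact formulas.
    """
    # Pooling averages biological effects, shrinking sigma_b^2 by pool_size;
    # technical error sigma_e^2 is incurred once per array regardless.
    per_array_var = sigma_b**2 / pool_size + sigma_e**2
    se = sqrt(2 * per_array_var / n_arrays)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - delta / se) + norm.cdf(-z_crit - delta / se)

# Same number of arrays, with and without pooling (3 subjects per array)
print(pooled_design_power(1.0, sigma_b=0.8, sigma_e=0.4,
                          n_arrays=10, pool_size=1))
print(pooled_design_power(1.0, sigma_b=0.8, sigma_e=0.4,
                          n_arrays=10, pool_size=3))
```

Whether the gain in power justifies the extra subjects consumed per pool then depends on the relative unit costs of arrays and subjects, as the abstract notes.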
Physico-chemical foundations underpinning microarray and next-generation sequencing experiments
Hybridization of nucleic acids on solid surfaces is a key process in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and the individual molecular concentrations in an ensemble of nucleic acids. This research draws on many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge of, and limitations in, our physico-chemical understanding of high-throughput nucleic acid technologies. That meeting inspired this summary, which provides an overview of state-of-the-art, physico-chemically grounded approaches to modelling the hybridization of nucleic acids on solid surfaces. In addition, practical applications of current knowledge are emphasized.
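One widely used physico-chemical model in this literature is the Langmuir adsorption isotherm, which relates probe signal to target concentration and illustrates the kind of signal-to-concentration transformation the abstract describes. The sketch below, with illustrative parameter values, shows the forward model and its inversion:

```python
import numpy as np

def langmuir_signal(c, s_max, K):
    """Langmuir isotherm: expected hybridization signal at concentration c.

    s_max : saturation signal of the probe
    K     : concentration at which the probe is half-saturated
    """
    return s_max * c / (c + K)

def concentration_from_signal(s, s_max, K):
    """Invert the isotherm to estimate target concentration from signal."""
    return K * s / (s_max - s)

# Illustrative values: signals saturate at high concentration, so the
# probe response is only approximately linear at the low end.
c = np.logspace(-2, 2, 5)                      # concentrations (arb. units)
s = langmuir_signal(c, s_max=1e4, K=10.0)
recovered = concentration_from_signal(s, s_max=1e4, K=10.0)
```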
Spatial normalization improves the quality of genotype calling for Affymetrix SNP 6.0 arrays
Background: Microarray measurements are susceptible to a variety of experimental artifacts, some of which give rise to systematic biases that are spatially dependent in a unique way on each chip. It is likely that such artifacts affect many SNP arrays, but the normalization methods used in currently available genotyping algorithms make no attempt at spatial bias correction. Here, we propose an effective single-chip spatial bias removal procedure for Affymetrix 6.0 SNP arrays or platforms with similar design features. This procedure deals with both extreme and subtle biases and is intended to be applied before standard genotype calling algorithms. Results: Application of the spatial bias adjustments on HapMap samples resulted in higher genotype call rates with equal or even better accuracy for thousands of SNPs. Consequently the normalization procedure is expected to lead to more meaningful biological inferences and could be valuable for genome-wide SNP analysis. Conclusions: Spatial normalization can potentially rescue thousands of SNPs in a genetic study at the small cost of computational time. The approach is implemented in R and available from the authors upon request.
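A hedged sketch of single-chip spatial bias estimation: the paper's actual procedure is not reproduced here, so this illustration substitutes a simple moving-median surface fit, and the chip layout and window size are assumptions:

```python
import numpy as np
from scipy.ndimage import median_filter

def remove_spatial_bias(intensities, window=15):
    """Estimate and subtract a smooth spatial bias surface from one chip.

    intensities : 2-D array of log-intensities in chip coordinates
    window      : side length of the square smoothing window (assumed)
    """
    # A moving median tracks slowly varying spatial trends while staying
    # robust to genuinely high- or low-intensity probes.
    bias = median_filter(intensities, size=window)
    # Re-centre so the chip-wide mean is preserved after correction.
    return intensities - bias + bias.mean()

# Toy chip: 200 x 200 probes with a synthetic spatial gradient
rng = np.random.default_rng(2)
chip = rng.normal(size=(200, 200))
chip += np.linspace(0, 2, 200)[:, None]        # spatial artifact
corrected = remove_spatial_bias(chip)
```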
A framework for the informed normalization of printed microarrays
Microarray technology has become an essential part of contemporary molecular biological research. Central to any microarray experiment is normalization, a form of data processing aimed at removing technical noise while preserving biological meaning, thereby allowing more accurate interpretation of the data. The statistics underlying many normalization methods can appear overwhelming to microarray newcomers, a situation further compounded by a lack of accessible, non-statistical descriptions of common approaches to normalization. Normalization strategies significantly affect the analytical outcome of a microarray experiment, so it is important that the statistical assumptions underlying normalization algorithms are understood and met before researchers embark on processing raw microarray data. Many of these assumptions pertain only to whole-genome arrays, and are not valid for custom or directed microarrays. A thorough diagnostic evaluation of the nature and extent of the technical noise affecting individual arrays is paramount to the success of any chosen normalization strategy. Here we suggest an approach to normalization based on extensive stepwise exploration and diagnostic assessment of the data before and after normalization. Common data visualization and diagnostic approaches are highlighted, followed by descriptions of popular normalization methods and the assumptions they are based on, within the context of removing general technical artefacts associated with microarray data.
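As one example of the diagnose-then-normalize workflow advocated above, here is a minimal MA-space loess normalization for a two-channel array, a standard technique sketched under assumed data names rather than the authors' specific pipeline:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def ma_loess_normalize(red, green, frac=0.3):
    """Loess-normalize a two-channel array in MA space.

    M = log-ratio, A = mean log-intensity. The fitted M-vs-A trend is
    treated as technical (e.g. dye bias) and subtracted so corrected
    log-ratios centre on zero across intensities.
    """
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    trend = lowess(m, a, frac=frac, return_sorted=False)
    return m - trend, a

# Toy two-channel data with an intensity-dependent dye bias
rng = np.random.default_rng(3)
true = rng.lognormal(8, 1, size=5000)
green = true * rng.lognormal(0, 0.1, size=5000)
red = true * rng.lognormal(0.2, 0.1, size=5000) * (true ** 0.02)
m_norm, a = ma_loess_normalize(red, green)
```

Plotting M against A before and after such a fit is exactly the kind of diagnostic check the abstract recommends for verifying that a method's assumptions hold for a given array.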