Study of meta-analysis strategies for network inference using information-theoretic approaches
Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has accumulated in public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than traditional approaches focused on individual datasets, which typically suffer from experimental bias and small sample sizes.
To date, there are mainly two strategies for this problem: the first ("data merging") merges all datasets together and then infers a single GRN, whereas the other ("networks ensemble") infers a GRN from each dataset separately and then aggregates them using an ensemble rule (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches has been lacking.
In this paper, we evaluate the performance of the meta-analysis approaches mentioned above with a systematic set of experiments based on in silico benchmarks. Furthermore, we present a new meta-analysis approach for inferring GRNs from multiple studies. Our proposed approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating the matrices of pairwise measures from every dataset, then extracting the network from the resulting meta-matrix.
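A minimal sketch of the two-step matrix-aggregation idea described above. The rank-average aggregation rule, the fixed threshold, and all function names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def aggregate_pairwise_matrices(matrices):
    """Average rank-transformed pairwise scores across datasets.

    Each matrix holds a pairwise measure (e.g. correlation or mutual
    information) computed on one dataset; rank-transforming first makes
    the scores comparable across studies.
    """
    ranked = []
    for m in matrices:
        flat = m.ravel()
        ranks = flat.argsort().argsort().astype(float) / (flat.size - 1)
        ranked.append(ranks.reshape(m.shape))
    return np.mean(ranked, axis=0)

def extract_network(meta_matrix, threshold=0.9):
    """Keep an edge wherever the aggregated score exceeds the threshold."""
    adj = meta_matrix >= threshold
    np.fill_diagonal(adj, False)
    return adj

# Example: three studies measuring the same 100 genes
rng = np.random.default_rng(0)
studies = [np.abs(np.corrcoef(rng.normal(size=(20, 100)), rowvar=False))
           for _ in range(3)]
meta = aggregate_pairwise_matrices(studies)
network = extract_network(meta)
```

In practice the aggregation rule (mean, ranksum, weightsum) and the network-extraction step would be chosen to match the pairwise measure being used.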
A comparative study of different strategies of batch effect removal in microarray data: a case study of three datasets
Batch effects refer to the systematic non-biological variability introduced by experimental design and sample processing in microarray experiments. They are a common issue in microarray data and, if ignored, can bias the analysis. Many batch effect removal methods have been developed. Previous comparative work has focused on their effectiveness at removing batch effects and their impact on downstream classification analysis. The most common type of analysis for microarray data, however, is differential expression (DE) analysis, which identifies markers significantly associated with the outcome of interest, yet no study has examined the impact of these methods on downstream DE analysis. In this project, we investigated the performance of five popular batch effect removal methods (mean-centering, ComBat_p, ComBat_n, SVA, and ratio-based methods) on batch effect reduction and their impact on DE analysis, using three experimental datasets with different sources of batch effects. We found that the performance of these methods is data-dependent: the simple mean-centering method performed reasonably well in all three datasets, but more complicated algorithms such as ComBat can be unstable on certain datasets and should be applied with caution. Given a new dataset, we recommend either using the mean-centering method or, if possible, carefully investigating a few different batch removal methods and choosing the one that is best for the data. This study has important public health significance because better handling of batch effects in microarray data can reduce biased results and lead to improved biomarker identification.
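As a concrete illustration of the mean-centering method discussed above, here is a minimal sketch; the genes-by-samples matrix layout and all names are assumptions for illustration, not taken from the study:

```python
import numpy as np

def mean_center_batches(expr, batches):
    """Remove batch effects by centering each gene within each batch.

    expr    : genes x samples expression matrix
    batches : per-sample batch labels, length = number of samples
    """
    corrected = expr.astype(float).copy()
    for b in np.unique(batches):
        idx = batches == b
        # Subtract each gene's mean within the batch, removing the
        # batch-specific offset for that gene.
        corrected[:, idx] -= corrected[:, idx].mean(axis=1, keepdims=True)
    # Restore the overall gene means so values stay on the original scale.
    corrected += expr.mean(axis=1, keepdims=True)
    return corrected

# Toy example: 1,000 genes, 12 samples in two batches, second batch shifted
rng = np.random.default_rng(1)
expr = rng.normal(size=(1000, 12)) + np.r_[np.zeros(6), np.ones(6)]
batches = np.array(["A"] * 6 + ["B"] * 6)
adjusted = mean_center_batches(expr, batches)
```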
New statistical tools for microarray data and comparison with existing tools
Microarray technologies have gained tremendous interest from researchers in recent years. The problem we are interested in is how to combine two microarray data sets that have systematic batch differences. The motivation for combining them is that the combined data set contains more samples and therefore gives improved statistical power. This dissertation covers two topics in microarray batch adjustment. The first topic is the visualization of paired High Dimension Low Sample Size (HDLSS) data. We propose two interesting directions: the Canonical Parallel and the Canonical Orthogonal Directions (CPD & COD). This pair of directions gives an insightful 2-d parallel view for understanding paired HDLSS data sets, and the CPD can be used for adjusting batch differences. An application to the NCI60 cell lines data shows good performance of this method. The second topic is a comparison of three commonly used batch adjustment methods: the Support Vector Machine (SVM), Distance Weighted Discrimination (DWD), and Prediction Analysis of Microarray (PAM). We show that SVM has some serious problems for HDLSS data, and that DWD is much more robust than PAM under the Unbalanced Subgroup Model. The mathematical studies in this dissertation are in the area of HDLSS asymptotics, in the sense that the sample sizes are fixed and the dimension (the number of genes) goes to infinity. Hall et al. (2004) studied the geometric structure of the data when the dimension is high; here we study the geometric structure under more complicated models. For the first topic, we give conditions for the consistency and the strong inconsistency of the CPD under the Linear Shift Model, which reflects the effects of systematic biases and random measurement errors. For the second topic, we compare PAM and DWD under the Unbalanced Subgroup Model: both methods are biased as the dimension goes to infinity, but DWD is shown to be consistently more robust than PAM, and we derive the quantitative bias of each. Keywords: Microarray Batch Adjustment, Principal Component Analysis, Exploratory Data Analysis, High Dimension Low Sample Size Data Analysis, Data Discrimination Methods, Distance Weighted Discrimination, Support Vector Machine, Prediction Analysis of Microarray, High Dimension Asymptotics
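A hedged sketch of the direction-based batch adjustment idea behind methods like DWD and the CPD: fitting those directions requires their own optimizations, so this illustration substitutes the simple normalized mean-difference direction between batches. All names are illustrative, and this is not the dissertation's procedure:

```python
import numpy as np

def adjust_along_direction(expr, batches):
    """Shift each batch along a separating direction so batch means align.

    expr    : samples x genes data matrix
    batches : per-sample batch labels (exactly two batches assumed)
    """
    labels = np.unique(batches)
    assert len(labels) == 2, "sketch handles exactly two batches"
    m0 = expr[batches == labels[0]].mean(axis=0)
    m1 = expr[batches == labels[1]].mean(axis=0)
    # Unit separating direction; a stand-in for a fitted DWD/CPD direction.
    w = (m1 - m0) / np.linalg.norm(m1 - m0)
    adjusted = expr.astype(float).copy()
    for lab in labels:
        idx = batches == lab
        # Remove the batch's mean component along w, leaving structure
        # orthogonal to w untouched.
        mean_proj = (adjusted[idx] @ w).mean()
        adjusted[idx] -= mean_proj * w
    return adjusted

# Toy example: two batches of 10 samples, second batch systematically shifted
rng = np.random.default_rng(4)
expr = np.vstack([rng.normal(0, 1, (10, 200)),
                  rng.normal(1, 1, (10, 200))])
batches = np.array(["A"] * 10 + ["B"] * 10)
adjusted = adjust_along_direction(expr, batches)
```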
Effect of pooling samples on the efficiency of comparative studies using microarrays
Many biomedical experiments are carried out by pooling individual biological samples. However, pooling can hide biological variance and give false confidence about the significance of the data. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the efficiency of sample pooling, and approximate formulas have been provided for power and sample size calculations. It is desirable to have exact formulas for these calculations and to check the approximate results against the exact ones; we show that the difference between them can be large. In this study, we quantitatively characterize the effect of pooling samples on the efficiency of microarray experiments for detecting differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replication. The formulas can be used to determine the total numbers of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which a pooled design becomes preferable to a non-pooled design can then be derived, given the unit cost of a microarray and that of a biological subject. This paper thus provides guidance on sample pooling and cost effectiveness. The formulation is outlined in the context of microarray comparative studies, but it is applicable to a wide range of biomedical comparative studies where sample pooling may be involved.
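To make the trade-off concrete: the paper derives exact formulas, but the sketch below uses only the standard normal approximation under the usual additive model (measurement = mean + biological effect + technical error), where pooling r subjects per array divides the biological variance by r. The parameter values and function name are illustrative:

```python
from math import sqrt
from scipy.stats import norm

def pooled_design_power(delta, sigma_b, sigma_e, n_arrays, pool_size,
                        alpha=0.001):
    """Approximate two-sided power to detect a mean difference `delta`
    between two classes, each measured on `n_arrays` arrays with
    `pool_size` subjects pooled per array. Normal approximation only;
    not the paper's exact formulas.
    """
    # Pooling averages biological effects, shrinking sigma_b^2 by pool_size;
    # technical error sigma_e^2 is incurred once per array regardless.
    per_array_var = sigma_b**2 / pool_size + sigma_e**2
    se = sqrt(2 * per_array_var / n_arrays)
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - delta / se) + norm.cdf(-z_crit - delta / se)

# Same number of arrays, with and without pooling (3 subjects per array)
print(pooled_design_power(1.0, sigma_b=0.8, sigma_e=0.4,
                          n_arrays=10, pool_size=1))
print(pooled_design_power(1.0, sigma_b=0.8, sigma_e=0.4,
                          n_arrays=10, pool_size=3))
```

Whether the gain in power justifies the extra subjects consumed per pool then depends on the relative unit costs of arrays and subjects, as the abstract notes.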
Physico-chemical foundations underpinning microarray and next-generation sequencing experiments
Hybridization of nucleic acids on solid surfaces is a key process in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and the individual molecular concentrations in an ensemble of nucleic acids. This research draws on many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge of, and limitations in, our physico-chemical understanding of high-throughput nucleic acid technologies. That meeting inspired this summary, which provides an overview of state-of-the-art, physico-chemically grounded approaches to modelling the hybridization of nucleic acids on solid surfaces. In addition, practical applications of current knowledge are emphasized.
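One widely used physico-chemical model in this literature is the Langmuir adsorption isotherm, which relates probe signal to target concentration and illustrates the kind of signal-to-concentration transformation the abstract describes. The sketch below, with illustrative parameter values, shows the forward model and its inversion:

```python
import numpy as np

def langmuir_signal(c, s_max, K):
    """Langmuir isotherm: expected hybridization signal at concentration c.

    s_max : saturation signal of the probe
    K     : concentration at which the probe is half-saturated
    """
    return s_max * c / (c + K)

def concentration_from_signal(s, s_max, K):
    """Invert the isotherm to estimate target concentration from signal."""
    return K * s / (s_max - s)

# Illustrative values: signals saturate at high concentration, so the
# probe response is only approximately linear at the low end.
c = np.logspace(-2, 2, 5)                      # concentrations (arb. units)
s = langmuir_signal(c, s_max=1e4, K=10.0)
recovered = concentration_from_signal(s, s_max=1e4, K=10.0)
```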
Spatial normalization improves the quality of genotype calling for Affymetrix SNP 6.0 arrays
Background: Microarray measurements are susceptible to a variety of experimental artifacts, some of which give rise to systematic biases that are spatially dependent in a unique way on each chip. It is likely that such artifacts affect many SNP arrays, but the normalization methods used in currently available genotyping algorithms make no attempt at spatial bias correction. Here, we propose an effective single-chip spatial bias removal procedure for Affymetrix 6.0 SNP arrays or platforms with similar design features. This procedure deals with both extreme and subtle biases and is intended to be applied before standard genotype calling algorithms. Results: Application of the spatial bias adjustments on HapMap samples resulted in higher genotype call rates with equal or even better accuracy for thousands of SNPs. Consequently the normalization procedure is expected to lead to more meaningful biological inferences and could be valuable for genome-wide SNP analysis. Conclusions: Spatial normalization can potentially rescue thousands of SNPs in a genetic study at the small cost of computational time. The approach is implemented in R and available from the authors upon request.
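A hedged sketch of single-chip spatial bias estimation: the paper's actual procedure is not reproduced here, so this illustration substitutes a simple moving-median surface fit, and the chip layout and window size are assumptions:

```python
import numpy as np
from scipy.ndimage import median_filter

def remove_spatial_bias(intensities, window=15):
    """Estimate and subtract a smooth spatial bias surface from one chip.

    intensities : 2-D array of log-intensities in chip coordinates
    window      : side length of the square smoothing window (assumed)
    """
    # A moving median tracks slowly varying spatial trends while staying
    # robust to genuinely high- or low-intensity probes.
    bias = median_filter(intensities, size=window)
    # Re-centre so the chip-wide mean is preserved after correction.
    return intensities - bias + bias.mean()

# Toy chip: 200 x 200 probes with a synthetic spatial gradient
rng = np.random.default_rng(2)
chip = rng.normal(size=(200, 200))
chip += np.linspace(0, 2, 200)[:, None]        # spatial artifact
corrected = remove_spatial_bias(chip)
```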
A framework for the informed normalization of printed microarrays
Microarray technology has become an essential part of contemporary molecular biological research. Central to any microarray experiment is normalization, a form of data processing aimed at removing technical noise while preserving biological meaning, thereby allowing more accurate interpretation of the data. The statistics underlying many normalization methods can appear overwhelming to microarray newcomers, a situation further compounded by a lack of accessible, non-statistical descriptions of common approaches to normalization. Normalization strategies significantly affect the analytical outcome of a microarray experiment, so it is important that the statistical assumptions underlying normalization algorithms are understood and met before researchers embark on processing raw microarray data. Many of these assumptions pertain only to whole-genome arrays, and are not valid for custom or directed microarrays. A thorough diagnostic evaluation of the nature and extent of the technical noise affecting individual arrays is paramount to the success of any chosen normalization strategy. Here we suggest an approach to normalization based on extensive stepwise exploration and diagnostic assessment of the data before and after normalization. Common data visualization and diagnostic approaches are highlighted, followed by descriptions of popular normalization methods and the assumptions they are based on, within the context of removing general technical artefacts associated with microarray data.
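As one example of the diagnose-then-normalize workflow advocated above, here is a minimal MA-space loess normalization for a two-channel array, a standard technique sketched under assumed data names rather than the authors' specific pipeline:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def ma_loess_normalize(red, green, frac=0.3):
    """Loess-normalize a two-channel array in MA space.

    M = log-ratio, A = mean log-intensity. The fitted M-vs-A trend is
    treated as technical (e.g. dye bias) and subtracted so corrected
    log-ratios centre on zero across intensities.
    """
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    trend = lowess(m, a, frac=frac, return_sorted=False)
    return m - trend, a

# Toy two-channel data with an intensity-dependent dye bias
rng = np.random.default_rng(3)
true = rng.lognormal(8, 1, size=5000)
green = true * rng.lognormal(0, 0.1, size=5000)
red = true * rng.lognormal(0.2, 0.1, size=5000) * (true ** 0.02)
m_norm, a = ma_loess_normalize(red, green)
```

Plotting M against A before and after such a fit is exactly the kind of diagnostic check the abstract recommends for verifying that a method's assumptions hold for a given array.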