Incorporating Nonlinear Relationships in Microarray Missing Value Imputation
Microarray gene expression data often contain missing values. Accurate estimation of the missing values is important for downstream data analyses that require complete data. Nonlinear relationships between gene expression levels have not been well utilized in missing value imputation. We propose an imputation scheme based on nonlinear dependencies between genes. Using simulations based on real microarray data, we show that incorporating nonlinear relationships could improve the accuracy of missing value imputation, both in terms of normalized root mean squared error and in terms of preserving the list of significant genes in statistical testing. In addition, we studied the impact of artificial dependencies introduced by data normalization on the simulation results. Our results suggest that methods relying on global correlation structures may yield overly optimistic simulation results when the data have been subjected to row-wise (gene-wise) mean removal.
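As a toy illustration of why nonlinear dependencies can matter for imputation, the sketch below masks about 10% of a hypothetical target gene that depends quadratically on a hypothetical predictor gene, imputes the masked entries with a linear and with a quadratic fit on the predictor, and scores both with NRMSE. This is not the authors' imputation scheme; the simulated data, the polynomial fits, and the NRMSE normalization (RMSE on the masked entries divided by the standard deviation of their true values) are assumptions chosen for illustration.

```python
import numpy as np

def nrmse(true_vals, imputed_vals):
    # One common definition: RMSE on the masked entries divided by the
    # standard deviation of their true values (assumed normalization).
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    rmse = np.sqrt(np.mean((imputed_vals - true_vals) ** 2))
    return rmse / np.std(true_vals)

def impute_from_predictor(x, y, missing, degree):
    # Fit y ~ polynomial(x) on the observed entries, then predict the
    # masked entries of y. degree=1 is linear; degree=2 allows a simple
    # nonlinear (quadratic) dependency.
    coeffs = np.polyfit(x[~missing], y[~missing], deg=degree)
    return np.polyval(coeffs, x[missing])

rng = np.random.default_rng(0)
n_samples = 200
x = rng.normal(size=n_samples)                        # hypothetical predictor gene
y = x ** 2 + rng.normal(scale=0.2, size=n_samples)    # hypothetical target gene, nonlinear in x
missing = rng.random(n_samples) < 0.1                 # mask about 10% of y

err_linear = nrmse(y[missing], impute_from_predictor(x, y, missing, degree=1))
err_quadratic = nrmse(y[missing], impute_from_predictor(x, y, missing, degree=2))
print(f"linear NRMSE:    {err_linear:.3f}")
print(f"quadratic NRMSE: {err_quadratic:.3f}")
```

Because x and x² are essentially uncorrelated for a symmetric x, the linear fit carries almost no information about the masked values, whereas the quadratic fit recovers them up to the noise level.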
BUS Vignette
GOAL: The BUS package allows the computation of two types of similarity (correlation [Sokal, 2003] and mutual information [Cover, 2001]) for two different goals: (i) identification of the similarity among the activities of molecules sampled across different experiments (we name this option Unsupervised, U), and (ii) identification of the similarity between such molecules and other types of information (clinical, demographic, etc.; we name this option …
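BUS is distributed as an R/Bioconductor package; the Python sketch below neither uses nor reproduces its API. It only illustrates the two similarity measures named in the vignette for a pair of expression profiles: Pearson correlation and a plug-in mutual information estimate from a 2-D histogram. The bin count, the simulated profiles, and the function names are assumptions.

```python
import numpy as np

def pearson_correlation(a, b):
    # Linear similarity between two expression profiles.
    return float(np.corrcoef(a, b)[0, 1])

def mutual_information(a, b, bins=8):
    # Plug-in mutual information (in nats) from a 2-D histogram.
    # The bin count is an arbitrary choice and biases the estimate upward.
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nonzero = pxy > 0
    ratio = pxy[nonzero] / (px * py)[nonzero]
    return float(np.sum(pxy[nonzero] * np.log(ratio)))

# Hypothetical expression profiles of three molecules across 500 experiments.
rng = np.random.default_rng(42)
gene_a = rng.normal(size=500)
gene_b = gene_a ** 2 + rng.normal(scale=0.1, size=500)   # nonlinear dependence on gene_a
gene_c = rng.normal(size=500)                            # independent of gene_a

for name, other in [("gene_b", gene_b), ("gene_c", gene_c)]:
    print(name,
          f"corr={pearson_correlation(gene_a, other):+.2f}",
          f"MI={mutual_information(gene_a, other):.3f}")
```

For the quadratic pair the correlation is expected to be near zero while the mutual information stays well above the independent baseline, which shows how the two measures can disagree.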
MeDiA: Mean Distance Association and Its Applications in Nonlinear Gene Set Analysis
Probabilistic association discovery aims to identify the association between random vectors, regardless of the number of variables involved or of whether the functional form is linear or nonlinear. Recently, applications to high-dimensional data have generated growing interest in probabilistic association discovery. We developed a framework based on functions on the observation graph, named MeDiA (Mean Distance Association). We generalize its properties to a group of functions on the observation graph. This group of functions encompasses major existing methods in association discovery, e.g. mutual information and Brownian covariance, and can be extended to more complicated forms. We conducted a numerical comparison of the statistical power of related methods under multiple scenarios. We further demonstrated the application of MeDiA as a gene set analysis method that captures a broader range of responses than traditional gene set analysis methods.
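The abstract states that the MeDiA family encompasses Brownian covariance. As a point of reference, and not an implementation of MeDiA itself, the sketch below computes the sample distance (Brownian) covariance between a multivariate "gene set" matrix and a one-dimensional outcome, with a permutation p-value. The simulated data, the number of permutations, and the function names are illustrative assumptions.

```python
import numpy as np

def double_centered_distances(z):
    # Pairwise Euclidean distances between observations (rows),
    # double-centered so that row and column means are zero.
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_covariance_sq(x, y):
    # V-statistic estimate of the squared distance (Brownian) covariance.
    return float((double_centered_distances(x) * double_centered_distances(y)).mean())

def permutation_pvalue(x, y, n_perm=199, seed=0):
    # Permuting the observations of y breaks any x-y association; permuting
    # rows and columns of its centered distance matrix is equivalent.
    rng = np.random.default_rng(seed)
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    observed = (a * b).mean()
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(b.shape[0])
        if (a * b[np.ix_(p, p)]).mean() >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: a 5-gene "gene set" measured on 100 samples and a
# one-dimensional outcome that depends nonlinearly on one of the genes.
rng = np.random.default_rng(1)
gene_set = rng.normal(size=(100, 5))
outcome = np.exp(gene_set[:, 0]) + rng.normal(scale=0.3, size=100)
unrelated = rng.normal(size=100)

print("dependent outcome p =", permutation_pvalue(gene_set, outcome))
print("unrelated outcome p =", permutation_pvalue(gene_set, unrelated))
```

Working on whole distance matrices is what lets this kind of statistic handle a random vector (the gene set) rather than one gene at a time.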
Network of interactions among celiac disease pathways.
A red edge indicates that the interaction between the connected pathways is amplified in individuals with the disease. A blue edge indicates that the interaction is suppressed in individuals with the disease.
Gene sets associated with the two-dimensional clinical outcome based on MeDiA.
*Superscripts next to the GO terms are provided for easy reference from the main text.
Random samples generated from an independent bivariate normal distribution (left) and a mixture of bivariate normal distributions with covariance ±0.8 (right).
The dashed lines connect pairs of observations that are nearest neighbors.
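A sketch for generating samples like those in this figure and finding the nearest-neighbor pairs that the dashed lines represent. It assumes equal mixture weights and unit marginal variances, and reads "±0.8" as the covariance of the two mixture components; the sample size, seed, and function names are arbitrary.

```python
import numpy as np

def independent_sample(n, rng):
    # Independent bivariate standard normal.
    return rng.standard_normal((n, 2))

def mixture_sample(n, rng, rho=0.8):
    # Equal-weight mixture of two bivariate normals with unit variances
    # and covariance +rho or -rho (an assumed reading of "±0.8").
    signs = rng.choice([-1.0, 1.0], size=n)
    z = rng.standard_normal((n, 2))
    x = z[:, 0]
    y = signs * rho * x + np.sqrt(1.0 - rho ** 2) * z[:, 1]
    return np.column_stack([x, y])

def nearest_neighbor_edges(points):
    # Undirected edges {i, j} where j is the nearest neighbor of i
    # (the pairs the dashed lines would connect).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nearest = d.argmin(axis=1)
    return {tuple(sorted((i, int(j)))) for i, j in enumerate(nearest)}

rng = np.random.default_rng(2015)
left = independent_sample(100, rng)
right = mixture_sample(100, rng)
for label, pts in [("independent", left), ("mixture", right)]:
    r = np.corrcoef(pts[:, 0], pts[:, 1])[0, 1]
    print(f"{label}: corr={r:+.2f}, nearest-neighbor edges={len(nearest_neighbor_edges(pts))}")
```

With equal weights on components of covariance +0.8 and -0.8, the overall covariance of the mixture is zero, so Pearson correlation alone cannot separate the two panels even though the right-hand variables are dependent.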
Comparison between the independent bivariate normal distribution and the mixture normal distribution in Fig 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0124620#pone.0124620.g001).