7 research outputs found

    Incorporating Nonlinear Relationships in Microarray Missing Value Imputation

    Microarray gene expression data often contain missing values. Accurate estimation of the missing values is important for downstream data analyses that require complete data. Nonlinear relationships between gene expression levels have not been well utilized in missing value imputation. We propose an imputation scheme based on nonlinear dependencies between genes. Through simulations based on real microarray data, we show that incorporating nonlinear relationships could improve the accuracy of missing value imputation, both in terms of normalized root mean squared error (NRMSE) and in terms of preserving the list of significant genes in statistical testing. In addition, we studied the impact of artificial dependencies introduced by data normalization on the simulation results. Our results suggest that methods relying on global correlation structures may yield overly optimistic simulation results when the data have been subjected to row-wise (gene-wise) mean removal.
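The abstract does not spell out the imputation scheme, so the following is only a hypothetical toy sketch of the underlying idea: filling a gene's masked values from a single predictor gene with a linear fit versus a quadratic (nonlinear) fit, then scoring both with NRMSE. NRMSE is assumed here to be the root mean squared error over the imputed entries divided by the standard deviation of the true values, a common definition in the microarray imputation literature. The function names (`nrmse`, `polyfit`, `impute_from_predictor`) and the toy data are illustrative assumptions, not the authors' method.

```python
import math
from statistics import pstdev


def nrmse(true_vals, imputed_vals):
    """Assumed NRMSE: RMSE over imputed entries / population std of the true values."""
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / len(true_vals)
    return math.sqrt(mse) / pstdev(true_vals)


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] solved from the normal equations."""
    m = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, m))) / a[i][i]
    return coef


def impute_from_predictor(x, y, degree):
    """Fill None entries of gene y from a polynomial fit of y on gene x over the observed arrays."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    coef = polyfit([p[0] for p in obs], [p[1] for p in obs], degree)
    return [sum(c * xi ** k for k, c in enumerate(coef)) if yi is None else yi
            for xi, yi in zip(x, y)]


# Toy data: gene y depends on gene x quadratically across ten arrays.
x = [-3, -2, -1, 0, 1, 2, 3, 1.5, -1.5, 2.5]
y_true = [xi ** 2 for xi in x]
missing = [3, 7]  # arrays whose y value is masked
y_obs = [None if i in missing else v for i, v in enumerate(y_true)]

truth = [y_true[i] for i in missing]
lin = impute_from_predictor(x, y_obs, degree=1)
quad = impute_from_predictor(x, y_obs, degree=2)
nrmse_linear = nrmse(truth, [lin[i] for i in missing])
nrmse_quadratic = nrmse(truth, [quad[i] for i in missing])
print(f"linear NRMSE={nrmse_linear:.3f}, quadratic NRMSE={nrmse_quadratic:.3f}")
```

On this noise-free toy the quadratic fit recovers the masked values essentially exactly while the linear fit misses them badly; on real, noisy expression data the gap would be much smaller.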

    Gene Expression Analysis Methods on Microarray Data: A Review

    In recent years, a new type of experiment has been changing the way biologists and other specialists analyze many problems. These are called high-throughput experiments, and they differ from those performed some years ago mainly in the quantity of data they produce. Thanks to the technology generically known as microarrays, it is now possible to study the behavior of all the genes of an organism under different conditions in a single experiment. The data generated by these experiments may consist of thousands to millions of variables, and they pose many challenges to the scientists who have to analyze them. Many of these challenges are statistical in nature and will be the focus of this review. Many types of microarrays have been developed to answer different biological questions, and some of them will be explained later. For the sake of simplicity, we start with the best-known ones: expression microarrays.

    Meta-Analysis of Large-Scale Toxicogenomic Data Finds Neuronal Regeneration Related Protein and Cathepsin D to Be Novel Biomarkers of Drug-Induced Toxicity

    Undesirable toxicity is one of the main reasons for withdrawing drugs from the market or eliminating them as candidates in clinical trials. Although numerous studies have attempted to identify biomarkers capable of predicting pharmacotoxicity, few have attempted to discover robust biomarkers that are coherent across species and experimental settings. To identify such biomarkers, we conducted meta-analyses of massive gene expression profiles for 6,567 in vivo rat samples and 453 compounds. After applying rigorous feature reduction procedures, our analyses identified 18 genes related to toxicity in comparisons of untreated versus treated and innocuous versus toxic specimens of kidney, liver and heart tissue. We then independently validated these genes in human cell lines and found several of them to be coherently regulated in both in vivo rat specimens and human cell lines. Specifically, mRNA expression of neuronal regeneration-related protein was robustly down-regulated in both liver and kidney cells, while mRNA expression of cathepsin D was commonly up-regulated in liver cells after exposure to toxic concentrations of chemical compounds. Use of these novel toxicity biomarkers may enhance the efficiency of screening for safe lead compounds in early-phase drug development prior to animal testing.
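The abstract does not describe its statistical procedure, so the sketch below swaps in a standard technique for illustration only: an inverse-variance fixed-effect meta-analysis that pools one gene's per-study log fold changes, plus a check that the direction of regulation agrees across studies, echoing the abstract's notion of coherent regulation in its simplest form. The function names (`fixed_effect_meta`, `direction_consistent`) and all input numbers are hypothetical; they are not values from the study.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    Returns (pooled_effect, pooled_se, z, two_sided_p)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under a standard normal
    return pooled, pooled_se, z, p


def direction_consistent(effects):
    """True when every study moves the gene the same way (all up or all down)."""
    return all(e > 0 for e in effects) or all(e < 0 for e in effects)


# Hypothetical log2 fold changes (toxic vs. innocuous) for one gene in three studies.
lfc = [-1.2, -0.9, -1.0]
se = [0.3, 0.25, 0.4]
pooled, pooled_se, z, p = fixed_effect_meta(lfc, se)
print(f"pooled={pooled:.3f} se={pooled_se:.3f} z={z:.2f} p={p:.2e}",
      "consistent" if direction_consistent(lfc) else "inconsistent")
```

A fixed-effect model assumes a single true effect shared by all studies; when species, tissues and protocols differ, a random-effects model is usually the more cautious choice.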

    Deep Learning Based Approaches for Imputation of Time Series Models

    Market price forecasting models for fresh produce (FP) are crucial to protect retailers and consumers from high FP prices. However, using the data for forecasting is obstructed by missing values. It is therefore imperative to develop models that estimate those missing values and thereby enable effective forecasting. This problem is usually tackled with conventional methods that introduce bias into the system, which in turn makes the forecasting results unreliable. In this thesis, therefore, numerous imputation models are developed alongside a framework that enables the user to impute any time series with the optimal models. The thesis also develops novel forecasting models, which serve as a gauging mechanism for each tested imputation model; these forecasting models can also be used as standalone models. The growth and success of deep learning have largely been attributed to the availability of big data and high-end computational power, along with theoretical advances. In this thesis, multiple deep learning models are built both for imputing missing values and for forecasting. The data used to build these models comprise California weather data, California strawberry yield, California strawberry farm-gate prices, USA corn yield data, daily Brent crude oil prices, and a synthetic time series dataset. For imputation, mean squared error (MSE) is used as the metric to gauge performance, whereas for forecasting a new aggregated error measure (AGM) is proposed that combines mean absolute error (MAE), MSE, and R², the coefficient of determination. Different models are found to be optimal for different time series, and these are presented in the recommendation framework developed in the thesis. Ensemble techniques such as a voting regressor and a stacking ML ensemble are then used to improve the imputation results. The experiments show that the voting regressor yields the best imputation results. To gauge the robustness of the imputation framework, different time series are assessed. The imputed data are used for forecasting, and the forecasting results are compared with deep and non-deep learning models available on the market. The results show that the best imputation models recommended from the synthetic datasets are in fact the best for the tested incomplete real datasets, with a mean absolute scaled error (MASE) below 1, i.e., better than the naive forecasting model. The best imputation models are also found to reduce forecasting errors more than other deep and non-deep imputation models found in the literature and on the market.
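The thesis's imputation models are deep learning models that the abstract does not detail, so the sketch below only illustrates two of its ideas with simple stand-ins. First, a voting ensemble averages the fills proposed by three classical base imputers (linear interpolation, last observation carried forward, next observation carried backward), in the same way that a voting regressor such as scikit-learn's `VotingRegressor` averages its base models' predictions. Second, it computes the MAE, MSE, R² and MASE metrics the abstract names, with MASE scaled by the in-sample MAE of the one-step naive forecast, so a value below 1 means the forecast beats that naive model. The AGM formula is not given in the abstract and is not reproduced. All function names are hypothetical, and the code assumes every series has at least one observed value.

```python
from statistics import mean


def linear_interp(series):
    """Fill None gaps by straight-line interpolation between the nearest observed neighbours."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for i, v in enumerate(series):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:            # leading gap: copy the first observation
                out[i] = series[right]
            elif right is None:         # trailing gap: copy the last observation
                out[i] = series[left]
            else:
                frac = (i - left) / (right - left)
                out[i] = series[left] + frac * (series[right] - series[left])
    return out


def locf(series):
    """Last observation carried forward (leading gaps take the first observed value)."""
    last = next(v for v in series if v is not None)
    out = []
    for v in series:
        last = v if v is not None else last
        out.append(last)
    return out


def nocb(series):
    """Next observation carried backward: LOCF applied to the reversed series."""
    return locf(series[::-1])[::-1]


def voting_impute(series, imputers=(linear_interp, locf, nocb)):
    """Average the fills proposed by several base imputers at each missing position."""
    fills = [imp(series) for imp in imputers]
    return [v if v is not None else mean(f[i] for f in fills) for i, v in enumerate(series)]


def mae(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mse(actual, predicted):
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def r2(actual, predicted):
    mu = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mu) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mase(train, actual, predicted):
    """Forecast MAE scaled by the in-sample MAE of the one-step naive forecast."""
    naive_mae = mean(abs(train[t] - train[t - 1]) for t in range(1, len(train)))
    return mae(actual, predicted) / naive_mae


series = [10, 12, None, None, 18, 20]
filled = voting_impute(series)
print(filled)

train = [10, 12, 11, 13]
actual, forecast = [14, 15], [13, 15]
print(mae(actual, forecast), mse(actual, forecast), mase(train, actual, forecast))
```

Averaging keeps any single imputer's bias from dominating; a weighted average, as `VotingRegressor` allows through its `weights` parameter, is a natural extension.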
