46 research outputs found

    Borrowing information across genes and experiments for improved error variance estimation in microarray data analysis and statistical inferences for gene expression heterosis

    The advancement in microarray technology enables the simultaneous measurement of expression levels of thousands of genes. However, due to the relatively high cost of making a replicate in a microarray experiment, the number of replicates in a single experiment is typically small. This results in the small n, large p problem for statistical inferences, where there are gene expression measurements for many genes, but only a few biological replicates (or observations) for each gene. In this dissertation, we develop statistical models and methods for microarray data to borrow information across genes and/or even across experiments to improve statistical inferences for specific biological questions. In Chapter 2, we develop statistical methods to improve the estimation of gene expression error variances. Good estimation of error variances is crucial for detecting differentially expressed genes (genes that differ in mean expression level across treatments or conditions of interest). Since the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. 
Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches. In Chapter 3, we develop statistical methods to improve the estimation and testing of gene expression heterosis. Heterosis, also known as the hybrid vigor, refers to the superior phenotype of the hybrid offspring relative to its two inbred parents. Though the heterosis phenomenon has been extensively utilized in agriculture for over a century, the molecular basis is still unknown. In an effort to understand the basic mechanisms responsible for the phenotypic heterosis at the molecular level, researchers have begun to compare expression levels of thousands of genes in the parental inbred lines and their offspring to find genes that exhibit gene expression heterosis. In our study, we focus on three types of gene expression heterosis: high-parent heterosis, low-parent heterosis and mid-parent heterosis. Currently, the sample average method is the most commonly used method for estimation and testing of gene expression heterosis. However, the sample average estimators underestimate high-parent heterosis and low-parent heterosis, which consequently leads to loss of power in hypothesis testing. Though the sample average estimator for mid-parent heterosis is unbiased, with only a few replicates in a typical microarray experiment, estimation is highly variable. To improve the estimation and testing of all three types of gene expression heterosis, we develop a hierarchical model, which permits information sharing across genes. Based on the model, we derive empirical Bayes estimators, and test gene expression heterosis using posterior probabilities. The effectiveness of our approach is demonstrated through simulations based on two real heterosis microarray experiments as well as hypothetical probability models that violate our model assumptions. 
Chapter 4 presents statistical analysis of a soil-based carbon sequestration experiment. Driven by global climate change due to the increasing level of atmospheric carbon dioxide, researchers have proposed a soil-based carbon sequestration approach. A soil-based carbon sequestration approach reduces carbon dioxide emission from crop residues after harvesting and sequesters more carbon into the land as a soil nutrient. Previous research has reported significant differences across species in their rates of residue decomposition and the amount of carbon dioxide emission. Because the biomass composition varies across maize genotypes, we hypothesize that there are also differences among genotypes within the maize species in their rates of biomass decomposition and abilities of carbon sequestration. We designed and performed a longitudinal experiment to measure the amount of carbon dioxide flux from crop stover samples of 14 maize varieties. Flux observations for more than 150 days were collected. We modeled the logarithm of carbon dioxide flux as a linear function of genotype, day, and genotype-by-day interaction effects as well as several other important fixed and random factors. The analysis results show significant differences among maize varieties with respect to the accumulated carbon dioxide flux from crop residues as well as flux pattern over time. We also investigate relationships of carbon dioxide emission and several potentially influential chemical compounds in the maize residue biomass composition. These results suggest the potential for development of carbon capturing crops through bioengineering or hybrid methods

    Estimation and Testing of Gene Expression Heterosis

    Heterosis, also known as hybrid vigor, occurs when the mean phenotype of hybrid offspring is superior to that of its two inbred parents. The heterosis phenomenon is extensively utilized in agriculture, though its molecular basis is still unknown. In an effort to understand phenotypic heterosis at the molecular level, researchers have begun to compare expression levels of thousands of genes between parental inbred lines and their hybrid offspring to search for evidence of gene expression heterosis. Standard statistical approaches that analyze expression data separately for each gene can produce biased and highly variable estimates and unreliable tests of heterosis. To address these shortcomings, we develop a hierarchical model to borrow information across genes. Using our modeling framework, we derive empirical Bayes estimators and an inference strategy to identify gene expression heterosis. Simulation results show that our proposed method outperforms the more traditional strategy used to detect gene expression heterosis. This article has supplementary material online.
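    The bias mentioned in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it computes only the per-gene sample-average (plug-in) estimates, and it does not implement the hierarchical model or the empirical Bayes estimators. The function names, the normal error model, the parameter values, and the sign convention for low-parent heterosis (lower parent mean minus hybrid mean) are all assumptions made for illustration.

```python
import random
import statistics


def sample_average_heterosis(parent1, parent2, hybrid):
    """Plug-in estimates of three heterosis contrasts for one gene.

    Sign conventions are assumptions for this sketch:
      high-parent: hybrid mean minus the larger parent mean
      low-parent:  smaller parent mean minus the hybrid mean
      mid-parent:  hybrid mean minus the average of the parent means
    """
    m1, m2, mh = (statistics.mean(x) for x in (parent1, parent2, hybrid))
    return {
        "high_parent": mh - max(m1, m2),
        "low_parent": min(m1, m2) - mh,
        "mid_parent": mh - (m1 + m2) / 2,
    }


def simulate_mean_estimates(mu_p1, mu_p2, mu_h, sd=1.0, n_rep=3,
                            n_sim=5000, seed=1):
    """Average the plug-in estimates over many simulated experiments.

    Each experiment draws n_rep normal replicates per genotype
    (hypothetical error model, chosen only for illustration).
    """
    rng = random.Random(seed)
    totals = {"high_parent": 0.0, "low_parent": 0.0, "mid_parent": 0.0}
    for _ in range(n_sim):
        def draw(mu):
            return [rng.gauss(mu, sd) for _ in range(n_rep)]
        est = sample_average_heterosis(draw(mu_p1), draw(mu_p2), draw(mu_h))
        for key in totals:
            totals[key] += est[key]
    return {key: total / n_sim for key, total in totals.items()}


if __name__ == "__main__":
    # Equal parent means: true high-parent = 1, low-parent = -1, mid-parent = 1.
    averages = simulate_mean_estimates(mu_p1=0.0, mu_p2=0.0, mu_h=1.0)
    for name, value in averages.items():
        print(f"{name}: average plug-in estimate = {value:.3f}")
```

    With equal parent means and only three replicates, the larger of two noisy sample means tends to exceed the true parent mean and the smaller tends to fall below it, so the averaged high-parent and low-parent estimates come out below their true values, while the mid-parent estimate stays close to its true value.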

    Differential gene expression in response to eCry3.1Ab ingestion in an unselected and eCry3.1Ab-selected western corn rootworm (Diabrotica virgifera virgifera LeConte) population

    Diabrotica virgifera virgifera LeConte, the western corn rootworm (WCR), is one of the most destructive pests in the U.S. Corn Belt. Transgenic maize lines expressing various Cry toxins from Bacillus thuringiensis (Bt) have been adopted as a management strategy. However, resistance to many Bt toxins has occurred. To investigate the mechanisms of Bt resistance, we carried out RNA-seq using Illumina sequencing technology on resistant (eCry3.1Ab-selected) and susceptible (unselected) whole WCR neonates that fed on seedling maize with and without eCry3.1Ab for 12 and 24 hours. In a parallel experiment, RNA-seq was conducted on only the midgut of neonate WCR from the same treatments. After de novo transcriptome assembly, we identified differentially expressed genes (DEGs). Results from the assemblies and annotation indicate that WCR neonates from the eCry3.1Ab-selected resistant colony expressed a small number of up- and down-regulated genes following Bt intoxication. In contrast, unselected susceptible WCR neonates expressed a large number of up- and down-regulated transcripts in response to intoxication. Annotation and pathway analysis of DEGs between susceptible and resistant whole WCR and their midgut tissue revealed genes associated with cell membrane, immune response, detoxification, and potential Bt receptors, which are likely related to eCry3.1Ab resistance. This research provides a framework to study the toxicology of Bt toxins and the mechanism of resistance in WCR, an economically important coleopteran pest species.

    Assessment of a Novel VEGF Targeted Agent Using Patient-Derived Tumor Tissue Xenograft Models of Colon Carcinoma with Lymphatic and Hepatic Metastases

    The lack of appropriate tumor models of primary tumors and corresponding metastases that can reliably predict response to anticancer agents remains a major deficiency in the clinical practice of cancer therapy. The aim of our study was to establish patient-derived tumor tissue (PDTT) xenograft models of colon carcinoma with lymphatic and hepatic metastases useful for testing novel molecularly targeted agents. PDTT of primary colon carcinoma and of lymphatic and hepatic metastases were used to create xenograft models. Hematoxylin and eosin staining, immunohistochemical staining, genome-wide gene expression analysis, pyrosequencing, qRT-PCR, and western blotting were used to determine the biological stability of the xenografts during serial transplantation compared with the original tumor tissues. Early passages of the PDTT xenograft models of primary colon carcinoma, lymphatic and hepatic metastases revealed a high degree of similarity with the original clinical tumor samples with regard to histology, immunohistochemistry, gene expression, and mutation status as well as mRNA expression. After ascertaining that these xenograft models retained histopathological features and molecular signatures similar to those of the original tumors, we evaluated the drug sensitivities of the xenografts to a novel VEGF-targeted agent, FP3. In this study, PDTT xenograft models of colon carcinoma with lymphatic and hepatic metastases have been successfully established. They provide appropriate models for testing novel molecularly targeted agents.

    Mu Transposon Insertion Sites and Meiotic Recombination Events Co-Localize with Epigenetic Marks for Open Chromatin across the Maize Genome

    The Mu transposon system of maize is highly active, with each of the ∼50–100 copies transposing on average once each generation. The approximately one dozen distinct Mu transposons contain highly similar ∼215 bp terminal inverted repeats (TIRs) and generate 9-bp target site duplications (TSDs) upon insertion. Using a novel genome walking strategy that uses these conserved TIRs as primer binding sites, Mu insertion sites were amplified from Mu stocks and sequenced via 454 technology. 94% of ∼965,000 reads carried Mu TIRs, demonstrating the specificity of this strategy. Among these TIRs, 21 novel Mu TIRs were discovered, revealing additional complexity of the Mu transposon system. The distribution of >40,000 non-redundant Mu insertion sites was strikingly non-uniform, such that rates increased in proportion to distance from the centromere. An identified putative Mu transposase binding consensus site does not explain this non-uniformity. An integrated genetic map containing more than 10,000 genetic markers was constructed and aligned to the sequence of the maize reference genome. Recombination rates (cM/Mb) are also strikingly non-uniform, with rates increasing in proportion to distance from the centromere. Mu insertion site frequencies are strongly correlated with recombination rates. Gene density does not fully explain the chromosomal distribution of Mu insertion and recombination sites, because pronounced preferences for the distal portion of chromosome are still observed even after accounting for gene density. The similarity of the distributions of Mu insertions and meiotic recombination sites suggests that common features, such as chromatin structure, are involved in site selection for both Mu insertion and meiotic recombination. 
The finding that Mu insertions and meiotic recombination sites both concentrate in genomic regions marked with epigenetic marks of open chromatin provides support for the hypothesis that open chromatin enhances rates of both Mu insertion and meiotic recombination

    Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content

    Following the domestication of maize over the past ∼10,000 years, breeders have exploited the extensive genetic diversity of this species to mold its phenotype to meet human needs. The extent of structural variation, including copy number variation (CNV) and presence/absence variation (PAV), which is thought to contribute to the extraordinary phenotypic diversity and plasticity of this important crop, has not been elucidated. Whole-genome, array-based, comparative genomic hybridization (CGH) revealed a level of structural diversity between the inbred lines B73 and Mo17 that is unprecedented among higher eukaryotes. A detailed analysis of altered segments of DNA conservatively estimates that there are several hundred CNV sequences between the two genotypes, as well as several thousand PAV sequences that are present in B73 but not Mo17. Haplotype-specific PAVs contain hundreds of single-copy, expressed genes that may contribute to heterosis and to the extraordinary phenotypic diversity of this important crop.

    Maternal hyperleptinemia is associated with male offspring’s altered vascular function and structure in mice

    Children of mothers with gestational diabetes have greater risk of developing hypertension, but little is known about the mechanisms by which this occurs. The objective of this study was to test the hypothesis that high maternal concentrations of leptin during pregnancy, which are present in mothers with gestational diabetes and/or obesity, alter blood pressure, vascular structure and vascular function in offspring. Wildtype (WT) offspring of hyperleptinemic, normoglycemic, Lepr db/+ dams were compared to genotype-matched offspring of WT-control dams. Vascular function was assessed in male offspring at 6, and at 31 weeks of age after half the offspring had been fed a high fat, high sucrose diet (HFD) for 6 weeks. Blood pressure was increased by HFD but not affected by maternal hyperleptinemia. On a standard diet, offspring of hyperleptinemic dams had outwardly remodeled mesenteric arteries and an enhanced vasodilatory response to insulin. In offspring of WT but not Lepr db/+ dams, HFD induced vessel hypertrophy and enhanced vasodilatory responses to acetylcholine, while HFD reduced insulin responsiveness in offspring of hyperleptinemic dams. Offspring of hyperleptinemic dams had stiffer arteries regardless of diet. Therefore, while maternal hyperleptinemia was largely beneficial to offspring vascular health under a standard diet, it had detrimental effects in offspring fed HFD. These results suggest that circulating maternal leptin concentrations may interact with other factors in the pre- and post-natal environments to contribute to altered vascular function in offspring of diabetic pregnancies.

    Detecting differentially expressed genes for syndromes by considering change in mean and dispersion simultaneously

    Background: Using next-generation sequencing technology to measure gene expression, an empirically intriguing question concerns the identification of differentially expressed genes across treatment groups. Existing methods aim to identify genes whose mean expressions differ among treatment groups by assuming equal dispersion across all groups. For syndromes, however, various combinations of gene expression alterations can result in the same disease, leading to greater heteroscedasticity in the biological replicates in the disease group compared to the normal group. Traditional methods that only consider changes in the mean will fail to fully analyze gene expression in such a scenario. In addition, sequencing technology is relatively expensive; most labs can only afford a few replicates per treatment group, which poses further challenges to reliably estimating the mean and dispersion under each treatment condition. Results: We designed an empirical Bayes method and a pooled permutation test to simultaneously consider the change in mean and dispersion across treatment groups. We further computed confidence intervals based on Bayes estimates to identify differentially expressed genes that are unique to each disease sample as well as those that are common across all disease samples. We illustrated our method by applying it to gene expression data from a large offspring syndrome experiment, which motivated this study. We compared our method to competing approaches through simulation studies that mimicked the real datasets to demonstrate the effectiveness of our proposed method. Conclusions: We will show that, compared to popular methods that only aim to find the difference in the mean, our method can capture greater variation in the disease group to effectively identify differentially expressed genes for syndromes.