6 research outputs found

    Identification of Genes Critical for Resistance to Infection by West Nile Virus Using RNA-Seq Analysis

    The West Nile virus (WNV) is an emerging infection of biodefense concern, and there are no available treatments or vaccines. Here we used a high-throughput method based on a novel gene expression analysis, RNA-Seq, to give a global picture of differential gene expression by primary human macrophages of 10 healthy donors infected in vitro with WNV. From a total of 28 million reads per sample, we identified 1,514 transcripts that were differentially expressed after infection. Both predicted and novel gene changes were detected, as were gene isoforms, and while many of the genes were expressed by all donors, some were unique. Knock-down of genes not previously known to be associated with WNV resistance identified their critical role in control of viral infection. Our study distinguishes common gene pathways as well as novel cellular responses. Such analyses will be valuable for translational studies of susceptible and resistant individuals, and for targeting therapeutics, in multiple biological settings.

    Differential expression and feature selection in the analysis of multiple omics studies

    With the rapid advances of high-throughput technologies in the past decades, various kinds of omics data have been generated by many labs and accumulated in the public domain. These studies have been designed for different biological purposes, including the identification of differentially expressed genes, the selection of predictive biomarkers, etc. Effective meta-analysis of omics data from multiple studies can improve on the statistical power, accuracy and reproducibility of a single study. This dissertation covered a few methods for differential expression (Chapters 2 and 3) and feature selection (Chapter 4) in the analysis of multiple omics studies. In Chapter 2, we proposed a full Bayesian hierarchical model for RNA-seq meta-analysis by modeling count data, integrating information across genes and across studies, and modeling differential signals across studies via latent variables. A Dirichlet process mixture prior was further applied on the latent variables to provide categorization of detected biomarkers according to their differential expression patterns across studies. We used both simulations and a real application to multiple brain regions of HIV-1 transgenic rats to demonstrate improved sensitivity, accuracy and biological findings of our method. In Chapter 3, we extended the previous Bayesian model to jointly integrate transcriptomic data from two platforms, microarray and RNA-seq. In Chapter 4, we considered a general framework for variable screening with multiple omics studies and further proposed a novel two-step screening procedure for high-dimensional regression analysis in this framework. Compared to the one-step procedure and the rank-based sure independence screening procedure, our procedure greatly reduced false negative errors while keeping a low false positive rate. Theoretically, we showed that our procedure possesses the sure screening property with weaker assumptions on signal strengths and allows the number of features to grow at an exponential rate of the sample size. Public health significance: The proposed methods are useful in detecting important biomarkers that are either differentially expressed or predictive of clinical outcomes. This is essential for searching for potential drug targets and understanding the disease mechanism. Such findings in basic science can be translated into preventive medicine or potential treatment for disease to promote human health and improve the global healthcare system.

    RNA-Seq Analysis Strategies and Ethical Considerations Involved in Precision Medicine

    RNA-Seq has recently become the most widely accepted method for evaluating gene expression. Because RNA-Seq is still a fairly new technology, however, analytical methods for its output data have not been investigated as fully as those for preceding technologies, such as the methods used to analyze microarray data. This is likely the result of the potential breadth of information that can be obtained from the different applications of RNA-Seq. Analyses of RNA-Seq data include detecting differentially expressed genes, transcriptome profiling, and interpretation of gene functions. As with any advanced technology, medical or otherwise, the longer it is available, the more its price generally decreases and the more refined it becomes. This has been true for genomic sequencing: costs per sample have continued to decrease, and the accuracy and precision of results have improved greatly. At the same time, more physicians have opted to have more of their patients’ genetic material sequenced. This has created challenges in the development of accurate, efficient, and consistent statistical methods, as well as much debate regarding the ethics involved in genomic sequencing. To provide insight into two statistical challenges that are common in analyzing RNA-Seq data, we conduct extensive simulation studies. These simulation studies include: 1) an investigation of fitting complex models that account for pairedness across a subject’s measurements, in terms of the power gained and control of the Type I error rate; and 2) an evaluation of the performance of various clustering methods on transformed RNA-Seq data. In addition to investigating the aforementioned statistical challenges, we develop a protocol for a survey study which has the potential to provide insight into cancer patients’ opinions towards genomic sequencing, as there is much ethics-related controversy surrounding the topic.

    Regulatory complexity in gene expression

    The regulation of gene expression is the driver of cellular differentiation in multicellular organisms; the result is a diverse range of cell types, each with its own unique profile of expression. Within these cell types the transcriptional product of a gene is up- or down-regulated in response to intrinsic and extrinsic stimuli according to its own regulatory programme encoded within the cell. The complexity of this regulatory programme depends on the requirements of the gene to change expression states in different cell lineages or temporally in response to a range of conditions. In the case of many housekeeping genes integral to the survival of the cell, this programme is simple: switch on the gene and leave it on. Often, however, the required level and precision of regulatory control is much more involved and lends itself to subtle changes in expression. This raises many questions of precisely where and how that regulatory information is encoded and whether different biological systems encode it in the same way. This project attempts to answer these questions through the development of novel approaches in quantifying the output of this regulatory programme according to the state changes as observed from the expression profile of a given gene. Measures of complexity in gene expression are calculated over a wide range of cell types and conditions collected using CAGE, which provides a quantitative estimate of gene expression that precisely defines the promoter utilised to initiate that expression. As expected, housekeeping genes were found to be amongst the least complex, as a result of their uniform expression profiles, as were those genes highly restricted in their expression. The genes most complex in their expression output were those associated with the presence of H3K27me3 repressive marks (genes poised for activation in a specific set of cell types), as well as those enriched in DNase I hypersensitive sites in their upstream region but not necessarily conserved in that region. Evidence also suggests that different promoters associated with a gene contribute in different ways to its resultant regulatory complexity, suggesting that certain promoters may be more crucial in driving the regulation of some genes. This allows for the targeting of such promoters in the analysis of certain diseases implicated by changes in regulatory regions. Indeed, genes known to be associated with diseases such as leukaemia and Alzheimer’s are found to be highly complex in their expression.

    Differential expression analysis for paired RNA-seq data

    Background: RNA-Seq technology measures transcript abundance by generating sequence reads and counting their frequencies across different biological conditions. To identify differentially expressed genes between two conditions, it is important to consider the experimental design as well as the distributional properties of the data. In many RNA-Seq studies, the expression data are obtained as multiple pairs, e.g., pre- vs. post-treatment samples from the same individual. We seek to incorporate this paired structure into the analysis. Results: We present a Bayesian hierarchical mixture model for RNA-Seq data to separately account for the variability within and between individuals from a paired data structure. The method assumes a Poisson distribution for the data, mixed with a gamma distribution to account for variability between pairs. The effect of differential expression is modeled by a two-component mixture model. The performance of this approach is examined using simulated and real data. Conclusions: In this setting, our proposed model provides higher sensitivity than existing methods to detect differential expression. Application to real RNA-Seq data demonstrates the usefulness of this method for detecting expression alteration for genes with low average expression levels or shorter transcript length.