    RNA sequencing reveals two major classes of gene expression levels in metazoan cells

    The expression level of a gene is often used as a proxy for determining whether the protein or RNA product is functional in a cell or tissue. Therefore, it is of fundamental importance to understand the global distribution of gene expression levels and to be able to interpret it mechanistically and functionally. Here we use RNA sequencing of mouse Th2 cells, coupled with a range of other techniques, to show that all genes can be separated, based on their expression abundance, into two distinct groups: one comprising lowly expressed and putatively non-functional mRNAs, and the other comprising highly expressed mRNAs with active chromatin marks at their promoters.
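
    One common way to separate a bimodal distribution of log-scale expression values into two groups is to fit a two-component Gaussian mixture and assign each gene to the component with the higher posterior probability. The sketch below does this with a plain expectation-maximization (EM) loop. It assumes log-transformed expression values (for example log2 FPKM); the function names and the toy values are hypothetical, and the study's own procedure may differ.

```python
import math

def _gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def _variance(xs, mean):
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def fit_two_gaussians(values, iters=200):
    """Fit a two-component 1-D Gaussian mixture to log-expression values by EM."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise one component on the lower half of the data and one on the upper half.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    var = [max(_variance(xs[:half], mu[0]), 1e-6), max(_variance(xs[half:], mu[1]), 1e-6)]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each value belongs to each component.
        resp = []
        for x in xs:
            p = [w[k] * _gauss_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = max(sum(rk), 1e-12)
            w[k] = nk / len(xs)
            mu[k] = sum(r * x for r, x in zip(rk, xs)) / nk
            var[k] = max(sum(r * (x - mu[k]) ** 2 for r, x in zip(rk, xs)) / nk, 1e-6)
    return w, mu, var

def classify(x, w, mu, var):
    """Label a value 'high' or 'low' by the component with the larger posterior."""
    post = [w[k] * _gauss_pdf(x, mu[k], var[k]) for k in range(2)]
    best = post.index(max(post))
    return "high" if mu[best] == max(mu) else "low"

# Hypothetical log2 expression values for two well-separated groups of genes.
low = [-3.0, -2.5, -2.0, -2.2, -2.8, -1.9]
high = [4.0, 4.5, 5.0, 5.2, 4.8, 4.1]
w, mu, var = fit_two_gaussians(low + high)
print([round(m, 2) for m in mu])
print(classify(-2.1, w, mu, var), classify(4.7, w, mu, var))
```

    On real data the two components usually overlap, so the posterior cutoff between them, rather than a fixed expression threshold, is what defines group membership in this sketch.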

    Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

    Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional, high-noise genomic data is prone to overfitting. Models constructed from such data sets often include a large number of genes with no obvious functional relevance to the biological effect the model is intended to predict, which can make the modeling results difficult to interpret. To address these issues, we developed a novel algorithm, the Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the number of samples exceeds the number of transcripts in each iteration of modeling and testing, PPEA reduces the risk of overfitting. We show with three different case studies that (1) PPEA can quickly derive a reliable rank order of the predictive power of individual transcripts in a relatively small number of iterations; (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict; (3) using only the most predictive top-ranked transcripts greatly facilitates the development of multiplex assays, such as qRT-PCR, as biomarkers; and (4) more importantly, a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from non-adverse effects of compounds in completely independent tests. Thus, we believe that PPEA effectively addresses the overfitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.
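
    The abstract does not specify PPEA's classifier or scoring rule, so the sketch below only illustrates the general shape of a two-way bootstrap: each iteration draws a small random subset of transcripts (columns) and a bootstrap sample of the samples (rows), keeps the transcript count below the sample count, fits a model, scores it on the out-of-bag samples, and credits that score to every transcript in the subset. A nearest-centroid classifier is swapped in purely for illustration; `ppea_scores`, `nearest_centroid_accuracy`, and the toy data are hypothetical.

```python
import random

def _centroid(rows, feats):
    return [sum(r[f] for r in rows) / len(rows) for f in feats]

def nearest_centroid_accuracy(X, y, train_idx, test_idx, feats):
    """Fit class centroids on train_idx (labels 0/1) and return accuracy on test_idx."""
    groups = {0: [], 1: []}
    for i in train_idx:
        groups[y[i]].append(X[i])
    if not groups[0] or not groups[1]:
        return None  # the bootstrap draw missed a class; skip this iteration
    cents = {c: _centroid(rows, feats) for c, rows in groups.items()}
    correct = 0
    for i in test_idx:
        v = [X[i][f] for f in feats]
        dist = {c: sum((a - b) ** 2 for a, b in zip(v, cents[c])) for c in cents}
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test_idx)

def ppea_scores(X, y, n_iter=200, n_feat=3, seed=0):
    """Average out-of-bag accuracy credited to each transcript (hypothetical scoring)."""
    rng = random.Random(seed)
    n_samples, n_genes = len(X), len(X[0])
    if n_feat >= n_samples:
        raise ValueError("use fewer transcripts than samples in each iteration")
    totals, counts = [0.0] * n_genes, [0] * n_genes
    for _ in range(n_iter):
        feats = rng.sample(range(n_genes), n_feat)                   # column draw
        train = [rng.randrange(n_samples) for _ in range(n_samples)]  # row bootstrap
        in_bag = set(train)
        test = [i for i in range(n_samples) if i not in in_bag]      # out-of-bag rows
        if not test:
            continue
        acc = nearest_centroid_accuracy(X, y, train, test, feats)
        if acc is None:
            continue
        for f in feats:
            totals[f] += acc
            counts[f] += 1
    return [totals[g] / counts[g] if counts[g] else 0.0 for g in range(n_genes)]

# Toy data: transcript 0 separates the two classes; transcripts 1-5 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(12)]
X = [[10.0 * label] + [data_rng.random() for _ in range(5)] for label in y]
scores = ppea_scores(X, y, n_iter=300, n_feat=2, seed=0)
print([round(s, 2) for s in scores])
```

    Ranking transcripts by these averaged scores gives an ordering in which informative transcripts rise to the top, which is the kind of rank order the abstract describes; the real algorithm's model and stopping rule are not reproduced here.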

    Deconvolution of the Response to Bacillus Calmette–Guérin Reveals NF-κB-Induced Cytokines As Autocrine Mediators of Innate Immunity

    Bacillus Calmette–Guérin (BCG) is used as a vaccine and diagnostic test for tuberculosis, as well as immunotherapy in the treatment of bladder cancer. While clinically useful, the response to mycobacterial stimulation is complex, and the induced protein signature remains poorly defined. We characterized the cell types directly engaged by BCG, as well as the induced cytokine loops that transmit signal(s) to bystander cells. Standardized whole-blood stimulations and mechanistic studies on single and purified cell populations identified distinct patterns of activation in monocytes compared with neutrophils and invariant lymphocyte populations. Deconvoluting the roles of Toll-like receptor 2/4 and Dectin-1/2 in the inflammatory response to BCG, we revealed Dectin-1/2 as dominant in neutrophils, whereas monocytes engaged both pathways equally. Furthermore, we quantified the role of NF-κB and NADPH/reactive oxygen species (ROS)-dependent cytokines, which triggered a JAK1/2-dependent amplification loop and accounted for 40–50% of the induced response to BCG. In sum, this study provides new insight into the molecular and cellular pathways involved in the response to BCG, establishing the basis for a new generation of immunodiagnostic tools.
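
    One simple way to express a statement such as "accounted for 40–50% of the induced response" is the share of the stimulus-induced signal that disappears when a pathway is blocked. The sketch assumes three readouts for one cytokine (unstimulated, BCG-stimulated, and BCG-stimulated with an inhibitor such as a JAK1/2 blocker); the function name and the example numbers are hypothetical and are not taken from the study, whose exact calculation may differ.

```python
def fraction_of_induced_response(baseline, stimulated, stimulated_with_inhibitor):
    """Share of the induced response (stimulated - baseline) removed by an inhibitor."""
    induced = stimulated - baseline
    if induced <= 0:
        raise ValueError("stimulation did not induce a response above baseline")
    return (stimulated - stimulated_with_inhibitor) / induced

# Hypothetical cytokine readouts in arbitrary units, for illustration only.
share = fraction_of_induced_response(baseline=10.0, stimulated=210.0,
                                     stimulated_with_inhibitor=120.0)
print(f"{share:.0%}")  # 45%
```

    Subtracting the baseline first matters: without it, constitutive cytokine levels would inflate the denominator and understate the contribution of the blocked loop.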

    The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage

    Background: We describe the genome of the western painted turtle, Chrysemys picta bellii, one of the most widespread, abundant, and well-studied turtles. We place the genome into a comparative evolutionary context and focus on genomic features associated with tooth loss, immune function, longevity, sex differentiation and determination, and the species' physiological capacities to withstand extreme anoxia and tissue freezing. Results: Our phylogenetic analyses confirm that turtles are the sister group to living archosaurs and demonstrate an extraordinarily slow rate of sequence evolution in the painted turtle. The ability of the painted turtle to withstand complete anoxia and partial freezing appears to be associated with common vertebrate gene networks, and we identify candidate genes for future functional analyses. Tooth loss shares a common pattern of pseudogenization and degradation of tooth-specific genes with birds, although the rate of accumulation of mutations is much slower in the painted turtle. Genes associated with sex differentiation generally reflect phylogeny rather than convergence in sex determination functionality. Among gene families that demonstrate exceptional expansions or show signatures of strong natural selection, immune function and musculoskeletal patterning genes are consistently over-represented. Conclusions: Our comparative genomic analyses indicate that common vertebrate regulatory networks, some of which have analogs in human diseases, are often involved in the western painted turtle's extraordinary physiological capacities. As these regulatory pathways are analyzed at the functional level, the painted turtle may offer important insights into the management of a number of human health disorders.

    Development of Biclustering Techniques for Gene Expression Data Modeling and Mining

    Next-generation sequencing technologies can generate large-scale biological data with higher resolution, better accuracy, and lower technical variation than their array-based counterparts. RNA sequencing (RNA-Seq) can generate genome-scale gene expression data for biological samples at a given moment, facilitating a better understanding of cell functions at the genetic and cellular levels. The abundance of gene expression datasets provides an opportunity to identify genes with similar expression patterns across multiple conditions, i.e., co-expression gene modules (CEMs). Genome-scale identification of CEMs can be modeled and solved by biclustering, a two-dimensional data mining technique that clusters the rows and columns of a gene expression matrix simultaneously. Whereas traditional clustering targets global patterns, biclustering can detect local patterns. This feature makes biclustering especially useful for big gene expression data, because genes that participate in a cellular process are active only under specific conditions and are therefore usually co-expressed under a subset of all conditions. The combination of biclustering and large-scale gene expression data holds promise for condition-specific functional pathway and network analysis. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-Seq data, mainly because they lack (i) consideration of the high sparsity of RNA-Seq data, especially scRNA-Seq data, and (ii) an understanding of the transcriptional regulation signals underlying the observed gene expression values. QUBIC2, a novel biclustering algorithm, is designed for the analysis of large-scale bulk RNA-Seq and single-cell RNA-Seq (scRNA-Seq) data. Critical novelties of the algorithm include (i) a truncated model to handle the unreliable quantification of genes with low or moderate expression; (ii) a Gaussian mixture distribution and an information-divergence objective function to capture shared transcriptional regulation signals among a set of genes; (iii) a dual strategy to expand the core biclusters, aimed at recovering dropouts from the background; and (iv) a statistical framework to evaluate the significance of all identified biclusters. Method validation on comprehensive data sets suggests that QUBIC2 has superior performance in functional module detection and cell type classification. Applications to temporal and spatial data demonstrated that QUBIC2 can derive meaningful biological information from scRNA-Seq data. Also presented in this dissertation is QUBICR, an R package whose efficiency is on average 82% higher than that of the source C code of QUBIC. It provides a comprehensive set of functions to facilitate biclustering-based biological studies, including discretization of expression data, query-based biclustering, bicluster expansion, bicluster comparison, heatmap visualization of identified biclusters, and elucidation of co-expression networks. Finally, a systematic summary is provided of the primary applications of biclustering to biological data and of more advanced applications to biomedical data, to help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights more efficiently.
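
    To make "a subset of genes co-expressed under a subset of conditions" concrete, the sketch below discretizes each gene's profile (top quantile marked +1, bottom quantile -1, the rest 0) and then greedily grows a bicluster from a seed gene pair, keeping a gene only if enough conditions still share the same non-zero label. The quantile cutoff `q`, the `min_cols` threshold, and all function names are assumptions for illustration; this is not QUBIC2, which instead uses a truncated model, a Gaussian mixture, an information-divergence objective, and a dual expansion strategy.

```python
def discretize(row, q=0.25):
    """Label a gene's top q fraction of values +1, bottom q fraction -1, the rest 0."""
    ranked = sorted(row)
    k = max(1, int(len(row) * q))
    low_cut, high_cut = ranked[k - 1], ranked[-k]
    return [1 if v >= high_cut else -1 if v <= low_cut else 0 for v in row]

def consistent_columns(labels, genes):
    """Conditions in which every gene in `genes` carries the same non-zero label."""
    first = labels[genes[0]]
    return [j for j, v in enumerate(first)
            if v != 0 and all(labels[g][j] == v for g in genes[1:])]

def grow_bicluster(labels, seed_pair, min_cols=3):
    """Greedily add genes while at least `min_cols` conditions stay consistent."""
    genes = list(seed_pair)
    cols = consistent_columns(labels, genes)
    for g in range(len(labels)):
        if g in genes:
            continue
        new_cols = consistent_columns(labels, genes + [g])
        if len(new_cols) >= min_cols:
            genes.append(g)
            cols = new_cols
    return genes, cols

# Hypothetical 5-gene x 8-condition expression matrix.
expr = [
    [9, 8, 5, 4, 5, 4, 1, 0],   # genes 0-2 rise together in conditions 0-1
    [10, 9, 3, 4, 3, 4, 0, 1],  # and fall together in conditions 6-7
    [7, 8, 4, 5, 5, 4, 2, 1],
    [0, 9, 1, 8, 2, 7, 3, 6],   # unrelated profiles
    [5, 0, 9, 1, 8, 2, 7, 3],
]
labels = [discretize(row) for row in expr]
genes, cols = grow_bicluster(labels, seed_pair=(0, 1))
print(genes, cols)  # [0, 1, 2] [0, 1, 6, 7]
```

    The result is a local pattern: genes 0-2 agree in only four of the eight conditions, which a global clustering over all conditions could blur, and that is the contrast with traditional clustering drawn above.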

    Stochastic gene expression during lineage specification of single T helper lymphocytes

    Get PDF
    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Biological Engineering, 2012, by Miaoqing Fang. Cataloged from PDF version of thesis. Includes bibliographical references (p. 118-125).

    The adaptive immune system is an extraordinarily diverse inventory composed of highly specialized cells whose differentiation requires numerous lineage specifications at various developmental stages. Precise control of immune cell differentiation and a delicate balance in population composition are crucial for effective protection against infectious environmental agents without triggering autoimmune responses or allergies. It is therefore important to understand, at the molecular level in individual cells, how lineage commitment is regulated. I explored heterogeneous gene expression during the lineage specification of single T helper cells by quantitatively measuring mRNA and protein levels. I discovered a paradigm of cell lineage specification governed by the signaling interplay between extracellular cues and intracellular transcription factors, in which the strength of extracellular signaling dominates over the intracellular signaling components. In the presence of extracellular cues, T helper cells stochastically acquire any intermediate Th1/Th2 state. The states of T helper cells can be tuned gradually by limiting the availability of extracellular cytokines, which are produced stochastically by a small subpopulation of cells. When extracellular cues are removed, the effect of the weak intracellular signaling network is revealed, leading to the classic mutual exclusion of antagonistic transcription factors.