
    Dissecting high-dimensional phenotypes with Bayesian sparse factor analysis of genetic covariance matrices.

    Quantitative genetic studies that model complex, multivariate phenotypes are important for both evolutionary prediction and artificial selection. For example, changes in gene expression can provide insight into developmental and physiological mechanisms that link genotype and phenotype. However, classical analytical techniques are poorly suited to quantitative genetic studies of gene expression, where the number of traits assayed per individual can reach many thousands. Here, we derive a Bayesian genetic sparse factor model for estimating the genetic covariance matrix (G-matrix) of high-dimensional traits, such as gene expression, in a mixed-effects model. The key idea of our model is that we need consider only G-matrices that are biologically plausible. An organism's entire phenotype is the result of processes that are modular and have limited complexity. This implies that the G-matrix will be highly structured. In particular, we assume that a limited number of intermediate traits (or factors, e.g., variations in development or physiology) control the variation in the high-dimensional phenotype, and that each of these intermediate traits is sparse, affecting only a few observed traits. The advantages of this approach are twofold. First, sparse factors are interpretable and provide biological insight into mechanisms underlying the genetic architecture. Second, enforcing sparsity helps prevent sampling errors from swamping out the true signal in high-dimensional data. We demonstrate the advantages of our model on simulated data and in an analysis of a published Drosophila melanogaster gene expression data set.
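
As a rough illustration of the structure this abstract describes (a generic sparse factor decomposition, not necessarily the authors' exact parameterization), the G-matrix of p traits could be written in terms of k latent factors as:

```latex
G \approx \Lambda \Lambda^{\top} + \Psi,
\qquad \Lambda \in \mathbb{R}^{p \times k},
\quad k \ll p
```

Here \Lambda (the factor loadings) and \Psi (a diagonal matrix of trait-specific genetic variances) are symbols introduced only for this sketch. Sparsity means that most entries in each column of \Lambda are zero, so each factor affects only a few observed traits.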

    Information retrieval in single cell chromatin analysis using TF-IDF transformation methods

    Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) assesses genome-wide chromatin accessibility in thousands of cells to reveal regulatory landscapes at high resolution. However, the analysis presents challenges due to the high dimensionality and sparsity of the data. Several methods have been developed, including transformation techniques such as term frequency-inverse document frequency (TF-IDF) and dimension reduction methods such as singular value decomposition (SVD), factor analysis, and autoencoders. Yet a comprehensive comparison of these methods has not been performed, and the best practice for analyzing scATAC-seq data remains unclear. We compared several scenarios for transformation and dimension reduction, as well as SVD-based feature analysis, to investigate potential enhancements in scATAC-seq information retrieval. Additionally, we investigated whether autoencoders benefit from the TF-IDF transformation. Our results reveal that the TF-IDF transformation generally leads to improved clustering and biologically relevant feature extraction. Comment: 6 pages, 4 figures, 3 tables. Accepted to the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
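
A minimal sketch of the TF-IDF-then-SVD pipeline this abstract mentions may help make it concrete. It assumes one common TF-IDF variant (the abstract compares several and does not specify which), and the function names `tfidf` and `svd_embed` and the toy matrix are hypothetical, made up for illustration only.

```python
import numpy as np

def tfidf(counts):
    """TF-IDF for a cells x peaks count matrix (one common variant, assumed here)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each cell's counts divided by that cell's total.
    row_sums = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(row_sums, 1.0)
    # Inverse document frequency: peaks open in fewer cells get more weight.
    n_cells = counts.shape[0]
    cells_per_peak = (counts > 0).sum(axis=0)
    idf = np.log(1.0 + n_cells / np.maximum(cells_per_peak, 1))
    return tf * idf

def svd_embed(matrix, n_components):
    """Project cells onto the top singular vectors (an LSI-style embedding)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy binary accessibility matrix: 4 cells x 5 peaks (made-up data).
toy = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
])
weighted = tfidf(toy)
embedding = svd_embed(weighted, n_components=2)
```

In this toy matrix the last peak is open in three of four cells, so it receives a lower weight than peaks open in only two cells. Cells with identical accessibility profiles map to the same point in the embedding, which is the input a clustering step would typically use.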

    A novel patient-derived tumorgraft model with TRAF1-ALK anaplastic large-cell lymphoma translocation.

    Although anaplastic large-cell lymphomas (ALCL) carrying anaplastic lymphoma kinase (ALK) have a relatively good prognosis, aggressive forms exist. We have identified a novel translocation, causing the fusion of the TRAF1 and ALK genes, in one patient who presented with a leukemic ALK+ ALCL (ALCL-11). To uncover the mechanisms leading to high-grade ALCL, we developed a human patient-derived tumorgraft (hPDT) line. Molecular characterization of primary and PDT cells demonstrated the activation of ALK and nuclear factor κB (NFκB) pathways. Genomic studies of ALCL-11 showed TP53 loss and the in vivo subclonal expansion of lymphoma cells lacking PRDM1/Blimp1 and carrying c-MYC gene amplification. Treatment of TRAF1-ALK cells with proteasome inhibitors led to the downregulation of p50/p52 and lymphoma growth inhibition. Moreover, an NFκB gene set classifier stratified ALCL into distinct subsets with different clinical outcomes. Although a selective ALK inhibitor (CEP28122) resulted in a significant clinical response in hPDT mice, the disease could not be eradicated. These data indicate that the activation of NFκB signaling contributes to the neoplastic phenotype of TRAF1-ALK ALCL. ALCL hPDTs are invaluable tools to validate the role of druggable molecules, predict therapeutic responses, and implement patient-specific therapies.

    Automated Gene Classification using Nonnegative Matrix Factorization on Biomedical Literature

    Understanding functional gene relationships is a challenging problem for biological applications. High-throughput technologies such as DNA microarrays have inundated biologists with a wealth of information; however, processing that information remains problematic. To help with this problem, researchers have begun applying text mining techniques to the biological literature. This work extends previous work based on Latent Semantic Indexing (LSI) by examining Nonnegative Matrix Factorization (NMF). Whereas LSI incorporates the singular value decomposition (SVD) to approximate data in a dense, mixed-sign space, NMF produces a parts-based factorization that is directly interpretable. This space can, in theory, be used to augment existing ontologies and annotations by identifying themes within the literature. Of course, performing NMF does not come without a price, namely the large number of parameters. This work attempts to analyze the effects of some of the NMF parameters on both convergence and labeling accuracy. Since there is a dearth of automated label evaluation techniques as well as “gold standard” hierarchies, a method to produce “correct” trees is proposed, as well as a technique to label trees and to evaluate those labels.
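
To illustrate the parts-based, nonnegative factorization contrasted with SVD above, here is a minimal NMF sketch using the standard multiplicative update rules. The abstract does not say which NMF algorithm or parameter settings the work uses, so the update scheme, the `nmf` function, its defaults (`rank`, `n_iter`, `seed`, `eps`), and the toy term-by-document matrix are all assumptions for illustration.

```python
import numpy as np

def nmf(a, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix a (m x n) as w @ h with w, h >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    rank, n_iter and seed are examples of the tunable parameters NMF exposes.
    """
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Random nonnegative initialisation; results depend on the seed.
    w = rng.random((m, rank)) + eps
    h = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a nonnegative ratio,
        # so w and h never become negative.
        h *= (w.T @ a) / (w.T @ w @ h + eps)
        w *= (a @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Toy term-by-document matrix: 6 terms x 4 documents, two obvious themes.
a = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)
w, h = nmf(a, rank=2)
rel_error = np.linalg.norm(a - w @ h) / np.linalg.norm(a)
```

Because every entry of `w` and `h` is nonnegative, each column of `w` can be read as a weighted set of terms (a theme), unlike the mixed-sign singular vectors produced by SVD. Changing `rank`, `n_iter`, or `seed` changes the result, which is the kind of parameter sensitivity the abstract refers to.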

    Non-coding genome contributions to the development and evolution of mammalian organs

    Protein-coding sequences cover only 1-2% of a typical mammalian genome. The remaining non-coding space hides thousands of genomic elements, some of which act via their DNA sequence while others are transcribed into non-coding RNAs. Many well-characterized non-coding elements are involved in the regulation of other genes, a process essential for the emergence of different cell types and organs during development. Changes in the expression of conserved genes during development are in turn thought to facilitate evolutionary innovation in form and function. Thus, non-coding genomic elements are hypothesized to play important roles in developmental and evolutionary processes. However, challenges related to the identification and characterization of these elements, in particular in non-model organisms, have limited the study of their overall contributions to mammalian organ development and evolution. During my dissertation work, I addressed this gap by studying two major classes of non-coding elements, long non-coding RNAs (lncRNAs) and cis-regulatory elements (CREs). In the first part of my thesis, I analyzed the expression profiles of lncRNAs during the development of seven major organs in six mammals and a bird. I showed that, unlike protein-coding genes, only a small fraction of lncRNAs is expressed in reproducibly dynamic patterns during organ development. These lncRNAs are enriched for a series of features associated with functional relevance, including increased evolutionary conservation and regulatory complexity, highlighting them as candidates for further molecular characterization. I then associated these lncRNAs with specific genes and functions based on their spatiotemporal expression profiles. My analyses also revealed differences in lncRNA contributions across organs and developmental stages, identifying a developmental transition from broadly expressed and conserved lncRNAs towards an increasing number of lineage- and organ-specific lncRNAs.
Following up on these global analyses, I then focused on a newly identified lncRNA in the marsupial opossum, Female Specific on chromosome X (FSX). The broad and likely autonomous female-specific expression of FSX suggests a role in marsupial X-chromosome inactivation (XCI). I showed that FSX shares many expression and sequence features with another lncRNA, RSX, a known regulator of XCI in marsupials. Comparisons to other marsupials revealed that both RSX and FSX emerged in the common marsupial ancestor and have since been preserved in marsupial genomes, while their broad and female-specific expression has been retained for at least 76 million years of evolution. Taken together, my analyses highlighted FSX as a novel candidate for regulating marsupial XCI. In the third part of this work, I shifted my focus to CREs and their cell type-specific activities in the developing mouse cerebellum. After annotating cerebellar cell types and states based on single-cell chromatin accessibility data, I identified putative CREs and characterized their spatiotemporal activity across cell types and developmental stages. Focusing on progenitor cells, I described temporal changes in CRE activity that are shared between early germinal zones, supporting a model of cell fate induction through common developmental cues. By examining chromatin accessibility dynamics during neuronal differentiation, I revealed a gradual divergence in the regulatory programs of major cerebellar neuron types. In the final part, I explored the evolutionary histories of CREs and their potential contributions to gene expression changes between species. By comparing mouse CREs to vertebrate genomes and chromatin accessibility profiles from the marsupial opossum, I identified a temporal decrease in CRE conservation, which is shared across cerebellar cell types. However, I also found differences in constraint between cell types, with microglia having the fastest evolving CREs in the mouse cerebellum.
Finally, I used deep learning models to study the regulatory grammar of cerebellar cell types in human and mouse, showing that the sequence rules determining CRE activity are conserved across mammals. I then used these models to retrace the evolutionary changes leading to divergent CRE activity between species. Collectively, my PhD work provides insights into the evolutionary dynamics of non-coding genes and regulatory elements, the processes associated with their conservation, and their contributions to the development and evolution of mammalian cell types and organs.