
    Multi-Omics Factor Analysis: a framework for unsupervised integration of multi-omics data sets.

    Multi-omics studies promise improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
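The shared-versus-specific structure described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not MOFA's inference algorithm (MOFA learns factors and sparse loadings jointly): here the factors `Z` are assumed to be known, each view is simulated as Y_m = Z W_m^T + noise with some loading columns switched off, loadings are fitted by least squares, and the per-factor variance explained (R²) is reported per view. The function names `simulate_views` and `variance_explained` are hypothetical.

```python
import numpy as np


def simulate_views(n_samples, n_features_per_view, active, rng):
    """Simulate views Y_m = Z @ W_m.T + noise.

    active[m][k] is 1 if factor k loads on view m, else 0, so a factor can be
    shared by all views, by some, or specific to a single view.
    """
    n_factors = len(active[0])
    Z = rng.standard_normal((n_samples, n_factors))
    views = []
    for n_features, mask in zip(n_features_per_view, active):
        # Zero out loading columns of factors that are inactive in this view.
        W = rng.standard_normal((n_features, n_factors)) * np.asarray(mask, dtype=float)
        noise = 0.1 * rng.standard_normal((n_samples, n_features))
        views.append(Z @ W.T + noise)
    return Z, views


def variance_explained(Z, Y):
    """Per-factor R^2 in one view, given (here: known) factors Z."""
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    # Least-squares loadings, shape (n_factors, n_features).
    W, *_ = np.linalg.lstsq(Zc, Yc, rcond=None)
    total = np.sum(Yc ** 2)
    r2 = []
    for k in range(Z.shape[1]):
        # Residual after reconstructing the view from factor k alone.
        resid = Yc - np.outer(Zc[:, k], W[k])
        r2.append(1.0 - np.sum(resid ** 2) / total)
    return np.array(r2)


rng = np.random.default_rng(0)
# Factor 0 is shared, factor 1 is specific to view 0, factor 2 to view 1.
active = [[1, 1, 0], [1, 0, 1]]
Z, views = simulate_views(200, [50, 50], active, rng)
for m, Y in enumerate(views):
    print(f"view {m}:", np.round(variance_explained(Z, Y), 2))
```

A factor with near-zero R² in a view is absent from that view; recovering this pattern without knowing `Z` in advance is what a method like MOFA is meant to do.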

    Gene expression across mammalian organ development

    The evolution of gene expression in mammalian organ development remains largely uncharacterized. Here we report the transcriptomes of seven organs (cerebrum, cerebellum, heart, kidney, liver, ovary and testis) across developmental time points from early organogenesis to adulthood for human, macaque, mouse, rat, rabbit, opossum and chicken. Comparisons of gene expression patterns identified developmental stage correspondences across species, and differences in the timing of key events during the development of the gonads. We found that the breadth of gene expression and the extent of purifying selection gradually decrease during development, whereas the amount of positive selection and expression of new genes increase. We identified differences in the temporal trajectories of expression of individual genes across species, with brain tissues showing the smallest percentage of trajectory changes, and the liver and testis showing the largest. Our work provides a resource of developmental transcriptomes of seven organs across seven species, and comparative analyses that characterize the development and evolution of mammalian organs.

    Developmental gene expression differences between humans and mammalian models

    Identifying the molecular programs underlying human organ development and how they differ from model species is key for understanding human health and disease. Developmental gene expression profiles provide a window into the genes underlying organ development and a direct means to compare them across species. We use a transcriptomic resource covering the development of seven organs to characterize the temporal profiles of human genes associated with distinct disease classes and to determine, for each human gene, the similarity of its spatiotemporal expression with its orthologs in rhesus macaque, mouse, rat, and rabbit. We find clear associations between spatiotemporal profiles and the phenotypic manifestations of diseases. We also find that half of human genes differ from their mouse orthologs in their temporal trajectories in at least one of the organs. These include more than 200 genes associated with brain, heart, and liver disease for which mouse models should undergo extra scrutiny.
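Comparing a human gene with its mouse ortholog can be sketched, in simplified form, as correlating two expression trajectories sampled at corresponding developmental stages. The snippet below assumes the stage correspondences are already given and uses Pearson correlation with a hypothetical cut-off of 0.5; it is an illustration only, not the statistical procedure used in the study. The function names, gene names and values are made up.

```python
import numpy as np


def trajectory_similarity(human, mouse):
    """Pearson correlation of two trajectories sampled at matched stages.

    Returns NaN if either trajectory is constant.
    """
    h = np.asarray(human, dtype=float)
    m = np.asarray(mouse, dtype=float)
    if h.shape != m.shape:
        raise ValueError("trajectories must cover the same matched stages")
    return float(np.corrcoef(h, m)[0, 1])


def flag_divergent(genes, threshold=0.5):
    """Names of genes whose human/mouse correlation falls below threshold.

    The threshold is a hypothetical choice; genes with an undefined (NaN)
    correlation are not flagged.
    """
    return [
        name
        for name, (human, mouse) in genes.items()
        if trajectory_similarity(human, mouse) < threshold
    ]


# Hypothetical expression values at five matched developmental stages.
genes = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.1, 5.9, 8.0, 10.2]),
    "GENE_B": ([1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 1.0]),
}
print(flag_divergent(genes))  # ['GENE_B']
```

Correlation ignores differences in absolute expression level and focuses on the shape of the trajectory, which is one reasonable (but not the only) notion of "differing temporal trajectories".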

    Drug-perturbation-based stratification of blood cancer

    As new generations of targeted therapies emerge and tumor genome sequencing discovers increasingly comprehensive mutation repertoires, the functional relationships of mutations to tumor phenotypes remain largely unknown. Here, we measured ex vivo sensitivity of 246 blood cancers to 63 drugs alongside genome, transcriptome, and DNA methylome analysis to understand determinants of drug response. We assembled a primary blood cancer cell encyclopedia data set that revealed disease-specific sensitivities for each cancer. Within chronic lymphocytic leukemia (CLL), responses to 62% of drugs were associated with 2 or more mutations, and these associations linked the B cell receptor (BCR) pathway to trisomy 12, an important driver of CLL. Based on drug responses, the disease could be organized into phenotypic subgroups characterized by exploitable dependencies on BCR, mTOR, or MEK signaling and associated with mutations, gene expression, and DNA methylation. Fourteen percent of CLLs were driven by mTOR signaling in a non-BCR-dependent manner. Multivariate modeling revealed immunoglobulin heavy chain variable gene (IGHV) mutation status and trisomy 12 as the most important modulators of response to kinase inhibitors in CLL. Ex vivo drug responses were associated with outcome. This study challenges the perception that most mutations do not influence the drug response of cancers, and points to an updated approach to understanding tumor biology, with implications for biomarker discovery and cancer care.
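Counting drugs whose ex vivo response is associated with two or more mutations can be sketched as a per-drug, per-gene comparison of viability between mutated and wild-type samples. The sketch assumes a samples-by-drugs viability matrix and a binary samples-by-genes mutation matrix, and uses Welch's t-statistic with a fixed cut-off as a simplified stand-in; the study's actual statistical models and multiple-testing procedure are not reproduced here. All function names, thresholds and data below are hypothetical.

```python
import numpy as np


def welch_t(x, y):
    """Welch's t-statistic for the difference in means of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    return (x.mean() - y.mean()) / se


def count_associated_genes(viability, mutations, t_cutoff=3.0):
    """For each drug, count genes whose mutation status shifts viability.

    viability: (n_samples, n_drugs); mutations: (n_samples, n_genes) of 0/1.
    The fixed |t| cut-off is a hypothetical simplification.
    """
    n_drugs = viability.shape[1]
    counts = np.zeros(n_drugs, dtype=int)
    for j in range(n_drugs):
        for g in range(mutations.shape[1]):
            mutated = mutations[:, g] == 1
            if mutated.sum() < 2 or (~mutated).sum() < 2:
                continue  # need at least two samples per group
            t = welch_t(viability[mutated, j], viability[~mutated, j])
            if abs(t) > t_cutoff:
                counts[j] += 1
    return counts


# Simulated data: drug 0 responds to genes 0 and 1, drug 1 to nothing.
rng = np.random.default_rng(1)
n = 100
mutations = rng.integers(0, 2, size=(n, 3))
viability = np.empty((n, 2))
viability[:, 0] = (
    1.0 - 0.3 * mutations[:, 0] - 0.3 * mutations[:, 1]
    + 0.05 * rng.standard_normal(n)
)
viability[:, 1] = 1.0 + 0.05 * rng.standard_normal(n)

counts = count_associated_genes(viability, mutations)
print("fraction of drugs with >= 2 associated mutations:", np.mean(counts >= 2))
```

In practice, a per-test cut-off like this would be replaced by p-values with a multiple-testing correction, since many drug-gene pairs are tested.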

    Multivariate Methods for Heterogeneous High-Dimensional Data in Genome Biology

    Technological advances have transformed the scientific landscape by enabling comprehensive quantitative measurements, thereby increasingly facilitating data-driven research. This includes genome biology, where many data sets nowadays comprise a collection of heterogeneous high-dimensional data modalities, collected from different assays, tissues, organisms, time points or conditions. An important example is multi-omics data, i.e. data combining measurements from multiple biological layers. Jointly, such data promise to provide a better and more comprehensive understanding of biological processes and complex traits. A critical step to realize these promises is the development of statistical and computational methods that facilitate moving from the data to sound conclusions and biological insights. For this purpose, an integrative analysis that combines information from different data modalities is essential. In this thesis, we propose novel methods that provide a multivariate approach to data integration, and we apply them in the context of multi-omics studies in precision medicine and single cell biology. Given a collection of different data modalities on a set of samples, we aim to address two main questions: First, how can we obtain an (unbiased) overview of the main structures that are present in the data, both within and across data modalities? And second, how can we use all data to predict a response of interest and identify relevant features, whilst taking the heterogeneity of the features into account? The first question is important in all exploratory data analysis and leads us to unsupervised methods for data integration. Finding hidden structures in the data can give important insights into biological and technical sources of variation and yield an informative low-dimensional data representation. To this end, we introduce multi-table methods and latent factor models that can capture main axes of variation and co-variation in the data.
Based on this, we present a novel factor method, multi-omics factor analysis (MOFA), to integrate information from different data modalities. Through sparsity assumptions on the factor loadings, MOFA decomposes variation into axes present in all, some, or single modalities and promotes interpretable factors with a direct link to molecular drivers. MOFA combines a statistical model that accommodates different data types and missing data with a scalable inference algorithm, thereby ensuring a broad applicability. Once learnt, the factors enable a range of downstream analyses, including identification of sample subgroups, outlier detection and data imputation. We demonstrate its flexibility and potential to generate biological insight by applying MOFA to a multi-omics study on chronic lymphocytic leukaemia as well as a multi-omics single cell data set. The second question leads us to supervised methods that enable building predictive models and selecting features relevant for a response of interest. Reliable methods for this purpose would have far-reaching consequences in many applications. For example, it would be extremely useful for decisions in clinical care if treatment outcome or disease progression could be predicted from available molecular or clinical data. Furthermore, the identification of important molecular markers could give insights into underlying biological mechanisms and eventually open up new treatment options. For this purpose, we turn to penalized regression and develop a method that takes into account additional information on the features to adapt the relative strength of penalization in a data-driven manner. Such additional information in the form of external covariates is available in many applications and can, for example, encode structural knowledge about the data (e.g. different assay types) or provide information on a feature's variance, frequency or signal-to-noise ratio.
We show that incorporating informative covariates can improve prediction performance in penalized regression, and we investigate the use of important covariates in genome biology, such as the omics or tissue type.
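Penalizing feature groups (for example, different omics types) with different strengths can be sketched with ridge regression, where each feature j receives the penalty of its group and the coefficients are beta = (X^T X + Lambda)^-1 X^T y with a diagonal penalty matrix Lambda. In this sketch the per-group penalties are chosen by an exhaustive grid search on a validation split. That selection scheme is a swapped-in, hypothetical choice, not the method developed in the thesis, and the code assumes centred data with no intercept. The function names and group labels are illustrative.

```python
import itertools

import numpy as np


def grouped_ridge(X, y, groups, penalties):
    """Ridge regression in which feature j is penalized by penalties[groups[j]]."""
    lam = np.array([penalties[g] for g in groups], dtype=float)
    A = X.T @ X + np.diag(lam)
    return np.linalg.solve(A, X.T @ y)


def select_penalties(X_train, y_train, X_val, y_val, groups, grid):
    """Pick one penalty per group by exhaustive grid search on validation MSE."""
    group_ids = sorted(set(groups))
    best, best_mse = None, np.inf
    for combo in itertools.product(grid, repeat=len(group_ids)):
        penalties = dict(zip(group_ids, combo))
        beta = grouped_ridge(X_train, y_train, groups, penalties)
        mse = np.mean((y_val - X_val @ beta) ** 2)
        if mse < best_mse:
            best, best_mse = penalties, mse
    return best, best_mse


# Simulated data: only the 5 "expr" features carry signal; 40 "meth" features are noise.
rng = np.random.default_rng(0)
p_sig, p_noise = 5, 40
groups = ["expr"] * p_sig + ["meth"] * p_noise
beta_true = np.r_[np.ones(p_sig), np.zeros(p_noise)]
X = rng.standard_normal((500, p_sig + p_noise))
y = X @ beta_true + 0.5 * rng.standard_normal(500)

best, mse = select_penalties(
    X[:100], y[:100], X[100:], y[100:], groups, [0.01, 1.0, 100.0, 10000.0]
)
print(best, round(mse, 3))
```

The uninformative group ends up heavily penalized while the informative one is left nearly unpenalized. Grid search scales exponentially with the number of groups, which is one reason to prefer methods that learn the relative penalties directly from the data.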

    The changing career paths of PhDs and postdocs trained at EMBL

    Individuals with PhDs and postdoctoral experience in the life sciences can pursue a variety of career paths. Many PhD students and postdocs aspire to a permanent research position at a university or research institute, but competition for such positions has increased. Here, we report a time-resolved analysis of the career paths of 2284 researchers who completed a PhD or a postdoc at the European Molecular Biology Laboratory (EMBL) between 1997 and 2020. The most prevalent career outcome was Academia: Principal Investigator (636/2284 = 27.8% of alumni), followed by Academia: Other (16.8%), Science-related Non-research (15.3%), Industry Research (14.5%), Academia: Postdoc (10.7%) and Non-science-related (4.0%); we were unable to determine the career path of the remaining 10.9% of alumni. While positions in Academia (Principal Investigator, Postdoc and Other) remained the most common destination for more recent alumni, entry into Science-related Non-research, Industry Research and Non-science-related positions has increased over time, and entry into Academia: Principal Investigator positions has decreased. Our analysis also reveals information on a number of factors – including publication records – that correlate with the career paths followed by researchers.
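The reported shares can be checked with a few lines of arithmetic: 636 of 2284 alumni is 27.8%, and the categories (including the 10.9% with undetermined outcomes) add up to 100%. Only figures stated in the abstract are used; the dictionary keys are labels chosen for this sketch, not an official data schema.

```python
# Percentages of alumni per career category, as reported in the abstract.
shares = {
    "Academia: Principal Investigator": 27.8,
    "Academia: Other": 16.8,
    "Science-related Non-research": 15.3,
    "Industry Research": 14.5,
    "Academia: Postdoc": 10.7,
    "Non-science-related": 4.0,
    "Undetermined": 10.9,
}

total_alumni = 2284
pi_count = 636

print(round(100 * pi_count / total_alumni, 1))  # 27.8
print(round(sum(shares.values()), 1))  # 100.0
```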