    CELDA - an ontology for the comprehensive representation of cells in complex systems

    BACKGROUND: The need for detailed description and modeling of cells drives the continuous generation of large and diverse datasets. Unfortunately, there is no systematic and comprehensive way to organize these datasets and their information. CELDA (Cell: Expression, Localization, Development, Anatomy) is a novel ontology for associating primary experimental data and derived knowledge with various types of cells of organisms. RESULTS: CELDA is a structure that can help to categorize cell types by species, anatomical localization, subcellular structures, developmental stages and origin. It targets cells in vitro as well as in vivo. Instead of developing a novel ontology from scratch, we designed CELDA to integrate existing ontologies as much as possible, adding only minimal extensions to cover classes and areas not present in any existing model. Currently, ten existing ontologies and models are linked to CELDA through the top-level ontology BioTop. Together with 15,439 newly created classes, CELDA contains more than 196,000 classes and 233,670 relationship axioms. CELDA is primarily used as a representational framework for modeling, analyzing and comparing cells within and across species in CellFinder, a web-based data repository on cells (http://cellfinder.org). CONCLUSIONS: CELDA can semantically link diverse types of information about cell types. It has been integrated into the research platform CellFinder, where, as an example, it relates cell types from liver and kidney during development on the one hand and anatomical locations in humans on the other, integrating information on all spatial and temporal stages. CELDA is available from the CellFinder website: http://cellfinder.org/about/ontology

    CellFinder: a cell data repository

    CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and at different developmental stages. It is built from carefully selected datasets drawn from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 50 951 cell lines. The database currently contains 3055 microscopic and anatomical images, 205 whole-genome expression profiles of 194 cell/tissue types from RNA-seq and microarrays, and 553 905 protein expressions for 535 cells/tissues. Text mining of a corpus of >2000 publications followed by manual curation confirmed expression information on ∼900 proteins and genes. CellFinder's data model can seamlessly represent entities from single cells to the organ level, incorporate mappings between homologous entities in different species, and describe processes of cell development and differentiation. Its ontological backbone currently consists of 204 741 ontology terms incorporated from 10 different ontologies unified under the novel CELDA ontology. CellFinder's web portal allows searching, browsing and comparing the stored data, interactive construction of developmental trees, and navigation of the partonomic hierarchy of cells and tissues through a unique body browser designed for life scientists and clinicians.
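Navigating a partonomic hierarchy amounts to following part-of links up and down between cells, tissues and organs. The following is a minimal Python sketch of that idea under stated assumptions: the entity names and the single-parent `PART_OF` mapping are hypothetical toy data, not CellFinder's data model or API, and real anatomy ontologies allow several part-of parents per term.

```python
# Hypothetical toy partonomy: each entity maps to the entity it is part of.
# This is an illustrative assumption, not CellFinder's actual data model.
PART_OF: dict[str, str] = {
    "hepatocyte": "hepatic lobule",
    "hepatic lobule": "liver",
    "liver": "body",
    "podocyte": "glomerulus",
    "glomerulus": "nephron",
    "nephron": "kidney",
    "kidney": "body",
}


def ancestors(entity: str) -> list[str]:
    """Walk up the part-of chain from an entity to the top of the hierarchy."""
    chain = []
    seen = {entity}
    current = entity
    while current in PART_OF:
        current = PART_OF[current]
        if current in seen:  # guard against accidental cycles in curated data
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        chain.append(current)
    return chain


def parts_of(container: str) -> list[str]:
    """Return every entity whose part-of chain passes through `container`."""
    return sorted(e for e in PART_OF if container in ancestors(e))


if __name__ == "__main__":
    print(ancestors("hepatocyte"))  # ['hepatic lobule', 'liver', 'body']
    print(parts_of("kidney"))       # ['glomerulus', 'nephron', 'podocyte']
```

Storing only the child-to-parent link keeps the sketch small; a real body browser would more likely precompute the inverse (parent-to-children) index rather than scanning every entry as `parts_of` does.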

    BNO: An ontology for describing the behaviour of complex biomolecular networks

    The use of semantic technologies, such as ontologies, to describe and analyse biological systems is at the heart of systems biology. Indeed, understanding the behaviour of cells requires a large amount of context information. In this paper, we propose an ontology entitled "Biomolecular Network Ontology" (BNO), written in the OWL language. The BNO ontology standardises the terminology used by expert biologists to address issues including semantic behaviour representation, reasoning and knowledge sharing. The main benefit of the proposed ontology is the ability to reason about the dynamical behaviour of complex biomolecular networks over time. We demonstrate the ontology with a detailed example, the bacteriophage T4 gene 32 use case.

    A foundation for ontology modularisation

    There has been great interest in realising the Semantic Web, and ontologies are used to define Semantic Web applications. Ontologies have grown so large and complex that they cause cognitive overload both for humans, who must understand and maintain them, and for machines, which must process and reason over them. Furthermore, building ontologies from scratch is time-consuming and not always necessary: prospective ontology developers could consider reusing existing ontologies of good quality. However, an entire large ontology is not always required for a particular application; often only a subset of its knowledge is relevant. Modularity deals with splitting an ontology into smaller ontologies, either for a particular context or by structure, while preserving the contextual knowledge. Modularising an ontology has a number of benefits, including simplified maintenance and machine processing, as well as collaborative development in which work can be shared among experts. Modularity has been successfully applied to a number of different ontologies to improve usability and manage complexity. However, some problems with modularity have not been satisfactorily addressed. Currently, modularity tools generate large modules that do not exclusively represent the context. Partitioning tools, which ought to generate disjoint modules, sometimes create overlapping modules. These problems arise from a number of issues: different module types have not been clearly characterised, it is unclear what the properties of a 'good' module are, and it is unclear which evaluation criteria apply to specific module types. Solving these problems requires investigating a number of theoretical aspects. It is important to determine which ontology module types are the most widely used and to characterise each type by its distinguishing properties. One must also identify the properties that a 'good' or 'usable' module satisfies.
In this thesis, we investigate these problems with modularity systematically. We begin by identifying the dimensions that define the foundation of modularity: use-case, technique, type, property, and evaluation metric. Each dimension is populated with sub-dimensions as fine-grained values. The dimensions are used to create an empirically based framework for modularity by classifying a set of ontologies with them, which reveals dependencies among the dimensions. The formal framework can guide users in modularising an ontology and serve as a starting point in the modularisation process. To address module quality, new and existing metrics were implemented in a novel tool, TOMM, and an experimental evaluation with a set of modules was performed, revealing dependencies between the metrics and module types. These dependencies can be used to determine whether a module is of good quality. To address the shortcomings of existing modularity techniques, we created five new algorithms to improve the current tools and techniques and evaluated them experimentally. The algorithms of the tool NOMSA perform as well as other tools on most performance criteria. Two of NOMSA's algorithms generate modules of good quality when compared with the expected dependencies of the framework. The modules generated by the remaining three algorithms match some of the expected metric values for the ontology set in question.
Solving these problems with modularity resulted in a formal foundation for modularity, which comprises: an exhaustive set of modularity dimensions with dependencies between them; a framework for guiding the modularisation process and annotating modules; a way to measure module quality using the novel TOMM tool, which has new and existing evaluation metrics; the SUGOI tool for module management, which has been investigated for module interchangeability; and an implementation of new algorithms to fill the gaps left by insufficient tools and techniques.
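The requirement that partitioning tools produce disjoint modules can be checked mechanically. Below is a minimal Python sketch, assuming each module is represented only by the set of entity names it contains (real modules are sets of axioms); `partition_report` and the example names are hypothetical and are not part of TOMM, NOMSA or SUGOI.

```python
from itertools import combinations


def partition_report(signature: set[str], modules: dict[str, set[str]]) -> dict:
    """Check whether a set of modules partitions an ontology's signature.

    A partition must be pairwise disjoint, must cover every entity in the
    signature, and must not contain entities from outside it.
    """
    # Pairwise intersections; any non-empty one means the modules overlap.
    overlaps = {}
    for a, b in combinations(sorted(modules), 2):
        shared = modules[a] & modules[b]
        if shared:
            overlaps[(a, b)] = shared

    covered = set().union(*modules.values())
    return {
        "is_partition": not overlaps and covered == signature,
        "overlaps": overlaps,
        "uncovered": signature - covered,  # entities that no module contains
        "extra": covered - signature,      # entities outside the signature
    }


# Hypothetical example: "Cell" was placed in both modules.
signature = {"Cell", "Neuron", "Hepatocyte", "Tissue", "Organ"}
modules = {
    "cells": {"Cell", "Neuron", "Hepatocyte"},
    "anatomy": {"Tissue", "Organ", "Cell"},
}
report = partition_report(signature, modules)
print(report["is_partition"])  # False
print(report["overlaps"])      # {('anatomy', 'cells'): {'Cell'}}
```

Reporting the offending pairs, rather than only a yes/no answer, makes it easier to see which modules a partitioning algorithm failed to separate.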

    Computational approaches for single-cell omics and multi-omics data

    Single-cell omics and multi-omics technologies have enabled the study of cellular heterogeneity at unprecedented resolution and the discovery of new cell types. Identifying heterogeneous cell types, both known and novel, relies on efficient computational approaches, especially cluster analysis. Additionally, gene regulatory network analysis and various integrative approaches are needed to combine data across studies and across different multi-omics layers. This thesis comprehensively compared Bayesian clustering models for single-cell RNA sequencing (scRNA-seq) data and used selected integrative approaches to study cell-type-specific gene regulation in the uterus. Additionally, single-cell multi-omics data integration approaches for cell heterogeneity analysis were investigated. Article I investigated analytical approaches for cluster analysis of scRNA-seq data, particularly latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) models. Comparing LDA and HDP with existing state-of-the-art methods revealed that topic-modeling-based models can be useful in scRNA-seq cluster analysis. Evaluating cluster quality for LDA and HDP with intrinsic and extrinsic cluster quality metrics indicated that the clustering performance of these methods is dataset dependent. Articles II and III focused on cell-type-specific integrative analysis of uterine (decidual) stromal (dS) and natural killer (dNK) cells, which are important for successful pregnancy. Article II integrated existing preeclampsia RNA-seq studies of the decidua with recent scRNA-seq datasets to investigate the cell-type-specific contributions of early-onset preeclampsia (EOP) and late-onset preeclampsia (LOP). The dS marker genes were found to be enriched among genes downregulated in LOP, and the dNK marker genes were enriched among genes upregulated in EOP.
Article III presented a gene regulatory network analysis of the subpopulations of dS and dNK cells. This study identified novel subpopulation-specific transcription factors that promote decidualization of stromal cells and dNK-mediated maternal immunotolerance. In Article IV, different strategies and methodological frameworks for data integration in single-cell multi-omics data analysis were reviewed in detail. Data integration methods were grouped into early, late and intermediate integration strategies. The specific stage and order of data integration can have a substantial effect on the results of the integrative analysis. The central details of each approach were presented, and potential future directions were discussed.
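The abstract does not say which statistical test was used to assess enrichment of marker genes among differentially expressed genes. A common choice for this kind of gene-set overlap is the one-sided hypergeometric test; the Python sketch below assumes that test, the function name `hypergeom_enrichment_p` is hypothetical, and the counts in the example are made up.

```python
from math import comb


def hypergeom_enrichment_p(k: int, K: int, n: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: number of genes in the background (universe)
    K: number of marker genes in the background
    n: number of differentially expressed (DE) genes
    k: observed number of marker genes among the DE genes
    """
    if not 0 <= k <= min(K, n) or K > N or n > N:
        raise ValueError("inconsistent counts")
    # Sum the probabilities of drawing k or more markers when n genes are
    # drawn without replacement from N genes, K of which are markers.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy numbers, not from the thesis: 20,000 background genes, 200 markers,
# 500 DE genes, 20 of which are markers (about 5 expected by chance).
p = hypergeom_enrichment_p(k=20, K=200, n=500, N=20_000)
print(f"{p:.2e}")
```

Exact integer arithmetic with `math.comb` avoids overflow for gene-scale counts; in practice, p-values from many such tests would also need multiple-testing correction.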

    Integrative single-cell meta-analysis reveals disease-relevant vascular cell states and markers in human atherosclerosis

    Coronary artery disease (CAD) is characterized by atherosclerotic plaque formation in the arterial wall. CAD progression involves complex interactions and phenotypic plasticity among vascular and immune cell lineages. Single-cell RNA-seq (scRNA-seq) studies have highlighted lineage-specific transcriptomic signatures, but human cell phenotypes remain controversial. Here, we perform an integrated meta-analysis of 22 scRNA-seq libraries to generate a comprehensive map of human atherosclerosis with 118,578 cells. Besides characterizing granular cell-type diversity and communication, we leverage this atlas to provide insights into smooth muscle cell (SMC) modulation. We integrate genome-wide association study data and uncover a critical role for modulated SMC phenotypes in CAD, myocardial infarction, and coronary calcification. Finally, we identify fibromyocyte/fibrochondrogenic SMC markers (LTBP1 and CRTAC1) as proxies of atherosclerosis progression and validate these through omics and spatial imaging analyses. Altogether, we create a unified atlas of human atherosclerosis informing cell state-specific mechanistic and translational studies of cardiovascular diseases.

    Unraveling disease mechanisms of different lung pathologies with single-cell RNA sequencing

    The respiratory system is composed of different tissues with their respective cell types that work in concert to perform air conduction and gas exchange. With the advent of single-cell RNA sequencing (scRNA-seq), it is now possible to comprehensively interrogate the function of each individual cell in homeostatic and diseased states. In this dissertation, various roles of epithelial, mesenchymal, and immune cell types of the respiratory system in idiopathic pulmonary fibrosis (IPF) and coronavirus disease 2019 (COVID-19) were investigated with scRNA-seq. IPF is a chronic interstitial lung disease characterized by progressive scarring of the lung parenchyma. Previous studies that surveyed the cellular landscape of IPF lungs used explant lungs, which reflect end-stage fibrosis. To uncover disease mechanisms of airway cell types in early-stage fibrosis, air-liquid interface (ALI) cultures of primary cells taken from newly diagnosed IPF patients were used. This identified proinflammatory epithelial cells, profibrotic basal cells, and primed fibroblasts as early-stage drivers of IPF. Treatment with the antifibrotic compounds nintedanib, pirfenidone, and saracatinib failed to completely ameliorate the identified signatures. With the emergence of the COVID-19 pandemic and its extensive public health burden, it was imperative to understand the molecular mechanisms of viral entry and disease pathology in order to identify potential risk factors and therapeutic targets. In the early stages of the pandemic, the viral entry factors ACE2, TMPRSS2, and FURIN were found to be expressed by a transient secretory cell type (differentiating from secretory to ciliated cells) of the airway mucosa and by alveolar type 2 cells of the alveolar epithelium. Further investigation of severe COVID-19 showed that early-stage infection is characterized by a hyperactivated immune response mediated by proinflammatory macrophages.
Late-stage COVID-19, by contrast, especially in patients with acute respiratory distress syndrome (ARDS), was characterized by an accumulation of profibrotic macrophages and activated myofibroblasts that drove pulmonary scarring and fibrosis. Although IPF and COVID-19 are distinct diseases in their own right, they share aberrant wound-healing responses. Both diseases are characterized by tissue inflammation followed by a profibrotic phase. Unlike in IPF, where tissue remodeling is progressive and chronic, COVID-19 ARDS-associated fibrosis undergoes a resolution phase. Future studies comparing the cellular and transcriptional landscapes of both conditions in early and late stages of disease will help uncover pathogenic mechanisms and therapeutic targets of lung fibrosis. High-resolution transcriptomic profiling techniques such as scRNA-seq permit the interrogation of individual cell types and their direct contribution to disease development. Moreover, they allow identified pathomechanisms to be compared and transferred across different pulmonary diseases, thereby providing deeper and more generalizable insights. As this field evolves, it will undoubtedly continue to deepen our understanding of respiratory diseases.

    A Semantic Basis for Meaning Construction in Constructivist Interactions
