55 research outputs found

    Suuremahuliste andmete kasutamine geenidevaheliste seoste leidmiseks (Using large-scale data to find relationships between genes)

    The electronic version of the dissertation does not contain the publications. Genes determine which RNA and protein molecules a living organism is made of. Identifying genes alone is not enough to understand how an organism functions, when and how different gene products are expressed, and what they do. To understand the nature of a living organism and to influence biological processes, we need to understand the relationships between genes and proteins. High-throughput technologies make it easy to measure different facets of biological processes. This in turn has led to ever faster growth in data volumes and to a need for new methods that help analyse raw data, combine datasets, and visualise results. The need has also grown to test computationally whether existing data models describe the biological object of study accurately enough. This thesis demonstrates various bioinformatics methods by which combining large-scale experimental data of different types can be applied to find relationships between genes. Integrating large-scale datasets and making them mutually comparable adds value to them. In the course of the work, information published in the supplementary files of publications on embryonic stem cell regulation was gathered and made publicly accessible as the ESCDb database. Using these data, researchers can find relationships between genes that cannot be uncovered by analysing datasets separately. Combining the information collected in the database with computational modelling revealed a new regulator in embryonic stem cells, IL11. In addition, combining different data types made it possible to find alternative target-gene modules of OCT4, the central regulator of embryonic stem cells. Using DNA conservation information together with regulatory motif analysis, three new regulator proteins of adipose stem cell differentiation were found. The thesis also covers VisHiC, an automatic clustering and visualisation methodology that helps highlight interesting gene groups for further study with other methods. The thesis shows various ways of integrating large-scale datasets that make it possible to find gene-gene relationships that could not be found by analysing one dataset at a time.

    In order to understand the basic principles of how organisms function, and to be able to affect biological processes, we need to understand the relationships between genes and proteins. Modern high-throughput technology makes it possible to study different sides of biological processes rapidly. This, however, has led to steady growth in the amount of data available. A need has emerged for more sophisticated methods for analysing raw data, combining different data sources, and visualising the results. Additionally, computational modelling is required to test whether our understanding of biological processes is supported by the available data. A variety of bioinformatics methods are used to demonstrate how to combine different types of high-throughput data for identifying relationships between genes. Furthermore, it was shown that combining various data types from different sources adds value to already published data. In the thesis, data from publications about embryonic stem cell regulation were collected and made available through the Embryonic Stem Cell Database (ESCDb). Complementary data in the database allow researchers to find relationships between genes that could not be found by analysing only one dataset at a time. One of the main findings of this study illustrates how computational modelling on data from the ESCDb made it possible to find a novel pluripotency regulator, IL11. Additionally, integration of different data types led to the identification of alternative gene regulatory modules of the core pluripotency regulator OCT4. Similarly, combining conservation data with regulatory motif analysis led to the identification of three new regulators of adipocyte differentiation. This thesis also covers an innovative methodology, VisHiC, for automatic identification and visualisation of functionally related gene sets. This methodology makes it possible to find relevant gene sets for further characterisation in large high-throughput datasets. This doctoral thesis demonstrates that integrating different high-throughput datasets enables establishing gene-gene relationships that could not be found by looking at a single data type in isolation.

    Using bio.tools to generate and annotate workbench tool descriptions

    Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks facilitate access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time-consuming and error-prone process. A major consequence is incomplete or outdated tool descriptions that often lack important information, including parameters and metadata such as publications or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools that have been registered in the ELIXIR tools registry (https://bio.tools) into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This second module can also be used on its own to complete or correct existing tool descriptions with missing metadata.
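
The two-step idea described in this abstract (generate a skeleton, then enrich it with registry metadata) can be illustrated with a minimal sketch. This is not ToolDog's code or API. The function names `build_skeleton` and `enrich`, the example tool, and the metadata dictionary are all hypothetical, and only a handful of Galaxy tool XML elements (`tool`, `description`, `command`, `inputs`, `param`, `outputs`, `citations`) are used.

```python
import xml.etree.ElementTree as ET


def build_skeleton(tool_id, name, version, params):
    """Step 1 (hypothetical): build a bare Galaxy-style tool description.

    `params` maps a parameter name to a Galaxy param type, e.g. {"input": "data"}.
    """
    tool = ET.Element("tool", id=tool_id, name=name, version=version)
    # Galaxy command templates refer to parameters as $name.
    args = " ".join(f"--{p} '${p}'" for p in params)
    ET.SubElement(tool, "command").text = f"{tool_id} {args}".strip()
    inputs = ET.SubElement(tool, "inputs")
    for p, ptype in params.items():
        ET.SubElement(inputs, "param", name=p, type=ptype, label=p)
    ET.SubElement(tool, "outputs")
    return tool


def enrich(tool, metadata):
    """Step 2 (hypothetical): fill in missing metadata from a registry record.

    `metadata` is assumed to look like {"description": str,
    "publication": [{"doi": str}, ...]}; existing values are not overwritten.
    """
    desc = metadata.get("description")
    if desc and tool.find("description") is None:
        node = ET.Element("description")
        node.text = desc
        tool.insert(0, node)
    dois = [pub["doi"] for pub in metadata.get("publication", []) if pub.get("doi")]
    if dois:
        citations = tool.find("citations")
        if citations is None:
            citations = ET.SubElement(tool, "citations")
        known = {c.text for c in citations.findall("citation")}
        for doi in dois:
            if doi not in known:
                ET.SubElement(citations, "citation", type="doi").text = doi
                known.add(doi)
    return tool


if __name__ == "__main__":
    skeleton = build_skeleton("mytool", "MyTool", "0.1.0", {"input": "data"})
    record = {"description": "Demo tool", "publication": [{"doi": "10.1000/example"}]}
    print(ET.tostring(enrich(skeleton, record), encoding="unicode"))
```

Keeping enrichment separate from skeleton generation mirrors the abstract's point that the second step can be applied on its own to an existing description; running `enrich` twice on the same tool adds nothing new.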

    Autoantibody Repertoire in APECED Patients Targets Two Distinct Subgroups of Proteins

    High titer autoantibodies produced by B lymphocytes are clinically important features of many common autoimmune diseases. APECED patients with a deficient autoimmune regulator (AIRE) gene collectively display a broad repertoire of high titer autoantibodies, including some which are pathognomonic for major autoimmune diseases. AIRE deficiency severely reduces thymic expression of gene products ordinarily restricted to discrete peripheral tissues, and developing T cells reactive to those gene products are not inactivated during their development. However, the extent of the autoantibody repertoire in APECED and its relation to thymic expression of self-antigens are unclear. Here we used a broad protein array approach to assess the autoantibody repertoire in APECED patients. Our results show that in addition to shared autoantigen reactivities, APECED patients display high inter-individual variation in their autoantigen profiles, which collectively are enriched in evolutionarily conserved, cytosolic and nuclear phosphoproteins. The APECED autoantigens have two major origins: proteins expressed in thymic medullary epithelial cells and proteins expressed in lymphoid cells. These findings support the hypothesis that specific protein properties strongly contribute to the etiology of B cell autoimmunity. Peer reviewed.

    Comprehensive transcriptome analysis of mouse embryonic stem cell adipogenesis unravels new processes of adipocyte development

    BACKGROUND: The current epidemic of obesity has caused a surge of interest in the study of adipose tissue formation. While major progress has been made in defining the molecular networks that control adipocyte terminal differentiation, the early steps of adipocyte development and the embryonic origin of this lineage remain largely unknown. RESULTS: Here we performed genome-wide analysis of gene expression during adipogenesis of mouse embryonic stem cells (ESCs). We then pursued comprehensive bioinformatic analyses, including de novo functional annotation and curation of the generated data within the context of biological pathways, to uncover novel biological functions associated with the early steps of adipocyte development. By combining in-depth gene regulation studies and in silico analysis of transcription factor binding site enrichment, we also provide insights into the transcriptional networks that might govern these early steps. CONCLUSIONS: This study supports several biological findings: firstly, adipocyte development in mouse ESCs is coupled to blood vessel morphogenesis and neural development, just as it is during mouse development. Secondly, the early steps of adipocyte formation involve major changes in signaling and transcriptional networks. A large proportion of the transcription factors that we uncovered in mouse ESCs are also expressed in the mouse embryonic mesenchyme and in adipose tissues, demonstrating the power of our approach to probe for genes associated with early developmental processes on a genome-wide scale. Finally, we reveal a plethora of novel candidate genes for adipocyte development and present a unique resource that can be further explored in functional assays.

    Longitudinal proteomic profiling reveals increased early inflammation and sustained apoptosis proteins in severe COVID-19

    SARS-CoV-2 infection can develop into life-threatening COVID-19 disease. Whereas age, hypertension, and chronic inflammatory conditions are risk factors, underlying host factors and markers for disease severity, e.g. requiring intensive care unit (ICU) treatment, remain poorly defined. To this end, we longitudinally profiled blood inflammation markers, antibodies, and 101 plasma proteins of hospitalized COVID-19 patients who did or did not require ICU admission. While essentially all patients displayed SARS-CoV-2-specific antibodies and virus-neutralization capacity within 12-15 days, a rapid, mostly transient upregulation of selective inflammatory markers including IL-6, CXCL10, CXCL11, IFN gamma, IL-10, and monocyte-attracting CCL2, CCL7 and CCL8, was particularly evident in ICU patients. In addition, there was consistent and sustained upregulation of apoptosis-associated proteins CASP8, TNFSF14, HGF, and TGFB1, with HGF discriminating between ICU and non-ICU cohorts. Thus, COVID-19 is associated with a selective inflammatory milieu within which the apoptotic pathway is a cardinal feature with potential to aid risk-based patient stratification. Peer reviewed.

    g:Profiler—a web-based toolset for functional profiling of gene lists from large-scale experiments

    g:Profiler (http://biit.cs.ut.ee/gprofiler/) is a public web server for characterising and manipulating gene lists resulting from mining high-throughput genomic data. g:Profiler has a simple, user-friendly web interface with powerful visualisation for capturing Gene Ontology (GO), pathway, or transcription factor binding site enrichments down to the level of individual genes. Besides standard multiple testing corrections, a new improved method for estimating the true effect of multiple testing over complex structures like GO has been introduced. Interpreting ranked gene lists is supported from the same interface with very efficient algorithms. Such ordered lists may arise when studying the most significantly affected genes from high-throughput data or genes co-expressed with the query gene. Other important aspects of practical data analysis are supported by modules tightly integrated with g:Profiler. These are: g:Convert for converting between different database identifiers; g:Orth for finding orthologous genes from other species; and g:Sorter for searching a large body of public gene expression data for co-expression. g:Profiler supports 31 different species, and underlying data is updated regularly from sources like the Ensembl database. Bioinformatics communities wishing to integrate with g:Profiler can use alternative simple textual outputs.
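
To make the notion of term enrichment with a standard multiple testing correction concrete, here is a minimal sketch of a generic over-representation test: a one-sided hypergeometric test per term, followed by Bonferroni correction. It is not g:Profiler's implementation and does not reproduce the improved correction method mentioned in the abstract. The function names, gene identifiers, and term labels are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich_terms(query, background, annotations):
    """Return (term, overlap, Bonferroni-adjusted p) tuples, most significant first.

    `annotations` maps a term label to the genes annotated with it.
    Terms with no overlap with the query are skipped in the output but
    still count as tests for the Bonferroni factor.
    """
    background = set(background)
    query = set(query) & background
    n_tests = len(annotations)
    results = []
    for term, genes in annotations.items():
        genes = set(genes) & background
        overlap = len(query & genes)
        if overlap == 0:
            continue
        p = hypergeom_sf(overlap, len(background), len(genes), len(query))
        results.append((term, overlap, min(1.0, p * n_tests)))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    background = [f"g{i}" for i in range(20)]
    annotations = {
        "termA": ["g0", "g1", "g2", "g3", "g4"],   # hypothetical term
        "termB": [f"g{i}" for i in range(10, 20)],  # hypothetical term
    }
    for term, overlap, p_adj in enrich_terms(["g0", "g1", "g2", "g3"], background, annotations):
        print(term, overlap, f"{p_adj:.4g}")
```

Bonferroni treats every term as an independent test, which is conservative for hierarchically related terms such as those in GO; that limitation is the motivation the abstract gives for a correction designed for complex structures.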

    eQTL Catalogue 2023: New datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs

    The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine-mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.

    Genome-wide promoter analysis of histone modifications in human monocyte-derived antigen presenting cells

    Background: Monocyte-derived macrophages and dendritic cells (DCs) are important in inflammatory processes and are often used for immunotherapeutic approaches. Blood monocytes can be differentiated into macrophages and DCs, which is accompanied by transcriptional changes in many genes, including chemokines and cell surface markers. Results: To study the chromatin modifications associated with this differentiation, we performed a genome-wide analysis of histone H3 trimethylation on lysine 4 (H3K4me3) and 27 (H3K27me3) as well as acetylation of H3 lysines (AcH3) in promoter regions. We report that both H3K4me3 and AcH3 marks significantly correlate with transcriptionally active genes, whereas the H3K27me3 mark is associated with inactive gene promoters. During differentiation, the H3K4me3 levels decreased on monocyte-specific CD14, CCR2 and CX3CR1 but increased on DC-specific TM7SF4/DC-STAMP, TREM2 and CD209/DC-SIGN genes. Genes associated with phagocytosis and antigen presentation were marked by H3K4me3 modifications. We also report that H3K4me3 levels on clustered chemokine and surface marker genes often correlate with transcriptional activity. Conclusion: Our results provide a basis for further functional correlations between gene expression and histone modifications in monocyte-derived macrophages and DCs.

    A framework to assess the quality and impact of bioinformatics training across ELIXIR.

    ELIXIR is a pan-European intergovernmental organisation for life science that aims to coordinate bioinformatics resources in a single infrastructure across Europe; bioinformatics training is central to its strategy, which aims to develop a training community that spans all ELIXIR member states. In an evidence-based approach for strengthening bioinformatics training programmes across Europe, the ELIXIR Training Platform, led by the ELIXIR EXCELERATE Quality and Impact Assessment Subtask in collaboration with the ELIXIR Training Coordinators Group, has implemented an assessment strategy to measure the quality and impact of its entire training portfolio. Here, we present ELIXIR's framework for assessing training quality and impact, which includes the following: specifying assessment aims, determining what data to collect in order to address these aims, and our strategy for centralised data collection to allow for ELIXIR-wide analyses. In addition, we present an overview of the ELIXIR training data collected over the past 4 years. We highlight the importance of a coordinated and consistent data collection approach and the relevance of defining specific metrics and answer scales for consortium-wide analyses as well as for comparison of data across iterations of the same course.