
    Probabilistic analysis of the human transcriptome with side information

    Understanding the functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and the efficient sharing of research material through community databases have opened up new perspectives on the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are subject to high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources with the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected from any single measurement source. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open-source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community. Comment: Doctoral thesis. 103 pages, 11 figures.

    A novel atlas of gene expression in human skeletal muscle reveals molecular changes associated with aging

    Background: Although high-throughput studies of gene expression have generated large amounts of data, most of which is freely available in public archives, the use of this valuable resource is limited by computational complications and non-homogeneous annotation. To address these issues, we have performed a complete re-annotation of public microarray data from human skeletal muscle biopsies and constructed a muscle expression compendium consisting of nearly 3000 samples. The resulting muscle compendium is a publicly available resource that includes all curated annotation. Using this data set, we aimed to elucidate the molecular mechanisms of muscle aging and to describe how physical exercise may alleviate its negative physiological effects. Results: We find 957 genes to be significantly associated with aging (p < 0.05, FDR = 5 %, n = 361). Aging was associated with perturbation of many central metabolic pathways, such as mitochondrial function, including reduced expression of genes in the ATP synthase, NADH dehydrogenase, cytochrome c reductase and oxidase complexes, as well as in glucose and pyruvate processing. Among the genes with the strongest association with aging were H3 histone, family 3B (H3F3B, p = 3.4 x 10(-13)), AHNAK nucleoprotein, desmoyokin (AHNAK, p = 6.9 x 10(-12)), and histone deacetylase 4 (HDAC4, p = 4.0 x 10(-9)). We also discover genes previously not linked to muscle aging and metabolism, such as fasciculation and elongation protein zeta 2 (FEZ2, p = 2.8 x 10(-8)). Of the 957 genes associated with aging, 21 (p < 0.001, false discovery rate = 5 %, n = 116) were also associated with maximal oxygen consumption (VO2MAX). Strikingly, 20 of those 21 genes are regulated in the opposite direction when comparing increasing age with increasing VO2MAX. 
Conclusions: These results support the view that mitochondrial dysfunction is a major age-related factor and also highlight the beneficial effects of maintaining a high physical capacity for the prevention of age-related sarcopenia. Peer reviewed
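The abstract reports significance at FDR = 5 % without naming the multiple-testing correction. The following is a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, of how such a cutoff is applied to per-gene p-values; it also counts genes whose age and VO2MAX effects have opposite signs, the comparison behind the "20 of 21" result. The function names and inputs are hypothetical and not taken from the study.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Mark hypotheses rejected at false discovery rate `alpha`.

    Assumption: Benjamini-Hochberg step-up procedure (the study does not
    state which correction it used).
    """
    m = len(pvalues)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


def count_opposite(age_effects: List[float], vo2_effects: List[float]) -> int:
    """Count genes whose effect sizes for age and VO2MAX have opposite signs."""
    return sum(1 for a, v in zip(age_effects, vo2_effects) if a * v < 0)
```

Because the procedure is step-up, every hypothesis ranked at or below the largest qualifying rank is rejected, even one whose own p-value lies slightly above its individual threshold.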

    Analyzing Acute Myeloid Leukemia by RNA-sequencing

    Bulk and single-cell RNA sequencing have revolutionized biomedical research, enabling researchers to quantify the global gene expression of cell populations and single cells and thereby to better understand the development, manifestation and treatment of diseases such as cancer. Acute myeloid leukemia (AML), a cancer of the myeloid line of blood cells, could benefit from these technologies, as relapse and mortality rates remain high despite extensive research over several decades. This is partly because AML is a heterogeneous disease that differs substantially between patients and hence requires finer-grained classifications and specialised treatment strategies, for example ones that incorporate expression profiles. In addition, single-cell RNA sequencing (scRNA-seq) can resolve genetic and epigenetic subclonal structures within a patient to improve understanding and treatment of AML. However, RNA-seq technologies still often need to be improved and adapted to obtain expression profiles efficiently and reliably, especially from small or suboptimally processed samples. To this end, we developed a bulk RNA-seq protocol that copes with the major challenges of limited sample quantities, different sample types, throughput and cost, and subsequently applied this method to further understand the subclonal structures in AML. We were able to characterize a plastic state of AML cells that is defined by increased stemness and dormancy and could influence treatment outcome and relapse. For this, we isolated non-dividing AML cells, based on a proliferation-sensitive dye, from patient-derived xenograft (PDX) models of two AML patients. We found that these cells have low levels of cell cycle genes, confirming dormancy, and additionally show expression patterns similar to previously described dormant minimal residual disease (MRD) cells in acute lymphoblastic leukemia (ALL). 
This included high expression levels of cell adhesion molecules, potentially reflecting the persistence of dormant AML and ALL cells in the hematopoietic niche. Lastly, we showed that resting and cycling AML cells can transition between these two states, indicating that dormancy might be a general property of AML cells rather than depending on particular genetic subclones. In a second project, we optimized a single-cell RNA-seq technology. We used a systematic approach to evaluate experimental conditions of SCRB-seq, a powerful and efficient scRNA-seq method. Focussing on reverse transcription (RT), arguably the most important and least efficient reaction, we used a standardized human RNA (Universal Human Reference RNA, UHRR) and systematically tested nine different RT enzymes, several reaction enhancers and primer compositions to increase sensitivity. We found that Maxima H- showed the highest sensitivity and that molecular crowding using polyethylene glycol (PEG) could increase the efficiency of the reaction significantly. Together with several smaller changes in the workflow, primer design and PCR conditions, this led to mcSCRB-seq (molecular crowding SCRB-seq). We verified the 2.5-fold increase in sensitivity in a side-by-side comparison of SCRB-seq and mcSCRB-seq using mouse embryonic stem (mES) cells, and further found mcSCRB-seq to be among the most sensitive methods based on artificial RNA spike-in molecules (ERCCs). Lastly, since method comparisons across studies suffer from limited accuracy due to batch effects and external factors, we participated in a complex scRNA-seq benchmark study aiming to provide a fair comparison between methods with respect to sensitivity, accuracy and applicability for building expression atlases. In contrast to our earlier results, we found that in this particular setting mcSCRB-seq did not perform well, and we identified areas for further improvement. 
In conclusion, my work described in this thesis contributes not only to a deeper understanding of the emergence and progression of AML but also to the development of experimental bulk and single-cell RNA sequencing methods, facilitating their widespread application to biomedical problems such as leukemia.

    Measuring primate gene expression evolution using high throughput transcriptomics and massively parallel reporter assays

    A key question in biology is how one genome sequence can lead to the great cellular diversity present in multicellular organisms. Enabled by the sequencing revolution, RNA sequencing (RNA-seq) has emerged as a central tool to measure transcriptome-wide gene expression levels. More recently, single-cell RNA-seq was introduced and is becoming a feasible alternative to the more established bulk sequencing. While many different methods have been proposed, thorough optimisation of established protocols can improve robustness, sensitivity, scalability and cost-effectiveness. Towards this goal, I contributed to optimizing the single-cell RNA-seq method "Single Cell RNA Barcoding and sequencing" (SCRB-seq) and to publishing an improved version that uses optimized reaction conditions and molecular crowding (mcSCRB-seq). mcSCRB-seq achieves higher sensitivity at lower cost per cell and shows the highest RNA capture rate when compared with other published methods. We next sought a direct comparison with other scRNA-seq protocols within the Human Cell Atlas (HCA) benchmarking effort, in which we used mcSCRB-seq to profile a common reference sample that included heterogeneous cell populations from different sources. Transferring the knowledge acquired on single-cell RNA-seq methods to bulk RNA-seq led to the development of prime-seq, a sensitive, robust and cost-efficient bulk RNA-seq protocol that can be performed in any molecular biology laboratory. Using power simulations, we compared data generated with prime-seq to data from the gold-standard method TruSeq and found that the statistical power to detect differentially expressed genes is comparable, at 40-fold lower cost. While gene expression is an informative phenotype, the regulation that leads to different phenotypes is still poorly understood. 
Massively parallel reporter assays (MPRAs) are a state-of-the-art method to measure the activity of cis-regulatory elements (CREs) in a high-throughput fashion; they can measure the activity of thousands of CREs in parallel. Evolutionary information offers a good way to decode the genotype-to-phenotype conundrum: cross-species comparisons of closely related species can help explain how particular diverging phenotypes emerged and how conserved gene regulatory programs are encoded in the genome. Cell lines, particularly induced pluripotent stem cells (iPSCs), are a very useful tool for comparative studies. iPSCs can be reprogrammed from different primary somatic cells and are by definition pluripotent, meaning they can be differentiated into cells of all three germ layers. A main challenge for primate research is obtaining primary cells. To this end, I contributed to establishing a protocol to generate iPSCs from a non-invasive source of primary cells, namely urine. Using prime-seq, we characterized the primary urine-derived stem cells (UDSCs) and the reprogrammed iPSCs. Finally, I used an MPRA to measure the activity of putative regulatory elements of the gene TRNP1 across the mammalian phylogeny. We found co-evolution of one particular CRE with brain folding in Old World monkeys. To validate this finding, we looked for transcription factor binding sites within the identified CRE and intersected the list with transcription factors confirmed by prime-seq to be expressed in the cellular system. In addition, we found that changes in the protein-coding sequence of TRNP1 and neural stem cell proliferation induced by TRNP1 orthologs correlate with brain size. In summary, during my doctorate I developed methods that enable measuring gene expression and gene regulation in a comparative genomics setting. 
I further applied these methods in a cross-mammalian study of the regulatory sequences of the gene TRNP1 and its association with brain phenotypes.
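MPRA readouts are commonly summarized as a ratio of RNA to DNA barcode counts, because each element's transcriptional output has to be normalized by how often its reporter construct is present in the library. The abstract does not describe the quantification used for TRNP1, so the following is only a minimal sketch, assuming counts already summed per element, normalization to counts per million and a pseudocount; the element names are hypothetical.

```python
import math
from typing import Dict


def mpra_activity(rna_counts: Dict[str, int], dna_counts: Dict[str, int],
                  pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(RNA / DNA) activity per regulatory element.

    Assumptions: barcode counts are already summed per element, and both
    libraries are normalized to counts per million before taking the ratio.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)  # element may be absent from RNA reads
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[element] = math.log2(rna_cpm / dna_cpm)
    return activity


# Toy usage with hypothetical element names:
# mpra_activity({"cre_a": 300, "cre_b": 100}, {"cre_a": 100, "cre_b": 300})
```

Positive values indicate elements that produce more transcript than their DNA representation predicts; comparing these values for orthologous elements is one way such activities could be related to a phenotype across species.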

    Structured data abstractions and interpretable latent representations for single-cell multimodal genomics

    Single-cell multimodal genomics involves the simultaneous measurement of multiple types of molecular data, such as gene expression, epigenetic marks and protein abundance, in individual cells. This allows for a comprehensive and nuanced understanding of the molecular basis of cellular identity and function. The large volume of data generated by single-cell multimodal genomics experiments requires specialised methods and tools for handling, storing and analysing it. This work provides contributions on multiple levels. First, it introduces a single-cell multimodal data standard, MuData, designed to facilitate the handling, storage and exchange of multimodal data. MuData provides interfaces that enable transparent access to multimodal annotations as well as to data from individual modalities. This data structure has formed the foundation for the multimodal integration framework, which enables complex and composable workflows that can be naturally integrated with existing omics-specific analysis approaches. Joint analysis of multimodal data can be performed using integration methods. To enable the integration of single-cell data, an improved multi-omics factor analysis model (MOFA+) has been designed and implemented, building on the canonical dimensionality reduction approach for multi-omics integration. By inferring latent factors that explain variation across multiple modalities of the data, MOFA+ enables the modelling of latent factors with cell group-specific patterns of activity. The MOFA+ model has been implemented as part of the respective multi-omics integration framework, and its utility has been extended by software solutions that facilitate interactive model exploration and interpretation. The improved model for multi-omics integration of single cells has been applied to the study of gene expression signatures upon targeted gene activation. 
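The "cell group-specific patterns of activity" above can be made concrete with the MOFA+ generative model as we understand it from its published formulation; the abstract itself gives no equations, so treat the following as a hedged sketch of the Gaussian case. The data are split into views m (modalities, D_m features each) and groups g (for example samples or conditions, N_g cells each); every block is factorized with group-specific factors and view-specific weights, and a group-wise automatic relevance determination (ARD) prior on the factors lets a factor be switched off in some groups.

```latex
% Views m (modalities) and groups g (cell groups); K latent factors
\mathbf{Y}_{g}^{m} = \mathbf{Z}_{g}\,\left(\mathbf{W}^{m}\right)^{\top} + \boldsymbol{\varepsilon}_{g}^{m},
\qquad
\mathbf{Z}_{g} \in \mathbb{R}^{N_g \times K},\quad
\mathbf{W}^{m} \in \mathbb{R}^{D_m \times K}

% Group-wise ARD prior: a large precision \alpha_{k}^{g} shrinks
% factor k towards zero in group g, giving group-specific activity
z_{nk}^{g} \sim \mathcal{N}\!\left(0,\ 1/\alpha_{k}^{g}\right)
```

Sharing the weights W^m across groups keeps factors comparable between groups, while the per-group precisions decide where each factor is active.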
In a dataset featuring targeted activation of candidate regulators of zygotic genome activation (ZGA), a crucial transcriptional event in early embryonic development, modelling the expression of both coding and non-coding loci with MOFA+ made it possible to rank genes by their potency to activate a ZGA-like transcriptional response. With the identification of Patz1, Dppa2 and Smarca5 as potent inducers of ZGA-like transcription in mouse embryonic stem cells, these findings have contributed to the understanding of the molecular mechanisms behind ZGA and laid the foundation for future research on ZGA in vivo. In summary, this work's contributions include the development of data handling and integration methods as well as new biological insights that arose from applying these methods to the study of gene expression regulation in early development. This highlights how single-cell multimodal genomics can help generate valuable insights into complex biological systems.

    The Human Cell Atlas White Paper

    The Human Cell Atlas (HCA) will be made up of comprehensive reference maps of all human cells, the fundamental units of life, as a basis for understanding fundamental human biological processes and diagnosing, monitoring, and treating disease. It will help scientists understand how genetic variants impact disease risk, define drug toxicities, discover better therapies, and advance regenerative medicine. A resource of such ambition and scale should be built in stages, increasing in size, breadth, and resolution as technologies develop and understanding deepens. We will therefore pursue Phase 1 as a suite of flagship projects in key tissues, systems, and organs. We will bring together experts in biology, medicine, genomics, technology development and computation (including data analysis, software engineering, and visualization). We will also need standardized experimental and computational methods that will allow us to compare diverse cell and tissue types, as well as samples across human communities, in consistent ways, ensuring that the resulting resource is truly global. This document, the first version of the HCA White Paper, was written by experts in the field with feedback and suggestions from the HCA community, gathered during recent international meetings. The White Paper, released at the close of this yearlong planning process, will be a living document that evolves as the HCA community provides additional feedback, as technological and computational advances are made, and as lessons are learned during the construction of the atlas.

    Molecular mechanisms underlying the effects of dietary fiber in the large intestine

    Interactions between diet, microbiota and host response are important for intestinal health. Dietary fibers are known to promote intestinal health. Dietary fibers are edible, plant-derived food components, encompassing complex carbohydrates and lignin, that resist digestion in the small intestine; some of them are degraded and fermented by the gut microbiota in the large intestine, i.e. the cecum and colon. The beneficial health effects of dietary fiber are suggested to be mediated by short-chain fatty acids (SCFA), which are produced by gut microbial fermentation. The mechanisms underlying the interaction between dietary fiber, SCFA and the host, however, are not known in detail. The objective of the research described in this thesis was to investigate the molecular effects of dietary fiber and its fermentation products, SCFA, in the large intestine, and the mechanisms underlying them. First, the colonic transcriptional response to the main SCFA, acetate, propionate and butyrate, was investigated. SCFA were administered by rectal infusion to C57BL/6 mice fed a low-fat/high-carbohydrate diet (LFD) or a high-fat/low-carbohydrate diet (HFD), and whole-genome gene expression analysis was performed on colonic scrapings by microarray technology. The analysis revealed specific and overlapping genes regulated by acetate, propionate and butyrate. In addition, the gene response to SCFA depended on the diet, in particular for propionate. A set of propionate-regulated genes was activated on the LFD while suppressed on the HFD and vice versa, indicating that diet composition is an important factor in the colonic response to SCFA. Second, the molecular effects of different dietary fibers and a control diet on the large intestine were investigated. Five different dietary fibers (inulin, fructo-oligosaccharide, arabinoxylan, guar gum, resistant starch) and a control diet were fed to C57BL/6 mice for 10 days. 
The transcriptional response to the fermentable fibers was comparable in terms of gene expression, microbiota composition and luminal SCFA levels in the colon. Common to all fermented dietary fibers, the transcriptional regulator Pparg was identified as a potential upstream regulator of the mucosal gene expression response. Moreover, bacteria mainly belonging to Clostridium cluster XIVa were found to correlate with mucosal genes related to metabolic, energy-generating processes. Besides these common responses, analysis of the transcriptome revealed distinct responses to the different dietary fibers. With respect to the cecal metatranscriptome, we identified distinct activities of bacterial families in the fermentation of dietary fiber. Moreover, using multivariate statistical analysis, we found correlations of the mucosal transcriptome with both the microbiota composition and the metatranscriptome. In addition, we showed that SCFA, particularly butyrate and to a lesser extent propionate, transactivate PPARg and regulate the PPARg target gene Angptl4 in colonic cells. Third, we tested the hypothesis that epithelial Pparg plays an important role in the fermentation of dietary fibers in the gut. Mice with an intestine-specific knockout (KO) of Pparg (cre-villin) and wild-type (WT) mice were fed inulin for 10 days. Whole-genome gene expression analysis of the colon revealed that diet had a larger effect than genotype on colonic luminal microbiota composition, metabolome and mucosal transcriptome. We identified genes that were regulated by inulin in a Pparg-dependent manner. In addition, we identified genes regulated by butyrate in a Pparg-dependent manner in organoids grown from colonic crypt cells derived from KO or WT mice. In conclusion, we identified distinct mucosal gene expression responses to the main fermentation products of dietary fiber, SCFA, on both low-fat/high-carbohydrate and high-fat/low-carbohydrate diet backgrounds. 
Dietary fibers induce common and specific effects in the colon. Epithelial Pparg partially governs the response to the fermentation of dietary fiber in the colon. Besides the commonalities of dietary fibers for intestinal physiology, specific and differential effects were identified for microbial gene activity and composition as well as for the mucosal transcriptome response, indicating that omics tools are useful for elucidating and dissecting the effects of dietary fiber.
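The abstract reports correlations between bacterial groups (such as Clostridium cluster XIVa) and mucosal genes but does not name the statistic. As an illustration only, a rank-based (Spearman) correlation between one bacterial abundance profile and one gene's expression across mice could be computed as below; Spearman is our assumption, chosen because it is robust to the skewed distributions typical of abundance data, and all inputs are hypothetical.

```python
from typing import List


def rank(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))
```

With many genes and bacterial groups, the resulting correlations would in practice also need a multiple-testing correction.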

    From tools and databases to clinically relevant applications in miRNA research

    While early research in particular focused on the small portion of the human genome that encodes proteins, it became apparent that molecules responsible for many key functions are also encoded in the remaining regions. Non-coding RNAs, i.e., molecules that are not translated into proteins, were originally thought to comprise only two classes (ribosomal RNAs and transfer RNAs). However, starting in the early 1980s, many other non-coding RNA classes were discovered. In the past two decades, small non-coding RNAs (sncRNAs), and in particular microRNAs (miRNAs), have become essential molecules in biological and biomedical research. In this thesis, five aspects of miRNA research have been addressed. Starting from the development of advanced computational software to analyze miRNA data (1), an in-depth understanding of human and non-human miRNAs was generated and databases hosting this knowledge were created (2). In addition, the effects of technological advances were evaluated (3). We also contributed to the understanding of how miRNAs act in an orchestrated manner to target human genes (4). Finally, based on the insights gained from the tools and resources of the aforementioned aspects, we evaluated the suitability of miRNAs as biomarkers (5). With the establishment of next-generation sequencing, the primary goal of this thesis was the creation of an advanced bioinformatics analysis pipeline for high-throughput miRNA sequencing data, primarily focused on human. Consequently, miRMaster, a web-based software solution that analyzes hundreds of sequencing samples within a few hours, was implemented. The tool was designed to support different sequencing technologies and library preparation techniques. This flexibility allowed miRMaster to build a substantial user base, resulting in over 120,000 processed samples and 1.5 billion processed reads as of July 2021, and thereby laid the basis for the second goal of this thesis. 
Indeed, the implementation of a feature allowing users to share their uploaded data contributed strongly to the generation of a detailed annotation of the human small non-coding transcriptome. This annotation was integrated into a new miRNA database, miRCarta, modelling thousands of miRNA candidates and corresponding read expression profiles. A subset of these candidates was then evaluated in the context of different diseases and validated. The knowledge gained in this way was subsequently used to validate additional miRNA candidates and to estimate the number of miRNAs in humans. The large collection of samples gathered over many years with miRMaster was also integrated into miRSwitch, a web server evaluating miRNA arm shifts and switches. Finally, we published an updated version of miRMaster, expanding its scope to other species and adding further downstream analysis capabilities. The second goal of this thesis was further pursued by investigating the distribution of miRNAs across different human tissues and body fluids, as well as the variability of miRNA profiles over the four seasons of the year. Furthermore, small non-coding RNAs in zoo animals were examined, and a tissue atlas of small non-coding RNAs in mice was generated. The third goal, the assessment of technological advances, was addressed by evaluating the new combinatorial probe-anchor synthesis-based sequencing technology published by BGI, analyzing the effect of RNA integrity on sequencing data, analyzing low-input library preparation protocols, and comparing template-switch-based library preparation protocols to ligation-based ones. In addition, an antibody-based labeling sequencing chemistry, CoolMPS, was investigated. The fourth goal of this thesis, deriving an understanding of the orchestrated regulation by miRNAs, was pursued in a first step by implementing miRTargetLink, a web server visualizing miRNA-gene interaction networks. 
Subsequently, miRPathDB, a database incorporating pathways affected by miRNAs and their targets, was implemented, as well as miEAA 2.0, a web server offering quick miRNA set enrichment analyses in over 130,000 categories spanning 10 different species. In addition, miRSNPdb, a database evaluating the effects of single nucleotide polymorphisms and variants in miRNAs or in their target genes, was created. Finally, the fifth goal of the thesis, the evaluation of the suitability of miRNAs as biomarkers for human diseases, was tackled by investigating the expression profiles of miRNAs with machine learning. An Alzheimer's disease cohort with over 400 individuals was analyzed, as well as another neurodegenerative disease cohort with multiple time points of Parkinson's disease patients and healthy controls. Furthermore, a lung cancer cohort covering 3,000 individuals was examined to evaluate the suitability of an early detection test. In addition, we evaluated the expression profile changes induced by aging in a cohort of 1,334 healthy individuals and over 3,000 diseased patients. Altogether, the tools, databases and research papers described herein present valuable advances and insights in the miRNA research field and have been used and cited by the research community over 2,000 times as of July 2021.
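miEAA 2.0 is described as offering miRNA set enrichment analyses across many categories. One standard test for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below for a single category. This illustrates the test under that assumption and is not miEAA's implementation; the function and parameter names are hypothetical.

```python
from math import comb


def ora_pvalue(universe_size: int, category_size: int,
               test_set_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: all miRNAs considered (hypothetical background)
    category_size: miRNAs annotated to the category
    test_set_size: miRNAs in the user's input set
    overlap:       input miRNAs that fall into the category
    """
    upper = min(category_size, test_set_size)
    total = comb(universe_size, test_set_size)
    tail = 0
    for k in range(overlap, upper + 1):
        # Ways to draw k miRNAs from the category and the rest from outside it.
        tail += comb(category_size, k) * comb(universe_size - category_size,
                                              test_set_size - k)
    return tail / total  # exact integer sums, single division at the end
```

Because such a test is run for every category (over 130,000 in miEAA 2.0), the per-category p-values would then need a multiple-testing correction before any category is reported as enriched.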