227 research outputs found

    In silico modelling of RNA-RNA dimer and its application for rational siRNA design and ncRNA target search

    Non-protein-coding regions, which constitute 98.5% of the human genome, were long dismissed as evolutionary relics. Only recently has the biological relevance of the non-coding RNAs (ncRNAs) associated with these regions been recognized. The development of experimental and bioinformatic methods for detecting ncRNAs has led to the discovery of more than 29,000,000 sequences, grouped into more than 1300 families. More often than not, these ncRNAs function by binding to other RNAs, either protein-coding or non-protein-coding. Compared to the number of tools for detecting and classifying ncRNAs, the number of tools for finding putative RNA binding partners is negligible. As a result, the function of the majority of annotated ncRNA genes is currently completely unknown. The aim of this work is to assess the function of different families of ncRNAs by developing new algorithms and methods to study RNA-RNA interactions. These new methods are extensions of RNA-folding algorithms applied to the problem of RNA-RNA interactions. Depending on the class of ncRNA studied, different methods were developed and tested. This work shows that extending RNA-folding algorithms to RNA-RNA interactions is a promising way to functionally annotate ncRNAs. Still, other factors such as RNA-protein interactions, RNA concentration, and RNA expression play an important role in RNA hybridization and will have to be taken into account in future work in order to achieve reliable prediction of RNA binding partners.

    Variations on RNA folding and alignment: lessons from Benasque

    Dynamic programming algorithms solve many standard problems of RNA bioinformatics in polynomial time. In this contribution we discuss a series of variations on these standard methods that implement refined biophysical models, such as a restriction of RNA folding to canonical structures and an extension of structural alignments to an explicit scoring of stacking propensities. Furthermore, we demonstrate that a local structural alignment can be employed for ncRNA gene finding. In this context we discuss scanning variants of folding and alignment algorithms.
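
As an illustration of the kind of polynomial-time dynamic program the abstract refers to, the following is a minimal sketch of the classic Nussinov base-pair maximization recursion. It is a textbook example, not one of the variants discussed in the contribution; the function name `max_base_pairs` and the minimum hairpin loop size of 3 are assumptions made for illustration.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov-style DP: maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumed, commonly used value; not taken from the abstract).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] holds the maximum pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

The three nested loops give cubic running time in the sequence length, which is what makes exhaustive enumeration of structures unnecessary.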

    Emerging Topics in Genome Sequencing and Analysis

    This dissertation studies emerging topics in genome sequencing and in the analysis of DNA and RNA. It discusses optimal hybrid sequencing and assembly for accurate genome reconstruction, and efficient approaches for detecting novel ncRNAs in genomes. Next-generation sequencing provides the whole-genome information on which further biological research builds. Recent advances in high-throughput genome sequencing technologies have enabled the systematic study of various genomes by making whole genome sequencing affordable. To date, many hybrid genome assembly algorithms have been developed that can take reads from multiple read sources to reconstruct the original genome. An important aspect of hybrid sequencing and assembly is that the feasibility conditions for genome reconstruction can be satisfied by different combinations of the available read sources, opening up the possibility of optimally combining the sources to minimize the sequencing cost while ensuring accurate genome reconstruction. In this study, we derive the conditions for whole genome reconstruction from multiple read sources at a given confidence level and also introduce the optimal strategy for combining reads from different sources to minimize the overall sequencing cost. We show that the optimal read set, which simultaneously satisfies the feasibility conditions for genome reconstruction and minimizes the sequencing cost, can be effectively predicted through constrained discrete optimization. The availability of genome-wide sequences for a variety of species provides a large database for further computational RNA analysis. Recent studies have shown that noncoding RNAs (ncRNAs) play crucial roles in various biological processes, and some ncRNAs are related to genome stability and a variety of inherited diseases.
The discovery of novel ncRNAs is hence an important topic, and there is a pressing need for accurate computational approaches that can efficiently detect novel ncRNAs in genomes. One important issue is RNA structure alignment for comparative genome analysis, as RNA secondary structures are better conserved than RNA sequences. Simultaneous RNA alignment and folding algorithms aim to accurately align RNAs by predicting the consensus structure and alignment at the same time, but the computational complexity of the optimal dynamic programming algorithm for simultaneous alignment and folding is extremely high. In this work, we propose an innovative method, TOPAS, for RNA structural alignment that can efficiently align RNAs through topological networks. Although many ncRNAs are known to have a well-conserved secondary structure, which provides useful clues for computational prediction, the prediction of ncRNAs is still challenging, since it has been shown that a structure-based approach alone may not be sufficient for detecting ncRNAs in a single sequence. In this study, we first develop a new approach that uses the n-gram model to classify sequences and extract effective features that capture sequence homology. Based on this approach, we propose an advanced method, piRNAdetect, for reliable computational prediction of piRNAs in genome sequences. Utilizing the n-gram model can enhance the detection of ncRNAs that have sparse folding structures with many unpaired bases. By combining the n-gram model with the generalized ensemble defect, which assesses structure conservation and conformation to the consensus structure, we further propose RNAdetect, a novel computational method for accurate detection of ncRNAs through comparative genome analysis.
Extensive performance evaluation based on the Rfam database and bacterial genomes demonstrates that our approaches can accurately and reliably detect novel ncRNAs, outperforming existing advanced methods.
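
The n-gram idea mentioned in the abstract can be sketched as follows: a sequence is summarized by the relative frequencies of its overlapping length-n substrings, which can then serve as features for a classifier. This is only a generic illustration; the function name `ngram_profile` and the default n = 3 are assumptions, and the actual features used by piRNAdetect and RNAdetect are not specified in the abstract.

```python
from collections import Counter


def ngram_profile(seq: str, n: int = 3) -> dict[str, float]:
    """Relative frequency of each overlapping n-gram in a nucleotide sequence.

    Hypothetical helper for illustration; n = 3 is an arbitrary choice.
    """
    seq = seq.upper()
    # Slide a window of width n over the sequence and count each substring.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than n: no n-grams to report
        return {}
    return {gram: count / total for gram, count in counts.items()}
```

Because the profile depends only on local sequence composition, it remains informative for RNAs with few base pairs, which is the situation the abstract highlights.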

    Efficient Algorithms for Probing the RNA Mutation Landscape

    The diversity and importance of the role played by RNAs in the regulation and development of the cell are now well-known and well-documented. This broad range of functions is achieved through specific structures that have been (presumably) optimized through evolution. State-of-the-art methods, such as McCaskill's algorithm, use a statistical mechanics framework based on the computation of the partition function over the canonical ensemble of all possible secondary structures on a given sequence. Although secondary structure predictions from thermodynamics-based algorithms are not as accurate as methods employing comparative genomics, the former methods are the only available tools to investigate novel RNAs, such as the many RNAs of unknown function recently reported by the ENCODE consortium. In this paper, we generalize the McCaskill partition function algorithm to sum over the grand canonical ensemble of all secondary structures of all mutants of the given sequence. Specifically, our new program, RNAmutants, simultaneously computes for each integer k the minimum free energy structure MFE(k) and the partition function Z(k) over all secondary structures of all k-point mutants, even allowing the user to specify certain positions required not to mutate and certain positions required to base-pair or remain unpaired. This technically important extension allows us to study the resilience of an RNA molecule to pointwise mutations. By computing the mutation profile of a sequence, a novel graphical representation of the mutational tendency of nucleotide positions, we analyze the deleterious nature of mutating specific nucleotide positions or groups of positions. We have successfully applied RNAmutants to investigate deleterious mutations (mutations that radically modify the secondary structure) in the Hepatitis C virus cis-acting replication element and to evaluate the evolutionary pressure applied on different regions of the HIV trans-activation response element. 
In particular, we show qualitative agreement between published Hepatitis C and HIV experimental mutagenesis studies and our analysis of deleterious mutations using RNAmutants. Our work also predicts other deleterious mutations, which could be verified experimentally. Finally, we provide evidence that the 3′ UTR of the GB RNA virus C has been optimized to preserve evolutionarily conserved stem regions from a deleterious effect of pointwise mutations. We hope that there will be long-term potential applications of RNAmutants in de novo RNA design and drug design against RNA viruses. This work also suggests potential applications for large-scale exploration of the RNA sequence-structure network. Binary distributions are available at http://RNAmutants.csail.mit.edu/
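
To see why summing over all k-point mutants calls for a dynamic program rather than brute force, it helps to count them: a sequence of length n has C(n, k) · 3^k mutants at Hamming distance exactly k. The sketch below enumerates them naively and checks the count. It is not RNAmutants code; the names `k_point_mutants` and `count_k_point_mutants` are hypothetical.

```python
from itertools import combinations, product
from math import comb

ALPHABET = "ACGU"


def k_point_mutants(seq: str, k: int):
    """Yield every sequence that differs from seq at exactly k positions."""
    for positions in combinations(range(len(seq)), k):
        # At each chosen position, any of the three other nucleotides is allowed.
        choices = [[b for b in ALPHABET if b != seq[p]] for p in positions]
        for substitution in product(*choices):
            mutant = list(seq)
            for p, base in zip(positions, substitution):
                mutant[p] = base
            yield "".join(mutant)


def count_k_point_mutants(n: int, k: int) -> int:
    """Closed-form number of k-point mutants of a length-n RNA sequence."""
    return comb(n, k) * 3 ** k
```

For a 100-nucleotide sequence and k = 5 the closed form already exceeds ten billion sequences, each with its own exponentially large structure ensemble.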

    Region based gene expression via reanalysis of publicly available microarray data sets.

    A DNA microarray is a high-throughput technology used to identify relative gene expression. One of the most widely used platforms is the Affymetrix® GeneChip® technology which detects gene expression levels based on probe sets composed of a set of twenty-five nucleotide probes designed to hybridize with specific gene targets. Given a particular Affymetrix® GeneChip® platform, the design of the probes is fixed. However, the method of analysis is dynamic in nature due to the ability to annotate and group probes into uniquely defined groupings. This is particularly important since publicly available repositories of microarray datasets, such as ArrayExpress and NCBI’s Gene Expression Omnibus (GEO) have made millions of samples readily available to be reanalyzed computationally without the need for new biological experiments. One way in which the analysis can dynamically change is by correcting the mapping between probe sets and targets by creating custom Chip Description Files (CDFs) to arrange which probes belong to which probe set based on the latest genomic information or specific annotations of interest. Since default probe sets in Affymetrix® GeneChip® platforms are specific for a gene, transcript or exon, the analyses are then limited to profile differential expression at the gene, transcript or individual exon level. However, it has been revealed that untranslated regions (UTRs) of mRNA have important impacts on the regulation of proteins. We therefore developed a new probe mapping protocol that addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome information and grouping the probes into region (UTR, individual exon), gene and transcript level targets of interest to support a better understanding of the effect of UTRs and individual exons on gene expression levels. Furthermore, we developed an R package, affyCustomCdf, for users to dynamically create custom CDFs. 
The affyCustomCdf tool takes annotations in a General/Gene Transfer Format (GTF) file, aligns probes to gene annotations via Nested Containment List (NCList) indexing, and generates a custom Chip Description File (CDF) that regroups probes into probe sets at the region (UTR and individual exon), transcript, or gene level. Our results indicate that removing probes that no longer align to the genome without mismatches, or that align to multiple locations, can help to reduce false-positive differential expression, as can removal of probes in regions overlapping multiple genes. Moreover, our region-based method can detect changes that would have been missed by gene-level and transcript-level analysis. It also allows for a better understanding of 3’ UTR dynamics through the reanalysis of publicly available data.
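
The probe filtering and regrouping rules described above can be sketched in a few lines. This is a simplified illustration, not the affyCustomCdf implementation (which is an R package): it uses a linear scan instead of NCList indexing, and the function name `build_probe_sets` and the tuple layouts are assumptions.

```python
from collections import defaultdict


def build_probe_sets(alignments, regions):
    """Group uniquely and unambiguously mapped probes by annotated region.

    alignments: probe_id -> list of (chrom, start, end) perfect-match hits
    regions:    list of (region_id, gene_id, chrom, start, end)
    Both layouts are hypothetical and chosen for illustration only.
    """
    probe_sets = defaultdict(list)
    for probe_id, hits in alignments.items():
        if len(hits) != 1:
            continue  # drop probes with no perfect match or several locations
        chrom, start, end = hits[0]
        # Linear scan for regions that fully contain the probe.
        containing = [r for r in regions
                      if r[2] == chrom and r[3] <= start and end <= r[4]]
        if len({r[1] for r in containing}) != 1:
            continue  # drop unannotated probes and probes shared by several genes
        for region_id, *_ in containing:
            probe_sets[region_id].append(probe_id)
    return dict(probe_sets)
```

A real implementation would replace the linear scan with an interval index, since a chip carries hundreds of thousands of probes.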

    Expanding the SnoRNA Interaction Network: Conservation of Guiding Function in Vertebrates

    Small nucleolar RNAs (snoRNAs) are one of the most abundant and evolutionarily ancient groups of small non-coding RNAs. Their main function is to target chemical modifications of ribosomal RNAs (rRNAs) and small nuclear RNAs (snRNAs). They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of modification that they govern. The box H/ACA snoRNAs are responsible for targeting pseudouridylation sites and the box C/D snoRNAs for directing 2’-O-methylation of ribonucleotides. A subclass that localizes to the Cajal bodies, termed scaRNAs, is responsible for methylation and pseudouridylation of snRNAs. In addition, an amazing diversity of non-canonical functions of individual snoRNAs has arisen. The modification patterns in rRNAs and snRNAs are retained during evolution, making it possible even to project them from yeast onto human. The stringent conservation of modification sites and the slow evolution of rRNAs and snRNAs contrast with the rapid evolution of snoRNA sequences. Recent studies that incorporate high-throughput sequencing experiments still identify previously undetected snoRNAs, even in well-studied organisms such as human. The snoRNAbase, which has been the standard database for human snoRNAs, has not been updated since 2006 and lacks these new data. Together with the lack of a centralized cross-species data collection that also incorporates snoRNA class-specific characteristics, this created the need to integrate distributed data from the literature and databases into a comprehensive snoRNA set. Although several snoRNA studies included pro forma target predictions in individual species, and more and more studies focus on non-canonical functions of subclasses, a systematic survey of the guiding function, and especially of functional homologies of snoRNAs, was not available.
To establish a sound set of snoRNAs, a computational snoRNA annotation pipeline named snoStrip, which identifies homologous snoRNAs in related species, was employed. For large-scale investigation of snoRNA function, state-of-the-art target predictions were performed with our software RNAsnoop and PLEXY. Further, a new measure, the Interaction Conservation Index (ICI), was developed to evaluate the conservation of snoRNA function. The snoStrip pipeline was applied to vertebrate species for which the genome sequence was available. In addition, it was used in several ncRNA annotation studies (48 avian genomes, spotted gar) of newly assembled genomes to contribute the snoRNA genes. Detailed target analysis of the new vertebrate snoRNA set revealed that, in general, the functions of homologous snoRNAs are evolutionarily stable; thus, members of the same snoRNA family guide equivalent modifications. The conservation of snoRNA sequences is high at target binding regions, while the remaining sequence varies significantly. In addition to elucidating principles of correlated evolution, it was possible, with the help of the ICI measure, to assign functions to previously orphan snoRNAs and to associate snoRNAs as partners with known but so far unexplained chemical modifications. As a further pattern, redundant guiding became apparent. For many modification sites, more than one snoRNA encodes the appropriate antisense element (ASE), which could ensure constant modification through snoRNAs that have different expression patterns. Furthermore, predictions of snoRNA functions in conjunction with sequence conservation could identify distant homologies. Due to the high overall entropy of snoRNA sequences, such relationships are hard to detect by means of sequence homology search methods alone. The snoRNA interaction network was further expanded through novel snoRNAs that were detected in data from high-throughput experiments in human and mouse.
Through subsequent target analysis, the new snoRNAs could immediately explain known modifications that had no appropriate snoRNA guide assigned before. In a further study, a full catalog of expressed snoRNAs in human was provided. Besides canonical snoRNAs, recent findings such as AluACAs, sno-lncRNAs, and extraordinarily short SNORD-like transcripts were also taken into account. Again, the target analysis workflow identified previously undetected connections between snoRNA guides and modifications. In particular, some species- or clade-specific interactions of SNORD-like genes emerged that seem to act as bona fide snoRNA guides for rRNA and snRNA modifications. For all high-confidence new snoRNA genes identified during this work, official gene names were requested from the HUGO Gene Nomenclature Committee (HGNC) to avoid further naming confusion.

    From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules

    Since the discovery of the structure of DNA in 1953 and its double-chained complement of information hinting at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures allow a diverse array of vital functions. Beyond the well-known splicing and modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA transcription and protein translation, as well as in RNA signalling and gene regulation processes. Despite the rapid development and declining cost of modern molecular methods, they typically can only describe an ncRNA's structural conformations in vitro, which differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at a high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies have revealed that ncRNA functions are performed by high-level structures that often depend on their low-level structures, such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary structure level. We describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure.
These tools are as follows: (1) RAFFT, for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT); (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknots (a secondary structure motif that often poses difficulties for computational detection). The first tool, RAFFT, implements a novel heuristic to predict RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When considering the best prediction in the ensemble of 50 secondary structures predicted by RAFFT, its performance matches that of recent deep-learning-based structure prediction methods. RAFFT also acts as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, fewer structures were required to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures or more. The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by the Lévy flight, which allows both local and global search and supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution. As a result, our proposed method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. Intensive benchmarks on both pseudoknotted and pseudoknot-free datasets showed improved empirical results compared to existing tools. In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications for studying evolutionary dynamics.

    Insights into the function of short interspersed degenerated retroposons in the protozoan parasite Leishmania

    Thesis digitized by the Division de la gestion de documents et des archives (Records Management and Archives Division) of the Université de Montréal

    From tools and databases to clinically relevant applications in miRNA research

    While early research focused especially on the small portion of the human genome that encodes proteins, it became apparent that molecules responsible for many key functions are also encoded in the remaining regions. Originally, non-coding RNAs, i.e., molecules that are not translated into proteins, were thought to be composed of only two classes (ribosomal RNAs and transfer RNAs). However, starting from the early 1980s, many other non-coding RNA classes were discovered. In the past two decades, small non-coding RNAs (sncRNAs), and in particular microRNAs (miRNAs), have become essential molecules in biological and biomedical research. In this thesis, five aspects of miRNA research have been addressed. Starting from the development of advanced computational software to analyze miRNA data (1), an in-depth understanding of human and non-human miRNAs was generated and databases hosting this knowledge were created (2). In addition, the effects of technological advances were evaluated (3). We also contributed to the understanding of how miRNAs act in an orchestrated manner to target human genes (4). Finally, based on the insights gained from the tools and resources of the mentioned aspects, we evaluated the suitability of miRNAs as biomarkers (5). With the establishment of next-generation sequencing, the primary goal of this thesis was the creation of an advanced bioinformatics analysis pipeline for high-throughput miRNA sequencing data, primarily focused on human. Consequently, miRMaster, a web-based software solution for analyzing hundreds of sequencing samples within a few hours, was implemented. The tool was implemented so that it could support different sequencing technologies and library preparation techniques. This flexibility allowed miRMaster to build a substantial user base, resulting in over 120,000 processed samples and 1.5 billion processed reads as of July 2021, and therefore laid the basis for the second goal of this thesis.
Indeed, the implementation of a feature allowing users to share their uploaded data contributed strongly to the generation of a detailed annotation of the human small non-coding transcriptome. This annotation was integrated into a new miRNA database, miRCarta, modelling thousands of miRNA candidates and corresponding read expression profiles. A subset of these candidates was then evaluated in the context of different diseases and validated. The knowledge thus gained was subsequently used to validate additional miRNA candidates and to generate an estimate of the number of miRNAs in human. The large collection of samples gathered over many years with miRMaster was also integrated into miRSwitch, a web server evaluating miRNA arm shifts and switches. Finally, we published an updated version of miRMaster, expanding its scope to other species and adding further downstream analysis capabilities. The second goal of this thesis was further pursued by investigating the distribution of miRNAs across different human tissues and body fluids, as well as the variability of miRNA profiles over the four seasons of the year. Furthermore, small non-coding RNAs in zoo animals were examined and a tissue atlas of small non-coding RNAs for mice was generated. The third goal, the assessment of technological advances, was addressed by evaluating the new combinatorial probe-anchor synthesis-based sequencing technology published by BGI, analyzing the effect of RNA integrity on sequencing data, analyzing low-input library preparation protocols, and comparing template-switch-based library preparation protocols to ligation-based ones. In addition, an antibody-based labeling sequencing chemistry, CoolMPS, was investigated. Deriving an understanding of the orchestrated regulation by miRNAs, the fourth goal of this thesis, was pursued in a first step by the implementation of miRTargetLink, a web server visualizing miRNA-gene interaction networks.
Subsequently, miRPathDB, a database incorporating pathways affected by miRNAs and their targets, was implemented, as well as miEAA 2.0, a web server offering quick miRNA set enrichment analyses in over 130,000 categories spanning 10 different species. In addition, miRSNPdb, a database evaluating the effects of single nucleotide polymorphisms and variants in miRNAs or in their target genes, was created. Finally, the fifth goal of the thesis, the evaluation of the suitability of miRNAs as biomarkers for human diseases, was tackled by investigating the expression profiles of miRNAs with machine learning. An Alzheimer's disease cohort with over 400 individuals was analyzed, as well as another neurodegenerative disease cohort with multiple time points of Parkinson's disease patients and healthy controls. Furthermore, a lung cancer cohort covering 3,000 individuals was examined to evaluate the suitability of an early detection test. In addition, we evaluated the expression profile changes induced by aging in a cohort of 1,334 healthy individuals and over 3,000 diseased patients. Altogether, the tools, databases, and research papers described herein present valuable advances and insights into the miRNA research field and have been used and cited by the research community over 2,000 times as of July 2021.