
    Gene regulatory network modelling with evolutionary algorithms - an integrative approach

    Building models for gene regulation has been an important aim of Systems Biology over the past years, driven by the large amount of gene expression data that has become available. Models represent regulatory interactions between genes and transcription factors and can provide a better understanding of biological processes, as well as a means of simulating both natural and perturbed systems (e.g. those associated with disease). Quantitative modelling of gene regulatory networks (GRNs) is still limited, however, due to data issues such as noise and the restricted length of the time series typically used for GRN reverse engineering. These issues create an under-determination problem, with many models possibly fitting the data. However, large amounts of other types of biological data and knowledge are available, such as cross-platform measurements, knockout experiments, annotations, binding site affinities for transcription factors and so on. It has been postulated that integrating these can improve the quality of the models obtained, by facilitating further filtering of candidate models. However, integration is not straightforward, as the different types of data can provide contradictory information and are intrinsically noisy, hence large-scale integration has not been fully explored to date. Here, we present an integrative parallel framework for GRN modelling, which employs evolutionary computation and different types of data to enhance model inference. Integration is performed at different levels. (i) An analysis of cross-platform integration of time series microarray data is presented, discussing the effects on the resulting models and exploring cross-platform normalisation techniques. This shows that time-course data integration is possible, and results in models that are more robust to noise and parameter perturbation, with reduced noise over-fitting. (ii) Other types of measurements and knowledge, such as knockout experiments, annotated transcription factors, binding site affinities and promoter sequences, are integrated within the evolutionary framework to obtain more plausible GRN models. This is performed by customising the initialisation, mutation and evaluation of candidate model solutions. The different data types are investigated and both qualitative and quantitative improvements are obtained. Results suggest that caution is needed in order to obtain improved models from combined data, and the case study presented here provides an example of how this can be achieved. Furthermore, (iii) RNA-seq data is studied in comparison to microarray experiments, to identify overlapping features and possibilities for integration within the framework. The extension of the framework to this data type is straightforward, and qualitative improvements are obtained when combining predicted interactions from single-channel and RNA-seq datasets.
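
As a rough illustration of the evolutionary approach the abstract describes, the sketch below evolves a gene-gene weight matrix against a time series by mutation and truncation selection. The model class, operators and parameters here are illustrative assumptions, not the thesis's actual framework, which also customises initialisation and evaluation with prior knowledge.

```python
# Minimal sketch: evolutionary inference of a GRN weight matrix from a time
# series. All modelling choices below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def predict_step(W, x):
    # One Euler step of a simple assumed dynamics dx/dt = tanh(W @ x) - x
    return x + 0.1 * (np.tanh(W @ x) - x)

def fitness(W, series):
    # Mean squared one-step-ahead prediction error (lower is better)
    preds = np.array([predict_step(W, x) for x in series[:-1]])
    return float(np.mean((preds - series[1:]) ** 2))

def mutate(W, rate=0.1):
    # Perturb a random subset of interaction weights
    mask = rng.random(W.shape) < rate
    return W + mask * rng.normal(0.0, 0.2, W.shape)

def evolve(series, n_genes, pop_size=50, generations=200):
    population = [rng.normal(0.0, 0.5, (n_genes, n_genes)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda W: fitness(W, series))
        parents = population[: pop_size // 4]              # truncation selection
        children = [mutate(parents[rng.integers(len(parents))])
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=lambda W: fitness(W, series))

# Toy usage: recover a model from a short, noisy 5-gene time series
true_W = rng.normal(0.0, 0.5, (5, 5))
series = [rng.random(5)]
for _ in range(20):
    series.append(predict_step(true_W, series[-1]) + rng.normal(0.0, 0.01, 5))
best = evolve(np.array(series), n_genes=5)
```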

    Computational Integrative Models for Cellular Conversion: Application to Cellular Reprogramming and Disease Modeling

    The groundbreaking identification of only four transcription factors that are able to induce pluripotency in any somatic cell upon perturbation stimulated the discovery of numerous instructive factors triggering different cellular conversions. Such conversions are highly significant to regenerative medicine, with its ultimate goal of replacing or regenerating damaged and lost cells. Precise directed conversion of damaged cells into healthy cells offers the tantalizing prospect of promoting regeneration in situ. With the advent of high-throughput sequencing technologies, the distinct transcriptional and accessible chromatin landscapes of several cell types have been characterized. This characterization provided clear evidence for the existence of cell-type-specific gene regulatory networks, determined by their distinct epigenetic landscapes, that control cellular phenotypes. Further, these networks are known to change dynamically during the ectopic expression of genes initiating cellular conversions and to stabilize again to represent the desired phenotype. Over the years, several computational approaches have been developed to leverage the large amounts of high-throughput datasets for a systematic prediction of instructive factors that can potentially induce desired cellular conversions. To date, the most promising approaches reconstruct gene regulatory networks for a panel of well-studied cell types, relying predominantly on transcriptional data alone. Though useful, these methods are not designed for newly identified cell types, as their frameworks are restricted to the panel of cell types originally incorporated. More importantly, these approaches rely mainly on gene expression data and cannot account for the cell-type-specific regulation modulated by the interplay of the transcriptional and epigenetic landscapes. In this thesis, a computational method for reconstructing cell-type-specific gene regulatory networks is proposed that aims at addressing the aforementioned limitations of current approaches. This method integrates transcriptomics, chromatin accessibility assays and available prior knowledge about gene regulatory interactions for predicting instructive factors that can potentially induce desired cellular conversions. Its application to the prioritization of drugs for reverting pathologic phenotypes and to the identification of instructive factors for inducing the cellular conversion of adipocytes into osteoblasts underlines its potential to assist in the discovery of novel therapeutic interventions.
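
The integration step described above might, in a much-reduced form, look like the following score for candidate TF-to-gene edges. The evidence sources match those named in the abstract (expression, chromatin accessibility, prior knowledge), but the linear weighting scheme and threshold are invented for illustration and are not the thesis's actual method.

```python
# Hedged sketch: combine three evidence sources into one edge confidence.
import numpy as np

def edge_score(tf_expr, gene_expr, promoter_accessibility, in_prior,
               w_expr=0.5, w_acc=0.3, w_prior=0.2):
    """Score a candidate TF -> gene edge in [0, 1]; weights are assumptions."""
    expr_evidence = abs(np.corrcoef(tf_expr, gene_expr)[0, 1])  # co-expression
    acc_evidence = min(promoter_accessibility, 1.0)             # open chromatin
    prior_evidence = 1.0 if in_prior else 0.0                   # curated interaction
    return w_expr * expr_evidence + w_acc * acc_evidence + w_prior * prior_evidence

# Usage: keep edges whose combined evidence clears a threshold.
tf = np.array([1.0, 2.1, 3.2, 4.0])
gene = np.array([0.9, 2.0, 2.9, 4.2])
print(edge_score(tf, gene, promoter_accessibility=0.8, in_prior=True) > 0.6)
```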

    Computational Design and Experimental Validation of Functional Ribonucleic Acid Nanostructures

    In living cells, two major classes of ribonucleic acid (RNA) molecules can be found. The first class, called messenger RNA (mRNA), contains the genetic information that allows the ribosome to read and translate it into proteins. The second class, called non-coding RNA (ncRNA), does not code for proteins and is involved in key cellular processes, such as gene expression regulation, splicing, differentiation, and development. NcRNAs fold into an ensemble of thermodynamically stable secondary structures, which will eventually lead the molecule to fold into a specific 3D structure. It is widely known that ncRNAs carry out their functions via their 3D structures as well as their molecular composition. The secondary structure of ncRNAs is composed of different types of structural elements (motifs) such as stacked base pairs, internal loops, hairpin loops and pseudoknots. Pseudoknots are particularly difficult to model, are abundant in nature and are known to stabilize the functional form of the molecule. Due to the diverse range of functions of ncRNAs, their computational design and analysis have numerous applications in nanotechnology, therapeutics, synthetic biology, and materials engineering. The RNA design problem is to find novel RNA sequences that are predicted to fold into target structure(s) while satisfying specific qualitative characteristics and constraints. RNA design can be modeled as a combinatorial optimization problem (COP) and is known to be computationally challenging or, more precisely, NP-hard. Numerous algorithms to solve the RNA design problem have been developed over the past two decades; however, most ignore pseudoknots and therefore limit their application to only a slice of real-world modeling and design problems. Moreover, the few existing pseudoknot design methods, which were developed only recently, do not provide any evidence of the applicability of their proposed design methodology in biological contexts. The two objectives of this thesis address these two shortcomings. First, we are interested in developing an efficient computational method for the design of RNA secondary structures including pseudoknots that shows significantly improved in-silico quality characteristics compared to the state of the art. Second, we are interested in showing the real-world worthiness of the proposed method by validating it experimentally. More precisely, our aim is to design instances of certain types of RNA enzymes (i.e. ribozymes) and demonstrate that they are functionally active. This would likely only happen if their predicted folding matched their actual folding in the in-vitro experiments. In this thesis, we present four contributions. First, we propose a novel adaptive defect-weighted sampling algorithm to efficiently solve the RNA secondary structure design problem where pseudoknots are included. We compare the performance of our design algorithm with the state of the art and show that our method generates molecules that are thermodynamically more stable and less defective than those generated by state-of-the-art methods. Moreover, we show that when the effect of fitness evaluation is decoupled from the search and optimization process, our optimization method converges faster than the non-dominated sorting genetic algorithm (NSGA-II) and the ant colony optimization (ACO) algorithm do.
Second, we use our algorithmic development to implement an RNA design pipeline called Enzymer and make it available as an open-source package useful for wet-lab practitioners and RNA bioinformaticians. Enzymer uses multiple sequence alignment (MSA) data to generate initial design templates for further optimization. Our design pipeline can then be used to re-engineer naturally occurring RNA enzymes such as ribozymes and riboswitches. Our first and second contributions are published in the RNA section of the journal Frontiers in Genetics. Third, we use Enzymer to re-engineer three different pseudoknotted ribozymes: a hammerhead ribozyme from the mouse gut metagenome, a hammerhead ribozyme from Yarrowia lipolytica and a glmS ribozyme from Thermoanaerobacter tengcongensis. We designed a total of 18 ribozyme sequences and showed that 16 of them were active in-vitro. Our experimental results have been submitted to the RNA journal and strongly suggest that Enzymer is a reliable tool for designing pseudoknotted ncRNAs with a desired secondary structure. Finally, we propose a novel architecture for a new ribozyme-based gene regulatory network in which a hammerhead ribozyme modulates the expression of a reporter gene when the external stimulus IPTG is present. Our in-vivo experiments showed the expected behaviour in 7 out of 12 cases.
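
A minimal sketch of defect-weighted resampling in the spirit of the design algorithm described above: positions that disagree with the target structure are preferentially mutated. The fold callback stands in for a real pseudoknot-capable structure predictor, and the defect measure and update rule are simplified assumptions rather than Enzymer's actual procedure.

```python
# Sketch of defect-weighted sampling for RNA design; fold() must be supplied.
import random

BASES = "ACGU"

def per_position_defect(predicted, target):
    # 1 where the predicted dot-bracket symbol disagrees with the target
    return [int(p != t) for p, t in zip(predicted, target)]

def design(target, fold, max_iter=10_000):
    """fold(seq) -> dot-bracket string; plug in any pseudoknot-capable predictor."""
    seq = [random.choice(BASES) for _ in target]
    for _ in range(max_iter):
        defect = per_position_defect(fold("".join(seq)), target)
        if sum(defect) == 0:
            return "".join(seq)          # predicted to fold into the target
        # Resample a position chosen with probability proportional to its
        # local defect, concentrating search effort on misfolded regions.
        pos = random.choices(range(len(seq)), weights=defect)[0]
        seq[pos] = random.choice(BASES)
    return None                          # no design found within the budget
```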

    Large-Scale and Pan-Cancer Multi-omic Analyses with Machine Learning

    Multi-omic data analysis has been foundational in many fields of molecular biology, including cancer research. Investigation of the relationships between different omic data types reveals patterns that cannot otherwise be found in a single data type alone. With recent technological advancements in mass spectrometry (MS), MS-based proteomics has enabled the quantification of thousands of proteins in hundreds of cell lines and human tissue samples. This thesis presents several machine learning-based methods that facilitate the integrative analysis of multi-omic data. First, we reviewed five existing multi-omic data integration methods and performed a benchmarking analysis using a large-scale multi-omic cancer cell line dataset. We evaluated the performance of these machine learning methods for drug response prediction and cancer type classification. Our results provide recommendations to researchers regarding the optimal machine learning method selection for their applications. Second, we generated a pan-cancer proteomic map of 949 cancer cell lines across 40 cancer types and developed a machine learning method, DeeProM, to analyse the multi-omic information of these lines. This pan-cancer proteomic map (ProCan-DepMapSanger) is now publicly available and represents a major resource for the scientific community, for biomarker discovery and for the study of fundamental aspects of protein regulation. Third, we focused on publicly available multi-omic datasets of both cancer cell lines and human tissue samples and developed a Transformer-based deep learning method, DeePathNet, which integrates human knowledge with machine intelligence. We applied DeePathNet to three evaluation tasks, namely drug response prediction, cancer type classification and breast cancer subtype classification. Taken together, our analyses and methods enable more accurate cancer diagnosis and prognosis.
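
For orientation, here is a generic early-integration baseline of the kind such benchmarks compare against: omic feature matrices are concatenated and a cross-validated regressor predicts drug response. The data shapes are invented, and this is not DeeProM or DeePathNet.

```python
# Hedged baseline sketch: "early" multi-omic integration by concatenation,
# then cross-validated drug-response regression on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_cell_lines = 200
expression = rng.normal(size=(n_cell_lines, 300))   # transcriptomics features
proteomics = rng.normal(size=(n_cell_lines, 200))   # MS-based protein levels
drug_response = rng.normal(size=n_cell_lines)       # e.g. IC50 values

X = np.hstack([expression, proteomics])             # early integration
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, drug_response, cv=5, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.3f}")
```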

    The Eukaryotic Chromatin Computer: Components, Mode of Action, Properties, Tasks, Computational Power, and Disease Relevance

    Eukaryotic genomes are typically organized as chromatin, the complex of DNA and proteins that forms chromosomes within the cell's nucleus. Chromatin has pivotal roles in a multitude of functions, most of which are carried out by a complex system of covalent chemical modifications of histone proteins. The propagation of patterns of these histone post-translational modifications across cell divisions is particularly important for maintenance of the cell state in general and the transcriptional program in particular. The discovery of epigenetic inheritance phenomena - mitotically and/or meiotically heritable changes in gene function resulting from changes in a chromosome without alterations in the DNA sequence - was remarkable because it disproved the assumption that information is passed to daughter cells exclusively through DNA. However, DNA replication constitutes a dramatic disruption of the chromatin state that effectively amounts to partial erasure of stored information. To preserve its epigenetic state, the cell reconstructs (at least part of) the histone post-translational modifications by means of processes that are still very poorly understood. A plausible hypothesis is that the different combinations of reader and writer domains in histone-modifying enzymes implement local rewriting rules that are capable of "recomputing" the desired parental patterns of histone post-translational modifications on the basis of the partial information contained in the half of the nucleosomes that predates replication. It is becoming increasingly clear that both information processing and computation are omnipresent and of fundamental importance in many fields of the natural sciences, and in the cell in particular. The latter is exemplified by the increasingly popular research areas that focus on computing with DNA and membranes. Recent work suggests that during evolution, chromatin has been converted into a powerful cellular memory device capable of storing and processing large amounts of information. Eukaryotic chromatin may therefore also act as a cellular computational device capable of performing actual computations in a biological context. A recent theoretical study indeed demonstrated that even relatively simple models of chromatin computation are computationally universal and hence conceptually more powerful than gene regulatory networks. In the first part of this thesis, I establish a deeper understanding of the computational capacities and limits of chromatin, which have remained largely unexplored. I analyze selected biological building blocks of the chromatin computer and compare them to system components of general-purpose computers, particularly focusing on memory and on logical and arithmetical operations. I argue that it has a massively parallel architecture, a set of read-write rules that operate non-deterministically on chromatin, the capability of self-modification, and, more generally, striking analogies to amorphous computing. I therefore propose a cellular-automata-like 1-D string as its computational paradigm, on which sets of local rewriting rules are applied asynchronously with time-dependent probabilities. Its mode of operation is therefore conceptually similar to well-known concepts from complex systems theory. Furthermore, the chromatin computer provides volatile memory with a massive information content that can be exploited by the cell.
I estimate that its memory size lies in the realm of several hundred megabytes of writable information per cell, a value that I compare with DNA itself and with cis-regulatory modules. I furthermore show that it has the potential to perform computations not only in a biological context but also in a strict informatics sense. At least theoretically, it may therefore be used to calculate any computable function or, more generally, any algorithm. Chromatin is therefore another representative of the growing number of non-standard computing examples. As an example of a biological challenge that may be solved by the "chromatin computer", I formulate epigenetic inheritance as a computational problem and develop a flexible stochastic simulation system for the study of recomputation-based epigenetic inheritance of individual histone post-translational modifications. The implementation uses Gillespie's stochastic simulation algorithm for exactly simulating the time evolution of the chemical master equation of the underlying stochastic process. Furthermore, it is efficient enough to use an evolutionary algorithm to find a system of enzymes that can stably maintain a particular chromatin state across multiple cell divisions. I find that it is easy to evolve such a system of enzymes even without explicit boundary elements separating differentially modified chromatin domains. However, the success of this task depends on several previously unanticipated factors, such as the length of the initial state, the specific pattern that should be maintained, the time between replications, and various chemical parameters. All these factors also influence the accumulation of errors in the wake of cell divisions. Chromatin-regulatory processes and epigenetic (inheritance) mechanisms constitute an intricate and sensitive system, and any misregulation may contribute significantly to various diseases such as Alzheimer's disease. Intriguingly, the role of epigenetics and chromatin-based processes, as well as of non-coding RNAs, in the etiology of Alzheimer's disease is increasingly being recognized. In the second part of this thesis, I explicitly and systematically address the two hypotheses that (i) a dysregulated chromatin computer plays important roles in Alzheimer's disease and (ii) Alzheimer's disease may be considered an evolutionarily young disease. In summary, I found support for both hypotheses, although for hypothesis (i) it is very difficult to establish causality due to the complexity of the disease. However, I identify numerous chromatin-associated, differentially expressed loci for histone proteins, chromatin-modifying enzymes or integral parts thereof, non-coding RNAs with guiding functions for chromatin-modifying complexes, and proteins that directly or indirectly influence epigenetic stability (e.g., by altering cell cycle regulation and therefore potentially also the stability of epigenetic states). For the identification of differentially expressed loci in Alzheimer's disease, I use a custom expression microarray that was constructed with a novel bioinformatics pipeline. Despite the emergence of more advanced high-throughput methods such as RNA-seq, microarrays still offer some advantages and will remain a useful and accurate tool for transcriptome profiling and expression studies.
However, it is non-trivial to establish an appropriate probe design strategy for custom expression microarrays, because alternative splicing and transcription from non-coding regions are much more pervasive than previously appreciated. To obtain an accurate and complete expression atlas of genomic loci of interest in the post-ENCODE era, this additional transcriptional complexity must be considered during microarray design and requires well-considered probe design strategies that are often neglected. This encompasses, for example, adequate preparation of the set of target sequences and accurate estimation of probe specificity. With the help of this pipeline, two custom-tailored microarrays have been constructed that include a comprehensive collection of non-coding RNAs. Additionally, a user-friendly web server has been set up that makes the developed pipeline publicly available to other researchers.
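
The recomputation-based inheritance simulation mentioned above rests on Gillespie's stochastic simulation algorithm; the toy version below applies it to a 1-D string of nucleosomes with two assumed event types, neighbour-templated writing and spontaneous erasure. Rates and rules are illustrative, not the evolved enzyme systems of the thesis.

```python
# Gillespie SSA sketch on a 1-D chromatin string; all rates are assumptions.
import random

def gillespie_chromatin(n=100, k_write=1.0, k_erase=0.1, t_end=50.0):
    state = [0] * n                      # 0 = unmodified, 1 = carries the mark
    state[n // 2] = 1                    # seed a single modified nucleosome
    t = 0.0
    while t < t_end:
        # Enumerate events and propensities: writing where an unmodified site
        # has a modified neighbour; erasure at any modified site.
        events = []
        for i in range(n):
            if state[i] == 0 and any(state[j] for j in (i - 1, i + 1) if 0 <= j < n):
                events.append((k_write, i, 1))
            elif state[i] == 1:
                events.append((k_erase, i, 0))
        total = sum(rate for rate, _, _ in events)
        if total == 0:
            break                        # no possible event: the mark died out
        t += random.expovariate(total)   # exponentially distributed waiting time
        r = random.uniform(0, total)     # choose one event proportionally to rate
        for rate, i, new_state in events:
            r -= rate
            if r <= 0:
                state[i] = new_state
                break
    return state

print(sum(gillespie_chromatin()), "of 100 nucleosomes marked at t_end")
```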

    Current Challenges in Modeling Cellular Metabolism

    Mathematical and computational models play an essential role in understanding cellular metabolism. They are used as platforms to integrate current knowledge on a biological system and to systematically test and predict the effects of manipulations of such systems. Recent advances in genome sequencing techniques have facilitated the reconstruction of genome-scale metabolic networks for a wide variety of organisms, from microbes to human cells. These models have been used successfully in multiple biotechnological applications. Despite these advancements, modeling cellular metabolism still presents many challenges. The aim of this Research Topic is not only to expose and consolidate the state of the art in metabolic modeling approaches, but also to push this frontier beyond the current edge through the introduction of innovative solutions. The articles presented in this e-book address some of the main challenges in the field, including the integration of different modeling formalisms, the integration of heterogeneous data sources into metabolic models, the explicit representation of other biological processes during phenotype simulation, and standardization efforts in the representation of metabolic models and simulation results.
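
As a concrete instance of the genome-scale modelling formalism discussed above, the sketch below runs flux balance analysis on a three-reaction toy network: maximise a biomass flux subject to steady state (S v = 0) and flux bounds. The network is invented for illustration.

```python
# Minimal flux balance analysis (FBA) sketch on an invented toy network.
import numpy as np
from scipy.optimize import linprog

# Reactions: R1 uptake -> A, R2 A -> B, R3 B -> biomass
# Metabolite rows: A, B; stoichiometric matrix S
S = np.array([
    [1, -1,  0],   # A: produced by R1, consumed by R2
    [0,  1, -1],   # B: produced by R2, consumed by R3
])
bounds = [(0, 10), (0, 10), (0, 10)]     # flux bounds per reaction
c = np.array([0, 0, -1])                 # linprog minimises, so negate biomass

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
print("optimal biomass flux:", -res.fun, "fluxes:", res.x)
```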

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
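
Here is a simplified, locally runnable sketch of the MapReduce streaming pattern for Conway's life: the mapper emits a neighbour-count contribution from every live cell plus a liveness marker, and the reducer sums contributions and applies the birth/survival rules. The paper's optimised strip-partitioned streaming algorithms are more elaborate than this.

```python
# MR-streaming-style Game of Life, one generation, run locally for clarity.
from collections import defaultdict

def mapper(lines):
    """Input: 'x<TAB>y' per live cell. Emit (cell, tag) pairs."""
    for line in lines:
        x, y = map(int, line.split())
        yield (x, y), "LIVE"                       # mark the cell itself alive
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    yield (x + dx, y + dy), 1      # neighbour contribution

def reducer(pairs):
    """Sum contributions per cell and apply Conway's birth/survival rules."""
    counts, alive = defaultdict(int), set()
    for cell, val in pairs:
        if val == "LIVE":
            alive.add(cell)
        else:
            counts[cell] += val
    for cell, n in counts.items():
        if n == 3 or (n == 2 and cell in alive):
            yield cell                             # live in the next generation

# One generation over a blinker (no cluster needed for this sketch):
blinker = [(0, 1), (1, 1), (2, 1)]
pairs = list(mapper(f"{x}\t{y}" for x, y in blinker))
print(sorted(reducer(pairs)))                      # -> [(1, 0), (1, 1), (1, 2)]
```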