13 research outputs found

    JCell: a Java framework for inferring genetic networks

    JCell is a framework for reconstructing and simulating genetic networks in the field of molecular biology. It is implemented entirely in Java. The main goal of JCell is to gain deep insights into molecular processes within a cell or tissue under various conditions, such as drug concentrations or pathogenic mutations. This question has recently become a major area of research in bioinformatics, because understanding regulatory dependencies enables new therapies for diseases such as cancer or Alzheimer's disease. To address this inference problem, several mathematical models and algorithms have been developed and implemented that try to infer genetic relationships from genomic experimental data. The program has a modular structure, which enables users to apply the framework in other research areas as well, such as metabolic pathway reconstruction, signalling cascade analysis or general biochemical processes. Furthermore, JCell can also be used in other contexts to identify dynamic systems from time-series data, for example in financial applications or engineering problems. Usability was always the primary focus during development, so that even users without a strong computer science background are able to use the program. Another focus was the ability of JCell to natively import as many file formats as possible, in order to be compatible with the most commonly used analysis tools. Owing to the use of the Java programming language, the framework is platform independent and thus runs on most hardware/software systems. This is especially important for research facilities where no expensive hardware can be purchased and where no restrictions on the operating system can be imposed. Furthermore, the framework is open to public development and new modules can easily be implemented.

    SBMLsqueezer: A CellDesigner plug-in to generate kinetic rate equations for biochemical networks

    Background: The development of complex biochemical models has been facilitated through the standardization of machine-readable representations like SBML (Systems Biology Markup Language). This effort is accompanied by the ongoing development of the human-readable diagrammatic representation SBGN (Systems Biology Graphical Notation). The graphical SBML editor CellDesigner allows direct translation of SBGN into SBML, and vice versa. For the assignment of kinetic rate laws, however, this process is not straightforward, as it often requires manual assembly and specific knowledge of kinetic equations. Results: SBMLsqueezer facilitates exactly this modeling step via automated equation generation, overcoming the highly error-prone and cumbersome process of manually assigning kinetic equations. For each reaction the kinetic equation is derived from the stoichiometry, the participating species (e.g., proteins, mRNA or simple molecules) as well as the regulatory relations (activation, inhibition or other modulations) of the SBGN diagram. Such information allows distinctions between, for example, translation, phosphorylation or state transitions. The types of kinetics considered are numerous, for instance generalized mass-action, Hill, convenience and several Michaelis-Menten-based kinetics, each including activation and inhibition. These kinetics allow SBMLsqueezer to cover metabolic, gene regulatory, signal transduction and mixed networks. Whenever multiple kinetics are applicable to one reaction, parameter settings allow for user-defined specifications. After invoking SBMLsqueezer, the kinetic formulas are generated and assigned to the model, which can then be simulated in CellDesigner or with external ODE solvers. Furthermore, the equations can be exported to SBML, LaTeX or plain text format. Conclusion: SBMLsqueezer considers the annotation of all participating reactants, products and regulators when generating rate laws for reactions. Thus, for each reaction, only applicable kinetic formulas are considered. This modeling scheme creates kinetics in accordance with the diagrammatic representation. In contrast, most previously published tools have relied on the stoichiometry and generic modulators of a reaction, thus ignoring and potentially conflicting with the information expressed through the process diagram. Additional material and the source code can be found at the project homepage (URL found in the Availability and requirements section).
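
    As a rough illustration of this kind of automated assignment, the sketch below maps a reaction's reactants and modifiers to a rate-law string. The rules, names and parameters are simplifying assumptions for illustration, not SBMLsqueezer's actual code or API.

        # Minimal sketch: derive a rate-law string from a reaction description.
        # Chooses between Michaelis-Menten and generalized mass-action kinetics
        # based on reaction structure; inhibitors add a non-competitive factor.
        def rate_law(reactants, modifiers, reversible=False):
            """reactants: dict species -> stoichiometry, modifiers: list of inhibitor species."""
            if len(reactants) == 1 and not reversible:
                s = next(iter(reactants))                      # single substrate -> irreversible MM
                law = f"Vmax*{s}/(Km + {s})"
            else:
                terms = "*".join(f"{sp}^{n}" if n > 1 else sp for sp, n in reactants.items())
                law = f"k*{terms}"                             # generalized mass action
            for inh in modifiers:
                law = f"({law})/(1 + {inh}/Ki_{inh})"          # simple inhibition term
            return law

        print(rate_law({"S": 1}, ["I"]))        # (Vmax*S/(Km + S))/(1 + I/Ki_I)
        print(rate_law({"A": 2, "B": 1}, []))   # k*A^2*B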

    EDISA: extracting biclusters from multiple time-series of gene expression profiles

    Background: Cells dynamically adapt their gene expression patterns in response to various stimuli. This response is orchestrated into a number of gene expression modules consisting of co-regulated genes. A growing pool of publicly available microarray datasets allows the identification of modules by monitoring expression changes over time. These time-series datasets can be searched for gene expression modules by one of the many clustering methods published to date. For an integrative analysis, several time-series datasets can be joined into a three-dimensional gene-condition-time dataset, to which standard clustering or biclustering methods are, however, not applicable. We thus devise a probabilistic clustering algorithm for gene-condition-time datasets. Results: In this work, we present the EDISA (Extended Dimension Iterative Signature Algorithm), a novel probabilistic clustering approach for 3D gene-condition-time datasets. Based on mathematical definitions of gene expression modules, the EDISA samples initial modules from the dataset, which are then refined by removing genes and conditions until they comply with the module definition. A subsequent extension step ensures gene and condition maximality. We applied the algorithm to a synthetic dataset and were able to successfully recover the implanted modules over a range of background noise intensities. Analysis of microarray datasets has led us to define three biologically relevant module types: 1) We found modules with independent response profiles to be the most prevalent ones. These modules comprise genes which are co-regulated under several conditions, yet with a different response pattern under each condition. 2) Coherent modules with similar responses under all conditions occurred frequently, too, and were often contained within these modules. 3) A third module type, which covers a response specific to a single condition, was also detected, but rarely. All of these modules are essentially different types of biclusters. Conclusion: We successfully applied the EDISA to different 3D datasets. While previous studies were mostly aimed at detecting coherent modules only, our results show that coherent responses are often part of a more general module type with independent response profiles under different conditions. Our approach thus allows for a more comprehensive view of the gene expression response. After subsequent analysis of the resulting modules, the EDISA helped to shed light on the global organization of transcriptional control. An implementation of the algorithm is available at http://www-ra.informatik.uni-tuebingen.de/software/IAGEN/.
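
    The refinement loop described above (sample a candidate module, then remove genes and conditions until it satisfies the module definition) can be pictured with a small sketch. The correlation-based coherence score, thresholds and synthetic data below are illustrative assumptions, not the EDISA's published scoring or parameters.

        import numpy as np

        def refine_module(data, genes, conditions, tau=0.75, max_iter=50):
            """Shrink a candidate module of a 3D gene x condition x time array until every
            retained gene correlates with the module's mean profile in every retained condition."""
            genes, conditions = list(genes), list(conditions)
            for _ in range(max_iter):
                changed = False
                for c in list(conditions):
                    mean_profile = data[genes, c, :].mean(axis=0)
                    corr = [np.corrcoef(data[g, c, :], mean_profile)[0, 1] for g in genes]
                    keep = [g for g, r in zip(genes, corr) if r >= tau]
                    if len(keep) < 2:              # condition carries no coherent signal -> drop it
                        conditions.remove(c)
                        changed = True
                    elif len(keep) < len(genes):   # drop genes deviating from the module profile
                        genes = keep
                        changed = True
                if not changed:
                    break
            return genes, conditions

        rng = np.random.default_rng(0)
        data = rng.normal(scale=0.5, size=(100, 4, 20))          # 100 genes, 4 conditions, 20 time points
        data[:20, :2, :] += 2 * np.sin(np.linspace(0, 3, 20))    # implant a module in conditions 0 and 1
        genes, conditions = refine_module(data, range(30), range(4))
        print(conditions, genes)   # typically recovers conditions [0, 1] and a subset of the implanted genes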

    Gene regulatory network modelling with evolutionary algorithms - an integrative approach

    Building models for gene regulation has been an important aim of Systems Biology over the past years, driven by the large amount of gene expression data that has become available. Models represent regulatory interactions between genes and transcription factors and can provide a better understanding of biological processes, as well as a means of simulating both natural and perturbed systems (e.g. those associated with disease). Quantitative modelling of gene regulatory networks (GRNs) is still limited, however, due to data issues such as noise and the restricted length of the time series typically used for GRN reverse engineering. These issues create an under-determination problem, with many models possibly fitting the data. However, large amounts of other types of biological data and knowledge are available, such as cross-platform measurements, knockout experiments, annotations, binding site affinities for transcription factors and so on. It has been postulated that integrating these can improve the quality of the obtained models by facilitating further filtering of possible models. However, integration is not straightforward, as the different types of data can provide contradictory information and are intrinsically noisy, hence large-scale integration has not been fully explored to date. Here, we present an integrative parallel framework for GRN modelling, which employs evolutionary computation and different types of data to enhance model inference. Integration is performed at different levels. (i) An analysis of cross-platform integration of time series microarray data is presented, discussing the effects on the resulting models and exploring cross-platform normalisation techniques. This shows that time-course data integration is possible, and results in models more robust to noise and parameter perturbation, as well as reduced noise over-fitting. (ii) Other types of measurements and knowledge, such as knock-out experiments, annotated transcription factors, binding site affinities and promoter sequences, are integrated within the evolutionary framework to obtain more plausible GRN models. This is performed by customising the initialisation, mutation and evaluation of candidate model solutions. The different data types are investigated and both qualitative and quantitative improvements are obtained. Results suggest that caution is needed in order to obtain improved models from combined data, and the case study presented here provides an example of how this can be achieved. Furthermore, (iii) RNA-seq data is studied in comparison to microarray experiments, to identify overlapping features and possibilities of integration within the framework. The extension of the framework to this data type is straightforward, and qualitative improvements are obtained when combining predicted interactions from single-channel and RNA-seq datasets.
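
    As a schematic of the evolutionary loop used for model inference, the sketch below evolves a weight matrix against an expression time series. The linear-plus-saturation dynamics, the mutation operator and the (1+1) selection scheme are simplifying assumptions for illustration, not the actual operators of the framework.

        import numpy as np

        rng = np.random.default_rng(1)

        def simulate(W, x0, steps):
            """Discrete-time GRN dynamics: x(t+1) = x(t) + 0.1 * tanh(W @ x(t))."""
            xs = [x0]
            for _ in range(steps - 1):
                xs.append(xs[-1] + 0.1 * np.tanh(W @ xs[-1]))
            return np.array(xs)

        def fitness(W, series):
            pred = simulate(W, series[0], len(series))
            return -np.mean((pred - series) ** 2)         # negative squared error, higher is better

        def mutate(W, sigma=0.1, p_prune=0.05):
            child = W + rng.normal(0.0, sigma, W.shape)   # perturb interaction weights
            child[rng.random(W.shape) < p_prune] = 0.0    # occasionally delete edges (sparsity)
            return child

        # Toy "measured" expression time series generated from a hidden 3-gene network.
        true_W = np.array([[0.0, -0.8, 0.0], [0.9, 0.0, 0.0], [0.0, 0.7, -0.5]])
        series = simulate(true_W, np.array([1.0, 0.5, 0.2]), 30)

        best = np.zeros((3, 3))                           # (1+1) evolution: keep the better of parent/child
        for _ in range(2000):
            child = mutate(best)
            if fitness(child, series) > fitness(best, series):
                best = child
        print(np.round(best, 2))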

    Modeling metabolic networks in C. glutamicum: a comparison of rate laws in combination with various parameter optimization strategies

    Background: To understand the dynamic behavior of cellular systems, mathematical modeling is often necessary and comprises three steps: (1) experimental measurement of participating molecules, (2) assignment of rate laws to each reaction, and (3) parameter calibration with respect to the measurements. In each of these steps the modeler is confronted with a plethora of alternative approaches, e.g., the selection of approximative rate laws in step two as specific equations are often unknown, or the choice of an estimation procedure with its specific settings in step three. This overall process with its numerous choices and the mutual influence between them makes it hard to single out the best modeling approach for a given problem. Results: We investigate the modeling process using multiple kinetic equations together with various parameter optimization methods for a well-characterized example network, the biosynthesis of valine and leucine in C. glutamicum. For this purpose, we derive seven dynamic models based on generalized mass action, Michaelis-Menten and convenience kinetics as well as the stochastic Langevin equation. In addition, we introduce two modeling approaches for feedback inhibition to the mass action kinetics. The parameters of each model are estimated using eight optimization strategies. To determine the most promising modeling approaches together with the best optimization algorithms, we carry out a two-step benchmark: (1) coarse-grained comparison of the algorithms on all models and (2) fine-grained tuning of the best optimization algorithms and models. To analyze the space of the best parameters found for each model, we apply clustering, variance, and correlation analysis. Conclusion: A mixed model based on the convenience rate law and the Michaelis-Menten equation, in which all reactions are assumed to be reversible, is the most suitable deterministic modeling approach, followed by a reversible generalized mass action kinetics model. A Langevin model is advisable to take stochastic effects into account. To estimate the model parameters, three algorithms are particularly useful: for first attempts the settings-free Tribes algorithm yields valuable results; particle swarm optimization and differential evolution provide significantly better results with appropriate settings.
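
    The calibration step can be sketched with a generic toolchain: simulate a candidate model and let a global optimizer minimize the deviation from the measurements. The two-step Michaelis-Menten chain and the synthetic data below are illustrative assumptions, not the valine/leucine model of the paper, and SciPy's differential_evolution merely stands in for the benchmarked optimizers.

        import numpy as np
        from scipy.integrate import solve_ivp
        from scipy.optimize import differential_evolution

        t_obs = np.linspace(0, 10, 20)

        def model(t, y, vmax1, km1, vmax2, km2):
            a, b, c = y                                   # toy pathway a -> b -> c
            r1 = vmax1 * a / (km1 + a)                    # two Michaelis-Menten steps
            r2 = vmax2 * b / (km2 + b)
            return [-r1, r1 - r2, r2]

        def simulate(params):
            sol = solve_ivp(model, (0, 10), [5.0, 0.0, 0.0], t_eval=t_obs, args=tuple(params))
            return sol.y

        true_params = [1.2, 0.5, 0.8, 1.0]
        data = simulate(true_params) + np.random.default_rng(0).normal(0, 0.05, (3, t_obs.size))

        def cost(params):
            return np.sum((simulate(params) - data) ** 2)  # least-squares objective

        result = differential_evolution(cost, bounds=[(0.01, 5.0)] * 4, seed=1, maxiter=50)
        print(np.round(result.x, 2))                       # estimated parameters, compare with true_params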

    Spectral methods for the detection and characterization of Topologically Associated Domains

    The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops that is relatively stable across cell lines and even across species. These TADs dynamically reorganize during development and disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for the identification of TADs and even fewer for the identification of hierarchies. Additionally, there are no publicly available tools for the comparison of TADs across datasets. These tools are necessary to conduct large-scale genome-wide analysis and comparison of 3D structure. To address the challenge of TAD identification, we developed a novel sliding window-based spectral clustering framework that uses gaps between consecutive eigenvectors for TAD boundary identification. Our method, implemented in an R package, SpectralTAD, has automatic parameter selection, is robust to sequencing depth, resolution and sparsity of Hi-C data, and detects hierarchical, biologically relevant TADs. SpectralTAD outperforms four state-of-the-art TAD callers in simulated and experimental settings. We demonstrate that TAD boundaries shared among multiple levels of the TAD hierarchy were more enriched in classical boundary marks and more conserved across cell lines and tissues. SpectralTAD is available at http://bioconductor.org/packages/SpectralTAD/. To address the problem of TAD comparison, we developed TADCompare. TADCompare is based on a spectral clustering-derived measure called the eigenvector gap, which enables a locus-by-locus comparison of TAD boundary differences between datasets. Using this measure, we introduce methods for identifying differential and consensus TAD boundaries and for tracking TAD boundary changes over time. We further propose a novel framework for the systematic classification of TAD boundary changes. Colocalization and gene enrichment analyses of different types of TAD boundary changes revealed the distinct biological functionality associated with them. TADCompare is available at https://github.com/dozmorovlab/TADCompare.
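
    A toy illustration of the eigenvector-gap idea: on a contact matrix with two dense diagonal blocks, the Fiedler vector of the normalized Laplacian is roughly constant within each domain, so the largest jump between consecutive entries marks the boundary. The synthetic matrix below is a simplification, not SpectralTAD's sliding-window implementation.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 40
        contacts = rng.uniform(0, 1, (n, n))              # background contact frequencies
        contacts[:22, :22] += 5                           # first "TAD"
        contacts[22:, 22:] += 5                           # second "TAD"
        contacts = (contacts + contacts.T) / 2            # Hi-C matrices are symmetric

        d = contacts.sum(axis=1)                          # normalized graph Laplacian
        d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        laplacian = np.eye(n) - d_inv_sqrt @ contacts @ d_inv_sqrt

        eigvals, eigvecs = np.linalg.eigh(laplacian)      # ascending eigenvalues
        fiedler = eigvecs[:, 1]                           # second-smallest eigenvector
        gaps = np.abs(np.diff(fiedler))
        print("candidate TAD boundary after bin", gaps.argmax())   # expected: bin 21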

    From Cellular Components to Living Cells (and Back): Evolution of Function in Biological Networks

    Network models pervade modern biology. From ecosystems down to molecular interactions in cells, they provide abstraction and explanation for biological processes. Thus, the relation between the structure and function of networks is central to any comprehensive attempt at a theoretical understanding of life. Like any living system, biological networks are shaped by evolutionary processes. Conversely, artificial evolution can be employed to reconstruct networks and to study their evolution. To this end, I have implemented an evolutionary algorithm specifically designed for the evolution of network models. With the developed evolutionary framework, a study of the evolution of information-processing networks was performed. It is shown that selection favours an organisational structure that is related to function, such that computations can be visualised as transitions between organisations. Furthermore, mathematical modelling is applied to extract reaction-kinetic constants from fluorescence microscopy data, and the model is presented and discussed in detail. Using this approach, a detailed quantitative model of exchange dynamics at PML nuclear bodies (NBs) is created, showing that PML NB components exhibit highly individual exchange kinetics. The FRAP data for PML NBs is additionally used as a test case for automatic model inference using evolutionary methods, and a set of necessary and sufficient criteria for a good model fit is revealed. In the last part of this thesis, a stochastic analysis of the genetic regulatory system of DEF-like and GLO-like class B floral homeotic genes provides an explanation for their intricate regulatory wiring. The different potential regulatory architectures are investigated using Monte Carlo simulation, a simplified master-equation model, and fixed-point analysis. It is shown that a positive autoregulatory loop via obligate heterodimerisation of transcription factor proteins reduces noise in cell-fate decisions on organ identity.
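
    For the fluorescence microscopy part, the extraction of exchange constants can be sketched as a single-exponential FRAP recovery fit. The reaction-dominant model, parameter values and synthetic data below are generic illustrations and are not taken from the thesis' PML NB analysis.

        import numpy as np
        from scipy.optimize import curve_fit

        def recovery(t, mobile_fraction, k_off):
            """Reaction-dominant FRAP recovery: the bleached spot refills as bound molecules exchange."""
            return mobile_fraction * (1.0 - np.exp(-k_off * t))

        t = np.linspace(0, 120, 60)                                        # seconds after bleaching
        rng = np.random.default_rng(2)
        signal = recovery(t, 0.85, 0.05) + rng.normal(0, 0.02, t.size)     # synthetic measurement

        (mobile, k_off), _ = curve_fit(recovery, t, signal, p0=[1.0, 0.1])
        print(f"mobile fraction ~ {mobile:.2f}, k_off ~ {k_off:.3f} /s, "
              f"mean residence time ~ {1 / k_off:.0f} s")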

    Characterization of post-transcriptional regulatory network of RNA-binding proteins using computational predictions and deep sequencing data

    This report is divided into three parts: Data Analysis, Mathematical Modeling, and Conclusion and future directions. In the Data Analysis part, various methods and tools for characterizing the post-transcriptional regulatory networks of RNA-binding proteins are discussed and applied. Chapter 2 introduces PAR-CLIP, a method for transcriptome-wide identification of the binding sites of RNA-binding proteins at nucleotide resolution. PAR-CLIP was successfully applied to RNA-binding proteins and their binding specificity was characterized. Partly due to their vast volume, the data generated so far in CLIP experiments have not been put into a form that enables fast and interactive exploration of binding sites. To address this need, Chapter 3 presents CLIPZ, a database and analysis environment for various kinds of deep sequencing (and in particular CLIP) data, which aims to provide an open-access repository of information on post-transcriptional regulatory elements. Chapter 4 revisits various CLIP methods. A set of ideas concerning both experimental protocols and data analysis is presented to improve the quality and reproducibility of such experiments. In general, cytoplasmic RNAs are isolated in CLIP experiments. Like many high-throughput experiments, CLIP yields a certain amount of isolated RNAs that do not represent regulatory binding sites. To improve the quality of the obtained RNAs, a set of novel methods for data analysis is also suggested. These methods are added as new tools to the CLIPZ analysis platform. Argonaute CLIP data could in principle be beneficial for improving microRNA target site predictions. However, several questions remain which cannot be addressed using CLIP methods alone. For example:
    • Argonaute CLIP data by default does not reveal which microRNAs are more likely to interact with the mRNA binding site at the time of cross-linking.
    • Biochemical and structural studies of the Thermus thermophilus Argonaute protein suggest that the interaction between the microRNA and the Argonaute protein forms a physical structure in which only some positions of the microRNA are accessible to the target binding site. Having inferred the interacting microRNA, it is also interesting to predict the most plausible secondary structure of the hybridized microRNA-mRNA complex.
    The Mathematical Modeling part of the report contains Chapter 5. This chapter presents a novel mathematical model called MIRZA to address the questions mentioned above. An in-depth introduction to MIRZA is given and its performance in identifying functionally relevant targets of microRNAs is discussed. Finally, the Conclusion and future directions part of the report contains Chapter 6, which discusses the main findings of the projects and gives an outlook on future work.
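
    As context for the target-prediction questions listed above, the usual starting point that models such as MIRZA refine is a canonical seed-match scan. The sketch below implements only that baseline heuristic with made-up sequences; it is not the MIRZA model.

        # Find canonical seed matches of a microRNA in a 3' UTR: positions whose sequence
        # is the reverse complement of miRNA nucleotides 2-8 (the "seed").
        COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

        def seed_sites(mirna, utr):
            seed = mirna[1:8]                                    # nucleotides 2-8 (1-based)
            target = "".join(COMPLEMENT[b] for b in reversed(seed))
            return [i for i in range(len(utr) - len(target) + 1)
                    if utr[i:i + len(target)] == target]

        mirna = "UGAGGUAGUAGGUUGUAUAGUU"                         # let-7a
        utr = "AAACUACCUCAGGGAAGCUACCUCA"                        # hypothetical UTR fragment
        print(seed_sites(mirna, utr))                            # [3, 17]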