24 research outputs found

    GEDEVO: An Evolutionary Graph Edit Distance Algorithm for Biological Network Alignment

    Introduction: With so-called OMICS technologies, the scientific community has generated huge amounts of data that allow us to reconstruct the interplay of all kinds of biological entities. The resulting interaction networks are usually modeled as graphs with thousands of nodes and tens of thousands of edges between them. In addition to sequence alignment, the comparison of biological networks has shown great potential for inferring the biological function of proteins and genes. However, the corresponding network alignment problem is computationally hard and theoretically intractable for real-world instances. Results: We therefore developed GEDEVO, a novel tool for efficient graph comparison dedicated to real-world-size biological networks. Our approach is based on the Graph Edit Distance (GED) model, in which one graph is transformed into another with a minimal number of (or, more generally, minimal costs for) edge insertions and deletions. We present a novel evolutionary algorithm that aims to minimize the GED, and we compare our implementation against the state-of-the-art tools SPINAL, GHOST, CGRAAL, and MIGRAAL. On a set of protein-protein interaction networks from different organisms, we demonstrate that GEDEVO outperforms current methods. It thus refines the previously suggested alignments based on topological information only. Conclusion: With GEDEVO, we account for the ever-growing number and size of available biological networks. The software and all data sets used are publicly available at http://gedevo.mpi-inf.mpg.de
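
To make the GED model concrete: once a node mapping between the two graphs is fixed, the edge-edit cost is the number of edges that must be deleted from G1 plus the number that must be inserted to obtain G2. The sketch below computes that cost for a given mapping, assuming undirected simple graphs, an injective mapping, and unit costs. The function name and parameters are hypothetical; this is not GEDEVO's code, and GEDEVO's actual fitness function may include further terms (for example node costs).

```python
def edge_edit_cost(g1_edges, g2_edges, mapping, c_del=1.0, c_ins=1.0):
    """Edge-edit cost of turning undirected graph G1 into G2 under a node mapping.

    mapping: dict from G1 nodes to G2 nodes, assumed injective (one-to-one).
    Only edge deletions and insertions are priced; node operations are ignored.
    """
    g2 = {frozenset(e) for e in g2_edges}  # undirected edges as frozensets
    projected = set()
    deletions = 0
    for u, v in g1_edges:
        if u in mapping and v in mapping:
            projected.add(frozenset((mapping[u], mapping[v])))
        else:
            # An endpoint has no partner in G2, so this edge must be deleted
            deletions += 1
    # Mapped G1 edges with no counterpart in G2 are deleted as well
    deletions += len(projected - g2)
    # G2 edges not produced by any mapped G1 edge must be inserted
    insertions = len(g2 - projected)
    return c_del * deletions + c_ins * insertions


# Triangle a-b-c aligned to path x-y-z: one edge (c-a -> z-x) must be deleted
g1 = [("a", "b"), ("b", "c"), ("c", "a")]
g2 = [("x", "y"), ("y", "z")]
print(edge_edit_cost(g1, g2, {"a": "x", "b": "y", "c": "z"}))  # 1.0
```

An evolutionary search in the spirit of GEDEVO would mutate and recombine candidate mappings and keep those with lower cost; the cost function is the piece that ties the search to the GED model.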

    EGIA–evolutionary optimisation of gene regulatory networks, an integrative approach

    Quantitative modelling of gene regulatory networks (GRNs) is still limited by data issues such as noise and the restricted length of available time series, which create an under-determination problem. However, large amounts of other types of biological data and knowledge are available, such as knockout experiments and annotations, and it has been postulated that integrating these can improve model quality. To date, such integration has not been fully explored. Here, we present a novel integrative framework for different types of data that aims to enhance model inference. It is based on evolutionary computation and uses different types of knowledge to introduce novel customised initialisation and mutation operators, together with complex evaluation criteria used to distinguish between candidate models. Specifically, the algorithm uses information from (i) knockout experiments, (ii) annotations of transcription factors, (iii) binding site motifs (expressed as position weight matrices) and (iv) DNA sequences of gene promoters to drive the algorithm towards more plausible network structures. Further, the evaluation basis is extended to include the structural information contained in these additional data. The framework is applied to both synthetic and real gene expression data. Models obtained by data integration show both quantitative and qualitative improvement.
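
One simple way knowledge such as transcription factor (TF) annotations can shape an evolutionary search is by restricting which genes may appear as regulators, both when the initial population is built and when a network is mutated. The abstract does not give EGIA's exact operators, so the sketch below assumes only that constraint; `init_network`, `mutate`, and the edge-set representation are hypothetical illustrations, not the authors' method.

```python
import random


def init_network(genes, tfs, max_regulators, rng):
    """Random candidate GRN as a set of (regulator, target) edges.

    Knowledge constraint (assumed): only annotated TFs may act as regulators.
    """
    tf_list = sorted(tfs)
    edges = set()
    for target in genes:
        k = rng.randint(0, min(max_regulators, len(tf_list)))
        for reg in rng.sample(tf_list, k):
            if reg != target:  # no self-regulation in this sketch
                edges.add((reg, target))
    return edges


def mutate(edges, genes, tfs, rng):
    """Toggle one edge; the regulator is always drawn from the TF set."""
    child = set(edges)
    reg = rng.choice(sorted(tfs))
    target = rng.choice([g for g in genes if g != reg])
    edge = (reg, target)
    if edge in child:
        child.remove(edge)
    else:
        child.add(edge)
    return child


rng = random.Random(42)
genes = ["g1", "g2", "g3", "g4"]
tfs = {"g1", "g2"}
parent = init_network(genes, tfs, max_regulators=2, rng=rng)
child = mutate(parent, genes, tfs, rng)
print(sorted(parent ^ child))  # exactly one toggled edge
```

Other knowledge sources listed in the abstract (knockouts, binding-site motifs, promoter sequences) could bias these random choices further, for example by weighting candidate edges; how EGIA does this is not stated here.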

    Identifying sub-network functional modules in protein undirected networks

    Protein networks are commonly used to describe the interaction behaviour of complex biosystems. Bioinformatics must be able to provide methods to mine undirected protein networks and to infer subnetworks of interacting proteins that identify relevant biological pathways. Here we present FunMod, an innovative plugin for Cytoscape version 2.8 that identifies biologically significant subnetworks within informative protein networks, opening new opportunities for elucidating pathways involved in diseases. Moreover, FunMod calculates three topological coefficients for each subnetwork, to better understand the cooperative interactions between proteins and to discriminate the role played by each protein within a functional module. FunMod is the first Cytoscape plugin able to combine pathway and topological analysis, allowing the identification of the key proteins within sub-network functional modules.
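
The abstract does not name FunMod's three topological coefficients. As an illustration of the kind of per-protein coefficient meant, the sketch below computes the local clustering coefficient (the share of a protein's neighbour pairs that interact with each other) within a subnetwork. This is a widely used measure chosen as an example, not necessarily one FunMod reports, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets for a subnetwork given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering_coefficient(adj, node):
    """Share of the node's neighbour pairs that are themselves linked."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


module = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
adj = build_adjacency(module)
for protein in sorted(adj):
    print(protein, round(clustering_coefficient(adj, protein), 3))
# A 1.0, B 1.0, C 0.333, D 0.0
```

A protein such as C, which bridges the tight triangle and the pendant protein D, gets a lower coefficient than A or B; contrasts of this kind are what help discriminate roles inside a module.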

    SoC Architecture Design for FRM-SSA

    Section 2 presents stochastic simulation methods and Gillespie's SSA and FRM-SSA algorithms. Section 3 details the specifications of the implemented system, its degree of parameterization, and its modes of operation. Section 4 analyzes the FRM SoC architecture at the system level and briefly describes the communication between the host computer and the system. Section 5 presents the architecture of the processing unit (FRM Processing Unit, FPU) of an SSA Core. Emphasis is placed on the FPU datapath, while the other units surrounding the datapath are also described in detail. The theoretical performance study carried out during the design is presented as well. Section 6 presents the statistical results obtained from synthesizing the system for various modes of operation. The seventh and final section presents actual results from tests of the system, aimed at validating the design; to this end, the results are compared with those of well-known simulation platforms.
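
Gillespie's First Reaction Method (FRM), which the work above implements in hardware, draws a putative exponential firing time for every reaction with nonzero propensity and fires the earliest one. The Python sketch below shows that selection rule in software only; it is not the SoC design, and the function names and the (propensity, stoichiometry) representation are assumptions made for illustration.

```python
import random


def first_reaction_step(state, reactions, rng):
    """Draw a putative firing time for every reaction and return the earliest.

    reactions: list of (propensity_fn, stoichiometry) pairs, where
    stoichiometry maps species name -> change in copy number.
    Returns (tau, j) or None if every propensity is zero.
    """
    best = None
    for j, (propensity, _) in enumerate(reactions):
        a = propensity(state)
        if a > 0:
            tau = rng.expovariate(a)  # exponential waiting time with rate a
            if best is None or tau < best[0]:
                best = (tau, j)
    return best


def simulate_frm(state, reactions, t_end, rng):
    """Run the First Reaction Method until t_end or until no reaction can fire."""
    t, state = 0.0, dict(state)
    while True:
        step = first_reaction_step(state, reactions, rng)
        if step is None or t + step[0] > t_end:
            return t, state  # t is the time of the last reaction that fired
        tau, j = step
        t += tau
        for species, change in reactions[j][1].items():
            state[species] += change


# Example: first-order decay A -> 0 with rate constant 0.5
decay = [(lambda s: 0.5 * s["A"], {"A": -1})]
t, final = simulate_frm({"A": 10}, decay, t_end=1e6, rng=random.Random(1))
print(final)  # {'A': 0}
```

Every step redraws one random number per reaction, so the per-step work grows with the number of reactions; that regular, data-parallel structure is one reason the method lends itself to a dedicated datapath.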

    Integration of large datasets for plant model organisms

    This dissertation is concerned with bioinformatics data integration. The first chapter illustrates the current state of biological pathway databases in general and of plant pathway databases in particular. Key studies are cited to illustrate the potential benefits of further research into integration methods. Different models are explored to interface with the various stakeholders of biological data repositories. A public website (http://www.metnetonline.org) was built to address the role of a bioinformatics data warehouse as a server for external third parties. A dedicated API (MetNetAPI: http://www.metnetonline.org/api) accommodates bioinformaticians (and software developers in general) who wish to build advanced applications on top of MetNet. The API (implemented as .NET and Java libraries) was designed to be as user-friendly to programmers as the public website is to end-users. Finally, a hybrid model is examined: the use of XML as a repository for information integration, downstream processing, and data manipulation. An overview of the use of XML in biological applications is included. MetNetAPI operates according to certain principles; a subset of the API is abstracted and implemented to interface with a range of other public databases. The result is a new bioinformatics toolkit that can be used to mix and match data from heterogeneous sources transparently. One example is grafting protein-protein interaction data on top of AraCyc pathways. Biological network data is often distributed over a variety of independently modeled databases. This dissertation makes two contributions to the field of bioinformatics. First, a new service, MetNet Online, is now operating and offers access to the previously created and integrated MetNetDB data repository.
The service is geared toward end-users, students and researchers alike, as well as experienced bioinformatics software developers who wish to build their own applications on top of an already integrated data source. Second, because integrated databases are only useful when they can be synchronized with their respective external sources, a framework was created that allows a systematic approach to such integration efforts. In closing, this work provides a roadmap for maintaining current integrated biological database projects and for preparing future ones.
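
The grafting example above amounts to a join: interaction pairs from one source are attached to the proteins of a pathway from another, keyed on shared identifiers. The sketch below shows that join on a small XML pathway with the standard-library ElementTree module. The XML schema (`pathway`, `protein`, `interaction` elements and their attributes) and the rule of keeping only pairs whose partners are both pathway members are assumptions for illustration; they are not MetNetAPI's actual format or behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical pathway document; element and attribute names are assumptions
PATHWAY_XML = """
<pathway name="example-pathway">
  <protein id="P1"/>
  <protein id="P2"/>
  <protein id="P3"/>
</pathway>
"""


def graft_interactions(pathway_xml, ppi_pairs):
    """Add <interaction> elements for PPI pairs whose partners are both pathway members."""
    root = ET.fromstring(pathway_xml.strip())
    members = {p.get("id") for p in root.iter("protein")}
    for a, b in ppi_pairs:
        if a in members and b in members:
            ET.SubElement(root, "interaction", {"source": a, "target": b})
    return root


ppi = [("P1", "P2"), ("P2", "P9"), ("P3", "P1")]
merged = graft_interactions(PATHWAY_XML, ppi)
for edge in merged.iter("interaction"):
    print(edge.get("source"), "->", edge.get("target"))
# P1 -> P2
# P3 -> P1
```

In practice the hard part is usually identifier mapping between sources (for example gene versus protein accessions), which this sketch sidesteps by assuming both sources already use the same IDs.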

    HSimulator: Hybrid Stochastic/Deterministic Simulation of Biochemical Reaction Networks

    HSimulator is a multithreaded simulator for mass-action biochemical reaction systems placed in a well-mixed environment. HSimulator provides optimized implementations of a set of widely used state-of-the-art stochastic, deterministic, and hybrid simulation strategies, including the first publicly available implementation of the Hybrid Rejection-based Stochastic Simulation Algorithm (HRSSA). HRSSA, the fastest hybrid algorithm to date, simulates models efficiently while guaranteeing exact simulation of the subset of the reaction network that models slow reactions. Benchmarks show that HSimulator is often considerably faster than the other simulators considered. The software, which runs on Java v6.0 or higher, offers a simulation GUI for modeling and visually exploring biological processes, as well as a Javadoc-documented Java library to support the development of custom applications. HSimulator is released under the COSBI Shared Source license agreement (COSBI-SSLA).
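
HRSSA builds on rejection-based stochastic simulation, where the next reaction is chosen against cheap upper bounds on the propensities and the exact propensity is evaluated only for the selected candidate. The sketch below shows that exact rejection (thinning) step for the slow-reaction part only, assuming the bounds are already given; it omits the deterministic integration of fast reactions and the bound-update logic. The function name and data layout are hypothetical, and this is not HSimulator's Java implementation.

```python
import random


def rejection_ssa_step(state, reactions, upper_bounds, rng):
    """Select the next reaction by thinning against propensity upper bounds.

    reactions: list of (propensity_fn, stoichiometry) pairs.
    upper_bounds[j] must be >= the propensity of reaction j in `state`.
    Returns (tau, j), or None if no reaction can fire.
    """
    # Guard for this sketch only; it evaluates every propensity once
    if not any(propensity(state) > 0 for propensity, _ in reactions):
        return None
    a0_bar = sum(upper_bounds)
    tau = 0.0
    while True:
        # Every trial, accepted or rejected, advances time by Exp(a0_bar)
        tau += rng.expovariate(a0_bar)
        # Candidate chosen with probability proportional to its upper bound
        j = rng.choices(range(len(reactions)), weights=upper_bounds)[0]
        # Accept with probability a_j / upper_bound_j
        if rng.random() * upper_bounds[j] < reactions[j][0](state):
            return tau, j


reactions = [
    (lambda s: 0.5 * s["A"], {"A": -1}),           # A -> 0
    (lambda s: 0.1 * s["A"] * s["B"], {"B": -1}),  # A + B -> A
]
state = {"A": 10, "B": 4}
bounds = [6.0, 5.0]  # assumed bounds, e.g. from a fluctuation interval around state
tau, j = rejection_ssa_step(state, reactions, bounds, random.Random(7))
```

Because the state does not change between accepted events, thinning a Poisson process of rate equal to the summed bounds yields exactly the SSA's reaction statistics; the speed-up comes from reusing the bounds across many steps instead of recomputing every propensity.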

    Modular Algorithms for Biomolecular Network Alignment

    Comparative analysis of biomolecular networks constructed from measurements under different conditions, in different tissues, and in different organisms offers a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. The rapidly advancing field of systems biology aims to understand these systems in terms of the underlying networks of interactions among the large number of molecular participants involved, including genes, proteins, and metabolites. In particular, comparative analysis of network models representing biomolecular interactions in different species or tissues offers an important tool for identifying conserved modules, predicting the functions of specific genes or proteins, and studying the evolution of biological processes, among other applications. The primary focus of this dissertation is the biomolecular network alignment problem: given two or more network models, optimally match the nodes and links of one network with the nodes and links of the other. The Biomolecular Network Alignment (BiNA) Toolkit developed as part of this dissertation provides a set of efficient (in terms of running-time complexity) and accurate (in terms of various evaluation criteria discussed in the literature) network alignment algorithms for biomolecular networks. BiNA is scalable, user-friendly, modular, and extensible for performing alignments on diverse types of biomolecular networks. The algorithms apply to (1) weighted and unweighted undirected graphs and (2) labeled and unlabeled undirected graphs, and they have been used to align multiple networks ranging from hundreds of nodes with a few thousand edges to tens of thousands of nodes with millions of edges.
The dissertation presents various applications of network comparison tools, including how alignment results have been used to (1) construct phylogenetic trees based on protein-protein interaction networks and (2) find biochemical pathways involved in ligand recognition in B cells.
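
To illustrate the alignment problem and one common evaluation criterion: given pairwise node similarities, a simple aligner can greedily match the best-scoring unmatched pairs one-to-one, and the result can be scored by edge correctness, the fraction of G1 edges mapped onto G2 edges. The greedy matcher and the similarity scores are hypothetical stand-ins chosen for brevity, not BiNA's algorithms.

```python
def greedy_align(similarity):
    """One-to-one matching: repeatedly take the best-scoring pair of unmatched nodes.

    similarity: dict mapping (g1_node, g2_node) -> score (higher is better).
    """
    mapping, used = {}, set()
    for (u, v), _score in sorted(similarity.items(), key=lambda kv: kv[1], reverse=True):
        if u not in mapping and v not in used:
            mapping[u] = v
            used.add(v)
    return mapping


def edge_correctness(g1_edges, g2_edges, mapping):
    """Fraction of G1 edges whose endpoints are mapped onto an edge of G2."""
    g2 = {frozenset(e) for e in g2_edges}
    conserved = sum(
        1
        for u, v in g1_edges
        if u in mapping and v in mapping and frozenset((mapping[u], mapping[v])) in g2
    )
    return conserved / len(g1_edges) if g1_edges else 0.0


sim = {("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.95, ("b", "y"): 0.7, ("c", "z"): 0.6}
m = greedy_align(sim)
print(m)  # {'b': 'x', 'a': 'y', 'c': 'z'}
print(edge_correctness([("a", "b"), ("b", "c")], [("x", "y"), ("y", "z")], m))  # 0.5
```

Greedy matching on node similarity alone ignores topology, which is why edge correctness can stay low; practical aligners combine similarity and edge conservation in the objective they optimize.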

    Innovative Algorithms and Evaluation Methods for Biological Motif Finding

    Biological motifs are defined as recurring sub-patterns that are over-represented in biological systems. Sequence motifs and network motifs are examples of biological motifs. Because of their wide range of applications, many algorithms and computational tools have been developed to search for biological motifs efficiently. As a result, there are more computationally derived motifs than experimentally validated ones, and how to validate the biological significance of these ‘candidate motifs’ has become an important question. Some sequence motifs have been verified through their structural similarities or their functional roles in DNA or protein sequences and are stored in databases. However, the biological role of network motifs has not yet been validated, and no databases currently exist for this purpose. In this thesis, we focus not only on computational efficiency but also on the biological meaning of motifs. We provide an efficient way to incorporate biological information into clustering analysis methods. For example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding, and biological network motifs are searched for with various clustering algorithms using Gene Ontology (GO) information. Experimental results show that the algorithms outperform existing algorithms by producing a larger number of high-quality biological motifs. In addition, we apply biological network motifs to the discovery of essential proteins, defined as the minimal set of proteins vital for an organism's development into a fertile adult and for cellular life. We design a new centrality algorithm based on biological network motifs, named MCGO, and use it to score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques.
This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently, and biological information is easily integrated into the analysis; 2) more emphasis is placed on the biological meaning of motifs, by adding biological knowledge to the algorithms and by proposing biologically related evaluation methods; and 3) biological network motifs are successfully applied to the practical problem of predicting essential proteins.
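
The idea of a motif-based centrality can be sketched by scoring each protein by the number of motif occurrences it participates in. The abstract does not specify MCGO's motifs or scoring, so the sketch below uses the simplest network motif, the triangle, and an optional filter that counts a triangle only when its three proteins share a GO term. Both choices, and the function name, are hypothetical illustrations rather than MCGO itself.

```python
from collections import defaultdict
from itertools import combinations


def triangle_motif_scores(edges, go_terms=None):
    """Count, for each protein, the triangle motifs it participates in.

    If go_terms (protein -> set of GO IDs) is given, a triangle is counted only
    when its three proteins share at least one GO term (hypothetical filter).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    scores = {p: 0 for p in adj}
    for u in adj:
        # Enumerate each triangle once, from its smallest node u < v < w
        for v, w in combinations(sorted(adj[u]), 2):
            if u < v and w in adj[v]:
                if go_terms is not None:
                    shared = go_terms.get(u, set()) & go_terms.get(v, set()) & go_terms.get(w, set())
                    if not shared:
                        continue
                for p in (u, v, w):
                    scores[p] += 1
    return scores


ppi = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "b")]
print(triangle_motif_scores(ppi))  # {'a': 1, 'b': 2, 'c': 2, 'd': 1}
go = {"a": {"GO:1"}, "b": {"GO:1"}, "c": {"GO:1", "GO:2"}, "d": {"GO:2"}}
print(triangle_motif_scores(ppi, go))  # {'a': 1, 'b': 1, 'c': 1, 'd': 0}
```

Such scores could then serve as one feature among other centrality measures in a classifier for essential proteins, mirroring the combination described in the abstract.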