167 research outputs found

    Optimised analysis and visualisation of metabolic data using graph theoretical approaches

    Since the completion of the Human Genome Project in 2003, it has become increasingly apparent that while genomics has a major role to play in the understanding of human biology, information from other disciplines is necessary to explain the web of interacting signals that allow our bodies to function on a day-to-day basis and respond to rapid changes in our local environment. One such field, metabolomics, focuses on the study of the set of low molecular weight compounds (metabolites) involved in metabolism. Metabolomic studies aim to quantify the concentrations of each of these compounds within a subject under particular conditions, resulting in either information on the physiological effects of a disease or environmental factor (such as a toxin) on the organism, or the identification of metabolites or groups of metabolites that serve as biochemical markers for a state or illness. Whilst metabolite concentrations alone can give great insight into a chosen state, additional information can be obtained by considering the ways in which metabolites interact with each other as parts of a larger system. One method of tackling this problem, metabolic networks, is gaining popularity within the community as it offers a complementary approach to the traditional biological method for studying metabolism, the metabolic pathway. Construction methods are varied, ranging from the mapping of experimental data onto pathway diagrams, through the use of correlation-based techniques, to the analysis of time-series data of metabolic fluxes. However, while many attempts have been made to capture and visualise the complex web of reactions within an organism, few have yet succeeded in showing how these networks can be used to help identify the metabolites that are most significantly involved in the differences between groups of biological samples. 
This thesis discusses ways in which graphs may be used to aid researchers in both the visualisation and interpretation of metabolomic datasets, and to provide a platform for more automated analysis techniques. To that end, it first presents the background to the relevant topics, metabolomics and graph theory, before moving on to show how metabolic correlation networks can be used to identify and visualise differences in metabolism between groups of subjects. It then introduces Linked Metabolites, a software package that has been developed to help researchers explain differences in metabolism by highlighting relationships between metabolites within the metabolic pathways, and to compile those relationships into directed metabolic graphs suitable for analysis using metrics from graph theory. Finally, the thesis explains how the directed metabolic graphs produced by Linked Metabolites could potentially be used to integrate data gathered from the same sample using different experimental techniques, refining the areas of the underlying biochemistry needing further investigation.
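
The idea of a metabolic correlation network mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's own method: the Pearson coefficient, the 0.8 threshold, the function names and the concentration values are all assumptions chosen for illustration. Metabolites become nodes, and an edge is drawn wherever two concentration profiles correlate strongly across samples.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treated as uncorrelated in this sketch
    return cov / sqrt(var_x * var_y)


def correlation_network(profiles, threshold=0.8):
    """Return edges (a, b, r) for metabolite pairs with |r| >= threshold.

    profiles maps a metabolite name to its concentrations across samples.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical concentrations in four samples (illustrative values only).
profiles = {
    "glucose": [1.0, 2.0, 3.0, 4.0],
    "lactate": [2.0, 4.0, 6.0, 8.0],
    "citrate": [4.0, 3.0, 2.0, 1.0],
    "alanine": [1.0, 3.0, 2.0, 1.0],
}
edges = correlation_network(profiles, threshold=0.8)
```

Building one such network per group of subjects and comparing their edge sets is one simple way to look for differences in metabolism between groups; the threshold strongly affects the result and would need to be justified for real data.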

    In silico systems analysis of biopathways

    Chen M. In silico systems analysis of biopathways. Bielefeld (Germany): Bielefeld University; 2004. In the past decade, with the advent of high-throughput technologies, biology has migrated from a descriptive science to a predictive one. A vast amount of information on metabolism has been produced, and a number of specific genetic/metabolic databases and computational systems have been developed, which makes it possible for biologists to perform in silico analysis of metabolism. With experimental data from the laboratory, biologists wish to conduct their analysis systematically with an easy-to-use computational system. One major task is to implement molecular information systems that integrate different molecular database systems, and to design analysis tools (e.g. simulators of complex metabolic reactions). Three key problems are involved: 1) Modeling and simulation of biological processes; 2) Reconstruction of metabolic pathways, leading to predictions about the integrated function of the network; and 3) Comparison of metabolism, providing an important way to reveal the functional relationship between a set of metabolic pathways. This dissertation addresses these problems of in silico systems analysis of biopathways. We developed a software system to integrate access to different databases, and exploited the Petri net methodology to model and simulate metabolic networks in cells. The dissertation develops a computer modeling and simulation technique based on Petri net methodology; investigates metabolic networks at a system level; proposes a markup language for biological data interchange among diverse biological simulators and Petri net tools; establishes a web-based information retrieval system for metabolic pathway prediction; presents an algorithm for metabolic pathway alignment; recommends a nomenclature of cellular signal transduction; and attempts to standardize the representation of biological pathways. 
Hybrid Petri net methodology is exploited to model metabolic networks. A kinetic modeling strategy and a Petri net modeling algorithm are applied to perform the processes of elements functioning and model analysis. The proposed methodology can be used for all other metabolic networks or the virtual cell metabolism. Moreover, perspectives of Petri net modeling and simulation of metabolic networks are outlined. A proposal for the Biology Petri Net Markup Language (BioPNML) is presented. The concepts and terminology of the interchange format, as well as its syntax (which is based on XML), are introduced. BioPNML is designed to provide a starting point for the development of a standard interchange format for bioinformatics and Petri nets. The language makes it possible to exchange biology Petri net diagrams between all supported hardware platforms and versions. It is also designed to associate Petri net models with other known metabolic simulators. A web-based metabolic information retrieval system, PathAligner, is developed in order to predict metabolic pathways from rudimentary elements of pathways. It extracts metabolic information from biological databases via the Internet, and builds metabolic pathways with data sources of genes, sequences, enzymes, metabolites, etc. The system also provides a navigation platform to investigate metabolism-related information, and transforms the output data into XML files for further modeling and simulation of the reconstructed pathway. An alignment algorithm to compare the similarity between metabolic pathways is presented. A new definition of the metabolic pathway is proposed. The pathway, defined as a linear event sequence, is practical for our alignment algorithm. The algorithm is based on strip scoring the similarity of the four-level hierarchical EC numbers involved in the pathways. The algorithm described has been implemented and is in current use in the context of the PathAligner system. 
Furthermore, new methods for the classification and nomenclature of cellular signal transductions are recommended. For each type of characterized signal transduction, a unique ST number is provided. The Signal Transduction Classification Database (STCDB), based on the proposed classification and nomenclature, has been established. By merging the ST numbers with EC numbers, alignments of biopathways are possible. Finally, a detailed model of the urea cycle that includes gene regulatory networks, metabolic pathways and signal transduction is demonstrated using our approaches. A systems-biological interpretation of the observed behavior of the urea cycle and its related transcriptomics information is proposed to provide new insights for metabolic engineering and medical care.
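
The abstract above describes comparing pathways by scoring the similarity of hierarchical EC numbers. The sketch below is only one plausible reading of that idea and is not the PathAligner algorithm: it assumes that two EC numbers are scored by the fraction of leading levels they share, and that two equal-length linear pathways are compared position by position. The function names and the example pathways are hypothetical.

```python
def ec_similarity(ec_a, ec_b):
    """Fraction of leading EC levels (out of 4) shared by two EC numbers."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first mismatch; "-" marks an unspecified level.
        if a != b or a == "-":
            break
        shared += 1
    return shared / 4


def pathway_similarity(path_a, path_b):
    """Mean position-wise EC similarity of two equal-length linear pathways."""
    if len(path_a) != len(path_b):
        raise ValueError("this sketch only compares pathways of equal length")
    if not path_a:
        return 0.0
    scores = [ec_similarity(a, b) for a, b in zip(path_a, path_b)]
    return sum(scores) / len(scores)


# Illustrative pathways written as sequences of EC numbers.
# 2.7.1.1 = hexokinase, 2.7.1.2 = glucokinase, 5.3.1.9 = glucose-6-phosphate
# isomerase, 2.7.1.11 = 6-phosphofructokinase.
pathway_1 = ["2.7.1.1", "5.3.1.9", "2.7.1.11"]
pathway_2 = ["2.7.1.2", "5.3.1.9", "2.7.1.11"]
score = pathway_similarity(pathway_1, pathway_2)
```

A real alignment would also have to handle gaps and pathways of different lengths, which this position-wise comparison deliberately leaves out.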

    Difference in the distribution pattern of substrate enzymes in the metabolic network of Escherichia coli, according to chaperonin requirement

    Background: Chaperonins are important in living systems because they play a role in the folding of proteins. Earlier comprehensive analyses identified substrate proteins for which folding requires the chaperonin GroEL/GroES (GroE) in Escherichia coli, and they revealed that many chaperonin substrates are metabolic enzymes. This result implies the importance of chaperonins in metabolism. However, the relationship between chaperonins and metabolism is still unclear. Results: We investigated the distribution of chaperonin substrate enzymes in the metabolic network using network analysis techniques as a first step towards revealing this relationship, and found that as chaperonin requirement increases, substrate enzymes are more laterally distributed in the metabolic network. In addition, comparative genome analysis showed that the chaperonin-dependent substrates were less conserved, suggesting that these substrates were acquired later on in evolutionary history. Conclusions: This result implies the expansion of metabolic networks due to this chaperonin, and it supports the existing hypothesis of acceleration of evolution by chaperonins. The distribution of chaperonin substrate enzymes in the metabolic network is inexplicable because it does not seem to be associated with individual protein features such as protein abundance, which has been observed characteristically in chaperonin substrates in previous works. However, it becomes clear by considering this expansion process due to chaperonin. This finding provides new insights into metabolic evolution and the roles of chaperonins in living systems.

    Nutritional Systems Biology


    Text mining for metabolic reaction extraction from scientific literature

    Science relies on data in all its different forms. In molecular biology and bioinformatics in particular, large-scale data generation has taken centre stage in the form of high-throughput experiments. In line with this exponential increase of experimental data has been the near-exponential growth of scientific publications. Yet where classical data mining techniques are still capable of coping with this deluge in structured data (Chapter 2), access to information found in scientific literature is still limited to search engines allowing searches at the level of keywords, titles and abstracts. However, large amounts of knowledge about biological entities and their relations are held within the body of articles. When extracted, this data can be used as evidence for existing knowledge or for hypothesis generation, making scientific literature a valuable scientific resource. Unlocking the information inside the articles requires a dedicated set of techniques and approaches tailored to the unstructured nature of free text. Analogous to the field of data mining for the analysis of structured data, the field of text mining has emerged for unstructured text, and a number of applications have been developed in that field. This thesis is about text mining in the field of metabolomics. The work focusses on strategies for accessing large collections of scientific text and on the text mining steps required to extract metabolic reactions and their constituents, enzymes and metabolites, from scientific text. Metabolic reactions are important for our understanding of metabolic processes within cells, and that information provides an important link between genotype and phenotype. Furthermore, information about metabolic reactions stored in databases is far from complete, making it an excellent target for our text mining application. In order to access the scientific publications for further analysis, they can be used as flat text or loaded into database systems. 
In Chapter 2 we assessed and discussed the capabilities and performance of XML-type database systems to store and access very large collections of XML-type documents in the form of the Medline corpus, a collection of more than 20 million scientific abstracts. XML data formats are common in the field of bioinformatics and are also at the core of most web services. With the increasing amount of data stored in XML comes the need for storing and accessing the data. The database systems were evaluated on a number of aspects broadly ranging from technical requirements to ease of use and performance. The performance of the different XML-type database systems was measured on Medline abstract collections of increasing size and with a number of different queries. One of the queries assessed the capabilities of each database system to search the full text of each abstract, which would allow access to the information within the text without further text analysis. The results show that all database systems cope well with the small and medium datasets, but that the full dataset remains a challenge. The query possibilities also varied greatly across all studied databases. This led us to conclude that the performance and possibilities of the different database types vary greatly, also depending on the type of research question. There is no single system that outperforms the others; instead, different circumstances can lead to a different optimal solution. Some of these scenarios are presented in the chapter. Among the conclusions of Chapter 2 is that conventional data mining techniques do not work for the natural language part of a publication beyond simple retrieval queries based on pattern matching. The natural language used in written text is too unstructured for that purpose and requires dedicated text mining approaches, the main research topic of this thesis. 
Two major tasks of text mining are named entity recognition, the identification of relevant entities in the text, and relation extraction, the identification of relations between those named entities. For both text mining tasks many different techniques and approaches have been developed. For the named entity recognition of enzymes and metabolites we used a dictionary-based approach (Chapter 3) and for metabolic reaction extraction a full grammar approach (Chapter 4). In Chapter 3 we describe the creation of two thesauri, one for enzymes and one for metabolites, with the specific goal of allowing named entity identification, the mapping of identified synonyms to a common identifier, for metabolic reaction extraction. In the case of the enzyme thesaurus these identifiers are Enzyme Nomenclature numbers (EC numbers); in the case of the metabolite thesaurus, KEGG metabolite identifiers. These thesauri are applied to the identification of enzymes and metabolites in the text mining approach of Chapter 4. Both were created from existing data sources by a series of automated steps followed by manual curation. Compared to a previously published chemical thesaurus, created entirely with automated steps, our much smaller metabolite thesaurus performed on the same level for F-measure with a slightly higher precision. The enzyme thesaurus produced results equal to our metabolite thesaurus. The compactness of our thesauri permits the manual curation step important in guaranteeing accuracy of the thesaurus contents, whereas creation from existing resources by automated means limits the effort required for creation. We concluded that our thesauri are compact and of high quality, and that this compactness does not greatly impact recall. In Chapter 4 we studied the applicability and performance of a full parsing approach using the two thesauri described in Chapter 3 for the extraction of metabolic reactions from scientific full-text articles. 
For this we developed a text mining pipeline built around a modified dependency parser from the AGFL grammar lab, using a pattern-based approach to extract metabolic reactions from the parsing output. Results of a comparison to a modified rule-based approach by Czarnecki et al. using three previously described metabolic pathways from the EcoCyc database show a slightly lower recall compared to the rule-based approach, but higher precision. We concluded that despite its current recall our full parsing approach to metabolic reaction extraction has high precision and the potential to be used to (re-)construct metabolic pathways in an automated setting. Future improvements to the grammar and relation extraction rules should allow reactions to be extracted with even higher specificity. To identify potential improvements to the recall, the effect of a number of text pre-processing steps on the performance was tested in a number of experiments. The experiment that had the most effect on performance was the conversion of schematic chemical formulas to syntactically complete sentences, allowing them to be analysed by the parser. In addition to the improvements to the text mining approach described in Chapter 4, I make suggestions in Chapter 5 for potential improvements and extensions to our full parsing approach for metabolic reaction extraction. The core focus here is the increase of recall by optimising each of the steps required for the final goal of extracting metabolic reactions from the text. Some of the discussed improvements are to increase the coverage of the thesauri used, possibly with specialist thesauri depending on the analysed literature. Another potential target is the grammar, where there is still room to increase parsing success by taking into account the characteristics of biomedical language. 
On a different level are suggestions to include some form of anaphora resolution and search across sentence boundaries to increase the amount of information extracted from literature. In the second part of Chapter 5 I make suggestions as to how to maximise the information gained from the text mining results. One of the first steps should be integration with other biomedical databases to allow integration with existing knowledge about metabolic reactions and other biological entities. Another aspect is some form of ranking or weighting of the results to be able to distinguish between high-quality results useful for automated analyses and lower-quality results still useful for manual approaches. Furthermore, I provide a perspective on the necessity of computational literature analysis in the form of text mining. The main reasoning here is that human annotators cannot keep up with the number of publications, so that some form of automated analysis is unavoidable. Lastly, I discuss the role of text mining in bioinformatics and with that also the accessibility of both text mining results and the literature resources necessary to create them. An important requirement for the future of text mining is that the barriers around high-throughput access to literature for text mining applications have to be removed. With regard to accessing text mining results, there is a long way to go for many applications, including ours, before they can be used directly by biologists. A major factor is that these applications rarely feature a suitable user interface and an easy-to-use setup. To conclude, I see the main role of a text mining system like ours in gathering evidence for existing knowledge and giving insights into the nuances of the research landscape of a given topic. 
When using the results of our reaction extraction system for the identification of ‘new’ reactions it is important to go back to the actual evidence presented for extra validations and to cross-validate the predictions with other resources or experiments. Ideally text mining will be used for generation of hypotheses, in which the researcher uses text mining findings to get ideas on, in our case, new connections between metabolites and enzymes; subsequently the researcher needs to go back to the original texts for further study. In this role text mining is an essential tool on the workbench of the molecular biologist.
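
The dictionary-based named entity recognition described in the abstract above, in which synonyms found in text are mapped to a common identifier, can be sketched as follows. This is a minimal illustration and not the thesis's pipeline: the two toy thesauri, the function name `tag_entities` and the matching rules are assumptions. The identifiers follow the abstract's scheme (EC numbers for enzymes, KEGG compound identifiers for metabolites).

```python
import re

# Toy thesauri (illustrative entries only): lower-case synonym -> identifier.
ENZYMES = {"hexokinase": "EC 2.7.1.1", "glucokinase": "EC 2.7.1.2"}
METABOLITES = {"glucose": "C00031", "d-glucose": "C00031", "atp": "C00002"}


def tag_entities(text, thesaurus):
    """Return sorted (start, end, matched_text, identifier) tuples."""
    hits = []
    # Try longer synonyms first so "d-glucose" wins over "glucose".
    for synonym in sorted(thesaurus, key=len, reverse=True):
        # Require that the match is not embedded in a longer token.
        pattern = r"(?<![\w-])" + re.escape(synonym) + r"(?![\w-])"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip matches that overlap an earlier (longer) hit.
            if any(m.start() < e and s < m.end() for s, e, _, _ in hits):
                continue
            hits.append((m.start(), m.end(), m.group(0), thesaurus[synonym]))
    return sorted(hits)


sentence = "Hexokinase phosphorylates D-glucose using ATP."
enzyme_hits = tag_entities(sentence, ENZYMES)
metabolite_hits = tag_entities(sentence, METABOLITES)
```

Exact string matching of this kind favours precision over recall, which is consistent with the abstract's emphasis on compact, manually curated thesauri; spelling variants and abbreviations not in the dictionary are simply missed.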

    The integrated analysis of metabolic and protein interaction networks reveals novel molecular organizing principles

    Background: The study of biological interaction networks is a central theme of systems biology. Here, we investigate the relationships between two distinct types of interaction networks: the metabolic pathway map and the protein-protein interaction network (PIN). It has long been established that successive enzymatic steps are often catalyzed by physically interacting proteins forming permanent or transient multi-enzyme complexes. Inspecting high-throughput PIN data, it was shown recently that, indeed, enzymes involved in successive reactions are generally more likely to interact than other protein pairs. In our study, we expanded this line of research to include comparisons of the underlying respective network topologies as well as to investigate whether the spatial organization of enzyme interactions correlates with metabolic efficiency. Results: Analyzing yeast data, we detected long-range correlations between shortest paths between proteins in both network types, suggesting a mutual correspondence of both network architectures. We discovered that the organizing principles of physical interactions between metabolic enzymes differ from the general PIN of all proteins. While physical interactions between proteins are generally disassortative, enzyme interactions were observed to be assortative. Thus, enzymes frequently interact with other enzymes of similar rather than different degree. Enzymes carrying high flux loads are more likely to physically interact than enzymes with lower metabolic throughput. In particular, enzymes associated with catabolic pathways as well as enzymes involved in the biosynthesis of complex molecules were found to exhibit high degrees of physical clustering. Single proteins were identified that connect major components of the cellular metabolism and may thus be essential for the structural integrity of several biosynthetic systems. 
Conclusion: Our results reveal topological equivalences between the protein interaction network and the metabolic pathway network. Evolved protein interactions may contribute significantly towards increasing the efficiency of metabolic processes by permitting higher metabolic fluxes. Thus, our results shed further light on the unifying principles shaping the evolution of both the functional (metabolic) and the physical interaction network.
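
The distinction between assortative and disassortative interactions drawn in the abstract above can be made concrete with a small sketch. It uses the standard degree assortativity coefficient, the Pearson correlation between the degrees found at the two ends of each edge; whether the paper computed assortativity in exactly this way is an assumption, and the two example graphs are toy data.

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of each edge.

    edges is a list of undirected (u, v) pairs without duplicates or self-loops.
    Positive values indicate assortative mixing, negative values disassortative.
    """
    if not edges:
        raise ValueError("at least one edge is required")
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("all edge ends have the same degree; coefficient undefined")
    return cov / var


# A star: the hub only touches low-degree nodes (disassortative).
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
# A triangle plus a separate pair: every node touches nodes of equal degree.
like_with_like = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
r_star = degree_assortativity(star)
r_like = degree_assortativity(like_with_like)
```

In the star the coefficient is negative because the high-degree hub connects only to low-degree leaves, whereas in the second graph nodes connect only to nodes of the same degree, the pattern the abstract reports for enzyme-enzyme interactions.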

    Network design and analysis for multi-enzyme biocatalysis

    In vitro synthesis is a biotechnological alternative to classic chemical catalysis. However, the manual design of multi-step biosynthesis routes is very challenging, especially when enzymes from different organisms are involved. There is therefore a demand for in silico tools to guide the design of such synthesis routes using computational methods for path-finding, as well as for the reconstruction of suitable genome-scale metabolic networks that are able to harness the growing amount of biological data available. This work presents an algorithm for finding pathways from arbitrary metabolites to a target product of interest. The algorithm is based on a mixed-integer linear program (MILP) and combines graph topology and reaction stoichiometry. The pathway candidates are ranked using different criteria to help find the best-suited synthesis pathway candidates. Additionally, a comprehensive workflow is presented for the reconstruction of metabolic networks based on data from the Kyoto Encyclopedia of Genes and Genomes (KEGG), combined with thermodynamic data for the determination of reaction directions. The workflow comprises a filtering scheme to remove unsuitable data. With this workflow, a pan-organism network reconstruction as well as single-organism network models are established. These models are analyzed with graph-theoretical methods. It is also discussed how the results can be used for the planning of biosynthetic production pathways.
    BMBF; initiative “Biotechnologie 2020+: Basistechnologien für eine nächste Generation biotechnologischer Verfahren”; project MECA