
1,338 research outputs found

A Study of Geometric Semantic Genetic Programming with Linear Scaling

Author: Sakallioglu Berfin
Publication date: 10/04/2023

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.

Machine Learning (ML) is a scientific discipline that endeavors to enable computers to learn without the need for explicit programming. Evolutionary Algorithms (EAs), a subset of ML algorithms, mimic Darwin’s Theory of Evolution by using natural selection mechanisms (i.e., survival of the fittest) to evolve a group of individuals (i.e., possible solutions to a given problem). Genetic Programming (GP) is the most recent type of EA, and it evolves computer programs (i.e., individuals) to map a set of input data into known expected outputs. Geometric Semantic Genetic Programming (GSGP) extends this concept by allowing individuals to evolve and vary in the semantic space, where the output vectors are located, rather than being constrained by syntax-based structures. Linear Scaling (LS) is a method that was introduced to ease GP's search for the function that best matches a set of known data. GSGP and LS have both, independently, shown the ability to outperform standard GP for symbolic regression. GSGP uses Geometric Semantic Operators (GSOs), different from the standard ones, without altering the fitness, while LS modifies the fitness without altering the genetic operators. To the best of our knowledge, the combined methodology of GSGP and LS has not previously been used for classification problems. Furthermore, although the two have been used together in one practical regression application, a methodological evaluation of the advantages and disadvantages of integrating these methods for regression or classification problems has never been performed. In this dissertation, a study of a system that integrates both GSGP and LS (GSGP-LS) is presented. The performance of the proposed method, GSGP-LS, was tested on six hand-tailored regression benchmarks, nine real-life regression problems and three real-life classification problems. The obtained results indicate that GSGP-LS outperforms GSGP in the majority of the cases, confirming the expected benefit of this integration. However, for some particularly hard regression datasets, GSGP-LS overfits the training data and is outperformed by GSGP on unseen data. This contradicts the idea that LS is always beneficial for GP and warns practitioners about its risk of overfitting in some specific cases.
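The abstract above says that LS changes the fitness without touching the genetic operators. A minimal sketch of that idea, assuming the usual least-squares formulation of linear scaling (a slope and an intercept fitted to each individual's outputs before the error is measured), is shown below. The function names `linear_scaling` and `scaled_mse` are hypothetical and are not taken from the dissertation.

```python
def linear_scaling(outputs, targets):
    """Return (a, b) minimising the squared error of a + b * output against target."""
    n = len(outputs)
    mean_y = sum(outputs) / n
    mean_t = sum(targets) / n
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    if var_y == 0:
        # A constant program cannot be rescaled; fall back to predicting the mean target.
        return mean_t, 0.0
    cov = sum((y - mean_y) * (t - mean_t) for y, t in zip(outputs, targets))
    b = cov / var_y              # least-squares slope
    a = mean_t - b * mean_y      # least-squares intercept
    return a, b


def scaled_mse(outputs, targets):
    """Mean squared error after the program's outputs are linearly rescaled."""
    a, b = linear_scaling(outputs, targets)
    return sum((t - (a + b * y)) ** 2 for y, t in zip(outputs, targets)) / len(outputs)


# A program whose outputs have the right shape but the wrong scale and offset
# gets a perfect scaled fitness.
program_outputs = [1.0, 2.0, 3.0]
expected = [5.0, 7.0, 9.0]
print(linear_scaling(program_outputs, expected))
print(scaled_mse(program_outputs, expected))
```

Because the slope and intercept are fitted on the training data, this also hints at how the overfitting reported in the abstract could arise: the rescaling is tuned to the training targets only.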

Computational Logic for Biomedicine and Neurosciences

Author: Bahrami Abdorrahim
de Maria Elisabetta
Despeyroux Joelle
Felty Amy
Liò Pietro
Olarte Carlos
Publication date: 06/10/2020

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Developing methods for the context-specific reconstruction of metabolic models of cancer cells

Author: Gomes Jorge Alexandre Correia
Publication date: 23/11/2018

Master's dissertation in Bioinformatics.

The recent advances in genome sequencing technologies and other high-throughput methodologies have allowed the identification and quantification of individual cell components. These efforts led to the development of genome-scale metabolic models (GSMMs), not only for humans but also for several other organisms. These models have been used to predict cellular metabolic phenotypes under a variety of physiological conditions and contexts, proving useful in tasks such as drug discovery, biomarker identification and the study of interactions between hosts and pathogens. These models therefore provide a useful tool for targeting diseases such as cancer, Alzheimer's disease or tuberculosis. However, the usefulness of GSMMs is highly dependent on their ability to predict phenotypes in the array of different cell types that compose the human body, making the development of tissue/context-specific models mandatory. To address this issue, several methods have been proposed to integrate omics data, such as transcriptomics or proteomics, to improve the phenotype prediction abilities of GSMMs. Despite these efforts, these methods still have some limitations. In most cases, they are locked behind commercially licensed software platforms or are not available in a user-friendly form, which restricts their use to users with programming or command-line knowledge. In this work, an open-source tool was developed for the reconstruction of tissue/context-specific models based on a generic template GSMM and the integration of omics data. The Tissue-Specific Model Reconstruction (TSM-Rec) tool was developed in Python and uses the FASTCORE algorithm for the reconstruction of tissue/context-specific metabolic models. Its functionalities include the loading of omics data from a variety of omics databases, a set of filtering and transformation methods to adjust the data for integration with a template metabolic model, and finally the reconstruction of tissue/context-specific metabolic models. To evaluate the functionality of the developed tool, a cancer-related case study was carried out. Using omics data from 314 glioma patients, the TSM-Rec tool was used to reconstruct metabolic models of gliomas of different grades. A total of three models were generated, corresponding to grade II, III and IV gliomas. These models were analysed regarding their differences and similarities in reactions and pathways. This comparison highlighted biological processes common to all glioma grades, as well as pathways that are more prominent in each glioma model. The results show that the tool developed during this work can be useful for the reconstruction of cancer metabolic models, in a search for insights into cancer metabolism and possible approaches towards drug-target discovery.

An Exploratory Study of Patient Falls

Author: Coto Jeffrey A
Wilder Coleen
Publication venue: ValpoScholar
Publication date: 01/01/2016

Debate continues over the relative contributions of education level and clinical expertise in the nursing practice environment. Research suggests a link between Baccalaureate of Science in Nursing (BSN) nurses and positive patient outcomes such as lower mortality, decreased falls, and fewer medication errors. Purpose: To examine whether there is a negative correlation between patient falls and the level of nurse education at an urban hospital located in Midwest Illinois during the years 2010-2014. Methods: A retrospective cross-sectional cohort analysis was conducted using data from the National Database of Nursing Quality Indicators (NDNQI) from the years 2010-2014. Sample: Inpatients aged ≥ 18 years who experienced an unintentional sudden descent, with or without injury, that resulted in the patient striking the floor or an object and that occurred on inpatient nursing units. Results: The regression model was constructed with annual patient falls as the dependent variable, and formal education and a log-transformed variable for the percentage of certified nurses as the independent variables. The model overall is a good fit, F(2, 22) = 9.014, p = .001, adj. R² = .40. Conclusion: Annual patient falls will decrease by increasing the number of nurses with baccalaureate degrees and/or certifications from a professional nursing board-governing body.
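The reported fit statistics can be checked against each other with a few lines of arithmetic. The sketch below assumes an ordinary least-squares model with an intercept, so that the F statistic's degrees of freedom (2 and 22) imply 25 observations; the function names are hypothetical and the abstract itself does not state the sample size.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from the overall F statistic of a regression with an intercept."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r_squared(r2, df_model, df_resid):
    """Adjusted R^2, with n = df_model + df_resid + 1 observations (assumed)."""
    n = df_model + df_resid + 1
    return 1 - (1 - r2) * (n - 1) / df_resid


# Values reported in the abstract: F(2, 22) = 9.014.
r2 = r_squared_from_f(9.014, 2, 22)
adj_r2 = adjusted_r_squared(r2, 2, 22)
print(round(r2, 3), round(adj_r2, 2))
```

Under these assumptions the adjusted value rounds to .40, in agreement with the figure reported in the abstract.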

A Field Guide to Genetic Programming

Author: Langdon William B.
McPhee Nicholas F.
Poli Ricardo
Publication venue: [S.L.] : Lulu Press (lulu.com), 2008.
Publication date: 01/01/2008

xiv, 233 p. : ill. ; 23 cm. Electronic book.

A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.

Chapters: Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.

Contents
1 Introduction: 1.1 Genetic Programming in a Nutshell; 1.2 Getting Started; 1.3 Prerequisites; 1.4 Overview of this Field Guide
I Basics
2 Representation, Initialisation and Operators in Tree-based GP: 2.1 Representation; 2.2 Initialising the Population; 2.3 Selection; 2.4 Recombination and Mutation
3 Getting Ready to Run Genetic Programming: 3.1 Step 1: Terminal Set; 3.2 Step 2: Function Set (3.2.1 Closure; 3.2.2 Sufficiency; 3.2.3 Evolving Structures other than Programs); 3.3 Step 3: Fitness Function; 3.4 Step 4: GP Parameters; 3.5 Step 5: Termination and solution designation
4 Example Genetic Programming Run: 4.1 Preparatory Steps; 4.2 Step-by-Step Sample Run (4.2.1 Initialisation; 4.2.2 Fitness Evaluation; 4.2.3 Selection, Crossover and Mutation; 4.2.4 Termination and Solution Designation)
II Advanced Genetic Programming
5 Alternative Initialisations and Operators in Tree-based GP: 5.1 Constructing the Initial Population (5.1.1 Uniform Initialisation; 5.1.2 Initialisation may Affect Bloat; 5.1.3 Seeding); 5.2 GP Mutation (5.2.1 Is Mutation Necessary?; 5.2.2 Mutation Cookbook); 5.3 GP Crossover; 5.4 Other Techniques
6 Modular, Grammatical and Developmental Tree-based GP: 6.1 Evolving Modular and Hierarchical Structures (6.1.1 Automatically Defined Functions; 6.1.2 Program Architecture and Architecture-Altering); 6.2 Constraining Structures (6.2.1 Enforcing Particular Structures; 6.2.2 Strongly Typed GP; 6.2.3 Grammar-based Constraints; 6.2.4 Constraints and Bias); 6.3 Developmental Genetic Programming; 6.4 Strongly Typed Autoconstructive GP with PushGP
7 Linear and Graph Genetic Programming: 7.1 Linear Genetic Programming (7.1.1 Motivations; 7.1.2 Linear GP Representations; 7.1.3 Linear GP Operators); 7.2 Graph-Based Genetic Programming (7.2.1 Parallel Distributed GP (PDGP); 7.2.2 PADO; 7.2.3 Cartesian GP; 7.2.4 Evolving Parallel Programs using Indirect Encodings)
8 Probabilistic Genetic Programming: 8.1 Estimation of Distribution Algorithms; 8.2 Pure EDA GP; 8.3 Mixing Grammars and Probabilities
9 Multi-objective Genetic Programming: 9.1 Combining Multiple Objectives into a Scalar Fitness Function; 9.2 Keeping the Objectives Separate (9.2.1 Multi-objective Bloat and Complexity Control; 9.2.2 Other Objectives; 9.2.3 Non-Pareto Criteria); 9.3 Multiple Objectives via Dynamic and Staged Fitness Functions; 9.4 Multi-objective Optimisation via Operator Bias
10 Fast and Distributed Genetic Programming: 10.1 Reducing Fitness Evaluations/Increasing their Effectiveness; 10.2 Reducing Cost of Fitness with Caches; 10.3 Parallel and Distributed GP are Not Equivalent; 10.4 Running GP on Parallel Hardware (10.4.1 Master–slave GP; 10.4.2 GP Running on GPUs; 10.4.3 GP on FPGAs; 10.4.4 Sub-machine-code GP); 10.5 Geographically Distributed GP
11 GP Theory and its Applications: 11.1 Mathematical Models; 11.2 Search Spaces; 11.3 Bloat (11.3.1 Bloat in Theory; 11.3.2 Bloat Control in Practice)
III Practical Genetic Programming
12 Applications: 12.1 Where GP has Done Well; 12.2 Curve Fitting, Data Modelling and Symbolic Regression; 12.3 Human Competitive Results – the Humies; 12.4 Image and Signal Processing; 12.5 Financial Trading, Time Series, and Economic Modelling; 12.6 Industrial Process Control; 12.7 Medicine, Biology and Bioinformatics; 12.8 GP to Create Searchers and Solvers – Hyper-heuristics; 12.9 Entertainment and Computer Games; 12.10 The Arts; 12.11 Compression
13 Troubleshooting GP: 13.1 Is there a Bug in the Code?; 13.2 Can you Trust your Results?; 13.3 There are No Silver Bullets; 13.4 Small Changes can have Big Effects; 13.5 Big Changes can have No Effect; 13.6 Study your Populations; 13.7 Encourage Diversity; 13.8 Embrace Approximation; 13.9 Control Bloat; 13.10 Checkpoint Results; 13.11 Report Well; 13.12 Convince your Customers
14 Conclusions
IV Tricks of the Trade
A Resources: A.1 Key Books; A.2 Key Journals; A.3 Key International Meetings; A.4 GP Implementations; A.5 On-Line Resources
B TinyGP: B.1 Overview of TinyGP; B.2 Input Data Files for TinyGP; B.3 Source Code; B.4 Compiling and Running TinyGP
Bibliography
Index

Design and application of gene-pool optimal mixing evolutionary algorithms for genetic programming

Author: Virgolin M. (Marco)
Publication date: 08/06/2020

Efficient Decision Support Systems

Publication venue: IntechOpen
Publication date: 20/04/2021

This series is directed to diverse managerial professionals who are leading the transformation of individual domains by using expert information and domain knowledge to drive decision support systems (DSSs). The series covers a broad range of subjects in specific areas such as health care, business management, banking, agriculture, environmental improvement, natural resource and spatial management, aviation administration, and hybrid applications of information technology aimed at interdisciplinary issues. The book series is composed of three volumes: Volume 1 covers general concepts and methodology of DSSs; Volume 2 covers applications of DSSs in the biomedical domain; Volume 3 covers hybrid applications of DSSs in multidisciplinary domains. The book shapes decision support strategies in the new infrastructure, helping readers make full use of creative technology to manipulate input data and to transform information into useful decisions for decision makers.

Book of Abstracts XVIII Congreso de Biometría CEBMADRID

Publication date: 01/01/2022

Abstracts of the XVIII Congreso de Biometría CEBMADRID held from 25 to 27 May in Madrid.

Interactive modelling and prediction of patient evolution via multistate models / Leire Garmendia Bergés, Jordi Cortés Martínez and Guadalupe Gómez Melis: This research was funded by the Ministerio de Ciencia e Innovación (Spain) [PID2019104830RBI00]; and the Generalitat de Catalunya (Spain) [2017SGR622 and 2020PANDE00148].

Operating characteristics of a model-based approach to incorporate non-concurrent controls in platform trials / Pavla Krotka, Martin Posch, Marta Bofill Roig: EU-PEARL (EU Patient-cEntric clinicAl tRial pLatforms) project has received funding from the Innovative Medicines Initiative (IMI) 2 Joint Undertaking (JU) under grant agreement No 853966. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA and Children’s Tumor Foundation, Global Alliance for TB Drug Development non-profit organisation, Spring works Therapeutics Inc.

Modeling COPD hospitalizations using variable domain functional regression / Pavel Hernández Amaro, María Durbán Reguera, María del Carmen Aguilera Morillo, Cristobal Esteban Gonzalez, Inma Arostegui: This work is supported by the grant ID2019-104901RB-I00 from the Spanish Ministry of Science, Innovation and Universities MCIN/AEI/10.13039/501100011033.

Spatio-temporal quantile autoregression for detecting changes in daily temperature in northeastern Spain / Jorge Castillo-Mateo, Alan E. Gelfand, Jesús Asín, Ana C. Cebrián: This work was partially supported by the Ministerio de Ciencia e Innovación under Grant PID2020-116873GB-I00; Gobierno de Aragón under Research Group E46_20R: Modelos Estocásticos; and JC-M was supported by Gobierno de Aragón under Doctoral Scholarship ORDEN CUS/581/2020.

Estimation of the area under the ROC curve with complex survey data / Amaia Iparragirre, Irantzu Barrio, Inmaculada Arostegui: This work was financially supported in part by IT1294-19, PID2020-115882RB-I00, KK-2020/00049. The work of AI was supported by PIF18/213.

INLAMSM: Adjusting multivariate lattice models with R and INLA / Francisco Palmí Perales, Virgilio Gómez Rubio and Miguel Ángel Martínez Beneito: This work has been supported by grants PPIC-2014-001-P and SBPLY/17/180501/000491, funded by Consejería de Educación, Cultura y Deportes (Junta de Comunidades de Castilla-La Mancha, Spain) and FEDER, grant MTM2016-77501-P, funded by Ministerio de Economía y Competitividad (Spain), grant PID2019-106341GB-I00 from Ministerio de Ciencia e Innovación (Spain) and a grant to support research groups by the University of Castilla-La Mancha (Spain). F. Palmí-Perales has been supported by a Ph.D. scholarship awarded by the University of Castilla-La Mancha (Spain).

Universidad Carlos III de Madrid e-Archivo

Computational Logic for Biomedicine and Neuroscience

Author: Bahrami Abdorrahim
De Maria Elisabetta
Despeyroux Joelle
Felty Amy
Liò Pietro
Olarte Carlos
Publication venue: HAL CCSD
Publication date: 06/10/2020

We advocate here the use of computational logic for systems biology, as a unified and safe framework well suited for modeling the dynamic behaviour of biological systems, expressing properties of them, and verifying these properties. The potential candidate logics should have a traditional proof-theoretic pedigree (including either induction, or a sequent calculus presentation enjoying cut-elimination and focusing), and should come with certified proof tools. Beyond providing a reliable framework, this allows us to encode our biological systems correctly. For systems biology in general and biomedicine in particular, we have so far, for the modeling part, three candidate logics, all based on linear logic. The studied properties and their proofs are formalized in a very expressive (non-linear) inductive logic: the Calculus of Inductive Constructions (CIC). The examples we have considered so far are relatively simple ones; however, all come with formal semi-automatic proofs in the Coq system, which implements CIC. In neuroscience, we are directly using CIC and Coq to model neurons and some simple neuronal circuits and to prove some of their dynamic properties. In biomedicine, the study of multi-omic pathway interactions, together with clinical and electronic health record data, should help in drug discovery and disease diagnosis. Future work includes using more automatic provers. This should enable us to specify and study more realistic examples, and in the long term to provide a system for disease diagnosis and therapy prognosis.

INRIA a CCSD electronic archive server

In-silico-Systemanalyse von Biopathways

Author: Chen Ming
Publication venue: Bielefeld University
Publication date: 01/01/2004

Chen M. In silico systems analysis of biopathways. Bielefeld (Germany): Bielefeld University; 2004.

In the past decade, with the advent of high-throughput technologies, biology has migrated from a descriptive science to a predictive one. A vast amount of information on metabolism has been produced, and a number of specific genetic/metabolic databases and computational systems have been developed, making it possible for biologists to perform in silico analysis of metabolism. With experimental data from the laboratory, biologists wish to conduct their analysis systematically with an easy-to-use computational system. One major task is to implement molecular information systems that integrate different molecular database systems, and to design analysis tools (e.g. simulators of complex metabolic reactions). Three key problems are involved: 1) modeling and simulation of biological processes; 2) reconstruction of metabolic pathways, leading to predictions about the integrated function of the network; and 3) comparison of metabolism, providing an important way to reveal the functional relationship between a set of metabolic pathways. This dissertation addresses these problems of in silico systems analysis of biopathways. We developed a software system to integrate access to different databases, and exploited the Petri net methodology to model and simulate metabolic networks in cells. The dissertation develops a computer modeling and simulation technique based on Petri net methodology; investigates metabolic networks at a system level; proposes a markup language for biological data interchange among diverse biological simulators and Petri net tools; establishes a web-based information retrieval system for metabolic pathway prediction; presents an algorithm for metabolic pathway alignment; recommends a nomenclature of cellular signal transduction; and attempts to standardize the representation of biological pathways.
Hybrid Petri net methodology is exploited to model metabolic networks. A kinetic modeling strategy and a Petri net modeling algorithm are applied to model how the network elements function and to analyse the resulting model. The proposed methodology can be used for all other metabolic networks or for virtual cell metabolism. Moreover, perspectives of Petri net modeling and simulation of metabolic networks are outlined. A proposal for the Biology Petri Net Markup Language (BioPNML) is presented. The concepts and terminology of the interchange format, as well as its syntax (which is based on XML), are introduced. BioPNML is designed to provide a starting point for the development of a standard interchange format for bioinformatics and Petri nets. The language makes it possible to exchange biology Petri net diagrams between all supported hardware platforms and versions. It is also designed to link Petri net models with other known metabolic simulators. A web-based metabolic information retrieval system, PathAligner, is developed in order to predict metabolic pathways from rudimentary elements of pathways. It extracts metabolic information from biological databases via the Internet, and builds metabolic pathways from data sources of genes, sequences, enzymes, metabolites, etc. The system also provides a navigation platform to investigate metabolism-related information, and transforms the output data into XML files for further modeling and simulation of the reconstructed pathway. An alignment algorithm to compare the similarity between metabolic pathways is presented. A new definition of the metabolic pathway is proposed: defining a pathway as a linear event sequence is practical for our alignment algorithm. The algorithm is based on strip scoring the similarity of the 4-level hierarchical EC numbers involved in the pathways. The algorithm described has been implemented and is in current use in the context of the PathAligner system.
Furthermore, new methods for the classification and nomenclature of cellular signal transduction are recommended. For each type of characterized signal transduction, a unique ST number is provided. The Signal Transduction Classification Database (STCDB), based on the proposed classification and nomenclature, has been established. By merging the ST numbers with EC numbers, alignments of biopathways are possible. Finally, a detailed model of the urea cycle that includes gene regulatory networks, metabolic pathways and signal transduction is demonstrated using our approaches. A systems-biological interpretation of the observed behavior of the urea cycle and its related transcriptomics information is proposed to provide new insights for metabolic engineering and medical care.
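The abstract describes a pathway alignment algorithm that scores the similarity of four-level hierarchical EC numbers along a linear sequence of events. The sketch below is only an illustration of that general idea and not the PathAligner algorithm: it assumes that similarity is the fraction of leading EC levels two enzymes share, and it compares two equal-length pathways position by position without gaps. Both function names are hypothetical.

```python
def ec_similarity(ec_a, ec_b):
    """Fraction of leading EC levels shared by two EC numbers (assumed scoring rule)."""
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first mismatch or at an unspecified level ("-").
        if level_a != level_b or level_a == "-":
            break
        shared += 1
    return shared / 4


def pathway_similarity(path_a, path_b):
    """Average EC similarity of two equal-length linear pathways (ungapped, assumed)."""
    if not path_a or len(path_a) != len(path_b):
        raise ValueError("this sketch only compares non-empty pathways of equal length")
    scores = [ec_similarity(a, b) for a, b in zip(path_a, path_b)]
    return sum(scores) / len(scores)


# Two two-step pathways whose first enzymes differ only at the fourth EC level.
print(ec_similarity("2.7.1.1", "2.7.1.2"))
print(pathway_similarity(["2.7.1.1", "5.3.1.9"], ["2.7.1.2", "5.3.1.9"]))
```

A full alignment would also need to handle pathways of different lengths, for example with gap penalties, which this sketch deliberately leaves out.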

Publications at Bielefeld University