
    From data towards knowledge: Revealing the architecture of signaling systems by unifying knowledge mining and data mining of systematic perturbation data

    Genetic and pharmacological perturbation experiments, such as deleting a gene and monitoring gene expression responses, are powerful tools for studying cellular signal transduction pathways. However, it remains a challenge to automatically derive knowledge of a cellular signaling system at a conceptual level from systematic perturbation-response data. In this study, we explored a framework that unifies knowledge mining and data mining approaches towards this goal. The framework consists of the following automated processes: 1) applying an ontology-driven knowledge mining approach to identify functional modules among the genes responding to a perturbation, in order to reveal potential signals affected by the perturbation; 2) applying a graph-based data mining approach to search for perturbations that affect a common signal with respect to a functional module; and 3) revealing the architecture of a signaling system by organizing signaling units into a hierarchy based on their relationships. Applying this framework to a compendium of yeast perturbation-response data, we successfully recovered many well-known signal transduction pathways; in addition, our analysis has led to many hypotheses regarding the yeast signal transduction system; finally, our analysis automatically organized perturbed genes as a graph reflecting the architecture of the yeast signaling system. Importantly, this framework transformed molecular findings from a gene level to a conceptual level, which can readily be translated into computable knowledge in the form of rules regarding the yeast signaling system, such as "if genes involved in MAPK signaling are perturbed, genes involved in pheromone responses will be differentially expressed".

    Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery

    Motivation: Signaling pathways control a large variety of cellular processes. However, even within the same database, signaling pathways are currently often curated at different levels of detail. This makes comparative and cross-talk analyses difficult. Results: We present SignaLink, a database containing 8 major signaling pathways from Caenorhabditis elegans, Drosophila melanogaster, and humans. Based on 170 review and approx. 800 research articles, we have compiled pathways with semi-automatic searches and uniform, well-documented curation rules. We found that in humans any two of the 8 pathways can cross-talk. We quantified the possible tissue- and cancer-specific activity of cross-talks and found pathway-specific expression profiles. In addition, we identified 327 proteins relevant for drug target discovery. Conclusions: We provide a novel resource for comparative and cross-talk analyses of signaling pathways. The identified multi-pathway and tissue-specific cross-talks contribute to the understanding of signaling complexity in health and disease and underscore its importance in network-based drug target selection. Availability: http://SignaLink.org Comment: 9 pages, 4 figures, 2 tables, and supplementary information with 5 figures and 13 tables.

    In silico systems analysis of biopathways

    Chen M. In silico systems analysis of biopathways. Bielefeld (Germany): Bielefeld University; 2004. In the past decade, with the advent of high-throughput technologies, biology has migrated from a descriptive science to a predictive one. A vast amount of information on metabolism has been produced, and a number of specific genetic/metabolic databases and computational systems have been developed, which makes it possible for biologists to perform in silico analysis of metabolism. With experimental data from the laboratory, biologists wish to conduct their analysis systematically with an easy-to-use computational system. One major task is to implement molecular information systems that integrate different molecular database systems, and to design analysis tools (e.g. simulators of complex metabolic reactions). Three key problems are involved: 1) modeling and simulation of biological processes; 2) reconstruction of metabolic pathways, leading to predictions about the integrated function of the network; and 3) comparison of metabolism, providing an important way to reveal the functional relationship between a set of metabolic pathways. This dissertation addresses these problems of in silico systems analysis of biopathways. We developed a software system to integrate access to different databases, and exploited the Petri net methodology to model and simulate metabolic networks in cells. The dissertation develops a computer modeling and simulation technique based on Petri net methodology; investigates metabolic networks at a system level; proposes a markup language for biological data interchange among diverse biological simulators and Petri net tools; establishes a web-based information retrieval system for metabolic pathway prediction; presents an algorithm for metabolic pathway alignment; recommends a nomenclature of cellular signal transduction; and attempts to standardize the representation of biological pathways.
Hybrid Petri net methodology is exploited to model metabolic networks. A kinetic modeling strategy and a Petri net modeling algorithm are applied to model the functioning of network elements and to analyze the resulting model. The proposed methodology can be used for all other metabolic networks or for virtual cell metabolism. Moreover, perspectives of Petri net modeling and simulation of metabolic networks are outlined. A proposal for the Biology Petri Net Markup Language (BioPNML) is presented. The concepts and terminology of the interchange format, as well as its syntax (which is based on XML), are introduced. BioPNML is designed to provide a starting point for the development of a standard interchange format for bioinformatics and Petri nets. The language makes it possible to exchange biology Petri net diagrams between all supported hardware platforms and versions. It is also designed to link Petri net models with other known metabolic simulators. A web-based metabolic information retrieval system, PathAligner, is developed in order to predict metabolic pathways from rudimentary elements of pathways. It extracts metabolic information from biological databases via the Internet, and builds metabolic pathways from data sources of genes, sequences, enzymes, metabolites, etc. The system also provides a navigation platform to investigate metabolism-related information, and transforms the output data into XML files for further modeling and simulation of the reconstructed pathway. An alignment algorithm to compare the similarity between metabolic pathways is presented. A new definition of the metabolic pathway is proposed: the pathway defined as a linear event sequence is practical for our alignment algorithm. The algorithm is based on strip scoring of the similarity of the 4-level hierarchical EC numbers involved in the pathways. The algorithm described has been implemented and is in current use in the context of the PathAligner system.
Furthermore, new methods for the classification and nomenclature of cellular signal transduction are recommended. For each type of characterized signal transduction, a unique ST number is provided. The Signal Transduction Classification Database (STCDB), based on the proposed classification and nomenclature, has been established. By merging the ST numbers with EC numbers, alignments of biopathways become possible. Finally, a detailed model of the urea cycle that includes gene regulatory networks, metabolic pathways, and signal transduction is demonstrated using our approaches. A systems-biology interpretation of the observed behavior of the urea cycle and its related transcriptomics information is proposed to provide new insights for metabolic engineering and medical care.
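The idea of comparing linear pathways through the hierarchy of their EC numbers can be illustrated with a small sketch. This is not the PathAligner scoring scheme, which the abstract does not specify; it assumes a simple score (the fraction of leading EC levels two enzymes share) and substitutes a standard global alignment with a hypothetical gap penalty. The function names `ec_similarity` and `align_pathways` are illustrative only.

```python
def ec_similarity(ec_a: str, ec_b: str) -> float:
    """Fraction of the four EC levels shared as a common prefix (assumed scoring)."""
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first differing level or at an unspecified level ("-").
        if level_a != level_b or level_a == "-":
            break
        shared += 1
    return shared / 4.0


def align_pathways(path_a, path_b, gap=-0.5):
    """Global alignment score of two pathways given as lists of EC numbers.

    Standard dynamic programming (Needleman-Wunsch style); the gap penalty
    is a hypothetical value chosen for illustration.
    """
    n, m = len(path_a), len(path_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + ec_similarity(path_a[i - 1], path_b[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]
```

Under these assumptions, two enzymes that differ only in the last EC level score 0.75, and two identical pathways of length k score k. Treating a pathway as a linear sequence of events is what makes this kind of sequence-style alignment applicable.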

    Electronic data sources for kinetic models of cell signaling

    Functional understanding of signaling pathways requires detailed information about the constituent molecules and their interactions. Simulations of signaling pathways therefore build upon a great deal of data from various sources. We first survey electronic data resources for cell signaling modeling and then broadly classify them into five groups based on their type of data representation. None of the data sources surveyed provides all required data in a ready-to-be-modeled fashion. We then put forward a wish list of desired attributes for an ideal modeling-centric database. Finally, we close with perspectives on how electronic data sources for cell signaling modeling have developed. We suggest that future directions for such data sources are largely model-driven and hinge on the interoperability of data sources.

    BioSilicoSystems - A Multipronged Approach Towards Analysis and Representation of Biological Data (PhD Thesis)

    The rising field of integrative bioinformatics provides vital methods to integrate, manage, and analyze diverse data, allowing new and deeper insights and a clearer understanding of intricate biological systems. The difficulty is not only to facilitate the study of heterogeneous data within the biological context but also, more fundamentally, to represent the available knowledge and make it accessible. Moreover, adding valuable information and functions that lead the user to discover the interesting relations hidden within the data is, in itself, a great challenge. In addition, cumulative information can provide greater biological insight than is possible with individual information sources. Furthermore, the rapidly growing number of databases and data types poses the challenge of integrating heterogeneous data types, especially in biology. This rapid increase in the volume and number of data resources drives the need for polymorphic views of the same data, which often overlap across multiple resources.

In this thesis a multi-pronged approach is proposed that covers various methods for the analysis and representation of the diverse biological data present in different data sources. It is an effort to explain and emphasize the different concepts developed for the analysis of molecular data and to explain their biological significance. The hypotheses proposed are placed in the context of other previously published results and findings. The approach also demonstrates different ways to integrate molecular data from various sources, together with the need for a comprehensive understanding and a clear presentation of each concept or algorithm and its results, using simple means and methods. The multifarious approach proposed in this work comprises different tools or methods spanning significant areas of bioinformatics research, such as data integration, data visualization, biological network construction/reconstruction, and alignment of biological pathways. Each tool takes a unique approach to using molecular data for a different area of biological research and is built on the kernel of the thesis. Furthermore, these methods are combined with graphical representations that make things simple and comprehensible and help the user understand the underlying biological complexity with ease, since the human eye is accustomed to, and more comfortable with, visual representations of facts.

    Systematic reconstruction of TRANSPATH data into Cell System Markup Language

    Background: Many biological repositories store information based on experimental study of the biological processes within a cell, such as protein-protein interactions, metabolic pathways, signal transduction pathways, or regulation by transcription factors and miRNA. Unfortunately, it is difficult to use such information directly when generating simulation-based models. Thus, modeling rules for encoding biological knowledge into system-dynamics-oriented standardized formats would be very useful for fully understanding cellular dynamics at the system level. Results: We selected the TRANSPATH database, a manually curated high-quality pathway database, which provides a plentiful source of cellular events in humans, mice, and rats, collected from over 31,500 publications. In this work, we have developed 16 modeling rules based on hybrid functional Petri net with extension (HFPNe), which is suitable for graphically representing and simulating biological processes. In the modeling rules, each Petri net element is associated with the Cell System Ontology (CSO) to enable semantic interoperability of models. As a formal ontology for biological pathway modeling with dynamics, CSO also defines biological terminology and corresponding icons. By combining HFPNe with the CSO features, it is possible to convert TRANSPATH data into simulation-based and semantically valid models. The results are encoded in a biological pathway format, Cell System Markup Language (CSML), which eases the exchange and integration of biological data and models. Conclusion: Using the 16 modeling rules, 97% of the reactions in TRANSPATH are converted into simulation-based models represented in CSML. This reconstruction demonstrates that it is possible to use our rules to generate quantitative models from static pathway descriptions.
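To make the Petri net idea concrete, here is a minimal sketch of a plain discrete Petri net, the formalism that HFPNe extends. It is not HFPNe, CSML, or any of the 16 modeling rules, none of which are detailed in the abstract; the class `Transition`, the functions `enabled` and `fire`, and the ligand-receptor binding example are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """A reaction: consumes tokens from input places, produces tokens in output places."""
    name: str
    inputs: dict   # place name -> number of tokens consumed
    outputs: dict  # place name -> number of tokens produced


def enabled(marking: dict, transition: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in transition.inputs.items())


def fire(marking: dict, transition: Transition) -> dict:
    """Return the new marking after firing; the input marking is left unchanged."""
    if not enabled(marking, transition):
        raise ValueError(f"transition {transition.name!r} is not enabled")
    new_marking = dict(marking)
    for place, n in transition.inputs.items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition.outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Hypothetical example: one ligand binds one receptor to form a complex.
binding = Transition("binding", {"ligand": 1, "receptor": 1}, {"complex": 1})
start = {"ligand": 2, "receptor": 1, "complex": 0}
after = fire(start, binding)
```

In this toy example the binding transition can fire once, after which no free receptor remains and the transition is no longer enabled. Hybrid and functional extensions add continuous quantities and rate functions on top of this token-passing core.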

    Using graph theory to analyze biological networks

    Understanding complex systems often requires a bottom-up analysis that builds towards a systems biology approach. The need emerges to investigate a system not only as individual components but as a whole. This can be done by examining the elementary constituents individually and then how they are connected. The myriad components of a system and their interactions are best characterized as networks, which are mainly represented as graphs in which thousands of nodes are connected by thousands of edges. In this article we present approaches, models, and methods from graph theory and discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling, combined with knowledge extraction, will help us to better understand the biological significance of the system.
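As a small illustration of the kind of graph measures such network profiling relies on, the sketch below builds an undirected graph from an edge list and computes node degrees and shortest path lengths. The function names and the example edges are hypothetical and are not taken from the article.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def degrees(graph):
    """Number of neighbours of each node; high-degree nodes are candidate hubs."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the edge count of a shortest path, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical interaction network with five nodes.
network = build_graph([("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")])
```

Here node B has the highest degree, and the shortest path from A to E passes through B and D. Degree distributions and path lengths of this kind are typical starting points for characterizing a biological network.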

    ClockOME: searching for oscillatory genes in early vertebrate development

    Embryo development is a dynamic process regulated in space and time. Cells must integrate biochemical and mechanical signals to generate fully functional organisms, and oscillatory gene expression plays a key role in this process. The embryo molecular clock (EMC) is the best-known genetic oscillator active in embryo segmentation, involving genes from the Notch, FGF, and WNT pathways. However, the list of cyclic genes is still incomplete, mostly due to the challenges involved in studying periodic systems. Recently, such studies have become more feasible with the development of pseudo-time ordering algorithms that search for candidate oscillatory genes using large transcriptomics datasets sampled without explicit time measurements. This study aims at finding candidate oscillatory genes - ClockOME - active in early chick embryo development. Two Gallus gallus microarray transcriptomics datasets from presomitic mesoderm (PSM) and one dataset from limb segmentation were gathered from GEO and ArrayExpress. To normalize these data from different experiments, an RData package - FrozenChicken - was developed to apply a frozen Robust MultiArray (fRMA) normalization to the data. Next, the datasets were processed with Oscope (a pseudo-time ordering algorithm) to search for candidate periodic genes clustered by similar oscillatory behaviour. The clusters of predicted oscillators were then subjected to functional enrichment and interaction network analyses to highlight the biological functions associated with these genes. Oscope predicted three clusters of oscillators: two in PSM (106 and 32 genes) and one in Limb (162 genes). Overall, the genes are associated with regulatory, morphological, and developmental processes. Mesp2, a gene involved in the EMC, was found in this dataset, validating the approach. However, the majority of genes are novel oscillatory candidates, associated with chromatin and transcriptional regulation, as well as protein and oxygen metabolism.
The list of candidate oscillators represents a valuable resource for guided experimental validation to discover additional members of the chick EMC. Six genes have been proposed for high-priority experimental validation: SRC, PTCH1, NOTCH2, YAP1, KDR, CTR9.
Embryonic development is a dynamic process involving molecular changes in space and time. Embryonic cells are constantly exposed to biochemical and mechanical stimuli, and respond to their environment by altering their genetic program. When correctly integrated, these cellular responses culminate in the successful development of a functional organism. Embryogenesis therefore involves strictly regulated molecular processes, and oscillatory gene expression is one of the possible ways of regulating cell behaviour over time. The embryonic molecular clock is a well-known genetic oscillator involved in the segmentation of the embryonic paraxial tissue. The concept of a molecular clock was first proposed in 1976 by Cooke and Zeeman, who called it the Clock and Wavefront model [1]. This model was conceived to describe theoretically the rhythmic formation of somites on both sides of the paraxial mesoderm (PSM) in vertebrates, and it is based on the existence of genetic oscillators that regulate this PSM segmentation process over time. Besides the clock, as the name indicates, the model includes a wavefront that spatially determines the behaviour of the cells in the presomitic mesoderm (PSM). Together, the two mechanisms guide the differentiation of PSM cells, which consequently undergo genetic changes that precede somite formation. The basis of this molecular clock is the periodic expression of genes belonging to the Notch, FGF, and WNT molecular pathways.
However, the list of genes involved in the embryonic clock is still incomplete, mainly because of the experimental difficulties of studying periodic systems when the periodicity or rhythm of expression of the genes involved is not known in advance. The advent of new transcriptomics techniques that allow the expression values of all genes to be studied simultaneously, namely microarrays or, more recently, sequencing methods such as RNA-sequencing or single-cell RNA-sequencing, offers an opportunity to extend the list of genes with oscillatory expression. However, these methods require extracting RNA from the sampled cells, which results in cell death. This processing makes it impossible to study the same cells over time and yields static molecular data; that is, the expression levels obtained represent a single time sample. To study periodic processes, it would therefore be necessary to build a time series by sampling different individuals throughout development, greatly increasing the number of biological samples needed to resolve the oscillation cycle of each gene studied. Thus, without explicitly measured temporal information, oscillatory gene expression can only be studied using appropriate mathematical models, namely by applying pseudo-time ordering algorithms. These methods order the samples along the time course of one oscillation so as to obtain the pattern of cyclic behaviour for all genes whose expression oscillates concomitantly. It thus becomes possible to infer bioinformatically the oscillatory potential of genes measured with these transcriptomics techniques, without explicit temporal information.
The aim of this study is therefore to find new oscillatory genes, which we collectively call ClockOME, that are active during the first stages of chick embryonic development (somitogenesis) in the presomitic mesoderm (PSM) and in the upper limb (Limb), tissues in which the molecular clock has been described as acting as a temporal regulator of the underlying genetic changes. To this end, three microarray transcriptomics datasets were collected from two public data repositories: GEO (from the American institution NCBI) and ArrayExpress (from the European institution EMBL-EBI). Two datasets contained data from paraxial mesoderm (PSM), the tissue where somitogenesis occurs, and one dataset contained data obtained from the upper limb of the chick embryo. To normalize the three datasets and make them comparable (since they originate from different experimental processes), an R package named "FrozenChicken: Promoting the meta-analysis of chicken microarray data" was developed (published in 2021) (https://doi.org/10.1101/2021.02.25.432894). This package contains summarized data from 472 chick embryo microarray datasets, making frozen Robust MultiArray (fRMA) normalization of Gallus gallus microarrays possible. After normalization and quality control of the gene expression values, the PSM and limb data were processed with Oscope (a pseudo-time ordering algorithm) in order to predict oscillatory genes. This algorithm evaluates all combinations of gene pairs and groups together those with similar expression patterns, that is, those whose expression values follow similar trajectories across the samples, suggesting a potentially similar oscillation period.
The gene clusters predicted by Oscope were subsequently subjected to a functional enrichment analysis and a functional interaction analysis, in order to understand their potential biological role and underlying molecular functions. Oscope reported three lists of potentially oscillatory genes: two groups were found in the PSM data (with 106 and 32 genes) and a third group of 162 genes was found in the upper limb data. In total, the gene list that we call ClockOME comprises 296 potentially oscillatory genes involved in several regulatory mechanisms important for embryonic development and morphogenesis. Most of the genes in this list are not described in the literature as oscillatory (novel candidates) and therefore represent an asset for the scientific community studying the embryonic molecular clock. These genes appear to be associated with functions such as chromatin remodelling, regulation of transcription, protein metabolism, and oxygen metabolism, and are therefore good candidates for future experimental validation. Notably, Oscope successfully identified Mesp2, an oscillatory gene well described in the literature, showing the validity and potential of this theoretical approach. In summary, this work produced a list of 296 potentially oscillatory genes. Based on their novelty and annotated molecular function, a list of six candidate genes of particular relevance for experimental validation in the near future was proposed, namely: SRC, PTCH1, NOTCH2, YAP1, KDR, CTR9. The lists resulting from this thesis can now guide future laboratory experiments capable of adding new molecular interactors to the current model of the embryonic molecular clock.