From data towards knowledge: Revealing the architecture of signaling systems by unifying knowledge mining and data mining of systematic perturbation data
Genetic and pharmacological perturbation experiments, such as deleting a gene
and monitoring gene expression responses, are powerful tools for studying
cellular signal transduction pathways. However, it remains a challenge to
automatically derive knowledge of a cellular signaling system at a conceptual
level from systematic perturbation-response data. In this study, we explored a
framework that unifies knowledge mining and data mining approaches towards the
goal. The framework consists of the following automated processes: 1) applying
an ontology-driven knowledge mining approach to identify functional modules
among the genes responding to a perturbation in order to reveal potential
signals affected by the perturbation; 2) applying a graph-based data mining
approach to search for perturbations that affect a common signal with respect
to a functional module; and 3) revealing the architecture of a signaling system
by organizing signaling units into a hierarchy based on their relationships.
Applying this framework to a compendium of yeast perturbation-response data, we
have successfully recovered many well-known signal transduction pathways; in
addition, our analysis has led to many hypotheses regarding the yeast signal
transduction system; finally, our analysis automatically organized perturbed
genes as a graph reflecting the architecture of the yeast signaling system.
Importantly, this framework transformed molecular findings from a gene level to
a conceptual level, which readily can be translated into computable knowledge
in the form of rules regarding the yeast signaling system, such as "if genes
involved in MAPK signaling are perturbed, genes involved in pheromone responses
will be differentially expressed."
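The rule quoted in the abstract above can be read as a small computable structure. The following is a minimal sketch of one possible encoding, not the authors' implementation: the `Rule` class, the `predicted_responders` helper, and the gene names and module annotations are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """If genes in `perturbed_module` are perturbed, genes in
    `responding_module` are expected to be differentially expressed."""
    perturbed_module: str
    responding_module: str


def predicted_responders(perturbed_gene, annotations, rules):
    """Return the genes a rule set predicts will respond to a perturbation."""
    modules = annotations.get(perturbed_gene, set())
    # Modules expected to respond, given the modules of the perturbed gene.
    hit = {r.responding_module for r in rules if r.perturbed_module in modules}
    return sorted(
        gene for gene, mods in annotations.items()
        if gene != perturbed_gene and mods & hit
    )


# Hypothetical gene-to-module annotations (placeholders, not real data).
annotations = {
    "gene_A": {"MAPK signaling"},
    "gene_B": {"MAPK signaling"},
    "gene_C": {"pheromone response"},
    "gene_D": {"pheromone response"},
    "gene_E": {"cytoskeleton"},
}
rules = [Rule("MAPK signaling", "pheromone response")]

print(predicted_responders("gene_A", annotations, rules))
```

Keeping the rule at the module level rather than the gene level mirrors the abstract's point that findings are lifted from individual genes to concepts.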
Uniformly curated signaling pathways reveal tissue-specific cross-talks and support drug target discovery
Motivation: Signaling pathways control a large variety of cellular processes.
However, currently, even within the same database signaling pathways are often
curated at different levels of detail. This makes comparative and cross-talk
analyses difficult. Results: We present SignaLink, a database containing 8
major signaling pathways from Caenorhabditis elegans, Drosophila melanogaster,
and humans. Based on 170 review and approx. 800 research articles, we have
compiled pathways with semi-automatic searches and uniform, well-documented
curation rules. We found that in humans any two of the 8 pathways can
cross-talk. We quantified the possible tissue- and cancer-specific activity of
cross-talks and found pathway-specific expression profiles. In addition, we
identified 327 proteins relevant for drug target discovery. Conclusions: We
provide a novel resource for comparative and cross-talk analyses of signaling
pathways. The identified multi-pathway and tissue-specific cross-talks
contribute to the understanding of the signaling complexity in health and
disease and underscore its importance in network-based drug target selection.
Availability: http://SignaLink.org
Comment: 9 pages, 4 figures, 2 tables, and supplementary information with 5 figures and 13 tables
In-silico-Systemanalyse von Biopathways
Chen M. In silico systems analysis of biopathways. Bielefeld (Germany): Bielefeld University; 2004.
In the past decade, with the advent of high-throughput technologies, biology has migrated from a descriptive science to a predictive one. A vast amount of information on metabolism has been produced, and a number of specific genetic and metabolic databases and computational systems have been developed, which make it possible for biologists to perform in silico analysis of metabolism. With experimental data from the laboratory, biologists wish to conduct their analysis systematically with an easy-to-use computational system. One major task is to implement molecular information systems that integrate different molecular database systems, and to design analysis tools (e.g. simulators of complex metabolic reactions). Three key problems are involved: 1) modeling and simulation of biological processes; 2) reconstruction of metabolic pathways, leading to predictions about the integrated function of the network; and 3) comparison of metabolism, providing an important way to reveal the functional relationship between a set of metabolic pathways.
This dissertation addresses these problems of in silico systems analysis of biopathways. We developed a software system that integrates access to different databases, and exploited the Petri net methodology to model and simulate metabolic networks in cells. The dissertation develops a computer modeling and simulation technique based on Petri net methodology; investigates metabolic networks at a system level; proposes a markup language for biological data interchange among diverse biological simulators and Petri net tools; establishes a web-based information retrieval system for metabolic pathway prediction; presents an algorithm for metabolic pathway alignment; recommends a nomenclature of cellular signal transduction; and attempts to standardize the representation of biological pathways.
Hybrid Petri net methodology is exploited to model metabolic networks. A kinetic modeling strategy and a Petri net modeling algorithm are applied to describe the functioning of the elements and to analyze the model. The proposed methodology can be used for all other metabolic networks or for virtual cell metabolism. Moreover, perspectives of Petri net modeling and simulation of metabolic networks are outlined.
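To make the Petri net idea concrete, here is a minimal sketch of token firing in an ordinary discrete place/transition net. It is a deliberate simplification and an assumption on our part: the hybrid nets described in the dissertation also include continuous places and kinetic rates, which are not modeled here. The place names and the `enabled` and `fire` helpers are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["consume"].items())


def fire(marking, transition):
    """Return the new marking after firing; the input marking is not modified."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["consume"].items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition["produce"].items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Toy reaction: substrate + cofactor -> product (placeholder names).
reaction = {
    "consume": {"substrate": 1, "cofactor": 1},
    "produce": {"product": 1},
}
marking = {"substrate": 2, "cofactor": 1, "product": 0}

after = fire(marking, reaction)
print(after)
print(enabled(after, reaction))
```

After one firing the cofactor place is empty, so the reaction can no longer fire even though substrate remains. Capturing this kind of resource dependency is what a place/transition net is designed for.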
A proposal for the Biology Petri Net Markup Language (BioPNML) is presented. The concepts and terminology of the interchange format, as well as its syntax (which is based on XML) are introduced. BioPNML is designed to provide a starting point for the development of a standard interchange format for Bioinformatics and Petri nets. The language makes it possible to exchange biology Petri net diagrams between all supported hardware platforms and versions. It is also designed to associate Petri net models and other known metabolic simulators.
A web-based metabolic information retrieval system, PathAligner, is developed in order to predict metabolic pathways from rudimentary elements of pathways. It extracts metabolic information from biological databases via the Internet, and builds metabolic pathways with data sources of genes, sequences, enzymes, metabolites, etc. The system also provides a navigation platform to investigate metabolic related information, and transforms the output data into XML files for further modeling and simulation of the reconstructed pathway.
An alignment algorithm to compare the similarity between metabolic pathways is presented. A new definition of the metabolic pathway is proposed: defining the pathway as a linear event sequence is practical for our alignment algorithm. The algorithm is based on strip scoring the similarity of the four-level hierarchical EC numbers involved in the pathways. The algorithm described has been implemented and is in current use in the context of the PathAligner system.
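As an illustration of scoring similarity over four-level EC numbers, the sketch below counts how many leading levels two EC numbers share and averages that score over two equal-length linear pathways. This is an assumed, simplified scheme for illustration only; it is not the scoring used by PathAligner, and it performs no gapped alignment. The function names are hypothetical.

```python
def ec_similarity(ec_a, ec_b):
    """Fraction of leading EC levels (out of four) shared by two EC numbers."""
    levels_a = ec_a.split(".")
    levels_b = ec_b.split(".")
    shared = 0
    for a, b in zip(levels_a, levels_b):
        if a != b:
            break  # the hierarchy only matches while the prefix matches
        shared += 1
    return shared / 4


def pathway_score(path_a, path_b):
    """Mean position-wise EC similarity of two equal-length linear pathways."""
    if len(path_a) != len(path_b):
        raise ValueError("this sketch only compares pathways of equal length")
    pairs = list(zip(path_a, path_b))
    return sum(ec_similarity(a, b) for a, b in pairs) / len(pairs)


print(ec_similarity("2.7.1.1", "2.7.1.2"))
print(ec_similarity("2.7.1.1", "5.3.1.9"))
print(pathway_score(["2.7.1.1", "5.3.1.9"], ["2.7.1.2", "5.3.1.9"]))
```

Comparing prefixes rather than whole strings means that two enzymes in the same sub-subclass score higher than two enzymes that only share a top-level class.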
Furthermore, new methods for the classification and nomenclature of cellular signal transductions are recommended. For each type of characterized signal transduction, a unique ST number is provided. The Signal Transduction Classification Database (STCDB), based on the proposed classification and nomenclature, has been established. By merging the ST numbers with EC numbers, alignments of biopathways are possible.
Finally, a detailed model of the urea cycle that includes gene regulatory networks, metabolic pathways and signal transduction is demonstrated using our approaches. A systems-biology interpretation of the observed behavior of the urea cycle and its related transcriptomics information is proposed to provide new insights for metabolic engineering and medical care.
Electronic data sources for kinetic models of cell signaling
Functional understanding of signaling pathways requires detailed information about the constituent molecules and their interactions. Simulations of signaling pathways therefore build upon a great deal of data from various sources. We first survey electronic data resources for cell signaling modeling and then, based on the type of data representation, broadly classify the data sources into five groups. None of the data sources surveyed provides all required data in a ready-to-be-modeled fashion. We then put forward a wish list of desired attributes for an ideal modeling-centric database. Finally, we close with perspectives on how electronic data sources for cell signaling modeling have developed. We suggest that future directions in such data sources are largely model-driven and hinge on the interoperability of data sources.
BioSilicoSystems - A Multipronged Approach Towards Analysis and Representation of Biological Data (PhD Thesis)
The rising field of integrative bioinformatics provides vital methods to integrate, manage, and analyze diverse data, and allows new and deeper insights into, and a clearer understanding of, intricate biological systems. The difficulty is not only to facilitate the study of heterogeneous data within the biological context but also, more fundamentally, to represent the available knowledge and make it accessible. Moreover, adding valuable information and functions that encourage the user to discover interesting relations hidden within the data is, in itself, a great challenge. Also, the cumulative information can provide greater biological insight than is possible with individual information sources. Furthermore, the rapidly growing number of databases and data types poses the challenge of integrating heterogeneous data types, especially in biology. This rapid increase in the volume and number of data resources drives the need for polymorphic views of the same data, which often overlap across multiple resources.

In this thesis, a multi-pronged approach is proposed that deals with various methods for the analysis and representation of the diverse biological data present in different data sources. It is an effort to explain and emphasize the different concepts developed for the analysis of molecular data and to explain their biological significance. The hypotheses proposed are placed in the context of other results and findings published in the past. The approach also demonstrates different ways to integrate molecular data from various sources, along with the need for a comprehensive understanding and clear presentation of each concept or algorithm and its results, using simple means and methods. The multifarious approach proposed in this work comprises different tools and methods spanning significant areas of bioinformatics research, such as data integration, data visualization, biological network construction and reconstruction, and alignment of biological pathways. Each tool takes a unique approach to using molecular data in a different area of biological research and is built on the kernel of the thesis. Furthermore, these methods are combined with graphical representations that keep things simple and comprehensible and make the underlying biological complexity easier to understand. Moreover, the human eye is accustomed to, and more comfortable with, visual representation of facts.
Systematic reconstruction of TRANSPATH data into Cell System Markup Language
Background: Many biological repositories store information based on experimental study of the biological processes within a cell, such as protein-protein interactions, metabolic pathways, signal transduction pathways, or regulation of transcription factors and miRNA. Unfortunately, it is difficult to use such information directly when generating simulation-based models. Thus, modeling rules for encoding biological knowledge into system-dynamics-oriented standardized formats would be very useful for fully understanding cellular dynamics at the system level. Results: We selected the TRANSPATH database, a manually curated high-quality pathway database, which provides a plentiful source of cellular events in humans, mice, and rats, collected from over 31,500 publications. In this work, we have developed 16 modeling rules based on the hybrid functional Petri net with extension (HFPNe), which is suitable for graphically representing and simulating biological processes. In the modeling rules, each Petri net element is annotated with the Cell System Ontology (CSO) to enable semantic interoperability of models. As a formal ontology for biological pathway modeling with dynamics, CSO also defines biological terminology and corresponding icons. By combining HFPNe with the CSO features, it is possible to convert TRANSPATH data into simulation-based and semantically valid models. The results are encoded in a biological pathway format, Cell System Markup Language (CSML), which eases the exchange and integration of biological data and models. Conclusion: By using the 16 modeling rules, 97% of the reactions in TRANSPATH are converted into simulation-based models represented in CSML. This reconstruction demonstrates that it is possible to use our rules to generate quantitative models from static pathway descriptions.
Using graph theory to analyze biological networks
Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need emerges to investigate a system not only as individual components but as a whole. This can be done by examining the elementary constituents individually and then how they are connected. The myriad components of a system and their interactions are best characterized as networks, which are mainly represented as graphs in which thousands of nodes are connected by thousands of edges. In this article we demonstrate approaches, models and methods from the graph theory universe and discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling, combined with knowledge extraction, will help us to better understand the biological significance of the system.
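A small example may help show what representing a network as a graph looks like in practice. The sketch below builds an undirected adjacency list and computes node degree, one of the simplest network properties. The node names and helper functions are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph as a mapping from node to set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degrees(graph):
    """Degree of a node is the number of distinct neighbours it has."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


# Toy interaction network (placeholder node names).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
graph = build_graph(edges)
deg = degrees(graph)
hub = max(deg, key=deg.get)  # the most highly connected node

print(deg)
print(hub)
```

Highly connected nodes such as `A` here are often called hubs. The degree distribution is a common first step when profiling a network.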
ClockOME: searching for oscillatory genes in early vertebrate development
Embryo development is a dynamic process regulated in space and time. Cells must
integrate biochemical and mechanical signals to generate fully functional organisms, where
oscillatory gene expression plays a key role. The embryo molecular clock (EMC) is the best
known genetic oscillator active in embryo segmentation, involving genes from the Notch, FGF,
and WNT pathways. However, the list of cyclic genes is still incomplete mostly due to the
challenges involved with studying periodic systems. Recently, such studies have become more
feasible with the development of pseudo-time ordering algorithms that search for candidate
oscillatory genes using large transcriptomics datasets sampled without explicit time
measurements.
This study aims at finding candidate oscillatory genes - ClockOME - active in early
chick embryo development.
Two Gallus gallus microarray transcriptomics datasets from Presomitic mesoderm
(PSM), and one dataset from limb segmentation were gathered from GEO and ArrayExpress.
To normalize these data from different experiments, an RData package - FrozenChicken - was
developed to apply a frozen Robust MultiArray (fRMA) normalization to the data. Next the
datasets were processed with Oscope (a pseudo-time ordering algorithm) to search for candidate
periodic genes clustered by similar oscillatory behaviour. The clusters of predicted oscillators
were then subject to functional enrichment and interaction network analyses to highlight the
biological functions associated with these genes. Oscope predicted three clusters of oscillators:
two in PSM (106 and 32 genes), and one in Limb (162 genes). Overall, the genes are associated
with regulatory, morphological, and developmental processes. Mesp2, a gene involved with the
EMC, was found in this dataset, validating the approach; however, the majority of genes are
novel oscillatory candidates, associated with chromatin and transcriptional regulation, as well
as protein and oxygen metabolism. The list of candidate oscillators represents a valuable
resource for guided experimental validation to discover additional members of the chick EMC.
Six genes have been proposed for high-priority experimental validation: SRC, PTCH1,
NOTCH2, YAP1, KDR, CTR9.
Embryonic development is a dynamic process that involves molecular changes in space and time.
Embryonic cells are constantly exposed to biochemical and mechanical stimuli, and they respond
to their environment by altering their genetic program. When correctly integrated, these cellular
responses culminate in the successful development of a functional organism. Embryogenesis
therefore involves tightly regulated molecular processes, and oscillatory gene expression is
one of the possible ways of regulating cell behaviour over time. The embryo molecular clock is
a well-known genetic oscillator involved in the segmentation of the embryonic paraxial tissue.
The concept of a molecular clock was first proposed in 1976 by Cooke and Zeeman, who called it
the Clock and Wavefront model. This model was conceived to describe theoretically the rhythmic
formation of somites on both sides of the paraxial mesoderm (PSM) in vertebrates, and it is
based on the existence of genetic oscillators that regulate this process of PSM segmentation
over time. In addition to the clock, as the name indicates, the model includes a wavefront,
which spatially determines the behaviour of the cells in the presomitic mesoderm (PSM).
Together, the two mechanisms guide the differentiation of PSM cells, which consequently undergo
genetic changes that precede somite formation. The basis of this molecular clock is the
periodic expression of genes belonging to the Notch, FGF, and WNT molecular pathways. However,
the list of genes involved in the embryonic clock is still incomplete, mainly because of the
experimental difficulties of studying periodic systems when the periodicity/rhythm of
expression of the genes involved is not known in advance.
With the advent of new transcriptomics techniques that allow the expression values of all genes
to be studied simultaneously, namely using microarrays or, more recently, sequencing methods
such as RNA-sequencing or single-cell RNA-sequencing, the opportunity arises to extend the list
of genes with oscillatory expression. However, these methods require extracting RNA from the
sampled cells, which results in cell death. This processing therefore makes it impossible to
study the same cells over time and yields static molecular data; that is, the expression levels
obtained represent a single time sample. To study periodic processes, it would then be
necessary to build a time series by sampling different individuals throughout development,
greatly increasing the number of biological samples needed to resolve the oscillation cycle of
each gene studied.
Thus, without explicitly measured temporal information, oscillatory gene expression can only be
studied using appropriate mathematical models, namely through the application of pseudo-time
ordering algorithms. These methods order the samples along the time course of one oscillation
so as to obtain the pattern of cyclic behaviour for all genes whose expression oscillates
concomitantly. It thus becomes possible to infer bioinformatically the oscillatory potential of
genes measured by these transcriptomics techniques, without explicit temporal information.
The aim of this study is therefore to find new oscillatory genes, which we collectively call
the ClockOME, that are active during the first stages of chick embryonic development
(somitogenesis), in the tissues of the presomitic mesoderm (PSM) and in the upper limb (Limb);
these are tissues in which the molecular clock has been described, acting as a temporal
regulator of the underlying genetic changes.
To this end, three microarray transcriptomics datasets were collected from two public data
repositories: GEO (from the American institution NCBI) and ArrayExpress (from the European
institution EMBL-EBI). Two datasets contained data from the paraxial mesoderm (PSM), the tissue
in which somitogenesis occurs, and one dataset contained data obtained from the upper limb of
the chick embryo. To normalize the three datasets and make them comparable (since they come
from different experimental procedures), an R package named "FrozenChicken: Promoting the
meta-analysis of chicken microarray data" (published in 2021)
(https://doi.org/10.1101/2021.02.25.432894) was developed. This package contains summarized
data from 472 chick embryo microarray datasets, making fRMA (frozen Robust MultiArray)
normalization of Gallus gallus microarrays possible. After normalization and quality control of
the gene expression values, the PSM and limb data were processed with Oscope (a pseudo-time
ordering algorithm) in order to predict oscillatory genes. This algorithm evaluates all
pairwise combinations of genes, grouping those with similar expression patterns, that is, those
whose expression values follow similar trajectories across the samples, suggesting a
potentially similar oscillation period. The gene clusters predicted by Oscope were then
subjected to functional enrichment analysis and functional interaction analysis in order to
understand their potential biological role and underlying molecular functions.
Oscope reported three lists of potentially oscillatory genes: two groups were found in the PSM
data (with 106 and 32 genes, respectively) and the third group, of 162 genes, was found in the
upper limb data. In total, the gene list that we call the ClockOME comprises 296 potentially
oscillatory genes involved in several regulatory mechanisms that are important for embryonic
development and morphogenesis. Most of the genes on this list are not described in the
literature as oscillatory (novel candidates) and therefore represent an asset for the
scientific community studying the embryo molecular clock. These genes appear to be associated
with functions such as chromatin remodelling, transcriptional regulation, protein metabolism,
and oxygen metabolism, and are therefore good candidates for future experimental validation.
Notably, Oscope successfully identified Mesp2, an oscillatory gene well described in the
literature, thereby demonstrating the validity and potential of this theoretical approach.
In summary, this work produced a list of 296 potentially oscillatory genes. Based on their
novelty and annotated molecular function, a list of six candidate genes of particular relevance
for experimental validation in the near future was proposed, namely SRC, PTCH1, NOTCH2, YAP1,
KDR, and CTR9. The lists resulting from this thesis can now guide future laboratory experiments
capable of adding new molecular interactors to the current model of the embryo molecular clock.
- …