78 research outputs found

    Discovering time-lagged rules from microarray data using gene profile classifiers

    Background: Gene regulatory networks have an essential role in every process of life. In this regard, genome-wide time series data are becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results: This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. To this end, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes show that GRNCOP2 outperforms the compared methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experiments demonstrated the soundness and scalability of the new method, which inferred highly related, statistically significant gene associations. Conclusions: A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results demonstrate that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation. 
© 2011 Gallo et al; licensee BioMed Central Ltd. Fil: Gallo, Cristian Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Fil: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Fil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentin

    Microarray Data Mining and Gene Regulatory Network Analysis

    Microarray, a novel molecular biology technology, makes it feasible to simultaneously obtain quantitative expression measurements for thousands of genes present in a biological sample. Genome-wide expression data generated with this technology promise to uncover implicit, previously unknown biological knowledge. In this study, several problems in microarray data mining were investigated, including feature (gene) selection, classifier gene identification, generation of reference genetic interaction networks for non-model organisms, and gene regulatory network (GRN) reconstruction using time-series gene expression data. The main limitation of most existing computational models employed to infer gene regulatory networks is that they suffer from either low accuracy or high computational complexity. To overcome these limitations, the following strategies were proposed to integrate bioinformatics data mining techniques with existing GRN inference algorithms, enabling the discovery of novel biological knowledge. An integrated statistical and machine learning (ISML) pipeline was developed for feature selection and classifier gene identification to address the curse of dimensionality as well as the huge search space. Using the selected classifier genes as seeds, a scale-up technique is applied to search through major databases of genetic interaction networks, metabolic pathways, etc. By curating relevant genes and blasting genomic sequences of non-model organisms against well-studied genetic model organisms, a reference gene regulatory network for less-studied organisms was built and used both as prior knowledge and for model validation in GRN reconstructions. Networks of gene interactions were inferred using a Dynamic Bayesian Network (DBN) approach and were analyzed to elucidate the dynamics caused by perturbations. 
Our proposed pipelines were applied to investigate molecular mechanisms for chemical-induced reversible neurotoxicity

    GeRNet: A Gene Regulatory Network Tool

    Gene regulatory networks (GRNs) are crucial in every process of life since they govern the majority of the molecular processes. Therefore, the task of assembling these networks is highly important. In particular, the so-called model-free approaches have an advantage in modeling the complexities of dynamic molecular networks, since most gene networks are hard to map accurately with any other mathematical model. A highly abstract model-free approach, called the rule-based approach, offers several advantages for data-driven analysis, such as requiring the least amount of data. Rule-based methods also have an important ability to perform inferences: their simplicity allows the inference of large models with a higher speed of analysis. However, with these techniques the reconstruction of the relational structure of the network is partial, and hence incomplete for an effective biological analysis. This situation motivated us to explore the possibility of hybridizing with other approaches, such as biclustering techniques. This led us to incorporate a biclustering tool that finds new relations between the nodes of the GRN. In this work we present a new software tool, called GeRNet, that integrates the GRNCOP2 and BiHEA algorithms along with a set of tools for interactive visualization, statistical analysis and ontological enrichment of the resulting GRNs. In this regard, results associated with Alzheimer disease datasets are presented that show the usefulness of integrating both bioinformatics tools. Fil: Dussaut, Julieta Sol. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Fil: Gallo, Cristian Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. 
Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Fil: Cravero, Fiorella. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Fil: Martínez, María Jimena. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Fil: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Fil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentin

    Associating expression and genomic data using co-occurrence measures

    Recent technological advances have led to an exponential increase in data in all the omics fields. It is expected that integrating these different data sources will drastically enhance our knowledge of the biological mechanisms behind genomic diseases such as cancer. However, the integration of different omics data remains a challenge. In this work we propose an intuitive workflow for the integrative analysis of expression, mutation and copy number data taken from the METABRIC study on breast cancer. First, we present evidence that the expression profile of many important breast cancer genes consists of two modes or 'regimes', which contain important clinical information. Then, we show how the co-occurrence of these expression regimes can be used as an association measure between genes and validate our findings on the TCGA-BRCA study. Finally, we demonstrate how these co-occurrence measures can also be applied to link expression regimes to genomic aberrations, providing a more complete, integrative view on breast cancer. As a case study, an integrative analysis of the identified MLPH-FOXA1 association is performed, illustrating that the obtained expression associations are intimately linked to the underlying genomic changes
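
To make the idea of regime co-occurrence concrete, here is a minimal sketch in Python. It is not the method from the paper: the abstract does not say how regimes are assigned or which co-occurrence measure is used, so this sketch assumes a simple median split for the two regimes and a plain agreement fraction as the association score. The function names and the toy expression values are hypothetical.

```python
from statistics import median


def regimes(values, threshold=None):
    """Label each sample 1 (high regime) or 0 (low regime).

    Assumption: a median split stands in for whatever regime
    assignment the original study uses.
    """
    cut = median(values) if threshold is None else threshold
    return [1 if v > cut else 0 for v in values]


def co_occurrence(labels_a, labels_b):
    """Fraction of samples in which both genes sit in the same regime."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both genes need one label per sample")
    if not labels_a:
        raise ValueError("at least one sample is required")
    same = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return same / len(labels_a)


# Toy expression values for two hypothetical genes across six samples.
gene_a = [0.1, 0.2, 2.9, 3.1, 0.3, 3.0]
gene_b = [1.0, 1.1, 5.2, 4.9, 5.0, 5.1]

score = co_occurrence(regimes(gene_a), regimes(gene_b))
print(f"co-occurrence score: {score:.3f}")
```

The same score could in principle be computed between an expression regime and a binary genomic event (for example, mutated or not), which is the kind of link the abstract describes, although the actual statistic used there may differ.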

    Extracting transcriptional regulatory information from DNA microarray expression data

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2003. Includes bibliographical references. (cont.) As a model system, we have chosen the unicellular, photoautotrophic cyanobacterium Synechocystis sp. PCC6803 for study, as it 1) is fully sequenced, 2) has an easily manipulated input signal (light for photosynthesis), and 3) fixes carbon dioxide into the commercially interesting, biodegradable polymer polyhydroxyalkanoate (PHA). We have created DNA microarrays with approximately 97% of the Synechocystis genome represented in duplicate to monitor the cellular transcriptional profile. These arrays are used in time-series experiments at differing light levels to measure the dynamic transcriptional response to changing environmental conditions. We have developed networks of potential genetic regulatory interactions through time-series analysis based on the data from our studies. An algorithm combining gene position information, clustering, and time-lagged correlations has been created to generate networks of hypothetical biological links. Analysis of these networks indicates that good correlation exists between the input signal and certain groups of photosynthesis- and metabolism-related genes. Furthermore, this analysis technique placed these in a temporal context, showing the sequence of potential effects from changes in the experimental conditions. These data and hypothetical interaction networks have been used to construct AutoRegressive with eXogenous input (ARX) models. These provide dynamic, state-space models for prediction of transcriptional profiles given a dynamically changing set of environmental perturbations... Recent technological developments allow all the genes of a species to be monitored simultaneously at the transcriptional level. This necessitates a more global approach to biology that includes consideration of complex interactions between many genes and other intracellular species. 
The metaphor of a cell as a miniature chemical plant with inputs, outputs, and controls gives chemical engineers a foothold in this type of analysis. Networks of interacting genes are fertile ground for the application of the methods developed by engineers for the analysis and monitoring of industrial chemical processes. The DNA microarray has been established as a tool for efficient collection of mRNA expression data for a large number of genes simultaneously. Although great strides have been made in the methodology and instrumentation of this technique, the development of the computational tools needed to interpret the results has received relatively inadequate attention. Existing analyses, such as clustering techniques applied to static data from cells in many different states, provide insight into co-expression of genes and are an important basis for exploration of the cell's genetic programming. We propose that an even greater level of regulatory detail may be gained by dynamically changing experimental conditions (the input signal) and measuring the time-delayed response of the genes (the output signal). The addition of temporal information to DNA microarray experiments should suggest potential cause/effect relationships among genes with significant regulatory responses to the conditions of interest. This thesis aims to develop computational techniques to maximize the information gained from such dynamic experiments. by William A. Schmitt, Jr. Ph.D
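
The time-lagged correlation idea mentioned in this abstract can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the thesis algorithm (which also uses gene position information and clustering): it computes the Pearson correlation between an input signal at time t and a gene's expression at time t + lag, then picks the lag with the highest correlation. All function names and the toy series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, response, lag):
    """Correlate signal[t] with response[t + lag] for a non-negative lag."""
    if len(signal) != len(response):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(signal) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(signal[: len(signal) - lag], response[lag:])


def best_lag(signal, response, max_lag):
    """Return the lag in 0..max_lag with the highest lagged correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(signal, response, k))


# Toy data: the "gene" repeats the "light" input two time points later.
light = [0, 0, 1, 3, 2, 0, 0, 1, 0, 0]
gene = [0, 0, 0, 0, 1, 3, 2, 0, 0, 1]

lag = best_lag(light, gene, max_lag=4)
print("best lag:", lag)
```

A high correlation at a positive lag only suggests a candidate cause/effect ordering; as the abstract notes, such links are hypothetical until checked against other evidence.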

    Diseño de algoritmos evolutivos híbridos optimizados para biclustering : Línea de investigación

    The general objective of this line of research is to design new computational techniques that help discover potential connections among data presented in matrix form from different fields of application. More specifically, the plan is to develop an evolutionary strategy hybridized with local search and specially designed for biclustering of data. To this end, the aim is to develop a tool that can assist researchers from different disciplines in inferring relationships among data drawn from large volumes of information. Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática (RedUNCI

    Model-Based Genomic/Proteomic Signal Processing in Cancer Diagnosis and Prediction

    In recent years, high-throughput measurement technologies (gene microarray, protein mass spectrum) have made it possible to simultaneously monitor the expression of thousands of genes or proteins. A topic of great interest is the difference in gene/protein expression between normal and cancer subjects. In the literature, various data-driven methods have been proposed, e.g., clustering and machine learning methods. In this thesis, an alternative model-driven approach is proposed. The proposed dependence model focuses on the interactions among genes or proteins. We have shown that the dependence model is highly effective in the classification of normal and cancer data. Moreover, unlike data-driven methods, the dependence model carries specific biological meanings, and it has the potential for the early prediction of cancer. The concept of a dependence network is proposed based on the dependence model. The interactions and co-regulation relationships among genes or proteins are modeled by the dependence network, from which we are able to reliably identify biomarkers, that is, important genes or proteins for cancer prediction and drug development. The analysis extends to cell cycle time-series, where one subject is measured at multiple time points during the cell cycle. Understanding the cell cycle will greatly improve our understanding of the mechanism of cancer development. In the cell cycle time-series, measurements are based on a population of cells which are supposed to be synchronized. However, continuous synchronization loss is observed due to the diversity of individual cell growth rates. Therefore, the time-series measurement is a distorted version of the single-cell expression. In this thesis, we propose a polynomial-model-based resynchronization scheme, which successfully removes the distortion. The time-series data is further analyzed to identify gene regulatory relationships. 
For the identification of regulatory relationships, the existing literature mainly studies the relationship between several regulators and one regulated gene. In this thesis, we use the eigenvalue pattern of the dependence model to characterize several regulated genes, and propose a novel method that examines the relationship between several regulators and several regulated genes simultaneously

    Gene Network Biological Validity Based on Gene-Gene Interaction Relevance

    In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many gene network inference algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to show that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledge sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. To this end, a complete conversion of KEGG pathways into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown

    Identifying the molecular components that matter: a statistical modelling approach to linking functional genomics data to cell physiology

    Functional genomics technologies, in which thousands of mRNAs, proteins, or metabolites can be measured in single experiments, have helped reshape biological investigations. One of the most important issues in the analysis of the large datasets generated is the selection of relatively small subsets of variables that are predictive of the physiological state of a cell or tissue. In this thesis, a truly multivariate variable selection framework using diverse functional genomics data has been developed, characterized, and tested. This framework has also been used to show that it is possible to predict the physiological state of the tumour from the molecular state of adjacent normal cells. This allows us to identify novel genes involved in cell-to-cell communication. Then, using a network inference technique, networks representing cell-to-cell communication in prostate cancer have been inferred. The analysis of these networks has revealed interesting properties that suggest a crucial role of directional signals in controlling the interplay between normal and tumour cell-to-cell communication. Experimental verification performed in our laboratory has provided evidence that one of the identified genes could be a novel tumour suppressor gene. In conclusion, the findings and methods reported in this thesis have contributed to further understanding of cell-to-cell interaction and multivariate variable selection not only by applying and extending previous work, but also by proposing novel approaches that can be applied to any functional genomics data

    Validación de modelos genéticos en bioinformática: implementación y visualización

    Doctoral Program in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Program code: DBI. Line code: 111. Since the human genome was completely sequenced for the first time, the great scientific and technological advances in the biotechnology industry have greatly reduced the cost of experiments while significantly improving results. This has led to an exponential growth in the biological information available and, due to this huge amount of information, researchers are faced with mountains of data with only flakes of knowledge. Approaches such as Knowledge Discovery in Databases (KDD) are used to generate models that allow researchers to gain knowledge about complex biological systems. Gene networks arose as a straightforward way of representing gene sets including their interactions. They are presented as a network structure where each node represents a gene or gene product (protein) while each edge denotes the relationship between the nodes at its ends. The concrete nature of each relationship and the meaning of its weight depend on the network architecture and the inference algorithm used. A gene network is an abstraction that facilitates the study of its underlying biological system. Gene networks are easy to visualize, and they are informative on their own. Gene networks have been successfully used in clinical diagnosis and a large number of inferred interactions have been confirmed experimentally, thus confirming their reliability. The inference of gene networks has also allowed a better understanding of fundamental processes that occur in living organisms such as development or nutrition and metabolic coordination. Research has focused on inferring these networks using different experimental and computational techniques, as well as analyzing those networks to extract knowledge. 
Also, a significant number of methods have been developed to validate the inferred networks in order to verify their quality and reliability. All the methodologies of gene network inference, analysis, and validation are based on algorithms and computer tools. Given the increasing importance and popularity of these computational approaches, it becomes increasingly critical to ensure that the software is usable and accessible, as these features provide the basis for the reproducibility of published biomedical research. Based on the existing need for automatic techniques of inference, analysis and validation of models for the study of interactions between genes and the deficiencies in existing techniques, this work presents different novel approaches for the inference, analysis and validation of genetic models, especially gene networks, with a special emphasis on the usability and accessibility of the proposed solutions. Universidad Pablo de Olavide de Sevilla. Escuela de Doctorad