1,116 research outputs found

    A model-based optimization framework for the inference of regulatory interactions using time-course DNA microarray expression data

    Abstract. Background: Proteins are the primary regulatory agents of transcription, even though mRNA expression data alone, from systems like DNA microarrays, are widely used. In addition, the regulation process in genetic systems is inherently non-linear, and most studies employ a time-course analysis of mRNA expression. These considerations should be taken into account in the development of methods for the inference of regulatory interactions in genetic networks. Results: We use an S-system based model for the transcription and translation process. We propose an optimization-based regulatory network inference approach that uses time-varying data from DNA microarray analysis. Currently, this seems to be the only model-based method that can be used for the analysis of time-course "relative" expressions (expression ratios). We analyze the dynamic behavior of the system when the number of available experimental samples is varied, when there are different levels of noise in the data, and when there are genes that are not considered by the experimenter. Our studies show that the principal factor affecting the ability of a method to infer interactions correctly is the similarity in the time profiles of some or all of the genes. The less similar the profiles are to each other, the easier it is to infer the interactions. We propose a heuristic method for resolving networks and show that it displays reasonable performance on a synthetic network. Finally, we validate our approach using real experimental data for a chosen subset of genes involved in the sporulation cascade of Bacillus anthracis. We show that the method captures most of the important known interactions between the chosen genes. Conclusion: The performance of any inference method for regulatory interactions between genes depends on the noise in the data, the existence of unknown genes affecting the network genes, and the similarity in the time profiles of some or all genes. Though subject to these issues, the inference method proposed in this paper would be useful because of its ability to infer important interactions, because it can be used with time-course DNA microarray data, and because it is based on a non-linear model of the process that explicitly accounts for the regulatory role of proteins.
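
    As a point of reference for the S-system model class named in this abstract, the following is a minimal sketch of the generic S-system form, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, integrated with a simple forward Euler scheme. The two-variable mRNA/protein network, all parameter values, and the function names are illustrative assumptions; none of them are taken from the paper, and the paper's optimization-based inference step is not shown.

```python
import math


def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of a generic S-system (power-law production and degradation)."""
    n = len(x)
    rates = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        rates.append(production - degradation)
    return rates


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=2000):
    """Integrate the S-system with forward Euler and return the full trajectory."""
    x = list(x0)
    trajectory = [tuple(x)]
    for _ in range(steps):
        rates = s_system_rhs(x, alpha, beta, g, h)
        # Clamp to a small positive value: negative kinetic orders would
        # otherwise raise an error if a concentration reached zero.
        x = [max(xi + dt * ri, 1e-9) for xi, ri in zip(x, rates)]
        trajectory.append(tuple(x))
    return trajectory


# Hypothetical toy network (assumption): X0 = mRNA, X1 = protein.
ALPHA = [2.0, 1.0]                 # production rate constants
BETA = [1.0, 1.0]                  # degradation rate constants
G = [[0.0, -0.5],                  # mRNA production is repressed by the protein
     [1.0, 0.0]]                   # protein production is proportional to mRNA
H = [[1.0, 0.0],                   # first-order mRNA degradation
     [0.0, 1.0]]                   # first-order protein degradation

if __name__ == "__main__":
    final_state = simulate([1.0, 1.0], ALPHA, BETA, G, H)[-1]
    print("final (mRNA, protein):", final_state)
```

    For these assumed parameters the steady state satisfies X0 = X1 and X0^1.5 = 2, so both variables settle near 2^(2/3), roughly 1.587.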

    Improved gene expression programming to solve the inverse problem for ordinary differential equations

    Many complex systems in the real world evolve with time. These dynamic systems are often modeled mathematically by ordinary differential equations. The inverse problem of ordinary differential equations is to convert the observed data of a physical system into a mathematical model expressed as ordinary differential equations. The model may then be used to predict the future behavior of the physical system being modeled. Genetic programming has been used as a solver for this inverse problem. Gene expression programming could do the same job, since it has a similar ability to establish models of ordinary differential systems. Nevertheless, this has seldom been studied. This paper is one of the first attempts to apply gene expression programming to the inverse problem of ordinary differential equations. Based on a statistical observation of traditional gene expression programming, an improvement is made in our algorithm: genetic operators should act more often on the dominant part of genes than on the recessive part. This may help maintain population diversity and also speed up the convergence of the algorithm. Experiments show that this improved algorithm performs much better than genetic programming and traditional gene expression programming in terms of running time and prediction precision.

    Developing methods for the context-specific reconstruction of metabolic models of cancer cells

    Master's dissertation in Bioinformatics. The recent advances in genome sequencing technologies and other high-throughput methodologies have allowed the identification and quantification of individual cell components. These efforts led to the development of genome-scale metabolic models (GSMMs), not only for humans but also for several other organisms. These models have been used to predict cellular metabolic phenotypes under a variety of physiological conditions and contexts, proving useful in tasks such as drug discovery, biomarker identification and the study of interactions between hosts and pathogens. Therefore, these models provide a useful tool for targeting diseases such as cancer, Alzheimer's disease or tuberculosis. However, the usefulness of GSMMs is highly dependent on their capability to predict phenotypes in the array of different cell types that compose the human body, making the development of tissue/context-specific models mandatory. To address this issue, several methods have been proposed to integrate omics data, such as transcriptomics or proteomics, to improve the phenotype prediction abilities of GSMMs. Despite these efforts, these methods still have some limitations. In most cases, their usage is locked behind commercially licensed software platforms, or they are not available in a user-friendly fashion, thus restricting their use to users with programming or command-line knowledge. In this work, an open-source tool was developed for the reconstruction of tissue/context-specific models based on a generic template GSMM and the integration of omics data. The Tissue-Specific Model Reconstruction (TSM-Rec) tool was developed in the Python programming language and features the FASTCORE algorithm for the reconstruction of tissue/context-specific metabolic models. Its functionalities include the loading of omics data from a variety of omics databases, a set of filtering and transformation methods to adjust the data for integration with a template metabolic model, and finally the reconstruction of tissue/context-specific metabolic models. To evaluate the functionality of the developed tool, a cancer-related case study was carried out. Using omics data from 314 glioma patients, the TSM-Rec tool was used to reconstruct metabolic models of gliomas of different grades. A total of three models were generated, corresponding to grade II, III and IV gliomas. These models were analysed regarding their differences and similarities in reactions and pathways. This comparison highlighted biological processes common to all glioma grades, and pathways that are more prominent in each glioma model. The results show that the tool developed during this work can be useful for the reconstruction of cancer metabolic models, in a search for insights into cancer metabolism and possible approaches towards drug-target discovery.

    PPINGUIN: Peptide Profiling Guided Identification of Proteins improves quantitation of iTRAQ ratios

    Abstract. Background: Recent development of novel technologies has paved the way for quantitative proteomics. One of the most important among them is iTRAQ, which employs isobaric tags for relative or absolute quantitation. Despite great progress in technology development, many challenges remain in the derivation and interpretation of quantitative results. One of these challenges is the consistent assignment of peptides to proteins. Results: We have developed Peptide Profiling Guided Identification of Proteins (PPINGUIN), a statistical analysis workflow for iTRAQ data that addresses the problem of ambiguous peptide quantitations. Motivated by the assumption that peptides uniquely derived from the same protein are correlated, our method employs clustering as a very early step in data processing, prior to protein inference. Our method increases experimental reproducibility and decreases the variability of quantitations of peptides assigned to the same protein. Giving further support to our method, application to a type 2 diabetes dataset identifies a list of protein candidates that is in very good agreement with a previously performed transcriptomics meta-analysis. Making use of the quantitative properties of the signal patterns identified, PPINGUIN can reveal new isoform candidates. Conclusions: Given the increasing importance of quantitative proteomics, we think that this method will be useful in practical applications such as model fitting or functional enrichment analysis. We recommend using this method if quantitation is a major objective of the research.

    An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

    Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.
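
    The privacy and accuracy trade-off described in this abstract can be illustrated with the Laplace mechanism, a standard textbook differentially private algorithm. It is used here only as an example and is not claimed to be the algorithm the paper assumes. For a counting query with sensitivity 1, adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy, and the expected absolute error is 1/epsilon, so stronger privacy (smaller epsilon) costs accuracy. The function names, trial count, and seed below are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count under epsilon-differential privacy (sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon, trials=20000, seed=0):
    """Estimate the expected absolute error of private_count by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(private_count(0.0, epsilon, rng))
    return total / trials


if __name__ == "__main__":
    for eps in (0.1, 0.5, 1.0, 2.0):
        print(f"epsilon={eps:<4} mean |error| ~ {mean_abs_error(eps):.3f}")
```

    The simulated error should track 1/epsilon closely, which makes the cost of each additional unit of privacy protection explicit.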

    Study of digital signal processing tools to infer gene regulatory networks from microarrays

    Since the mid-1990s, the field of genomic signal processing has grown rapidly owing to the development of DNA microarray technology, which made it possible to measure the mRNA expression of thousands of genes in parallel. Researchers have developed a vast body of knowledge in classification methods. The scientific community has developed a broad knowledge of the individual parts involved in the operation of a cell, but we still do not understand how these individual parts interact. For this reason, a new type of analysis of microarray data, called pathway analysis, has been developed. This approach considers that genes work together in cascades and do not act on their own in a biological system. The activity of the genes in a cell is controlled by gene regulatory networks, which consist of the union and interconnection of the various pathways. This thesis is placed in the field of computer systems and signal processing applied to biology, and aims to study and develop methods to infer the relationships between genes in a large-scale gene network topology where the regulation is not known and must be inferred from experimental data. First, we present a review and a comparison of the state-of-the-art methods that have tried to solve this challenge with different approaches: gene networks based on co-expression, information-theoretic approaches, Bayesian networks, and finally approaches based on differential equations. Secondly, we present an exhaustive study of two selected techniques, the Z-score and Zavlanos algorithms, in order to analyze their strengths and drawbacks. The chosen methods have been tested on two public datasets: the SOS pathway and a computer-simulated synthetic dataset. The proposed approach obtains good identification results, confirming the soundness of the approach. Finally, we present an analysis of the ability of the inferred network to predict the behavior of the system under an external perturbation. A new approach to boost the identification performance is also presented. It is based on an ensemble decision paradigm. Although the idea is preliminary, we have found some promising results that demonstrate the potential of the approach.
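
    As an illustration of the Z-score technique mentioned in this abstract, the sketch below uses one common formulation for perturbation (knockout) data: for each target gene, expression values across perturbations are standardized, and a large absolute z-score under the perturbation of gene i suggests an edge from i to that target. The abstract does not give the thesis's exact variant, so the exclusion of self-perturbations from the statistics, the threshold, the toy data, and all names here are assumptions.

```python
from statistics import mean, pstdev


def zscore_edges(expr, threshold=1.5):
    """Infer candidate edges from a perturbation matrix.

    expr[i][j] is the steady-state expression of gene j after perturbing gene i.
    Returns a list of (regulator, target, z) tuples with |z| >= threshold.
    """
    n = len(expr)
    edges = []
    for j in range(n):
        # Statistics over all perturbations except the self-perturbation of j.
        others = [i for i in range(n) if i != j]
        column = [expr[i][j] for i in others]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # gene j never responds, so no edge can be scored
        for i in others:
            z = (expr[i][j] - mu) / sigma
            if abs(z) >= threshold:
                edges.append((i, j, z))
    return edges


def toy_knockout_data(n_genes=6):
    """Hypothetical noise-free data: gene 0 activates gene 1, gene 2 represses gene 3."""
    expr = [[1.0] * n_genes for _ in range(n_genes)]
    for i in range(n_genes):
        expr[i][i] = 0.0      # the knocked-out gene itself is not expressed
    expr[0][1] = 0.2          # knocking out gene 0 lowers gene 1
    expr[2][3] = 2.0          # knocking out gene 2 raises gene 3
    return expr


if __name__ == "__main__":
    for regulator, target, z in zscore_edges(toy_knockout_data()):
        print(f"gene {regulator} -> gene {target} (z = {z:+.2f})")
```

    With only a handful of perturbations the attainable z-scores are bounded, which is why the assumed threshold is modest; real data would also require handling measurement noise.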

    Investigation of the Regulatory Roles of Micrornas by Systems Biology Approaches

    Ph.D, Doctor of Philosophy

    Numerical evidence for relevance of disorder in a Poland-Scheraga DNA denaturation model with self-avoidance: Scaling behavior of average quantities

    We study numerically the effect of sequence heterogeneity on the thermodynamic properties of a Poland-Scheraga model for DNA denaturation taking into account self-avoidance, i.e. with exponent c_p=2.15 for the loop length probability distribution. Complementing previous on-lattice Monte Carlo-like studies, we consider here off-lattice numerical calculations for large sequence lengths, relying on efficient algorithmic methods. We investigate finite-size effects by defining an appropriate intrinsic length scale x, which depends on the parameters of the model. Based on the occurrence of large enough rare regions for a given sequence length N, this study provides a qualitative picture of the finite-size behavior, suggesting that the effect of disorder could be sensed only with sequence lengths diverging exponentially with x. We further look in detail at average quantities for the particular case x=1.3, a parameter choice that ensures the correspondence between the off-lattice and the on-lattice studies. Taken together, the various results can be cast in a coherent picture with a crossover between a nearly pure-system-like behavior for small sizes N < 1000, as observed in the on-lattice simulations, and the apparent asymptotic behavior indicative of disorder relevance, with an (average) correlation length exponent \nu_r >= 2/d (=2). Comment: LaTeX, 33 pages with 15 PostScript figures.