
    Estimation and Detection of Multivariate Gene Regulatory Relationships

    The Coefficient of Determination (CoD) plays an important role in Genomics problems, for instance in the inference of gene regulatory networks from gene-expression data. However, the inference theory of the CoD has not been investigated systematically. In this dissertation, we study the inference of the discrete CoD from both frequentist and Bayesian perspectives, with applications to system identification problems in Genomics. From a frequentist viewpoint, we provide a theoretical framework for CoD estimation by introducing nonparametric CoD estimators and parametric maximum-likelihood (ML) CoD estimators based on static and dynamical Boolean models. Inference algorithms are developed to discover gene regulatory relationships, and numerical examples demonstrate the superior performance of the ML approach when sufficient prior knowledge is available. To make applications of the CoD independent of user-selectable thresholds, we describe rigorous multiple testing procedures for identifying significant regulatory relationships among genes using the discrete CoD, and for discovering canalyzing genes using the intrinsically multivariate prediction (IMP) criterion. We develop practical statistical tools that are open to the scientific community. We also propose a Bayesian framework for inference of the CoD across a parametrized family of joint distributions between target and predictors. Applications of the Bayesian approach are compared against the nonparametric and parametric approaches using synthetic data. We have found that, in system identification problems in Genomics, both the parametric and Bayesian CoD estimation approaches outperform the nonparametric approaches. Hence, parametric and Bayesian estimation approaches are preferred when partial knowledge about gene regulation is available.
We have also shown that the two proposed statistical testing frameworks can detect well-known regulatory relationships and canalyzing genes, such as p53 and DUSP1, from real data sets. This indicates that our methodology could serve as a promising tool for the detection of potential gene regulatory relationships and canalyzing genes. In short, this dissertation is intended to serve as a foundation for a detailed study of applications of CoD estimation in Genomics and related fields.
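As a toy illustration of the discrete CoD described above, here is a plain resubstitution-style nonparametric estimate (our sketch, not the dissertation's estimators): CoD = (e0 - e)/e0 compares the best constant predictor of a target gene against the best predictor given its candidate regulators.

```python
from collections import Counter, defaultdict

def cod_estimate(xs, ys):
    """Resubstitution CoD = (e0 - e) / e0, where e0 is the error of the
    best constant predictor of Y and e the error of the best predictor
    of Y given the predictor tuple X."""
    n = len(ys)
    # Error of the optimal constant predictor: majority vote on Y.
    e0 = (n - max(Counter(ys).values())) / n
    # Error of the optimal predictor given X: majority vote per X value.
    cells = defaultdict(Counter)
    for x, y in zip(xs, ys):
        cells[x][y] += 1
    e = sum(sum(c.values()) - max(c.values()) for c in cells.values()) / n
    return 0.0 if e0 == 0 else (e0 - e) / e0

# Toy data: the target mostly follows the first predictor.
xs = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 0)]
ys = [0, 0, 1, 1, 1, 1]
```

A CoD near 1 means the predictor set determines the target almost perfectly; near 0, it adds nothing over the constant prediction.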

    Reconstructing Generalized Logical Networks of Transcriptional Regulation in Mouse Brain from Temporal Gene Expression Data

    Gene expression time course data can be used not only to detect differentially expressed genes but also to find temporal associations among genes. The problem of reconstructing generalized logical networks that account for temporal dependencies among genes and environmental stimuli from transcriptomic data is addressed. A network reconstruction algorithm was developed that uses statistical significance as a criterion for network selection, to avoid false-positive interactions arising from pure chance. The multinomial hypothesis-testing-based reconstruction allows explicit specification of the false-positive rate, a feature unique among extant network inference algorithms. The method is superior to dynamic Bayesian network modeling in a simulation study. Temporal gene expression data from the brains of alcohol-treated mice are used for modeling in an analysis of the molecular response to alcohol. Genes from major neuronal pathways are identified as putative components of the alcohol response mechanism. Nine of these genes have associations with alcohol reported in the literature. Several other potentially relevant genes, compatible with independent results from literature mining, may play a role in the response to alcohol. Additional, previously unknown gene interactions were discovered that, subject to biological verification, may offer new clues in the search for the elusive molecular mechanisms of alcoholism.
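The significance-testing idea can be pictured with a deliberately simplified stand-in (the paper's multinomial test and its explicit false-positive-rate control are more general): test whether a child gene's next state depends on a parent gene's current state via a chi-square test of independence on the transition contingency table.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def depends(a_states, b_states, crit=3.841):
    """True if gene B's state at t+1 depends on gene A's state at t.
    crit is the chi-square critical value for df=1 at alpha=0.05, i.e.
    an explicitly specified false-positive rate of 5%."""
    table = [[0, 0], [0, 0]]
    for t in range(len(a_states) - 1):
        table[a_states[t]][b_states[t + 1]] += 1
    return chi_square_stat(table) > crit

# Gene b copies gene a with a one-step delay: a strong temporal association.
a = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
b = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

The alpha level plays the role of the user-specified false-positive rate: raising crit makes the reconstruction more conservative.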

    Inferring Gene Regulatory Networks from Time Series Microarray Data

    Innovations and improvements in high-throughput genomic technologies such as the DNA microarray make it possible for biologists to simultaneously measure dependencies and regulation among genes on a genome-wide scale, providing a wealth of genetic information. An important objective of functional genomics is to understand the mechanisms controlling the expression of these genes and to encode that knowledge into a gene regulatory network (GRN). To achieve this, computational and statistical algorithms are needed. Inference of GRNs is a very challenging task for computational biologists because the number of free parameters typically exceeds what the data can constrain. Various computational approaches have been proposed for modeling gene regulatory networks, such as Boolean networks, differential equations and Bayesian networks; no single "golden" method gives the best performance on every data set. The research goal is to improve inference accuracy and reduce computational complexity. One of the problems in reconstructing GRNs is dealing with high-dimensional, short time course gene expression data. In this work, several existing inference algorithms are compared; their limitations are that they suffer either from low inference accuracy or from high computational complexity. To overcome these difficulties, a new approach based on a state space model and the Expectation-Maximization (EM) algorithm is proposed to model the dynamics of gene regulation and infer gene regulatory networks. In our model, the GRN is represented by a state space model that incorporates noise and can capture additional biological aspects, such as hidden or missing variables. An EM algorithm estimates the parameters of the state space functions, and the gene interaction matrix is derived by decomposing the observation matrix using singular value decomposition; this matrix is then used to infer the GRN.
The new model is validated on synthetic data sets before being applied to real biological data sets. The results show that the developed model can infer gene regulatory networks from large-scale gene expression data and significantly reduce computational time complexity, without losing much inference accuracy compared to the dynamic Bayesian network.
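A much-simplified sketch of the decomposition step described above, assuming purely linear dynamics and omitting the EM-fitted hidden variables: the interaction matrix is recovered from the observation matrix through an SVD-based pseudoinverse.

```python
import numpy as np

def interaction_matrix(X):
    """X: genes x timepoints. Least-squares A with X[:, 1:] ~= A @ X[:, :-1],
    computed through the SVD-based pseudoinverse of the observation matrix."""
    past, future = X[:, :-1], X[:, 1:]
    return future @ np.linalg.pinv(past)

# Toy 2-gene system: gene 0 activates gene 1, gene 1 represses gene 0.
A_true = np.array([[0.0, -0.5],
                   [0.8,  0.0]])
x = np.array([1.0, 0.5])
cols = [x]
for _ in range(6):
    x = A_true @ x
    cols.append(x)
X = np.stack(cols, axis=1)   # noiseless "observation matrix"
A_hat = interaction_matrix(X)
```

On noiseless data the least-squares matrix matches the generating dynamics; the sign and magnitude of each entry of A_hat then suggest activation or repression edges in the inferred GRN.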

    A combined sensitivity analysis and kriging surrogate modeling for early validation of health indicators

    To increase the dependability of complex systems, one solution is to assess their state of health continuously by monitoring variables sensitive to potential degradation modes. When computed in an operating environment, these variables, known as health indicators, are subject to many uncertainties. Hence, the stochastic nature of health assessment, combined with the lack of data in design stages, makes it difficult to evaluate the efficiency of a health indicator before the system enters service. This paper introduces a method for early validation of health indicators during the design stages of a system development process. The method uses physics-based modeling and uncertainty propagation to create simulated stochastic data. However, because of the large number of parameters defining the model and its long computation time, the runtime needed for uncertainty propagation is prohibitive. Thus, kriging is used to obtain low-cost estimations of the model outputs. Moreover, sensitivity analysis techniques are applied upstream to rank the model parameters and reduce the dimension of the input space. The validation is based on three types of numerical key performance indicators corresponding to the detection, identification and prognostic processes. After introducing and formalizing the framework of uncertain-system modeling and the different performance metrics, the issues of sensitivity analysis and surrogate modeling are addressed. The method is then applied to the validation of a set of health indicators for the monitoring of an aircraft engine's pumping unit.
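A minimal kriging surrogate sketch, assuming a squared-exponential covariance and a one-dimensional input; this is a generic stand-in for the idea, not the paper's model of the pumping unit.

```python
import numpy as np

def kriging_fit(X, y, length=1.0, nugget=1e-8):
    """Solve K w = y for the kriging weights, with a squared-exponential
    covariance and a small nugget for numerical stability."""
    K = np.exp(-((X[:, None] - X[None, :]) / length) ** 2)
    return np.linalg.solve(K + nugget * np.eye(len(X)), y)

def kriging_predict(X_train, w, x_new, length=1.0):
    """Predict at x_new as a covariance-weighted sum of the design points."""
    k = np.exp(-((x_new - X_train) / length) ** 2)
    return k @ w

# Stand-in for an expensive physics-based model, run at 7 design points.
expensive_model = np.sin
Xd = np.linspace(0.0, 3.0, 7)
w = kriging_fit(Xd, expensive_model(Xd))
pred = kriging_predict(Xd, w, 1.5)
```

Once fitted, each surrogate evaluation is a single small dot product, which is what makes Monte Carlo uncertainty propagation over many parameter draws affordable.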

    Some Formal Solutions in Side-channel Cryptanalysis - An Introduction

    We propose to revisit Side-channel Cryptanalysis from the point of view, for instance, of C. E. Shannon: the calculation of a posteriori probabilities is the generalized problem of cryptanalysis. So, our goal will be to provide analytic formulae for the marginal posterior probability mass functions for the targets of those attacks. Since we are concerned with the probabilities of single and perfectly determined cases, we need above all to place ourselves in a probabilistic system enjoying an epistemic “interpretation”. We select Probability as Logic, the most suitable system for our purpose. With this powerful and flexible system at hand, we first solve two independent problems for known, non-chosen messages: the determination of side-channel leakage times (generalized for high-order attacks) and the determination of the target, given those leakage times. The first problem belongs to Hypotheses Testing Theory and admits a formal solution in terms of Bayes Factors in the parametric framework. The calculation of those factors requires marginalizing over all possible values of the target, so this new procedure has no equivalent in frequentist Statistics, and we indicate how it could be shown to outperform previous procedures increasingly as the target space grows. We present preliminary experimental results and give some clues on how to extend this solution to the nonparametric framework. The second problem is a classical Parameter Estimation problem with many hyperparameters. It also admits a unique maximum a posteriori solution under a 0-1 loss function within Decision Theory. When it is not possible to solve both problems independently, we must solve them simultaneously in order to get general solutions for Side-channel Cryptanalysis, at least on symmetric block ciphers.
Taking advantage of the duality between Hypotheses Testing and Parameter Estimation in our system of inference, we transform the determination of the generalized leakage times into a parameter estimation problem, so that the whole attack reduces to a single global parameter estimation problem. Generally speaking, it appears that (marginal) side-channel parametric leakage models are in fact averages between attack and “non-attack” models and, more generally, between many conditional models, so that likelihoods cannot be frequency sampling distributions. Then, we give the marginal posterior probability mass function for the targets of the most general known-messages attacks: “correlation” attacks, template attacks, high-order attacks, multi-decision-function attacks, multi-attack-model attacks and multi-“non-attack”-model attacks. Essentially, it remains to explain how to assign joint prior and discrete direct probability distributions by logical inspection, to extend this approach to the nonparametric framework and to other cryptographic primitives, to deal with analytic, symbolic, numerical and computational implementation issues, and especially to derive formal adaptive chosen-messages attacks.
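As a toy version of the "posterior pmf over the target" idea, assuming (our choice, not the paper's) a Hamming-weight leakage model with Gaussian noise and a uniform prior over key bytes:

```python
import math

def hamming_weight(x):
    return bin(x).count("1")

def posterior_over_keys(plaintexts, leakages, sigma=1.0, n_keys=256):
    """Uniform prior; per-trace likelihood N(HW(p XOR k), sigma^2).
    Returns the normalized posterior pmf over all key candidates."""
    log_post = [0.0] * n_keys
    for p, l in zip(plaintexts, leakages):
        for k in range(n_keys):
            mu = hamming_weight(p ^ k)
            log_post[k] += -((l - mu) ** 2) / (2 * sigma ** 2)
    m = max(log_post)                      # stabilize before exponentiating
    w = [math.exp(v - m) for v in log_post]
    z = sum(w)
    return [v / z for v in w]

# Simulate noiseless traces for a secret key byte.
secret = 0x2A
pts = list(range(256))
leaks = [hamming_weight(p ^ secret) for p in pts]
post = posterior_over_keys(pts, leaks, sigma=0.5)
best = max(range(256), key=lambda k: post[k])
```

The maximum a posteriori candidate under 0-1 loss is simply the argmax of this pmf; with real noisy traces the posterior spreads out and more traces are needed before it concentrates.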

    A Time Series Analysis Method Using Hidden Variables for Gene Network Reconstruction

    DNA microarray technology can produce time series data covering thousands of genes over tens of time points. Confronted with this volume of data, a fast and effective method is needed to extract useful information. We assume that the interactions between genes are static over the time series; even under this assumption, reconstructing those interactions remains a difficult problem. Since the underlying interactions between genes are complicated, involving transcription, translation and protein-protein interaction, constructing a model from first physicochemical principles is almost impossible. Popular methods built on statistical or mathematical principles are discussed; essentially, these methods minimize (or maximize) some criterion to obtain the values of the model parameters. In this thesis we mainly focus on linear equation models and how to construct the gene network from them. One difficulty for reconstruction is the large number of genes relative to the small number of time points. To decrease the number of parameters in the linear equation model, new linear equation models with hidden variables are introduced. These models effectively decrease the number of parameters and increase the inference accuracy. For comparison, the well-known Boolean network and probabilistic Boolean network models are introduced and used in the simulations.
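One way to picture the hidden-variable parameter reduction (our construction, not necessarily the thesis's exact model): factor the n x n interaction matrix through r hidden variables as A = PQ, cutting the parameter count from n*n to 2*n*r.

```python
import numpy as np

def fit_low_rank_dynamics(X, r):
    """X: genes x timepoints. Fit x_{t+1} ~= P (Q x_t) with r hidden
    variables by truncating the SVD of the full least-squares matrix."""
    past, future = X[:, :-1], X[:, 1:]
    A = future @ np.linalg.pinv(past)    # full n x n interaction matrix
    U, s, Vt = np.linalg.svd(A)
    P = U[:, :r] * s[:r]                 # genes -> hidden loadings
    Q = Vt[:r, :]                        # hidden <- genes map
    return P, Q

# Toy network of 4 genes whose true dynamics have rank 2.
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
C = np.array([[0.5, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, 0.1]])
A_true = B @ C
x = np.array([1.0, 2.0, 3.0, 4.0])
cols = [x]
for _ in range(6):
    x = A_true @ x
    cols.append(x)
X = np.stack(cols, axis=1)
P, Q = fit_low_rank_dynamics(X, r=2)
```

With n = 4 and r = 2 this is 16 versus 16 parameters, but for thousands of genes and a small r the saving is what makes estimation from short time courses feasible.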

    Predicting birth-rates through German micro-census data: a comparison of probit and Boolean regression

    This paper investigates the complex interrelationships of qualitative socio-economic variables in the context of Boolean Regression. The data forming the basis for this investigation are from the German Micro-census waves of 1996–2002 and comprise about 400,000 observations. Boolean Regression is used to predict how birth events depend on the socio-economic characteristics of women and their male partners. Boolean Regression is compared to Probit. The data set is split into two halves in order to determine which method yields more accurate predictions. It turns out that Probit is superior if a given socio-economic type is represented by fewer than about 30 observations, whereas Boolean Regression is superior to Probit if a given socio-economic type is represented by more than about 30 observations. Therefore a "hybrid" estimation method, combining Probit and Boolean Regression, is proposed and used in the remainder of the paper. Different methods of interpreting the results of the estimations are introduced, relying mainly on simulation techniques. With respect to the reasons for the prevailing low German fertility rates, the results suggest that these could be decisively higher if people had higher incomes and could earn them with relative ease. From a methodological perspective, the paper demonstrates that Scientific Use Files of socio-economic data comprising hundreds of thousands or even millions of observations, which have recently become available, are the natural field of application for Boolean Regression. Possible consequences for future social and economic research are discussed.
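The hybrid rule can be sketched as follows; the per-cell frequency predictor is a stand-in for Boolean Regression's saturated prediction, and the probit coefficients below are hypothetical, not estimates from the Micro-census data.

```python
import math
from collections import defaultdict

def probit(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hybrid_predict(train, beta, min_cell=30):
    """Per socio-economic cell: use the saturated cell frequency when the
    cell has at least min_cell observations (the paper's ~30 rule),
    otherwise fall back to a pre-fitted probit prediction."""
    cells = defaultdict(list)
    for x, y in train:
        cells[x].append(y)
    def predict(x):
        ys = cells.get(x, [])
        if len(ys) >= min_cell:
            return sum(ys) / len(ys)                       # cell frequency
        z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        return probit(z)                                   # parametric fallback
    return predict

# Two binary covariates; cell (1, 1) is well-populated, cell (0, 1) is rare.
train = [((1, 1), 1)] * 24 + [((1, 1), 0)] * 16 + [((0, 1), 1)] * 3
beta = (-1.0, 0.8, 0.5)  # hypothetical pre-fitted probit coefficients
pred = hybrid_predict(train, beta)
```

The well-populated cell is answered nonparametrically (24/40 = 0.6), while the sparse cell borrows strength from the parametric model, mirroring the paper's finding about which method wins on which side of the ~30-observation threshold.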