
    Fault Detection and Diagnosis in Gene Regulatory Networks and Optimal Bayesian Classification of Metagenomic Data

    It is well known that the molecular basis of many diseases, particularly cancer, resides in the loss of regulatory power in critical genomic pathways due to DNA mutations. We propose a methodology for model-based fault detection and diagnosis in stochastic Boolean dynamical systems indirectly observed through a single time series of transcriptomic measurements obtained via Next Generation Sequencing (NGS). The fault detection consists of an innovations filter followed by a fault certification step, and requires no knowledge about the system faults. The innovations filter uses the optimal Boolean state estimator, called the Boolean Kalman Filter (BKF). We propose an additional step of fault diagnosis based on a multiple model adaptive estimation (MMAE) method consisting of a bank of BKFs running in parallel. The efficacy of the proposed methodology is demonstrated via numerical experiments using a p53-MDM2 negative feedback loop Boolean network. The results indicate that the proposed method is promising for monitoring biological changes at the transcriptomic level. Genomic applications in the life sciences have experienced explosive growth with the advent of high-throughput measurement technologies, which deliver fast and relatively inexpensive profiles of gene and protein activity on a genome-wide or proteome-wide scale. For the study of microbial classification, we propose a Bayesian method for classifying 16S rRNA sequencing profiles of bacterial abundances, using a Dirichlet-Multinomial-Poisson model for microbial community samples. The proposed approach is compared to the kernel SVM, Random Forest, and MetaPhyl classification rules as a function of sample size and classification difficulty, using both synthetic and real data sets. The proposed Bayesian classifier clearly displays the best performance over the range of between-class and within-class variances that defines the difficulty of the classification.
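    As a rough illustration of the classification idea, the sketch below implements only the Dirichlet-multinomial component of such a model (ignoring the Poisson model for sequencing depth) and scores a new count profile against per-class Dirichlet hyperparameters. The function names, toy hyperparameters, and uniform class priors are hypothetical and not taken from the dissertation.

```python
import numpy as np
from scipy.special import gammaln

def log_dirichlet_multinomial(x, alpha):
    """Log-probability of a count vector x under a Dirichlet-multinomial with parameter alpha."""
    n, a0 = x.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()
            + gammaln(a0) - gammaln(n + a0)
            + (gammaln(x + alpha) - gammaln(alpha)).sum())

def classify(sample, class_alphas, log_priors):
    """Assign a taxa-count profile to the class with the largest posterior score."""
    scores = [log_dirichlet_multinomial(sample, a) + lp
              for a, lp in zip(class_alphas, log_priors)]
    return int(np.argmax(scores))

# Toy example: two classes over three taxa, with hypothetical hyperparameters
# that would normally be estimated from the training samples of each class.
class_alphas = [np.array([2.0, 5.0, 1.0]), np.array([4.0, 1.0, 3.0])]
log_priors = [np.log(0.5), np.log(0.5)]
print(classify(np.array([10, 2, 8]), class_alphas, log_priors))
```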

    Bayesian Optimization in Multi-Information Source and Large-Scale Systems

    The advancements in science and technology in recent years have extended the scale of engineering problems. Discovery of new materials with desirable properties, drug discovery for the treatment of disease, design of complex aerospace systems containing interacting subsystems, experimental design of complex manufacturing processes, and design of complex transportation systems are all examples of complex systems. The significant uncertainty and lack of knowledge about the underlying model, due to this complexity, necessitate the use of data for analyzing these systems. However, the large time and economic expense of the data-gathering process prevents acquiring large amounts of data. This dissertation is mainly focused on enabling design and decision making in complex uncertain systems. Design problems are pervasive in scientific and industrial endeavors: scientists design experiments to gain insights into physical and social phenomena, engineers design machines to execute tasks more efficiently, pharmaceutical researchers design new drugs to fight disease, and environmentalists design sensor networks to monitor ecological systems. All these design problems are fraught with choices that are often complex and high-dimensional, with interactions that make them difficult for individuals to reason about. Bayesian optimization techniques have been successfully employed for experimental design of these complex systems. In many applications across computational science and engineering, engineers, scientists, and decision-makers may have access to a system of interest through several models. These models, often referred to as "information sources", may encompass different resolutions, physics, and modeling assumptions, resulting in different "fidelity" or "skill" with respect to the quantities of interest. Examples include different finite-element models in the design of complex mechanical structures, and various tools for analyzing DNA and protein sequence data in bioinformatics. The high computational cost of the expensive models precludes exhaustive evaluation across the design space, while the less expensive models fail to represent the objective function accurately. Thus, it is highly desirable to determine which experiment from which model should be conducted at each time point. We have developed a multi-information source Bayesian optimization framework capable of simultaneously selecting the design input and the information source, handling constraints, and balancing information gain against computational cost. The application of the proposed framework has been demonstrated on two critical engineering problems: 1) optimization of dual-phase steel to maximize its strength-normalized strain hardening rate in materials science; 2) optimization of the NACA 0012 airfoil in aerospace. Design problems are often defined over a large input space, demanding a large number of experiments to achieve good performance. This is not practical in many real-world problems, due to budget limitations and data expenses. However, the objective function (i.e., the experiment's outcome) often does not change at the same rate in all directions. We have introduced an adaptive dimensionality reduction Bayesian optimization framework that exponentially reduces the exploration region of existing techniques. The proposed framework identifies a small subset of linear combinations of the design inputs that matter most to the objective function and exploits the objective function's representation in this lower-dimensional, but information-rich, space. A significant increase in the rate of the optimization process has been demonstrated on an important aerospace problem: the aerostructural design of an aircraft wing modeled on the NASA Common Research Model (CRM).
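    A minimal sketch of the core selection step in a multi-information source setting is given below: it assumes one independently fitted Gaussian-process surrogate per source and picks the (design point, source) pair that maximizes expected improvement per unit query cost. This is a generic cost-aware heuristic for illustration only; the dissertation's framework additionally fuses the sources, handles constraints, and supports the adaptive dimensionality reduction described above.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Expected improvement (for maximization), guarding against zero predictive variance."""
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

def select_query(surrogates, costs, candidates, best_observed):
    """Pick the (design point, information source) pair maximizing EI per unit cost.

    surrogates: list of fitted models exposing predict(X, return_std=True),
                e.g. sklearn.gaussian_process.GaussianProcessRegressor (assumed).
    costs:      per-source query costs.
    candidates: candidate design points, shape (n, d).
    """
    best_score, best_choice = -np.inf, None
    for s, gp in enumerate(surrogates):
        mu, sigma = gp.predict(candidates, return_std=True)
        score = expected_improvement(mu, sigma, best_observed) / costs[s]
        i = int(np.argmax(score))
        if score[i] > best_score:
            best_score, best_choice = score[i], (candidates[i], s)
    return best_choice
```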

    Inferring Gene Regulatory Networks from Time Series Microarray Data

    The innovations and improvements in high-throughput genomic technologies, such as DNA microarrays, make it possible for biologists to simultaneously measure dependencies and regulatory relations among genes on a genome-wide scale, providing rich genetic information. An important objective of functional genomics is to understand the mechanisms controlling the expression of these genes and to encode this knowledge into a gene regulatory network (GRN). To achieve this, computational and statistical algorithms are especially needed. Inference of GRNs is a very challenging task for computational biologists because the number of free parameters far exceeds what the data can constrain. Various computational approaches have been proposed for modeling gene regulatory networks, such as Boolean networks, differential equations, and Bayesian networks, but there is no single gold-standard method that gives the best performance on every data set. The research goal is to improve inference accuracy and reduce computational complexity. One of the problems in reconstructing GRNs is how to deal with high-dimensional, short time-course gene expression data. In this work, several existing inference algorithms are compared; their limitations are that they suffer from either low inference accuracy or high computational complexity. To overcome these difficulties, a new approach based on a state space model and the Expectation-Maximization (EM) algorithm is proposed to model the dynamics of gene regulation and infer gene regulatory networks. In our model, the GRN is represented by a state space model that incorporates noise and can capture additional biological aspects, such as hidden or missing variables. An EM algorithm is used to estimate the parameters of the given state space equations, the gene interaction matrix is derived by decomposing the observation matrix using singular value decomposition, and this matrix is then used to infer the GRN. The new model is validated on synthetic data sets before being applied to real biological data sets. The results reveal that the developed model can infer gene regulatory networks from large-scale gene expression data and significantly reduce the computational time complexity, without losing much inference accuracy, compared to the dynamic Bayesian network approach.
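    The sketch below illustrates one plausible reading of this pipeline using an off-the-shelf linear-Gaussian state space implementation (pykalman) rather than the dissertation's own code: EM estimates the transition and observation matrices from a time course, and an SVD-based pseudo-inverse of the observation matrix maps the hidden-state dynamics back to gene space as a candidate interaction matrix. The synthetic data, the number of hidden factors, and the back-projection step are assumptions made purely for illustration.

```python
import numpy as np
from pykalman import KalmanFilter  # linear-Gaussian state space model with EM

rng = np.random.default_rng(0)
expression = rng.normal(size=(30, 8))   # synthetic stand-in: 30 time points x 8 genes
k = 3                                    # assumed number of hidden regulatory factors

# EM estimates the state-space parameters (transition/observation matrices, noise covariances).
kf = KalmanFilter(n_dim_state=k, n_dim_obs=expression.shape[1])
kf = kf.em(expression, n_iter=20)

# Decompose the observation matrix by SVD and use its pseudo-inverse to project
# the hidden-state transition dynamics back to gene space (illustrative step only).
C = kf.observation_matrices              # genes x hidden factors
A = kf.transition_matrices               # hidden factors x hidden factors
U, s, Vt = np.linalg.svd(C, full_matrices=False)
C_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # pseudo-inverse, hidden factors x genes
interaction = C @ A @ C_pinv             # genes x genes candidate regulatory links
print(interaction.shape)
```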