
    Application of Logic Synthesis Toward the Inference and Control of Gene Regulatory Networks

    In the quest to understand cell behavior and cure genetic diseases such as cancer, the fundamental approach being taken is undergoing a gradual change. It is becoming more acceptable to view these diseases as an engineering problem, and systems engineering approaches are being deployed to tackle them. In this light, we believe that logic synthesis techniques can play a very important role. Several techniques from the field of logic synthesis can be adapted to assist in the arguably huge effort of modeling cell behavior, inferring biological networks, and controlling genetic diseases. Genes interact with other genes in a Gene Regulatory Network (GRN) and can be modeled as a Boolean Network (BN) or, equivalently, as a Finite State Machine (FSM). As the expression of genes determines cell behavior, important problems include (i) inferring the GRN from gene expression data observed in biological measurements, and (ii) using the inferred GRN to explain how genetic diseases occur and to determine the "best" therapy for treating the disease. We report results on the application of logic synthesis techniques that we have developed to address both of these problems. In the first technique, we present Boolean Satisfiability (SAT) based approaches to infer the predictor (logical support) of each gene that regulates melanoma, using gene expression data from patients suffering from the disease. From the output of such a tool, biologists can construct targeted experiments to understand the logic functions that regulate a particular target gene. Our second technique builds upon the first: a logic synthesis technique, implemented using SAT, determines gene-regulating functions from predictors and gene expression data. This technique determines a BN (or family of BNs) to describe the GRN and is validated on a synthetic network and the p53 network. The first two techniques assume binary-valued gene expression data. In the third technique, we utilize continuous (analog) expression data and present an algorithm to infer and rank predictors using modified Zhegalkin polynomials. We demonstrate our method by ranking predictors for genes in the mutated mammalian and melanoma networks. The final technique assumes that the GRN is known, and uses weighted partial Max-SAT (WPMS) for cancer therapy. Cancer is modeled using a stuck-at fault model, and ATPG techniques are used to characterize genes leading to cancer and to select drugs to treat it. To steer the GRN state towards a desirable healthy state, the optimal selection of drugs is formulated using WPMS. Our techniques can be used to find a set of drugs with the fewest side effects, and are demonstrated in the context of growth factor pathways for colon cancer.
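
    Since the abstract leans on the equivalence between a GRN, a Boolean Network, and a Finite State Machine, a minimal sketch of that view may help. The three-gene network and its update rules below are invented for illustration; they are not the melanoma or p53 networks studied in the work.

        # Minimal sketch: a GRN modeled as a Boolean Network, i.e. a finite
        # state machine over gene-state vectors. The three genes and their
        # predictors are hypothetical.
        from itertools import product

        # Each gene's next value is a Boolean function (its "predictor")
        # of the current state vector (g0, g1, g2).
        update_rules = {
            0: lambda s: s[1] and not s[2],   # g0' = g1 AND NOT g2
            1: lambda s: s[0] or s[2],        # g1' = g0 OR g2
            2: lambda s: not s[0],            # g2' = NOT g0
        }

        def step(state):
            """Apply all predictors synchronously: one FSM transition."""
            return tuple(int(bool(update_rules[i](state))) for i in range(len(state)))

        # Enumerate the full state-transition table of the FSM.
        for state in product((0, 1), repeat=3):
            print(state, "->", step(state))

    Inferring the GRN then amounts to choosing predictors like those above so that the induced transition table is consistent with the observed expression data, which is where the SAT formulations come in.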

    Using data-driven rules to predict mortality in severe community acquired pneumonia

    Prediction of patient-centered outcomes in hospitals is useful for performance benchmarking, resource allocation, and guidance regarding active treatment and withdrawal of care. Yet their use by clinicians is limited by the complexity of available tools and the amount of data required. We propose to use Disjunctive Normal Forms as a novel approach to predict hospital and 90-day mortality from instance-based patient data, comprising demographic, genetic, and physiologic information, in a large cohort of patients admitted with severe community acquired pneumonia. We develop two algorithms to efficiently learn Disjunctive Normal Forms, which yield easy-to-interpret rules that explicitly map data to the outcome of interest. Disjunctive Normal Forms achieve higher prediction performance than a set of state-of-the-art machine learning models, and unveil insights unavailable with standard methods. Disjunctive Normal Forms constitute an intuitive set of prediction rules that could be easily implemented to predict outcomes and guide criteria-based clinical decision making and clinical trial execution, and are thus of greater practical usefulness than currently available prediction tools. The Java implementation of the tool, JavaDNF, will be publicly available. © 2014 Wu et al.
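
    To make the rule format concrete: a Disjunctive Normal Form is an OR of AND-clauses over feature tests, so a prediction fires when any single clause is fully satisfied. The sketch below shows evaluation only; the literals and thresholds are hypothetical placeholders, not the rules learned in the paper (whose released tool is in Java).

        # Sketch of how a learned DNF maps patient data to a predicted
        # outcome. Feature names and thresholds are hypothetical.

        # A DNF is an OR of conjunctions; each conjunction is a list of
        # (feature, test) pairs that must all hold.
        dnf = [
            [("age", lambda v: v >= 75), ("creatinine", lambda v: v > 2.0)],
            [("sbp", lambda v: v < 90), ("wbc", lambda v: v > 15.0)],
        ]

        def predict(patient):
            """Return True (high risk) if any conjunction is fully satisfied."""
            return any(all(test(patient[f]) for f, test in clause) for clause in dnf)

        patient = {"age": 80, "creatinine": 2.4, "sbp": 110, "wbc": 9.0}
        print(predict(patient))  # True: the first clause fires

    Because each clause reads as an explicit clinical criterion, the model's decisions can be audited line by line, which is the interpretability advantage the abstract claims over black-box learners.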

    Genetic Algorithms and the Satisfiability of Large-Scale Boolean Expressions

    The two new genetic methods overpopulation and bitwise expected value are introduced. In overpopulation, a temporary population of size Mn (M > 1) is created using genetic operators, and the n children with the highest estimated fitness values are selected as the next generation; the rest are discarded. Bitwise expected value (bev) is the fitness estimation function used. Overpopulation and bitwise expected value are applied to the NP-complete problem 3SAT (a special form of Satisfiability in which the Boolean expression consists of the conjunction of an arbitrary number of clauses, each of which is the disjunction of three Boolean variables) with excellent empirical results when compared to the performance of the standard genetic algorithm. Overpopulation increases the cost of producing each generation, due to the overhead required to maintain the larger temporary population, but results in many fewer generations to solution. Using bitwise expected value as a fitness estimator causes the algorithm to take slightly more generations to solution, but bev is much faster to calculate than the fitness function, leading to a decrease in wall-clock time to solution. Theoretical justification for the success of overpopulation is seen as a result of the generalization of the schema growth equation. Bitwise expected value is viewed as an analogy to the Building Block Hypothesis. Empirical evidence of high correlation between bev and the fitness function is presented. We also introduce the target problem concept, in which a difficult problem is transformed into a well-known problem for which a good genetic method of solution is known. As an example of the target problem concept, a transformation from the Traveling Salesman Problem to Satisfiability is demonstrated. Overpopulation and bitwise expected value are applied to the resulting Boolean expression, with good results. An interesting convergence property is observed.
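
    A minimal sketch of the overpopulation step for 3SAT follows. The abstract does not spell out the bev estimator, so this sketch ranks children by the exact clause-count fitness instead; in the described method, bev would stand in for that call.

        # Overpopulation for 3SAT: breed a temporary population of size
        # M*n, keep the n fittest children. Fitness here is the exact
        # number of satisfied clauses (the paper's cheaper bev estimator
        # would replace it). Crossover/mutation choices are illustrative.
        import random

        def fitness(assign, clauses):
            """Count satisfied clauses; literal l means variable |l|-1 with sign(l)."""
            return sum(
                any(assign[abs(l) - 1] == (l > 0) for l in clause)
                for clause in clauses
            )

        def overpopulate(pop, clauses, M=3, mut=0.05):
            n = len(pop)
            children = []
            for _ in range(M * n):                      # temporary population of size M*n
                a, b = random.sample(pop, 2)
                cut = random.randrange(len(a))          # one-point crossover
                child = a[:cut] + b[cut:]
                child = [bit ^ (random.random() < mut) for bit in child]
                children.append(child)
            # Keep only the n highest-scoring children; discard the rest.
            children.sort(key=lambda c: fitness(c, clauses), reverse=True)
            return children[:n]

        # Tiny example: (x1 OR x2 OR NOT x3) AND (NOT x1 OR x3 OR x2)
        clauses = [(1, 2, -3), (-1, 3, 2)]
        pop = [[random.randint(0, 1) for _ in range(3)] for _ in range(10)]
        pop = overpopulate(pop, clauses)
        print(max(fitness(c, clauses) for c in pop))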

    Analytical, Theoretical and Empirical Advances in Genome-Scale Algorithmics

    Ever-increasing amounts of complex biological data continue to come online daily. Examples include proteomic, transcriptomic, genomic and metabolomic data generated by a plethora of high-throughput methods. Accordingly, fast and effective data processing techniques are more and more in demand. This issue is addressed in this dissertation through an investigation of various algorithmic alternatives and enhancements to routine and traditional procedures in common use. In the analysis of gene co-expression data, for example, differential measures of entropy and variation are studied as augmentations to mere differential expression. These novel metrics are shown to help elucidate disease-related genes in wide assortments of case/control data. In a more theoretical spirit, limits on the worst-case behavior of density-based clustering methods are studied. It is proved, for instance, that the well-known paraclique algorithm, under proper tuning, can be guaranteed never to produce subgraphs with density less than 2/3. Transformational approaches to efficient algorithm design are also considered. Classic graph search problems are mapped to and from well-studied versions of satisfiability and integer linear programming. In so doing, regions of the input space are classified for which such transforms are effective alternatives to direct graph optimizations. In all these efforts, practical implementations are emphasized in order to advance the boundary of effective computation.
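
    One way to read "differential entropy" as a case/control metric is to compare the Shannon entropy of a gene's expression in each cohort, flagging genes whose spread of values, not just their mean, differs between groups. The sketch below follows that reading; the equal-width binning and the toy data are assumptions, not the dissertation's exact definition.

        # Sketch: differential entropy as a gene-ranking metric, comparing
        # Shannon entropy of binned expression values across cohorts.
        import math
        from collections import Counter

        def shannon_entropy(values, bins=10):
            lo, hi = min(values), max(values)
            width = (hi - lo) / bins or 1.0   # guard against all-equal values
            counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
            n = len(values)
            return -sum((c / n) * math.log2(c / n) for c in counts.values())

        def differential_entropy(case_expr, control_expr):
            """Large |dH| flags genes that vary differently between cohorts,
            even when mean expression (differential expression) is similar."""
            return shannon_entropy(case_expr) - shannon_entropy(control_expr)

        cases = [2.1, 2.3, 8.9, 0.4, 5.5, 7.2]     # toy expression values
        controls = [4.0, 4.1, 3.9, 4.2, 4.0, 4.1]
        print(differential_entropy(cases, controls))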

    Network-based modelling for omics data

    Empirical Hardness of Finding Optimal Bayesian Network Structures: Algorithm Selection and Runtime Prediction

    Various algorithms have been proposed for finding a Bayesian network structure that is guaranteed to maximize a given scoring function. Implementations of state-of-the-art algorithms (solvers) for this Bayesian network structure learning problem rely on adaptive search strategies, such as branch-and-bound and integer linear programming techniques. Thus, the time requirements of the solvers are not well characterized by simple functions of the instance size. Furthermore, no single solver dominates the others in speed. Given a problem instance, it is thus a priori unclear which solver will perform best and how fast it will solve the instance. We show that for a given solver the hardness of a problem instance can be efficiently predicted based on a collection of non-trivial features which go beyond the basic parameters of instance size. Specifically, we train and test statistical models on empirical data, based on the largest evaluation of state-of-the-art exact solvers to date. We demonstrate that we can predict the runtimes to a reasonable degree of accuracy. These predictions enable effective selection of solvers that perform well in terms of runtimes on a particular instance. Thus, this work contributes a highly efficient portfolio solver that makes use of several individual solvers.
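
    The portfolio idea reduces to a small amount of code once runtime data is in hand: fit one regression model per solver on instance features, then route each new instance to the solver with the smallest predicted runtime. The sketch below uses a random forest regressor as one plausible statistical model; the feature vectors and timings are synthetic placeholders, not the paper's benchmark data.

        # Sketch of a per-solver runtime model and portfolio selection.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        X = rng.random((200, 5))                     # 5 instance features per problem
        runtimes = {                                 # synthetic observed runtimes
            "branch-and-bound": 10 * X[:, 0] + rng.random(200),
            "integer-programming": 10 * X[:, 1] + rng.random(200),
        }

        # One regression model per solver, mapping features -> log runtime.
        models = {
            name: RandomForestRegressor(n_estimators=100, random_state=0).fit(
                X, np.log1p(y)
            )
            for name, y in runtimes.items()
        }

        def select_solver(features):
            """Route the instance to the solver with smallest predicted runtime."""
            preds = {
                name: m.predict(features.reshape(1, -1))[0]
                for name, m in models.items()
            }
            return min(preds, key=preds.get)

        print(select_solver(rng.random(5)))

    Comparing predicted log runtimes preserves the ranking of the raw runtimes, so the router never needs to invert the log transform.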