17 research outputs found

    Using Prior Knowledge in the Design of Classifiers

    Small samples are commonplace in genomic/proteomic classification, and they lead to inadequate classifier design and poor error estimation. A promising way to alleviate the problem is to use prior knowledge, and a large amount of such information is encoded in biological signaling pathways. This dissertation is concerned with classifier design that utilizes both the available prior knowledge and the training data. Specifically, it uses the notion of regularization from signal processing and statistics to combine prior knowledge with different data-based or data-ignorant criteria. In the first part, we address optimal discrete classification where prior knowledge is restricted to an uncertainty class of feature distributions, with no prior distribution on the uncertainty class. This problem arises directly in biological classification using pathway information: labeling future observations obtained in the steady state by utilizing both the available prior knowledge and the training data. An optimization-based paradigm for utilizing prior knowledge is proposed to design better-performing classifiers when sample sizes are limited. We derive approximate expressions for the first and second moments of the true error rate of the proposed classifier under two widely used models for the uncertainty class: ε-contamination and p-point classes. We examine the proposed paradigm on networks containing NF-κB pathways, where it shows significant improvement compared to data-driven methods.

    In the second part of this dissertation, we focus on Bayesian classification. Although the problem of designing the optimal Bayesian classifier, assuming some known prior distributions, has been fully addressed, a critical issue remains: how to incorporate biological knowledge into the prior distribution. For genomic/proteomic applications, the most common kind of knowledge is in the form of signaling pathways. Thus, it behooves us to find methods of transforming pathway knowledge into knowledge of the feature-label distribution governing the classification problem. To incorporate the available prior knowledge, the interactions in the pathways are first quantified from a Bayesian perspective. Then, we address the problem of prior probability construction by proposing a series of optimization paradigms that utilize the incomplete prior information contained in pathways (both topological and regulatory). The optimization paradigms are derived for both the Gaussian case with a normal-inverse-Wishart prior and discrete classification with a Dirichlet prior. Simulation results, using both synthetic and real pathways, show that the proposed paradigms yield improved classifiers that outperform traditional classifiers that use only training data.
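As a rough illustration of how a Dirichlet prior can be combined with training counts in discrete classification, the sketch below computes the posterior mean of the bin probabilities for each class and labels each bin by the larger weighted value. This is a generic textbook construction, not the dissertation's optimization paradigm. The function names, the counts, and the hyperparameters `alpha0` and `alpha1` (standing in for pathway-derived knowledge) are all hypothetical.

```python
def posterior_mean(counts, alpha):
    """Posterior mean of bin probabilities under a Dirichlet(alpha) prior."""
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]


def classify_bins(counts0, counts1, alpha0, alpha1, prior_class0=0.5):
    """Assign each bin to the class with the larger weighted posterior mean."""
    p0 = posterior_mean(counts0, alpha0)
    p1 = posterior_mean(counts1, alpha1)
    return [
        0 if prior_class0 * a >= (1 - prior_class0) * b else 1
        for a, b in zip(p0, p1)
    ]


# Hypothetical training counts per bin for class 0 and class 1 (small sample).
counts0 = [3, 1, 0, 0]
counts1 = [0, 1, 2, 1]

# Hypothetical Dirichlet hyperparameters encoding prior knowledge.
alpha0 = [2.0, 1.0, 0.5, 0.5]
alpha1 = [0.5, 0.5, 1.0, 2.0]

labels = classify_bins(counts0, counts1, alpha0, alpha1)
```

With only four samples per class, the hyperparameters carry as much weight as the data here, which is the regime where an informative prior matters most.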

    Probabilistic reconstruction of the tumor progression process in gene regulatory networks in the presence of uncertainty

    Background: Accumulation of gene mutations in cells is known to be responsible for tumor progression, driving it from benign states to malignant states. However, previous studies have shown that the detailed sequence of gene mutations, or the steps in tumor progression, may vary from tumor to tumor, making it difficult to infer the exact path that a given type of tumor may have taken.

    Results: In this paper, we propose an effective probabilistic algorithm for reconstructing the tumor progression process based on partial knowledge of the underlying gene regulatory network and the steady-state distribution (SSD) of the gene expression values in a given tumor. We adopt the Boolean networks with perturbation (BNp) framework to model the gene regulatory networks. We assume that the true network is not exactly known, but that we are given an uncertainty class of networks that contains the true network. This network uncertainty class arises from our partial knowledge of the true network, typically represented as a set of local pathways that are embedded in the global network. Given the SSD of the cancerous network, we aim to simultaneously identify the true normal (healthy) network and the set of gene mutations that drove the network into the cancerous state. This is achieved by analyzing the effect of gene mutation on the SSD of a gene regulatory network. At each step, the proposed algorithm reduces the uncertainty class by keeping only those networks whose SSDs get close enough to the cancerous SSD as a result of an additional gene mutation. These steps are repeated until we can find the best candidate for the true network and the most probable path of tumor progression.

    Conclusions: Simulation results based on both synthetic networks and networks constructed from actual pathway knowledge show that the proposed algorithm can identify the normal network and the actual path of tumor progression with high probability. The algorithm is also robust to model mismatch and allows us to control the trade-off between efficiency and accuracy.
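The filtering idea described in this abstract can be sketched for a single step as follows. This is a simplified illustration and not the paper's algorithm: it assumes a standard BNp transition rule with independent per-gene perturbation, models a mutation as a gene stuck at a fixed value, and uses the L1 distance with an arbitrary threshold. The two-gene networks, all function names, and the parameter values are hypothetical.

```python
from itertools import product


def transition_matrix(update, n, p):
    """Transition matrix of an n-gene Boolean network with perturbation probability p."""
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for s in states:
        row = matrix[index[s]]
        # No gene is perturbed: follow the Boolean update rule.
        row[index[update(s)]] += (1 - p) ** n
        # At least one gene is perturbed: exactly the perturbed genes flip.
        for t in states:
            d = sum(a != b for a, b in zip(s, t))
            if d > 0:
                row[index[t]] += p ** d * (1 - p) ** (n - d)
    return matrix


def steady_state(matrix, iterations=2000):
    """Approximate the steady-state distribution by power iteration."""
    size = len(matrix)
    dist = [1.0 / size] * size
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]
    return dist


def mutate(update, gene, value):
    """Return an update rule in which `gene` is stuck at `value` (assumed mutation model)."""
    def mutated(state):
        nxt = list(update(state))
        nxt[gene] = value
        return tuple(nxt)
    return mutated


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical two-gene candidate networks forming the uncertainty class.
candidates = {
    "net_a": lambda s: (s[1], s[0]),
    "net_b": lambda s: (s[0] & s[1], s[0] | s[1]),
}
N_GENES, P_PERTURB, THRESHOLD = 2, 0.01, 0.005


def ssd_after_mutation(update, gene, value):
    mutated = mutate(update, gene, value)
    return steady_state(transition_matrix(mutated, N_GENES, P_PERTURB))


# Stand-in for the observed cancerous SSD: net_a with gene 0 stuck at 1.
target = ssd_after_mutation(candidates["net_a"], 0, 1)

# Keep only the (network, mutation) pairs whose SSD is close to the target.
survivors = [
    (name, gene, value)
    for name, update in candidates.items()
    for gene in range(N_GENES)
    for value in (0, 1)
    if l1_distance(ssd_after_mutation(update, gene, value), target) < THRESHOLD
]
```

A multi-step version would repeat this filter, applying one further mutation to each surviving pair per round. The threshold plays the role of the efficiency and accuracy trade-off mentioned above: a looser value keeps more candidates per step at a higher computational cost.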

    Optimal Bayesian Kalman Filtering With Prior Update
