    Design of Experiments for Model Discrimination Hybridising Analytical and Data-Driven Approaches

    Healthcare companies must submit pharmaceutical drugs or medical devices to regulatory bodies before marketing new technology. Regulatory bodies frequently require transparent and interpretable computational modelling to justify a new healthcare technology, but researchers may have several competing models for a biological system and too little data to discriminate between the models. In design of experiments for model discrimination, the goal is to design maximally informative physical experiments in order to discriminate between rival predictive models. Prior work has focused either on analytical approaches, which cannot manage all functions, or on data-driven approaches, which may have computational difficulties or lack interpretable marginal predictive distributions. We develop a methodology introducing Gaussian process surrogates in lieu of the original mechanistic models. We thereby extend existing design and model discrimination methods developed for analytical models to cases of non-analytical models in a computationally efficient manner.
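
    The core idea lends itself to a compact illustration. Below is a minimal, hypothetical sketch (not the authors' code): two toy functions stand in for rival mechanistic models, a Gaussian process surrogate is fitted to each, and candidate experiments are scored by the symmetrised Kullback-Leibler divergence between the surrogates' marginal predictive distributions, one common choice of divergence.

        # Minimal sketch of GP-surrogate-based design of experiments for model
        # discrimination. The "mechanistic models" f1 and f2 are hypothetical
        # stand-ins; in practice the GPs are trained on evaluations of the
        # actual rival models.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        def f1(x):  # rival model 1 (toy)
            return np.sin(3.0 * x)

        def f2(x):  # rival model 2 (toy)
            return x - 0.2 * x**3

        # One GP surrogate per mechanistic model, trained on a few evaluations.
        X_train = np.linspace(-2, 2, 8).reshape(-1, 1)
        gp1 = GaussianProcessRegressor().fit(X_train, f1(X_train).ravel())
        gp2 = GaussianProcessRegressor().fit(X_train, f2(X_train).ravel())

        def symmetric_kl(m1, s1, m2, s2):
            """Symmetrised KL divergence between N(m1, s1^2) and N(m2, s2^2)."""
            kl12 = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
            kl21 = np.log(s1 / s2) + (s2**2 + (m2 - m1)**2) / (2 * s1**2) - 0.5
            return kl12 + kl21

        # Score candidate experiments by the divergence between the surrogates'
        # marginal predictive distributions; the maximiser is the next experiment.
        X_cand = np.linspace(-2, 2, 201).reshape(-1, 1)
        m1, s1 = gp1.predict(X_cand, return_std=True)
        m2, s2 = gp2.predict(X_cand, return_std=True)
        s1, s2 = np.maximum(s1, 1e-6), np.maximum(s2, 1e-6)  # numerical floor
        best = X_cand[np.argmax(symmetric_kl(m1, s1, m2, s2)), 0]
        print("next experiment: x =", best)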

    GPdoemd: a Python package for design of experiments for model discrimination

    GPdoemd is an open-source Python package for design of experiments for model discrimination that uses Gaussian process surrogate models to approximate and maximise the divergence between the marginal predictive distributions of rival mechanistic models. GPdoemd uses the divergence prediction to suggest a maximally informative next experiment.
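
    In symbols, with rival models i = 1, ..., M and marginal predictive densities p_i(y | x) at a candidate experiment x, the suggested next experiment maximises the accumulated pairwise divergence (a schematic form of the criterion, not necessarily the package's exact objective):

        \[
          x^{*} = \arg\max_{x} \sum_{i < j} D\bigl( p_i(y \mid x) \,\big\|\, p_j(y \mid x) \bigr)
        \]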

    Design of dynamic experiments for black-box model discrimination

    Diverse domains of science and engineering require and use mechanistic mathematical models, e.g. systems of differential algebraic equations. Such models often contain uncertain parameters to be estimated from data. Consider a dynamic model discrimination setting where we wish to choose: (i) the best mechanistic, time-varying model and (ii) the best model parameter estimates. These tasks are often termed model discrimination/selection/validation/verification. Typically, several rival mechanistic models can explain data, so we incorporate available data and also run new experiments to gather more data. Design of dynamic experiments for model discrimination helps optimally collect data. For rival mechanistic models where we have access to gradient information, we extend existing methods to incorporate a wider range of problem uncertainty and show that our proposed approach is equivalent to historical approaches when limiting the types of considered uncertainty. We also consider rival mechanistic models as dynamic black boxes that we can evaluate, e.g. by running legacy code, but where gradient or other advanced information is unavailable. We replace these black-box models with Gaussian process surrogate models and thereby extend the model discrimination setting to additionally incorporate rival black-box models. We also explore the consequences of using Gaussian process surrogates to approximate gradient-based methods.
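
    As a point of reference, a representative classical criterion from the gradient-based literature this work extends (an illustration of the family, not the thesis's exact objective) scores a candidate input x by the squared prediction difference between rival models i and j, weighted by their combined predictive covariance:

        \[
          D_{ij}(x) = \bigl( f_i(x) - f_j(x) \bigr)^{\top} \bigl( \Sigma_i(x) + \Sigma_j(x) \bigr)^{-1} \bigl( f_i(x) - f_j(x) \bigr)
        \]

    Here f_i(x) is model i's prediction and \Sigma_i(x) collects measurement noise plus linearised (gradient-based) parameter uncertainty; substituting a Gaussian process surrogate's predictive mean and covariance for f_i and \Sigma_i gives the black-box variant.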

    Global expression mapping of mammalian genomes

    The aim of genome projects is to decipher all the information contained within the DNA of an organism and to study the way this information is processed in physiological processes. It is believed that more than 95% of the information content of the mammalian genome is represented in the protein coding sequences that make up only approximately 2% of the DNA sequence. Consequently, much effort is being invested in the study of coding sequences in the form of cDNA analysis. This thesis is concerned with the development of a new strategy for a highly parallel approach to analyse entire cDNA libraries. The strategy is based upon generating sufficient sequence information to uniquely identify more than 100,000 cDNA clones by hybridisation with short oligonucleotides, typically 7-10 mers. Each oligonucleotide is hybridised to all cDNA clones in parallel and under stringent conditions positively identifies a subset (3-10%) of clones. Oligonucleotides are designed in such a way that each will positively identify a different subset of clones, and statistical simulations estimate that approximately 200 such hybridisation events are required to uniquely identify up to 100,000 cDNA sequences. Such a fingerprint can be generated from many cDNA libraries constructed from different tissue mRNAs and will not only lead to the identification of most sequences expressed from the genome but also indicate the level of expression, by determining the number of times any given sequence is represented across different cDNA libraries. A human foetal brain cDNA library has been constructed and 100,000 clones arrayed into microtitre plates and on nylon membranes. All the required technological developments have been carried out successfully and are presented. In excess of 200 oligonucleotide hybridisations have been performed on a subset of 32,000 cDNA clones and 1,000 sequenced control clones. A detailed analysis of the data on the control clones is presented and the implications for cDNA fingerprinting discussed.
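
    The fingerprinting statistics admit a quick back-of-envelope check (an illustration under an independence assumption, not a calculation from the thesis): if each oligonucleotide independently identifies a fraction p of clones, two unrelated clones agree on a single hybridisation with probability p^2 + (1 - p)^2, so the expected number of indistinguishable pairs among N clones after n hybridisations can be estimated directly.

        # Back-of-envelope check of why ~200 hybridisations can uniquely
        # fingerprint ~100,000 cDNA clones, assuming independent probes each
        # positively identifying a fraction p of clones.
        from math import comb

        N = 100_000     # arrayed cDNA clones
        n_probes = 200  # hybridisation events

        for p in (0.03, 0.05, 0.10):           # hit rates in the quoted 3-10% range
            p_match = p**2 + (1 - p)**2        # one probe agrees by chance
            p_collide = p_match**n_probes      # all probes agree by chance
            expected = comb(N, 2) * p_collide  # expected indistinguishable pairs
            print(f"p = {p:.2f}: expected colliding pairs ~ {expected:.2g}")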

    On the Multivariate Analysis of Animal Networks


    Bayesian Test Design for Fault Detection and Isolation in Systems with Uncertainty

    Methods for Fault Detection and Isolation (FDI) in systems with uncertainty have been studied extensively due to the increasing value and complexity of the maintenance and operation of modern Cyber-Physical Systems (CPS). CPS are characterized by nonlinearity, environmental and system uncertainty, fault complexity and highly nonlinear fault propagation, which require advanced fault detection and isolation algorithms. Modern efforts therefore develop active FDI (methods that require system reconfiguration) based on information theory, designing tests rich in information for fault assessment. Information-based criteria for test design are often deployed as a Frequentist Optimal Experimental Design (FOED) problem, which utilizes the information matrix of the system. D- and Ds-optimality criteria for the information matrix have been used extensively in the literature, since they usually yield more robust test designs that are less susceptible to uncertainty. However, FOED methods provide only locally informative tests, as they find optimal solutions around a neighborhood of an anticipated set of values for system uncertainty and fault severity. On the other hand, Bayesian Optimal Experimental Design (BOED) overcomes the issue of local optimality by exploring the entire parameter space of a system. BOED can thus provide robust test designs for active FDI. The literature on BOED for FDI is limited and mostly examines the case of normally distributed parameter priors. In some cases, such as newly installed systems where existing knowledge about the parameters is limited, a more generalized inference can be derived by using uniform distributions as parameter priors. In BOED, an optimal design can be found by maximizing an expected utility based on observed data. There is a plethora of utility functions, but the choice of utility function impacts the robustness of the solution and the computational cost of BOED. For instance, BOED based on the Fisher information matrix can lead to an alphabetic criterion such as D- or Ds-optimality as the objective function, but this also increases the computational cost of optimization, since these criteria involve sensitivity analysis of the system model. On the other hand, when an observation-based method such as the Kullback-Leibler divergence from posterior to prior is used to make an inference on parameters, the expected utility calculations involve nested Monte Carlo calculations, which in turn affect computation time. The challenge in these approaches is to find an adequate but relatively low Monte Carlo sampling rate without introducing a significant bias in the result. Theory shows that for normally distributed parameter priors, the Kullback-Leibler divergence expected utility reduces to Bayesian D-optimality. Similarly, Bayesian Ds-optimality can be used when the parameter priors are normally distributed. In this thesis, we validate this theory on a three-tank system, using normally and uniformly distributed parameter priors to compare the Bayesian D-optimal design criterion and the Kullback-Leibler divergence expected utility. Nevertheless, there is no observation-based metric similar to Bayesian Ds-optimality when the parameter priors are not normally distributed. The main objective of this thesis is to derive an observation-based utility function similar to Ds-optimality that can be used even when the requirement for normally distributed priors is not met.
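
The nested Monte Carlo expected-utility calculation described above can be sketched compactly. The following toy example is hypothetical (a scalar measurement model, not the three-tank system): the outer loop draws parameters and observations, the inner loop estimates the marginal likelihood, and their averaged log-ratio approximates the expected Kullback-Leibler divergence from posterior to prior.

    # Nested Monte Carlo estimate of the expected KL divergence from
    # posterior to prior (expected information gain) for a toy scalar
    # measurement model y = g(theta, d) + noise with a uniform prior.
    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 0.1  # measurement noise std (assumed)

    def g(theta, d):
        return theta * np.exp(-d * theta)  # hypothetical response model

    def log_lik(y, theta, d):
        return -0.5 * ((y - g(theta, d)) / sigma)**2 - np.log(sigma * np.sqrt(2 * np.pi))

    def expected_utility(d, n_outer=500, n_inner=500):
        th_out = rng.uniform(0.5, 2.0, n_outer)  # outer prior draws
        y = g(th_out, d) + sigma * rng.standard_normal(n_outer)
        th_in = rng.uniform(0.5, 2.0, n_inner)   # inner prior draws
        ll = log_lik(y, th_out, d)               # log p(y | theta, d)
        # Inner Monte Carlo estimate of the marginal p(y | d):
        marg = np.array([np.mean(np.exp(log_lik(yi, th_in, d))) for yi in y])
        return np.mean(ll - np.log(marg))

    designs = np.linspace(0.1, 3.0, 15)
    print("most informative design:", max(designs, key=expected_utility))
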
We begin with a formal comparison of FOED and BOED for different objective metrics. We focus on the impact different utility functions have on the optimal design and their computation time. The value of BOED is illustrated using a variation of the benchmark three-tank system as a case study. At the same time, we present the deterministic variance of the optimal design for different utility functions for this case study. The performance of the various utility functions of BOED and the corresponding FOED optimal designs are compared in terms of Hellinger distance. The Hellinger distance is a distribution metric bounded between 0 and 1, where 0 indicates complete overlap of the distributions and 1 indicates no common points between the distributions. Analysis of the Hellinger distances calculated for the benchmark system shows that BOED designs can better separate the distributions of system measurements and, consequently, can classify the fault scenarios and the no-fault case with less uncertainty. When a uniform distribution is used as a parameter prior, the observation-based utility functions give better designs than FOED and Bayesian D-optimality, which use the Fisher information matrix. The observation-based method similar to Ds-optimality finds a better design than the observation-based method similar to D-optimality, but is computationally more expensive. The computational cost can be lowered by reducing the Monte Carlo sampling, but if the sampling rate is reduced too far, an uneven solution surface is created, affecting the FDI test design and assessment. Based on the results of this analysis, future research should focus on decreasing the computational cost without affecting the test design robustness.
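
For reference, the Hellinger distance between two univariate Gaussians has a closed form, which makes the separation scoring above easy to reproduce (the numbers below are hypothetical, not results from the case study):

    # Hellinger distance between N(m1, s1^2) and N(m2, s2^2); lies in [0, 1],
    # with 0 for identical distributions and values near 1 for little overlap.
    import numpy as np

    def hellinger_gauss(m1, s1, m2, s2):
        h2 = 1.0 - np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * \
             np.exp(-0.25 * (m1 - m2)**2 / (s1**2 + s2**2))
        return np.sqrt(h2)

    print(hellinger_gauss(0.0, 1.0, 0.0, 1.0))  # identical -> 0.0
    print(hellinger_gauss(0.0, 1.0, 4.0, 1.0))  # well separated -> ~0.93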

    Selective single molecule sensing in nanopore

    Nanopores have emerged as one of the most powerful tools for single-molecule detection, offering advantages such as label-free sensing and minimal sample preparation; this has stimulated many exciting applications in biophysics and molecular sensing. Sensing is achieved by electrophoretically driving biomolecules in solution through a nanometre-scale pore. The ability to detect and measure the abundance of many proteins precisely and simultaneously is important for predictive biological research. To date, quantification of multiple proteins in buffer or purified samples has been successfully demonstrated with ELISA or with fluorescent probes that tag specific molecules; however, these approaches rely on sophisticated and expensive diagnostic platforms as well as long assay times, which hinder simple, parallel screening of multiple proteins at the single-molecule level. Perhaps the biggest challenge for nanopore protein detection is the lack of selectivity, making it difficult to differentiate between multiple analytes, let alone to obtain meaningful data from complex biological samples such as human serum. There is therefore a need to develop strategies whereby such detection modalities can be used with unprocessed biological samples, where thousands of different background proteins exist, often at much higher concentrations than the target analyte, making detection exceptionally challenging. This thesis presents novel strategies that improve the selectivity of the nanopore, allowing efficient and accurate detection and analysis of biomolecules in solution. Specifically, multiple proteins are detected via aptamers attached to a DNA carrier, with particular utility for sensing in biological samples in a low-cost, scalable and sensitive manner. We were able to detect multiplexed proteins, differentiate proteins of different sizes and accurately locate the proteins bound to the carrier without the need for extensive sample preparation or amplification, allowing direct sensing of proteins in unmodified samples. The thesis also introduces a process for controlling the pore dimensions so that they match the size of the probe molecules, giving a good signal-to-noise ratio and enhanced sensing ability. We show that Al2O3 atomic layer deposition (ALD) on nanopipettes can reduce the pore diameter down to 7.5 nm while allowing batch production of reproducible pipettes; importantly, sensing ability was not affected by the ALD coating. A further strategy demonstrates precise opening of the nanopore by electroetching graphene nanoflakes (GNFs) coating the nanopipette; this pore-opening process enables in situ control of the nanopore size during DNA translocation, broadening the functionality of nanopore devices. Overall, the combined findings from these strategies provide considerable insight into the sensing and molecular biophysics of proteins and DNA at the single-molecule level. The use of an aptamer-modified carrier increases the sensitivity and selectivity of the nanopore platform and enables potential applications in sensing multiple protein biomarkers with single-molecule sensitivity.

    Genetic algorithm-neural network: feature extraction for bioinformatics data.

    With the advance of gene expression data in the bioinformatics field, the questions which frequently arise, for both computer and medical scientists, are which genes are significantly involved in discriminating cancer classes and which genes are significant with respect to a specific cancer pathology. Numerous computational analysis models have been developed to identify informative genes from microarray data; however, the integrity of the reported genes is still uncertain. This is mainly due to misconceptions about the objectives of microarray studies. Furthermore, the application of various preprocessing techniques to the microarray data has jeopardised its quality. As a result, the integrity of the findings has been compromised by the improper use of techniques and the ill-conceived objectives of the study. This research proposes an innovative hybridised model based on genetic algorithms (GAs) and artificial neural networks (ANNs) to extract the most differentially expressed genes for a specific cancer pathology. The proposed method can efficiently extract the informative genes from the original data set, reducing the gene variability errors incurred by the preprocessing techniques. The novelty of the research comes from two perspectives. Firstly, the research emphasises extracting informative features from a high-dimensional and highly complex data set, rather than improving classification results. Secondly, it uses an ANN to compute the fitness function of the GA, which is rare in the context of feature extraction. Two benchmark microarray data sets have been used to investigate the prominent genes expressed in tumour development, and the results show that the genes respond to different stages of tumourigenesis (i.e. different fitness precision levels), which may be useful for early malignancy detection. The extraction ability of the proposed model is validated against the expected results in synthetic data sets. In addition, two bioassay data sets have been used to examine the efficiency of the proposed model in extracting significant features from large, imbalanced bioassay data with multiple data representations.
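
    The hybrid scheme is easy to sketch. The following minimal example is illustrative only (synthetic data and arbitrary hyperparameters, not the thesis's configuration): a genetic algorithm evolves binary gene masks, and the fitness of each mask is the cross-validated accuracy of a small neural network trained on the selected genes.

        # GA for gene selection with an ANN-based fitness function (sketch).
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)

        # Synthetic "microarray": 60 samples x 40 genes, 3 informative genes.
        X = rng.standard_normal((60, 40))
        y = (X[:, [3, 17, 25]].sum(axis=1) > 0).astype(int)

        def fitness(mask):
            if mask.sum() == 0:
                return 0.0
            clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
            return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

        pop = rng.integers(0, 2, size=(12, X.shape[1]))  # random initial masks
        for gen in range(6):
            scores = np.array([fitness(m) for m in pop])
            parents = pop[np.argsort(scores)[-6:]]       # keep the fittest
            cuts = rng.integers(1, X.shape[1], size=6)   # one-point crossover
            kids = np.array([np.concatenate([parents[i][:c], parents[-1 - i][c:]])
                             for i, c in enumerate(cuts)])
            kids ^= (rng.random(kids.shape) < 0.02).astype(kids.dtype)  # mutate
            pop = np.vstack([parents, kids])

        best = pop[np.argmax([fitness(m) for m in pop])]
        print("selected genes:", np.flatnonzero(best))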