3,864 research outputs found

    Bridging the ensemble Kalman and particle filter

    In many applications of Monte Carlo nonlinear filtering, the propagation step is computationally expensive, and hence the sample size is limited. With small sample sizes, the update step becomes crucial. Particle filtering suffers from the well-known problem of sample degeneracy. Ensemble Kalman filtering avoids this, at the expense of treating non-Gaussian features of the forecast distribution incorrectly. Here we introduce a procedure that makes a continuous transition, indexed by a parameter gamma in [0,1], between the ensemble Kalman filter update and the particle filter update. We propose automatic choices of the parameter gamma such that the update stays as close as possible to the particle filter update while avoiding degeneracy. In various examples, we show that this procedure leads to updates that can handle non-Gaussian features of the prediction sample even in high-dimensional situations.
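The abstract does not spell out how gamma is chosen. A minimal sketch of one plausible reading, in which the likelihood is tempered by gamma and gamma is picked as the largest value that keeps the effective sample size (ESS) above a threshold; the ESS criterion, the function names, and the scalar Gaussian likelihood are all assumptions for illustration, not the paper's exact scheme:

```python
import numpy as np

def tempered_weights(particles, y, obs_std, gamma):
    """Importance weights with the likelihood tempered by gamma in [0, 1]."""
    log_lik = -0.5 * ((y - particles) / obs_std) ** 2
    w = np.exp(gamma * (log_lik - log_lik.max()))   # stabilized exponentiation
    return w / w.sum()

def choose_gamma(particles, y, obs_std, ess_frac=0.5,
                 grid=np.linspace(0.0, 1.0, 101)):
    """Largest gamma whose effective sample size stays above ess_frac * N.

    gamma = 1 recovers the particle filter weights; gamma = 0 leaves the
    weights uniform (the remaining correction would fall to the EnKF part).
    """
    n = len(particles)
    best = 0.0
    for g in grid:
        w = tempered_weights(particles, y, obs_std, g)
        ess = 1.0 / np.sum(w ** 2)                  # effective sample size
        if ess >= ess_frac * n:
            best = g
    return best
```

With a well-spread forecast sample the full particle weighting survives the ESS test, so gamma stays at 1; as the sample degenerates, the criterion pushes gamma toward 0.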

    Performance improvement via bagging in probabilistic prediction of chaotic time series using similarity of attractors and LOOCV predictable horizon

    Recently, we have presented a method for probabilistic prediction of chaotic time series. The method employs learning machines involving strong learners capable of making predictions with desirably long predictable horizons; however, the usual ensemble mean is not an effective representative prediction when some members have much shorter predictable horizons. The method therefore selects a representative prediction from the predictions generated by a number of learning machines involving strong learners as follows: first, it obtains plausible predictions whose attractors are highly similar to that of the training time series, and then it selects the representative prediction with the largest predictable horizon estimated via LOOCV (leave-one-out cross-validation). The method can also provide an average and/or safe estimate of the predictable horizon of the representative prediction. In our previous study, we used CAN2s (competitive associative nets) for learning piecewise linear approximations of nonlinear functions as strong learners; this paper employs bagging (bootstrap aggregating) to improve the performance, which enables us to analyze the validity and effectiveness of the method.
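As a rough illustration of selecting a representative prediction by LOOCV-estimated predictable horizon: each candidate is scored by how long it stays close to the other ensemble members, and the candidate with the largest mean horizon wins. The divergence threshold, the use of the other members as pseudo-references, and all names are assumptions for this sketch, not the authors' exact procedure:

```python
import numpy as np

def predictable_horizon(pred, ref, eps):
    """First time step at which |pred - ref| exceeds eps (else full length)."""
    diverged = np.abs(np.asarray(pred) - np.asarray(ref)) > eps
    return len(pred) if not diverged.any() else int(np.argmax(diverged))

def loocv_horizon(predictions, eps):
    """For each member, the mean horizon measured against all other members."""
    n = len(predictions)
    return [np.mean([predictable_horizon(predictions[i], predictions[j], eps)
                     for j in range(n) if j != i])
            for i in range(n)]

def select_representative(predictions, eps):
    """Index of the member with the largest LOOCV-estimated horizon."""
    return int(np.argmax(loocv_horizon(predictions, eps)))
```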

    Hierarchical Clustering of Ensemble Prediction Using LOOCV Predictable Horizon for Chaotic Time Series

    Recently, we have presented a method of ensemble prediction of chaotic time series. The method employs strong learners capable of making predictions with small error, where the usual ensemble mean does not work well owing to the long-term unpredictability of chaotic time series. Thus, we have developed a method to select a representative prediction from a set of plausible predictions by using a LOOCV (leave-one-out cross-validation) measure to estimate the predictable horizon. Although we have shown the effectiveness of the method, it sometimes fails to select a representative prediction with a long predictable horizon. To cope with this problem, this paper presents a method to select multiple candidates for the representative prediction by employing hierarchical K-means clustering with K = 2. Numerical experiments show the effectiveness of the method, together with an analysis of the properties of the LOOCV predictable horizon.
    The 2017 IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2017), November 27 to December 1, 2017, Honolulu, Hawaii, USA
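A hedged sketch of hierarchical clustering via repeated K-means with K = 2 (bisecting K-means), which is one standard way to realize the clustering the abstract names. The deterministic min/max centroid initialization and the size-based stopping rule are simplifications chosen for illustration, not details from the paper:

```python
import numpy as np

def two_means(X, n_iter=20):
    """Plain k-means with K = 2; returns one label (0 or 1) per row.

    Centroids start at the coordinate-wise min and max for determinism.
    """
    c = np.stack([X.min(axis=0), X.max(axis=0)]).astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        lab = d.argmin(axis=1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = X[lab == k].mean(axis=0)
    return lab

def bisecting_clusters(X, min_size=2):
    """Recursively split with 2-means until clusters reach min_size."""
    if len(X) <= min_size:
        return [X]
    lab = two_means(X)
    if lab.all() or not lab.any():      # split failed: keep as one cluster
        return [X]
    return (bisecting_clusters(X[lab == 0], min_size) +
            bisecting_clusters(X[lab == 1], min_size))
```

Applied to an ensemble of predicted trajectories (one row per prediction), each leaf cluster would contribute one candidate representative prediction.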

    Probabilistic Prediction of Chaotic Time Series Using Similarity of Attractors and LOOCV Predictable Horizons for Obtaining Plausible Predictions

    This paper presents a method for probabilistic prediction of chaotic time series. So far, we have developed several model selection methods for chaotic time series prediction, but those methods cannot estimate the predictable horizon of the predicted time series. Instead of using model selection methods that estimate the mean square prediction error (MSE), we present a method to obtain a probabilistic prediction which provides both a prediction of the time series and an estimate of its predictable horizon. The method obtains a set of plausible predictions by using the similarity between the attractor of the training time series and those of the time series predicted by a number of learning machines with different parameter values, and then obtains a smaller set of more plausible predictions with longer predictable horizons estimated by the LOOCV (leave-one-out cross-validation) method. The effectiveness and properties of the present method are shown by analyzing the results of numerical experiments.
    22nd International Conference, ICONIP 2015, November 9-12, 2015, Istanbul, Turkey
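The abstract does not define its attractor-similarity measure. One plausible stand-in is a symmetric mean nearest-neighbour distance between delay-embedded point clouds; the embedding parameters, the distance itself, and all names below are assumptions for illustration, not the authors' definition:

```python
import numpy as np

def delay_embed(x, dim=3, tau=1):
    """Delay-coordinate embedding of a scalar series into R^dim."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i:i + n] for i in range(0, dim * tau, tau)], axis=1)

def attractor_distance(x, y, dim=3, tau=1):
    """Symmetric mean nearest-neighbour distance between two embedded clouds.

    Small values mean the two series trace out similar attractors.
    """
    A = delay_embed(np.asarray(x, dtype=float), dim, tau)
    B = delay_embed(np.asarray(y, dtype=float), dim, tau)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Predictions whose attractor distance to the training series is below a threshold would form the "plausible" set passed on to the LOOCV step.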

    NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

    Computer-aided drug design (CADD) has become an indispensable component of modern drug discovery projects. The prediction of physicochemical and pharmacological properties of candidate compounds effectively increases the probability that drug candidates pass later phases of clinical trials. Ligand-based virtual screening exhibits advantages over structure-based drug design in terms of its wide applicability and high computational efficiency. The established chemical repositories and reported bioassays form a gigantic knowledge base from which to derive quantitative structure-activity relationships (QSAR) and structure-property relationships (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), was reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity of cannabinoid receptor subtypes, and blood-brain-barrier (BBB) passage. LiCABEDS was implemented and integrated with a graphical user interface, data import/export, automated model training/prediction, and project management. Furthermore, a non-linear ligand classifier was proposed, using a novel Topomer kernel function in a support vector machine. With the emphasis on green high-performance computing, graphics processing units are alternative platforms for computationally expensive tasks. A novel GPU algorithm was designed and implemented to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported to construct a structurally diverse screening library in order to enhance hit rates in high-throughput screening.
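Chemical similarity on binary fingerprints is conventionally the Tanimoto coefficient, |A AND B| / |A OR B|. Below is a CPU sketch of the all-pairs computation that a GPU kernel would parallelize; NumPy matrix products stand in for the thesis's actual GPU implementation, which is not reproduced here:

```python
import numpy as np

def tanimoto_matrix(fps_a, fps_b):
    """All-pairs Tanimoto similarity for dense 0/1 fingerprint matrices.

    fps_a: (n_a, n_bits) matrix of 0/1 entries; fps_b: (n_b, n_bits).
    Returns an (n_a, n_b) similarity matrix.
    """
    fps_a = fps_a.astype(np.int32)
    fps_b = fps_b.astype(np.int32)
    inter = fps_a @ fps_b.T                 # |A AND B| via dot product
    pa = fps_a.sum(axis=1)[:, None]         # popcount of each row of A
    pb = fps_b.sum(axis=1)[None, :]         # popcount of each row of B
    union = pa + pb - inter                 # |A OR B|
    # Guard against empty fingerprints (union == 0)
    return np.where(union > 0, inter / np.maximum(union, 1), 0.0)
```

The dense matrix-product formulation is exactly what maps well onto GPU hardware, since every pair's intersection count is independent.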

    Discovery of Novel Glycogen Synthase Kinase-3beta Inhibitors: Molecular Modeling, Virtual Screening, and Biological Evaluation

    Glycogen synthase kinase-3 (GSK-3) is a multifunctional serine/threonine protein kinase engaged in a variety of signaling pathways and regulating a wide range of cellular processes. Owing to its distinct regulation mechanism and unique substrate specificity in the molecular pathogenesis of human diseases, GSK-3 is one of the most attractive therapeutic targets for pathologies with unmet treatments, including type-II diabetes, cancers, inflammation, and neurodegenerative disease. Recent advances in drug discovery targeting GSK-3 have involved extensive computational modeling techniques. Both ligand-based and structure-based approaches have been well explored to design ATP-competitive inhibitors. Molecular modeling plus dynamics simulations can provide insight into the protein-substrate and protein-protein interactions at the substrate binding pocket and the C-lobe hydrophobic groove, which will benefit the discovery of non-ATP-competitive inhibitors. To identify structurally novel and diverse compounds that effectively inhibit GSK-3β, we performed virtual screening by implementing a mixed ligand/structure-based approach, which included pharmacophore modeling, diversity analysis, and ensemble docking. The sensitivities of different docking protocols to induced-fit effects at the ATP-competitive binding pocket of GSK-3β were explored. An enrichment study was employed to verify the robustness of ensemble docking compared to individual docking in terms of retrieving active compounds from a decoy dataset. A total of 24 structurally diverse compounds obtained from the virtual screening experiment underwent biological validation. The bioassay results show that 15 of the 24 hit compounds are indeed GSK-3β inhibitors, and among them, one compound exhibiting sub-micromolar inhibitory activity is a reasonable starting point for further optimization.
To further identify structurally novel GSK-3β inhibitors, we performed virtual screening by implementing another mixed ligand-based/structure-based approach, which included quantitative structure-activity relationship (QSAR) analysis and docking prediction. To integrate and analyze complex data sets from multiple experimental sources, we developed and validated hierarchical QSAR, which adopts a multi-level structure to take data heterogeneity into account. A collection of 728 GSK-3 inhibitors with diverse structural scaffolds was obtained from published papers of 7 research groups based on different experimental protocols. Support vector machines and random forests were implemented with wrapper-based feature selection algorithms to construct predictive learning models. The best models for each single group of compounds were then selected, based on both internal and external validation, and used to build the final hierarchical QSAR model. The predictive performance of the hierarchical QSAR model is demonstrated by an overall R² of 0.752 for the 141 compounds in the test set. The compounds obtained from the virtual screening experiment underwent biological validation. The bioassay results confirmed that 2 hit compounds are indeed GSK-3β inhibitors exhibiting sub-micromolar inhibitory activity, thereby validating hierarchical QSAR as an effective approach for virtual screening experiments. We have also successfully implemented a variant of supervised learning, named multiple-instance learning, to predict the bioactive conformers of a given molecule that are responsible for the observed biological activity. The implementation requires instance-based embedding, and joint feature selection and classification. The goal of the present project is to implement multiple-instance learning in drug activity prediction, and subsequently to identify the bioactive conformers for each molecule.
The proposed approach was shown not to suffer from overfitting and to be highly competitive with classical predictive models, making it well suited to drug activity prediction. The approach was also validated as a useful method for the pursuit of bioactive conformers.
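A toy sketch of a two-level (hierarchical) QSAR model in the spirit described above: one base model per data source, combined by a second-level model fitted to the pooled data. Closed-form ridge regression stands in for the SVM/random-forest base learners, and the stacking-style combination is an assumption about how the levels interact, not the thesis's exact construction:

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with a bias term (stand-in base learner)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def fit_hierarchical(groups, lam=1.0):
    """Level 1: one model per data source. Level 2: combine their outputs.

    groups maps a source name to its (X, y) pair, one per research group.
    """
    base = {g: fit_ridge(X, y, lam) for g, (X, y) in groups.items()}
    X_all = np.vstack([X for X, _ in groups.values()])
    y_all = np.concatenate([y for _, y in groups.values()])
    # Each base model's predictions on the pooled data become meta-features
    meta_X = np.column_stack([predict(w, X_all) for w in base.values()])
    meta = fit_ridge(meta_X, y_all, lam)
    return base, meta

def predict_hierarchical(model, X):
    base, meta = model
    meta_X = np.column_stack([predict(w, X) for w in base.values()])
    return predict(meta, meta_X)
```

The point of the two levels is that each source's systematic bias is absorbed by its own base model, while the top level learns how much to trust each source.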

    Physically inspired methods and development of data-driven predictive systems.

    Traditionally, building predictive models is perceived as a combination of science and art. Although the designer of a predictive system effectively follows a prescribed procedure, their domain knowledge, as well as expertise and intuition in the field of machine learning, is often irreplaceable. However, in many practical situations it is possible to build well-performing predictive systems by following a rigorous methodology and offsetting not only the lack of domain knowledge but also a partial lack of expertise and intuition with computational power. The generalised predictive model development cycle discussed in this thesis is an example of such a methodology which, despite being computationally expensive, has been successfully applied to real-world problems. The proposed predictive system design cycle is a purely data-driven approach. The quality of the data used to build the system is thus of crucial importance. In practice, however, the data is rarely perfect. Common problems include missing values, high dimensionality, or a very limited amount of labelled exemplars. To address these issues, this work investigated and exploited inspirations coming from physics. The novel use of well-established physical models, in the form of potential fields, has resulted in the derivation of a comprehensive Electrostatic Field Classification Framework for supervised and semi-supervised learning from incomplete data. Although computational power constantly becomes cheaper and more accessible, it is not infinite. Therefore, efficient techniques able to exploit the finite predictive information content of the data and limit the computational requirements of the resource-hungry predictive system design procedure are very desirable. In designing such techniques, this work once again investigated and exploited inspirations coming from physics.
By using an analogy with a set of interacting particles and the resulting Information Theoretic Learning framework, the Density Preserving Sampling technique has been derived. This technique acts as a computationally efficient alternative to cross-validation and fits well within the proposed methodology. All methods derived in this thesis have been thoroughly tested on a number of benchmark datasets. The proposed generalised predictive model design cycle has been successfully applied to two real-world environmental problems, in which a comparative study of Density Preserving Sampling and cross-validation was also performed, confirming the great potential of the proposed methods.
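The abstract does not give the Density Preserving Sampling algorithm. One simple way to halve a dataset while preserving its density is to pair each point with its nearest unassigned neighbour and send one of each pair to each half, so both halves sample every dense region equally; this pairing scheme and all names are assumptions for illustration, not the thesis's exact derivation:

```python
import numpy as np

def density_preserving_split(X):
    """Split X into two halves by pairing nearest unassigned neighbours.

    Each pair of mutually close points contributes one member to each half,
    so the two halves trace out similar density profiles.
    """
    remaining = set(range(len(X)))
    half_a, half_b = [], []
    while len(remaining) > 1:
        i = min(remaining)
        remaining.remove(i)
        rest = sorted(remaining)
        d = np.linalg.norm(X[rest] - X[i], axis=1)
        j = rest[int(np.argmin(d))]        # nearest remaining neighbour of i
        remaining.remove(j)
        half_a.append(i)
        half_b.append(j)
    if remaining:                          # odd leftover point
        half_a.append(remaining.pop())
    return sorted(half_a), sorted(half_b)
```

Either half can then serve as a validation set in a single pass, which is where the claimed efficiency advantage over repeated cross-validation folds would come from.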

    ADVANCES IN SYSTEM RELIABILITY-BASED DESIGN AND PROGNOSTICS AND HEALTH MANAGEMENT (PHM) FOR SYSTEM RESILIENCE ANALYSIS AND DESIGN

    Failures of engineered systems can lead to significant economic and societal losses. Despite tremendous efforts (e.g., $200 billion annually) devoted to reliability and maintenance, unexpected catastrophic failures still occur. To minimize the losses, the reliability of engineered systems must be ensured throughout their life-cycle amidst uncertain operational conditions and manufacturing variability. In most engineered systems, the required system reliability level under adverse events is achieved by adding system redundancies and/or conducting system reliability-based design optimization (RBDO). However, a high level of system redundancy increases a system's life-cycle cost (LCC), and system RBDO cannot ensure system reliability when unexpected loading/environmental conditions are applied and unexpected system failures develop. In contrast, a new design paradigm, referred to as resilience-driven system design, can ensure highly reliable system designs under any loading/environmental conditions and system failures while considerably reducing a system's LCC. To facilitate the development of formal methodologies for this design paradigm, this research aims at advancing two essential and co-related research areas: Research Thrust 1 - system RBDO, and Research Thrust 2 - system prognostics and health management (PHM). In Research Thrust 1, reliability analyses under uncertainty will be carried out at both the component and system levels against critical failure mechanisms. In Research Thrust 2, highly accurate and robust PHM systems will be designed for engineered systems with a single time-scale or multiple time-scales. To demonstrate the effectiveness of the proposed system RBDO and PHM techniques, multiple engineering case studies will be presented and discussed.
Following the development of Research Thrusts 1 and 2, Research Thrust 3 - resilience-driven system design - will establish a theoretical basis and design framework for engineering resilience in a mathematical and statistical context, where engineering resilience will be formulated in terms of system reliability and restoration, and the proposed design framework will be demonstrated with a simplified aircraft control actuator design problem.