
    Cross-entropy optimisation of importance sampling parameters for statistical model checking

    Statistical model checking avoids the exponential growth of states associated with probabilistic model checking by estimating properties from multiple executions of a system and by giving results within confidence bounds. Rare properties are often very important but pose a particular challenge for simulation-based approaches; hence a key objective under these circumstances is to reduce the number and length of simulations necessary to produce a given level of confidence. Importance sampling is a well-established technique that achieves this; however, to maintain the advantages of statistical model checking it is necessary to find good importance sampling distributions without considering the entire state space. Motivated by the above, we present a simple algorithm that uses the notion of cross-entropy to find the optimal parameters for an importance sampling distribution. In contrast to previous work, our algorithm uses a low-dimensional vector of parameters to define this distribution and thus avoids the often intractable explicit representation of a transition matrix. We show that our parametrisation leads to a unique optimum and can produce many orders of magnitude improvement in simulation efficiency. We demonstrate the efficacy of our methodology by applying it to models from reliability engineering and biochemistry. Comment: 16 pages, 8 figures, LNCS style.
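    The central step of such methods, re-estimating a low-dimensional parameter of the sampling distribution from the simulations that reach the rare event, weighted by their likelihood ratios, can be illustrated on a toy rare-event problem. The sketch below is a minimal cross-entropy iteration for estimating P(X >= gamma) under an exponential model; the distribution family, the threshold, and the elite-quantile fallback are assumptions for illustration, not the parametrisation used in the paper.

```python
import numpy as np

# Minimal cross-entropy sketch (illustrative only): estimate the rare-event
# probability P(X >= gamma) for X ~ Exponential(rate=1) by tilting the rate.
# The exponential family and the threshold 'gamma' are assumptions for this
# example, not the models considered in the paper.

rng = np.random.default_rng(0)
gamma = 20.0          # rare-event threshold (hypothetical)
lam_nominal = 1.0     # rate of the original (nominal) distribution
lam = lam_nominal     # current importance-sampling parameter
n = 10_000            # samples per cross-entropy iteration

for _ in range(10):
    x = rng.exponential(scale=1.0 / lam, size=n)
    hit = x >= gamma
    if not hit.any():
        # No sample reached the rare set yet: tilt towards the current
        # best (elite) samples instead, as in multilevel cross-entropy.
        hit = x >= np.quantile(x, 0.99)
    # Likelihood ratio of the nominal density over the sampling density.
    w = (lam_nominal * np.exp(-lam_nominal * x)) / (lam * np.exp(-lam * x))
    # Cross-entropy update for the exponential family:
    # new rate = weighted count / weighted sum of elite samples.
    lam = w[hit].sum() / (w[hit] * x[hit]).sum()

# Final importance-sampling estimate of P(X >= gamma) under the tuned rate.
x = rng.exponential(scale=1.0 / lam, size=n)
w = (lam_nominal * np.exp(-lam_nominal * x)) / (lam * np.exp(-lam * x))
p_hat = np.mean(w * (x >= gamma))
print(f"lambda* = {lam:.4f}, estimate = {p_hat:.3e}, exact = {np.exp(-gamma):.3e}")
```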

    Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art

    Stochasticity is a key characteristic of intracellular processes such as gene regulation and chemical signalling. Therefore, characterising stochastic effects in biochemical systems is essential to understand the complex dynamics of living things. Mathematical idealisations of biochemically reacting systems must be able to capture stochastic phenomena. While robust theory exists to describe such stochastic models, the computational challenges in exploring these models can be a significant burden in practice since realistic models are analytically intractable. Determining the expected behaviour and variability of a stochastic biochemical reaction network requires many probabilistic simulations of its evolution. Using a biochemical reaction network model to assist in the interpretation of time course data from a biological experiment is an even greater challenge due to the intractability of the likelihood function for determining observation probabilities. These computational challenges have been subjects of active research for over four decades. In this review, we present an accessible discussion of the major historical developments and state-of-the-art computational techniques relevant to simulation and inference problems for stochastic biochemical reaction network models. Detailed algorithms for particularly important methods are described and complemented with MATLAB implementations. As a result, this review provides a practical and accessible introduction to computational methods for stochastic models within the life sciences community.
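    As a concrete anchor for the simulation side of this topic, below is a minimal Python sketch of Gillespie's direct method (the exact stochastic simulation algorithm) applied to a hypothetical birth-death gene-expression model. The network and rate constants are illustrative assumptions; the review itself supplies MATLAB implementations of the methods it covers.

```python
import numpy as np

# Sketch of Gillespie's direct method for the toy network
#   0 -> X (rate k1),   X -> 0 (rate k2 * X).
# The model and parameter values are illustrative assumptions only.

def gillespie_birth_death(k1=10.0, k2=0.1, x0=0, t_end=100.0, seed=0):
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_end:
        a1, a2 = k1, k2 * x          # reaction propensities
        a0 = a1 + a2                 # total propensity
        if a0 == 0.0:
            break                    # no reaction can fire
        t += rng.exponential(1.0 / a0)       # time to the next reaction
        if rng.random() < a1 / a0:           # choose which reaction fires
            x += 1                           # production event
        else:
            x -= 1                           # degradation event
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

t, x = gillespie_birth_death()
print(f"final copy number: {x[-1]} (stationary mean is k1/k2 = 100)")
```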

    Making inferences with small numbers of training sets

    A potential methodological problem with empirical studies that assess project effort prediction systems is discussed. Frequently, a hold-out strategy is deployed so that the data set is split into a training and a validation set. Inferences are then made concerning the relative accuracy of the different prediction techniques under examination. This is typically done on very small numbers of sampled training sets. It is shown that such studies can lead to almost random results (particularly where relatively small effects are being studied). To illustrate this problem, two data sets are analysed using a configuration problem for case-based prediction and results generated from 100 training sets. This enables results to be produced with quantified confidence limits. From this it is concluded that in both cases using fewer than five training sets leads to untrustworthy results, and that ideally more than 20 sets should be deployed. Unfortunately, this raises a question over a number of empirical validations of prediction techniques, so it is suggested that further research is needed as a matter of urgency.
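    The repeated hold-out idea is easy to reproduce: compare two predictors over many random training/validation splits and observe how unstable the accuracy difference is when only a handful of splits are used. The synthetic data and the two simple predictors in the sketch below are assumptions for illustration, not the case-based configuration studied in the paper.

```python
import numpy as np

# Repeated hold-out sketch: how variable is the accuracy difference between
# two effort predictors when few vs. many training sets are sampled?
# The synthetic data and both predictors are hypothetical.

rng = np.random.default_rng(1)
n = 60
size = rng.uniform(1, 100, n)                 # project "size" feature
effort = 5 * size + rng.normal(0, 50, n)      # noisy "effort" outcome

def mae_differences(n_splits):
    """MAE difference (mean model minus linear model) over random
    2/3 training vs. 1/3 validation splits."""
    diffs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        train, test = idx[:40], idx[40:]
        # Technique A: predict the training-set mean effort.
        pred_a = np.full(test.size, effort[train].mean())
        # Technique B: ordinary least squares on size.
        slope, intercept = np.polyfit(size[train], effort[train], 1)
        pred_b = intercept + slope * size[test]
        diffs.append(np.mean(np.abs(effort[test] - pred_a)) -
                     np.mean(np.abs(effort[test] - pred_b)))
    return np.array(diffs)

for n_splits in (3, 20, 100):
    d = mae_differences(n_splits)
    print(f"{n_splits:3d} splits: mean diff = {d.mean():6.1f}, "
          f"std of diff = {d.std(ddof=1):5.1f}")
```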

    Constraining the Mass Profiles of Stellar Systems: Schwarzschild Modeling of Discrete Velocity Datasets

    (ABRIDGED) We present a new Schwarzschild orbit-superposition code designed to model discrete datasets composed of velocities of individual kinematic tracers in a dynamical system. This constitutes an extension of previous implementations that can only address continuous data in the form of (the moments of) velocity distributions, thus avoiding potentially important losses of information due to data binning. Furthermore, the code can handle any combination of available velocity components, i.e., only line-of-sight velocities, only proper motions, or a combination of both. It can also handle a combination of discrete and continuous data. The code finds the distribution function (DF, a function of the three integrals of motion E, Lz, and I3) that best reproduces the available kinematic and photometric observations in a given axisymmetric gravitational potential. The fully numerical approach ensures considerable freedom on the form of the DF f(E,Lz,I3). This allows a very general modeling of the orbital structure, thus avoiding restrictive assumptions about the degree of (an)isotropy of the orbits. We describe the implementation of the discrete code and present a series of tests of its performance based on the modeling of simulated datasets generated from a known DF. We find that the discrete Schwarzschild code recovers the original orbital structure, M/L ratios, and inclination of the input datasets to satisfactory accuracy, as quantified by various statistics. The code will be valuable, e.g., for modeling stellar motions in Galactic globular clusters, and those of individual stars, planetary nebulae, or globular clusters in nearby galaxies. This can shed new light on the total mass distributions of these systems, with central black holes and dark matter halos being of particular interest. Comment: ApJ, in press; 51 pages, 11 figures; manuscript revised following comments by referee.
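    The essence of the discrete-data extension, fitting orbit weights directly to individually measured velocities rather than to binned velocity moments, can be sketched with a toy weighted-mixture fit. The Gaussian "orbit" velocity distributions and the EM-style weight update below are illustrative assumptions, not the Schwarzschild machinery of the paper.

```python
import numpy as np

# Toy sketch: each "orbit" k contributes a predicted line-of-sight velocity
# distribution p_k(v); find non-negative weights w_k (summing to 1) that
# maximise the likelihood of individually measured velocities, with no binning.
# The Gaussian components and the EM-style update are hypothetical.

rng = np.random.default_rng(2)

# Hypothetical orbit library: 3 components with different mean/dispersion (km/s).
mus = np.array([-80.0, 0.0, 80.0])
sigmas = np.array([30.0, 60.0, 30.0])
true_w = np.array([0.2, 0.5, 0.3])

# Simulated discrete dataset: one line-of-sight velocity per tracer star.
comp = rng.choice(3, size=500, p=true_w)
v_obs = rng.normal(mus[comp], sigmas[comp])

def gauss(v, mu, sig):
    return np.exp(-0.5 * ((v - mu) / sig) ** 2) / (np.sqrt(2 * np.pi) * sig)

# p[i, k] = likelihood of star i's velocity under orbit component k.
p = gauss(v_obs[:, None], mus[None, :], sigmas[None, :])

# EM-style iteration for the mixture weights (maximises the discrete likelihood).
w = np.full(3, 1.0 / 3.0)
for _ in range(200):
    resp = w * p
    resp /= resp.sum(axis=1, keepdims=True)   # responsibilities per star
    w = resp.mean(axis=0)                     # updated orbit weights

print("recovered weights:", np.round(w, 3), " true weights:", true_w)
```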

    SIMULATED MAXIMUM LIKELIHOOD FOR DOUBLE-BOUNDED REFERENDUM MODELS

    Although joint estimation of referendum-type contingent valuation (CV) survey responses using maximum-likelihood models is preferred to single-equation estimation, it has been largely disregarded because estimation involves evaluating multivariate normal probabilities. New developments in the construction of probability simulators have addressed this problem, and simulated maximum likelihood (SML) for multiple-good models is now possible. This analysis applies SML to a three-good model under a double-bounded questioning format. Results indicate that joint estimation substantially improves the variances of the parameters and willingness-to-pay estimates.
    Keywords: Research Methods / Statistical Methods
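    The computational bottleneck described here, evaluating multivariate normal probabilities inside the likelihood, is exactly what probability simulators address. Below is a sketch of a GHK-style simulator for rectangle probabilities of a multivariate normal; the covariance matrix and bounds are illustrative assumptions, not the estimated three-good model of the paper.

```python
import numpy as np
from scipy.stats import norm

# GHK-style probability simulator: the kind of building block that makes
# simulated maximum likelihood feasible when several correlated normal
# responses must be evaluated jointly. Covariance and bounds are hypothetical.

def ghk_probability(a, b, cov, n_draws=5000, seed=0):
    """Simulate P(a <= X <= b) for X ~ N(0, cov) with the GHK simulator."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(cov)          # lower-triangular factor, X = L @ e
    d = len(a)
    e = np.zeros((n_draws, d))
    weight = np.ones(n_draws)
    for j in range(d):
        drift = e[:, :j] @ L[j, :j]      # contribution of earlier components
        lo = norm.cdf((a[j] - drift) / L[j, j])
        hi = norm.cdf((b[j] - drift) / L[j, j])
        weight *= hi - lo                # conditional interval probability
        u = rng.uniform(size=n_draws)
        # Draw e_j from the truncated standard normal on that interval
        # (clipped away from 0 and 1 for numerical safety).
        q = np.clip(lo + u * (hi - lo), 1e-12, 1 - 1e-12)
        e[:, j] = norm.ppf(q)
    return weight.mean()

# Example: trivariate normal with correlated errors, as in a three-good setting.
cov = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.4],
                [0.3, 0.4, 1.0]])
a = np.array([-np.inf, 0.0, -1.0])
b = np.array([1.0, np.inf, 2.0])
print(f"simulated probability: {ghk_probability(a, b, cov):.4f}")
```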