
    Fast methods for training Gaussian processes on large data sets

    Gaussian process regression (GPR) is a non-parametric Bayesian technique for interpolating or fitting data. The main barrier to further uptake of this powerful tool rests in the computational costs associated with the matrices which arise when dealing with large data sets. Here, we derive some simple results which we have found useful for speeding up the learning stage in the GPR algorithm, and especially for performing Bayesian model comparison between different covariance functions. We apply our techniques to both synthetic and real data and quantify the speed-up relative to using nested sampling to numerically evaluate model evidences.
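
    As a point of reference for the comparison described above, the sketch below (a minimal illustration, not the paper's speed-up technique) evaluates the GP log marginal likelihood, i.e. the model evidence at fixed hyperparameters, for two candidate squared-exponential kernels via a Cholesky factorisation; the O(n^3) factorisation is the cost bottleneck the abstract refers to. The kernels, data and noise level are invented for illustration.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance; one candidate covariance function."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def log_marginal_likelihood(x, t, kernel, noise_var=1e-2):
    """log p(t | x, kernel): the (log) model evidence at fixed hyperparameters."""
    n = len(x)
    K = kernel(x, x) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                         # the O(n^3) bottleneck
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t))
    return (-0.5 * t @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2 * np.pi))

# Toy comparison of two candidate covariance functions on synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 200)
t = np.sin(x) + 0.1 * rng.standard_normal(x.size)
lml_short = log_marginal_likelihood(x, t, lambda a, b: rbf_kernel(a, b, 0.5))
lml_long = log_marginal_likelihood(x, t, lambda a, b: rbf_kernel(a, b, 2.0))
print(lml_short, lml_long)    # the higher value indicates the better-supported kernel
```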

    Comparison between Suitable Priors for Additive Bayesian Networks

    Additive Bayesian networks (ABNs) are graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of prior for the parameters is of crucial importance. If an inadequate prior, such as one that is too weakly informative, is used, data separation and data sparsity lead to issues in the model selection process. In this work, a simulation study comparing two weakly informative priors and one strongly informative prior is presented. The first weakly informative prior is a zero-mean Gaussian with a large variance, as currently implemented in the R package abn. The second is a Student's t prior specifically designed for logistic regression; the strongly informative prior is again Gaussian, with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We also create a simulation study to illustrate Lindley's paradox arising from the prior choice. We then conclude by highlighting the good performance of the informative Student's t prior and the limited impact of Lindley's paradox. Finally, suggestions for further developments are provided.
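
    A minimal sketch of the kind of comparison being described, in Python rather than the R package abn, with invented data and prior settings (the t degrees of freedom and all scales are assumptions): MAP estimates of a single logistic-regression coefficient under a weak zero-mean Gaussian prior, a Student's t prior, and a strongly informative Gaussian prior centred on an assumed true value, on a perfectly separated toy data set where the unpenalised estimate diverges.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, t as student_t

# Perfectly separated toy data: the unpenalised MLE for beta is infinite.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

def neg_log_posterior(beta, log_prior):
    s = 2 * y - 1                                          # recode labels as +/-1
    log_lik = -np.sum(np.logaddexp(0.0, -s * beta * x))    # stable Bernoulli log-likelihood
    return -(log_lik + log_prior(beta))

priors = {
    "weak zero-mean Gaussian (sd = 10)": lambda b: norm.logpdf(b, loc=0, scale=10),
    "Student t (df = 7, scale = 2.5)":   lambda b: student_t.logpdf(b, df=7, scale=2.5),
    "strong Gaussian at assumed truth":  lambda b: norm.logpdf(b, loc=1.0, scale=0.5),
}
for name, log_prior in priors.items():
    res = minimize_scalar(neg_log_posterior, bounds=(-50, 50),
                          args=(log_prior,), method="bounded")
    print(f"{name}: MAP beta = {res.x:.2f}")
```

    Under separation, the weak Gaussian prior lets the estimate run to a large value, while the heavier-tailed t prior and the strongly informative prior keep it finite and moderate, which is the kind of behaviour the simulation study compares.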

    Four reasons to prefer Bayesian analyses over significance testing

    Inference using significance testing and Bayes factors is compared and contrasted in five case studies based on real research. The first study illustrates that the methods will often agree, both in motivating researchers to conclude that H1 is supported better than H0, and the other way round, that H0 is better supported than H1. The next four, however, show that the methods will also often disagree. In these cases, the aim of the paper is to motivate the sensible evidential conclusion and then see which approach matches that intuition. Specifically, it is shown that a high-powered non-significant result is consistent with no evidence for H0 over H1 worth mentioning, which a Bayes factor can show, and, conversely, that a low-powered non-significant result is consistent with substantial evidence for H0 over H1, again indicated by Bayesian analyses. The fourth study illustrates that a high-powered significant result may not amount to any evidence for H1 over H0, matching the Bayesian conclusion. Finally, the fifth study illustrates that different theories can be evidentially supported to different degrees by the same data, a fact that P-values cannot reflect but Bayes factors can. It is argued that appropriate conclusions match the Bayesian inferences, but not those based on significance testing, where the two disagree.
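
    One simple way to compute such a Bayes factor (not necessarily the paper's exact calculations) is to approximate the likelihood of the observed effect as normal and model H1's predictions as a half-normal. The sketch below uses that construction to illustrate the fifth point: the same data, and hence the same P-value, can support two theories with different predicted effect sizes to different degrees. All numbers are invented for illustration.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def bf10(obs, se, h1_scale):
    """Bayes factor for H1 over H0, with H1's predicted effect modelled as a
    half-normal of scale h1_scale and the likelihood approximated as normal."""
    def integrand(delta):
        return norm.pdf(obs, loc=delta, scale=se) * 2 * norm.pdf(delta, loc=0, scale=h1_scale)
    marginal_h1, _ = quad(integrand, 0, np.inf)    # p(data | H1)
    return marginal_h1 / norm.pdf(obs, loc=0, scale=se)   # divided by p(data | H0)

# Same observed effect and standard error (so the same P-value), but two
# theories predicting different effect sizes receive different evidence:
print(bf10(obs=1.0, se=0.5, h1_scale=0.5))   # theory predicting modest effects
print(bf10(obs=1.0, se=0.5, h1_scale=5.0))   # theory predicting a wide range of effects
```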

    Risk-Driven Design Processes: Balancing Efficiency with Resilience in Product Design

    Current design methods and approaches focus on increasing the efficiency of the product design system by, for example, eliminating waste and focusing on value creation. However, continuing failures in the development of complex, large-scale products and systems point towards weaknesses in the existing approaches. We argue that product development organizations are hindered by the many uncertainties that are inherent in the process. Common management heuristics ignore uncertainty and thus overly simplify the decision-making process. Creating transparency regarding uncertainties and the associated risks (i.e. the effect of uncertainties on design objectives) is not seen as an explicit priority. Consequently, organizations are unable to balance risk and return in their development choices. Product development processes do not emphasize reduction of risks, particularly those risks that are apparent early in the process. In addition, the resilience of the product development (PD) system, i.e. its ability to deliver on-target results under uncertainty, is not deliberately designed to match the level of residual uncertainty. This chapter introduces the notion of Risk-Driven Design and its four principles: 1. creating transparency regarding design risks; 2. risk-driven decision making; 3. minimizing uncertainty; and 4. creating resilience.
    Massachusetts Institute of Technology. Lean Advancement Initiative; Center for Clean Water and Clean Energy at MIT and KFUP

    Estimating the Under-Five Mortality Rate Using a Bayesian Hierarchical Time Series Model

    Background: Millennium Development Goal 4 calls for a reduction in the under-five mortality rate by two-thirds between 1990 and 2015, which corresponds to an annual rate of decline of 4.4%. The United Nations Inter-Agency Group for Child Mortality Estimation estimates under-five mortality in every country to measure progress. For the majority of countries, the estimates within a country are based on the assumption of a piece-wise constant rate of decline. Methods and Findings: This paper proposes an alternative method for estimating under-five mortality in which the underlying rate of change is allowed to vary smoothly over time using a time series model. Information about the average rate of decline, and changes therein, is exchanged between countries using a Bayesian hierarchical model. Cross-validation exercises suggest that the proposed model provides credible bounds for the under-five mortality rate that are reasonably well calibrated during the observation period. The alternative estimates suggest smoother trends in under-five mortality and give new insights into changes in the rate of decline within countries. Conclusions: The proposed model offers an alternative modeling approach for obtaining estimates of under-five mortality which removes the restriction of a piece-wise linear rate of decline and introduces a hierarchy to exchange information between countries. The newly proposed estimates of the rate of decline in under-five mortality and the uncertainty
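
    A hedged generative sketch of the kind of structure the abstract describes, not the authors' exact specification: country-specific average rates of decline are drawn from a shared distribution (the hierarchy), and each country's annual rate of decline then drifts smoothly over time instead of being piece-wise constant. All parameter values below are invented, except the 4.4% global pace taken from the MDG 4 target.

```python
import numpy as np

rng = np.random.default_rng(42)
n_countries, n_years = 5, 26                        # e.g. 1990 to 2015
global_pace, between_country_sd = 0.044, 0.015      # 4.4% is the MDG 4 pace; the rest is invented
innovation_sd = 0.005                               # how much a country's pace may drift per year

# Hierarchy: each country's average pace of decline comes from a shared distribution.
avg_pace = rng.normal(global_pace, between_country_sd, size=n_countries)

log_u5mr = np.empty((n_countries, n_years))
pace = np.empty((n_countries, n_years))
log_u5mr[:, 0] = np.log(rng.uniform(30, 150, size=n_countries))   # deaths per 1,000 in year 0
pace[:, 0] = avg_pace

for t in range(1, n_years):
    # The pace of decline evolves smoothly (a random walk started at the country
    # average), in place of a piece-wise constant rate of decline.
    pace[:, t] = pace[:, t - 1] + rng.normal(0.0, innovation_sd, size=n_countries)
    log_u5mr[:, t] = log_u5mr[:, t - 1] - pace[:, t]

u5mr = np.exp(log_u5mr)
print(np.round(u5mr[:, [0, -1]], 1))                # each country's U5MR in the first and last year
```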

    A frequentist framework of inductive reasoning

    Reacting against the limitation of statistics to decision procedures, R. A. Fisher proposed for inductive reasoning the use of the fiducial distribution, a parameter-space distribution of epistemological probability transferred directly from limiting relative frequencies rather than computed according to the Bayes update rule. The proposal is developed as follows using the confidence measure of a scalar parameter of interest. (With the restriction to one-dimensional parameter space, a confidence measure is essentially a fiducial probability distribution free of complications involving ancillary statistics.) A betting game establishes a sense in which confidence measures are the only reliable inferential probability distributions. The equality between the probabilities encoded in a confidence measure and the coverage rates of the corresponding confidence intervals ensures that the measure's rule for assigning confidence levels to hypotheses is uniquely minimax in the game. Although a confidence measure can be computed without any prior distribution, previous knowledge can be incorporated into confidence-based reasoning. To adjust a p-value or confidence interval for prior information, the confidence measure from the observed data can be combined with one or more independent confidence measures representing previous agent opinion. (The former confidence measure may correspond to a posterior distribution with frequentist matching of coverage probabilities.) The representation of subjective knowledge in terms of confidence measures rather than prior probability distributions preserves approximate frequentist validity.
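
    For a concrete, if much simpler, instance of a confidence measure, the sketch below builds the confidence distribution for a normal mean with known standard deviation (an assumption made purely for illustration; the paper's construction is more general) and reads off the confidence assigned to interval hypotheses, which matches the coverage of the usual confidence intervals.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma, n = 2.0, 25
data = rng.normal(loc=1.0, scale=sigma, size=n)
xbar, se = data.mean(), sigma / np.sqrt(n)

def confidence(low, high):
    """Confidence level the measure assigns to the hypothesis low < mu <= high."""
    cdf = lambda mu: norm.cdf((mu - xbar) / se)     # confidence distribution for mu
    return cdf(high) - cdf(low)

# The measure reproduces ordinary confidence intervals: the central 95% region
# of the confidence distribution is exactly the usual 95% interval for mu.
lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
print(confidence(lo, hi))          # ~0.95
print(confidence(0.0, np.inf))     # confidence that mu > 0 (complement of a one-sided p-value)
```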

    The gravitino coupling to broken gauge theories applied to the MSSM

    We consider gravitino couplings in theories with broken gauge symmetries. In particular, we compute the single gravitino production cross section in W+ W- fusion processes. Despite recent claims to the contrary, we show that this process is always subdominant to gluon fusion processes in the high energy limit. The full calculation is performed numerically; however, we give analytic expressions for the cross section in the supersymmetric and electroweak limits. We also confirm these results with the use of the effective theory of goldstino interactions.

    Bias correction and Bayesian analysis of aggregate counts in SAGE libraries

    Background: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA pool of an organism's transcriptome. Incomplete digestion during the tag formation process may allow multiple tags to be generated from a given mRNA transcript. The probability of forming a tag varies with its relative location. As a result, the observed tag counts represent a biased sample of the actual transcript pool. In SAGE this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large fraction of the observed data. Taking this bias into account should allow more of the available data to be used, leading to increased statistical power. Results: Three new hierarchical models, which directly embed a model for the variation in tag formation probability, are proposed and their associated Bayesian inference algorithms are developed. These models may be applied to libraries at both the tag and aggregate level. Simulation experiments and analysis of real data are used to contrast the accuracy of the various methods. The consequences of tag formation bias are discussed in the context of testing differential expression, and a description is given of how these algorithms can be applied in that context. Conclusions: Several Bayesian inference algorithms that account for tag formation effects are compared, with the DPB algorithm providing clear evidence of superior performance. The accuracy of inferences when using a particular non-informative prior is found to depend on the expression level of a given gene. The multivariate nature of the approach easily allows both univariate and joint tests of differential expression. Calculations demonstrate the potential for false positive and negative findings due to variation in tag formation probabilities across samples when testing for differential expression.
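
    One simple way to picture the positional bias (the geometric parameterisation below is an assumption for illustration, not necessarily the hierarchical models proposed here): with a per-site digestion probability p, the k-th tagging site from the 3' end yields the observed tag only if every site closer to the 3' end failed to digest, so the tag-formation probability decays with distance from the 3' end.

```python
import numpy as np

p_digest = 0.7                  # assumed per-site digestion probability
n_sites = 5                     # tagging sites in a transcript, 3'-most first
k = np.arange(1, n_sites + 1)
formation_prob = p_digest * (1 - p_digest) ** (k - 1)   # geometric decay with position

print(formation_prob)           # counts are biased toward the 3'-most site
print(formation_prob.sum())     # probability the transcript yields any tag at all

# Keeping only the 3'-most tag avoids the positional bias, but discards this
# fraction of the observed tags (the data a bias-aware model could retain):
print(1 - formation_prob[0] / formation_prob.sum())
```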