
    Exact Approaches for Bias Detection and Avoidance with Small, Sparse, or Correlated Categorical Data

    Every day, traditional statistical methods are used worldwide to study a variety of topics and provide insight into countless subjects. Each technique is based on a distinct set of assumptions to ensure valid results. Additionally, many statistical approaches rely on large-sample behavior and may collapse or degenerate in the presence of small, sparse, or correlated data. This dissertation details several advancements to detect these conditions, avoid their consequences, and analyze data in a different way to yield trustworthy results. One of the most commonly used modeling techniques for outcomes with only two possible categorical values (e.g., live/die, pass/fail, better/worse, etc.) is logistic regression. While some potential complications with this approach are widely known, many investigators are unaware that their particular data do not meet the foundational assumptions, since those assumptions are not easy to verify. We have developed a routine for determining whether a researcher should be concerned about potential bias in logistic regression results, so that steps can be taken to mitigate the bias or a different procedure can be used to model the data. Correlated data may arise from common situations such as multi-site medical studies, research on family units, or investigations of student achievement within classrooms. In these circumstances, the associations between cluster members must be included in any statistical analysis testing the hypothesis of a connection between two variables in order for results to be valid. Previously, investigators had to choose between a method intended for small or sparse data that assumes independence between observations and a method that allows for correlation between observations but requires large samples to be reliable. We present a new method that allows small, clustered samples to be assessed for a relationship between a two-level predictor (e.g., treatment/control) and a categorical outcome (e.g., low/medium/high).
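    A common trigger for the small-sample bias discussed above is complete or quasi-complete separation, which with a binary predictor shows up as a zero cell in the 2x2 table of predictor level by outcome. The dissertation's detection routine is not reproduced here; the sketch below is only a minimal illustration of that zero-cell red flag, using hypothetical data.

        import numpy as np

        def zero_cell_check(x, y):
            # Cross-tabulate a binary predictor against a binary outcome.
            # A zero cell signals complete or quasi-complete separation,
            # under which maximum-likelihood logistic regression is
            # unreliable or undefined.
            x, y = np.asarray(x), np.asarray(y)
            table = np.array([[np.sum((x == a) & (y == b)) for b in (0, 1)]
                              for a in (0, 1)])
            return table, bool((table == 0).any())

        # Hypothetical data: every treated subject responded, so the
        # (treated, non-response) cell is empty and bias is a concern.
        x = [0, 0, 0, 0, 1, 1, 1, 1]
        y = [0, 1, 0, 1, 1, 1, 1, 1]
        table, flagged = zero_cell_check(x, y)
        print(table, flagged)   # flagged == True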

    Deterministic stream-sampling for probabilistic programming: semantics and verification

    Probabilistic programming languages rely fundamentally on some notion of sampling, and this is doubly true for probabilistic programming languages which perform Bayesian inference using Monte Carlo techniques. Verifying samplers - proving that they generate samples from the correct distribution - is crucial to the use of probabilistic programming languages for statistical modelling and inference. However, the typical denotational semantics of probabilistic programs is incompatible with deterministic notions of sampling. This is problematic, considering that most statistical inference is performed using pseudorandom number generators. We present a higher-order probabilistic programming language centred on the notion of samplers and sampler operations. We give this language an operational and denotational semantics in terms of continuous maps between topological spaces. Our language also supports discontinuous operations, such as comparisons between reals, by using the type system to track discontinuities. This feature might be of independent interest, for example in the context of differentiable programming. Using this language, we develop tools for the formal verification of sampler correctness. We present an equational calculus to reason about equivalence of samplers, and a sound calculus to prove semantic correctness of samplers, i.e. that a sampler correctly targets a given measure by construction. (Extended version of a LiCS 2023 paper.)
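    The paper's formal language and calculi are not reproduced here. As a minimal illustration of the underlying idea, that a sampler can be a deterministic map on a pseudorandom stream, the Python sketch below threads an explicit stream of uniforms through an inverse-CDF exponential sampler; all names are hypothetical.

        import math
        import random

        # A "sampler" here is a deterministic function from a stream of
        # uniform draws to a value plus the remaining stream; all the
        # randomness lives in the stream itself.
        def uniform(stream):
            return stream[0], stream[1:]

        def exponential(rate):
            # Inverse-CDF transform: deterministic given the stream.
            def sample(stream):
                u, rest = uniform(stream)
                return -math.log(1.0 - u) / rate, rest
            return sample

        # Replaying the same stream reproduces the same samples exactly,
        # which is what makes equational reasoning about samplers possible.
        stream = tuple(random.Random(0).random() for _ in range(4))
        x, rest = exponential(2.0)(stream)
        y, _ = exponential(2.0)(rest)
        print(x, y)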

    Understanding Travel Behaviour: Some Appealing Research Directions

    This paper presents one researcher's perception of selective emphases in the body of travel behaviour research which have had, and/or may in the future have, a non-marginal impact on the way that research activity is undertaken. Some of the contributions are well established and have moved from state of the art to state of practice; other efforts are relatively new and maturing in their role as paradigms of thought. The contributions can broadly be grouped into four classes of research: decision paradigms, in particular the interpretation of the choice process within a broad activity framework, and the recognition that agents making decisions do not always operate in a perfectly competitive market; releasing the analytical formalism of the choice/decision process from the restrictive IIA paradigm of the great majority of applied travel choice modelling - moving to nested structures, free variance and correlation among alternatives, random taste weights, accommodating unobserved heterogeneity and mixed 'logits'; combining sources of preference and choice data, including joint analysis of market and experimental choice data, interfaces between attitudinal and behavioural data, and generalising valuation to valuation functions; and advances in the study of the dynamics of traveller behaviour, especially the timing of change and its importance in establishing hurdle dates for forecasting traffic and revenue for infrastructure projects.
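    The IIA (independence of irrelevant alternatives) property mentioned above is a consequence of the multinomial logit form P_i = exp(V_i) / sum_j exp(V_j): the ratio of any two choice probabilities is unaffected by the other alternatives. The sketch below, with hypothetical utility values, shows the ratio staying fixed when a third alternative is added, which is exactly the restriction that nested and mixed logit models relax.

        import numpy as np

        def mnl_probabilities(v):
            # Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j),
            # shifted by max(v) for numerical stability.
            e = np.exp(v - np.max(v))
            return e / e.sum()

        p2 = mnl_probabilities(np.array([1.0, 0.5]))        # car, bus
        p3 = mnl_probabilities(np.array([1.0, 0.5, 0.9]))   # car, bus, rail
        print(p2[0] / p2[1], p3[0] / p3[1])                 # identical ratios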

    A joint probability approach for the confluence flood frequency analysis

    Flood frequency analysis at or near the confluence of two tributaries is of interest because it is needed for the design of highway drainage structures. However, the shortage of hydrological data at the confluence point makes flood estimation challenging. This thesis presents a practical procedure for flood frequency analysis at the confluence of two streams by multivariate simulation of the annual peak flows of the tributaries based on joint probability and Monte Carlo simulation. Copulas are introduced to identify the joint probability. The results of two case studies are compared with the floods estimated by univariate flood frequency analysis based on the observation data. The results are also compared with those from the National Flood Frequency program developed by the United States Geological Survey. The results from the proposed model are very close to those from the univariate flood frequency analysis.
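    As a minimal sketch of the copula-based Monte Carlo step, the example below samples dependent tributary peaks from a Gaussian copula with Gumbel (EV1) margins and reads off a design quantile. The copula family, correlation, marginal parameters, and the simple additive combination at the confluence are all illustrative assumptions, not the thesis's fitted model.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        rho, n = 0.6, 100_000                       # assumed dependence, sample size
        z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
        u = stats.norm.cdf(z)                       # Gaussian copula -> uniform margins
        q1 = stats.gumbel_r.ppf(u[:, 0], loc=300, scale=80)   # tributary 1 annual peaks
        q2 = stats.gumbel_r.ppf(u[:, 1], loc=200, scale=60)   # tributary 2 annual peaks
        confluence = q1 + q2                        # illustrative combination rule
        print(np.quantile(confluence, 0.99))        # ~100-year design flood estimate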

    Computational applications in stochastic operations research

    Several computational applications in stochastic operations research are presented, where, for each application, a computational engine is used to achieve results that are otherwise overly tedious by hand calculations, or in some cases mathematically intractable. Algorithms and code are developed and implemented, with specific emphasis placed on achieving exact results, and substantiated via Monte Carlo simulation. The code for each application is provided in the software language utilized, and the algorithms are available for coding in another environment. The topics include univariate and bivariate nonparametric random variate generation using a piecewise-linear cumulative distribution, deriving exact statistical process control chart constants for non-normal sampling, testing probability distribution conformance to Benford's law, and transient analysis of M/M/s queueing systems. The nonparametric random variate generation chapters provide the modeler with a method of generating univariate and bivariate samples when only observed data is available. The method is completely nonparametric and is capable of mimicking multimodal joint distributions. The algorithm is black-box, where no decisions are required from the modeler in generating variates for simulation. The statistical process control chart constant chapter develops constants for select non-normal distributions, and provides tabulated results for researchers who have identified a given process as non-normal. The constants derived are bias correction factors for the sample range and sample standard deviation. The Benford conformance testing chapter offers the Kolmogorov-Smirnov test as an alternative to the standard chi-square goodness-of-fit test when testing whether leading digits of a data set are distributed according to Benford's law. The alternative test has the advantage of being an exact test for all sample sizes, removing the usual sample size restriction involved with the chi-square goodness-of-fit test. The transient queueing analysis chapter develops and automates the construction of the sojourn time distribution for the nth customer in an M/M/s queue with k customers initially present at time 0 (k ≥ 0), without the usual limit on traffic intensity, ρ < 1, providing an avenue to conduct transient analysis on various measures of performance for a given initial number of customers in the system. It also develops and automates the construction of the sojourn time joint probability distribution function for pairs of customers, allowing the calculation of the exact covariance between customer sojourn times.
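    A minimal sketch of the univariate piecewise-linear idea: draw a CDF through the sorted observations and invert it by interpolation. Knot placement and tail handling in the dissertation's algorithm may differ, and the data below are hypothetical.

        import numpy as np

        def piecewise_linear_sampler(data, rng):
            # Piecewise-linear CDF through the sorted observations,
            # inverted by interpolation to generate new variates.
            knots = np.sort(np.asarray(data, dtype=float))
            cdf = np.linspace(0.0, 1.0, knots.size)   # CDF heights at the knots
            def sample(n):
                u = rng.random(n)                     # uniforms on (0, 1)
                return np.interp(u, cdf, knots)       # inverse-CDF lookup
            return sample

        rng = np.random.default_rng(0)
        observed = [2.1, 3.7, 3.9, 5.0, 6.4, 9.8]     # hypothetical sample
        draw = piecewise_linear_sampler(observed, rng)
        print(draw(5))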

    Vol. 4, No. 1 (Full Issue)


    ROBUST PARAMETER DESIGN IN COMPLEX ENGINEERING SYSTEMS:

    Many industrial firms seek the systematic reduction of variability as a primary means of reducing production cost and material waste without sacrificing product quality or process efficiency. Despite notable advancements in quality-based estimation and optimization approaches aimed at achieving this goal, various gaps remain between current methodologies and the realities observed in modern industrial environments. In many cases, models rely on assumptions that either limit their usefulness or diminish the reliability of the estimated results. This includes instances where models are generalized to a specific set of assumed process conditions, which constrains their applicability to a wider array of industrial problems. However, such generalizations often do not hold in practice. If the realities are ignored, the derived estimates can be misleading and, once applied to optimization schemes, can result in suboptimal solutions and dubious recommendations to decision makers. The goal of this research is to develop improved quality models that more fully explore innate process conditions, rely less on theoretical assumptions, and extend to an array of more realistic industrial environments. Several key areas are addressed in which further research can reinforce foundations, extend existing knowledge and applications, and narrow the gap between academia and industry. These include the integration of a more comprehensive approach to data analysis, the development of conditions-based approaches to tier-one and tier-two estimation, achieving cost robustness in the face of dynamic process variability, the development of new strategies for eliminating variability at the source, and the integration of trade-off analyses that balance the need for enhanced precision against associated costs. Pursuant to a detailed literature review, various quality models are proposed, and numerical examples are used to validate their use.
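    One widely used formulation in this area is dual-response optimization: fit one model for the process mean and another for the process standard deviation, then minimize the spread subject to the mean hitting its target. The sketch below illustrates that formulation with hypothetical fitted surfaces and target; it is not the models developed in this work.

        import numpy as np
        from scipy.optimize import NonlinearConstraint, minimize

        # Hypothetical fitted response surfaces in one control factor x.
        mean_hat = lambda x: 50.0 + 4.0 * x[0] - 1.5 * x[0] ** 2   # process mean
        sd_hat = lambda x: 3.0 - 1.0 * x[0] + 0.8 * x[0] ** 2      # process std dev

        target = 52.0
        res = minimize(
            sd_hat,
            x0=[0.0],
            bounds=[(-2.0, 2.0)],
            constraints=[NonlinearConstraint(lambda x: mean_hat(x) - target, 0.0, 0.0)],
        )
        print(res.x, mean_hat(res.x), sd_hat(res.x))   # mean on target, minimal spread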

    Algorithms for operations on probability distributions in a computer algebra system

    In mathematics and statistics, the desire to eliminate mathematical tedium and facilitate exploration has led to computer algebra systems. These computer algebra systems allow students and researchers to perform more of their work at a conceptual level. The design of generic algorithms for tedious computations allows modelers to push current modeling boundaries outward more quickly. Probability theory, with its many theorems and symbolic manipulations of random variables, is a discipline in which automation of certain processes is highly practical, functional, and efficient. There are many existing statistical software packages, such as SPSS, SAS, and S-Plus, that have numeric tools for statistical applications. There is a potential for a probability package analogous to these statistical packages for the manipulation of random variables. The software package being developed as part of this dissertation, referred to as A Probability Programming Language (APPL), is a random variable manipulator and is proposed to fill a technology gap that exists in probability theory. My research involves developing algorithms for the manipulation of discrete random variables. By defining data structures for random variables and writing algorithms for implementing common operations, more interesting and mathematically intractable probability problems can be solved, including those not attempted in undergraduate statistics courses because they were deemed too mechanically arduous. Algorithms for calculating the probability density function of order statistics, transformations, convolutions, products, and minimums/maximums of independent discrete random variables are included in this dissertation.
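    As a loose, hypothetical illustration of the kind of data structure and operations involved (APPL itself lives in a computer algebra system), the Python sketch below represents a discrete random variable as a mapping from support values to probabilities and derives the distributions of sums and maximums of independent variables.

        from collections import defaultdict
        from itertools import product

        def convolve(X, Y):
            # Distribution of X + Y for independent discrete X and Y.
            Z = defaultdict(float)
            for (x, px), (y, py) in product(X.items(), Y.items()):
                Z[x + y] += px * py
            return dict(Z)

        def maximum(X, Y):
            # Distribution of max(X, Y) for independent X and Y.
            Z = defaultdict(float)
            for (x, px), (y, py) in product(X.items(), Y.items()):
                Z[max(x, y)] += px * py
            return dict(Z)

        die = {i: 1 / 6 for i in range(1, 7)}   # a fair six-sided die
        print(convolve(die, die))               # pmf of the sum of two dice
        print(maximum(die, die))                # pmf of the max of two dice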