116 research outputs found

    Causality and Association: The Statistical and Legal Approaches

    Get PDF
    This paper discusses different needs and approaches to establishing "causation" that are relevant in legal cases involving statistical input based on epidemiological (or more generally observational or population-based) information. We distinguish between three versions of "cause": the first involves negligence in providing or allowing exposure, the second involves "cause" as it is shown through a scientifically proved increased risk of an outcome from the exposure in a population, and the third considers "cause" as it might apply to an individual plaintiff based on the first two. The population-oriented "cause" is that commonly addressed by statisticians, and we propose a variation on the Bradford Hill approach to testing such causality in an observational framework, and discuss how such a systematic series of tests might be considered in a legal context. We review some current legal approaches to using probabilistic statements, and link these with the scientific methodology as developed here. In particular, we provide an approach both to the idea of individual outcomes being caused on a balance of probabilities, and to the idea of material contribution to such outcomes. Statistical terminology and legal usage of terms such as "proof on the balance of probabilities" or "causation" can easily become confused, largely due to similar language describing dissimilar concepts; we conclude, however, that a careful analysis can identify and separate those areas in which a legal decision alone is required and those areas in which scientific approaches are useful. Comment: Published at http://dx.doi.org/10.1214/07-STS234 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
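
    A note on the "balance of probabilities" idea above: in epidemiological terms it is usually linked to the probability of causation, i.e. the attributable fraction among the exposed. The sketch below computes that standard quantity from a relative risk; it is a minimal illustration of the conventional relation (RR > 2 giving a probability above 0.5), not the authors' specific proposal, and the RR values are invented.

        # Probability of causation (attributable fraction among the exposed):
        # PC = (RR - 1) / RR for RR > 1. On the usual "balance of probabilities"
        # reading, PC > 0.5 requires RR > 2 ("doubling of risk").
        # Illustrative values only; not taken from the paper.

        def probability_of_causation(rr: float) -> float:
            """Return the attributable fraction among the exposed for relative risk rr."""
            if rr <= 1.0:
                return 0.0
            return (rr - 1.0) / rr

        for rr in (1.5, 2.5, 4.0):
            pc = probability_of_causation(rr)
            verdict = "meets" if pc > 0.5 else "does not meet"
            print(f"RR = {rr:.1f}: probability of causation = {pc:.2f} ({verdict} balance of probabilities)")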

    Lung cancer and passive smoking: reconciling the biochemical and epidemiological approaches.

    Get PDF
    The accurate determination of exposure to environmental tobacco smoke is notoriously difficult. There have been to date two approaches to determining this exposure in the study of the association of passive smoking and lung cancer: the biochemical approach, using cotinine in the main as a marker, and the epidemiological approach. Typically, results of the former have yielded much lower relative risks than those of the latter and have tended to be ignored in its favour, although there has been considerable debate as to the logical basis for this. We settle this question by showing that, using the epidemiologically based meta-analysis technique of Wald et al. (1986) and the misclassification models in the EPA Draft Review (1990), one arrives, using all current studies, at a result which is virtually identical to the biochemically based conclusions of Darby and Pike (1988) or Repace and Lowry (1990). The conduct of this meta-analysis itself raises a number of important methodological questions, including the validity of inclusion of studies, the use of estimates adjusted for covariates, and the statistical significance of estimates based on meta-analysis of the epidemiological data. The best estimate of relative risk from spousal smoking is shown to be approximately 1.05-1.10, based on either of these approaches; but it is suggested that considerable extra work is needed to establish whether this is significantly raised.
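
    As a rough illustration of the kind of meta-analysis discussed above, the sketch below pools relative risks on the log scale with inverse-variance (fixed-effect) weights. The study values are invented and the procedure is the generic textbook version, not the specific technique of Wald et al. (1986) or the EPA misclassification models.

        import math

        # Generic inverse-variance fixed-effect meta-analysis of relative risks.
        # Study values below are illustrative, not the studies analysed in the paper.
        studies = [
            # (relative risk, lower 95% CI, upper 95% CI)
            (1.20, 0.80, 1.80),
            (0.95, 0.70, 1.29),
            (1.35, 0.90, 2.03),
        ]

        weights, weighted_logs = [], []
        for rr, lo, hi in studies:
            log_rr = math.log(rr)
            se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE of log RR from the CI
            w = 1.0 / se**2                                  # inverse-variance weight
            weights.append(w)
            weighted_logs.append(w * log_rr)

        pooled_log = sum(weighted_logs) / sum(weights)
        pooled_se = math.sqrt(1.0 / sum(weights))
        print(f"pooled RR = {math.exp(pooled_log):.2f}")
        print(f"95% CI = ({math.exp(pooled_log - 1.96 * pooled_se):.2f}, "
              f"{math.exp(pooled_log + 1.96 * pooled_se):.2f})")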

    Time series prediction via aggregation: an oracle bound including numerical cost

    Full text link
    We address the problem of forecasting a time series satisfying the Causal Bernoulli Shift model, using a parametric set of predictors. The aggregation technique provides a predictor with well-established and quite satisfying theoretical properties expressed by an oracle inequality for the prediction risk. The numerical computation of the aggregated predictor usually relies on a Markov chain Monte Carlo method whose convergence should be evaluated. In particular, it is crucial to bound the number of simulations needed to achieve a numerical precision of the same order as the prediction risk. In this direction, we present a fairly general result which can be seen as an oracle inequality including the numerical cost of the predictor computation. The numerical cost appears by letting the oracle inequality depend on the number of simulations required in the Monte Carlo approximation. Some numerical experiments are then carried out to support our findings.
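
    The aggregation idea referred to above is, in its simplest finite form, an exponentially weighted average of candidate predictors by their cumulative past losses; the paper's setting replaces this finite sum with an MCMC approximation over a parametric family. The sketch below shows only the finite-expert version on a toy autoregressive series, as an illustration of the principle rather than the authors' estimator; all data and constants are invented.

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy AR(1)-like series and a small finite set of linear "experts":
        # expert k predicts x_{t+1} as coeffs[k] * x_t.
        series = [0.0]
        for _ in range(200):
            series.append(0.6 * series[-1] + rng.normal(scale=0.5))
        series = np.array(series)

        coeffs = np.array([0.0, 0.3, 0.6, 0.9])   # candidate predictors
        eta = 0.5                                  # learning rate (inverse temperature)
        cum_loss = np.zeros(len(coeffs))
        agg_losses = []

        for t in range(1, len(series) - 1):
            preds = coeffs * series[t]                       # each expert's forecast of x_{t+1}
            weights = np.exp(-eta * cum_loss)
            weights /= weights.sum()
            agg_pred = weights @ preds                       # exponentially weighted aggregate
            agg_losses.append((series[t + 1] - agg_pred) ** 2)
            cum_loss += (series[t + 1] - preds) ** 2         # update each expert's cumulative loss

        print("final weights:", np.round(weights, 3))
        print("mean aggregate squared error:", np.mean(agg_losses).round(3))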

    A Bayesian Network-based customer satisfaction model: a tool for management decisions in railway transport

    Get PDF
    We formalise and present an innovative general approach for developing complex system models from survey data by applying Bayesian Networks. The challenges and approaches to converting survey data into usable probability forms are explained and a general approach for integrating expert knowledge (judgements) into Bayesian complex system models is presented. The structural complexities of the Bayesian complex system modelling process, based on various decision contexts, are also explained along with a solution. A novel application of Bayesian complex system models as a management tool for decision making is demonstrated using a railway transport case study. Customer satisfaction, which is a Key Performance Indicator in public transport management, is modelled using data from customer surveys conducted by Queensland Rail, Australia
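
    A basic building block of the approach described above is converting survey responses into conditional probability tables for the network's nodes. The sketch below does this for one invented parent-child pair of variables with pandas; the variable names and responses are hypothetical (not Queensland Rail data), and a full model would link many such tables under a learned or elicited structure, for instance with a library such as pgmpy.

        import pandas as pd

        # Hypothetical survey responses (names and values are invented, not Queensland Rail data).
        survey = pd.DataFrame({
            "punctuality":  ["good", "good", "poor", "good", "poor", "poor", "good", "good"],
            "satisfaction": ["high", "high", "low",  "high", "low",  "high", "low",  "high"],
        })

        # Conditional probability table P(satisfaction | punctuality),
        # the basic building block of a discrete Bayesian Network node.
        cpt = pd.crosstab(survey["punctuality"], survey["satisfaction"], normalize="index").round(3)
        print(cpt)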

    Accelerating MCMC Algorithms

    Get PDF
    Markov chain Monte Carlo algorithms are used to simulate from complex statistical distributions by way of a local exploration of these distributions. This local feature avoids heavy requests on understanding the nature of the target, but it also potentially induces a lengthy exploration of this target, with a requirement on the number of simulations that grows with the dimension of the problem and with the complexity of the data behind it. Several techniques are available for accelerating the convergence of these Monte Carlo algorithms, either at the exploration level (as in tempering, Hamiltonian Monte Carlo and partly deterministic methods) or at the exploitation level (with Rao-Blackwellisation and scalable methods). Comment: This is a survey paper, submitted to WIREs Computational Statistics, with 6 figures.
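
    As a point of reference for the acceleration techniques surveyed above, the sketch below implements the plain random-walk Metropolis sampler whose local exploration they are designed to speed up, targeting a standard normal density. It is a generic illustration, not code from the paper.

        import math
        import random

        random.seed(1)

        def log_target(x: float) -> float:
            """Log-density of a standard normal target (up to an additive constant)."""
            return -0.5 * x * x

        # Plain random-walk Metropolis: the baseline loop that tempering, HMC,
        # Rao-Blackwellisation and scalable methods aim to accelerate.
        x, step, n_iter = 0.0, 1.0, 10_000
        samples, accepted = [], 0
        for _ in range(n_iter):
            proposal = x + random.gauss(0.0, step)
            if math.log(random.random()) < log_target(proposal) - log_target(x):
                x, accepted = proposal, accepted + 1
            samples.append(x)

        mean = sum(samples) / len(samples)
        var = sum((s - mean) ** 2 for s in samples) / len(samples)
        print(f"acceptance rate = {accepted / n_iter:.2f}, mean = {mean:.3f}, var = {var:.3f}")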

    Spatial and temporal patterns of locally-acquired dengue transmission in Northern Queensland, Australia, 1993-2012

    Get PDF
    Background: Dengue has been a major public health concern in Australia since it re-emerged in Queensland in 1992–1993. We explored spatio-temporal characteristics of locally-acquired dengue cases in northern tropical Queensland, Australia during the period 1993–2012. Methods: Locally-acquired notified cases of dengue were collected for northern tropical Queensland from 1993 to 2012. Descriptive spatial and temporal analyses were conducted using geographic information system tools and geostatistical techniques. Results: 2,398 locally-acquired dengue cases were recorded in northern tropical Queensland during the study period. The areas affected by the dengue cases exhibited spatial and temporal variation over the study period. Notified cases of dengue occurred more frequently in autumn. Mapping of dengue by statistical local areas (census units) reveals the presence of substantial spatio-temporal variation over time and place. Statistically significant differences in dengue incidence rates were found between males and females (with more cases in females) (χ2 = 15.17, d.f. = 1, p<0.01). Differences were observed among age groups, but these were not statistically significant. There was a significant positive spatial autocorrelation of dengue incidence for the four sub-periods, with the Moran's I statistic ranging from 0.011 to 0.463 (p<0.01). Semi-variogram analysis and smoothed maps created from interpolation techniques indicate that the pattern of spatial autocorrelation was not homogeneous across northern Queensland. Conclusions: Tropical areas are potential high-risk areas for mosquito-borne diseases such as dengue. This study demonstrated that locally-acquired dengue cases exhibited spatial and temporal variation over the past twenty years in northern tropical Queensland, Australia. Therefore, this study provides an impetus for further investigation of clusters and risk factors in these high-risk areas.
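
    Moran's I, used above to measure spatial autocorrelation of incidence, can be computed directly from an incidence vector and a spatial weights matrix, as in the minimal sketch below. The areas, rates and weights are invented; an actual analysis of SLA-level notification data would typically use a geostatistics package and permutation-based p-values.

        import numpy as np

        def morans_i(y: np.ndarray, w: np.ndarray) -> float:
            """Global Moran's I for values y and spatial weights matrix w."""
            n = len(y)
            z = y - y.mean()
            return (n * (z @ w @ z)) / (w.sum() * (z @ z))

        # Tiny invented example: 4 areas on a line, binary contiguity weights.
        incidence = np.array([2.0, 3.0, 10.0, 12.0])     # e.g. cases per 10,000 (made up)
        weights = np.array([
            [0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0],
        ], dtype=float)

        print(f"Moran's I = {morans_i(incidence, weights):.3f}")  # positive value indicates clustering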

    Declining Orangutan Encounter Rates from Wallace to the Present Suggest the Species Was Once More Abundant

    Get PDF
    BACKGROUND: Bornean orangutans (Pongo pygmaeus) currently occur at low densities and seeing a wild one is a rare event. Compared to present low encounter rates of orangutans, it is striking how many orangutans historic collectors like Alfred Russel Wallace were able to shoot each day, continuously over weeks or even months. Does that indicate that some 150 years ago encounter rates with orangutans, or their densities, were higher than now? METHODOLOGY/PRINCIPAL FINDINGS: We test this hypothesis by quantifying encounter rates obtained from hunting accounts, museum collections, and recent field studies, and analysing whether there is a declining trend over time. Logistic regression analyses of our data support such a decline on Borneo between the mid-19th century and the present. Even when controlled for variation in the size of survey and hunting teams and the durations of expeditions, mean daily encounter rates appear to have declined about 6-fold in areas with little or no forest disturbance. CONCLUSIONS/SIGNIFICANCE: This finding has potential consequences for our understanding of orangutans, because it suggests that Bornean orangutans once occurred at higher densities. We explore potential explanations (habitat loss and degradation, hunting, and disease) and conclude that hunting fits the observed patterns best. This suggests that hunting has been underestimated as a key causal factor of orangutan density and distribution, and that population declines have been more severe than previously estimated based on habitat loss alone. Our findings may require us to rethink the biology of orangutans, with much of our ecological understanding possibly being based on field studies of animals living at lower densities than they did historically. Our approach of quantifying species encounter rates from historic data demonstrates that this method can yield valuable information about the ecology and population density of species in the past, providing new insight into species' conservation needs.
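
    A hedged sketch of the kind of trend analysis described above: a logistic regression of whether an orangutan was encountered on a given expedition day against calendar year, controlling for team size. The data below are simulated and the covariates simplified; this is not the paper's dataset or exact model specification.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(42)

        # Simulated expedition-day data (invented, not the historical records used in the paper):
        # encounter probability declines with year and increases with team size.
        n = 500
        year = rng.uniform(1850, 2010, n)
        team_size = rng.integers(1, 8, n)
        logit_p = 2.0 - 0.02 * (year - 1850) + 0.15 * team_size
        encountered = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

        df = pd.DataFrame({"encountered": encountered, "year": year, "team_size": team_size})

        # Logistic regression of daily encounter probability on year, controlling for team size,
        # analogous in spirit to the declining-trend analysis described above.
        result = smf.logit("encountered ~ year + team_size", data=df).fit(disp=False)
        print(result.params.round(4))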

    Convergence of marine megafauna movement patterns in coastal and open oceans

    Get PDF
    The extent of increasing anthropogenic impacts on large marine vertebrates partly depends on the animals’ movement patterns. Effective conservation requires identification of the key drivers of movement including intrinsic properties and extrinsic constraints associated with the dynamic nature of the environments the animals inhabit. However, the relative importance of intrinsic versus extrinsic factors remains elusive. We analyze a global dataset of ∼2.8 million locations from >2,600 tracked individuals across 50 marine vertebrates evolutionarily separated by millions of years and using different locomotion modes (fly, swim, walk/paddle). Strikingly, movement patterns show a remarkable convergence, being strongly conserved across species and independent of body length and mass, despite these traits ranging over 10 orders of magnitude among the species studied. This represents a fundamental difference between marine and terrestrial vertebrates not previously identified, likely linked to the reduced costs of locomotion in water. Movement patterns were primarily explained by the interaction between species-specific traits and the habitat(s) they move through, resulting in complex movement patterns when moving close to coasts compared with more predictable patterns when moving in open oceans. This distinct difference may be associated with greater complexity within coastal microhabitats, highlighting a critical role of preferred habitat in shaping marine vertebrate global movements. Efforts to develop understanding of the characteristics of vertebrate movement should consider the habitat(s) through which they move to identify how movement patterns will alter with forecasted severe ocean changes, such as reduced Arctic sea ice cover, sea level rise, and declining oxygen content

    Spatio-Temporal Patterns of Barmah Forest Virus Disease in Queensland, Australia

    Get PDF
    Background: Barmah Forest virus (BFV) disease is a common and widespread mosquito-borne disease in Australia. This study investigated the spatio-temporal patterns of BFV disease in Queensland, Australia using geographical information system (GIS) tools and geostatistical analysis. Methods/Principal Findings: We calculated the incidence rates and standardised incidence rates of BFV disease. Moran's I statistic was used to assess the spatial autocorrelation of BFV incidences. Spatial dynamics of BFV disease were examined using semi-variogram analysis. Interpolation techniques were applied to visualise and display the spatial distribution of BFV disease in statistical local areas (SLAs) throughout Queensland. Mapping of BFV disease by SLAs reveals the presence of substantial spatio-temporal variation over time. Statistically significant differences in BFV incidence rates were identified among age groups (χ2 = 7587, df = 7327, p<0.01). There was a significant positive spatial autocorrelation of BFV incidence for all four periods, with the Moran's I statistic ranging from 0.1506 to 0.2901 (p<0.01). Semi-variogram analysis and smoothed maps created from interpolation techniques indicate that the pattern of spatial autocorrelation was not homogeneous across the state. Conclusions/Significance: This is the first study to examine spatial and temporal variation in the incidence rates of BFV disease across Queensland using GIS and geostatistics. BFV transmission varied with age and gender, which may be due to exposure rates or behavioural risk factors. There are differences in the spatio-temporal patterns of BFV disease which may be related to local socio-ecological and environmental factors. These research findings may have implications for BFV disease control and prevention programs in Queensland.
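
    The semi-variogram analysis mentioned above summarises how dissimilarity in incidence grows with distance between areas. The sketch below computes a minimal empirical semivariogram from invented SLA centroids and rates; a real analysis would fit a variogram model to SLA-level BFV rates with a geostatistics package before interpolation.

        import numpy as np

        rng = np.random.default_rng(7)

        # Invented SLA centroids (x, y in km) and incidence rates (per 100,000); not real BFV data.
        coords = rng.uniform(0, 100, size=(30, 2))
        rates = 50 + 0.3 * coords[:, 0] + rng.normal(scale=5, size=30)  # mild spatial trend

        # Empirical semivariogram: gamma(h) = mean of 0.5 * (z_i - z_j)^2 over pairs of areas
        # whose separation distance falls in the bin around lag h.
        bins = np.linspace(0, 100, 11)
        gamma = np.full(len(bins) - 1, np.nan)
        for k in range(len(bins) - 1):
            diffs = []
            for i in range(len(rates)):
                for j in range(i + 1, len(rates)):
                    d = np.linalg.norm(coords[i] - coords[j])
                    if bins[k] <= d < bins[k + 1]:
                        diffs.append(0.5 * (rates[i] - rates[j]) ** 2)
            if diffs:
                gamma[k] = np.mean(diffs)

        for k in range(len(gamma)):
            print(f"lag {bins[k]:5.1f}-{bins[k+1]:5.1f} km: semivariance = {gamma[k]:.1f}")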

    A Survey of Bayesian Statistical Approaches for Big Data

    Full text link
    The modern era is characterised as an era of information or Big Data. This has motivated a huge literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available prior to the advent of Big Data. We present a review of published studies that present Bayesian statistical approaches specifically for Big Data and discuss the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data