751 research outputs found

    Calculating and understanding the value of any type of match evidence when there are potential testing errors

    It is well known that Bayes’ theorem (with likelihood ratios) can be used to calculate the impact of evidence, such as a ‘match’ of some feature of a person. Typically the feature of interest is the DNA profile, but the method applies in principle to any feature of a person or object, including not just DNA, fingerprints, or footprints, but also more basic features such as skin colour, height, hair colour or even name. Notwithstanding concerns about the extensiveness of databases of such features, a serious challenge to the use of Bayes in such legal contexts is that its standard formulaic representations are not readily understandable to non-statisticians. Attempts to get round this problem usually involve representations based around some variation of an event tree. While this approach works well in explaining the most trivial instance of Bayes’ theorem (involving a single hypothesis and a single piece of evidence) it does not scale up to realistic situations. In particular, even with a single piece of match evidence, if we wish to incorporate the possibility that there are potential errors (both false positives and false negatives) introduced at any stage in the investigative process, matters become very complex. As a result we have observed expert witnesses (in different areas of speciality) routinely ignore the possibility of errors when presenting their evidence. To counter this, we produce what we believe is the first full probabilistic solution of the simple case of generic match evidence incorporating both classes of testing errors. Unfortunately, the resultant event tree solution is too complex for intuitive comprehension. And, crucially, the event tree also fails to represent the causal information that underpins the argument. In contrast, we also present a simple-to-construct graphical Bayesian Network (BN) solution that automatically performs the calculations and may also be intuitively simpler to understand. 
Although there have been multiple previous applications of BNs for analysing forensic evidence (including very detailed models for the DNA matching problem), these models have not widely penetrated the expert witness community. Nor have they addressed the basic generic match problem incorporating the two types of testing error. Hence we believe our basic BN solution provides an important mechanism for convincing experts, and eventually the legal community, that it is possible to rigorously analyse and communicate the full impact of match evidence on a case in the presence of possible error.
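As a rough illustration of how testing errors change the value of a reported match, the sketch below computes a likelihood ratio and posterior for a single piece of match evidence. It is a simplified two-hypothesis calculation, not the event tree or BN model from the paper. The function name, the parameter names, and the assumption that the error rates are the same whether or not the person is the source are all hypothetical choices made for this example.

```python
def match_evidence(prior, random_match_prob, false_pos, false_neg):
    """Return (likelihood_ratio, posterior) for a *reported* match.

    Assumed model (illustrative only):
      H      : the person is the source of the trace
      prior  : P(H) before the match evidence
      random_match_prob : P(true match | not H)
      false_pos : P(test reports a match | no true match)
      false_neg : P(test reports no match | true match)
    """
    # If H is true the feature truly matches, so a match is reported
    # unless the test produces a false negative.
    p_report_given_h = 1.0 - false_neg

    # If H is false, a match is reported either because of a coincidental
    # true match that the test detects, or because of a false positive.
    p_report_given_not_h = (
        random_match_prob * (1.0 - false_neg)
        + (1.0 - random_match_prob) * false_pos
    )

    likelihood_ratio = p_report_given_h / p_report_given_not_h

    # Bayes' theorem for the two hypotheses H and not H.
    numerator = p_report_given_h * prior
    posterior = numerator / (numerator + p_report_given_not_h * (1.0 - prior))
    return likelihood_ratio, posterior


# Same random match probability, with and without a 1% false positive rate.
lr_no_error, post_no_error = match_evidence(0.5, 0.001, 0.0, 0.0)
lr_with_error, post_with_error = match_evidence(0.5, 0.001, 0.01, 0.0)
print(lr_no_error, lr_with_error)
```

Under these assumed numbers, a false positive rate larger than the random match probability dominates the denominator, which is why ignoring testing error can greatly overstate the strength of a match.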

    Are lawnmowers a greater risk than terrorists?

    In December 2017 the Royal Statistical Society announced the winner of its “International Statistic of the Year”. The statistic was simply "69", which it said was "the annual number of Americans killed, on average, by lawnmowers - compared to two Americans killed annually, on average, by immigrant Jihadist terrorists". Contrary to the statement in the Royal Statistical Society citation, the figures directly comparing numbers killed by lawnmowers with those killed by Jihadist terrorists do NOT ‘highlight misunderstandings of risk’ or ‘illuminate the bigger picture’. They do the exact opposite, as we explain here.

    Criminally Incompetent Academic Misinterpretation of Criminal Data - and how the Media Pushed the Fake News

    On 17 Jan 2018 multiple news sources (e.g. see here, here, and here) ran a story about a new research paper that claims to expose both the inaccuracies and racial bias in one of the most common algorithms used by parole boards to predict recidivism (i.e. whether or not a defendant will re-offend). The research paper was written by the world famous computer scientist Hany Farid (along with a student, Julia Dressel). But the real story here is that the paper’s accusation of racial bias (specifically that the algorithm is biased against black people) is based on a fundamental misunderstanding of causation and statistics. The algorithm is no more ‘biased’ against black people than it is biased against white single parents, old people, people living in Beattyville, Kentucky, or women called ‘Amber’. In fact, as we show in this brief article, if you choose any factor that correlates with poverty you will inevitably replicate the statistical ‘bias’ claimed in the paper. And if you accept the validity of the claims in the paper then you must also accept, for example, that a charity which uses poverty as a factor to identify and help homeless people is being racist because it is biased against white people (and also, interestingly, Indian Americans). The fact that the article was published, and that none of the media running the story realise that they are pushing fake news, is what is most important here. Depressingly, many similar research studies involving the same kind of misinterpretation of statistics result in popular media articles that push a false narrative of one kind or another.

    Risk Aggregation in the presence of Discrete Causally Connected Random Variables

    Risk aggregation is a popular method used to estimate the sum of a collection of financial assets or events, where each asset or event is modelled as a random variable. Applications include insurance, operational risk, stress testing, and sensitivity analysis. In practice the sum of a set of random variables involves the use of two well-known mathematical operations: n-fold convolution (for a fixed number n) and N-fold convolution, defined as the compound sum of a frequency distribution N and a severity distribution, where the number of constant n-fold convolutions is determined by N. Where the severity and frequency variables are independent and continuous, numerical solutions such as Panjer’s recursion, fast Fourier transforms and Monte Carlo simulation currently produce acceptable results. However, they have not been designed to cope with new modelling challenges that require hybrid models containing discrete explanatory (regime switching) variables, or where discrete and continuous variables are inter-dependent and may influence the severity and frequency in complex, non-linear ways. This paper describes a Bayesian Factorization and Elimination (BFE) algorithm that performs convo
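The two operations named in this abstract, n-fold convolution and the N-fold compound sum, can be sketched for the simplest case only: discrete distributions with frequency and severity independent. This is plain brute-force convolution, not the BFE algorithm the paper describes. The function names and the representation of a distribution as a list (index = value, entry = probability) are assumptions made for this example.

```python
def convolve(p, q):
    """Distribution of X + Y for independent X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def n_fold(severity, n):
    """n-fold convolution: distribution of the sum of n i.i.d. losses."""
    result = [1.0]  # sum of zero losses is 0 with probability 1
    for _ in range(n):
        result = convolve(result, severity)
    return result


def compound_sum(frequency, severity):
    """N-fold convolution: mix the n-fold convolutions by P(N = n)."""
    max_n = len(frequency) - 1
    total = [0.0] * (max_n * (len(severity) - 1) + 1)
    for n, p_n in enumerate(frequency):
        for s, p_s in enumerate(n_fold(severity, n)):
            total[s] += p_n * p_s
    return total


# Hypothetical inputs: 0 or 1 events with equal probability,
# each event costing 1 or 2 units with equal probability.
frequency = [0.5, 0.5]
severity = [0.0, 0.5, 0.5]
aggregate = compound_sum(frequency, severity)
print(aggregate)
```

The independence assumption is what makes this mixture valid. Once a discrete explanatory variable influences both frequency and severity, the simple weighting by P(N = n) no longer applies, which is the situation the abstract says the standard methods were not designed for.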

    Simpson's Paradox and the implications for medical trials.

    This paper describes Simpson's paradox, and explains its serious implications for randomised control trials. In particular, we show that for any number of variables we can simulate the result of a controlled trial which uniformly points to one conclusion (such as 'drug is effective') for every possible combination of the variable states, but when a previously unobserved confounding variable is included, every possible combination of the variable states points to the opposite conclusion ('drug is not effective'). In other words, no matter how many variables are considered, and no matter how 'conclusive' the result, one cannot conclude the result is truly 'valid', since there is theoretically an unobserved confounding variable that could completely reverse the result.
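A small numerical sketch can make the reversal concrete. The counts below are hypothetical, chosen only to produce the effect, and are not taken from the paper. They use a single confounder (here called severity) with two states, whereas the paper's construction covers any number of variables.

```python
# Hypothetical (recovered, total) counts per arm, split by a confounder.
trial = {
    "mild":   {"drug": (81, 87),   "control": (234, 270)},
    "severe": {"drug": (192, 263), "control": (55, 80)},
}


def rate(recovered, total):
    """Recovery rate as a fraction."""
    return recovered / total


def pooled_rate(arm):
    """Recovery rate for one arm with the confounder ignored."""
    recovered = sum(trial[stratum][arm][0] for stratum in trial)
    total = sum(trial[stratum][arm][1] for stratum in trial)
    return rate(recovered, total)


# Within each stratum of the confounder, compare drug against control.
drug_better_in_stratum = {
    stratum: rate(*arms["drug"]) > rate(*arms["control"])
    for stratum, arms in trial.items()
}

# Ignoring the confounder, compare the pooled rates.
drug_better_pooled = pooled_rate("drug") > pooled_rate("control")

print(drug_better_in_stratum)  # drug looks better in every stratum
print(drug_better_pooled)      # but worse once the strata are pooled
```

The reversal arises because the two arms are unevenly distributed across the confounder: most drug patients fall in the severe stratum and most control patients in the mild one, so the pooled comparison mostly reflects severity rather than treatment.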

    Moving from big data and machine learning to smart data and causal modelling: a simple example from consumer research and marketing

    Provides a simple illustration of why the "pure machine learning from big data" approach is inevitably inadequate without expert judgement. Uses a causal Bayesian networ