Calculating and understanding the value of any type of match evidence when there are potential testing errors
It is well known that Bayes' theorem (with likelihood ratios) can be used to calculate the impact of evidence, such as a "match" of some feature of a person. Typically the feature of interest is the DNA profile, but the method applies in principle to any feature of a person or object, including not just DNA, fingerprints, or footprints, but also more basic features such as skin colour, height, hair colour or even name. Notwithstanding concerns about the extensiveness of databases of such features, a serious challenge to the use of Bayes in such legal contexts is that its standard formulaic representations are not readily understandable to non-statisticians. Attempts to get round this problem usually involve representations based around some variation of an event tree. While this approach works well in explaining the most trivial instance of Bayes' theorem (involving a single hypothesis and a single piece of evidence), it does not scale up to realistic situations. In particular, even with a single piece of match evidence, if we wish to incorporate the possibility that there are potential errors (both false positives and false negatives) introduced at any stage in the investigative process, matters become very complex. As a result we have observed expert witnesses (in different areas of speciality) routinely ignore the possibility of errors when presenting their evidence. To counter this, we produce what we believe is the first full probabilistic solution of the simple case of generic match evidence incorporating both classes of testing errors. Unfortunately, the resultant event tree solution is too complex for intuitive comprehension. And, crucially, the event tree also fails to represent the causal information that underpins the argument. In contrast, we also present a simple-to-construct graphical Bayesian Network (BN) solution that automatically performs the calculations and may also be intuitively simpler to understand. Although there have been multiple previous applications of BNs for analysing forensic evidence (including very detailed models for the DNA matching problem), these models have not widely penetrated the expert witness community. Nor have they addressed the basic generic match problem incorporating the two types of testing error. Hence we believe our basic BN solution provides an important mechanism for convincing experts, and eventually the legal community, that it is possible to rigorously analyse and communicate the full impact of match evidence on a case, in the presence of possible error.
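A minimal sketch (in Python) of the kind of calculation the abstract refers to, assuming a single reported match characterised by a random match probability m, a false positive rate fpr and a false negative rate fnr; the variable names and the exact error model are illustrative assumptions, not the paper's BN model:

```python
# Minimal sketch: posterior probability that the suspect is the source of a
# trace, given a reported "match", when the test itself can err.
# The error model and variable names below are illustrative assumptions,
# not taken from the paper.

def match_likelihood_ratio(m, fpr, fnr):
    """Likelihood ratio for a reported match.

    m   : random match probability (chance a non-source shares the feature)
    fpr : false positive rate (test reports a match when the features differ)
    fnr : false negative rate (test misses a genuine match)
    """
    p_report_if_source = 1.0 - fnr  # genuine match, correctly reported
    # A non-source triggers a report either by coincidentally sharing the
    # feature (and being correctly reported) or via a false positive.
    p_report_if_not_source = m * (1.0 - fnr) + (1.0 - m) * fpr
    return p_report_if_source / p_report_if_not_source

def posterior_prob_source(prior, m, fpr, fnr):
    """Update a prior probability of 'suspect is the source' on a reported match."""
    lr = match_likelihood_ratio(m, fpr, fnr)
    post_odds = (prior / (1.0 - prior)) * lr
    return post_odds / (1.0 + post_odds)

if __name__ == "__main__":
    # With a tiny random match probability and no testing errors the evidence
    # is overwhelming; a modest false positive rate weakens it considerably.
    print(posterior_prob_source(prior=0.01, m=1e-6, fpr=0.0, fnr=0.0))      # ~0.9999
    print(posterior_prob_source(prior=0.01, m=1e-6, fpr=0.001, fnr=0.01))   # ~0.91
```

The point of the sketch is that the false positive rate, not the random match probability, can dominate the value of the evidence once testing errors are admitted.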
Are lawnmowers a greater risk than terrorists?
In December 2017 the Royal Statistical Society announced the winner of its "International Statistic of the Year". The statistic was simply "69", which it said was "the annual number of Americans killed, on average, by lawnmowers - compared to two Americans killed annually, on average, by immigrant Jihadist terrorists". Contrary to the statement in the Royal Statistical Society citation, the figures directly comparing the numbers killed by lawnmowers with those killed by Jihadist terrorists do NOT "highlight misunderstandings of risk" or "illuminate the bigger picture". They do the exact opposite, as we explain here.
Criminally Incompetent Academic Misinterpretation of Criminal Data - and how the Media Pushed the Fake News
On 17 Jan 2018 multiple news sources (e.g. see here, here, and here) ran a story about a new research paper that claims to expose both the inaccuracies and racial bias in one of the most common algorithms used by parole boards to predict recidivism (i.e. whether or not a defendant will re-offend). The research paper was written by the world famous computer scientist Hany Farid (along with a student, Julia Dressel). But the real story here is that the paper's accusation of racial bias (specifically that the algorithm is biased against black people) is based on a fundamental misunderstanding of causation and statistics. The algorithm is no more "biased" against black people than it is biased against white single parents, old people, people living in Beattyville, Kentucky, or women called "Amber". In fact, as we show in this brief article, if you choose any factor that correlates with poverty you will inevitably replicate the statistical "bias" claimed in the paper. And if you accept the validity of the claims in the paper then you must also accept, for example, that a charity which uses poverty as a factor to identify and help homeless people is being racist because it is biased against white people (and also, interestingly, Indian Americans). The fact that the article was published and that none of the media running the story realise that they are pushing fake news is what is most important here. Depressingly, many similar research studies involving the same kind of misinterpretation of statistics result in popular media articles that push a false narrative of one kind or another.
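To illustrate the statistical point being made, here is a short Python simulation of my own construction (not taken from the paper): if re-offending is driven only by a poverty indicator, and poverty rates differ between two groups, then a risk score that uses poverty alone, and never sees group membership, still produces different false positive rates for the two groups.

```python
# Illustrative simulation (mine, not the paper's): a score based only on a
# poverty indicator shows different false positive rates across groups when
# the groups have different poverty rates.
import random

random.seed(0)

def simulate(n=100_000, poverty_rate_a=0.6, poverty_rate_b=0.3,
             p_reoffend_poor=0.5, p_reoffend_not_poor=0.2):
    rows = []
    for _ in range(n):
        group = "A" if random.random() < 0.5 else "B"
        poverty_rate = poverty_rate_a if group == "A" else poverty_rate_b
        poor = random.random() < poverty_rate
        reoffends = random.random() < (p_reoffend_poor if poor else p_reoffend_not_poor)
        predicted_high_risk = poor  # the "algorithm" uses poverty only
        rows.append((group, reoffends, predicted_high_risk))
    return rows

def false_positive_rate(rows, group):
    # Fraction of non-reoffenders in the group who were flagged high risk.
    flagged = sum(1 for g, y, yhat in rows if g == group and not y and yhat)
    negatives = sum(1 for g, y, yhat in rows if g == group and not y)
    return flagged / negatives

rows = simulate()
print("False positive rate, group A:", round(false_positive_rate(rows, "A"), 3))
print("False positive rate, group B:", round(false_positive_rate(rows, "B"), 3))
# Group A's higher poverty rate alone produces its higher false positive rate,
# even though group membership never enters the prediction.
```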
Risk Aggregation in the presence of Discrete Causally Connected Random Variables
Risk aggregation is a popular method used to estimate the sum of a collection of financial assets or events, where each asset or event is modelled as a random variable. Applications include insurance, operational risk, stress testing, and sensitivity analysis. In practice the sum of a set of random variables involves the use of two well-known mathematical operations: n-fold convolution (for a fixed number n) and N-fold convolution, defined as the compound sum of a frequency distribution N and a severity distribution, where the number of constant n-fold convolutions is determined by N. Where the severity and frequency variables are independent and continuous, current numerical solutions such as Panjer's recursion, Fast Fourier transforms and Monte Carlo simulation produce acceptable results. However, they have not been designed to cope with new modelling challenges that require hybrid models containing discrete explanatory (regime switching) variables, or where discrete and continuous variables are inter-dependent and may influence the severity and frequency in complex, non-linear ways. This paper describes a Bayesian Factorization and Elimination (BFE) algorithm that performs convolution on these hybrid models.
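As a point of reference for the N-fold convolution (compound sum) described above, here is a minimal Monte Carlo sketch in Python, using an illustrative Poisson frequency and lognormal severity; this is the standard simulation baseline the abstract mentions, not the paper's BFE algorithm.

```python
# Minimal Monte Carlo sketch of a compound sum (N-fold convolution):
# aggregate loss = sum of N severity draws, where the event count N is itself
# random. Poisson frequency and lognormal severity are illustrative choices.
import math
import random

random.seed(1)

def sample_poisson(lam):
    """Sample a Poisson variate via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def aggregate_loss_samples(n_sims=100_000, freq_mean=3.0, sev_mu=0.0, sev_sigma=1.0):
    """For each trial, draw N ~ Poisson(freq_mean) and sum N lognormal severities."""
    totals = []
    for _ in range(n_sims):
        n_events = sample_poisson(freq_mean)
        totals.append(sum(random.lognormvariate(sev_mu, sev_sigma)
                          for _ in range(n_events)))
    return totals

totals = sorted(aggregate_loss_samples())
print("mean aggregate loss:", sum(totals) / len(totals))
print("99th percentile    :", totals[int(0.99 * len(totals)) - 1])
```

Methods of this kind assume the frequency and severity variables are independent and continuous; the hybrid, causally connected case motivating the paper is exactly where they break down.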
Simpson's Paradox and the implications for medical trials.
This paper describes Simpson's paradox, and explains its serious implications for randomised controlled trials. In particular, we show that for any number of variables we can simulate the result of a controlled trial which uniformly points to one conclusion (such as 'drug is effective') for every possible combination of the variable states, but when a previously unobserved confounding variable is included every possible combination of the variable states points to the opposite conclusion ('drug is not effective'). In other words, no matter how many variables are considered, and no matter how 'conclusive' the result, one cannot conclude the result is truly 'valid', since there is theoretically an unobserved confounding variable that could completely reverse the result.
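A small worked example of the reversal described above, in Python, using the classic kidney-stone-style figures rather than any data from the paper: the 'drug' appears better overall, yet worse within every stratum of the confounder (case severity).

```python
# Toy numbers (the classic textbook figures, not the paper's data) showing a
# Simpson's paradox reversal: better overall, worse in every stratum.
# (drug given, severe case) -> (recoveries, patients)
data = {
    (True,  False): (234, 270),  # drug, mild
    (True,  True):  (55,  80),   # drug, severe
    (False, False): (81,  87),   # no drug, mild
    (False, True):  (192, 263),  # no drug, severe
}

def recovery_rate(drug, severe=None):
    """Recovery rate for a treatment arm, optionally restricted to one stratum."""
    keys = [k for k in data if k[0] == drug and (severe is None or k[1] == severe)]
    recovered = sum(data[k][0] for k in keys)
    total = sum(data[k][1] for k in keys)
    return round(recovered / total, 3)

print("Overall: drug", recovery_rate(True), "vs no drug", recovery_rate(False))
print("Mild   : drug", recovery_rate(True, False), "vs no drug", recovery_rate(False, False))
print("Severe : drug", recovery_rate(True, True), "vs no drug", recovery_rate(False, True))
# Overall the drug looks better (0.826 vs 0.780), but it is worse for both
# mild (0.867 vs 0.931) and severe (0.688 vs 0.730) cases once the confounder
# is taken into account.
```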
Moving from big data and machine learning to smart data and causal modelling: a simple example from consumer research and marketing
Provides a simple illustration of why the "pure machine learning from big data" approach is inevitably inadequate without expert judgement. Uses a causal Bayesian network.
- âŠ