
    Lab Retriever: a software tool for calculating likelihood ratios incorporating a probability of drop-out for forensic DNA profiles

    BACKGROUND: Technological advances have enabled the analysis of very small amounts of DNA in forensic cases. However, the DNA profiles from such evidence are frequently incomplete and can contain contributions from multiple individuals. The complexity of these samples confounds the assessment of the statistical weight of the evidence. One approach to account for this uncertainty is to use a likelihood ratio framework to compare the probability of the evidence profile under different scenarios. While researchers favor the likelihood ratio framework, few open-source software solutions with a graphical user interface implementing these calculations are available for practicing forensic scientists. RESULTS: To address this need, we developed Lab Retriever, an open-source, freely available program that forensic scientists can use to calculate likelihood ratios for complex DNA profiles. Lab Retriever adds a graphical user interface, written primarily in JavaScript, on top of a C++ implementation of the previously published R code of Balding. We redesigned parts of the original Balding algorithm to improve computational speed. In addition to incorporating a probability of allelic drop-out and other critical parameters, Lab Retriever computes likelihood ratios for hypotheses that can include up to four unknown contributors to a mixed sample. These computations are completed nearly instantaneously on a modern PC or Mac computer. CONCLUSIONS: Lab Retriever provides a practical software solution to forensic scientists who wish to assess the statistical weight of evidence for complex DNA profiles. Executable versions of the program are freely available for Mac OS X and Windows operating systems. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0740-8) contains supplementary material, which is available to authorized users.
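
    To make the likelihood-ratio idea concrete, the sketch below computes a toy single-locus LR with a drop-out parameter. It is an illustration only, not the Balding/Lab Retriever algorithm: it assumes a single contributor, no drop-in, Hardy-Weinberg proportions, and the allele frequencies, drop-out value, and profiles are invented for the example.

```python
# Minimal, illustrative single-locus likelihood-ratio sketch with allelic
# drop-out (toy model only; NOT the Balding / Lab Retriever algorithm).
# Assumptions: one locus, one contributor under Hd, no drop-in,
# Hardy-Weinberg proportions; all numbers below are hypothetical.

from itertools import combinations_with_replacement

ALLELE_FREQS = {"12": 0.30, "13": 0.25, "14": 0.25, "15": 0.20}  # made up
DROPOUT = 0.15            # probability that a present allele fails to appear
EVIDENCE = {"12"}         # alleles observed in the crime-scene profile
SUSPECT = ("12", "13")    # suspect's genotype at this locus


def p_evidence_given_genotype(genotype, evidence, d):
    """P(observed alleles | true genotype), assuming no drop-in.

    Treats the genotype as a set of distinct alleles, ignoring homozygote
    dosage -- a deliberate simplification for this toy example.
    """
    alleles = set(genotype)
    if not evidence <= alleles:          # an observed allele the genotype lacks
        return 0.0                       # cannot be explained without drop-in
    p = 1.0
    for a in alleles:
        p *= (1 - d) if a in evidence else d
    return p


def genotype_prior(genotype):
    """Hardy-Weinberg genotype probability."""
    a, b = genotype
    return ALLELE_FREQS[a] ** 2 if a == b else 2 * ALLELE_FREQS[a] * ALLELE_FREQS[b]


# Hp: the suspect is the contributor.
p_hp = p_evidence_given_genotype(SUSPECT, EVIDENCE, DROPOUT)

# Hd: an unknown, unrelated individual is the contributor -- sum over genotypes.
p_hd = sum(
    genotype_prior(g) * p_evidence_given_genotype(g, EVIDENCE, DROPOUT)
    for g in combinations_with_replacement(sorted(ALLELE_FREQS), 2)
)

print(f"LR = {p_hp / p_hd:.2f}")
```

    A casework tool such as the one described multiplies per-locus terms across a full profile and handles drop-in, multiple and unknown numbers of contributors, and population-genetic corrections, none of which this sketch attempts.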

    Exploratory data analysis for the interpretation of low template DNA mixtures

    The interpretation of DNA mixtures has proven to be a complex problem in forensic genetics. In particular, low template DNA samples, where alleles can be missing (allele drop-out), or where alleles unrelated to the crime-sample are amplified (allele drop-in), cannot be analysed with classical approaches such as random man not excluded or random match probability. Drop-out, drop-in, stutter and other PCR-related stochastic effects create uncertainty about the composition of the crime-sample, making it difficult to attach a weight of evidence when one or more reference samples are compared to the crime-sample. In this paper, we use a probabilistic model to calculate likelihood ratios when there is uncertainty about the composition of the crime-sample. The model is exploratory in the sense that it allows LRs to be examined as two key parameters, drop-out and drop-in, are varied within their plausible ranges. We build on the work of Curran et al. [8], and improve their probabilistic model to allow more flexibility in the way the model parameters are applied. Two main modifications are made to their model: (i) different drop-out probabilities can be applied to different contributors, and (ii) different parameters can be used under the prosecution and the defence hypotheses. We illustrate how the LRs can be explored when the drop-out and drop-in parameters are varied, and suggest the use of Monte Carlo simulations to derive plausible ranges for the probability of drop-out. Although the model is suited to both high and low template samples, we illustrate the advantages of the exploratory approach through two DNA mixtures (involving two and at least three individuals) with low template components. © 2012 Elsevier Ireland Ltd. All rights reserved.
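
    The exploratory idea of scanning the LR across plausible drop-out and drop-in values can be sketched as follows. The likelihood_ratio() function here is a hypothetical placeholder standing in for the paper's probabilistic mixture model, and the parameter ranges are illustrative rather than recommended values.

```python
# Sketch of the "exploratory" approach: evaluate the likelihood ratio over a
# grid of plausible drop-out and drop-in values and report the extremes.
# likelihood_ratio() is a stand-in placeholder, not the paper's model.

import numpy as np


def likelihood_ratio(dropout, dropin):
    """Placeholder LR model: substitute a real mixture LR calculation here."""
    return 1e4 * (1 - dropout) ** 2 * np.exp(-10 * dropin)


dropout_grid = np.linspace(0.05, 0.60, 12)   # assumed plausible drop-out range
dropin_grid = np.linspace(0.00, 0.10, 11)    # assumed plausible drop-in range

lrs = np.array([[likelihood_ratio(do, di) for di in dropin_grid]
                for do in dropout_grid])

print(f"log10 LR ranges from {np.log10(lrs.min()):.2f} "
      f"to {np.log10(lrs.max()):.2f} over the explored grid")
```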

    Explaining Predictions from Tree-based Boosting Ensembles

    Understanding how "black-box" models arrive at their predictions has sparked significant interest from both within and outside the AI community. Our work addresses this by generating local explanations for individual predictions from tree-based ensembles, specifically Gradient Boosting Decision Trees (GBDTs). Given a correctly predicted instance in the training set, we wish to generate a counterfactual explanation: the minimal perturbation of that instance such that the prediction flips to the opposite class. Most existing methods for counterfactual explanations are (1) model-agnostic, so they do not take into account the structure of the original model, and/or (2) involve building a surrogate model on top of the original model, which is not guaranteed to represent the original model accurately. There exists a method specifically for random forests; we wish to extend this method to GBDTs. This involves accounting for (1) the sequential dependency between trees and (2) training on the negative gradients instead of the original labels.
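
    As a point of reference for what a counterfactual explanation is, the sketch below brute-forces the smallest single-feature change that flips a GBDT prediction. It is not the tree-structure-aware extension the paper describes; the dataset, model settings, and search grid are arbitrary choices for illustration.

```python
# Illustrative counterfactual search for a GBDT: find the smallest
# single-feature change that flips the model's prediction. This ignores the
# ensemble's internal structure and only demonstrates the concept.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

idx = int(np.where(model.predict(X) == y)[0][0])   # a correctly predicted instance
x = X[idx]
original_class = model.predict([x])[0]

best = None                                # (perturbation size, feature, value)
for j in range(X.shape[1]):                # try shifting one feature at a time
    for v in np.linspace(X[:, j].min(), X[:, j].max(), 50):
        x_cf = x.copy()
        x_cf[j] = v
        if model.predict([x_cf])[0] != original_class:
            delta = abs(v - x[j])
            if best is None or delta < best[0]:
                best = (delta, j, v)

if best:
    print(f"Flip the class by moving feature {best[1]} "
          f"from {x[best[1]]:.2f} to {best[2]:.2f} (|delta| = {best[0]:.2f})")
```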

    Actionable Interpretability through Optimizable Counterfactual Explanations for Tree Ensembles

    Counterfactual explanations help users understand why machine learned models make certain decisions, and more specifically, how these decisions can be changed. In this work, we frame the problem of finding counterfactual explanations, the minimal perturbation to an input such that the prediction changes, as an optimization task. Previously, optimization techniques for generating counterfactual examples could only be applied to differentiable models, or to other models via query access by estimating gradients from randomly sampled perturbations. In order to accommodate non-differentiable models such as tree ensembles, we propose using probabilistic model approximations in the optimization framework. We introduce a novel approximation technique that is effective for finding counterfactual explanations while also closely approximating the original model. Our results show that our method is able to produce counterfactual examples that are closer to the original instance in terms of Euclidean, Cosine, and Manhattan distance compared to other methods specifically designed for tree ensembles.
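
    The core trick, replacing hard tree splits with a differentiable approximation so that counterfactuals can be found by gradient descent, can be illustrated on a single decision stump. The sketch below is a minimal stand-in for the paper's method: the stump, smoothing temperature, and loss weights are invented, and a real implementation would smooth every split in the ensemble.

```python
# Toy sketch of "smooth the trees, then optimize": a single decision stump's
# hard split is replaced by a sigmoid so the output becomes differentiable,
# and gradient descent searches for a nearby input whose prediction flips.
# All constants are hypothetical; this is not the paper's full method.

import numpy as np

THRESHOLD, LEFT_LEAF, RIGHT_LEAF = 0.0, 0.1, 0.9   # hypothetical stump
SIGMA = 0.5                                        # smoothing temperature
LAMBDA = 0.1                                       # weight on the distance term


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def smooth_predict(x):
    """Soft version of: 'LEFT_LEAF if x <= THRESHOLD else RIGHT_LEAF'."""
    p_right = sigmoid((x - THRESHOLD) / SIGMA)
    return (1 - p_right) * LEFT_LEAF + p_right * RIGHT_LEAF


def loss(x, x_orig, target=1.0):
    # push the smoothed prediction toward the target class while
    # staying close to the original instance
    return (smooth_predict(x) - target) ** 2 + LAMBDA * (x - x_orig) ** 2


x_orig = -1.5                       # instance currently routed to the left leaf
x = x_orig
for _ in range(200):                # plain gradient descent, numeric gradient
    eps = 1e-4
    grad = (loss(x + eps, x_orig) - loss(x - eps, x_orig)) / (2 * eps)
    x -= 0.1 * grad

hard_prediction = RIGHT_LEAF if x > THRESHOLD else LEFT_LEAF
print(f"counterfactual x = {x:.3f}, hard-tree prediction = {hard_prediction}")
```

    The distance term keeps the counterfactual near the original instance, so the optimum settles just past the (smoothed) decision boundary rather than deep inside the other class.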