
    A Characterization of Bias Introduced into Forensic Source Identification when there is a Subpopulation Structure in the Relevant Source Population.

    In forensic source identification, the forensic expert is responsible for providing a summary of the evidence that allows a decision maker to make a logical and coherent decision concerning the source of some trace evidence of interest. The academic consensus is that this summary should take the form of a likelihood ratio (LR) summarizing the likelihood of the trace evidence arising under two competing propositions. These propositions are usually referred to as the prosecution’s proposition, that the specified source is the actual source of the trace evidence, and the defense’s proposition, that another source in a relevant background population is the actual source. When the relevant background population has a subpopulation structure, the rates of misleading evidence of the LR tend to vary across the subpopulations, sometimes to an alarming degree. Our preliminary work with synthetic and real data indicates that the rates of misleading evidence differ among subpopulations of different sizes, which can lead to a systematic bias when using an LR to present evidence. In this presentation we summarize our preliminary results for characterizing this bias.
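    The subpopulation effect described above can be illustrated with a small simulation. The setup below is a hypothetical two-subpopulation mixture with invented parameters, not the authors' actual study design: a pooled background model is fit to the mixture, and the rate of misleading evidence (LR > 1 for different-source traces) is then computed separately within each subpopulation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical subpopulation structure: 80% of background sources come from
# subpopulation A (mean 0), 20% from subpopulation B (mean 2).
mu_a, mu_b, sigma, n = 0.0, 2.0, 1.0, 100_000
from_a = rng.random(n) < 0.8
x = np.where(from_a, rng.normal(mu_a, sigma, n), rng.normal(mu_b, sigma, n))

# A naive background model pools both subpopulations into one normal.
pooled_mu, pooled_sd = x.mean(), x.std()

# LR comparing "the specified source (at mu_a) is the source" against the
# pooled background, evaluated on traces that truly come from OTHER sources.
lr = norm.pdf(x, mu_a, sigma) / norm.pdf(x, pooled_mu, pooled_sd)

# Rate of misleading evidence (LR favoring the prosecution, i.e. LR > 1)
# computed per subpopulation: the two rates differ sharply.
rme_a = np.mean(lr[from_a] > 1)
rme_b = np.mean(lr[~from_a] > 1)
```

    Because subpopulation A happens to sit close to the specified source, its different-source traces yield LR > 1 far more often than subpopulation B's do, which is the kind of systematic, subpopulation-dependent bias the abstract describes.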

    Session 10: Student Speed Presentations

    Student speed presentations:
    - 2D respiratory sound analysis to detect lung abnormalities - Rafia Alice
    - A Characterization of Bias Introduced into Forensic Source Identification when there is a Subpopulation Structure - Dylan Borchert
    - A novel approach to detect COVID-19 fake news by mining biomedical information from news articles - Jordan Smith
    - Two-Stage Approach for Forensic Handwriting Analysis - Ashlan Simpson
    - Models for Predicting Maximum Potential Intensity of Tropical Cyclones - Iftekhar Chowdhur

    Estimation of Parameters of the Truncated Normal Distribution with Unknown Bounds

    The expectation-maximization (EM) algorithm is a commonly used iterative algorithm for providing parameter estimates of distributions from truncated samples when the truncation points or the number of missing observations are known. There is also literature on estimating the unknown bounds of truncated distributions. However, no existing work accommodates both parameter and bound estimation. In this work, we propose a methodology and an iterative algorithm, known as an expectation-solution (ES) algorithm, to estimate the location, scale, and truncation parameters of the truncated normal distribution. A preliminary simulation study illustrates the utility of this methodology.
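    A rough sketch of the joint estimation problem (not the ES algorithm itself, whose details the abstract leaves unspecified): the MLE of an unknown lower truncation bound is the sample minimum, and given that estimate the location and scale follow by maximizing the truncated-normal likelihood. All parameter values below are invented for illustration.

```python
import numpy as np
from scipy.stats import truncnorm, norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
mu_true, sd_true, a_true = 5.0, 2.0, 4.0    # hypothetical true parameters

# Draw from N(mu, sd) truncated below at a_true (truncnorm uses standardized bounds).
alpha = (a_true - mu_true) / sd_true
x = truncnorm.rvs(alpha, np.inf, loc=mu_true, scale=sd_true,
                  size=5000, random_state=rng)

# Step 1: the MLE of the unknown lower truncation bound is the sample minimum.
a_hat = x.min()

# Step 2: maximize the truncated-normal log-likelihood in (mu, sd) given a_hat.
def nll(theta):
    mu, log_sd = theta
    sd = np.exp(log_sd)                      # log-parameterize to keep sd > 0
    # log density of the lower-truncated normal: logpdf minus log P(X > a_hat)
    return -(norm.logpdf(x, mu, sd) - norm.logsf(a_hat, mu, sd)).sum()

res = minimize(nll, x0=[x.mean(), np.log(x.std())], method="Nelder-Mead")
mu_hat, sd_hat = res.x[0], np.exp(res.x[1])
```

    With a moderate truncation point, the recovered location and scale land close to the generating values even though the naive sample mean and standard deviation are biased by the truncation.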

    Probabilistic Foundations for the Use of the Logistic Regression Bayes Factor in Forensic Source Identification

    In comparison to likelihood ratios (LRs), Bayes factors (BFs) have the advantage that uncertainty in model parameter values is taken into account in a logical and coherent manner. In the forensic literature, it is common to calculate BFs for generative models and LRs for discriminative models, for example using maximum likelihood (ML) estimates of logistic regression parameters. In this report, we present an approach to calculating BFs when using logistic regression as a model to discriminate between two classes. In logistic regression, the log of the LR between the two classes follows a functional form; we focus on the case where this form is linear, which is equivalent to the log of the posterior odds of group membership following a linear model. We propose calculating the BF using the posterior odds ratio, as well as using the LR function in the context of Ommen and Saunders, 2021. Using a database of simulated observations generated under two different models, we obtain a posterior distribution for the parameters of the logistic regression and use it to obtain the posterior odds of group membership for a new observation with unknown membership. This posterior odds ratio is then divided by the prior odds ratio to obtain the corresponding BF. An important note is that by constructing the database with a prespecified number of observations under each model, we fix the base rates. This removes the Bernoulli sampling process of the labels used to construct the likelihood function for the logistic regression, which is discussed in the context of McLachlan, 2004. As a result, our discriminative model is an approximation to the latent generative models of the two classes. We study the convergence of the BF to the LR for two different BF calculations and show that for large sample sizes both converge. We also compare the calculated BFs of the two approaches to a reference BF, the LR, and the plug-in estimate of the LR.
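    The posterior-odds-over-prior-odds construction can be sketched numerically. The example below uses hypothetical generative models N(0, 1) and N(1.5, 1), fixed equal base rates, and a plain ML fit by gradient descent; it therefore produces the plug-in quantity the BFs converge to for large samples, not the full Bayesian treatment (which would average over a posterior on the coefficients).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Fixed base rates by construction: exactly n observations from each class.
n = 2000
x = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(1.5, 1.0, n)])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Logistic regression with a linear log-LR, fit by plain gradient descent
# on the mean negative log-likelihood (no ML library needed).
Xb = np.column_stack([np.ones_like(x), x])   # intercept + feature
w = np.zeros(2)
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    w -= 0.5 * Xb.T @ (p - y) / len(y)

# Posterior log-odds of class 1 at a new observation; dividing posterior
# odds by the prior odds (1 here, equal base rates) gives the factor.
x_new = 1.0
log_posterior_odds = w[0] + w[1] * x_new
bf_plugin = np.exp(log_posterior_odds) / 1.0

# True LR between the two generative models at x_new, for comparison.
lr_true = norm.pdf(x_new, 1.5, 1.0) / norm.pdf(x_new, 0.0, 1.0)
```

    Because the true log-LR between two equal-variance normals is linear in x, the fitted discriminative model recovers it closely, illustrating the large-sample agreement the report studies.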

    Identifying Subpopulations of a Hierarchical Structured Data using a Semi-Supervised Mixture Modeling Approach

    The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources and a sample is collected from each source. This subpopulation structure creates a hierarchical layer. We propose a semi-supervised mixture modeling approach to model the subpopulation structure, which leverages the fact that we know a collection of samples came from the same, yet unknown, source. A simulation study based on a well-known glass dataset shows that this method performs better than other unsupervised approaches previously used in practice.
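    The key constraint, that all samples from one source must share a single unknown subpopulation component, can be sketched as follows. The two components and the data model are invented for illustration; the point is that the group's log-densities are summed within each component before normalizing, rather than classifying each sample independently.

```python
import numpy as np
from scipy.stats import norm

# Two hypothetical subpopulation components (mean, sd) with equal weights.
comps = [(0.0, 1.0), (3.0, 1.0)]
weights = [0.5, 0.5]

def source_posterior(samples):
    # One log-probability per component for the WHOLE group of samples:
    # log weight plus the sum of per-sample log densities under that component.
    logp = np.array([np.log(w) + norm.logpdf(samples, m, s).sum()
                     for w, (m, s) in zip(weights, comps)])
    logp -= logp.max()                  # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum()

rng = np.random.default_rng(3)
group = rng.normal(0.3, 1.0, size=10)   # ten samples from one unknown source
post = source_posterior(group)          # posterior over the two components
```

    Pooling ten samples makes the component assignment far more decisive than any single sample would be, which is the leverage the semi-supervised approach gains over fully unsupervised clustering.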

    Studying Algorithmic Bias in Forensic Source Identification Problems

    This study focuses on forensic source identification, in particular the hierarchical latent structures that can occur within the relevant source population when likelihood ratio approaches are used. We study the systematic algorithmic bias, as measured by the rate of misleading evidence in favor of the prosecution (RMEP) and the rate of misleading evidence in favor of the defense (RMED), that can occur for each subpopulation when the subpopulation structure is not accounted for. This is done through an extensive simulation study that identifies and characterizes subpopulations and quantifies forensic evidence. We consider varying factors such as the number of subpopulations, mixture proportions, within-source variation, and dimensionality. These factors provide insight into how variations in subpopulation characteristics impact the forensic likelihood ratio, enhancing our understanding of forensic evidence assessment and decision making in hierarchically structured data.

    An Alpha-based Prescreening Methodology for a Common but Unknown Source Likelihood Ratio with Different Subpopulation Structures

    Prescreening is a commonly used methodology in which the forensic examiner includes only those sources from the background population that meet a certain degree of similarity to the given piece of evidence. The goal of prescreening is to find the sources in an alternative source population closest to the given piece of evidence for further analysis. This paper discusses the behavior of an α-based prescreening methodology, in the form of a Hotelling T^2 test on the background population, for a common-but-unknown-source likelihood ratio. An extensive simulation study with synthetic and real data was conducted. We find that prescreening helps give an accurate estimate of the likelihood ratio when there is a subpopulation structure in the alternative source population.
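    A sketch of what an α-based prescreen of this form might look like: a two-sample Hotelling T^2 test with pooled covariance retains a background source only when its p-value against the trace exceeds α. The data, dimensions, and threshold below are all invented for illustration.

```python
import numpy as np
from scipy.stats import f as f_dist

def hotelling_t2_pvalue(sample_a, sample_b):
    # Two-sample Hotelling T^2 with pooled covariance, converted to an
    # F statistic to obtain a p-value.
    n1, n2 = len(sample_a), len(sample_b)
    p = sample_a.shape[1]
    d = sample_a.mean(axis=0) - sample_b.mean(axis=0)
    S = ((n1 - 1) * np.cov(sample_a.T) +
         (n2 - 1) * np.cov(sample_b.T)) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
    fstat = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    return float(f_dist.sf(fstat, p, n1 + n2 - p - 1))

rng = np.random.default_rng(4)
trace = rng.normal(0.0, 1.0, size=(20, 3))   # measurements on the evidence
near = rng.normal(0.0, 1.0, size=(20, 3))    # background source like the trace
far = rng.normal(5.0, 1.0, size=(20, 3))     # background source unlike it

alpha = 0.05
p_near = hotelling_t2_pvalue(trace, near)
p_far = hotelling_t2_pvalue(trace, far)
keep_near = p_near > alpha                   # retained by the prescreen
keep_far = p_far > alpha                     # rejected: clearly dissimilar
```

    Sources that differ grossly from the trace are screened out before the likelihood ratio is computed, which is how the prescreen concentrates the alternative source population on the relevant subpopulation.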