Preliminary assessment of three quantitative approaches for estimating time-since-deposition from autofluorescence and morphological profiles of cell populations from forensic biological samples.
Determining when DNA recovered from a crime scene was transferred from its biological source, i.e., a sample's 'time-since-deposition' (TSD), can provide critical context for biological evidence. Yet there remain no analytical techniques for TSD that are validated for forensic casework. In this study, we investigate whether morphological and autofluorescence measurements of forensically relevant cell populations generated with Imaging Flow Cytometry (IFC) can be used to predict the TSD of 'touch' or trace biological samples. To this end, three prediction frameworks for estimating TSD in days were evaluated: the elastic net, gradient boosting machines (GBM), and generalized linear mixed model (GLMM) LASSO. Additionally, we transformed these continuous predictions into a series of binary classifiers to evaluate their potential utility for forensic casework. GBM and GLMM-LASSO showed the highest accuracy, with mean absolute error estimates in a hold-out test set of 29 and 21 days, respectively. Binary classifiers for these models correctly binned 94-96% and 98-99% of the age estimates as over/under 7 or 180 days, respectively. This suggests that predicted TSD using IFC measurements, coupled to one or possibly a combination of binary classification decision rules, may provide probative information for trace biological samples encountered during forensic casework.
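The two evaluation steps named in the abstract, mean absolute error on continuous predictions and the conversion of those predictions into over/under binary calls, can be illustrated with a minimal sketch. The function names, the example values, and the choice of a strict "greater than" comparison at the cutoff are all assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean


def mean_absolute_error(true_days, predicted_days):
    """Average absolute difference between true and predicted TSD, in days."""
    return mean(abs(t - p) for t, p in zip(true_days, predicted_days))


def proportion_correctly_binned(true_days, predicted_days, cutoff_days):
    """Share of predictions falling on the same side of the cutoff as the truth.

    Assumption: "over" means strictly greater than the cutoff.
    """
    hits = sum(
        (t > cutoff_days) == (p > cutoff_days)
        for t, p in zip(true_days, predicted_days)
    )
    return hits / len(true_days)


# Hypothetical values, not data from the study.
true_days = [1, 3, 14, 90, 200, 365]
predicted_days = [4, 9, 10, 60, 150, 300]

mae = mean_absolute_error(true_days, predicted_days)
props = {
    cutoff: proportion_correctly_binned(true_days, predicted_days, cutoff)
    for cutoff in (7, 180)  # the two cutoffs quoted in the abstract
}
```

A prediction can be far from the true value in days and still land on the correct side of a cutoff, which is one way a binary rule can report a high proportion correct alongside a mean absolute error of several weeks.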
Proportions of properly classified timepoints using a series of binary cutoff values for the hold-out test set and the donor/timepoint set.
Mean absolute prediction error in the test set.
Classifier performance for each hold-out donor/timepoint cell population, using GBM models.
Count: Number of cells that properly classify. Prop: Proportion of cells that properly classify. Total: Total cell count for the donor population. (DOCX)
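The Count, Prop, and Total columns described above can be tallied per cell population roughly as follows. This is a sketch under stated assumptions: the record layout (population label, true TSD, predicted TSD per cell), the population labels, and the example values are hypothetical, and "over" is again taken to mean strictly greater than the cutoff.

```python
from collections import defaultdict


def summarize_by_population(records, cutoff_days):
    """Tally per-population classification results.

    records: iterable of (population_id, true_days, predicted_days), one per cell.
    Returns {population_id: {"Count": ..., "Prop": ..., "Total": ...}}.
    """
    tallies = defaultdict(lambda: [0, 0])  # population -> [correct, total]
    for population, true_d, pred_d in records:
        tallies[population][1] += 1
        if (true_d > cutoff_days) == (pred_d > cutoff_days):
            tallies[population][0] += 1
    return {
        population: {"Count": correct, "Prop": correct / total, "Total": total}
        for population, (correct, total) in tallies.items()
    }


# Hypothetical per-cell records, not data from the study.
records = [
    ("D1_day3", 3, 2.0),
    ("D1_day3", 3, 9.5),
    ("D1_day3", 3, 5.0),
    ("D2_day30", 30, 41.0),
    ("D2_day30", 30, 6.0),
]
summary = summarize_by_population(records, cutoff_days=7)
```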
Flowchart of ML analysis framework.
Classifier performance for each hold-out donor cell population, using GBM models.
Count: Number of cells that properly classify. Prop: Proportion of cells that properly classify. Total: Total cell count for the donor population. (DOCX)
Classifier performance for each hold-out donor cell population, using GLMM models.
Count: Number of cells that properly classify. Prop: Proportion of cells that properly classify. Total: Total cell count for the donor population. (DOCX)
Classifier performance for each hold-out donor/timepoint cell population, using GLMM models.
Count: Number of cells that properly classify. Prop: Proportion of cells that properly classify. Total: Total cell count for the donor population. (DOCX)
Absolute prediction error scatterplots with density curves for GLMM model.
Median absolute error for various bins shown in red points.
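A per-bin median absolute error of the kind plotted here could be computed along these lines. The caption does not say which variable is binned or where the bin edges fall, so binning by true TSD, the edge values, and the example data below are all assumptions for illustration only.

```python
from bisect import bisect_right
from statistics import median


def median_abs_error_by_bin(true_days, predicted_days, bin_edges):
    """Median absolute error within each bin of true TSD.

    bin_edges must be sorted. Bin 0 holds values below the first edge,
    and each later bin i holds values in [bin_edges[i-1], bin_edges[i]),
    with the last bin holding values at or above the final edge.
    """
    errors_by_bin = {}
    for true_d, pred_d in zip(true_days, predicted_days):
        bin_index = bisect_right(bin_edges, true_d)
        errors_by_bin.setdefault(bin_index, []).append(abs(true_d - pred_d))
    return {i: median(errs) for i, errs in sorted(errors_by_bin.items())}


# Hypothetical values and bin edges, not data from the study.
true_days = [1, 3, 14, 90, 200, 365]
predicted_days = [4, 9, 10, 60, 150, 300]
medians = median_abs_error_by_bin(true_days, predicted_days, bin_edges=[7, 30, 180])
```

The median is less sensitive than the mean to a few very large errors, which may be why it is the summary overlaid on the scatterplots.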
Number of observations in the hold-out and training sets for each donor/timepoint.
Samples By Time: Donor/timepoint combination. N Observations: Total number of observations for a given donor/timepoint. N Test: Number of observations in the hold-out test set. N Train: Number of observations in the training set. (DOCX)