
    A Signal Model for Forensic DNA Mixtures

    For forensic purposes, short tandem repeat (STR) allele signals are used as DNA fingerprints. Signals measured from samples have traditionally been interpreted by applying thresholds. More quantitative approaches have recently been developed, but none has aimed to identify an appropriate signal model. By analyzing data from 643 single-person samples, we develop such a signal model. Three standard classes of two-parameter distributions, one symmetric (normal) and two right-skewed (gamma and log-normal), were investigated for their ability to adequately describe the data. Our analysis suggests that additive noise is well modeled by the log-normal distribution class and that variability in peak heights is well described by the gamma distribution class. This is a crucial step towards the development of principled techniques for mixed-sample signal deconvolution.
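    The model comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis: it assumes the three classes are ranked by log-likelihood, fits the normal and log-normal classes by maximum likelihood, and, for simplicity, fits the gamma class by the method of moments. The simulated heights in the demo are hypothetical.

```python
import math
import random
import statistics


def normal_loglik(xs):
    """Log-likelihood of xs under a normal distribution fitted by maximum likelihood."""
    n = len(xs)
    mu = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu)  # MLE variance (divides by n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Log-likelihood under a log-normal fit (MLE on the log-transformed data)."""
    logs = [math.log(x) for x in xs]
    # Normal log-likelihood of the logs plus the Jacobian term -sum(log x)
    return normal_loglik(logs) - sum(logs)


def gamma_loglik(xs):
    """Log-likelihood under a gamma fit by the method of moments (not full MLE)."""
    mean = statistics.fmean(xs)
    var = statistics.pvariance(xs, mean)
    shape, scale = mean ** 2 / var, var / mean
    return sum(
        (shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale)
        for x in xs
    )


def best_fit(xs):
    """Return the name of the highest-likelihood class and all scores.
    Every class has two parameters, so the ranking mirrors an AIC ranking."""
    scores = {
        "normal": normal_loglik(xs),
        "gamma": gamma_loglik(xs),
        "log-normal": lognormal_loglik(xs),
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated right-skewed "peak heights"; not data from the study
    heights = [rng.lognormvariate(4.0, 1.0) for _ in range(5000)]
    print(best_fit(heights))
```

    Because the gamma fit here is not a maximum-likelihood fit, its score is slightly pessimistic; a full MLE would need an iterative solver for the shape parameter.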

    My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing

    Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for analysis of forensic DNA profiles. MPS offers several advantages over CE, such as virtually unlimited multiplexing of loci, the combination of short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without the constraints of size separation, greater discrimination power, deep mixture resolution, and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with regions of interest (ROIs) determined automatically by a generic maximal flanking algorithm, which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to make allele calls automatically, starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, whose sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated by multilocus STR polymerase chain reaction (PCR) on both single-contributor samples and multiple-person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the results are excellent for most loci: a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS holds great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as mitochondrial DNA research.
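    The core allele-calling idea (cut the region of interest out of each read between known flanks, then look it up in a reference allele table) can be sketched as follows. This is a simplified illustration, not MyFLq itself: the locus name `LOCUS_A`, the flank sequences, and the allele table are hypothetical, exact string matching replaces the maximal flanking algorithm, and a Python dict stands in for the MySQL database.

```python
from collections import Counter

# Hypothetical reference table: locus -> (left flank, right flank, {ROI sequence: allele name})
REFERENCE = {
    "LOCUS_A": ("GGCC", "CCTT", {"AGAT" * 3: "3", "AGAT" * 4: "4"}),
}


def extract_roi(read, left, right):
    """Return the sequence between the two flanks, or None if either flank is missing."""
    start = read.find(left)
    if start == -1:
        return None
    start += len(left)
    end = read.find(right, start)
    if end == -1:
        return None
    return read[start:end]


def call_alleles(reads, locus):
    """Count known alleles; ROIs absent from the table are reported as 'unknown:<seq>'."""
    left, right, known = REFERENCE[locus]
    counts = Counter()
    for read in reads:
        roi = extract_roi(read, left, right)
        if roi is None:
            continue  # read does not span the locus
        counts[known.get(roi, "unknown:" + roi)] += 1
    return counts


if __name__ == "__main__":
    reads = ["TTGGCC" + "AGAT" * n + "CCTTAA" for n in (3, 3, 4, 5)] + ["NNNN"]
    print(call_alleles(reads, "LOCUS_A"))
```

    Unknown ROIs are kept rather than discarded so that a downstream step, such as the new-allele versus sequencing-error assessment the abstract mentions, can examine them.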

    Surface-enhanced Raman spectroscopy for the forensic analysis of vaginal fluid

    Vaginal fluid is most often found at crime scenes where a sexual assault has taken place, or on clothing or other items collected from sexual assault victims or perpetrators. Because the victim is generally known in these cases, detecting vaginal fluid is not a matter of individual identification, as it might be for semen. Instead, links can be made between victim and suspect if the sexual assault was carried out digitally or with a foreign object (e.g., a bottle, pool cue, cigarette, or the handle of a hammer or other tool). If such an object is analyzed only for DNA and the victim is identified, the suspect may claim that the victim's DNA is present because she handled and/or owns the object, not because it was used to sexually assault her; identifying vaginal fluid residue would resolve such uncertainty. Most research to date on methods for identifying vaginal fluid involves mRNA biomarkers and the identification of various bacterial strains [1-3]. However, these approaches require extensive sample preparation and laboratory analysis and have not fully explored the genomic differences among all body fluid RNAs. No existing method of vaginal fluid identification combines high specificity with rapid analysis [4]. A new rapid detection method is therefore required. Surface-enhanced Raman spectroscopy (SERS) is an emerging, highly sensitive technique for the forensic analysis of various body fluids. It has the potential to improve current vaginal fluid identification owing to its ease of use, rapid analysis time, portability, and non-destructive nature. For this experiment, all vaginal fluid samples were collected from anonymous donors by saturating a cotton swab via vaginal insertion. 
Samples were analyzed on gold nanoparticle chips [4]. This nanostructured metal substrate is essential for the large signal enhancement of SERS and also quenches background fluorescence that sometimes interferes with normal Raman spectroscopy measurements [5]. SERS signal variation of a single vaginal fluid sample over a six-month period was evaluated under both ambient and frozen storage conditions. Vaginal fluid samples were also taken from 10 individuals over the course of a single menstrual cycle: four samples, collected at one-week intervals, were obtained from each individual and analyzed using SERS. The SERS vaginal fluid signals showed very little variation as a function of time and storage conditions, indicating that the spectral pattern of vaginal fluid is unlikely to change over time. The samples analyzed over one menstrual cycle showed slight intra-donor differences; however, the overall spectral patterns remained consistent and reproducible. When cycle spectra were compared between individuals, very little donor-to-donor variation was observed, indicating the potential for a universal vaginal fluid signature spectrum. A cross-validated partial least squares discriminant analysis (PLS-DA) model was built to classify all body fluids; vaginal fluid was identified with 95.0% sensitivity and 96.6% specificity, indicating that the spectral pattern of vaginal fluid was successfully distinguished from those of semen and blood. Thus, SERS has high potential for vaginal fluid analysis in forensic science.
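    The sensitivity and specificity figures quoted above are one-versus-rest rates for the vaginal fluid class. A minimal sketch of how such rates are computed from true and predicted labels is shown below; it does not implement PLS-DA, and the labels in the demo are invented for illustration.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for class `positive`."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1  # a positive sample assigned to another class
        elif pred == positive:
            fp += 1      # another body fluid misclassified as the positive class
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical classifier output for ten spectra
    truth = ["vaginal"] * 4 + ["semen"] * 3 + ["blood"] * 3
    pred = ["vaginal", "vaginal", "vaginal", "blood",
            "semen", "vaginal", "semen", "blood", "blood", "blood"]
    print(sensitivity_specificity(truth, pred, "vaginal"))
```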

    A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

    Analysis of DNA samples is an important step in forensics, and the speed of analysis can affect investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), short DNA sequences of 2-5 base pairs repeated in tandem. Current forensic approaches use 20 STR loci for analysis. Single nucleotide polymorphisms (SNPs) are useful for analyzing complex DNA mixtures. However, using tens of thousands of SNP loci poses significant computational challenges, because the cost of forensic analysis scales with the product of the number of loci and the number of DNA samples to be analyzed. In this paper, we discuss the implementation of a DNA sequence comparison algorithm by recasting it in terms of linear algebra primitives. By developing an overloaded matrix multiplication approach to DNA comparisons, we can leverage advances in GPU hardware and algorithms for double-precision general matrix multiply (DGEMM) to speed up DNA sample comparisons. We show that it is possible to compare 2048 unknown DNA samples with 20 million known samples in under 6 seconds using an NVIDIA K80 GPU. Comment: Accepted for publication at the 2017 IEEE High Performance Extreme Computing Conference.
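    The "overloaded matrix multiplication" idea can be illustrated by keeping the loop structure of a matrix product while swapping its multiply and add operators. In the pure-Python sketch below, which is not the paper's GPU implementation, rows of the first matrix are unknown samples, columns of the second are known samples, and genotypes are coded as hypothetical 0/1/2 allele counts; using "mismatch" as the combine step and "+" as the accumulate step yields a per-pair mismatch count. The scoring function is an assumption chosen for illustration.

```python
import operator
from functools import reduce


def generalized_matmul(A, B, combine, accumulate, init):
    """C[i][j] = accumulate over k of combine(A[i][k], B[k][j]).
    With combine=operator.mul and accumulate=operator.add this is ordinary GEMM."""
    columns = list(zip(*B))  # columns of B
    return [
        [reduce(accumulate, (combine(a, b) for a, b in zip(row, col)), init) for col in columns]
        for row in A
    ]


def mismatch(a, b):
    """Hypothetical per-locus score: 1 if the genotype codes differ, else 0."""
    return int(a != b)


if __name__ == "__main__":
    unknowns = [[0, 1, 2], [2, 2, 0]]        # samples x loci
    knowns = [[0, 1, 2], [2, 1, 0]]          # samples x loci
    B = [list(col) for col in zip(*knowns)]  # transpose to loci x samples
    print(generalized_matmul(unknowns, B, mismatch, operator.add, 0))
```

    Expressing the comparison this way is what lets it be mapped onto optimized matrix-multiply kernels; in this sketch the loops run in plain Python and gain no speed-up.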

    Analysis of heat-induced DNA damage during PCR and verification, validation and comparative analysis of two PCR megaplexes

    Biological evidence collected at crime scenes is often subjected to forensic deoxyribonucleic acid (DNA) testing. During forensic DNA testing, DNA from the evidence and from known samples is extracted, purified, amplified using the polymerase chain reaction (PCR), and analyzed using capillary electrophoresis (CE). To compare the profile of a suspect with the evidence appropriately, interpretation parameters and optimized processing schemes must be established. This study addresses this in two ways: first, by evaluating whether PCR temperature cycling is detrimental to the amplification process; and second, by establishing and comparing interpretation parameters for two commonly employed short tandem repeat (STR) megaplexes. To evaluate the effects of temperature cycling on downstream signal, a dynamic systems model was developed, validated, and used to test the effects of temperature on DNA damage and the subsequent fluorescence signal. Though DNA is generally thought to be a stable molecule, heat-induced damage does occur. Specifically, this model assesses the damage to guanine and cytosine bases during temperature cycling. The model simulates the amplification of a single locus during PCR and generates the peak height observed after capillary electrophoresis. It was designed not only to assess the effects of heat-induced DNA damage but also to incorporate variability in PCR efficiency. The simulated data indicate that heat-induced DNA damage does not significantly reduce the allelic signal. Also, although changes in PCR efficiency introduce variability in the peak heights at all targets, the peak heights observed with and without heat-induced DNA damage are not significantly different. In fact, variation in PCR efficiency has a larger effect on the number of amplicons produced than heat-induced DNA damage does. 
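    Why efficiency variation can dominate a small per-cycle damage rate is easy to see in a mean-field sketch. The model below is hypothetical and much simpler than the study's dynamic systems model: each cycle multiplies the intact amplicon count by (1 + efficiency), and a fixed fraction `damage_rate` of molecules then becomes unamplifiable. Replicate efficiencies are drawn from a normal distribution with invented parameters.

```python
import random


def amplify(n0, cycles, efficiency, damage_rate=0.0):
    """Expected intact amplicons after PCR under a hypothetical mean-field model:
    each cycle copies intact molecules with probability `efficiency`, then a
    fraction `damage_rate` of molecules is lost to heat damage."""
    n = float(n0)
    for _ in range(cycles):
        n *= (1.0 + efficiency) * (1.0 - damage_rate)
    return n


def replicate_amplicons(n0, cycles, mean_eff, sd_eff, damage_rate, reps, seed=0):
    """Amplicon counts for `reps` replicates whose efficiencies vary randomly."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        eff = min(1.0, max(0.0, rng.gauss(mean_eff, sd_eff)))  # clamp to [0, 1]
        results.append(amplify(n0, cycles, eff, damage_rate))
    return results


if __name__ == "__main__":
    no_damage = replicate_amplicons(100, 30, 0.9, 0.05, 0.0, 5, seed=1)
    damage = replicate_amplicons(100, 30, 0.9, 0.05, 1e-4, 5, seed=1)
    for a, b in zip(no_damage, damage):
        print(f"{a:.3e}  {b:.3e}")
```

    In this toy model, a per-cycle damage rate of 1e-4 scales the yield by (1 - 1e-4)^30, about 0.997, whereas efficiencies of 0.85 and 0.95 differ in yield by a factor of (1.95/1.85)^30, nearly five-fold, after 30 cycles.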
The second part of this study compares two PCR amplification megaplexes, PowerPlex® Fusion and GlobalFiler®, by evaluating their sensitivities, limits of detection, presence of artifacts, heterozygous peak balance, and ability to amplify minor contributors in DNA mixtures. Weighted least squares regression analysis of single-source samples indicates that PowerPlex® Fusion has greater analytical sensitivity and lower limits of detection in comparable dye channels, and that both kits display similar heterozygous balance. However, the GlobalFiler® processing scheme produced fewer artifacts for the various single-source samples analyzed, particularly at higher target amounts. Analysis of two- and three-person DNA mixtures indicates that both megaplexes perform equally well when detection of the minor contributor is the criterion.
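    Analytical sensitivity in this kind of comparison is typically the slope of peak height against template mass. A minimal weighted least squares sketch follows; it assumes inverse-variance weights, which is a common choice but is not stated in the abstract, and the demo data are invented. The limit-of-detection calculation is not shown because the abstract does not give its definition.

```python
def weighted_least_squares(x, y, w):
    """Fit y = slope * x + intercept, minimizing sum(w_i * residual_i**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


if __name__ == "__main__":
    mass_ng = [0.0625, 0.125, 0.25, 0.5, 1.0]   # hypothetical template masses
    height_rfu = [110, 230, 450, 880, 1800]     # hypothetical mean peak heights
    variance = [400, 900, 2500, 8100, 32400]    # hypothetical replicate variances
    weights = [1 / v for v in variance]         # assumed inverse-variance weighting
    slope, intercept = weighted_least_squares(mass_ng, height_rfu, weights)
    print(f"sensitivity ~ {slope:.0f} RFU/ng, intercept {intercept:.1f} RFU")
```

    Weighting matters here because peak-height variance grows with template amount, so an unweighted fit would let the high-mass points dominate the slope.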

    Effect of multiple allelic drop-outs in forensic RMNE calculations

    Technological advances such as massively parallel sequencing enable increasing amounts of genetic information to be obtained from increasingly challenging samples. Particularly for low-template, degraded, and multi-contributor samples, the number of drop-outs in many profiles will increase simply because more loci are analyzed, making it difficult to assess probabilistically how many drop-outs have occurred and at which loci. We previously developed a Random Man Not Excluded (RMNE) method that takes allelic drop-out into account without requiring detailed estimates of the probability that drop-outs have occurred or assumptions about the loci at which they occurred. The exact number of alleles that have dropped out does not need to be known. Here we report a generic Python algorithm that calculates RMNE probabilities for any given number of loci. The number of allowed drop-outs can be set between 0 and twice the number of analyzed loci. The source code is available at https://github.com/fvnieuwe/rmne, and an online web-based RMNE calculation tool is available at http://forensic.ugent.be/rmne. The tool calculates RMNE probabilities from a custom list of probabilities of the observed and non-observed alleles for any given number of loci. Using this tool, we explored the effect of allowing allelic drop-outs on the evidential value of random forensic profiles with a varying number of loci. Our results give insight into how the number of allowed drop-outs affects the evidential value of a profile and how drop-out can be managed in the RMNE approach.
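    One way to compute an RMNE probability with up to k allowed drop-outs is sketched below; it is not the code in the linked repository. It assumes Hardy-Weinberg proportions and independence across loci, and treats a random person as "not excluded" when at most k of their alleles fall outside the observed allele sets. If q is the summed frequency of the observed alleles at a locus, that person carries 0, 1, or 2 non-observed alleles there with probabilities q², 2q(1 - q), and (1 - q)²; convolving these across loci gives the distribution of the total.

```python
def rmne_with_dropouts(observed_freq_sums, max_dropouts):
    """RMNE probability allowing up to `max_dropouts` alleles outside the profile.

    observed_freq_sums: per locus, q = summed population frequency of observed alleles.
    Assumes Hardy-Weinberg proportions and independent loci (sketch assumptions).
    """
    # dist[m] = P(random person has m non-observed alleles over the loci processed so far)
    dist = [1.0]
    for q in observed_freq_sums:
        per_locus = [q * q, 2 * q * (1 - q), (1 - q) ** 2]  # 0, 1, or 2 non-observed alleles
        new = [0.0] * (len(dist) + 2)
        for m, pm in enumerate(dist):
            for j, pj in enumerate(per_locus):
                new[m + j] += pm * pj  # convolution over loci
        dist = new
    return sum(dist[: max_dropouts + 1])


if __name__ == "__main__":
    q_per_locus = [0.3, 0.45, 0.2, 0.6]  # hypothetical observed-allele frequency sums
    for k in range(0, 2 * len(q_per_locus) + 1):
        print(k, rmne_with_dropouts(q_per_locus, k))
```

    With k = 0 this reduces to the classical RMNE product of q² over loci; with k equal to twice the number of loci every person is included and the probability is 1, matching the allowed range stated above.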

    The examination of baseline noise and the impact on the interpretation of low-template DNA samples

    It is common practice to analyze DNA STR profiles using an analytical threshold (AT), but as more low-template DNA (LT-DNA) samples are tested, it has become evident that these thresholds do not adequately separate signal from noise. To examine LT-DNA samples confidently, the behavior and characteristics of the background noise of STR profiles must be better understood. The background noise of single-source LT-DNA STR profiles was therefore examined to characterize the noise distribution and determine how it changes with DNA template mass and injection time. Current noise models typically assume that noise is independent of fragment size but, given the tendency of the baseline noise to increase with template amount, it is important to establish whether the baseline noise is randomly distributed throughout the capillary electrophoresis (CE) run or concentrated in specific regions of the electropherogram. Although the baseline noise of negative samples has been shown not to behave like the baseline noise of profiles generated with optimal levels of DNA, ATs determined from negative samples have been shown to be similar to those developed from near-zero, low-template-mass samples. The distinction between low-template samples, where the noise is consistent regardless of target mass, and standard samples could be made at approximately 0.063 ng for samples amplified with the Identifiler™ Plus amplification kit (29-cycle protocol) and injected for 5 and 10 seconds. At amplification target masses greater than 0.063 ng, the average noise peak height increased and began to plateau between 0.5 and 1.0 ng for samples injected for 5 and 10 seconds. To examine the time-dependent nature of the baseline noise, the baselines of over 400 profiles were combined onto one axis for each target mass and injection time. Areas of reproducibly higher noise peak heights were identified as areas of potential non-specific amplified product. 
When samples were injected for 5 seconds, the baseline noise did not appear to be time dependent. However, when samples were injected for 10 or 20 seconds, three areas exhibited increased noise: 118 bases in green, 231 bases in yellow, and 106 bases in red. If a probabilistic analysis or AT is to be employed for DNA interpretation, consideration must be given to how the validation or calibration samples are prepared. Ideally, the validation data should include all the variation seen within typical samples. To this end, a study was performed to examine possible sources of variation in the baseline noise within the electropherogram. Specifically, three samples were prepared at seven target masses while varying the kit lot (four lots), the capillary lot (four lots), the amplification batch (four batches), or the injection batch (four batches). In the blue and green channels, the noise peak height distributions were similar for samples with variable capillary lots, amplifications, and injections, but the distribution for samples with variable kit lots was shifted. This shift was due to the average peak heights of the individual kit lots varying by approximately two. The yellow and red channels showed general agreement between the distributions of samples run with variable kit lots, amplifications, and injections, but samples run with various capillary lots had a distribution shifted to the left. When the noise height distribution of each capillary was examined, the average peak height varied by less than two RFU between capillary lots. Use of a probabilistic method requires an accurate description of the distribution of the baseline noise. Three distributions were tested: Gaussian, log-normal, and Poisson. The Poisson distribution did not approximate the noise distributions well. 
The log-normal distribution was a better approximation than the Gaussian, yielding a smaller sum of squared residuals. The choice of distribution was also shown to affect the probability that a peak was noise, although how much this difference affects the final probability of an entire STR profile was not determined and may be of interest for future studies.
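    The effect of the distribution choice on "the probability that a peak was noise" can be illustrated with upper-tail probabilities. In this sketch, which interprets that probability as P(noise ≥ h) (an assumption; the exact computation is not given), the Gaussian and log-normal classes are fitted by sample moments rather than by the sum-of-squared-residuals criterion used in the study, and the noise heights are simulated, not measured.

```python
import math
import random
import statistics


def p_noise_gaussian(height, noise):
    """Upper-tail probability P(noise >= height) under a Gaussian fit to `noise`."""
    mu = statistics.fmean(noise)
    sd = statistics.stdev(noise)
    return 0.5 * math.erfc((height - mu) / (sd * math.sqrt(2)))


def p_noise_lognormal(height, noise):
    """Same tail probability under a log-normal fit (requires positive heights)."""
    logs = [math.log(h) for h in noise]
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    return 0.5 * math.erfc((math.log(height) - mu) / (sd * math.sqrt(2)))


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical right-skewed baseline noise heights in RFU (simulated)
    noise = [rng.lognormvariate(1.0, 0.5) for _ in range(2000)]
    for h in (5, 10, 20):
        print(h, p_noise_gaussian(h, noise), p_noise_lognormal(h, noise))
```

    For right-skewed noise, the Gaussian tail falls off much faster, so it can assign a near-zero noise probability to a tall noise peak that the log-normal fit still treats as plausible.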

    Characterizing double-back stutter in low to multi-copy number regimes in forensically relevant STR loci

    Modern DNA analysis is possible owing to the discovery of repeating microsatellite regions in DNA and the successful implementation of the polymerase chain reaction (PCR) in laboratories. PCR amplification chemistries that target short tandem repeat (STR) loci are highly sensitive, and as a result the discrimination power of human identification science has increased in recent years. Despite these advances, cellular admixtures are commonly collected, and the resulting "DNA mixture profile" is difficult to interpret because it is often encumbered by low signals and allele drop-out. Regularly detected PCR artifacts can further complicate interpretation. One commonly encountered artifact is stutter, the result of strand slippage during PCR. Stutter can be of two types: forward and reverse. Reverse stutter (or back stutter) is the most prevalent and is one repeat unit shorter (n - 1) than the template strand; forward stutter is one repeat unit longer (n + 1). If a reverse stutter amplicon is produced, it can itself stutter, yielding a stutter product of stutter. This artifact is usually referred to as double-back stutter (DBS) or n - 2 stutter. Recently there has been renewed interest in examining signal approaching baseline levels. As the sensitivity of the process improves, so does the probability of detecting DBS. Studies that examine the peak height distributions, rarity, stutter signal-to-noise distances, and general impact of DBS on the signal are therefore warranted. Models simulating PCR, and the entire forensic DNA process, have been created by this laboratory. The work presented here builds on a preexisting model: the dynamic model was extended so that DNA profiles consisting of 21 autosomal STRs, consistent with the GlobalFiler™ multiplex, are simulated. This extension also incorporated a three-type Galton-Watson branching process, allowing DBS to be added to the simulated electropherogram (EPG). 
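    A three-type Galton-Watson branching process for stutter can be sketched as follows. This is a simplified, hypothetical version, not the laboratory's model: the types are parent allele, n - 1 stutter, and n - 2 (double-back) stutter; each cycle every molecule is copied with probability `eff`, and a copy of a type-0 or type-1 molecule slips one repeat shorter with probability `slip`. Forward stutter, direct n - 2 slippage, and the conversion to RFU are omitted.

```python
import random


def simulate_pcr_stutter(n_template, cycles, eff, slip, seed=0):
    """Return [allele, n-1 stutter, n-2 stutter] molecule counts after PCR.

    Three-type branching process (sketch): each molecule of type t is copied with
    probability `eff`; the copy slips to type t + 1 with probability `slip` (t < 2).
    """
    rng = random.Random(seed)
    counts = [n_template, 0, 0]
    for _ in range(cycles):
        new = counts[:]  # template molecules persist into the next cycle
        for t in range(3):
            p_slip = slip if t < 2 else 0.0  # n-2 is the last type tracked
            for _ in range(counts[t]):
                r = rng.random()
                if r < eff * (1 - p_slip):
                    new[t] += 1      # faithful copy
                elif r < eff:
                    new[t + 1] += 1  # slippage: one repeat unit shorter
        counts = new
    return counts


if __name__ == "__main__":
    allele, stutter, dbs = simulate_pcr_stutter(20, 12, 0.9, 0.05, seed=1)
    print(allele, stutter, dbs, f"DBS ratio {dbs / allele:.4%}")
```

    Because each stutter molecule can itself stutter, the n - 2 product accumulates roughly with the square of the per-copy slip probability, which is why DBS sits much closer to baseline than n - 1 stutter.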
The in silico model was used to simulate the amplification of 1:43 and 1:73 mixtures at total DNA amounts of 0.3 and 0.5 ng, respectively. We chose these extreme mixture ratios because the signal from the minor contributors would be most susceptible to DBS effects from the major contributor. A total of 1200 alleles from each contributor were simulated at each target, and the effects of DBS on the signal from the minor contributor were characterized. At 0.3 and 0.5 ng, both the noise and stutter signal histograms are right-skewed, and a Kolmogorov-Smirnov (KS) test indicates that the noise and DBS distributions were significantly different (p < 4 × 10⁻⁶). The average peak height of DBS for all loci in both scenarios was less than 50 RFU (relative fluorescence units), and DBS ratios ranged from 0.29 to 2.15% of the main allele, with median ratios below 0.5%. A per-locus analytical threshold (AT) was calculated for both the 0.3 and 0.5 ng targets using two k-values, 3 and 4. The k-value is chosen based on a Type I risk assessment; increasing k increases the AT. With k = 3, the percentage of DBS peaks above the AT ranged from 0 to 7.08% at 0.3 ng and from 0 to 10.50% at 0.5 ng. With k = 4, these percentages fell to 0 to 1.08% and 0 to 0.17%, respectively. This suggests that modeling DBS in continuous systems may not be necessary if the laboratory continues to rely on a sufficiently high AT. However, with the advent of Bayesian or machine learning-based approaches to analyzing EPGs, which remove the AT entirely, a complete understanding of the prevalence of DBS is necessary. This work shows that, under our laboratory protocols, DBS from an extreme major contributor is unlikely to be in the same signal regime as allelic signal; however, DBS signal is significantly different from noise. 
Therefore, the software-expert pair should be carefully considered during validation, and laboratories should consider DBS during interpretation, especially if enhanced post-PCR parameters are implemented in the forensic laboratory process.
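    The threshold analysis and the KS comparison above can be sketched as follows. The AT formula used here, mean noise height plus k standard deviations, is a common convention that the abstract implies through its k-values but does not state, so treat it as an assumption; the two-sample KS statistic is computed directly from the empirical CDFs, without a p-value. The data in the demo are invented.

```python
import statistics


def analytical_threshold(noise_heights, k):
    """Assumed AT definition: mean noise height + k * sample standard deviation."""
    return statistics.fmean(noise_heights) + k * statistics.stdev(noise_heights)


def fraction_above(peaks, threshold):
    """Fraction of peaks (e.g., simulated DBS heights) strictly above the threshold."""
    return sum(h > threshold for h in peaks) / len(peaks)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:  # advance past ties in each sample
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


if __name__ == "__main__":
    noise = [3, 5, 4, 6, 2, 5, 7, 4, 3, 6]    # hypothetical noise heights, RFU
    dbs = [9, 14, 11, 22, 8, 17, 12, 30]      # hypothetical DBS heights, RFU
    for k in (3, 4):
        at = analytical_threshold(noise, k)
        print(k, round(at, 1), fraction_above(dbs, at))
    print("KS D =", ks_statistic(noise, dbs))
```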