7 research outputs found

    Epigenetic age estimation in saliva and in buccal cells

    Age estimation based on epigenetic markers is a DNA intelligence tool with the potential to provide relevant information for criminal investigations, as well as to improve the inference of age-dependent physical characteristics such as male pattern baldness or hair color. Age prediction models have been developed based on different tissues, including saliva and buccal cells, which show different methylation patterns as they are composed of different cell populations. On many occasions in a criminal investigation, the origin of a sample or the proportion of tissues is not known with certainty, for example the provenance of cigarette butts, so use of combined models can provide lower prediction errors. In the present study, two tissue-specific and seven age-correlated CpG sites were selected from publicly available data from the Illumina HumanMethylation 450 BeadChip and bibliographic searches, to help build a tissue-dependent and an age-prediction model, respectively. For the development of both models, a total of 184 samples (N = 91 saliva and N = 93 buccal cells) ranging from 21 to 86 years old were used. Validation of the models was performed using both k-fold cross-validation and an additional set of 184 samples (N = 93 saliva and N = 91 buccal cells, 21–86 years old). The tissue prediction model was developed using two CpG sites (HUNK and RUNX1) based on logistic regression that produced a correct classification rate for saliva and buccal swab samples of 88.59 % for the training set, and 83.69 % for the testing set. Despite these high success rates, a combined age prediction model was developed covering both saliva and buccal cells, using seven CpG sites (cg10501210, LHFPL4, ELOVL2, PDE4C, HOXC4, OTUD7A and EDARADD) based on multivariate quantile regression, giving a median absolute error (MAE) of ± 3.54 years and a correct classification rate (%CP±PI) of 76.08 % for the training set, and an MAE of ± 3.66 years and a %CP±PI of 71.19 % for the testing set. 
The addition of tissue of origin as a covariate to the model was assessed, but no improvement was detected in age predictions. Finally, considering the limitations usually faced by forensic DNA analyses, the robustness of the model and the minimum recommended amount of input DNA for bisulfite conversion were evaluated, considering up to 10 ng of genomic DNA for reproducible results. The final multivariate quantile regression age predictor based on the models we developed has been placed in the open-access Snipper forensic classification website. This project was funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (Modalidade B, ED481B 2018/010) through a postdoctorate grant awarded to AFA. MVL is supported by the Ministerio de Educación, Cultura y Ciencia, Spain (PID2019-107876RB-I00). M.d.l.P. is supported by a post-doctorate grant funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481D-2021-008). J.R. is supported by the “Programa de axudas á etapa predoutoral” funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481A-2020/039).
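The two accuracy measures quoted in the abstract above can be illustrated with a minimal sketch. The function names and all ages and intervals below are hypothetical and are not data from the study. Reading %CP±PI as the percentage of samples whose true age falls inside the prediction interval is also an assumption made here for illustration.

```python
from statistics import median


def median_absolute_error(true_ages, predicted_ages):
    """Median of the absolute differences between true and predicted ages."""
    return median(abs(t - p) for t, p in zip(true_ages, predicted_ages))


def percent_within_interval(true_ages, lower_bounds, upper_bounds):
    """Percentage of samples whose true age lies inside [lower, upper].

    Assumption: this is one plausible reading of the %CP±PI statistic.
    """
    hits = sum(
        lo <= t <= hi
        for t, lo, hi in zip(true_ages, lower_bounds, upper_bounds)
    )
    return 100.0 * hits / len(true_ages)


# Hypothetical example values (years), invented for illustration only.
true_ages = [25, 40, 63, 71]
predicted_ages = [27, 37, 64, 80]
lower_bounds = [22, 32, 59, 72]
upper_bounds = [32, 42, 69, 88]

mae = median_absolute_error(true_ages, predicted_ages)
cp = percent_within_interval(true_ages, lower_bounds, upper_bounds)
print(f"MAE = {mae} years, within interval = {cp} %")
```

With quantile regression the interval bounds would come from the fitted lower and upper quantiles, so the interval width can vary with the estimated age.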

    Development and evaluations of the ancestry informative markers of the VISAGE Enhanced Tool for Appearance and Ancestry

    The VISAGE Enhanced Tool for Appearance and Ancestry (ET) has been designed to combine markers for the prediction of bio-geographical ancestry plus a range of externally visible characteristics into a single massively parallel sequencing (MPS) assay. We describe the development of the ancestry panel markers used in ET, and the enhanced analyses they provide compared to previous MPS-based forensic ancestry assays. As well as established autosomal single nucleotide polymorphisms (SNPs) that differentiate sub-Saharan African, European, East Asian, South Asian, Native American, and Oceanian populations, ET includes autosomal SNPs able to efficiently differentiate populations from Middle East regions [...] The study was supported by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 740580 within the framework of the VISible Attributes through GEnomics (VISAGE) Project and Consortium. M.d.l.P. is supported by a post-doctorate grant funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481D-2021–008). J.R. is supported by the “Programa de axudas á etapa predoutoral” funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481A-2020/039). C.P., A.F.A., A.M.M., M.d.l.P., M.V.L. and the work to compile ancestry informative tri-allelic SNPs and microhaplotypes are supported by MAPA, ‘Multiple Allele Polymorphism Analysis’ (BIO2016–78525-R), a research project funded by the Spanish Research State Agency (AEI) and co-financed with ERDF funds. The population studies by S.O. at University of Santiago de Compostela were financed by the Fundação de Apoio a Pesquisa do Distrito Federal (FAPDF), Brazil.

    Eurasiaplex-2: Shifting the focus to SNPs with high population specificity increases the power of forensic ancestry marker sets

    To compile a new South Asian-informative panel of forensic ancestry SNPs, we changed the strategy for selecting the most powerful markers for this purpose by targeting polymorphisms with near absolute specificity – when the South Asian-informative allele identified is absent from all other populations or present at frequencies below 0.001 (one in a thousand). More than 120 candidate SNPs were identified from 1000 Genomes datasets satisfying an allele frequency screen of ≥ 0.1 (10 % or more) allele frequency in South Asians, and ≤ 0.001 (0.1 % or less) in African, East Asian, and European populations. From the candidate pool of markers, a final panel of 36 SNPs, widely distributed across most autosomes, was selected that had allele frequencies in the five 1000 Genomes South Asian populations ranging from 0.4 to 0.15. Slightly lower average allele frequencies, but consistent patterns of informativeness, were observed in gnomAD South Asian datasets used to validate the 1000 Genomes variant annotations. We named the panel of 36 South Asian-specific SNPs Eurasiaplex-2, and the informativeness of the panel was evaluated by compiling worldwide population data from 4097 samples in four genome variation databases that largely complement the global sampling of 1000 Genomes. Consistent patterns of allele frequency distribution, which were specific to South Asia, were observed in all populations in, or close to, the Indian sub-continent. Pakistani populations from the HGDP-CEPH panel had markedly lower allele frequencies, highlighting the need to develop a statistical system to evaluate the ancestry inference value of counting the number of population-specific alleles present in an individual.
M.d.l.P. is supported by a post-doctorate grant funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481D-2021-008). J.R. is supported by the “Programa de axudas á etapa predoutoral” funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481A-2020-039).
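The allele frequency screen described in the abstract above can be sketched in a few lines. Only the two thresholds (≥ 0.1 in South Asians, ≤ 0.001 in African, East Asian, and European populations) come from the text. The function name, the population labels, the dictionary layout, and the example SNP names and frequencies are hypothetical and are used purely for illustration.

```python
# Thresholds quoted in the abstract.
MIN_TARGET_FREQ = 0.1    # allele frequency required in South Asians
MAX_OTHER_FREQ = 0.001   # maximum allowed in each comparison population


def passes_screen(freqs, target="SAS", others=("AFR", "EAS", "EUR")):
    """Return True if the allele is common in the target group and
    absent or near-absent in every comparison group.

    `freqs` maps a (hypothetical) population label to an allele frequency.
    """
    common_in_target = freqs[target] >= MIN_TARGET_FREQ
    rare_elsewhere = all(freqs[pop] <= MAX_OTHER_FREQ for pop in others)
    return common_in_target and rare_elsewhere


# Invented candidate markers; names and frequencies are not real data.
candidates = {
    "snp_A": {"SAS": 0.22, "AFR": 0.0, "EAS": 0.0005, "EUR": 0.0},
    "snp_B": {"SAS": 0.30, "AFR": 0.0, "EAS": 0.0, "EUR": 0.02},  # too common in EUR
    "snp_C": {"SAS": 0.05, "AFR": 0.0, "EAS": 0.0, "EUR": 0.0},   # too rare in SAS
}

selected = [name for name, freqs in candidates.items() if passes_screen(freqs)]
print(selected)
```

In this toy example only `snp_A` satisfies both conditions; the other two markers each fail one side of the screen.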

    A common epigenetic clock from childhood to old age

    Forensic age estimation is a DNA intelligence tool that forms an important part of Forensic DNA Phenotyping. Criminal cases with no suspects or with unsuccessful matches in searches on DNA databases; human identification analyses in mass disasters; anthropological studies or legal disputes; all benefit from age estimation to gain investigative leads. Several age prediction models have been developed to date based on DNA methylation. Although different DNA methylation technologies as well as diverse statistical methods have been proposed, most of them are based on blood samples and mainly restricted to adult age ranges. In the current study, we present an extended age prediction model based on 895 evenly distributed Spanish DNA blood samples from 2 to 104 years old. DNA methylation levels were detected using Agena Bioscience EpiTYPER® technology for a total of seven CpG sites located at seven genomic regions: ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 (GRCh38). The accuracy of the age prediction system was tested by comparing three statistical methods: quantile regression (QR), quantile regression neural network (QRNN) and quantile regression support vector machine (QRSVM). The most accurate predictions were obtained when using QRNN or QRSVM (mean absolute prediction error, MAE of ± 3.36 and ± 3.41, respectively). Validation of the models with an independent Spanish testing set (N = 152) provided similar accuracies for both methods (MAE: ± 3.32 and ± 3.45, respectively). The main advantage of using quantile regression statistical tools lies in obtaining age-dependent prediction intervals, fitting the error to the estimated age. An additional analysis of dimensionality reduction shows a direct correlation of increased error and a reduction of correct classifications as the training sample size is reduced. 
Results indicated that a minimum sample size of six samples per year-of-age covered by the training set is recommended to efficiently capture the most inter-individual variability. AFA was supported by a post-doctorate grant funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (Modalidade B, ED481B 2018/010). The National DNA Bank Carlos III is supported by ISCIII, Ministry of Science and Innovation, Spain (PT13/0001/0037, PT13/0010/0067). The Murcia Twin Registry is supported by the Seneca Foundation, Regional Agency for Science and Technology, Murcia, Spain (15302/PHCS/10) and Ministry of Science and Innovation, Spain (PSI11560–2009). We particularly wish to gratefully acknowledge the sample volunteers and the BioBank IBSP-CV (PT13/0010/0064) integrated in the Spanish National Biobanks Network and Valencian Biobanking Network for their collaboration.

    Development and Evaluation of the Ancestry Informative Marker Panel of the VISAGE Basic Tool

    We detail the development of the ancestry informative single nucleotide polymorphisms (SNPs) panel forming part of the VISAGE Basic Tool (BT), which combines 41 appearance predictive SNPs and 112 ancestry predictive SNPs (three SNPs shared between sets) in one massively parallel sequencing (MPS) multiplex, whereas blood-based age analysis using methylation markers is run in a parallel MPS analysis pipeline. The selection of SNPs for the BT ancestry panel focused on established forensic markers that already have a proven track record of good sequencing performance in MPS, and the overall SNP multiplex scale closely matched that of existing forensic MPS assays. SNPs were chosen to differentiate individuals from the five main continental population groups of Africa, Europe, East Asia, America, and Oceania, extended to include differentiation of individuals from South Asia. From analysis of 1000 Genomes and HGDP-CEPH samples from these six population groups, the BT ancestry panel was shown to have no classification error using the Bayes likelihood calculators of the Snipper online analysis portal. The differentiation power of the component ancestry SNPs of BT was balanced as far as possible to avoid bias in the estimation of co-ancestry proportions in individuals with admixed backgrounds. The balancing process led to very similar cumulative population-specific divergence values for Africa, Europe, America, and Oceania, with East Asia being slightly below average, and South Asia an outlier from the other groups. Comparisons were made of the African, European, and Native American estimated co-ancestry proportions in the six admixed 1000 Genomes populations, using the BT ancestry panel SNPs and 572,000 Affymetrix Human Origins array SNPs. Very similar co-ancestry proportions were observed down to a minimum value of 10%, below which, low-level co-ancestry was not always reliably detected by BT SNPs. 
The Snipper analysis portal provides a comprehensive population dataset for the BT ancestry panel SNPs, comprising a 520-sample standardised reference dataset; 3445 additional samples from 1000 Genomes, HGDP-CEPH, Simons Foundation and Estonian Biocentre genome diversity projects; and 167 samples of six populations from in-house genotyping of individuals from Middle East, North and East African regions, complementing the sampling regimes of the other diversity projects.

    A collaborative exercise on DNA methylation-based age prediction and body fluid typing

    DNA methylation has become one of the most useful biomarkers for age prediction and body fluid identification in the forensic field. Therefore, several assays have been developed to detect age-associated and body fluid-specific DNA methylation changes. Among the many methods developed, SNaPshot-based assays should be particularly useful in forensic laboratories, as they permit multiplex analysis and use the same capillary electrophoresis instrumentation as STR analysis. However, technical validation of any developed assays is crucial for their proper integration into routine forensic workflow. In the present collaborative exercise, two SNaPshot multiplex assays for age prediction and a SNaPshot multiplex for body fluid identification were tested in twelve laboratories. The experimental set-up of the exercise was designed to reflect the entire workflow of SNaPshot-based methylation analysis and involved four increasingly complex tasks designed to detect potential factors influencing methylation measurements. The results of body fluid identification from each laboratory provided sufficient information to determine appropriate age prediction methods in subsequent analysis. In age prediction, systematic measurement differences resulting from the type of genetic analyzer used were identified as the biggest cause of DNA methylation variation between laboratories. Also, the use of a buffer that ensures a high ratio of specific to non-specific primer binding resulted in changes in DNA methylation measurement, especially when using degenerate primers in the PCR reaction. In addition, high input volumes of bisulfite-converted DNA often caused PCR failure, presumably due to carry-over of PCR inhibitors from the bisulfite conversion reaction. The proficiency of the analysts and experimental conditions for efficient SNaPshot reactions were also important for consistent DNA methylation measurement. 
Several bisulfite conversion kits were used for this study, but differences resulting from the use of any specific kit were not clearly discerned. Even when different experimental settings were used in each laboratory, a positive outcome of the study was a mean absolute age prediction error amongst participants' data of only 2.7 years for semen, 5.0 years for blood and 3.8 years for saliva.