23 research outputs found

    Evaluation of a prior-incorporated statistical model and established classifiers for externally visible characteristics prediction

    Human identification through DNA has played an important role in forensic science and in the criminal justice system for decades. It refers to the association of genetic data with a particular human being and has facilitated police investigations in cases such as the identification of suspected perpetrators from biological traces found at crime scenes, missing persons, or victims of mass disasters [1]. Currently there are two main methods: genotyping of short tandem repeats (STR profiling) and forensic DNA phenotyping (FDP). Although both methods aim to identify a person through his or her genetic material, their approaches and consequences differ completely. STR profiling compares allele repeats at specific loci in DNA and aims at a match with DNA profiles already known to the police authorities, while FDP, which is the focus of the current study, aims to predict the appearance traits of an individual [2, 3]. In contrast to STR profiling, information arising from FDP cannot be used as sole evidence in court [4]. Externally visible characteristics (EVCs) predicted from DNA can serve as ‘biological witnesses’ that only provide leads for the investigative authorities and thereby narrow down a possibly large set of potential suspects. The use of FDP begins a new era of ‘DNA intelligence’ and holds great promise, especially in cases where individuals cannot be identified with the conventional method of STR profiling and in cases where there is no additional knowledge about the sample donor. So far in FDP, traits such as eye, hair and skin color can be predicted reliably with high accuracy, and predictive models have already been forensically validated [5-7]. 
Regarding other appearance traits, the current lack of knowledge on the genetic markers responsible for their phenotypic variation and the lower predictability, especially of intermediate categories, have prevented FDP from being routinely implemented in forensic science. The majority of the predictive models developed for appearance trait prediction are based on multinomial logistic regression (MLR), while only a few use other methods such as decision trees and neural networks. Machine learning (ML) approaches have become a widely used tool for classification problems in several fields and are known for their potential to boost model performance and their ability to handle different and complex types of data [8]. However, within the context of predicting EVCs, a systematic comparison of different ML approaches that could identify methods outperforming the standard MLR has not been conducted so far. In addition, the incorporation of priors in EVC prediction models, which may improve on existing approaches, has not yet been investigated in the context of forensics. These priors represent the trait category prevalence values among biogeographic ancestry groups, and their use would make it possible to leverage Bayesian statistics in order to build more powerful prediction models. In our case, such priors could reflect the additional information from all as yet unknown causal genetic factors and act as proxies for them in the prediction model. Therefore, I pursued these two approaches throughout my PhD project, with the main aim of improving the existing approaches of FDP. In the first study, I aimed to collect a comprehensive data set from previously published sources on the spatial distribution of different appearance traits. 
I conducted a literature review in order to assemble this information, which could later be incorporated as priors in the EVC prediction models. Due to the lack of available and reliable sources, the resulting data set contained only eye and hair color, mostly for European countries. More specifically, I collected data on eye color from 16 European and Central Asian countries, while for hair color I collected data from seven European countries. For countries outside of Europe, where the variation is low, it was not possible to assemble trustworthy and population-representative data. I then calculated the association between those two traits and found it to be moderate. Interpolation techniques were applied in order to infer trait prevalence values in at least the neighboring countries. The resulting prevalences and interpolated values were presented in spatial maps. The subject of the second study was to incorporate the trait prevalence values as priors in the prediction model. However, due to the lack of reliable data observed in the first study, it was not feasible to incorporate actual priors and thereby measure their real impact on EVC prediction. Therefore, I assessed the impact of priors across a grid that contained all possible values that priors can take, for a set of appearance traits including eye, hair, and skin color, hair structure, and freckles. In this way, I aimed to assess potential pitfalls caused by misspecification of priors. Results were compared with those of the corresponding, previously established prior-free prediction models. The effect of priors was demonstrated using standard performance measures, including area under the curve (AUC) and overall accuracy. I found that a proportion of all possible prior values shows potential to improve the prediction accuracy. 
However, possible misspecification of priors can significantly diminish the overall accuracy. Based on that, I emphasize the importance of accurate prior values in the prediction modelling in order to identify the actual impact. As a consequence of the above, the use of prior informed models in forensics is currently infeasible and more studies on the topic are necessary in order to extend the current knowledge on spatial trait prevalence. Finally, the focus of the third study was exploring and comparing the performances of methodologies beyond MLR. MLR is considered the standard method for predicting EVCs, since the majority of the predictive models developed are based on that method. Due to the fact that there is still potential for improvement of MLR models, especially for traits such as skin color or hair structure, I aimed at applying different ML methods in order to identify whether there is a potential classifier that outperforms the conventional method of MLR. Therefore I conducted a systematic comparison between MLR and three alternative ML classifiers, namely support vector machines (SVM), random forests (RF) and artificial neural networks (ANN). The traits that I focused on here were eye, hair, and skin color. All models were based on the genetic markers that were previously established in IrisPlex, HIrisPlex and HIrisPlex-S [5-7]. Overall, I observed that all four classifiers performed almost equally well, especially for eye color. Only non-substantial differences were obtained across the different traits and across trait categories. Given this outcome, none of the ML methods applied here performed better than MLR, at least for the three traits of eye, hair, and skin color. Ultimately, due to the easier interpretability of the MLR, it is suggested at least for now and for the currently known marker sets, that the use of MLR is the most appropriate method for predicting appearance traits from DNA. 
Throughout my PhD project, it became apparent that the available knowledge on spatial trait prevalence values was quite restricted, not only to certain appearance traits but also to certain continental groups. More specifically, most available and reliable data focused on European populations, and the traits covered were mostly eye and hair color. For other traits, such as skin color, hair structure, and freckles, the data were either extremely scarce or nonexistent. This was a significant obstacle throughout the project, since it prevented me from applying accurate trait prevalence values as priors in EVC prediction and testing their actual impact. However, the lack of data presented an opportunity to perform in-depth theoretical research, in particular testing the impact of priors across a grid that included all their possible values. I found that a proportion of priors showed potential to improve EVC prediction. However, caution is advised, because misspecification of priors can significantly deteriorate the models' performance. Furthermore, the application of different ML approaches did not show any significant improvement in prediction performance over the standard MLR. This could be due to the nature of the traits, since some of them are multifactorial and affected by various independent external factors, or due to possible limitations of the currently known predictive markers. With the knowledge available so far, this study recommends that, for the time being, priors not be incorporated into EVC prediction models, while among the classifiers applied, MLR is considered the most appropriate method for EVC prediction due to its easier interpretability. 
In addition, the presented study highlights the importance of reference data on externally visible traits and of identifying more genetic markers that contribute to certain traits. I hope that the present work will motivate the collection of such data, which may improve the current EVC prediction models.
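
The two performance measures named in the abstract above, overall accuracy and AUC, can be illustrated with a minimal sketch. It assumes each sample has one predicted probability per trait category and computes the AUC one category against the rest; the function names and the toy data are hypothetical and are not taken from the thesis.

```python
def overall_accuracy(y_true, probs):
    """Share of samples whose most probable category equals the true one."""
    hits = 0
    for y, p in zip(y_true, probs):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += predicted == y
    return hits / len(y_true)


def auc_one_vs_rest(y_true, probs, k):
    """AUC for category k against all others (pairwise rank formulation).

    Counts how often a sample of category k receives a higher probability
    for k than a sample of another category; ties count as one half.
    """
    pos = [p[k] for y, p in zip(y_true, probs) if y == k]
    neg = [p[k] for y, p in zip(y_true, probs) if y != k]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): categories 0, 1, 2, e.g. blue, intermediate, brown.
y_true = [0, 0, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]

accuracy = overall_accuracy(y_true, probs)
aucs = [auc_one_vs_rest(y_true, probs, k) for k in range(3)]
```

Reporting the AUC per category rather than a single pooled value keeps weak categories, such as the intermediate ones mentioned in the abstract, visible.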

    Development and evaluations of the ancestry informative markers of the VISAGE Enhanced Tool for Appearance and Ancestry

    The VISAGE Enhanced Tool for Appearance and Ancestry (ET) has been designed to combine markers for the prediction of bio-geographical ancestry plus a range of externally visible characteristics into a single massively parallel sequencing (MPS) assay. We describe the development of the ancestry panel markers used in ET, and the enhanced analyses they provide compared to previous MPS-based forensic ancestry assays. As well as established autosomal single nucleotide polymorphisms (SNPs) that differentiate sub-Saharan African, European, East Asian, South Asian, Native American, and Oceanian populations, ET includes autosomal SNPs able to efficiently differentiate populations from Middle East regions [...] The study was supported by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 740580 within the framework of the VISible Attributes through GEnomics (VISAGE) Project and Consortium. M.d.l.P. is supported by a post-doctorate grant funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481D-2021–008). J.R. is supported by the “Programa de axudas á etapa predoutoral” funded by the Consellería de Cultura, Educación e Ordenación Universitaria e da Consellería de Economía, Emprego e Industria from Xunta de Galicia, Spain (ED481A-2020/039). C.P., A.F.A., A.M.M., M.d.l.P., M.V.L. and the work to compile ancestry informative tri-allelic SNPs and microhaplotypes are supported by MAPA, ‘Multiple Allele Polymorphism Analysis’ (BIO2016–78525-R), a research project funded by the Spanish Research State Agency (AEI) and co-financed with ERDF funds. The population studies by S.O. at University of Santiago de Compostela, were financed by the Fundação de Apoio a Pesquisa do Distrito Federal (FAPDF), BrazilS

    Physical condition and pulmonary function of asthmatic children 10-15 years after diagnosis

    The natural history of childhood asthma has not been fully clarified. This is the first time that a study on the outcome of asthma has taken place in Greece. The aim of the study was to evaluate the outcome of asthma in young adult life and to identify possible risk factors that may be associated with the prognosis. The subjects of the study were 148 adults with a history of asthma who had been followed in the Respiratory Outpatient Clinics of the 3rd Paediatric Department, Hippokration General Hospital, Aristotle University of Thessaloniki, and who had their first visit during the period 1987-1992. Methods: The files of all patients were retrospectively reviewed. All the former patients were contacted by telephone and a focused questionnaire was completed on asthma symptoms and related atopic diseases during the previous year, as well as symptoms of asthma at the ages of 10 and 16 years. All the subjects were invited to the Respiratory Laboratory for a complete investigation of their physical condition with spirometry, exercise challenge test, methacholine challenge test and skin prick tests. Of the original population, 78 subjects (54 males and 24 females) with a mean age of 27 ± 2.7 years completed the investigation. The mean time that elapsed between the first visit in the Outpatient Department, at the age of 6 years or above, and the reinvestigation was 19.5 years (from 16 to 23 years). Results-Conclusions: Although two thirds (2/3) of our sample continued to have asthma symptoms later in life, the majority (85%) had rather mild symptoms. The mean age of remission in the asymptomatic subjects was 15 years. Relapse after a period of remission occurred in 1/3 of the patients. The quality of life was satisfactory and was negatively related to the severity of asthma. However, there was poor medical follow-up in the majority of subjects and a high percentage of smoking. 
Asthma severity in childhood was predictive of pulmonary function in adulthood. The persistence of symptoms to adolescence was an unfavorable predictive factor for the outcome of asthma. Bronchial hyperreactivity was demonstrated by 1/3 of the subjects. There was a significant association between bronchial hyperreactivity and current asthma severity as well as pulmonary function. Two thirds of the adults had atopic symptoms at re-examination, whereas the skin prick tests were positive in 82% of the subjects. The number of positive skin prick tests was higher in the adults with allergic rhinitis. There was no significant association between atopy in childhood or in adulthood and the persistence or the severity of current symptoms. Atopy in adulthood was an unfavorable factor for the presence of bronchial hyperreactivity.
The natural course of childhood asthma has not been fully investigated. A study of the prognosis of childhood asthma, and in particular one using objective methods of determining pulmonary function, is being carried out for the first time in Greece. The aim of the thesis was to assess the outcome of childhood asthma in young adult life in Greece and to investigate factors that may be related to the prognosis. The study material comprised 148 adults with a history of bronchial asthma who had been followed at the regular respiratory clinic of the 3rd Paediatric Department, Aristotle University of Thessaloniki, at the Hippokration General Hospital of Thessaloniki, with a first visit during the five-year period 1987-1992. Method: The patients' records were reviewed and their data recorded. All the adults were contacted by telephone and a focused questionnaire was completed concerning asthma symptoms and accompanying allergic diseases during the previous year, as well as asthma symptoms at the ages of 10 and 16 years. All individuals were invited to attend the Respiratory Laboratory for a complete investigation of their physical condition with spirometry, exercise test, methacholine test and skin prick tests. Of the original population, 78 individuals (54 men and 24 women) with a mean age of 27.4 ± 2.7 years completed the assessment. The time that elapsed from their first visit to the specialist clinics, at an age above 6 years, until the re-examination in adulthood averaged 19.5 years (range 16 to 23 years). Results-Conclusions: Although two thirds (2/3) of the adults continued to have asthma symptoms, the majority (85%) had rather mild symptoms. The mean age of symptom remission was 15 years. Relapse of symptoms after an initial remission was present in 1/3 of the adults at re-examination. The quality of life of the individuals was satisfactory and had a strong negative correlation with the severity of asthma. However, there was poor medical follow-up in the majority of individuals and high rates of smoking. The severity of childhood asthma was a prognostic indicator of pulmonary function in adult life. Persistence of asthma to adolescence was an aggravating factor for the outcome of asthma. Bronchial hyperreactivity was shown by 1/3 of the individuals at re-examination. A significant association was found between bronchial hyperreactivity and current asthma severity, as well as pulmonary function. Symptoms of atopy in adult life were present in 2/3 of the individuals examined, while the skin prick tests were positive in 82% of the individuals. The number of positive skin tests was higher in the adults with allergic rhinitis. No significant association was found between atopy in childhood or in adulthood and the severity or persistence of symptoms in adult life. Atopy in adult life was an aggravating factor for bronchial hyperreactivity.

    Immune response of myelin peptide analogues conjugated to mannan or emulsified in complete Freund's adjuvant (CFA) in animal models for multiple sclerosis

    Multiple sclerosis (MS) is a commonly occurring chronic, inflammatory and disabling disorder of the central nervous system (CNS). It is widely considered that CD4+ T helper type 1 (Th1) cells play a pivotal role in mediating an autoimmune attack against components of the myelin sheath. Additional cells, such as CD8+ T cells and macrophages, as well as complement, are also involved in axonal damage and neurodegeneration. Several autoantigens, such as myelin basic protein (MBP), proteolipid protein (PLP) and myelin oligodendrocyte glycoprotein (MOG), have been proposed as candidate antigens in the induction of MS, based on the auto-reactive T cells and auto-antibodies that are present in patients with MS. The design of peptide mutants of disease-associated myelin epitopes to alter immune responses from Th1 to Th2 offers a promising avenue for the treatment of MS. Hence, a number of peptides were designed and synthesised by mutating principal TCR contact residues based on the MBP87-99, MBP83-99 and PLP139-151 peptide epitopes in their linear or cyclic forms. Peptides were either emulsified in equal volumes of complete Freund’s adjuvant (CFA) and phosphate buffered saline (PBS) or conjugated to reduced mannan via a KLH linker, in order to examine their cytokine and antibody profiles. Mannan was previously found to generate either a Th1 response (IL-2, IFN-γ, IL-12, TNF-α and IgG2a antibodies) or a Th2 response (no IFN-γ or IL-12, but significant amounts of IL-4, IL-10, TGF-β and IgG1 antibodies) depending on the mode of conjugation, oxidised or reduced mannan, respectively. IFN-γ, IL-4 and IL-10 secretion or antagonism experiments were conducted using a capture ELISpot method, proliferation assays were performed to evaluate the regulation of the peptides in vitro, and ELISA was performed to assess antibody responses in vivo. 
We noted that immunisation with MBP87-99 or MBP83-99 mutant peptide analogues in CFA generally induced higher levels of IFN-γ and lower levels of IL-4. However, very high levels of IL-4 and IL-10 and no IFN-γ were secreted by T cells when mice were immunised with reduced mannan peptide conjugates. Antibody responses to native peptide, mutant peptides, linear and cyclic peptides and whole MBP protein were also assessed, in addition to T cell proliferation and EAE experiments. Overall, the linear [Y91]MBP83-99 peptide conjugated to reduced mannan showed the best cytokine and antibody profile and could antagonise T cell responses in vitro; it thus shows promise for the immunotherapy of MS and should be pursued for further testing in human studies. For the first time, structural alignment of existing crystal structures revealed the peptide binding motif of H2 I-As (MHC class II). Molecular modelling was used to identify novel H-bonds and van der Waals interactions between peptides and MHC (I-As). Finally, mannosylation of the PLP139-151 peptide could protect mice from EAE in a prophylactic vaccination setting.
Multiple sclerosis (MS) is a chronic, inflammatory and degenerative disease of the central nervous system. It is believed to be due to specific CD4+ T lymphocytes (Th1), which play an important role in the attack against the myelin sheath. The disease is also characterised by local CD8+ T lymphocytes, infiltrating macrophages, perivascular inflammation, foci of demyelination and loss of neurological function. Various autoantigens, such as myelin basic protein (MBP), myelin proteolipid protein (PLP) and myelin oligodendrocyte glycoprotein (MOG), have been proposed as candidate antigens for the induction of MS, since T cells specific for MBP, PLP or MOG and autoantibodies have been detected in the blood or cerebrospinal fluid of patients. 
Many studies have designed and used altered peptide ligands (APL) of the myelin sheath with the aim of shifting the immune response from Th1 to Th2. Accordingly, various peptide analogues of the epitopes MBP87-99, MBP83-99 and PLP139-151 (linear and cyclic) were designed and synthesised by changing principal TCR contact positions. In this study, the peptide analogues were mixed with the adjuvant CFA or conjugated to reduced mannan via a KLH carrier, with the aim of studying the cellular and humoral responses. Mannan leads to the secretion of different cytokines depending on whether it is in its oxidised or reduced form. Specifically, antigens conjugated to oxidised or reduced mannan promote a Th1 (IL-2, IFN-γ, IL-12, TNF-α, IgG2a antibodies) or Th2 (IL-4, IL-10, TGF-β, IgG1 antibodies) immune response, respectively. The secretion of IFN-γ, IL-4 and IL-10 or antagonism experiments were studied by ELISpot, T cell proliferation experiments examined the action of the peptides in vitro, and ELISA was used to measure antibodies against the peptides. The use of CFA for immunisation with the altered peptide analogues caused secretion of higher levels of IFN-γ and medium or low levels of IL-4. In contrast, altered peptides conjugated to reduced mannan caused production of high levels of IL-4 and IL-10 and no IFN-γ, shifting the immune response from Th1 to Th2. Humoral responses against the agonists, the altered peptide analogues (linear and cyclic) and whole MBP protein were also studied, as were T cell proliferation in response to the peptide analogues and their efficacy in EAE experiments. The linear [Y91]MBP83-99 peptide analogue conjugated to reduced mannan showed the best cellular and humoral immune response and was able to antagonise T cell responses in vitro; it appears to be the best candidate analogue for the immunotherapy of MS and will be studied further. 
The molecular modelling of the altered peptide analogues helped in better understanding the hydrogen bonds and van der Waals interactions between the peptides and MHC class II (I-As). Finally, immunisation with the PLP139-151 peptide analogue conjugated to oxidised or reduced mannan protected the animals from EAE in a prophylactic immunisation experiment.

    True colors: A literature review on the spatial distribution of eye and hair pigmentation

    DNA-based prediction of externally visible characteristics has become an established approach in forensic genetics, with the aim of tracing individuals who are potentially unknown to the investigating authorities but without using this prediction as evidence in court. While a number of prediction models have been proposed, use of prior probabilities in those models has largely been absent. Here, we aim at compiling information on the spatial distribution of eye and hair coloration in order to use this as prior knowledge to improve prediction accuracy. To this end, we conducted a detailed literature review and created maps showing the eye and hair pigmentation prevalence both by countries with available information and by interpolation in order to obtain prior estimates for populations without available data. Furthermore, we assessed the association between these two traits in a very large data set. A strong limitation was the quite low amount of available data, especially outside Europe. We hope that our results will facilitate the improvement of already existing and of novel prediction methods for pigmentation traits and induce further studies on the spatial distribution of these traits

    Evaluation of supervised machine-learning methods for predicting appearance traits from DNA

    The prediction of human externally visible characteristics (EVCs) based solely on DNA information has become an established approach in forensic and anthropological genetics in recent years. While for a large set of EVCs, predictive models have already been established using multinomial logistic regression (MLR), the prediction performances of other possible classification methods have not been thoroughly investigated thus far. Motivated by the question to identify a potential classifier that outperforms these specific trait models, we conducted a systematic comparison between the widely used MLR and three popular machine learning (ML) classifiers, namely support vector machines (SVM), random forest (RF) and artificial neural networks (ANN), that have shown good performance outside EVC prediction. As examples, we used eye, hair and skin color categories as phenotypes and genotypes based on the previously established IrisPlex, HIrisPlex, and HIrisPlex-S DNA markers. We compared and assessed the performances of each of the four methods, complemented by detailed hyperparameter tuning that was applied to some of the methods in order to maximize their performance. Overall, we observed that all four classification methods showed rather similar performance, with no method being substantially superior to the others for any of the traits, although performances varied slightly across the different traits and more so across the trait categories. Hence, based on our findings, none of the ML methods applied here provide any advantage on appearance prediction, at least when it comes to the categorical pigmentation traits and the selected DNA markers used here

    Testing the impact of trait prevalence priors in Bayesian-based genetic prediction modeling of human appearance traits

    The prediction of appearance traits by use of solely genetic information has become an established approach and a number of statistical prediction models have already been developed for this purpose. However, given limited knowledge on appearance genetics, currently available models are incomplete and do not include all causal genetic variants as predictors. Therefore such prediction models may benefit from the inclusion of additional information that acts as a proxy for this unknown genetic background. Use of priors, possibly informed by trait category prevalence values in biogeographic ancestry groups, in a Bayesian framework may thus improve the prediction accuracy of previously predicted externally visible characteristics, but has not been investigated as of yet. In this study, we assessed the impact of using trait prevalence-informed priors on the prediction performance in Bayesian models for eye, hair and skin color as well as hair structure and freckles in comparison to the respective prior-free models. Those prior-free models were either defined identically to the already established ones or were very close to them, using a reduced predictive marker set. However, these differences in the number of predictive markers should not significantly affect our main outcomes. We observed that such priors often had a strong effect on the prediction performance, but to varying degrees between different traits and also different trait categories, with some categories barely showing an effect. While we found potential for improving the prediction accuracy of many of the appearance trait categories tested by using priors, our analyses also showed that misspecification of those prior values often severely diminished the accuracy compared to the respective prior-free approach. This emphasizes the importance of accurate specification of prevalence-informed priors in Bayesian prediction modeling of appearance traits. 
However, the existing literature knowledge on spatial prevalence is sparse for most appearance traits, including those investigated here. Due to the limitations in appearance trait prevalence knowledge, our results render the use of trait prevalence-informed priors in DNA-based appearance trait prediction currently infeasible
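
To make the idea of testing priors over a grid concrete, here is a minimal sketch under stated assumptions. It re-weights the category probabilities of a prior-free model by the ratio of a new prior to the category prevalence in the training data (the standard Bayes adjustment for a changed prior) and enumerates candidate priors on a regular grid. This is not necessarily the procedure used in the study; the function names, the grid resolution and the example numbers are all hypothetical.

```python
from itertools import product


def apply_prior(probs, train_prev, new_prior):
    """Re-weight category probabilities for a different prior (Bayes' rule).

    probs      -- prior-free model output for one sample
    train_prev -- category prevalence implicit in the training data
    new_prior  -- assumed prevalence in the population of interest
    """
    weights = [p * q / t for p, t, q in zip(probs, train_prev, new_prior)]
    total = sum(weights)
    return [w / total for w in weights]


def prior_grid(n_classes, steps):
    """All strictly positive prior vectors with entries in multiples of 1/steps."""
    grid = []
    for combo in product(range(1, steps), repeat=n_classes - 1):
        rest = steps - sum(combo)
        if rest >= 1:
            grid.append([c / steps for c in combo] + [rest / steps])
    return grid


# Invented example: three eye color categories (blue, intermediate, brown).
probs = [0.5, 0.2, 0.3]          # hypothetical prior-free prediction
train_prev = [1 / 3, 1 / 3, 1 / 3]  # assumed balanced training data

grid = prior_grid(3, 4)
adjusted = [apply_prior(probs, train_prev, prior) for prior in grid]
```

With the prior `[0.25, 0.25, 0.5]` the most probable category in this toy example shifts from the first to the third, which illustrates how a misspecified prior could overturn an otherwise correct prediction.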