12 research outputs found

    Inference of Biogeographical Ancestry Under Resource Constraints

    We study the problem of predicting human biogeographical ancestry from genomic data. Continental-level ancestry prediction is relatively simple with genomic information, but distinguishing between individuals from closely related sub-populations (e.g., from the same continent) remains difficult. In particular, we focus on the case where the analysis is constrained to single nucleotide polymorphisms (SNPs) from just one chromosome. We therefore propose methods to construct ancestry-informative SNP panels from the variants of a single chromosome, and we evaluate the performance of such panels for both continental-level and sub-continental-level ancestry prediction. Efficient selection of ancestry-informative SNPs is the key to successful ancestry prediction, and redundant and noisy SNP features must be removed before a learning algorithm is applied. We propose two distinct methods of SNP selection. The first is correlation-based SNP selection, which uses a correlation metric to evaluate the usefulness of SNP features. The second is random subspace projection-based SNP selection, which uses the learning algorithm itself to evaluate the worth of the SNP features. The correlation-based approach can construct a small panel of useful SNPs for both continental-level classification and binary classification of sub-populations. Unlike the correlation-based approach, random subspace projection-based selection can construct an efficient panel of SNP markers for the difficult task of multinomial classification with multiple closely related sub-populations. We include results that demonstrate the performance of both methods, including a comparison with other recently published related methods.
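    The idea of scoring SNPs with a correlation metric can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the metric, so absolute Pearson correlation with a binary population label is assumed here, genotypes are assumed to be coded as allele counts (0, 1, 2), and the function names and toy data are hypothetical. The sketch only ranks SNPs and does not address the redundancy removal the abstract mentions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no information
    return cov / sqrt(var_x * var_y)


def select_snps(genotypes, labels, k):
    """Return the column indices of the k SNPs most correlated with the label.

    genotypes: one row per individual, one allele count (0/1/2) per SNP.
    labels:    one 0/1 population label per individual (assumed binary task).
    """
    n_snps = len(genotypes[0])
    scored = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        scored.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by column index for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:k]]


# Toy data (made up): 6 individuals, 4 SNPs, two sub-populations.
genotypes = [
    [0, 2, 1, 0],
    [0, 1, 1, 2],
    [1, 2, 1, 1],
    [2, 0, 1, 1],
    [2, 1, 1, 0],
    [1, 0, 1, 2],
]
labels = [0, 0, 0, 1, 1, 1]
panel = select_snps(genotypes, labels, k=2)
print(panel)
```

    In this toy example the third SNP is constant and the fourth is uncorrelated with the label, so the two-SNP panel consists of the first two columns.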

    Ancestry-independent osteometric sex estimation from selected postcranial skeletal elements of South Africans: a machine learning approach

    Sex estimation, as part of a biological profile, has the power to halve the number of possible identities of unidentified skeletal remains. Postcranial elements have been studied in South Africa (SA) for the purpose of sex estimation and have often proven to be more accurate than the cranium. Estimation techniques using postcranial elements in SA almost exclusively use discriminant analysis to evaluate sex, but international publications have shown success with alternative machine learning (ML) algorithms. SA methods and standards are often restricted by limited sample size, a lack of robust statistical techniques in older publications, and the prerequisite of known or estimated ancestry. Most methods are specific to SA African, European or, more recently, Mixed ancestry groups and are unreliable when ancestry is unknown. The aim of this study was to apply a series of ML algorithms to train ancestry-independent sex classification models using postcranial osteometric measurements from the cadaveric skeletal remains of modern South Africans, focussing on long bone joints. The study used a roughly demographically representative, pooled sample of 650 South Africans (325 male, 325 female). Twelve osteometric measurements were taken from the available left- and/or right-sided bones of each individual. All 12 measurements were sexually dimorphic, and differences between left- and right-sided bones were negligible. The dataset was subjected to ML algorithm training using univariate and multivariate predictor combinations. Given the sample size and available predictors, the best-performing ML algorithm was discriminant function analysis. Univariate model accuracies ranged from 80.5% to 89.1%, and multivariate model accuracies ranged from 84.5% with 2 predictors to 92.8% with 12 predictors. An optimised 3-predictor model predicted sex with 92.7% accuracy. Results from this study were comparable to those from ancestry-specific and non-ancestry-specific models, where available. The findings suggest that including ancestry when predicting sex from the elements examined is not necessary, as it does not significantly improve prediction accuracy.
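    To make the univariate case concrete, here is a minimal sketch of a one-predictor discriminant rule. It assumes equal priors (consistent with a balanced 325/325 sample) and a shared within-group variance, under which the two-group linear discriminant boundary for a single measurement reduces to the midpoint of the group means, often called the sectioning point. The measurement values, function names, and variable names are hypothetical and are not taken from the study.

```python
def fit_sectioning_point(values, sexes):
    """Fit a univariate two-group discriminant rule.

    With equal priors and a pooled variance, the decision boundary is the
    midpoint between the male and female means.
    Returns (cut, male_is_larger).
    """
    male = [v for v, s in zip(values, sexes) if s == "M"]
    female = [v for v, s in zip(values, sexes) if s == "F"]
    mean_m = sum(male) / len(male)
    mean_f = sum(female) / len(female)
    return (mean_m + mean_f) / 2, mean_m > mean_f


def predict_sex(value, cut, male_is_larger):
    """Classify one measurement by which side of the cut it falls on."""
    above = value > cut
    return "M" if above == male_is_larger else "F"


# Made-up joint measurements in millimetres, for illustration only.
values = [48.0, 47.0, 50.0, 46.0, 42.0, 41.0, 44.0, 43.0]
sexes = ["M", "M", "M", "M", "F", "F", "F", "F"]

cut, male_is_larger = fit_sectioning_point(values, sexes)
predictions = [predict_sex(v, cut, male_is_larger) for v in values]
accuracy = sum(p == s for p, s in zip(predictions, sexes)) / len(sexes)
print(cut, accuracy)
```

    A multivariate model would replace the single cut with a linear combination of several measurements, and accuracy should be estimated on held-out data rather than on the training sample as done in this toy example.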

    Analysis of ethnographic, anthropological and archaeological data: an approach from the digital humanities and complex systems

    The arrival of Computer Science, Big Data, Data Analysis, Machine Learning and Data Mining has changed the way science is done in every scientific field, giving rise in turn to new disciplines such as Computational Mechanics, Bioinformatics, Health Engineering, Computational Social Science, Computational Economics, Computational Archaeology and Digital Humanities, among others. All of these new disciplines are still very young and continuously growing, so contributing to their advancement and consolidation has great scientific value. In this doctoral thesis we contribute to the development of a new line of research devoted to the use of formal models, analytical methods and computational approaches for the study of human societies, both present and past. The Ministerio de Ciencia e Innovación • SimulPast project – “Social and environmental transitions: simulating the past to understand human behaviour” (CSD2010-00034 CONSOLIDER-INGENIO 2010). • CULM project – “Modelling cultivation in prehistory” (HAR2016-77672-P). • SimPastNet Network of Excellence – “Simulating the past to understand human behaviour” (HAR2017-90883-REDC). • SocioComplex Network of Excellence – “Socio-Technological Complex Systems” (RED2018-102518-T). The Consejería de Educación of the Junta de Castilla y León • Grant for the research line “Understanding human behaviour, an approach from complex systems and the digital humanities” within the support programme for recognised research groups (GIR) of the public universities of Castilla y León (BDNS 425389