44 research outputs found

    XSim: Simulation of Descendants from Ancestors with Sequence Data.

    Real or imputed high-density SNP genotypes are routinely used for genomic prediction and genome-wide association studies. Many researchers are moving toward the use of actual or imputed next-generation sequence data in whole-genome analyses. Simulation studies are useful to mimic complex scenarios and test different analytical methods. We have developed the software tool XSim to efficiently simulate sequence data in descendants in arbitrary pedigrees. In this software, a strategy of dropping down the origins and positions of chromosomal segments, rather than every allele state, is implemented to simulate sequence data and to accommodate complicated pedigree structures across multiple generations. Both C++ and Julia versions of XSim have been developed.
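The segment-dropping strategy described in the abstract can be sketched in a few lines: each gamete is stored as a list of (start position, founder origin) segments, and meiosis recombines these breakpoints rather than copying allele states. The sketch below is illustrative only; the function names and the unit-rate, no-interference crossover model are my own assumptions, not XSim's API.

```python
import random

def crossover_positions(chrom_len, rng):
    """Crossover points along a chromosome of `chrom_len` Morgans,
    assuming a Poisson process with rate 1 per Morgan (no interference)."""
    pos, x = [], rng.expovariate(1.0)
    while x < chrom_len:
        pos.append(x)
        x += rng.expovariate(1.0)
    return pos

def origin_at(gamete, pos):
    """Founder id carried by `gamete` (sorted (start, founder) segments) at `pos`."""
    fid = gamete[0][1]
    for start, f in gamete:
        if start <= pos:
            fid = f
    return fid

def drop_gamete(g1, g2, chrom_len=1.0, rng=None):
    """Transmit one recombined gamete from parental gametes g1/g2,
    tracking only segment origins, never allele states."""
    rng = rng or random.Random()
    xos = crossover_positions(chrom_len, rng)
    phase = rng.randrange(2)                       # which gamete is active at position 0
    breaks = sorted({0.0, *xos, *(s for s, _ in g1), *(s for s, _ in g2)})
    out = []
    for b in breaks:
        active = (g1, g2)[(phase + sum(x <= b for x in xos)) % 2]
        f = origin_at(active, b)
        if not out or out[-1][1] != f:             # merge identical neighbours
            out.append((b, f))
    return out
```

Allele states are only needed at the end: once segment origins are known, genotypes follow by looking positions up in the founder haplotypes, which is what makes this approach cheap across many generations.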

    The Effect of Exposure Duration on Perceived Similarity in Simultaneous Lineups

    Quantifying reliable and accurate eyewitness identification procedures that avoid wrongful convictions and give justice systems confidence in the accuracy of suspect guilt continues to be an area of intense research. Defining the parameters of a 'fair lineup' is central to this endeavour. Measures of similarity between lineup members have been key variables used to describe what is and is not a 'fair lineup'. To date, little research has examined how the perception of similarity may vary across groups and conditions, particularly as a result of memory encoding strength. This study aimed to understand how exposure time, a key variable for altering the encoding strength of a face, may alter perceived similarity in the context of simultaneous lineups. Results showed that the observed data fit the Unequal Variance Signal Detection (UVSD) model well; however, the predicted increases in discriminability with longer exposure duration and higher lineup similarity were not measured. Similarly, no significant changes in perceived similarity were found between any of the conditions. Given observed differences in hit (CID) and false alarm (FA) rates between low- and high-similarity lineups, this result suggests that judgements of perceived similarity between faces in a lineup are unrelated to participants' face-familiarity judgements. This supports the independent-observations assumption within the maximum likelihood method and indicates that overall a priori categorical classifications of lineups as wholly low or high in similarity matter less to discriminability than participants' judgements about each face's familiarity relative to the memory of the target. This finding has implications for future research into 'fair lineup' design and measurement. Thesis (B.PsychSc(Hons)) -- University of Adelaide, School of Psychology, 201
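As a small illustration of the UVSD model the abstract refers to, hit and false-alarm rates follow from two Gaussian familiarity distributions with unequal variances and a single response criterion. This is a textbook sketch under my own parameter names, not the thesis's actual fitting code.

```python
from statistics import NormalDist

def uvsd_rates(mu_t, sigma_t, criterion):
    """Unequal Variance Signal Detection: lure familiarity ~ N(0, 1),
    target familiarity ~ N(mu_t, sigma_t); respond "old" above criterion."""
    hit = 1 - NormalDist(mu_t, sigma_t).cdf(criterion)
    fa = 1 - NormalDist(0, 1).cdf(criterion)
    return hit, fa

# Longer exposure is typically modelled as a larger target mean mu_t
# (stronger encoding); the false-alarm rate is unchanged.
weak = uvsd_rates(0.8, 1.25, 0.5)
strong = uvsd_rates(1.6, 1.25, 0.5)
```

Under this model the prediction the study tested is visible directly: raising mu_t raises the hit rate while leaving the false-alarm rate fixed, so discriminability should increase with exposure duration.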

    Setting the Standard: A Special Focus on Genomic Selection in GENETICS and G3


    Abductive Equivalential Translation and its application to Natural Language Database Interfacing

    The thesis describes a logical formalization of natural-language database interfacing. We assume the existence of a "natural language engine" capable of mediating between surface linguistic strings and their representations as "literal" logical forms: the focus of interest is the question of relating "literal" logical forms to representations in terms of primitives meaningful to the underlying database engine. We begin by describing the nature of the problem, and show how a variety of interface functionalities can be considered as instances of a type of formal inference task which we call "Abductive Equivalential Translation" (AET); functionalities which can be reduced to this form include answering questions, responding to commands, reasoning about the completeness of answers, answering meta-questions of the type "Do you know...", and generating assertions and questions. In each case, a "linguistic domain theory" (LDT) Γ and an input formula F are given, and the goal is to construct a formula with certain properties which is equivalent to F, given Γ and a set of permitted assumptions. If the LDT is of a certain specified type, whose formulas are either conditional equivalences or Horn clauses, we show that the AET problem can be reduced to a goal-directed inference method. We present an abstract description of this method, and sketch its realization in Prolog. The relationship between AET and several problems previously discussed in the literature is discussed. In particular, we show how AET can provide a simple and elegant solution to the so-called "Doctor on Board" problem, and in effect allows a "relativization" of the Closed World Assumption. The ideas in the thesis have all been implemented concretely within the SRI CLARE project, using a real projects-and-payments database. The LDT for the example database is described in detail, and examples of the types of functionality that can be achieved within the example domain are presented. Comment: 162 pages, LaTeX source, PhD thesis (U Stockholm, 1993). Uses style-file ustockholm_thesis.st
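The core of the AET task can be caricatured in a few lines: atoms of a "literal" logical form are rewritten into database-level predicates via conditional equivalences, recording any permitted assumptions actually used. The data structures and names below are my own toy invention, far simpler than the thesis's goal-directed Prolog method.

```python
def aet_translate(goal, equivalences, assumables):
    """Rewrite `goal` (a list of (predicate, *args) atoms) using
    conditional equivalences (condition, lhs_pred, rhs_pred): lhs_pred
    may be replaced by rhs_pred when its condition is None or is a
    permitted assumption.  Returns (rewritten atoms, assumptions used)."""
    out, used = [], []
    for atom in goal:
        for cond, lhs, rhs in equivalences:
            if atom[0] == lhs and (cond is None or cond in assumables):
                out.append((rhs,) + atom[1:])   # translate the predicate
                if cond:
                    used.append(cond)           # record the assumption made
                break
        else:
            out.append(atom)                    # no applicable equivalence
    return out, used
```

The returned assumption list is what makes the translation "abductive": the equivalence only holds modulo assumptions the interface is permitted to make, echoing the "Doctor on Board" style of relativized closed-world reasoning.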

    Simulated Data for Genomic Selection and Genome-Wide Association Studies Using a Combination of Coalescent and Gene Drop Methods

    An approach is described for simulating sequence, genotype, and phenotype data to study genomic selection and genome-wide association studies (GWAS). The simulation method, implemented in a software package called AlphaDrop, can be used to simulate genomic data and phenotypes with flexibility in terms of the historical population structure, recent pedigree structure, and distribution of quantitative trait loci effects, and with phased alleles and genotypes at both the sequence and single nucleotide polymorphism (SNP) level. Ten replicates of a representative scenario used to study genomic selection in livestock were generated and have been made publicly available. The simulated data sets were structured to encompass a spectrum of additive quantitative trait loci effect distributions, relationship structures, and SNP chip densities.
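The genotype-to-phenotype step of such a simulation can be sketched as: sample QTL positions, draw additive effects, and scale environmental noise to a target heritability. This is illustrative only; the names and the Gaussian effect distribution are assumptions, not AlphaDrop's interface.

```python
import random
import statistics

def simulate_phenotypes(genotypes, n_qtl, h2, rng):
    """genotypes: one list of 0/1/2 allele counts per individual.
    Returns (true breeding values, phenotypes) with heritability ~ h2
    (h2 must be > 0)."""
    m = len(genotypes[0])
    qtl = rng.sample(range(m), n_qtl)                   # QTL positions
    effect = {j: rng.gauss(0.0, 1.0) for j in qtl}      # additive effects
    tbv = [sum(g[j] * effect[j] for j in qtl) for g in genotypes]
    var_g = statistics.pvariance(tbv)
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5             # noise scaled to h2
    phe = [t + rng.gauss(0.0, sd_e) for t in tbv]
    return tbv, phe
```

Swapping the `rng.gauss` draw for a heavier-tailed distribution is how a simulation like this can span the "spectrum of additive quantitative trait loci effect distributions" the abstract mentions.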

    Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods.

    Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold-standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model and can effectively choose among disparate models based on their expected performance on real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single- and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method, which we call CV2*: validating model predictions against focal-trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
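The distinction the abstract draws can be made concrete by looking at which records enter training in each fold. In the potentially biased setting (commonly labelled CV2 in the genomic-prediction literature), validation individuals keep their secondary-trait records in training; in the conservative design (CV1) they keep neither trait. The fold builder below is a schematic of that bookkeeping, not the paper's code.

```python
import random

def cv_folds(n, k, scheme, rng):
    """Split n individuals into k folds; for each fold return
    (validation ids,
     ids whose focal-trait records are in training,
     ids whose secondary-trait records are in training)."""
    ids = list(range(n))
    rng.shuffle(ids)
    folds = []
    for f in range(k):
        val = set(ids[f::k])
        focal_train = [i for i in range(n) if i not in val]
        if scheme == "CV1":            # mask both traits for validation ids
            secondary_train = focal_train
        else:                          # "CV2": secondary trait stays observed
            secondary_train = list(range(n))
        folds.append((sorted(val), focal_train, secondary_train))
    return folds
```

The bias arises because under CV2 the model has already seen information (secondary-trait records) from the very individuals it is being scored on, which is also why the CV2* remedy scores predictions against related individuals instead.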

    Novel Bayesian Networks for Genomic Prediction of Developmental Traits in Biomass Sorghum.

    The ability to connect genetic information between traits over time allows Bayesian networks to offer a powerful probabilistic framework for constructing genomic prediction models. In this study, we phenotyped a diversity panel of 869 biomass sorghum (Sorghum bicolor (L.) Moench) lines, which had been genotyped with 100,435 SNP markers, for plant height (PH) with biweekly measurements from 30 to 120 days after planting (DAP) and for end-of-season dry biomass yield (DBY) in four environments. We evaluated five genomic prediction models: Bayesian network (BN), Pleiotropic Bayesian network (PBN), Dynamic Bayesian network (DBN), multi-trait GBLUP (MTr-GBLUP), and multi-time GBLUP (MTi-GBLUP). In fivefold cross-validation, prediction accuracies ranged from 0.46 (PBN) to 0.49 (MTr-GBLUP) for DBY and from 0.47 (DBN, DAP120) to 0.75 (MTi-GBLUP, DAP60) for PH. Forward-chaining cross-validation further improved prediction accuracies of the DBN, MTi-GBLUP, and MTr-GBLUP models for PH (training slice: 30-45 DAP) by 36.4-52.4% relative to the BN and PBN models. Coincidence indices (target: biomass; secondary: PH) and a coincidence index based on lines (PH time series) showed that the ranking of lines by PH changed minimally after 45 DAP. These results suggest that a two-level indirect selection method for PH at harvest (first-level target trait) and DBY (second-level target trait) could be conducted earlier in the season based on the ranking of lines by PH at 45 DAP (secondary trait). With the advance of high-throughput phenotyping technologies, our proposed two-level indirect selection framework could be valuable for enhancing genetic gain per unit of time when selecting on developmental traits.
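Forward-chaining cross-validation for longitudinal traits such as biweekly plant height amounts to splits that only ever train on earlier time points. A generic sketch (the study's exact slicing may differ):

```python
def forward_chain_splits(time_points, min_train=2):
    """Yield (train, test) pairs where the model is trained on an early
    slice of time points (e.g. plant-height DAPs) and validated on all
    later ones, mimicking selection decisions made mid-season."""
    for i in range(min_train, len(time_points)):
        yield time_points[:i], time_points[i:]
```

Unlike k-fold cross-validation, no future measurement ever leaks into training, which is why this scheme is the natural test of whether early-season PH (e.g. the 30-45 DAP slice) can stand in for late-season rankings.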

    The Dimensionality of Genomic Information and Its Effect on Genomic Prediction

    The genomic relationship matrix (GRM) can be inverted by the algorithm for proven and young (APY) based on recursion on a random subset of animals. While a regular inverse has a cubic cost, the cost of the APY inverse can be close to linear. Theory for the APY assumes that the optimal size of the subset (maximizing accuracy of genomic predictions) is due to a limited dimensionality of the GRM, which is a function of the effective population size (Ne). The objective of this study was to evaluate these assumptions by simulation. Six populations were simulated with approximate effective population size (Ne) from 20 to 200. Each population consisted of 10 nonoverlapping generations, with 25,000 animals per generation and phenotypes available for generations 1–9. The last 3 generations were fully genotyped assuming genome length L = 30. The GRM was constructed for each population and analyzed for its distribution of eigenvalues. Genomic estimated breeding values (GEBV) were computed by single-step GBLUP, using either a direct or an APY inverse of the GRM. The sizes of the subset in APY were set to the number of largest eigenvalues explaining x% of variation (EIGx, x = 90, 95, 98, 99) in the GRM. Accuracies of GEBV for the last generation with the APY inverse peaked at EIG98 and were slightly lower with EIG95, EIG99, or the direct inverse. Most information in the GRM is contained in the ∼NeL largest eigenvalues, with no information beyond 4NeL. Genomic predictions with the APY inverse of the GRM are more accurate than with the regular inverse.
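The EIGx rule for sizing the APY core subset reduces to counting the largest eigenvalues of the GRM that together explain x% of total variation. A minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def eig_core_size(G, pct=0.98):
    """Number of largest eigenvalues of the genomic relationship
    matrix G explaining `pct` of total variation (the EIGx rule
    used to choose the APY core-subset size)."""
    w = np.linalg.eigvalsh(G)[::-1]          # eigenvalues, descending
    cum = np.cumsum(w) / w.sum()
    return int(np.searchsorted(cum, pct) + 1)
```

For a GRM whose effective rank is limited (the ~NeL largest eigenvalues in the study), this count plateaus well below the number of genotyped animals, which is what makes the near-linear APY inversion viable.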

    Relating Morphology to Syntax

    B1 - Research Book Chapter

    Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

    Parametric and nonparametric methods have been developed for predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including the Nadaraya-Watson estimator, reproducing kernel Hilbert space regression, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., the proportion of phenotypic variability explained by genetics, had the second greatest impact on estimates of accuracy and MSE.
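The parametric/nonparametric contrast can be illustrated with two of the reviewed methods, ridge regression and the Nadaraya-Watson estimator, applied to simulated marker data. This is a generic sketch with my own hyperparameters (`lam`, bandwidth `h`), not the paper's implementation.

```python
import numpy as np

def ridge_predict(Xtr, ytr, Xte, lam=1.0):
    """Ridge regression (parametric): assumes phenotype is a linear
    function of marker counts; beta = (X'X + lam I)^-1 X'y."""
    p = Xtr.shape[1]
    beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)
    return Xte @ beta

def nw_predict(Xtr, ytr, Xte, h=2.0):
    """Nadaraya-Watson estimator (nonparametric): a Gaussian-kernel
    weighted average over training phenotypes, with no assumed
    functional form linking markers to phenotype."""
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * h ** 2))
    return (w @ ytr) / w.sum(1)
```

Because the ridge predictor is committed to additivity, it excels exactly when the architecture is additive, while the kernel estimator makes no such commitment, mirroring the trade-off the abstract reports.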