A joint likelihood estimator of relatedness and allele frequencies from a small sample of individuals

Abstract

As a key parameter in population genetics, relatedness has found wide applications in molecular ecology, evolutionary biology, conservation, forensics and in studies of human inheritable diseases. It is defined as the probability that two individuals share an allele due to recent common ancestry. Many estimators have been developed to estimate relatedness from genotype data. However, they are invariably biased when a sample is small or contains a high proportion of close relatives, because the allele frequencies required for inferring relatedness are poorly estimated in both cases under the impracticable and yet indispensable assumption of a large sample of unrelated genotypes. In this study, I develop a likelihood method to estimate relatedness and allele frequencies jointly from a sample of multilocus genotypes. I propose an expectation maximization (EM) algorithm that iteratively updates the allele frequencies and the nine condensed identical by descent (IBD) coefficients of each pair of sampled individuals until convergence. The relatedness between individuals and the inbreeding coefficients of individuals are then calculated from the estimated nine IBD coefficients. The EM algorithm is also implemented in the reduced non-inbreeding model to estimate three condensed IBD coefficients and relatedness. Using simulated and empirical data, I show that the new method is much less biased and more accurate than previous methods, providing almost unbiased relatedness and inbreeding estimates, when the sampled individuals are few and/or include many close relatives. The EM algorithm for the likelihood estimator is fast enough to handle a sample with thousands of individuals and millions of markers, thanks to parallelization using OpenMP and MPI. The method is implemented in a software package, EMIBD9, that runs on all major computer platforms.
This study shows that allele frequencies and relatedness, although highly correlated and difficult to disentangle from each other when the only information available is a sample of multilocus genotypes, can be estimated jointly from genotype data of diallelic and multiallelic markers in a likelihood framework. The new method and software are especially useful for analysing small samples (such as ancient samples from museums, or samples from endangered species) and samples with a strong genetic structure.
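The abstract does not spell out how relatedness and inbreeding are derived from the nine condensed IBD coefficients. As a minimal sketch, the Python function below applies the standard textbook relations for Jacquard's coefficients: coancestry is Δ1 + (Δ3 + Δ5 + Δ7)/2 + Δ8/4, relatedness is twice the coancestry, and the inbreeding coefficients of the two individuals are Δ1 + Δ2 + Δ3 + Δ4 and Δ1 + Δ2 + Δ5 + Δ6. The function name `summarize_ibd` is hypothetical, and the assumption that EMIBD9 follows this same ordering and these same relations is mine, not something stated in the abstract.

```python
from typing import Sequence, Tuple


def summarize_ibd(delta: Sequence[float]) -> Tuple[float, float, float]:
    """Return (relatedness, F_a, F_b) from nine condensed IBD coefficients.

    Assumes Jacquard's conventional ordering delta[0..8] = Delta_1..Delta_9.
    This is an illustrative helper, not code taken from EMIBD9.
    """
    if len(delta) != 9:
        raise ValueError("expected exactly nine IBD coefficients")
    if any(d < 0.0 for d in delta):
        raise ValueError("IBD coefficients must be non-negative")
    if abs(sum(delta) - 1.0) > 1e-8:
        raise ValueError("IBD coefficients must sum to 1")

    d1, d2, d3, d4, d5, d6, d7, d8, _d9 = delta

    # Coancestry: probability that two alleles, one drawn at random from
    # each individual, are identical by descent.
    coancestry = d1 + 0.5 * (d3 + d5 + d7) + 0.25 * d8
    relatedness = 2.0 * coancestry

    # Inbreeding: probability that an individual's own two alleles are IBD.
    f_a = d1 + d2 + d3 + d4
    f_b = d1 + d2 + d5 + d6
    return relatedness, f_a, f_b
```

In the reduced non-inbreeding model only Δ7, Δ8 and Δ9 can be non-zero, so the same function reduces to relatedness = Δ7 + Δ8/2. For example, non-inbred full siblings have (Δ7, Δ8, Δ9) = (0.25, 0.5, 0.25), for which `summarize_ibd([0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25])` gives a relatedness of 0.5 and both inbreeding coefficients equal to 0.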
