Improving the estimation of genetic distances from Next-Generation Sequencing data

Abstract

Next-Generation Sequencing (NGS) technologies have revolutionized research in evolutionary biology, by increasing the sequencing speed and reducing the experimental costs. However, sequencing errors are higher than in traditional technologies and, furthermore, many studies rely on low-depth sequencing. Under these circumstances, the use of standard methods for inferring genotypes leads to biased estimates of nucleotide variation, which can bias all downstream analyses. Through simulations, we assessed the bias in estimating genetic distances under several different scenarios. The results indicate that naive methods for assigning individual genotypes greatly overestimate genetic distances. We propose a novel method to estimate genetic distances that is suitable for low-depth NGS data and takes genotype call statistical uncertainty into account. We applied this method to investigate the genetic structure of domesticated and wild strains of rice. We implemented this approach in an open-source software and discuss further directions of phylogenetic analyses within this novel probabilistic framework

Similar works

Full text

thumbnail-image

Spiral - Imperial College Digital Repository

redirect

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.