Integrations and introgressions: a viral and genetic history of the human genome

Abstract

The human genome contains archaic relics acquired through two biological vehicles: viral infection and interbreeding. In this thesis I explore methods for detecting the products of these processes, starting with some of the oldest fossils in our genome: viruses. I examine the most recently active endogenous retrovirus (ERV) family in humans: human endogenous retrovirus-K (HERV-K) human mouse mammary tumor virus-like 2 (HML2), also known as HK2. I collaboratively developed and benchmarked the software STEAK for detecting mobile element and retroviral integrations in high-throughput sequencing data. Using STEAK on 70 whole genome sequences (WGS), I show that all polymorphic HK2 integrations found in Europeans are also present in African populations, and I estimate that HK2 activity stopped prior to the emergence of modern humans. The second half of this thesis investigates an outstanding question: did archaic and modern humans admix in Africa? Non-African populations today carry 1-5% Neanderthal and/or Denisovan ancestry. However, it remains unclear whether archaic introgression is present in African populations. I use WGS simulations to assess a current archaic reference-free method, S*, under different demographic models varying in divergence dates, admixture dates, and admixture proportion. I show that, in the absence of an archaic reference and with an obscure demographic history, S* results must be interpreted with caution given the method's sensitivity to demographic processes. Using S* and Sprime on African WGS, I find that the putatively introgressed regions detected by both methods are notably long, old haplotypes. I show with D-statistics that African populations such as the San share excess derived alleles with Denisovans but not with Neanderthals. These analyses collectively support a model in which interbreeding took place between an ancestral human population and other archaic hominins within Africa.
Overall, this thesis elucidates the relationship between the human genome and the archaic sequences it hosts.
