Partitioning methods for detecting fine-scale population structure and applications to the POPGEN project

Abstract

Introduction: To identify genetic risk factors for multifactorial disease, it is essential to compare the genomes of patients with those of genetically similar healthy individuals. It is therefore crucial to understand the genetic structure of the overall population. One important way of gaining such understanding is by applying clustering methods whose aim is to identify groups of individuals based on their genomes.

Methods: In this context, we present a comparative analysis of various clustering approaches, with a focus on hierarchical methods such as fineSTRUCTURE, model-based clustering approaches such as Mclust, and aggregation-based clustering techniques. We also investigate the impact of different similarity measures obtained through haplotype-sharing methods on clustering outcomes.

Results: We enhance previous comparative studies by evaluating clustering methods in the context of fine-scale population structure by simulating data that aligns with the observed population structure in French populations. This approach enables us to gauge the robustness and accuracy of various methods using simulated datasets. Additionally, we apply these methods to real data from POPGEN, a project encompassing the entire metropolitan territory of France and aggregating precise genetic and geographical information from over 9,772 volunteers. We investigate how the genetic clusters observed in POPGEN correspond to the fine-scale geography within different regions of France.

Conclusion: Our study serves to demonstrate the performance of different clustering approaches on both simulated and real datasets, offering insights to help choose the most suitable clustering methods for identifying fine-scale population structure.

Funding: This work is funded by the French Ministry of Research for the POPGEN project in the framework of the French initiative for genomic medicine (Plan France Médecine Génomique 2025; PFMG 2025; https://pfmg2025.aviesan.fr). The CONSTANCES cohort benefits from grant ANR-11INBS-0002 from the French National Research Agency.

    Last time updated on 22/02/2025