A protocol for the identification of ancestry informative markers (AIMs) from
genome-wide single nucleotide polymorphism (SNP) data is proposed. The protocol
consists of three main steps: (a) identification of potential positive
selection regions via Fst extremity measurement, (b) SNP screening via
two-stage attribute selection and (c) classification model construction using a
naive Bayes classifier. The two-stage attribute selection is composed of a
newly developed round robin symmetrical uncertainty ranking technique and a
wrapper embedded with a naive Bayes classifier. The protocol has been applied
to the HapMap Phase II data. Two AIM panels, consisting of 10 and 16 SNPs, are
identified; each leads to complete classification of the CEU, CHB, JPT and YRI
populations. Moreover, the panels are at least four times smaller than those
reported in previous studies. The results suggest that the protocol could be
useful in a scenario involving a larger number of populations.

Comment: 24 pages, 4 figures
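
As an illustration of the ranking measure named in step (b), the following is a minimal sketch of the standard symmetrical uncertainty between a SNP genotype and a population label, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It does not reproduce the round robin ranking scheme itself, which the abstract does not specify. The function names and the toy genotype data are hypothetical and are not taken from the paper or from HapMap.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard symmetrical uncertainty: 2 * I(X; Y) / (H(X) + H(Y))."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Toy, made-up data: genotypes coded as minor-allele counts (0, 1 or 2).
populations = ["CEU", "CEU", "YRI", "YRI"]
snp_informative = [0, 0, 2, 2]      # genotype tracks the population label
snp_uninformative = [0, 2, 0, 2]    # genotype is independent of the label

su_informative = symmetrical_uncertainty(snp_informative, populations)
su_uninformative = symmetrical_uncertainty(snp_uninformative, populations)
```

A SNP whose genotype fully determines the population label scores at the top of the [0, 1] range, while one that is independent of the label scores at the bottom, which is why such a measure can be used to rank candidate markers before a wrapper stage.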