183 research outputs found

    Genetic variation and disease in the Roma (Gypsies)

    The Roma (Gypsies) are a European people composed of a mosaic of culturally heterogeneous populations. Linguistic analyses point to their origins in the Indian subcontinent. Cultural diversity in extant Romani populations suggests that they are descended from a mixture of Indian populations. Previous population genetic studies of the Roma have supported this claim by demonstrating the genetic heterogeneity of Romani populations. More recently, medical genetic research has detected identical founder mutations in separated Romani populations, which provides evidence of their relatedness. In this thesis, the genetic heritage of the Roma and its significance for genetic disease and research is investigated. Male and female lineages were analysed in eight traditionally endogamous Romani populations. Asian-specific Y chromosome haplogroup VI-68 and mitochondrial DNA (mtDNA) haplogroup M were detected in all populations and accounted for 39% and 25% of all lineages respectively. Diversity within haplogroups was assessed by genotyping Y chromosome short tandem repeats (YSTRs) and sequencing the mtDNA hypervariable segment 1 (HVS1). Lineages within haplogroups VI-68 and M were found to be closely related, suggesting that Romani populations are predominantly descended from a single Indian ethnic population. The differing historical legacies of Romani populations and adherence to endogamous practices have resulted in genetic substructure and limited diversity within populations. Thus, the Roma are shown to comprise a conglomerate of related admixed population isolates. The unique genetic heritage of the Roma provides a powerful tool for the positional cloning of monogenic disease genes. This is demonstrated through the reduction of the critical chromosomal region for a novel genetic disorder, hereditary motor and sensory neuropathy type Lom (HMSNL). In the initial report, the HMSNL disease locus was defined as a 3 cM region on chromosome 8q24.
In this study, refined genetic mapping utilising historical and parental recombinations observed in Romani individuals from different populations reduced the HMSNL critical interval to 202 kb. Sequence analysis of two genes contained within this genomic interval found all affected individuals to be homozygous for a C→T mutation in codon 148 of N-myc downstream regulated gene 1 (NDRG1), resulting in a truncating R148X mutation. Investigation of the population distribution of the R148X disease allele shows that it occurs in six of eight separated Romani populations. Another founder mutation, C283Y in the γ-sarcoglycan gene (SGCG), which causes limb girdle muscular dystrophy type 2C (LGMD2C), was found in two of eight Romani populations. Profound founder effects are apparent within Romani populations, with a carrier frequency of 19.5% determined for the R148X mutation in the Lom population, and 6.25% for the C283Y allele in the Turgovzi population. High carrier frequencies for autosomal recessive diseases can be expected to pose a significant health risk for these communities. Thus, community-wide carrier testing represents a potential means of addressing this health problem. A pilot community-based carrier-testing program was implemented in a Romani community of northeastern Bulgaria and relevant attitudes assessed by means of a questionnaire. Community-based carrier screening was demonstrated to be an appropriate approach to improving health amongst the Roma.

    Genotype-Based Test in Mapping Cis-Regulatory Variants from Allele-Specific Expression Data

    Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation [1] and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which makes them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning the genotypes of individuals exhibiting AI and those not exhibiting AI in a 2×3 contingency table. The performance of this test in detecting linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing simulated and empirical AI datasets. In simulation experiments, the genotype-based test outperforms the haplotype-based tests as the distance separating the regulatory region from its regulated transcript increases. The genotype-based test performed equally well with the experimental AI datasets, either from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need for haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.
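
    The 2×3 partition described in this abstract can be illustrated with a small sketch. The abstract does not state which statistic is applied to the table, so the Pearson chi-square test used here is an assumption, as are the function name `genotype_ai_test` and the example counts. Rows are individuals with and without AI; columns are the three genotypes at a candidate SNP.

```python
import math


def genotype_ai_test(ai_counts, non_ai_counts):
    """Pearson chi-square test on a 2x3 table of genotype counts.

    ai_counts, non_ai_counts: counts of the three SNP genotypes
    (for example AA, AB, BB) among individuals with and without AI.
    Returns (chi2, p_value) for 2 degrees of freedom.
    Assumption: Pearson chi-square stands in for the paper's statistic.
    """
    table = [list(ai_counts), list(non_ai_counts)]
    if any(len(row) != 3 for row in table):
        raise ValueError("each row needs exactly three genotype counts")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("empty row or column: the table is not a full 2x3")
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical counts: AI individuals are enriched for the first genotype.
chi2, p = genotype_ai_test([20, 5, 0], [5, 10, 20])
```

    Because only unphased genotype counts enter the table, no haplotype inference is needed, which is the property the abstract emphasises. With small expected counts an exact test would be a safer choice than the chi-square approximation.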

    An ancestral haplotype of the human PERIOD2 gene associates with reduced sensitivity to light-induced melatonin suppression.

    Individual humans respond differently to environmental stimuli, a phenomenon referred to as "physiological variation." However, it has been unclear whether these differences are caused by genetic variation. In this study, we examined the association between physiological variation in the response to a light stimulus and genetic polymorphisms. We collected physiological data from 43 subjects, including light-induced melatonin suppression, and performed haplotype analyses on the clock genes PER2 and PER3, which exhibit geographical differentiation of allele frequencies. Among the haplotypes of PER3, no significant difference in light sensitivity was found. However, three common haplotypes of PER2 accounted for more than 96% of the chromosomes in subjects, and one of those three showed a significantly less sensitive response to the light stimulus (P < 0.05). Homozygotes for the low-sensitivity PER2 haplotype showed significantly lower percentages of melatonin suppression (P < 0.05), and heterozygotes for the haplotypes varied in their ratios, indicating that the physiological variation in light sensitivity is related to the PER2 polymorphism. In a comparison of global haplotype frequencies, the haplotype with a low-sensitivity response was more frequent in Africans than in non-Africans and was placed at the root of the phylogenetic tree, suggesting that the low light-sensitivity haplotype is the ancestral type, whereas the other haplotypes with high sensitivity to light are the derived types. Hence, we speculate that the high light-sensitivity haplotypes spread throughout the world after the Out-of-Africa migration of modern humans.

    Ethnic Related Selection for an ADH Class I Variant within East Asia

    The alcohol dehydrogenases (ADH) are widely studied enzymes and the evolution of the mammalian gene cluster encoding these enzymes is also well studied. Previous studies have shown that the ADH1B*47His allele at one of the seven genes in humans is associated with a decrease in the risk of alcoholism and the core molecular region with this allele has been selected for in some East Asian populations. As the frequency of ADH1B*47His is highest in East Asia, and very low in most of the rest of the world, we have undertaken more detailed investigation in this geographic region. Here we report new data on 30 SNPs in the ADH7 and Class I ADH region in samples of 24 populations from China and Laos. These populations cover a wide geographic region and diverse ethnicities. Combined with our previously published East Asian data for these SNPs in 8 populations, we have typed populations from all of the 6 major linguistic phyla (Altaic including Korean-Japanese and inland Altaic, Sino-Tibetan, Hmong-Mien, Austro-Asiatic, Daic, and Austronesian). The ADH1B genotyping data are strongly related to ethnicity. Only some eastern ethnic phyla or subphyla (Korean-Japanese, Han Chinese, Hmong-Mien, Daic, and Austronesian) have a high frequency of ADH1B*47His. ADH1B haplotype data clustered the populations into linguistic subphyla, and divided the subphyla into eastern and western parts. In the Hmong-Mien and Altaic populations, the extended haplotype homozygosity (EHH) and relative EHH (REHH) tests for the ADH1B core were consistent with selection for the haplotype with derived SNP alleles. In the other ethnic phyla, the core showed only a weak signal of selection at best. The selection distribution is more significantly correlated with the frequency of the derived ADH1B regulatory region polymorphism than the derived amino-acid altering allele ADH1B*47His. Thus, the real focus of selection may be the regulatory region.
The obvious ethnicity-related distributions of ADH1B diversities suggest the existence of some culture-related selective forces that have acted on the ADH1B region.

    Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases

    Pathogen sequence data have been exploited to infer who infected whom, by using empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight into epidemiological links than a single sequence per host. In general, a mechanistic viewpoint on transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded in statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained from contact tracing, for example, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable when there is a risk of erroneous mechanistic assumptions; it is sufficiently parsimonious to allow the handling of big datasets in the future, and it is versatile enough to be applied to very different contexts from animal, human and plant epidemiology. This article is part of the theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes'. This issue is linked with the subsequent theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control'.

    A general and efficient representation of ancestral recombination graphs

    As a result of recombination, adjacent nucleotides can have different paths of genetic inheritance and therefore the genealogical trees for a sample of DNA sequences vary along the genome. The structure capturing the details of these intricately interwoven paths of inheritance is referred to as an ancestral recombination graph (ARG). Classical formalisms have focused on mapping coalescence and recombination events to the nodes in an ARG. However, this approach is out of step with some modern developments, which do not represent genetic inheritance in terms of these events or explicitly infer them. We present a simple formalism that defines an ARG in terms of specific genomes and their intervals of genetic inheritance, and show how it generalizes these classical treatments and encompasses the outputs of recent methods. We discuss nuances arising from this more general structure, and argue that it forms an appropriate basis for a software standard in this rapidly growing field.
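
    The idea of describing an ARG through genomes and their intervals of inheritance can be sketched as a small data structure. This is an illustrative reading of the abstract, not the paper's formalism or any library's API: the names `Edge`, `local_parents` and `path_to_root`, the half-open interval convention and the toy graph are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A child genome inherits the interval [left, right) from a parent genome."""
    left: float
    right: float
    parent: int
    child: int


def local_parents(edges, position):
    """Return {child: parent} for the genealogical tree at one genome position."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def path_to_root(edges, node, position):
    """Follow parent links from node up to the root of the local tree."""
    parents = local_parents(edges, position)
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


# Toy ARG on a genome of length 10: samples 0, 1, 2 and ancestors 3, 4, 5.
# Sample 1 inherits [0, 5) from ancestor 3 and [5, 10) from ancestor 4,
# which is how a recombination appears without an explicit event node.
toy_arg = [
    Edge(0, 10, parent=3, child=0),
    Edge(0, 5, parent=3, child=1),
    Edge(5, 10, parent=4, child=1),
    Edge(0, 10, parent=4, child=2),
    Edge(0, 10, parent=5, child=3),
    Edge(0, 10, parent=5, child=4),
]

left_tree = local_parents(toy_arg, 2)    # sample 1 sits under ancestor 3
right_tree = local_parents(toy_arg, 7)   # sample 1 sits under ancestor 4
```

    No node in this sketch is labelled as a coalescence or recombination event; the change of genealogy along the genome follows entirely from which intervals each genome inherits from which parent.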

    Unveiling the murine t-haplotype’s extent and emergence of diversity in MHC class II genes

    Genes of the major histocompatibility complex locus that present pathogen peptides to T lymphocytes are among the most polymorphic genes in mammals. Within the major histocompatibility complex, diversity is under positive selection and new alleles are generated by point mutations and recombination events. In some house mice (Mus musculus), however, the major histocompatibility complex (called H-2) is part of the t-haplotype, a meiotic driver located on Chromosome 17 carried by 10 to 40 percent of the natural population. Meiotic drivers are selfish chromosomal arrangements defined by their deviation from Mendelian ratios of inheritance, meaning that they are over-proportionally transmitted to offspring. As such, purifying selection on meiotic drivers is reduced and deleterious mutations can accumulate, although outside of lethal alleles few non-synonymous mutations are observed. Additionally, meiotic drivers often show strong linkage disequilibrium, which is the result of reduced recombination between the wildtype chromosome and the chromosome carrying the genetic driver, often caused by structural variations between chromosomes such as inversions. In this thesis, the physical extent of the t-haplotype inversions is resolved using optical mapping at higher resolution than ever done before. Evidence for a fifth inversion in the t-haplotype of Mus musculus domesticus was found. In addition, the allelic diversity of the MHC class II gene H2Aa in t-haplotype individuals of the subspecies Mus musculus musculus and Mus musculus domesticus was described based on Sanger-sequenced exons. The degree of diversity found here indicates that recombination between the t-haplotype and the wildtype might play an important role in diversifying this H2Aa exon. Lastly, to uncover de novo meiotic recombination events within the H2Aa gene and the Prdm9 ZnF array, which determines the placement of meiotic hotspots, Nanopore sequencing was implemented.
To identify the original template of sequenced amplicons, the Primer ID as presented by Jabara et al., 2011 was included in the primers used to amplify the gene regions to be sequenced. However, due to low coverage with Primer IDs, no definite de novo recombination events could be defined, leaving the use of Primer IDs in Nanopore sequencing open to discussion.

    Molecular, bioinformatic and statistical approaches to identify genes underlying complex traits in livestock

    One of the primary goals for molecular geneticists working with livestock species is to identify and characterize genes underlying complex traits, the so-called quantitative trait loci (QTL). The primary strategy for identifying QTL involves several steps, one being fine mapping of a previously defined chromosomal region and another being identification of candidate genetic polymorphisms that may cause differences in phenotype. The studies presented in this dissertation address fine mapping methodology, use of the candidate gene approach for directly identifying candidate genetic polymorphisms and use of bioinformatic tools for identifying genetic polymorphisms in silico. Results from simulation studies suggest that two linkage disequilibrium-based fine mapping methods, one using haplotype information, the other using single marker information, provide QTL position estimates with comparable accuracy. Additional research is necessary to determine optimal fine mapping methods under experimental research conditions. The candidate gene studies presented, concerning the porcine connexin 37 (CX37) and bone morphogenetic factor 15 (BMP15) genes, highlight use of comparative sequence and biological information for identifying candidate genetic variants. Two synonymous mutations were discovered in the CX37 gene, which was subsequently mapped to SSC6 q24--31. However, these mutations were not significantly associated with fertility traits as hypothesized. Unfortunately, mutations could not be identified in BMP15, which was physically mapped to SSCX p11--13. Bioinformatic tools are shown here to be lucrative for identifying putative single nucleotide polymorphisms (SNPs) from redundant expressed sequence tag (EST) information in the pig. 
Using computer-derived SNPs, a correlation of 0.77 (p < 0.00001) was found between the frequency of human and porcine SNPs in the coding regions (cSNPs) of 25 genes, while a correlation of 0.48 (p < 0.0005) was found between the frequency of human and mouse cSNPs in 50 genes. This strong human-pig relationship should be verified in a larger sample so that SNP identification in pigs could be expedited by screening porcine genes homologous to human genes known to be SNP-dense in their coding regions. By capitalizing on statistical, bioinformatic and molecular tools in an integrated approach, the rate at which QTL are identified in livestock could be increased.