506 research outputs found

    Biological Role and Disease Impact of Copy Number Variation in Complex Disease

    In the human genome, DNA variants give rise to a variety of complex phenotypes. Ranging from single base mutations to copy number variations (CNVs), many of these variants are neutral in selection and disease etiology, which makes it difficult to detect true disease-causing mutations, whether common or rare. However, allele frequency comparisons in cases, controls, and families may reveal disease associations. Single nucleotide polymorphism (SNP) arrays and exome sequencing are popular assays for genome-wide variant identification. To limit bias between samples, uniform testing is crucial, including standardized platform versions and sample processing. Bases occupy single points while copy variants occupy segments. Bases are bi-allelic while copies are multi-allelic. One genome also encodes many different cell types. In this study, we investigate how CNV impacts different cell types, including heart, brain and blood cells, all of which serve as models of complex disease. Here, we describe ParseCNV, a systematic algorithm developed as part of this project to perform more accurate disease associations using CNV calls generated from SNP arrays or exome sequencing, with quality tracking of the variants contributing to each significant overlap signal. Red flags of variant quality, genomic region, and overlap profile are assessed in a continuous score and shown to correlate over 90% with independent verification methods. We compared these data with our large internal cohort of 68,000 subjects, with carefully mapped CNVs, which gave a robust rare variant frequency in unaffected populations. In these investigations, we uncovered a number of loci in which CNVs are significantly enriched in non-coding RNA (ncRNA), Online Mendelian Inheritance in Man (OMIM), and genome-wide association study (GWAS) regions, impacting complex disease. 
By thoroughly evaluating variant frequencies in pediatric individuals and then comparing them with frequencies in geriatric individuals, we gained insight into these variants' impact on lifespan. Longevity-associated CNVs enriched in pediatric patients were found to aggregate in alternative splicing genes. Congenital heart disease is the most common birth defect and cause of infant mortality. When comparing congenital heart disease families, with cases and controls genotyped both on SNP arrays and by exome sequencing, we uncovered significant and confident loci that provide insight into the molecular basis of disease. Neurodevelopmental disease affects the quality of life and cognitive potential of many children. In the neurodevelopmental and psychiatric diseases, the CACNA, GRM, CNTN, and SLIT gene families show multiple significant signals impacting a large number of developmental and psychiatric disease traits, with the potential of informing therapeutic decision-making. Through new tool development and analysis of large disease cohorts genotyped on a variety of assays, I have uncovered an important biological role and disease impact of CNV in complex disease.

    Data analysis methods for copy number discovery and interpretation

    Copy number variation (CNV) is an important type of genetic variation that can give rise to a wide variety of phenotypic traits. Differences in copy number are thought to play major roles in processes that involve dosage-sensitive genes, providing beneficial, deleterious or neutral modifications to individual phenotypes. Copy number analysis has long been a standard in clinical cytogenetic laboratories. Gene deletions and duplications can often be linked with genetic syndromes such as the 7q11.23 deletion of Williams-Beuren syndrome, the 22q11 deletion of DiGeorge syndrome and the 17p11.2 duplication of Potocki-Lupski syndrome. Interestingly, copy number based genomic disorders often display reciprocal deletion/duplication syndromes, with the latter frequently exhibiting milder symptoms. Moreover, the study of chromosomal imbalances plays a key role in cancer research. The datasets used for the development of analysis methods during this project were generated as part of the cutting-edge translational project Deciphering Developmental Disorders (DDD). The DDD project is the first of its kind and will directly apply state-of-the-art technologies, in the form of ultra-high resolution microarrays and next generation sequencing (NGS), to real-time genetic clinical practice. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI) and the National Health Service (NHS) involving the 24 regional genetic services across the UK and Ireland. Although the application of DNA microarrays for the detection of CNVs is well established, individual change point detection algorithms often display variable performance. The definition of an optimal set of parameters for achieving a certain level of performance is rarely straightforward, especially where data qualities vary ... [cont.]

    Association mapping of genomic microdeletions and common susceptibility variants predisposing to genetic generalized epilepsies

    Approximately 3% of the general population is affected by epilepsy during their lifetime, making epilepsy one of the most common neurological diseases. Genetic generalized epilepsies (GGE) are the most common genetic epilepsies and account for 20-30% of all epilepsies. GGE is subdivided into genetically determined subgroups with gradual transition, including genetic absence epilepsies (GAE), juvenile myoclonic epilepsy (JME), and epilepsy with generalized tonic-clonic seizures (EGTCS). In spite of a high heritability of 80% and a predominantly genetic etiology, the genetic factors predisposing to GGE are still mostly unknown. In the present study, we carried out association studies to investigate whether genomic microdeletions and common susceptibility variants increase the risk for GGE. To test the common disease/common variant hypothesis, genome-wide association studies (GWAS) were performed in several GGE cohorts using case-control and family-based study designs. For analysis, all patients were either pooled or stratified by subgroup in order to detect common or subgroup-specific risk factors, respectively. The GWAS comprised a case-control cohort of 1,523 European GGE patients and 2,454 German controls and a sample cohort of 566 European parent-offspring trios. Meta-GWAS analyses revealed significant association (P < 5.0E-08) with GGE at 2p16.1 (rs35577149, meta-analysis P = 1.65E-08, OR[C] = 0.78, 95% CI 0.71 - 0.86). Significant association with JME was detected at 1q43 (rs12059546, meta-analysis P = 2.27E-08, OR[G] = 1.53, 95% CI 1.33 - 1.78). Suggestive evidence for association (P < 1.0E-05) was found for GGE at 8q12.2 (rs6999304, meta-analysis P = 1.77E-06, OR[G] = 1.33, 95% CI 1.17 - 1.51) and for GAE at 2q22.3 (rs75917352, meta-analysis P = 1.41E-07, OR[T] = 0.67, 95% CI 0.58 - 0.79). The associated regions harbor high-ranking candidate genes: CHRM3 at 1q43, VRK2 at 2p16.1, and ZEB2 at 2q22.3. 
Further replication efforts are necessary to elucidate whether these positional candidate genes contribute to the heritability of the common GGE syndromes. Exploring the rare variant/common disease hypothesis, we investigated the impact of six recurrent microdeletions on the genetic risk of GGE at the genomic hotspot regions 1q21.1, 15q11.2, 15q13.3, 16p11.2, 16p13.11, and 22q11.2, which had been implicated as rare genetic risk factors in a wide range of neurodevelopmental disorders. Recurrent microdeletions were assessed in 1,497 European GGE patients, 5,374 controls, and 566 GGE trios using high-resolution SNP microarrays. Considering all six microdeletion hotspots together, we found a significant excess of these microdeletions in 2,563 GGE patients versus 5,940 controls (P < 2.20E-16, OR = 7.65, 95% CI 4.59 - 13.18). Individually, significant associations with GGE were observed for the microdeletions at 15q11.2 (P = 1.12E-04, OR = 3.59, 95% CI 1.80 - 7.25), 15q13.3 (P = 5.48E-09) and 16p13.11 (P = 4.42E-06, OR = 17.39, 95% CI 3.86 - 159.88). In a candidate-gene approach, we tested whether exon-disrupting/removing microdeletions in the genes encoding NRXN1 and RBFOX1 confer susceptibility for GGE. We found a significant association with GGE at both loci (NRXN1: P = 0.0049; RBFOX1: P = 0.0083). However, high phenotypic variability and incomplete penetrance, resulting in apparently imperfect segregation, indicate that partial NRXN1 and RBFOX1 deletions represent susceptibility factors rather than highly penetrant mutations. The present study substantiates a role of both genomic microdeletions and common susceptibility variants in the genetic predisposition to common GGE syndromes. We strengthened the statistical evidence for associations of genetic variants at 1q43, 2p16.1, and 2q22.3 with GGE syndromes and identified a novel susceptibility locus at 8q12.2. 
Although individually rare, the microdeletions at 15q11.2, 15q13.3, 16p13.11, NRXN1, and RBFOX1 taken together contribute significantly to the genetic variance of GGE.
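
The odds ratios and 95% confidence intervals quoted in this abstract summarise how much more often a microdeletion is carried by patients than by controls. A minimal sketch of that calculation from a 2x2 carrier table follows. It uses the Woolf (log-normal) interval, which is an assumption made here for illustration and is not necessarily the method the study used; the function name and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio_ci(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 carrier table.

    Uses the Woolf log-normal approximation; z=1.96 gives a 95% interval.
    All four cells must be positive for the approximation to be defined.
    """
    a = case_carriers                      # cases carrying the deletion
    b = case_total - case_carriers         # cases without the deletion
    c = control_carriers                   # controls carrying the deletion
    d = control_total - control_carriers   # controls without the deletion
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, `odds_ratio_ci(10, 100, 5, 100)` uses made-up counts of 10 carriers among 100 cases and 5 carriers among 100 controls. With very rare variants the cell counts are small, so an exact method is usually preferable to this approximation.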

    CNV analysis in Chinese children of mental retardation highlights a sex differentiation in parental contribution to de novo and inherited mutational burdens

    Rare copy number variations (CNVs) are a known genetic etiology in neurodevelopmental disorders (NDD). Comprehensive CNV analysis was performed in 287 Chinese children with mental retardation and/or developmental delay (MR/DD) and their unaffected parents. When compared with 5,866 ancestry-matched controls, 11-12% more MR/DD children carried rare and large CNVs. The increased CNV burden in MR/DD was predominantly due to de novo CNVs, the majority of which (62%) arose in the paternal germline. We observed a 2- to 3-fold increase of large CNV burden in the mothers of affected children. By implementing an evidence-based review approach, pathogenic structural variants were identified in 14.3% of patients and 2.4% of parents, respectively. Pathogenic CNVs in parents were all carried by mothers. The maternal transmission bias of deleterious CNVs was further replicated in a published dataset. Our study confirms the pathogenic role of rare CNVs in MR/DD, and provides additional evidence to evaluate the dosage sensitivity of some candidate genes. It also supports a population model of MR/DD in which spontaneous mutations in the male germline are the major contributor to the de novo mutational burden in offspring, with higher penetrance in males than in females; unaffected carriers of causative mutations, mostly females, then contribute to the inherited mutational burden.

    Copy number variations in the gene space of Picea glauca

    Copy number variations (CNVs) are large genetic variations detected among the individuals of every multicellular organism examined so far. These variations have a considerable impact on gene structure and function and have been shown to be involved in the control of several phenotypic traits. In plants, the key genetic features of CNVs are still poorly understood and even less is known about CNVs in trees. The goals of this thesis were to i) develop an approach for the identification of CNVs in the gene space of the conifer tree Picea glauca, ii) estimate the rate of CNV generation genome-wide and iii) examine the transmission patterns of CNVs from one generation to the next. We used SNP-array raw intensity genotyping data for 3663 individuals belonging to 55 full-sib families to scan more than 14 000 genes for CNVs. Our findings show that CNVs affect a small proportion of the gene space and that copy number variants detected in the progeny were either inherited or generated through de novo events. Our analyses show that copy number (CN) mutation rate estimates spanned at least three orders of magnitude, could reach high levels and varied for different genes, alleles and CNV classes. 
CN mutation rate was also correlated with gene expression levels and the relationship between mutation rate and gene expression was best explained within the frame of the drift-barrier hypothesis (DBH). With regard to CNV inheritance, our results show that most CNVs (70%) are transmitted from the parents in violation of Mendelian expectations. The majority of transmission distortions favored the one-copy allele and contributed to the rapid restoration of the two-copy genotype in the next generation. The observed distortion levels varied considerably and were influenced by parental, partner genotype and genetic background effects. We also identified instances where the loss of a gene copy was favored and subject to different types of selection pressures. This study shows that de novo mutations and transmission distortions of CNVs both shape the standing genetic variation and play an important role in species adaptation and evolution.
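
A transmission "in violation of Mendelian expectations" means that a heterozygous parent passes on one allele to noticeably more or fewer than half of the offspring. A minimal sketch of how such a distortion could be tested in one family is given below. It uses an exact two-sided binomial test against a 1:1 expectation, which is an assumption chosen for illustration and is not taken from the thesis; the function name and example counts are hypothetical.

```python
from math import comb


def transmission_distortion_p(transmitted, offspring, expected=0.5):
    """Exact two-sided binomial p-value for allele transmission.

    transmitted: offspring that received the allele of interest
    offspring:   informative offspring of a heterozygous parent
    expected:    Mendelian transmission probability (0.5 for 1:1)
    """
    if not 0 <= transmitted <= offspring:
        raise ValueError("transmitted must lie between 0 and offspring")

    # Probability of every possible outcome under the Mendelian expectation.
    probs = [
        comb(offspring, k) * expected**k * (1 - expected) ** (offspring - k)
        for k in range(offspring + 1)
    ]
    observed = probs[transmitted]
    # Sum all outcomes that are at most as likely as the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)
```

For instance, `transmission_distortion_p(0, 10)` asks how surprising it would be if none of 10 informative offspring inherited the allele. When many genes and families are tested, the resulting p-values would still need a multiple-testing correction.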

    A genome-wide study of de novo deletions identifies a candidate locus for non-syndromic isolated cleft lip/palate risk

    Background: Copy number variants (CNVs) may play an important part in the development of common birth defects such as oral clefts, and individual patients with multiple birth defects (including clefts) have been shown to carry small and large chromosomal deletions. In this paper we investigate de novo deletions defined as DNA segments missing in an oral cleft proband but present in both unaffected parents. We compare de novo deletion frequencies in children of European ancestry with an isolated, non-syndromic oral cleft to frequencies in children of European ancestry from randomly sampled trios. Results: We identified a genome-wide significant 62 kilobase (kb) non-coding region on chromosome 7p14.1 where de novo deletions occur more frequently among oral cleft cases than controls. We also observed wider de novo deletions among cleft lip and palate (CLP) cases than seen among cleft palate (CP) and cleft lip (CL) cases. Conclusions: This study presents a region where de novo deletions appear to be involved in the etiology of oral clefts, although the underlying biological mechanisms are still unknown. Larger de novo deletions are more likely to interfere with normal craniofacial development and may result in more severe clefts. Study protocol and sample DNA source can severely affect estimates of de novo deletion frequencies. Follow-up studies are needed to further validate these findings and to potentially identify additional structural variants underlying oral clefts. © 2014 Younkin et al.; licensee BioMed Central Ltd
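
The definition used in this abstract, a segment missing in the proband but present in both parents, can be expressed as an interval subtraction. The sketch below assumes that deletion calls are already available as half-open `(start, end)` coordinates on a single chromosome; the function names and coordinates are hypothetical, and a real analysis would also have to account for uncertainty in the calls themselves.

```python
def subtract_intervals(interval, blockers):
    """Return the parts of `interval` not covered by any interval in `blockers`."""
    start, end = interval
    pieces = []
    cursor = start
    for b_start, b_end in sorted(blockers):
        if b_end <= cursor or b_start >= end:
            continue  # this blocker does not overlap the remaining span
        if b_start > cursor:
            pieces.append((cursor, b_start))  # uncovered gap before the blocker
        cursor = max(cursor, b_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def de_novo_deletions(proband, mother, father):
    """Segments deleted in the proband but in neither parent.

    Each argument is a list of half-open (start, end) deletion calls
    on the same chromosome.
    """
    parental = list(mother) + list(father)
    result = []
    for deletion in sorted(proband):
        result.extend(subtract_intervals(deletion, parental))
    return result
```

With a proband carrying deletions at `(100, 200)` and `(300, 400)`, a mother carrying `(150, 180)` and a father carrying `(300, 400)`, only the parts of the first deletion that the mother does not share are reported as de novo.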

    Strategies for Genome-Wide Association Analyses of Raw Copy Number Variation Data

    Copy number variations (CNVs), as one type of genetic variation in which a large sequence of nucleotides is repeated in tandem multiple times to a variable extent among different individuals of one population, have gained much attention with regard to human phenotypic diversity. Recent efforts to map human structural variation have shown that CNVs affect a significantly larger proportion of the human genome than single nucleotide polymorphisms (SNPs). This gave rise to the idea that CNVs play an important role in explaining part of the large proportion of phenotypic variance in a population that is due to genetic factors and that could not yet be explained by common SNPs. Current data from SNP genotyping arrays were found to be useful not only for the genome-wide genotyping of SNPs, but also for the detection of CNVs. However, because CNV detection is still mostly inaccurate and methods for association testing are scarce, designing a genome-wide CNV association study can be a challenge. This thesis explored four strategies for the genome-wide association analysis of raw CNV data derived from the Affymetrix Genome-Wide Human SNP Array 6.0. Initially, the two most commonly used strategic approaches are presented and applied to real data examples for the phenotypes early-onset extreme obesity and childhood attention-deficit/hyperactivity disorder (ADHD). On the one hand, raw intensity values reflecting individual copy numbers are directly tested for an association with the risk of disease, without providing or making use of any information about CNV genotypes. On the other hand, genome-wide CNV analyses are performed as a two-step procedure, first calling individual CNV genotypes and then using these to test for CNV-phenotype associations. 
Secondly, two extensions of the standard strategies are introduced, each forming a strategy of its own that is intended to overcome the problems and weaknesses of the respective widely used strategy. One proposed strategy accounts for the fact that thousands of array-provided CNV markers are located in genomic regions without underlying copy number variability, and thus suggests testing only a pre-selected set of relevant and informative intensity values for associations in order to relax the multiple testing issue. The second proposed strategy addresses the known inaccuracy of CNV calling, especially in common CNV regions, which is often caused to some extent by the high CNV population frequency and the consequent inadequacy of estimating CNV genotypes relative to the sample's mean or median hybridization intensity values. Instead, the use of intensity reference values estimated in a Gaussian mixture model framework, called MCMR, is investigated in application to data examples for the HapMap and replicate samples as well as to the previously analysed obesity data set. The obesity sample has been analysed using all four genome-wide CNV analysis strategies, which allowed a comparison of the strategies' applicability and performance. The four strategies were observed to vary greatly in terms of computing effort and genetic results. Whereas one of the two standard strategies was successful in identifying rare CNVs at the PARK2 locus that were genome-wide statistically significantly associated with ADHD in children, neither of these two strategies detected any CNV-obesity association. In contrast, alternative MCMR reference intensity values showed improved reliability of CNV calls compared to standard calling in terms of stability, reproducibility and false positive rates. 
As a consequence, a novel common CNV for early-onset extreme obesity on chromosome 11q11 was identified by applying the proposed analysis strategies. Moreover, a common deletion at chromosome 10q11.22, which was previously reported to be associated with body mass index (BMI), was also replicated using one of the proposed strategies. The results suggest that the choice of the genome-wide CNV association analysis strategy may greatly influence genetic results. The strategic investigations presented here give an overview of aspects to consider when planning a genome-wide CNV analysis pipeline, but do not allow general recommendations towards an optimal design.
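
The point about pre-selecting informative markers "to relax the multiple testing issue" can be made concrete with a Bonferroni correction, where the per-test significance threshold is the family-wise error rate divided by the number of tests. The sketch below is illustrative only: the abstract does not say which correction the thesis applied, and the marker counts are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical numbers: testing every marker versus a pre-selected subset.
all_markers = 1_000_000
preselected_markers = 10_000

strict = bonferroni_threshold(0.05, all_markers)
relaxed = bonferroni_threshold(0.05, preselected_markers)
print(f"all markers:         p < {strict:.1e}")
print(f"pre-selected subset: p < {relaxed:.1e}")
```

Testing fewer markers raises the threshold each individual test has to pass, so a true association of moderate strength is less likely to be missed. The pre-selection must not use the phenotype, otherwise the correction is no longer valid.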

    Contribution of unexplored genomic variations to neurodevelopmental disorders

    Neurodevelopmental disorders are a group of conditions involving impairments of personal, social, academic or occupational behaviour. Autism spectrum disorder (ASD) is a neurodevelopmental disorder with a large genetic component, a large fraction of which is still unknown. In this dissertation we analyse two little-explored types of genomic variant: chromosomal mosaicism and ancestral polymorphic inversions. Chromosomal mosaic events are responsible for a small but significant proportion of patients with ASD (0.45%), with the additional detection of two loss of chromosome Y events. In addition, we developed a bioinformatic tool, MADloy, that improves on previous methods to detect loss of chromosome Y. In the study of ancestral polymorphic inversions, the inv8p23.1 and inv17q21.31 inversions were associated with autism risk. Improvements to the method for genotyping ancestral polymorphic inversions allowed the prediction of a novel inversion in the 22q11.21 region, which has been validated by fiber-FISH.