506 research outputs found
Biological Role and Disease Impact of Copy Number Variation in Complex Disease
In the human genome, DNA variants give rise to a variety of complex phenotypes. Ranging from single base mutations to copy number variations (CNVs), many of these variants are neutral in selection and disease etiology, which makes it difficult to detect true disease-causing mutations, whether common or rare. However, allele frequency comparisons in cases, controls, and families may reveal disease associations. Single nucleotide polymorphism (SNP) arrays and exome sequencing are popular assays for genome-wide variant identification. To limit bias between samples, uniform testing is crucial, including standardized platform versions and sample processing. Bases occupy single points while copy variants occupy segments. Bases are bi-allelic while copies are multi-allelic. One genome also encodes many different cell types. In this study, we investigate how CNV impacts different cell types, including heart, brain and blood cells, all of which serve as models of complex disease. Here, we describe ParseCNV, a systematic algorithm developed as part of this project to perform more accurate disease associations using CNV calls generated from SNP arrays or exome sequencing, with quality tracking of the variants contributing to each significant overlap signal. Red flags of variant quality, genomic region, and overlap profile are assessed in a continuous score and shown to correlate over 90% with independent verification methods. We compared these data with our large internal cohort of 68,000 subjects, with carefully mapped CNVs, which gave a robust rare variant frequency in unaffected populations. In these investigations, we uncovered a number of loci in which CNVs are significantly enriched in non-coding RNA (ncRNA), Online Mendelian Inheritance in Man (OMIM), and genome-wide association study (GWAS) regions, impacting complex disease.
After thoroughly evaluating variant frequencies in pediatric individuals, we compared them with the frequencies in geriatric individuals to gain insight into these variants' impact on lifespan. Longevity-associated CNVs enriched in pediatric patients were found to aggregate in alternative splicing genes. Congenital heart disease is the most common birth defect and cause of infant mortality. When comparing congenital heart disease families, with cases and controls genotyped both on SNP arrays and exome sequencing, we uncovered significant and confident loci that provide insight into the molecular basis of disease. Neurodevelopmental disease affects the quality of life and cognitive potential of many children. In the neurodevelopmental and psychiatric diseases, CACNA, GRM, CNTN, and SLIT gene families show multiple significant signals impacting a large number of developmental and psychiatric disease traits, with the potential of informing therapeutic decision-making. Through new tool development and analysis of large disease cohorts genotyped on a variety of assays, I have uncovered an important biological role and disease impact of CNV in complex disease.
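The abstract above relies on comparing CNV carrier frequencies between cases and controls. As a minimal sketch of what such a comparison can look like, the following implements a two-sided Fisher exact test on a 2x2 carrier table in plain Python. It is an illustration only, not ParseCNV's actual implementation; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    a: cases carrying the CNV      b: cases without it
    c: controls carrying the CNV   d: controls without it
    (Hypothetical layout chosen for this sketch.)
    """
    cases, controls, carriers = a + b, c + d, a + c
    denom = comb(cases + controls, carriers)

    def prob(x):
        # Hypergeometric probability that x of the carriers are cases
        return comb(cases, x) * comb(controls, carriers - x) / denom

    p_obs = prob(a)
    lowest = max(0, carriers - controls)
    highest = min(cases, carriers)
    # Sum the probabilities of every table no more likely than the observed one;
    # the small tolerance keeps exact ties despite floating-point rounding.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts: 12 carriers among 1,000 cases, 3 among 2,000 controls
p_value = fisher_exact_two_sided(12, 988, 3, 1997)
```

An exact test is a natural choice for rare variants because carrier counts are often too small for a chi-square approximation to be reliable.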
Data analysis methods for copy number discovery and interpretation
Copy number variation (CNV) is an important type of genetic variation that can give rise to a wide variety of phenotypic traits. Differences in copy number are thought to play major roles in processes that involve dosage-sensitive genes, providing beneficial, deleterious or neutral modifications to individual phenotypes. Copy number analysis has long been a standard in clinical cytogenetic laboratories. Gene deletions and duplications can often be linked with genetic syndromes such as the 7q11.23 deletion of Williams-Beuren syndrome, the 22q11 deletion of DiGeorge syndrome and the 17p11.2 duplication of Potocki-Lupski syndrome. Interestingly, copy number based genomic disorders often display reciprocal deletion/duplication syndromes, with the latter frequently exhibiting milder symptoms. Moreover, the study of chromosomal imbalances plays a key role in cancer research. The datasets used for the development of analysis methods during this project are generated as part of the cutting-edge translational project Deciphering Developmental Disorders (DDD). The DDD project is the first of its kind and will directly apply state-of-the-art technologies, in the form of ultra-high resolution microarray and next generation sequencing (NGS), to real-time genetic clinical practice. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI) and the National Health Service (NHS) involving the 24 regional genetic services across the UK and Ireland. Although the application of DNA microarrays for the detection of CNVs is well established, individual change point detection algorithms often display variable performance. The definition of an optimal set of parameters for achieving a certain level of performance is rarely straightforward, especially where data qualities vary ... [cont.]
Association mapping of genomic microdeletions and common susceptibility variants predisposing to genetic generalized epilepsies
Approximately 3% of the general population is affected by epilepsy during their lifetime,
making epilepsy one of the most common neurological diseases. Genetic generalized
epilepsies (GGE) are the most common genetic epilepsies and account for 20-30%
of all epilepsies. GGE is subdivided into genetically determined subgroups with
gradual transition, including genetic absence epilepsies (GAE), juvenile myoclonic
epilepsy (JME), and epilepsy with generalized tonic-clonic seizures (EGTCS). In spite
of a high heritability rate of 80% and a predominant genetic etiology, the genetic
factors predisposing to GGE are still mostly unknown. In the present study, we
carried out association studies to investigate whether genomic microdeletions and
common susceptibility variants increase risk for GGE.
To test the common disease/common variant hypothesis, genome-wide association
studies (GWAS) were performed in several GGE cohorts using case-control and
family-based study designs. For analysis, all patients were either pooled or stratified
according to the subgroup they belong to in order to detect common or subgroup-specific
risk factors, respectively. The GWAS comprised a case-control cohort of
1,523 European GGE patients and 2,454 German controls and a sample cohort of 566
European parent-offspring trios. Meta-GWAS analyses revealed significant
association (P < 5.0E-08) with GGE at 2p16.1 (rs35577149, meta-analysis P = 1.65E-08,
OR[C] = 0.78, 95% CI 0.71 - 0.86). Significant association with JME was
detected at 1q43 (rs12059546, meta-analysis P = 2.27E-08, OR[G] = 1.53, 95% CI
1.33 - 1.78). Suggestive evidence for association (P < 1.0E-05) was found for GGE
at 8q12.2 (rs6999304, meta-analysis P = 1.77E-06, OR[G] = 1.33, 95% CI 1.17 -
1.51) and for GAE at 2q22.3 (rs75917352, meta-analysis P = 1.41E-07, OR[T] =
0.67, 95% CI 0.58 - 0.79). The associated regions harbor high-ranking candidate
genes: CHRM3 at 1q43, VRK2 at 2p16.1, and ZEB2 at 2q22.3. Further replication
efforts are necessary to elucidate whether these positional candidate genes
contribute to the heritability of the common GGE syndromes.
Exploring the rare variant/common disease hypothesis, we investigated the impact
of six recurrent microdeletions on the genetic risk of GGE at the genomic hotspot
regions 1q21.1, 15q11.2, 15q13.3, 16p11.2, 16p13.11, and 22q11.2, which had been implicated as rare genetic risk factors in a wide range of neurodevelopmental
disorders. Recurrent microdeletions were assessed in 1,497 European GGE patients,
5,374 controls, and 566 GGE trios using high-resolution SNP microarrays.
Considering all six microdeletion hot spots together, we found a significant excess of
these microdeletions in 2,563 GGE patients versus 5,940 controls (P < 2.20E-16,
OR = 7.65, 95% CI 4.59 - 13.18). Individually, significant associations with GGE were
observed for the microdeletions at 15q11.2 (P = 1.12E-4, OR = 3.59, 95% CI 1.80
- 7.25), 15q13.3 (P = 5.48E-09) and 16p13.11 (P = 4.42E-06, OR = 17.39, 95% CI
3.86 - 159.88).
In a candidate-gene approach, we tested whether exon-disrupting/removing
microdeletions in the genes encoding NRXN1 and RBFOX1 confer susceptibility for
GGE. We found a significant association with GGE at both loci (NRXN1: P = 0.0049;
RBFOX1: P = 0.0083). However, high phenotypic variability and incomplete
penetrance, resulting in apparently imperfect segregation, indicate that partial
NRXN1 and RBFOX1 deletions represent susceptibility factors rather than highly
penetrant mutations.
The present study substantiates a role of both genomic microdeletions and common
susceptibility variants in the genetic predisposition of common GGE syndromes. We
strengthened the statistical evidence for associations of genetic variants at 1q43,
2p16.1, and 2q22.3 with GGE syndromes and identified a novel susceptibility locus
at 8q12.2. Although individually rare, the associations of all microdeletions at
15q11.2, 15q13.3, 16p13.11, NRXN1, and RBFOX1 taken together contribute
significantly to the genetic variance of GGE.
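The odds ratios and 95% confidence intervals quoted in this abstract summarise 2x2 tables of carriers and non-carriers in patients and controls. As a rough sketch, the following computes an odds ratio with an approximate Woolf (logit) interval. This is an assumption made for illustration: the abstract does not state which estimator the study used, and the function name and example counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf logit method).

    a: patients with the microdeletion   b: patients without it
    c: controls with the microdeletion   d: controls without it
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell so the
        # odds ratio and its standard error stay finite.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study
estimate, ci_low, ci_high = odds_ratio_woolf(10, 90, 5, 95)
```

For very rare deletions the logit interval can be poor, which is one reason exact or conditional methods are often preferred; wide intervals such as 3.86 - 159.88 above reflect very small carrier counts among controls.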
CNV analysis in Chinese children of mental retardation highlights a sex differentiation in parental contribution to de novo and inherited mutational burdens
Rare copy number variations (CNVs) are a known genetic etiology in neurodevelopmental disorders (NDD). Comprehensive CNV analysis was performed in 287 Chinese children with mental retardation and/or developmental delay (MR/DD) and their unaffected parents. When compared with 5,866 ancestry-matched controls, 11-12% more MR/DD children carried rare and large CNVs. The increased CNV burden in MR/DD was predominantly due to de novo CNVs, the majority of which (62%) arose in the paternal germline. We observed a 2- to 3-fold increase in large CNV burden in the mothers of affected children. By implementing an evidence-based review approach, pathogenic structural variants were identified in 14.3% of patients and 2.4% of parents. Pathogenic CNVs in parents were all carried by mothers. The maternal transmission bias of deleterious CNVs was further replicated in a published dataset. Our study confirms the pathogenic role of rare CNVs in MR/DD and provides additional evidence to evaluate the dosage sensitivity of some candidate genes. It also supports a population model of MR/DD in which spontaneous mutations in the male germline are a major contributor to the de novo mutational burden in offspring, with higher penetrance in males than in females; unaffected carriers of causative mutations, mostly females, then contribute to the inherited mutational burden.
Copy number variations in the gene space of Picea glauca
Copy number variations (CNVs) are large genetic variations detected among the individuals of every multicellular organism examined so far. These variations have a considerable impact on gene structure and function and have been shown to be involved in the control of several phenotypic traits. In plants, the key genetic features of CNVs are still poorly understood and even less is known about CNVs in trees. The goals of this thesis were to i) develop an approach for the identification of CNVs in the gene space of the conifer tree Picea glauca, ii) estimate the rate of CNV generation genome-wide and iii) examine the transmission patterns of CNVs from one generation to the next. We used SNP-array raw intensity genotyping data for 3663 individuals belonging to 55 full-sib families to scan more than 14 000 genes for CNVs. Our findings show that CNVs affect a small proportion of the gene space and copy number variants detected in the progeny were either inherited or generated through de novo events. Our analyses show that copy number (CN) mutation rate estimates spanned at least three orders of magnitude, could reach high levels and varied for different genes, alleles and CNV classes.
CN mutation rate was also correlated with gene expression levels and the relationship between mutation rate and gene expression was best explained within the frame of the drift-barrier hypothesis (DBH). With regard to CNV inheritance, our results show that most CNVs (70%) are transmitted from the parents in violation of Mendelian expectations. The majority of transmission distortions favored the one-copy allele and contributed to the rapid restoration of the two-copy genotype in the next generation. The observed distortion levels varied considerably and were influenced by parental, partner genotype and genetic background effects. We also identified instances where the loss of a gene copy was favored and subject to different types of selection pressures. This study shows that de novo mutations and transmission distortions of CNVs both contribute to the shaping of the standing genetic variation and play an important role in species adaptation and evolution.
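Transmission "in violation of Mendelian expectations" can be checked by asking whether an allele is passed on to offspring more or less often than the expected one half. A minimal sketch, assuming a parent heterozygous for a copy number allele and a simple two-sided exact binomial test, is given below. The thesis may well have used a different test; the function name and counts are hypothetical.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Two-sided exact binomial test.

    k: offspring that inherited the allele of interest
    n: informative offspring in the family
    p: expected transmission probability (0.5 under Mendelian segregation)
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    p_obs = probs[k]
    # Add up all outcomes that are no more likely than the observed count;
    # the tolerance protects exact ties from floating-point rounding.
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-9)))


# Hypothetical family: 48 of 60 offspring inherited the one-copy allele
p_distortion = binomial_two_sided(48, 60)
```

With thousands of genes tested across many families, the resulting p-values would still need a multiple-testing correction before any distortion is called significant.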
A genome-wide study of de novo deletions identifies a candidate locus for non-syndromic isolated cleft lip/palate risk
Background: Copy number variants (CNVs) may play an important part in the development of common birth defects such as oral clefts, and individual patients with multiple birth defects (including clefts) have been shown to carry small and large chromosomal deletions. In this paper we investigate de novo deletions defined as DNA segments missing in an oral cleft proband but present in both unaffected parents. We compare de novo deletion frequencies in children of European ancestry with an isolated, non-syndromic oral cleft to frequencies in children of European ancestry from randomly sampled trios. Results: We identified a genome-wide significant 62 kilobase (kb) non-coding region on chromosome 7p14.1 where de novo deletions occur more frequently among oral cleft cases than controls. We also observed wider de novo deletions among cleft lip and palate (CLP) cases than seen among cleft palate (CP) and cleft lip (CL) cases. Conclusions: This study presents a region where de novo deletions appear to be involved in the etiology of oral clefts, although the underlying biological mechanisms are still unknown. Larger de novo deletions are more likely to interfere with normal craniofacial development and may result in more severe clefts. Study protocol and sample DNA source can severely affect estimates of de novo deletion frequencies. Follow-up studies are needed to further validate these findings and to potentially identify additional structural variants underlying oral clefts. © 2014 Younkin et al.; licensee BioMed Central Ltd
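The definition used above, a segment missing in the proband but present in both parents, can be illustrated with a simple interval filter over deletion calls from a trio. The sketch below is a deliberately crude illustration and not the authors' pipeline: it assumes deletion calls are already available as (chromosome, start, end) tuples with half-open coordinates, and it treats any overlap with a parental deletion as evidence of inheritance. All names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; assumed input format
Deletion = Tuple[str, int, int]


def overlaps(x: Deletion, y: Deletion) -> bool:
    """True if two deletions share at least one base on the same chromosome."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def de_novo_deletions(
    child: List[Deletion], father: List[Deletion], mother: List[Deletion]
) -> List[Deletion]:
    """Keep the child's deletions that overlap no deletion in either parent."""
    parental = father + mother
    return [d for d in child if not any(overlaps(d, p) for p in parental)]


# Hypothetical trio
child_calls = [("7", 100, 200), ("1", 50, 80)]
father_calls = [("7", 150, 250)]
mother_calls: List[Deletion] = []
candidates = de_novo_deletions(child_calls, father_calls, mother_calls)
```

A real analysis would also have to account for missed parental calls, since a false negative in a parent makes an inherited deletion look de novo; this is one way study protocol and DNA source can distort frequency estimates.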
Strategies for Genome-Wide Association Analyses of Raw Copy Number Variation Data
Copy number variations (CNVs), a type of genetic variation in which a large sequence of nucleotides is repeated in tandem multiple times to a variable extent among different individuals of one population, have gained much attention with regard to human phenotypic diversity. Recent efforts to map human structural variation have shown that CNVs affect a significantly larger proportion of the human genome than single nucleotide polymorphisms (SNPs). This gave rise to the idea that CNVs play an important role in explaining some of the large proportion of the phenotypic variance in a population that is due to genetic factors and that could not yet be explained by common SNPs. Current data from SNP genotyping arrays were found to be useful not only for the genome-wide genotyping of SNPs, but also for the detection of CNVs. However, because the accuracy of CNV detection is still mostly inadequate and methods for association testing are scarce, designing a genome-wide CNV association study can be a challenge.
This thesis explored four strategies for the genome-wide association analysis of raw CNV data derived from the Affymetrix Genome-Wide Human SNP Array 6.0. First, the two most commonly used strategic approaches are presented and applied to real data examples for the phenotypes early-onset extreme obesity and childhood attention-deficit/hyperactivity disorder (ADHD). In the first approach, raw intensity values reflecting individual copy numbers are tested directly for an association with the risk of disease, without providing or making use of any information about CNV genotypes. In the second, genome-wide CNV analyses are performed as a two-step procedure: individual CNV genotypes are called first and then used to test for CNV-phenotype associations. Second, two extensions of the standard strategies are introduced, each forming a strategy of its own and aiming to overcome the problems and weaknesses of the respective widely used strategy. One proposed strategy accounts for the fact that thousands of array-provided CNV markers are located in genomic regions without underlying copy number variability, and thus suggests testing only a pre-selected set of relevant and informative intensity values for associations in order to relax the multiple testing issue. The second proposed strategy addresses the known inaccuracy of CNV calling, especially in common CNV regions, which is often caused to some extent by the high CNV population frequency and the consequent inadequacy of estimating CNV genotypes relative to the sample's mean or median hybridization intensity values. Instead, the use of intensity reference values estimated in a Gaussian mixture model framework, called MCMR, is investigated in application to data examples for the HapMap and replicate samples as well as to the previously analysed obesity data set.
The obesity sample was analysed using all four genome-wide CNV analysis strategies, which allowed a comparison of the strategies' applicability and performance.
The four strategies were observed to vary greatly in terms of computing effort and genetic results. Whereas one of the two standard strategies was successful in identifying rare CNVs at the PARK2 locus that were genome-wide statistically significantly associated with ADHD in children, neither of these two strategies detected any CNV-obesity association. In contrast, the alternative MCMR reference intensity values showed improved reliability of CNV calls compared to standard calling in terms of stability, reproducibility and false positive rates. As a consequence, a novel common CNV for early-onset extreme obesity on chromosome 11q11 was identified by applying the proposed analysis strategies. Moreover, a common deletion at chromosome 10q11.22, which was previously reported to be associated with body mass index (BMI), was also replicated using one of the proposed strategies.
The results suggest that the choice of genome-wide CNV association analysis strategy may greatly influence genetic results. The strategic investigations presented here give an overview of aspects to consider when planning a genome-wide CNV analysis pipeline, but do not allow general recommendations towards an optimal design.
Contribution of unexplored genomic variations to neurodevelopmental disorders
Neurodevelopmental disorders are a group of conditions with impairments of personal, social, academic or occupational behaviour. Autism spectrum disorder (ASD) is a neurodevelopmental disorder with a high genetic component, a large fraction of which is still unknown. In this dissertation we analyse two unexplored genomic variants: chromosomal mosaicism and ancestral polymorphic inversions. Chromosomal mosaic events are responsible for a small but significant proportion of patients with ASD (0.45%), with the additional detection of two loss of chromosome Y events. In addition, we developed a bioinformatic tool, MADloy, that improves on previous methods to detect loss of chromosome Y. In the study of ancestral polymorphic inversions, the inv8p23.1 and inv17q21.31 inversions were associated with autism risk. Improvements to the method for genotyping ancestral polymorphic inversions allowed the prediction of a novel inversion in the 22q11.21 region, which has been validated by fiber-FISH.
Haplotype Phasing and Inheritance of Copy Number Variants in Nuclear Families
DNA copy number variants (CNVs) that alter the copy number of a particular DNA segment in the genome play an important role in human phenotypic variability and disease susceptibility. A number of CNVs overlapping with genes have been shown to confer risk to a variety of human diseases, highlighting the relevance of addressing the variability of CNVs at a higher resolution. So far, it has not been possible to deterministically infer the allelic composition of different haplotypes present within the CNV regions. We have developed a novel computational method, called PiCNV, which resolves the haplotype sequence composition within CNV regions in nuclear families based on SNP genotyping microarray data. The algorithm makes it possible to i) phase normal and CNV-carrying haplotypes in the copy number variable regions, ii) resolve the allelic copies of rearranged DNA sequence within the haplotypes and iii) infer the heritability of identified haplotypes in trios or larger nuclear families. To our knowledge this is the first program available that can deterministically phase null, mono-, di-, tri- and tetraploid genotypes in CNV loci. We applied our method to study the composition and inheritance of haplotypes in CNV regions of 30 HapMap Yoruban trios and 34 Estonian families. For 93.6% of the CNV loci, PiCNV unambiguously phased normal and CNV-carrying haplotypes and followed their transmission in the corresponding families. Furthermore, allelic composition analysis identified the co-occurrence of alternative allelic copies within 66.7% of haplotypes carrying copy number gains. We also observed less frequent transmission of CNV-carrying haplotypes from parents to children compared to normal haplotypes and identified the emergence of several de novo deletions and duplications in the offspring.