Direct chromosome-length haplotyping by single-cell sequencing
Haplotypes are fundamental to fully characterizing the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid genomes along the entire length of all chromosomes. We demonstrate this by building a complete haplotype for a HapMap individual (NA12878) at high accuracy (concordance 99.3%), without using generational information or statistical inference. Using this approach, we mapped all meiotic recombination events in a family trio with high resolution (median range ∼14 kb) and phased larger structural variants like deletions, indels, and balanced rearrangements like inversions. Lastly, the single-cell resolution of Strand-seq allowed us to observe loss of heterozygosity regions in a small number of cells, a significant advantage for studies of heterogeneous cell populations, such as cancer cells. We conclude that Strand-seq is a unique and powerful approach to completely phase individual genomes and map inheritance patterns in families, while preserving haplotype differences between single cells.
A Fine-Mapping Study of 7 Top Scoring Genes from a GWAS for Major Depressive Disorder
Major depressive disorder (MDD) is a psychiatric disorder that is characterized, among other symptoms, by persistent depressed mood, loss of interest and pleasure, and psychomotor retardation. Environmental circumstances have proven to influence the aetiology of the disease, but MDD also has an estimated 40% heritability, probably with a polygenic background. In 2009, a genome-wide association study (GWAS) was performed on the Dutch GAIN-MDD cohort. A non-synonymous coding single nucleotide polymorphism (SNP), rs2522833, in the PCLO gene became only nominally significant after post-hoc analysis with an Australian cohort which used similar ascertainment. The absence of genome-wide significance may be caused by low SNP coverage of genes. To increase SNP coverage to 100% for common variants (MAF > 0.1, r² > 0.8), we selected seven genes from the GAIN-MDD GWAS: PCLO, GZMK, ANPEP, AFAP1L1, ST3GAL6, FGF14 and PTK2B. We genotyped 349 SNPs and obtained the lowest P-value for rs2715147 in PCLO at P = 6.8E−7. After imputation to fill in missing genotypes, rs2715147 and rs2715148 showed the lowest P-value at P = 1.2E−6. When we created a haplotype of these SNPs together with the non-synonymous coding SNP rs2522833, the P-value decreased to P = 9.9E−7 but was not genome-wide significant. Although our study did not identify a more strongly associated variant, the results for PCLO suggest that the causal variant is in high LD with rs2715147, rs2715148 and rs2522833.
Chromosome-Wise Dissection of the Genome of the Extremely Big Mouse Line DU6i
The extreme high-body-weight-selected mouse line DU6i is a polygenic model for growth research, harboring many small-effect QTL. We dissected the genome of this line into 19 autosomes and the Y chromosome by the construction of a new panel of chromosome substitution strains (CSS). The DU6i chromosomes were transferred to a DBA/2 mouse genetic background by marker-assisted recurrent backcrossing. Mitochondria and the X chromosome were of DBA/2 origin in the backcross. During the construction of these novel strains, >4000 animals were generated, phenotyped, and genotyped. Using these data, we studied the genetic control of variation in body weight and weight gain at 21, 42, and 63 days. The unique data set facilitated the analysis of chromosomal interaction with sex and parent-of-origin effects. All analyzed chromosomes affected body weight and weight gain either directly or in interaction with sex or parent of origin. The effects were age specific, with some chromosomes showing opposite effects at different stages of development.
The number of SNPs used for the epistasis analysis.
Base pairs and percentage of region covered on the Sequence Capture arrays.
LD-plot of the region of interest in PCLO.
The SNPs with the lowest P-values, rs2715147 and rs2715148, are in high LD with each other and with rs2522833. This supports the hypothesis that either rs2522833 or a SNP in high LD with it is the most likely causal variant in this cohort.
Haplotypes constructed using PLINK and their respective P-values.
Linear fit for the Z-scores and correlation (√r²) between markers and rs2715147.
The linear fit with Z-scores versus r relative to rs2715147, for 77 markers in PCLO.
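The kind of analysis this caption describes can be illustrated with a minimal sketch. The listing does not say how the Z-scores or correlations were computed, so everything below is an assumption: Z-scores are taken from two-sided P-values, r is the Pearson correlation between genotypes coded as allele counts (0, 1, 2), and the fit is ordinary least squares. The helper names (`p_to_z`, `pearson_r`, `linear_fit`) and the toy genotypes are hypothetical and are not data from the study.

```python
from statistics import NormalDist


def p_to_z(p, effect_sign=1):
    """Convert a two-sided P-value to a Z-score (assumed convention)."""
    # Use the lower tail so very small P-values keep their precision.
    z = -NormalDist().inv_cdf(p / 2)
    return z if effect_sign >= 0 else -z


def pearson_r(x, y):
    """Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


if __name__ == "__main__":
    # Toy genotypes (allele counts per individual); purely illustrative.
    lead = [0, 1, 2, 1, 0, 2, 1, 1]
    markers = {
        "marker_a": ([0, 1, 2, 1, 0, 2, 1, 0], 1e-4),
        "marker_b": ([1, 1, 2, 0, 0, 2, 1, 1], 1e-3),
        "marker_c": ([2, 0, 1, 1, 2, 0, 1, 1], 1e-2),
    }
    rs = [pearson_r(lead, g) for g, _ in markers.values()]
    zs = [p_to_z(p) for _, p in markers.values()]
    slope, intercept = linear_fit(rs, zs)
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

If association at nearby markers is driven mainly by LD with the lead SNP, the Z-scores would be expected to scale roughly linearly with r, which is the pattern such a fit is meant to check.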