
    The Diploid Genome Sequence of an Individual Human

    Presented here is a genome sequence of an individual human. It was produced from ~32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
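
As a quick consistency check on the figures quoted in this abstract, the sketch below recomputes the non-SNP share of events and of variant bases. It assumes that each SNP contributes exactly one variant base and that the rounded 12.3 Mb total is the denominator; the variable names are illustrative, and segmental duplications and copy number variation regions are left out because the abstract gives no counts for them.

```python
# Variant counts quoted in the abstract.
snps = 3_213_401
non_snp_events = {
    "block_substitutions": 53_823,
    "heterozygous_indels": 292_102,
    "homozygous_indels": 559_473,
    "inversions": 90,
}
# Rounded total of variant bases from the abstract (12.3 Mb).
total_variant_bases = 12.3e6

non_snp_total = sum(non_snp_events.values())
total_events = snps + non_snp_total

# Share of events that are not SNPs.
event_share = non_snp_total / total_events
# Share of variant bases not explained by SNPs (assumes 1 base per SNP).
base_share = (total_variant_bases - snps) / total_variant_bases

print(f"total events: {total_events:,}")
print(f"non-SNP share of events: {event_share:.1%}")
print(f"non-SNP share of variant bases: {base_share:.1%}")
```

Under these assumptions the two shares land close to the 22% and 74% stated in the abstract, and the event total is consistent with "more than 4.1 million DNA variants".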

    ESTIMATING GENOME-WIDE COPY NUMBER USING ALLELE SPECIFIC MIXTURE MODELS

    Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to be an underlying cause of cancer. More than a decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology only provides a 10 Mb resolution, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects; thus microarray manufacturers have developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, the resolution is not nearly high enough to provide single-point copy number estimates. Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, thus greatly reducing the resolution.
    Recently, regression-type models that account for probe effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution, specifically designed for single-point estimation, that provides various advantages over the existing methodology. We use a 314-sample database, constructed with public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy numbers. With the estimated models in place, we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).
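
The idea of turning fitted conditional distributions into posterior probabilities can be illustrated with a much simpler model than the one the abstract describes. The sketch below assumes a one-dimensional Gaussian mixture over total copy number (not allele-specific copy numbers), and every prior, mean, and standard deviation in `COMPONENTS` is made up for illustration; it is not the authors' model or the oligo package's API.

```python
import math

# Hypothetical component parameters for a log2 intensity given total copy
# number. None of these values come from the paper; they only illustrate
# the mechanics of a mixture-model call.
COMPONENTS = {
    0: {"prior": 0.01, "mean": -2.0, "sd": 0.5},
    1: {"prior": 0.04, "mean": -0.6, "sd": 0.3},
    2: {"prior": 0.90, "mean": 0.0, "sd": 0.2},
    3: {"prior": 0.05, "mean": 0.4, "sd": 0.2},
}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior(x, components=COMPONENTS):
    """Posterior probability of each copy number given intensity x (Bayes' rule)."""
    weighted = {
        cn: c["prior"] * normal_pdf(x, c["mean"], c["sd"])
        for cn, c in components.items()
    }
    total = sum(weighted.values())
    return {cn: w / total for cn, w in weighted.items()}


def call(x):
    """Return the most probable copy number and its posterior as a confidence."""
    post = posterior(x)
    best = max(post, key=post.get)
    return best, post[best]


for intensity in (-2.0, -0.7, 0.0, 0.5):
    copy_number, confidence = call(intensity)
    print(f"x={intensity:+.1f} -> copy number {copy_number} (posterior {confidence:.3f})")
```

The maximum-posterior component plays the role of the prediction rule, and the posterior value itself serves as the per-call confidence measure.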

    On the power and the systematic biases of the detection of chromosomal inversions by paired-end genome sequencing

    One of the most widely used techniques to study structural variation at the genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is well suited to detecting a wide range of inversions, even with low-coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far more than increases in coverage depth or read length. Finally, we review the performance of three algorithms to detect inversions (SVDetect, GRIAL, and VariationHunter), identify common pitfalls, and reveal important differences in their breakpoint precision. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects.

    Rare copy number variation in cerebral palsy

    As per publisher: published online 22 May 2013. Recent studies have established the role of rare copy number variants (CNVs) in several neurological disorders, but the contribution of rare CNVs to cerebral palsy (CP) is not known. Fifty Caucasian families having children with CP were studied using two microarray designs. Potentially pathogenic, rare (<1% population frequency) CNVs were identified, and their frequency determined, by comparing the CNVs found in cases with 8329 adult controls with no known neurological disorders. Ten of the 50 cases (20%) had rare CNVs of potential relevance to CP; there were a total of 14 CNVs, which were observed in <0.1% (<8/8329) of the control population. Eight were inherited from an unaffected mother: a 751-kb deletion including FSCB, a 1.5-Mb duplication of 7q21.13, a 534-kb duplication of 15q11.2, a 446-kb duplication including CTNND2, a 219-kb duplication including MCPH1, a 169-kb duplication of 22q13.33, a 64-kb duplication of MC2R, and a 135-bp exonic deletion of SLC06A1. Three were inherited from an unaffected father: a 386-kb deletion of 12p12.2-p12.1, a 234-kb duplication of 10q26.13, and a 4-kb exonic deletion of COPS3. The inheritance was unknown for three CNVs: a 157-bp exonic deletion of ACOX1, a 693-kb duplication of 17q25.3, and a 265-kb duplication of DAAM1. This is the first systematic study of CNVs in CP, and although it did not identify de novo mutations, it has shown inherited, rare CNVs involving potentially pathogenic genes and pathways requiring further investigation.
    Gai McMichael, Santhosh Girirajan, Andres Moreno-De-Luca, Jozef Gecz, Chloe Shard, Lam Son Nguyen, Jillian Nicholl, Catherine Gibson, Eric Haan, Evan Eichler, Christa Lese Martin and Alastair MacLenna

    KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses

    High-coverage whole-genome sequencing data of a single ethnicity can provide a useful catalogue of population-specific genetic variations, and a critical resource for more accurately identifying pathogenic genetic variants. We report a comprehensive analysis of the Korean population, and present the Korean National Standard Reference Variome (KoVariome). As a part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole genome sequence data from 50 healthy Korean individuals in order to characterize the benign ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding-SNVs not originally removed from the 1,000 Genomes Project when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variation.

    "GenotypeColour™": colour visualisation of SNPs and CNVs

    BACKGROUND: The volume of data available on genetic variations has increased considerably with the recent development of high-density, single-nucleotide polymorphism (SNP) arrays. Several software programs have been developed to assist researchers in the analysis of this huge amount of data, but few can rely upon a whole genome variability visualisation system that could help data interpretation. RESULTS: We have developed GenotypeColour™ as a rapid user-friendly tool able to upload, visualise and compare the huge amounts of data produced by Affymetrix Human Mapping GeneChips without losing the overall view of the data. Some features of GenotypeColour™ include visualising the entire genome variability in a single screenshot for one or more samples, the simultaneous display of the genotype and Copy Number state for thousands of SNPs, and the comparison of large amounts of samples by producing "consensus" images displaying regions of complete or partial identity. The software is also useful for genotype analysis of trios and to show regions of potential uniparental disomy (UPD). All information can then be exported in a tabular format for analysis with dedicated software. At present, the software can handle data from 10 K, 100 K, 250 K, 5.0 and 6.0 Affymetrix chips. CONCLUSION: We have created a software that offers a new way of displaying and comparing SNP and CNV genomic data. The software is available free at http://www.med.unibs.it/~barlati/GenotypeColour and is especially useful for the analysis of multiple samples.

    Correction: Exome Sequencing in an Admixed Isolated Population Indicates NFXL1 Variants Confer a Risk for Specific Language Impairment

    Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^-4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model.
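
The abstract reports the p-value together with the number of variants tested, which invites a multiple-testing check. The sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05; neither the correction method nor the alpha level is stated in the abstract, so both are assumptions made for illustration.

```python
# Values quoted in the abstract.
p_value = 2.04e-4  # reported p-value for rs144169475
n_tests = 8        # "8 variants tested"

# Assumed, not stated in the abstract: Bonferroni correction at alpha = 0.05.
alpha = 0.05

bonferroni_threshold = alpha / n_tests       # per-test significance threshold
adjusted_p = min(1.0, p_value * n_tests)     # Bonferroni-adjusted p-value

print(f"per-test threshold: {bonferroni_threshold}")
print(f"adjusted p-value:   {adjusted_p:.6f}")
print(f"significant after correction: {p_value < bonferroni_threshold}")
```

Under these assumptions the reported p-value stays below the per-test threshold of 0.00625, so the association would survive correction for the eight variants.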

    An enhanced method for targeted next generation sequencing copy number variant detection using ExomeDepth [version 1; peer review: 1 approved, 1 approved with reservations]

    Copy number variants (CNVs) are a major cause of disease, with over 30,000 reported in the DECIPHER database. To use read depth data from targeted Next Generation Sequencing (NGS) panels to identify CNVs with the highest degree of sensitivity, it is necessary to account for biases inherent in the data. GC content and ambiguous mapping due to repetitive sequence elements and pseudogenes are the principal components of technical variability. In addition, the algorithms used favour the detection of multi-exon CNVs, and rely on suitably matched normal dosage samples for comparison. We developed a calling strategy that subdivides target intervals, and uses pools of historical control samples to overcome these limitations in a clinical diagnostic laboratory. We compared our enhanced strategy with an unmodified pipeline using the R software package ExomeDepth, using a cohort of 109 heterozygous CNVs (91 deletions, 18 duplications in 26 genes), including 25 single exon CNVs. The unmodified pipeline detected 104/109 CNVs, giving a sensitivity of 89.62% to 98.49% at the 95% confidence interval. The detection of all 109 CNVs by our enhanced method demonstrates with 95% confidence that the sensitivity is ≥96.67%, allowing NGS read depth analysis to be used for CNV detection in a clinical diagnostic setting.
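
The abstract does not say how the sensitivity intervals were computed. The sketch below assumes an exact (Clopper-Pearson) two-sided 95% binomial interval, which appears consistent with the quoted bounds; the function names are illustrative, and the bounds are found by simple bisection on the binomial CDF rather than with a statistics library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=100):
    """Find p in [0, 1] with f(p) == target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # p is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes out of n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2, i.e. CDF(k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


for detected in (104, 109):
    low, high = clopper_pearson(detected, 109)
    print(f"{detected}/109 detected: {low:.2%} to {high:.2%}")
```

When every CNV is detected the lower bound reduces to the closed form `(alpha / 2) ** (1 / n)`, which for n = 109 is about 0.9667 and matches the ≥96.67% figure in the abstract.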

    Genome-Wide Association Study of Copy Number Variants Suggests LTBP1 and FGD4 Are Important for Alcohol Drinking

    Alcohol dependence (AD) is a complex disorder characterized by psychiatric and physiological dependence on alcohol. AD is reflected by regular alcohol drinking, which is highly heritable. In this study, to identify susceptibility genes associated with alcohol drinking, we performed a genome-wide association study of copy number variants (CNVs) in 2,286 Caucasian subjects with the Affymetrix SNP6.0 genotyping array. We replicated our findings in 1,627 Chinese subjects with the same genotyping array. We identified two CNVs, CNV207 (combined p-value 1.91E-03) and CNV1836 (combined p-value 3.05E-03), that were associated with alcohol drinking. CNV207 and CNV1836 are located downstream of the genes LTBP1 (870 kb) and FGD4 (400 kb), respectively. LTBP1, by interacting with TGFB1, may down-regulate enzymes directly participating in alcohol metabolism. FGD4 plays a role in clustering and trafficking the GABAA receptor and may subsequently influence alcohol drinking through activating CDC42. Our results provide suggestive evidence that the newly identified CNV regions and relevant genes may contribute to the genetic mechanism of alcohol dependence.

    An examination of the Apo-1/Fas promoter Mva I polymorphism in Japanese patients with multiple sclerosis

    BACKGROUND: The Apo-1/Fas (CD95) molecule is an apoptosis-signaling cell surface receptor belonging to the tumor necrosis factor (TNF) receptor family. Both Fas and Fas ligand (FasL) are expressed in activated mature T cells, and prolonged cell activation induces susceptibility to Fas-mediated apoptosis. The Apo-1/Fas gene is located in a chromosomal region that shows linkage in multiple sclerosis (MS) genome screens, and studies indicate that there is aberrant expression of the Apo-1/Fas molecule in MS. METHODS: The Mva I polymorphism in the Apo-1/Fas gene promoter was detected by PCR-RFLP from the DNA of 114 Japanese patients with conventional MS and 121 healthy controls. We investigated the association of the Mva I polymorphism in Japanese MS patients using a case-control association study design. RESULTS: We found no evidence that the polymorphism contributes to susceptibility to MS. Furthermore, there was no association between Apo-1/Fas gene polymorphisms and clinical course (relapsing-remitting course or secondary-progressive course). No significant association was observed between Apo-1/Fas gene polymorphisms and the age at disease onset. CONCLUSIONS: Overall, our findings suggest that Apo-1/Fas promoter gene polymorphisms are not conclusively related to susceptibility to MS or the clinical characteristics of Japanese patients with MS.