396 research outputs found

    2011 Introduction to Curt Stern Award

    Massively parallel rare disease genetics

    A report on the 'Genomic Disorders 2011 - The Genomics of Rare Diseases' meeting, Wellcome Trust Sanger Institute, Hinxton, UK, 23-26 March 2011.

    A MULTILEVEL MODEL TO ADDRESS BATCH EFFECTS IN COPY NUMBER USING SNP ARRAYS

    Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. 
The software is open source and implemented in the R package CRLMM, available at Bioconductor (http://www.bioconductor.org).
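
The shrinkage idea mentioned in this abstract can be illustrated with a minimal sketch. This is not the CRLMM implementation: it assumes a generic empirical-Bayes style weighted average in which each locus-specific variance estimate is pulled toward a shared prior variance, with weights given by degrees of freedom. The function name `shrink_variances` and all numbers are hypothetical.

```python
import numpy as np

def shrink_variances(sample_vars, df, prior_var, prior_df):
    """Shrink per-locus variance estimates toward a shared prior variance.

    Assumption (not taken from the abstract): the shrunken estimate is a
    weighted average of the prior variance and the observed variance,
    weighted by their respective degrees of freedom.
    """
    sample_vars = np.asarray(sample_vars, dtype=float)
    return (prior_df * prior_var + df * sample_vars) / (prior_df + df)

# Hypothetical per-locus variances within one batch, each based on 4 degrees
# of freedom, shrunk toward a prior variance of 2.0 that carries equal weight.
locus_vars = [1.0, 4.0]
shrunk = shrink_variances(locus_vars, df=4, prior_df=4, prior_var=2.0)
print(shrunk)  # values 1.5 and 3.0: each estimate moves halfway to the prior
```

With few samples per batch (small `df`), the prior dominates and noisy locus-specific estimates are stabilized; with many samples, the observed variance dominates.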

    Validation and extension of an empirical Bayes method for SNP calling on Affymetrix microarrays

    Extended and validated CRLMM is shown to be more accurate than the Affymetrix default programs, and datasets and methods for validation are presented that can serve as standard benchmarks by which future SNP chip calling algorithms can be measured

    Pathogenic alleles in microtubule, secretory granule and extracellular matrix-related genes in familial keratoconus.

    Keratoconus is a common corneal defect with a complex genetic basis. By whole exome sequencing of affected members from 11 multiplex families of European ancestry, we identified 23 rare, heterozygous, potentially pathogenic variants in 8 genes. These include nonsynonymous single amino acid substitutions in HSPG2, EML6 and CENPF in two families each, and in NBEAL2, LRP1B, PIK3CG and MRGPRD in three families each; ITGAX had nonsynonymous single amino acid substitutions in two families and an indel with a base substitution producing a nonsense allele in the third family. Only HSPG2, EML6 and CENPF have been associated with ocular phenotypes previously. With the exception of MRGPRD and ITGAX, we detected the transcript and encoded protein of the remaining genes in the cornea and corneal cell cultures. In cultured stromal cells, NBEAL2 showed cytoplasmic punctate staining, EML6 stained the fibrillar cytoskeletal network, and CENPF localized to the basal body of primary cilia. We inhibited the expression of HSPG2, EML6, NBEAL2 and CENPF in stromal cell cultures and assayed for the expression of COL1A1 as a readout of corneal matrix production. An upregulation in COL1A1 after siRNA inhibition indicated their functional link to stromal cell biology. For ITGAX, encoding a leukocyte integrin, we assayed its level in the sera of 3 affected families compared with 10 unrelated controls and detected an increase in all affected individuals. Our study identified genes that regulate the cytoskeleton, protein trafficking and secretion, barrier tissue function and response to injury and inflammation, as being relevant to keratoconus

    The tetranucleotide repeat polymorphism D21S1245 demonstrates hypermutability in germline and somatic cells

    Six novel polymorphic short sequence repeats were identified and localized on the linkage map of human chromosome 21 by genotyping the CEPH reference pedigrees. One of these markers, the tetrameric (AAAG)n repeat D21S1245, was found to be hypermutable. In the DNAs from lymphoblastoid cell lines of members of the 40 CEPH families a total of 18 new alleles were detected. These new alleles, sometimes appearing in mosaic forms, arose equally in paternal and maternal DNAs, and could be equally larger or smaller than the alleles from which they were derived. The larger alleles of D21S1245 are more prone to be converted to new alleles. None of the new alleles with mosaicism were present in the corresponding genomic blood DNA, and therefore originated during or after the establishment of the lymphoblastoid cell lines; half of the new alleles without mosaicism were also found in genomic blood DNA of the appropriate CEPH individuals. The range of germline mutation rate observed in the 716 meioses examined was 0.56-1.4 × 10⁻²; the range of somatic mutations observed in the 405 cell lines examined was 1.96-3.46 × 10⁻². This is one of the most hypermutable microsatellite repeat polymorphisms in the human genome detected to date. D21S1245 is highly polymorphic (heterozygosity of 0.96) and maps between D21S231 and D21S19

    Estimation of the frequency of isoform–genotype discrepancies at the apolipoprotein E locus in heterozygotes for the isoforms

    Estimates of the impact of apolipoprotein E (apo E) alleles coding for the three common isoforms on plasma lipid levels assume genetic homogeneity among the genotype classes. To test this assumption, we have determined the apo E genotype at the two common polymorphic sites (amino acids 112 and 158) by DNA amplification and hybridisation with allele‐specific oligoprobes, in 195 unrelated Caucasian participants of the Rochester Family Heart Study previously classified as heterozygotes by isoelectric focusing (IEF). Fourteen discordant samples were initially detected. Repeat typing of these samples by both methods resolved nine discrepancies and analysis of additional blood samples from the remaining five individuals eliminated a further four discrepancies. The only truly discordant allele was found in a female subject who had an E3 isoform with the common E2 (Cys 112, Cys 158) genotype. Transmission of this allele from the mother was demonstrated. From these results, we estimate the frequency of discrepancies between isoforms and common genotypes to be 0.25% in this population. Allele misclassification was caused by poor amplification of the DNA in six samples and superimposition of glycosylated and nonglycosylated apo E isoforms on isoelectric focusing gels in five samples. We conclude that the assumption of genetic homogeneity among genotype classes is valid and that misclassification due to technical difficulties is more frequent than true discordancies. © 1992 Wiley‐Liss, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/101763/1/1370090403_ftp.pd

    Quality assessment and refinement of chromatin accessibility data using a sequence-based predictive model

    Chromatin accessibility assays are central to the genome-wide identification of gene regulatory elements associated with transcriptional regulation. However, the data have highly variable quality arising from several biological and technical factors. To surmount this problem, we developed a sequence-based machine learning method to evaluate and refine chromatin accessibility data. Our framework, gapped k-mer SVM quality check (gkmQC), provides the quality metrics for a sample based on the prediction accuracy of the trained models. We tested 886 DNase-seq samples from the ENCODE/Roadmap projects to demonstrate that gkmQC can effectively identify high-quality (HQ) samples with low conventional quality scores owing to marginal read depths. Peaks identified in HQ samples are more accurately aligned at functional regulatory elements, show greater enrichment of regulatory elements harboring functional variants, and explain greater heritability of phenotypes from their relevant tissues. Moreover, gkmQC can optimize the peak-calling threshold to identify additional peaks, especially for rare cell types in single-cell chromatin accessibility data