160 research outputs found

    Copy number variant detection in inbred strains from short read sequence data

    Summary: We have developed an algorithm to detect copy number variants (CNVs) in homozygous organisms, such as inbred laboratory strains of mice, from short read sequence data. Our novel approach exploits the fact that inbred mice are homozygous at virtually every position in the genome to detect CNVs using a hidden Markov model (HMM). This HMM uses both the density of sequence reads mapped to the genome, and the rate of apparent heterozygous single nucleotide polymorphisms, to determine genomic copy number. We tested our algorithm on short read sequence data generated from re-sequencing chromosome 17 of the mouse strains A/J and CAST/EiJ with the Illumina platform. In total, we identified 118 copy number variants (43 for A/J and 75 for CAST/EiJ). We investigated the performance of our algorithm through comparison to CNVs previously identified by array-comparative genomic hybridization (array CGH). We performed quantitative-PCR validation on a subset of the calls that differed from the array CGH data sets.
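
    The idea of combining read density with the rate of apparent heterozygous SNPs in an HMM can be illustrated with a minimal sketch. This is not the authors' implementation: the three states, the fixed-size windows, the Poisson emission model, and every numeric parameter below are assumptions chosen only for illustration.

```python
import math


def log_poisson(k, lam):
    """Log of the Poisson probability P(K = k) for a mean of lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


# Hypothetical states and parameters (assumptions, not values from the paper).
STATES = ("loss", "normal", "gain")
DEPTH_FACTOR = {"loss": 0.05, "normal": 1.0, "gain": 2.0}  # read depth relative to normal
HET_RATE = {"loss": 0.05, "normal": 0.1, "gain": 3.0}      # expected apparent het SNPs per window


def viterbi_copy_number(read_counts, het_counts, mean_depth=100.0, stay_prob=0.98):
    """Return the most likely state per window using the Viterbi algorithm.

    read_counts[i] is the number of reads mapped to window i and
    het_counts[i] is the number of apparent heterozygous SNP calls there.
    """
    switch_prob = (1.0 - stay_prob) / (len(STATES) - 1)
    log_stay, log_switch = math.log(stay_prob), math.log(switch_prob)

    def emission(state, reads, hets):
        # The two signals are treated as independent given the state.
        return (log_poisson(reads, mean_depth * DEPTH_FACTOR[state])
                + log_poisson(hets, HET_RATE[state]))

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over states for the first window.
    score = {s: -math.log(len(STATES)) + emission(s, read_counts[0], het_counts[0])
             for s in STATES}
    backpointers = []

    for reads, hets in zip(read_counts[1:], het_counts[1:]):
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev] + transition(best_prev, cur)
                              + emission(cur, reads, hets))
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    reads = [100, 98, 105, 210, 195, 205, 99, 4, 6, 101]
    hets = [0, 0, 0, 3, 4, 2, 0, 0, 0, 0]
    print(viterbi_copy_number(reads, hets))
```

    In this toy setup a duplicated window shows roughly doubled read depth together with a raised count of apparent heterozygous SNPs, which is the pairing of signals the abstract describes; a real caller would need to estimate its parameters from data rather than hard-code them.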

    Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes

    Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole-genome sequencing to perform an unbiased comprehensive screen to discover the somatic mutations in a sample from an individual with sAML and genotyped the loci containing these mutations in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (Ser34) in U2AF1 was recurrently present in 13 out of 150 (8.7%) subjects with de novo MDS, and we found suggestive evidence of an increased risk of progression to sAML associated with this mutation. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3' end of introns, and the alterations in U2AF1 are located in highly conserved zinc fingers of this protein. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This previously unidentified, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.

    Integrated analysis of germline and somatic variants in ovarian cancer

    We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways.

    A Common and Unstable Copy Number Variant Is Associated with Differences in Glo1 Expression and Anxiety-Like Behavior

    Glyoxalase 1 (Glo1) has been implicated in anxiety-like behavior in mice and in multiple psychiatric diseases in humans. We used mouse Affymetrix exon arrays to detect copy number variants (CNVs) among inbred mouse strains and thereby identified a ∼475 kb tandem duplication on chromosome 17 that includes Glo1 (30,174,390–30,651,226 bp; mouse genome build 36). We developed a PCR-based strategy and used it to detect this duplication in 23 of 71 inbred strains tested, and in various outbred and wild-caught mice. Presence of the duplication is associated with a cis-acting expression QTL for Glo1 (LOD>30) in BXD recombinant inbred strains. However, evidence for an eQTL for Glo1 was not obtained when we analyzed single SNPs or 3-SNP haplotypes in a panel of 27 inbred strains. We conclude that association analysis in the inbred strain panel failed to detect an eQTL because the duplication was present on multiple highly divergent haplotypes. Furthermore, we suggest that non-allelic homologous recombination has led to multiple reversions to the non-duplicated state among inbred strains. We show associations between multiple duplication-containing haplotypes, Glo1 expression and anxiety-like behavior in both inbred strain panels and outbred CD-1 mice. Our findings provide a molecular basis for differential expression of Glo1 and further implicate Glo1 in anxiety-like behavior. More broadly, these results identify problems with commonly employed tests for association in inbred strains when CNVs are present. Finally, these data provide an example of biologically significant phenotypic variability in model organisms that can be attributed to CNVs.
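
    As a quick consistency check on the interval quoted in this abstract, the two chromosome 17 coordinates can be compared with the stated size of roughly 475 kb. The sketch below assumes the coordinates are base-pair positions and that the interval is inclusive at both ends; both are assumptions, and the variable names are illustrative.

```python
# Coordinates quoted in the abstract, read here as base-pair positions (assumption).
start_bp = 30_174_390
end_bp = 30_651_226

# Inclusive interval length in base pairs, then in kilobases.
span_bp = end_bp - start_bp + 1
span_kb = span_bp / 1000

print(f"Interval length: {span_bp} bp ({span_kb:.1f} kb)")
```

    The result lies within about 2 kb of the quoted 475 kb, which fits a reading of the coordinates as base pairs.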

    Structural variation in the chicken genome identified by paired-end next-generation DNA sequencing of reduced representation libraries

    Background: Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken. Results: We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, and we also found SVs in the coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals and computational mapping of DNA sequences from a reference genome. Conclusion: We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.

    Integrated genomics of susceptibility to alkylator-induced leukemia in mice

    Background: Therapy-related acute myeloid leukemia (t-AML) is a secondary, generally incurable, malignancy attributable to chemotherapy exposure. Although there is a genetic component to t-AML susceptibility in mice, the relevant loci and the mechanism(s) by which they contribute to t-AML are largely unknown. An improved understanding of susceptibility factors and the biological processes in which they act may lead to the development of t-AML prevention strategies. Results: In this work we applied an integrated genomics strategy in inbred strains of mice to find novel factors that might contribute to susceptibility. We found that the pre-exposure transcriptional state of hematopoietic stem/progenitor cells predicts susceptibility status. More than 900 genes were differentially expressed between susceptible and resistant strains and were highly enriched in the apoptotic program, but it remained unclear which genes, if any, contribute directly to t-AML susceptibility. To address this issue, we integrated gene expression data with genetic information, including single nucleotide polymorphisms (SNPs) and DNA copy number variants (CNVs), to identify genetic networks underlying t-AML susceptibility. The 30 t-AML susceptibility networks we found are robust: they were validated in independent, previously published expression data, and different analytical methods converge on them. Further, the networks are enriched in genes involved in cell cycle and DNA repair (pathways not discovered in traditional differential expression analysis), suggesting that these processes contribute to t-AML susceptibility. Within these networks, the putative regulators (e.g., Parp2, Casp9, Polr1b) are the most likely to have a non-redundant role in the pathogenesis of t-AML. While identifying these networks, we found that current CNVR and SNP-based haplotype maps in mice represented distinct sources of genetic variation contributing to expression variation, implying that mapping studies utilizing either source alone will have reduced sensitivity. Conclusion: The identification and prioritization of genes and networks not previously implicated in t-AML generates novel hypotheses on the biology and treatment of this disease that will be the focus of future research.

    Genomic characteristics of cattle copy number variations

    Background: Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results are becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits. Results: We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high frequency (>5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms. Conclusions: We present a comprehensive genomic analysis of cattle CNVs derived from SNP data which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.

    A Comprehensive Genetic Analysis of Candidate Genes Regulating Response to Trypanosoma congolense Infection in Mice

    About one-third of cattle in sub-Saharan Africa are at risk of contracting “Nagana”, a disease caused by Trypanosoma parasites similar to those that cause human “Sleeping Sickness.” Laboratory mice can also be infected by trypanosomes, and different mouse breeds show varying levels of susceptibility to infection, similar to what is seen between different breeds of cattle. Survival time after infection is controlled by the underlying genetics of the mouse breed, and previous studies have localised three genomic regions that regulate this trait. These three “Quantitative Trait Loci” (QTL), which have been called Tir1, Tir2 and Tir3 (for Trypanosoma Infection Response 1–3), are well defined, but nevertheless still contain over one thousand genes, any number of which may be influencing survival. This study aimed to identify the specific differences associated with genes that control mouse survival after T. congolense infection. We applied a series of analyses to existing datasets and combined them with novel sequencing and other genetic data to create short lists of genes that share polymorphisms across susceptible mouse breeds, including two promising “candidate genes”: Pram1 at Tir1 and Cd244 at Tir3. These genes can now be tested to confirm their effect on response to trypanosome infection.