Acorn: An R package for de novo variant analysis
BACKGROUND: The study of de novo variation is important for assessing biological characteristics of new variation and for studies related to human phenotypes. Software programs exist to call de novo variants and programs also exist to test the burden of these variants in genomic regions; however, I am unaware of a program that fits in between these two aspects of de novo variant assessment. This intermediate space is important for assessing the quality of de novo variants and to understand the characteristics of the callsets. For this reason, I developed an R package called acorn.
RESULTS: Acorn is an R package that examines various features of de novo variants including subsetting the data by individual(s), variant type, or genomic region; calculating features including variant change counts, variant lengths, and presence/absence at CpG sites; and characteristics of parental age in relation to de novo variant counts.
CONCLUSIONS: Acorn is an R package that fills a critical gap in assessing de novo variants and will be of benefit to many investigators studying de novo variation.
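The kinds of per-variant features listed in the abstract above (variant change counts, variant lengths, and presence or absence at CpG sites) can be illustrated with a minimal Python sketch. This is not Acorn's code or API: Acorn is an R package, and the function names, the tuple layout, and the three-base reference context used here are assumptions made only for illustration.

```python
from collections import Counter

def variant_length(ref: str, alt: str) -> int:
    """Signed length change: 0 for SNVs, >0 for insertions, <0 for deletions."""
    return len(alt) - len(ref)

def is_cpg_site(context: str) -> bool:
    """True if the middle base of a 3-base reference window lies in a CpG dinucleotide.

    `context` is a hypothetical input: the reference base at the variant
    position flanked by one base on each side (e.g. "ACG").
    """
    upstream, base, downstream = context.upper()
    return (base == "C" and downstream == "G") or (base == "G" and upstream == "C")

def change_counts(variants):
    """Count ref>alt changes for single-nucleotide variants only."""
    return Counter(
        f"{ref}>{alt}"
        for ref, alt, _context in variants
        if len(ref) == 1 and len(alt) == 1
    )

# Hypothetical toy callset: (ref allele, alt allele, 3-base reference context).
variants = [
    ("C", "T", "ACG"),
    ("G", "A", "CGT"),
    ("A", "G", "TAC"),
    ("AT", "A", "CAT"),
    ("C", "T", "TCA"),
]

counts = change_counts(variants)
lengths = [variant_length(ref, alt) for ref, alt, _ in variants]
cpg_flags = [is_cpg_site(ctx) for ref, alt, ctx in variants if len(ref) == 1 and len(alt) == 1]
```

Restricting the change counts and CpG flags to single-nucleotide variants is a simplification chosen here; how a real tool treats indels at CpG sites is not stated in the abstract.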
HAT: De novo variant calling for highly accurate short-read and long-read sequencing data
MOTIVATION: De novo variants (DNVs) are variants that are present in offspring but not in their parents. DNVs are important both for examining mutation rates and for identifying disease-related variation. While efforts have been made to call DNVs, calling them from parent-child sequenced trio data remains challenging. We developed Hare And Tortoise (HAT) as an automated DNV detection workflow for highly accurate short-read and long-read sequencing data. Reliable detection of DNVs is important for human genomics and HAT addresses this need.
RESULTS: HAT is a computational workflow that begins with aligned read data (i.e. CRAM or BAM) from a parent-child sequenced trio and outputs DNVs. HAT detects high-quality DNVs from Illumina short-read whole-exome sequencing, Illumina short-read whole-genome sequencing, and highly accurate PacBio HiFi long-read whole-genome sequencing data. The quality of these DNVs is high based on a series of quality metrics including number of DNVs per individual, percent of DNVs at CpG sites, and percent of DNVs phased to the paternal chromosome of origin.
AVAILABILITY AND IMPLEMENTATION: https://github.com/TNTurnerLab/HAT
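The definition given in the abstract above (a variant present in the offspring but not in either parent) can be expressed as a naive genotype filter. The sketch below is not HAT's method, which the abstract does not describe beyond its inputs and outputs; the `TrioCall` record, its field names, and the allele encoding (0 for reference, positive integers for alternate alleles, `None` for missing) are assumptions for illustration.

```python
from typing import NamedTuple, Optional, Tuple

Genotype = Tuple[Optional[int], Optional[int]]

class TrioCall(NamedTuple):
    """Hypothetical record holding one site's genotypes for a trio."""
    chrom: str
    pos: int
    child_gt: Genotype
    father_gt: Genotype
    mother_gt: Genotype

def is_candidate_dnv(call: TrioCall) -> bool:
    """Child carries an alternate allele while both parents are homozygous reference."""
    parents_hom_ref = call.father_gt == (0, 0) and call.mother_gt == (0, 0)
    child_called = all(allele is not None for allele in call.child_gt)
    child_has_alt = child_called and any(allele > 0 for allele in call.child_gt)
    return parents_hom_ref and child_has_alt

# Hypothetical toy sites.
calls = [
    TrioCall("chr1", 1000, (0, 1), (0, 0), (0, 0)),      # candidate DNV
    TrioCall("chr1", 2000, (0, 1), (0, 1), (0, 0)),      # inherited from father
    TrioCall("chr2", 3000, (0, 0), (0, 0), (0, 0)),      # no variant in child
    TrioCall("chr2", 4000, (None, None), (0, 0), (0, 0)),  # child genotype missing
]

candidates = [(c.chrom, c.pos) for c in calls if is_candidate_dnv(c)]
```

A genotype-only filter like this typically produces many false positives in real data, which is why practical workflows add read-level and quality checks on top of it.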
The Mutationathon highlights the importance of reaching standardization in estimates of pedigree-based germline mutation rates
In the past decade, several studies have estimated the human per-generation germline mutation rate using large pedigrees. More recently, estimates for various nonhuman species have been published. However, methodological differences among studies in detecting germline mutations and estimating mutation rates make direct comparisons difficult. Here, we describe the many different steps involved in estimating pedigree-based mutation rates, including sampling, sequencing, mapping, variant calling, filtering, and appropriately accounting for false-positive and false-negative rates. For each step, we review the different methods and parameter choices that have been used in the recent literature. Additionally, we present the results from a 'Mutationathon,' a competition organized among five research labs to compare germline mutation rate estimates for a single pedigree of rhesus macaques. We report almost a twofold variation in the final estimated rate among groups using different post-alignment processing, calling, and filtering criteria, and provide details into the sources of variation across studies. Though the difference among estimates is not statistically significant, this discrepancy emphasizes the need for standardized methods in mutation rate estimations and the difficulty in comparing rates from different studies. Finally, this work aims to provide guidelines for computational and statistical benchmarks for future studies interested in identifying germline mutations from pedigrees.
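One common way to combine the quantities named in the abstract above (candidate mutations, false-positive and false-negative rates) into a per-site, per-generation rate is sketched below. The formula, the function name, and all numbers are assumptions for illustration; the abstract does not state which estimator any participating group used.

```python
def mutation_rate_per_site(
    n_candidates: int,
    callable_sites: float,
    false_discovery_rate: float = 0.0,
    false_negative_rate: float = 0.0,
) -> float:
    """Per-site, per-generation rate for one offspring (illustrative estimator).

    rate = n * (1 - FDR) / (2 * C * (1 - FNR))

    n   : candidate de novo mutations after filtering
    C   : haploid callable genome size in base pairs (the factor 2 accounts
          for the two parental haplotypes transmitted to a diploid offspring)
    FDR : fraction of candidates expected to be false positives
    FNR : fraction of true mutations expected to be missed at callable sites
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    if not 0.0 <= false_discovery_rate <= 1.0:
        raise ValueError("false_discovery_rate must be in [0, 1]")
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")
    true_mutations = n_candidates * (1.0 - false_discovery_rate)
    effective_sites = 2.0 * callable_sites * (1.0 - false_negative_rate)
    return true_mutations / effective_sites

# Hypothetical inputs, not taken from the study.
rate = mutation_rate_per_site(
    n_candidates=30,
    callable_sites=2.5e9,
    false_discovery_rate=0.10,
    false_negative_rate=0.05,
)
```

Because every term in this expression depends on filtering and callability choices, different pipelines applied to the same pedigree can produce noticeably different estimates.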
Single-cell epigenomics reveals mechanisms of human cortical development
During mammalian development, differences in chromatin state coincide with cellular differentiation and reflect changes in the gene regulatory landscape.
A gap-free genome assembly of Chlamydomonas reinhardtii and detection of translocations induced by CRISPR-mediated mutagenesis
Genomic assemblies of the unicellular green alga Chlamydomonas reinhardtii have provided important resources for researchers. However, assembly errors, large gaps, and unplaced scaffolds as well as strain-specific variants currently impede many types of analysis. By combining PacBio HiFi and Oxford Nanopore long-read technologies, we generated a de novo genome assembly for strain CC-5816, derived from crosses of strains CC-125 and CC-124. Multiple methods of evaluating genome completeness and base-pair error rate suggest that the final telomere-to-telomere assembly is highly accurate. The CC-5816 assembly enabled previously difficult analyses that include characterization of the 17 centromeres, rDNA arrays on three chromosomes, and 56 insertions of organellar DNA into the nuclear genome. Using Nanopore sequencing, we identified sites of cytosine (CpG) methylation, which are enriched at centromeres. We analyzed CRISPR-Cas9 insertional mutants in the PF23 gene. Two of the three alleles produced progeny that displayed patterns of meiotic inviability that suggested the presence of a chromosomal aberration. Mapping Nanopore reads from pf23-2 and pf23-3 onto the CC-5816 genome showed that these two strains each carry a translocation that was initiated at the PF23 gene locus on chromosome 11 and joined with chromosomes 5 or 3, respectively. The translocations were verified by demonstrating linkage between loci on the two translocated chromosomes in meiotic progeny. The three pf23 alleles display the expected short-cilia phenotype, and immunoblotting showed that pf23-2 lacks the PF23 protein. Our CC-5816 genome assembly will undoubtedly provide an important tool for the Chlamydomonas research community.
Coding and noncoding variants in EBF3 are involved in HADDS and simplex autism
BACKGROUND: Previous research in autism and other neurodevelopmental disorders (NDDs) has indicated an important contribution of protein-coding (coding) de novo variants (DNVs) within specific genes. The role of de novo noncoding variation has been observable as a general increase in genetic burden but has yet to be resolved to individual functional elements. In this study, we assessed whole-genome sequencing data in 2671 families with autism (discovery cohort of 516 families, replication cohort of 2155 families). We focused on DNVs in enhancers with characterized in vivo activity in the brain and identified an excess of DNVs in an enhancer named hs737.
RESULTS: We adapted the fitDNM statistical model to work in noncoding regions and tested enhancers for excess of DNVs in families with autism. We found only one enhancer (hs737) with nominal significance in the discovery (p = 0.0172), replication (p = 2.5 × 10
CONCLUSIONS: In this study, we identify DNVs in the hs737 enhancer in individuals with autism. Through multiple approaches, we find that hs737 targets the gene EBF3, which is genome-wide significant in NDDs. By assessing noncoding variants and the genes they affect, we are beginning to understand their impact on gene regulatory networks in NDDs.