47 research outputs found

    Acorn: An R package for de novo variant analysis

    BACKGROUND: The study of de novo variation is important for assessing the biological characteristics of new variation and for studies of human phenotypes. Software programs exist to call de novo variants, and other programs test the burden of these variants in genomic regions; however, I am not aware of a program that fits between these two stages of de novo variant assessment. This intermediate space is important for assessing the quality of de novo variants and for understanding the characteristics of the callsets. For this reason, I developed an R package called acorn. RESULTS: Acorn is an R package that examines various features of de novo variants, including subsetting the data by individual(s), variant type, or genomic region; calculating features such as variant change counts, variant lengths, and presence/absence at CpG sites; and characterizing parental age in relation to de novo variant counts. CONCLUSIONS: Acorn is an R package that fills a critical gap in assessing de novo variants and will be of benefit to many investigators studying de novo variation.
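
    To make the listed features concrete, here is a minimal Python sketch of subsetting DNVs, counting SNV change types, computing variant lengths, and flagging CpG sites. It is not acorn's code or API (acorn is an R package); the `DeNovoVariant` record, the function names, and the toy reference and variants are all hypothetical, and positions are assumed to be 1-based as in VCF.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DeNovoVariant:
    sample: str
    chrom: str
    pos: int  # assumed 1-based position of the first REF base (VCF convention)
    ref: str
    alt: str


def subset(variants, samples=None, chrom=None, start=None, end=None):
    """Keep variants from the requested individual(s) and/or genomic region."""
    kept = []
    for v in variants:
        if samples is not None and v.sample not in samples:
            continue
        if chrom is not None and v.chrom != chrom:
            continue
        if start is not None and v.pos < start:
            continue
        if end is not None and v.pos > end:
            continue
        kept.append(v)
    return kept


def is_snv(v):
    return len(v.ref) == 1 and len(v.alt) == 1


def variant_length(v):
    """0 for SNVs, positive for insertions, negative for deletions."""
    return len(v.alt) - len(v.ref)


def change_counts(variants):
    """Count SNV changes by REF>ALT label, e.g. 'C>T'."""
    return Counter(f"{v.ref}>{v.alt}" for v in variants if is_snv(v))


def at_cpg(v, reference):
    """True if an SNV's REF base is the C or the G of a CpG dinucleotide."""
    seq = reference[v.chrom]
    i = v.pos - 1  # 0-based index of the REF base
    if v.ref == "C":
        return i + 1 < len(seq) and seq[i + 1] == "G"
    if v.ref == "G":
        return i >= 1 and seq[i - 1] == "C"
    return False


# Hypothetical toy data: one 8-bp "chromosome" and four DNVs in two children.
reference = {"chr1": "ACGTTCGA"}
dnvs = [
    DeNovoVariant("s1", "chr1", 2, "C", "T"),   # C of the CpG at positions 2-3
    DeNovoVariant("s1", "chr1", 4, "T", "A"),
    DeNovoVariant("s2", "chr1", 7, "G", "A"),   # G of the CpG at positions 6-7
    DeNovoVariant("s2", "chr1", 5, "T", "TA"),  # 1-bp insertion
]
snvs = [v for v in dnvs if is_snv(v)]
print(change_counts(dnvs))
print(sum(at_cpg(v, reference) for v in snvs) / len(snvs))
print([variant_length(v) for v in subset(dnvs, samples={"s2"})])
```

    Keeping each feature in its own small function mirrors the way the abstract separates subsetting from feature calculation, so any subset can be passed straight to the counting functions.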

    HAT: De novo variant calling for highly accurate short-read and long-read sequencing data

    MOTIVATION: De novo variants (DNVs) are variants that are present in offspring but not in their parents. DNVs are important both for examining mutation rates and for identifying disease-related variation. Although efforts have been made to call DNVs, calling them from sequenced parent-child trio data remains challenging. We developed Hare And Tortoise (HAT) as an automated DNV detection workflow for highly accurate short-read and long-read sequencing data. Reliable detection of DNVs is important for human genomics, and HAT addresses this need. RESULTS: HAT is a computational workflow that begins with aligned read data (i.e., CRAM or BAM) from a sequenced parent-child trio and outputs DNVs. HAT detects high-quality DNVs from Illumina short-read whole-exome sequencing, Illumina short-read whole-genome sequencing, and highly accurate PacBio HiFi long-read whole-genome sequencing data. These DNVs are of high quality as judged by a series of quality metrics, including the number of DNVs per individual, the percent of DNVs at CpG sites, and the percent of DNVs phased to the paternal chromosome of origin. AVAILABILITY AND IMPLEMENTATION: https://github.com/TNTurnerLab/HAT
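
    The three quality metrics named above can be summarized from any DNV callset. The sketch below is a hypothetical illustration, not HAT's implementation: the `CalledDNV` record and `callset_metrics` function are invented here, and it assumes the paternal percentage is computed over phased DNVs only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CalledDNV:
    sample: str
    at_cpg: bool
    parent_of_origin: Optional[str]  # "paternal", "maternal", or None if unphased


def callset_metrics(dnvs):
    """Summarize a DNV callset with the three metrics named in the abstract."""
    n = len(dnvs)
    per_individual = dict(Counter(d.sample for d in dnvs))
    pct_cpg = 100.0 * sum(d.at_cpg for d in dnvs) / n if n else 0.0
    # Assumption: the paternal percentage is taken over phased DNVs only.
    phased = [d for d in dnvs if d.parent_of_origin is not None]
    n_paternal = sum(d.parent_of_origin == "paternal" for d in phased)
    pct_paternal = 100.0 * n_paternal / len(phased) if phased else 0.0
    return {
        "dnvs_per_individual": per_individual,
        "pct_cpg": pct_cpg,
        "pct_paternal_of_phased": pct_paternal,
    }


# Hypothetical calls for two children.
calls = [
    CalledDNV("child1", True, "paternal"),
    CalledDNV("child1", False, "paternal"),
    CalledDNV("child1", False, None),
    CalledDNV("child2", True, "maternal"),
    CalledDNV("child2", False, "paternal"),
]
metrics = callset_metrics(calls)
print(metrics)
```

    Excluding unphased DNVs from the paternal denominator keeps that percentage from being diluted by variants whose parent of origin could not be determined.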

    Single-cell epigenomics reveals mechanisms of human cortical development

    During mammalian development, differences in chromatin state coincide with cellular differentiation and reflect changes in the gene regulatory landscape.

    A gap-free genome assembly of Chlamydomonas reinhardtii and detection of translocations induced by CRISPR-mediated mutagenesis

    Genomic assemblies of the unicellular green alga Chlamydomonas reinhardtii have provided important resources for researchers. However, assembly errors, large gaps, unplaced scaffolds, and strain-specific variants currently impede many types of analysis. By combining PacBio HiFi and Oxford Nanopore long-read technologies, we generated a de novo genome assembly for strain CC-5816, derived from crosses of strains CC-125 and CC-124. Multiple methods of evaluating genome completeness and base-pair error rate suggest that the final telomere-to-telomere assembly is highly accurate. The CC-5816 assembly enabled previously difficult analyses, including characterization of the 17 centromeres, rDNA arrays on three chromosomes, and 56 insertions of organellar DNA into the nuclear genome. Using Nanopore sequencing, we identified sites of CpG cytosine methylation and found them enriched at centromeres. We analyzed CRISPR-Cas9 insertional mutants in the PF23 gene. Two of the three alleles produced progeny with patterns of meiotic inviability that suggested the presence of a chromosomal aberration. Mapping Nanopore reads from pf23-2 and pf23-3 onto the CC-5816 genome showed that each of these strains carries a translocation that originates at the PF23 locus on chromosome 11 and is joined to chromosome 5 or chromosome 3, respectively. The translocations were verified by demonstrating linkage between loci on the two translocated chromosomes in meiotic progeny. All three pf23 alleles display the expected short-cilia phenotype, and immunoblotting showed that pf23-2 lacks the PF23 protein. Our CC-5816 genome assembly will undoubtedly provide an important tool for the Chlamydomonas research community.

    Coding and noncoding variants in EBF3 are involved in HADDS and simplex autism

    BACKGROUND: Previous research in autism and other neurodevelopmental disorders (NDDs) has indicated an important contribution of protein-coding (coding) de novo variants (DNVs) within specific genes. The role of de novo noncoding variation has been observed as a general increase in genetic burden but has yet to be resolved to individual functional elements. In this study, we assessed whole-genome sequencing data from 2671 families with autism (discovery cohort of 516 families, replication cohort of 2155 families). We focused on DNVs in enhancers with characterized in vivo activity in the brain and identified an excess of DNVs in an enhancer named hs737. RESULTS: We adapted the fitDNM statistical model to work in noncoding regions and tested enhancers for an excess of DNVs in families with autism. We found only one enhancer (hs737) with nominal significance in the discovery (p = 0.0172), replication (p = 2.5 × 10[…]) […] CONCLUSIONS: In this study, we identify DNVs in the hs737 enhancer in individuals with autism. Through multiple approaches, we find that hs737 targets EBF3, a gene that is genome-wide significant in NDDs. By assessing noncoding variants and the genes they affect, we are beginning to understand their impact on gene regulatory networks in NDDs.
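
    The core question behind an enhancer burden test is whether the observed number of DNVs in an element exceeds what the background mutation rate predicts. The sketch below illustrates that idea with a plain Poisson upper-tail test; it is not fitDNM, which incorporates per-variant information that this sketch ignores. The function names, the 1-kb element, and the 1.2e-8 per-base rate are illustrative placeholders, not values from the study.

```python
import math


def expected_dnvs(n_probands, element_length_bp, mu_per_bp):
    """Expected DNV count in one element: 2 haploid genomes per proband."""
    return 2 * n_probands * element_length_bp * mu_per_bp


def _poisson_pmf(i, lam):
    # Computed in log space so large i or lam does not overflow.
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The tail is large here, so 1 - P(X <= k - 1) is accurate enough.
        return max(0.0, 1.0 - sum(_poisson_pmf(i, lam) for i in range(k)))
    # k > lam: terms shrink from i = k onward, so sum the tail directly.
    total, i, term = 0.0, k, _poisson_pmf(k, lam)
    while term > 0.0:
        total += term
        i += 1
        term *= lam / i
        if term < 1e-17 * total:
            break
    return min(1.0, total)


# Illustrative numbers only: 516 probands, a 1-kb element, a placeholder
# per-base, per-generation mutation rate, and 2 DNVs observed in the element.
lam = expected_dnvs(516, 1_000, 1.2e-8)
p = poisson_upper_tail(2, lam)
print(f"expected = {lam:.4f}, P(X >= 2) = {p:.2e}")
```

    Summing the upper tail directly when k exceeds the expectation avoids the precision loss of computing 1 minus a CDF that is already very close to 1, which matters when expected counts per element are small.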

    ACES: Analysis of Conservation with an Extensive list of Species

    MOTIVATION: An abundance of new reference genomes is becoming available through large-scale sequencing efforts. While a reference FASTA is available for each genome, there is currently no automated mechanism to query a specific sequence across all of these new reference genomes. RESULTS: We developed ACES (Analysis of Conservation with an Extensive list of Species) as a computational workflow that queries specific sequences of interest (e.g., enhancers, promoters, exons) against any reference genome with an available reference FASTA. This automated workflow generates BLAST hits against each reference genome, a multiple sequence alignment file, a graphical fragment assembly file, and a phylogenetic tree file. Researchers can then use these files in several ways to gain key insights into the conservation of the query sequence. AVAILABILITY: ACES is available at https://github.com/TNTurnerLab/ACES. SUPPLEMENTARY INFORMATION: Supplementary Figure 1 is available online in Bioinformatics.
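
    One intermediate step in a workflow like this is reducing the BLAST output for each reference genome to the hits that are carried forward. The hypothetical sketch below (not ACES's code) parses BLAST+ tabular output, as produced with `-outfmt 6` and its default twelve columns, and keeps the highest-bitscore hit per genome. The genome names and hit lines are made up, and keeping only one hit per genome is an assumption made for illustration.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass(frozen=True)
class Hit:
    genome: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_outfmt6(genome, lines):
    """Parse tab-separated BLAST hits for one reference genome."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        hits.append(Hit(genome, row["sseqid"], float(row["pident"]),
                        int(row["length"]), float(row["evalue"]),
                        float(row["bitscore"])))
    return hits


def best_hit_per_genome(results):
    """results maps genome name -> outfmt-6 lines; keep the top-bitscore hit."""
    best = {}
    for genome, lines in results.items():
        hits = parse_outfmt6(genome, lines)
        if hits:
            # Highest bitscore wins; ties go to the smaller e-value.
            best[genome] = max(hits, key=lambda h: (h.bitscore, -h.evalue))
    return best


# Hypothetical BLAST output for one query against three genomes.
results = {
    "genomeA": [
        "q1\tchr2\t98.50\t200\t3\t0\t1\t200\t5000\t5199\t1e-95\t350",
        "q1\tchr7\t85.00\t180\t27\t0\t10\t189\t100\t279\t1e-40\t170",
    ],
    "genomeB": ["q1\tscaf9\t90.10\t195\t19\t1\t1\t195\t42\t236\t2e-60\t230"],
    "genomeC": [],  # no hit in this genome
}
best = best_hit_per_genome(results)
for genome, hit in sorted(best.items()):
    print(genome, hit.sseqid, hit.pident, hit.bitscore)
```

    Genomes with no hit simply drop out of the result, so downstream alignment and tree-building steps would only see species in which the query sequence was found.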

    Genetic counseling as preventive intervention: Toward individual specification of transgenerational autism risk

    BACKGROUND: Although autism spectrum disorders (ASD) are among the most heritable of all neuropsychiatric syndromes, most affected children are born to unaffected parents. Recently, we reported an average increase of 3-5% over general population risk of ASD among offspring of adults who have first-degree relatives with ASD in a large epidemiologic family sample. A next essential step is to investigate whether there are measurable characteristics of individual parents that place them at higher or lower recurrence risk, as this information could allow more personalized genetic counseling. METHODS: We assembled what is, to our knowledge, the largest collection of data on the ability of four measurable characteristics of unaffected prospective parents to specify risk for autism among their offspring: (1) subclinical autistic trait burden, (2) parental history of a sibling with ASD, (3) transmitted autosomal molecular genetic abnormalities, and (4) parental age. Leveraging phenotypic and genetic data in curated family cohorts, we evaluate the respective associations between these factors and child outcome when autism is present in the family in the parental generation. RESULTS: All four characteristics were associated with elevated offspring risk; however, the magnitude of their predictive power (with the exception of isolated rare inherited pathogenic variants) does not yet reach a threshold that would typically be considered actionable for reproductive decision-making. CONCLUSIONS: Individual specification of risk to offspring of adults in ASD-affected families is not straightforwardly improved by ascertainment of parental phenotype, and it is not yet clear whether genomic screening of prospective parents in families affected by idiopathic ASD is warranted as a clinical standard. Systematic screening of affected family members for heritable pathogenic variants, including rare sex-linked mutations, will identify a subset of families with substantially elevated transmission risk. Polygenic risk scores are only weakly predictive at this time but are steadily improving and may ultimately enable more robust prediction, either alone or in combination with the risk variables examined in this study.