
    breakpointR: an R/Bioconductor package to localize strand state changes in Strand-seq data

    MOTIVATION: Strand-seq is a specialized single-cell DNA sequencing technique centered on the directionality of single-stranded DNA. Computational tools for Strand-seq analyses must capture the strand-specific information embedded in these data. RESULTS: Here we introduce breakpointR, an R/Bioconductor package specifically tailored to process and interpret single-cell strand-specific sequencing data obtained from Strand-seq. We developed breakpointR to detect local changes in strand directionality of aligned Strand-seq data, to enable fine-mapping of sister chromatid exchanges and germline inversions, and to support global haplotype assembly. Given the broad spectrum of Strand-seq applications, we expect breakpointR to be an important addition to currently available tools and to extend the accessibility of this novel sequencing technique. AVAILABILITY: R/Bioconductor package https://bioconductor.org/packages/breakpointR

    Dense and accurate whole-chromosome haplotyping of individual genomes

    The diploid nature of the human genome is neglected in many analyses done today, in which a genome is treated as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that produce dense and accurate chromosome-length haplotypes at reasonable cost. Here we introduce an integrative phasing strategy that combines global but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense but local haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.

    Investigation of Indazole Unbinding Pathways in CYP2E1 by Molecular Dynamics Simulations

    Human microsomal cytochrome P450 2E1 (CYP2E1) can oxidize not only low-molecular-weight xenobiotic compounds such as ethanol but also many endogenous fatty acids. The crystal structure of CYP2E1 in complex with indazole reveals that the active site is deeply buried in the protein center; consequently, the unbinding pathways and associated unbinding mechanisms remain elusive. In this study, random acceleration molecular dynamics simulations combined with steered molecular dynamics and potential of mean force calculations were performed to identify the possible unbinding pathways in CYP2E1. The results show that channels 2c and 2a are the most likely unbinding channels of CYP2E1. The former is located between helices G and I and the B-C loop, and the latter lies in the region formed by the F-G loop, the B-C loop and the β1 sheet. Phe298 and Phe478 act as gatekeepers during indazole unbinding along channels 2c and 2a, respectively. Previous site-directed mutagenesis experiments also support these findings.

    Thymic Hyperplasia with Lymphoepithelial Sialadenitis (LESA)-Like Features: Strong Association with Lymphomas and Non-Myasthenic Autoimmune Diseases.

    Thymic hyperplasia (TH) with lymphoepithelial sialadenitis (LESA)-like features (LESA-like TH) has been described as a tumor-like, benign proliferation of thymic epithelial cells and lymphoid follicles. We aimed to determine the frequency of lymphoma and autoimmunity in LESA-like TH and performed a retrospective analysis of cases with LESA-like TH and/or thymic MALT lymphoma. Among 36 patients (21 males) with LESA-like TH (age 52 years, 32-80; lesion diameter 7.0 cm, 1-14.5; median, range), five (14%) showed associated lymphomas, including four (11%) thymic MALT lymphomas and one (3%) diffuse large B-cell lymphoma. One additional case showed a clonal B-cell-receptor rearrangement without evidence of lymphoma. Twelve (33%) patients (7 women) suffered from partially overlapping autoimmune diseases: systemic lupus erythematosus (n = 4, 11%), rheumatoid arthritis (n = 3, 8%), myasthenia gravis (n = 2, 6%), asthma (n = 2, 6%), scleroderma, Sjögren syndrome, pure red cell aplasia, Graves' disease and anti-IgLON5 syndrome (each n = 1, 3%). Among 11 primary thymic MALT lymphomas, remnants of LESA-like TH were found in two cases (18%). In summary, LESA-like TH shows a striking association with autoimmunity and predisposes to lymphomas. Thus, a hematologic and rheumatologic workup should become standard in patients diagnosed with LESA-like TH. Radiologists and clinicians should be aware of LESA-like TH as a differential diagnosis for mediastinal mass lesions in patients with autoimmune diseases.

    Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

    The sequencing and assembly of human genomes using long-read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of assembling centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
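    The NG50 values quoted above use the standard contig-continuity metric. As a hedged illustration only (not the authors' pipeline), the general definition can be sketched in Python: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the expected genome size. The function name `ng50` and the toy contig lengths are hypothetical, and the abstract does not describe how the metric was restricted to the regions within 1 Mbp of the centromere.

```python
def ng50(contig_lengths, genome_size):
    """Hypothetical helper illustrating the standard NG50 definition.

    Returns the length L of the contig at which the cumulative length of
    contigs (sorted longest first) first reaches at least half of
    genome_size. Returns 0 if the assembly covers less than half of it.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    half = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    # The whole assembly is shorter than half the genome: NG50 undefined.
    return 0


if __name__ == "__main__":
    # Toy example: half of 300 is 150; 100 + 80 = 180 >= 150, so NG50 = 80.
    print(ng50([100, 80, 50, 30, 20], 300))
    # Fold change implied by the reported pericentromeric NG50 values (kbp).
    print(round(480.6 / 191.5, 1))
```

    Dividing by the expected genome size rather than by the assembly's own total length (as N50 does) keeps the metric comparable between assemblies of different total size, such as the HiFi and CLR assemblies compared here; the reported values give 480.6 / 191.5 ≈ 2.5, matching the stated fold increase.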

    Inversion polymorphism in a complete human genome assembly

    The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared the results with those obtained using the GRCh38 reference. We find a ~21% increase in sensitivity, improving the mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomal regions 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.

    Functional analysis of structural variants in single cells using Strand-seq

    Somatic structural variants (SVs) are widespread in cancer, but their impact on disease evolution is understudied due to a lack of methods to directly characterize their functional consequences. We present a computational method, scNOVA, which uses Strand-seq to perform haplotype-aware integration of SV discovery and molecular phenotyping in single cells, using nucleosome occupancy to infer gene expression as a readout. Application to leukemias and cell lines identifies local effects of copy-balanced rearrangements on gene deregulation, and consequences of SVs on aberrant signaling pathways in subclones. We discovered distinct SV subclones with dysregulated Wnt signaling in a chronic lymphocytic leukemia patient. We further uncovered the consequences of subclonal chromothripsis in T cell acute lymphoblastic leukemia, which revealed c-Myb activation and enrichment of a primitive cell state, and informed successful targeting of the subclone in cell culture using a Notch inhibitor. By directly linking SVs to their functional effects, scNOVA enables systematic single-cell multiomic studies of structural variation in heterogeneous cell populations.

    Linking Inflammation to Natural Killer T Cell Activation

    Immune activation is often associated with inflammation, but the role of inflammation in the expansion of antigen-specific immune responses remains unclear. This primer focuses on recent findings that show how specific natural killer T cells are activated by inflammatory messengers, thus illuminating the cellular and molecular links between immunity and inflammation.

    Gaps and complex structurally variant loci in phased genome assemblies

    There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misoriented per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.