40 research outputs found

    Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

    The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes

    An evolutionary driver of interspersed segmental duplications in primates

    Background The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human–ape gene families, nuclear pore interacting protein (NPIP). Results Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis. Conclusions LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution

    A High-Quality Bonobo Genome Refines The Analysis Of Hominid Evolution

    The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation(1,2). Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes(1,3-5) and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome

    A fully phased accurate assembly of an individual human genome

    The prevailing genome assembly paradigm is to produce consensus sequences that "collapse" parental haplotypes into a consensus sequence. Here, we leverage the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing (Strand-seq) and combine them with high-fidelity (HiFi) long sequencing reads, in a novel reference-free workflow for diploid de novo genome assembly. Employing this strategy, we produce completely phased de novo genome assemblies separately for each haplotype of a single individual of Puerto Rican origin (HG00733) in the absence of parental data. The assemblies are accurate (QV > 40), highly contiguous (contig N50 > 25 Mbp) with low switch error rates (0.4%) providing fully phased single-nucleotide variants (SNVs), indels, and structural variants (SVs). A comparison of Oxford Nanopore and PacBio phased assemblies identifies 150 regions that are preferential sites of contig breaks irrespective of sequencing technology or phasing algorithms

    Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads

    Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms

    Synthesis, X-ray diffraction and anti-proliferative biological activity of hispolon derivatives and their (η6-p-cymene)(Hispolonato)Ruthenium[II] chloride complexes

    Hispolon is a natural product extracted from Phellinus igniarius and Phellinus linteus fungi that has previously shown antitumor activity. In this study we present the synthesis, chemical characterization and in vitro antiproliferative activity of three [(η6-p-cymene)Ru(L)Cl] neutral complexes, where L = hispolon derivatives. The single crystal X-ray structures of the three Ru complexes all have the expected piano stool geometry with the p-cymene ligand at the apex of the piano stool and occupying three of the sites in a distorted-octahedral arrangement. Completing the octahedral coordination sphere are two β-diketone oxygen atoms of the deprotonated hispolon derivatives which are coordinated in a bidentate fashion and a chlorine atom. The cytotoxicity of the three hispolon derivatives along with their corresponding complexes was studied for A549 lung, HCT116 colon and U87 glioblastoma cell lines. In the glioblastoma cell line, increase of biological activity (lower IC50) is seen for all three hispolons after arene-ruthenium complexation: 5.1 times higher for 2,3,4-trimethoxy-hispolon (Hisp8), and 1.5 times for both 3-methoxy,4-hydroxy-hispolon (Hisp1) and 3,4-dimethoxy-hispolon (Hisp4). This effect was less pronounced in the colon HCT116 cell line with complexation having 1.2-1.4 times higher biological activity. For the lung cell line A549, there was no such effect and complex formation slightly decreased the biological activity. Regardless, in the glioblastoma cell line, the presence of Hisp8, the most liposoluble agent, either as the pure compound or as ligand in the η6-p-cymene-Ru complex, proved effective as an antitumor agent. More specifically, the [(η6-p-cymene)Ru(Hisp8)Cl] complex showed a decrease in IC50 value for all 3 cell lines, when compared to Hisp8.
We also performed molecular mechanics, DFT and docking calculations to determine the inhibitory ability of hispolon analogs towards aldehyde dehydrogenase, which targets glioblastoma stem cells. All three [(η6-p-cymene)Ru(Hisp)Cl] coordination complexes of hispolon derivatives are more effective than the related neutral hispolons in the U87 MG human glioblastoma cell line. Our computational studies clearly indicate an important role for aldehyde dehydrogenase Arg139 amino acid. In addition, computational results indicate hispolonato anions are much better inhibitors of aldehyde dehydrogenase and improve the anti-glioblastoma biological activity, consistent with IC50 data