    Genome Assembly: Novel Applications by Harnessing Emerging Sequencing Technologies and Graph Algorithms

    Genome assembly is a critical first step for biological discovery. All current sequencing technologies share the fundamental limitation that segments read from a genome are much shorter than even the smallest genomes. Traditionally, whole-genome shotgun (WGS) sequencing over-samples a single clonal (or inbred) target chromosome with segments from random positions. The amount of over-sampling is known as the coverage. Assembly software then reconstructs the target. So-called next-generation (or second-generation) sequencing has reduced the cost and increased throughput exponentially over first-generation sequencing. Unfortunately, next-generation sequences present their own challenges to genome assembly: (1) they require amplification of source DNA prior to sequencing, leading to artifacts and biased coverage of the genome; (2) they produce relatively short reads: 100bp-700bp; (3) the sizeable runtime of most second-generation instruments is prohibitive for applications requiring rapid analysis, with an Illumina HiSeq 2000 instrument requiring 11 days for the sequencing reaction. Recently, successors to the second-generation instruments (third-generation) have become available. These instruments promise to alleviate many of the downsides of second-generation sequencing and can generate multi-kilobase sequences. The long sequences have the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of these reads is challenging and has limited their use. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. Our approach achieves over 99% read accuracy and produces substantially better assemblies than current sequencing strategies. The availability of cheaper sequencing has made new sequencing targets, such as multiple displacement amplified (MDA) single-cells and metagenomes, popular. Current algorithms assume assembly of a single clonal target, an assumption that is violated in these sequencing projects. We developed Bambus 2, a new scaffolder that works for metagenomics and single cell datasets. It can accurately detect repeats without assumptions about the taxonomic composition of a dataset. It can also identify biological variations present in a sample. We have developed a novel end-to-end analysis pipeline leveraging Bambus 2. Due to its modular nature, it is applicable to clonal, metagenomic, and MDA single-cell targets and allows a user to rapidly go from sequences to assembly, annotation, genes, and taxonomic info. We have incorporated a novel viewer, allowing a user to interactively explore the variation present in a genomic project on a laptop. Together, these developments make genome assembly applicable to novel targets while utilizing emerging sequencing technologies. As genome assembly is critical for all aspects of bioinformatics, these developments will enable novel biological discovery

    Assembly algorithms for next-generation sequencing data

    The emergence of next-generation sequencing platforms led to resurgence of research in whole-genome shotgun assembly algorithms and software. DNA sequencing data from the Roche 454, Illumina/Solexa, and ABI SOLiD platforms typically present shorter read lengths, higher coverage, and different error profiles compared with Sanger sequencing data. Since 2005, several assembly software packages have been created or revised specifically for de novo assembly of next-generation sequencing data. This review summarizes and compares the published descriptions of packages named SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Euler, Velvet, ABySS, AllPaths, and SOAPdenovo. More generally, it compares the two standard methods known as the de Bruijn graph approach and the overlap/layout/consensus approach to assembly

    Stable optical oxygen sensing materials based on click-coupling of fluorinated platinum(II) and palladium(II) porphyrins—A convenient way to eliminate dye migration and leaching

    Nucleophilic substitution of the labile para-fluorine atoms of 2,3,4,5,6-pentafluorophenyl groups enables a click-based covalent linkage of an oxygen indicator (platinum(II) or palladium(II) 5,10,15,20-meso-tetrakis-(2,3,4,5,6-pentafluorophenyl)-porphyrin) to the sensor matrix. Copolymers of styrene and pentafluorostyrene are chosen as polymeric materials. Depending on the reaction conditions either soluble sensor materials or cross-linked microparticles are obtained. Additionally, we prepared Ormosil-based sensors with linked indicator, which showed very high sensitivity toward oxygen. The effect of covalent coupling on sensor characteristics, stability and photophysical properties is studied. It is demonstrated that leaching and migration of the dye are eliminated in the new materials but excellent photophysical properties of the indicators are preserved

    Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph

    Despite recent advances in the length and the accuracy of long-read data, building haplotype-resolved genome assemblies from telomere to telomere still requires considerable computational resources. In this study, we present an efficient de novo assembly algorithm that combines multiple sequencing technologies to scale up population-wide telomere-to-telomere assemblies. By utilizing twenty-two human and two plant genomes, we demonstrate that our algorithm is around an order of magnitude cheaper than existing methods, while producing better diploid and haploid assemblies. Notably, our algorithm is the only feasible solution to the haplotype-resolved assembly of polyploid genomes. Comment: 14 pages, 4 figures

    The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation.

    Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination

    Assemblathon 1: A competitive assessment of de novo short read assembly methods

    Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/

    Poor fruit set due to lack of pollinators in Aristolochia manshuriensis (Aristolochiaceae)

    Background and aims – Interactions of insects with trap flowers of Aristolochia manshuriensis, a relic woody liana with fragmented natural populations from south-eastern Russia, were studied. Pollination experiments were conducted to identify the causes of the poor fruit set in this plant. Material and methods – The study was carried out at two ex situ sites within the natural range of A. manshuriensis in the suburban zone of the city of Vladivostok (Russia). The floral morphology was examined to verify how it may affect the process of pollination in this species. To test for a probability of self-pollination, randomly selected flowers at the female phase of anthesis (day 1 of limb opening) were hand-pollinated with pollen from the same plant. The daily insect visitation was studied. The pollen limitation coefficient and the number of visitors to the flowers were determined. To identify insects that lay eggs on the flowers, the insects were reared from eggs collected from fallen flowers. Both caught and reared insects were identified. Key results – The floral morphology and the colour pattern of A. manshuriensis are adapted to temporarily trap insects of a certain size. The hand-pollination experiment showed that flowers of this plant are capable of self-pollination by geitonogamy and require a pollinator for successful pollination. The positive value (2.64) for the pollen limitation coefficient indicates a higher fruit set after hand-pollination compared to the control without pollination. The number of visitors to the flowers was low (0.17 visitors per flower per day). Insects from three orders were observed on the flowers: Diptera (up to 90.9%), Coleoptera (8.3%), and Hymenoptera (0.8%). Four species of flies (Scaptomyza pallida, Drosophila transversa (Drosophilidae), Botanophila fugax, and Botanophila sp. 1 (Anthomyiidae)) are capable of transferring up to 2500–4000 pollen grains on their bodies and can be considered as pollinators of A. manshuriensis. Data of the rearing experiment indicate that flies of the families Drosophilidae (S. pallida, D. transversa), Chloropidae (Elachiptera tuberculifera, E. sibirica, and Conioscinella divitis), and Anthomyiidae (B. fugax, B. sp. 1) use A. manshuriensis flowers to lay eggs. Beetles were also collected from the flowers, but they were probably not involved in pollination, because no pollen grains were observed on them during our study. Conclusions – Pollinators of A. manshuriensis include mainly Diptera that lay eggs on the flowers. The poor fruit set (2%) in A. manshuriensis is associated with pollen limitation due to the lack of pollinators, as the number of visitors to flowers was extremely low. This may be due to the fact that the flowers of this species are highly specialized on insects of a certain size for pollination

    Automated ensemble assembly and validation of microbial genomes

    The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs. https://doi.org/10.1186/1471-2105-15-12