131 research outputs found

    HGGA: hierarchical guided genome assembler

    Background: De novo genome assembly typically produces a set of contigs instead of the complete genome. Thus, additional data such as genetic linkage maps, optical maps, or Hi-C data are needed to resolve the complete structure of the genome. Most previous work uses the additional data to order and orient contigs. Results: Here we introduce a framework to guide genome assembly with additional data. Our approach is based on clustering the reads such that the reads in each cluster originate from nearby positions in the genome according to the additional data. These clusters are then assembled independently, and the resulting contigs are further assembled in a hierarchical manner. We implemented our approach for genetic linkage maps in a tool called HGGA. Conclusions: Our experiments on simulated and real Pacific Biosciences long reads and genetic linkage maps show that HGGA produces a more contiguous assembly with fewer contigs and with 1.2 to 9.8 times higher NGA50 or N50 than a plain assembly of the reads, and 1.03 to 6.5 times higher NGA50 or N50 than a previous approach integrating genetic linkage maps with contig assembly. Furthermore, the correctness of the assembly remains similar or improves compared to an assembly using only the read data. Peer reviewed
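    The contiguity comparison above is stated in N50 and NGA50. To recall what these measure, here is a minimal sketch of N50, which becomes NG50 when a genome size is supplied. NGA50, as computed by tools such as QUAST, applies the same NG50 calculation to the contig blocks that align to a reference after contigs are broken at misassemblies. That alignment step is not shown. The function name `n50` is a hypothetical choice.

```python
from typing import Optional, Sequence


def n50(lengths: Sequence[int], genome_size: Optional[int] = None) -> int:
    """Return the N50 of the given contig lengths, or NG50 if genome_size is given.

    Contigs are taken from longest to shortest. The result is the length of the
    contig at which the cumulative length first reaches half of the total
    assembly length (N50) or half of the genome size (NG50).
    Returns 0 if that target is never reached.
    """
    if not lengths:
        return 0
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    # Only reachable for NG50, when the assembly is shorter than half the genome.
    return 0
```

    For example, contigs of lengths 2 through 10 (total 54) have N50 = 8. The running sum 10 + 9 + 8 = 27 is the first to reach half the total.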

    Cost-Benefit Analysis in a Framework of Stakeholder Involvement and Integrated Coastal Zone Modeling

    Active involvement of local stakeholders is an increasingly important requirement in European environmental regulations such as the EU Water Framework Directive (WFD) and the EU Marine Strategy Framework Directive (MSFD). The same is true for economic analyses such as cost-benefit analysis (CBA). For example, the Swedish WFD implementation requires i) quantification of the costs and benefits of proposed measures and ii) stakeholder involvement. How can these two requirements be integrated in practice? And can such requirements facilitate the implementation of projects with a potential net benefit? This paper presents a stepwise CBA procedure with participatory elements and applies it to evaluate nutrient management options for reducing eutrophication effects in the coastal area of Himmerfjärden, southwest of Stockholm, Sweden. The CBA indicates a positive net benefit for a combination of options: increased nitrogen removal at a major sewage treatment plant, creation of new wetlands, and connection of a proportion of private sewers to sewage treatment plants. The procedure also illustrates how the interdisciplinary development of a coupled ecological-economic simulation model can be used as a tool for facilitating the involvement of stakeholders in a CBA. Keywords: cost-benefit analysis; stakeholder involvement; integrated modeling; eutrophication

    Kermit: Linkage map guided long read assembly

    Background: With long reads getting even longer and cheaper, large-scale sequencing projects can be accomplished without short reads at an affordable cost. Due to high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose locations on the genome are approximately known. Therefore, they provide long-range information that has the potential to greatly aid de novo assembly. Previously, linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps into assembly; instead, large amounts of manual labour are needed to order the contigs into chromosomes. Results: We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and fewer misassemblies when compared to de novo assembly. Conclusions: We present the first method to integrate linkage maps directly into genome assembly. With a modest increase in runtime, our method improves the contiguity and correctness of genome assembly. Peer reviewed
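    The abstract describes the cleaning step only at a high level. Below is a minimal sketch of the idea under several assumptions. Each read is assumed to have a "colour", meaning the index of the linkage-map bin it is placed in, or None if it could not be placed. An overlap edge is assumed to be dropped when its two reads lie in bins more than `max_gap` apart. The colouring scheme, the `max_gap` tolerance and the name `clean_overlap_graph` are hypothetical illustrations, not Kermit's actual implementation.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]


def clean_overlap_graph(
    edges: Iterable[Edge],
    colour: Dict[str, Optional[int]],
    max_gap: int = 1,
) -> List[Edge]:
    """Drop overlap edges that join reads placed in distant linkage-map bins.

    `colour` maps a read id to a (hypothetical) linkage-map bin index, or to
    None when the read could not be placed. Edges touching an uncoloured read
    are kept, because the map gives no evidence against them.
    """
    kept: List[Edge] = []
    for u, v in edges:
        cu, cv = colour.get(u), colour.get(v)
        if cu is not None and cv is not None and abs(cu - cv) > max_gap:
            continue  # contradicts the map: likely a repeat-induced overlap
        kept.append((u, v))
    return kept
```

    Keeping edges that touch uncoloured reads is a conservative choice. The map removes only the overlaps it can actually contradict, and reads without markers are left to the ordinary assembly logic.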

    Space-Efficient Indexing of Spaced Seeds for Accurate Overlap Computation of Raw Optical Mapping Data

    A key problem in processing raw optical mapping data (Rmaps) is finding Rmaps that originate from the same genomic region. These sets of related Rmaps can be used to correct errors in Rmap data and to find overlaps between Rmaps to assemble consensus optical maps. Previous Rmap overlap aligners are computationally very expensive and do not scale to large eukaryotic data sets. We present SELKIE, an Rmap overlap aligner based on a spaced (l,k)-mer index, which was pioneered in the Rmap error correction tool ELMER. Here we present a space-efficient version of the index that is twice as fast as prior art while using just a quarter of the memory on a human data set. Moreover, our index can be used for filtering candidates for Rmap overlap computation, whereas ELMER used the index only for error correction of Rmaps. By combining our filtering of Rmaps with the exhaustive but highly accurate algorithm of Valouev et al. (2006), SELKIE maintains or increases the accuracy of finding overlapping Rmaps on a bacterial data set while being at least four times faster. Furthermore, for finding overlaps in a human data set, SELKIE is up to two orders of magnitude faster than previous methods. Peer reviewed
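    The abstract describes using an index to filter candidate Rmap pairs before an expensive exhaustive aligner is run. The sketch below illustrates that filtering idea in a deliberately simplified form. It indexes windows of k consecutive fragment sizes after rounding each size into coarse bins. It is not the spaced (l,k)-mer index of ELMER or SELKIE, and it ignores missing or extra cut sites and reverse orientation. The function names and the `k`, `bin_size` and `min_shared` parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def quantize(fragments: Sequence[float], bin_size: float) -> List[int]:
    """Map fragment sizes to coarse bins so that small sizing errors often collide."""
    return [int(f // bin_size) for f in fragments]


def candidate_pairs(
    rmaps: Dict[str, Sequence[float]],
    k: int = 3,
    bin_size: float = 1.0,
    min_shared: int = 1,
) -> Set[Tuple[str, str]]:
    """Return Rmap pairs that share at least `min_shared` distinct quantized k-mers."""
    # Index: quantized window of k consecutive fragments -> Rmaps containing it.
    index: Dict[Tuple[int, ...], Set[str]] = defaultdict(set)
    for name, frags in rmaps.items():
        q = quantize(frags, bin_size)
        for i in range(len(q) - k + 1):
            index[tuple(q[i:i + k])].add(name)

    # Count how many distinct keys each pair of Rmaps has in common.
    shared: Dict[Tuple[str, str], int] = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}
```

    Only the surviving pairs would be passed to the exhaustive aligner. Fixed-width bins are the weak point of this simplification. Two nearly equal sizes that fall on either side of a bin boundary do not collide, which is one reason real tools use more error-tolerant keys.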

    Kermit: Guided Long Read Assembly using Coloured Overlap Graphs

    With long reads getting even longer and cheaper, large-scale sequencing projects can be accomplished without short reads at an affordable cost. Due to high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose locations on the genome are approximately known. Therefore, they provide long-range information that has the potential to greatly aid de novo assembly. Previously, linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps into assembly; instead, large amounts of manual labour are needed to order the contigs into chromosomes. We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and fewer misassemblies when compared to de novo assembly.

    Variant Genotyping with Gap Filling

    Although recent developments in DNA sequencing have allowed great leaps in both the quality and quantity of genome assembly projects, de novo assemblies still lack the efficiency and accuracy required for studying individual genomes. Thus, efficient and accurate methods for calling and genotyping structural variations are still needed. Structural variations are variations between genomes that are longer than a single nucleotide, i.e. they affect the structure of a genome as opposed to affecting only its content. There are many different types of structural variations. Once the structural variations between a donor genome and a high-quality reference genome have been found, genotyping them is the only genome assembly step still required. The hardest structural variant to genotype is the insertion, because it requires assembly; genotyping the other variants requires different transformations of the reference genome. The methods currently used for constructing insertion variants are fairly basic: they are mostly tied to variation calling methods and can only construct small insertions. A subproblem in genome assembly, the gap filling problem, provides techniques that are highly applicable to insertion genotyping, yet there are currently no tools that take full advantage of the solution space. Gap filling takes the context and length of a missing sequence in a genome assembly and attempts to assemble that sequence. This thesis shows how gap filling can be used to assemble insertion variants by modeling insertion genotyping as the problem of finding a path in a de Bruijn graph whose length approximately matches the estimated length of the insertion.
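    The final modeling step, finding a de Bruijn graph path of approximately the estimated insertion length, can be sketched as a depth-bounded search. This is a minimal illustration under assumed conventions. Nodes are k-mers, and the path runs from the last k-mer of the left flank to the first k-mer of the right flank. The length tolerance `tol` is a hypothetical parameter. This is not the thesis's implementation, and its exhaustive search is exponential in the worst case.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_dbg(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: nodes are k-mers, u -> v if they are adjacent in a read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return dict(graph)


def fill_gap(graph: Dict[str, Set[str]], left: str, right: str,
             k: int, est_len: int, tol: int) -> List[str]:
    """Return every insertion spelled by a path whose length is est_len +/- tol.

    The path starts at the last k-mer of `left` and ends at the first k-mer of
    `right`. A path with m edges spells m - k new bases between the flanks.
    """
    start, end = left[-k:], right[:k]
    max_edges = est_len + tol + k  # longer paths cannot satisfy the length bound
    results: List[str] = []

    def dfs(node: str, spelled: str, edges: int) -> None:
        inserted = edges - k
        if node == end and inserted >= 0 and abs(inserted - est_len) <= tol:
            # Strip the start and end k-mers to keep only the inserted bases.
            results.append(spelled[k:len(spelled) - k])
        if edges == max_edges:
            return
        for nxt in sorted(graph.get(node, ())):
            # Nodes may be revisited (repeats); the edge budget bounds the search.
            dfs(nxt, spelled + nxt[-1], edges + 1)

    dfs(start, start, 0)
    return results
```

    Revisiting nodes is allowed so that repeats inside the insertion can be traversed. The edge budget is what guarantees that the search terminates.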

    Kermit: Guided Long Read Assembly using Coloured Overlap Graphs

    Peer reviewed

    The Effect of Optical Properties on Secchi Depth and Implications for Eutrophication Management

    Successful management of coastal environments requires reliable monitoring methods and indicators. Besides chlorophyll-a concentration (Chl-a), water transparency measured as Secchi depth (ZSD) is widely used in Baltic Sea management for water quality assessment as an indicator of eutrophication. However, in many coastal waters not only phytoplankton but also colored dissolved organic matter (CDOM) and suspended particulate matter (SPM) influence the underwater light field, and therefore ZSD. In this study, all three main optical variables (CDOM, Chl-a, and SPM [organic and inorganic]) as well as ZSD were measured in three Swedish regions (the Bothnian Sea, the Baltic Proper, and the Skagerrak) in 2010–2014. Regional multiple regressions with Chl-a, CDOM, and inorganic SPM as predictors explained the variations in ZSD well (R²adj = 0.53–0.84). Commonality analyses of the regressions indicated considerable differences between regions in the contribution of each factor to the explained variance (R²adj) in ZSD. CDOM explained most of the variance in the Bothnian Sea and the Skagerrak; in general, Chl-a contributed only modestly to the ZSD variance. In the Baltic Proper, the largest contribution came from the interaction of all three variables. As expected, the link between Chl-a and ZSD was much weaker in the Bothnian Sea, with its high CDOM absorption and SPM concentration. Applying the Swedish EU Water Framework Directive threshold for Good/Moderate Chl-a status in the models showed that ZSD is a sufficient indicator neither for eutrophication nor for changes in Chl-a. Natural coastal gradients in CDOM and SPM influence the reference conditions for ZSD and other eutrophication indicators, such as the depth distribution of macroalgae. Hence, setting targets for these indicators based on reference Chl-a concentrations and simple Chl-a to ZSD relationships might in some cases be inappropriate and misleading, due to overestimation of water transparency under natural conditions.
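    Commonality analysis splits a model's explained variance into parts unique to each predictor and parts shared by groups of predictors. The sketch below assumes that the R² of every subset regression has already been computed, for example with Chl-a, CDOM and inorganic SPM as predictors. The regressions themselves are not fitted here, and the function name and dictionary layout are hypothetical. The abstract reports adjusted R², but the function simply works with whatever R² values it is given. It uses the inclusion-exclusion identity C(S) = (-1)^(|S|+1) * sum over T ⊆ S of (-1)^|T| * R²(P \ T). For three predictors this reproduces the usual formulas. For example, the unique part of X1 is R²(X1,X2,X3) - R²(X2,X3).

```python
from itertools import combinations
from typing import Dict, FrozenSet, Mapping, Sequence


def commonality(
    r2: Mapping[FrozenSet[str], float], predictors: Sequence[str]
) -> Dict[FrozenSet[str], float]:
    """Return the commonality component for every non-empty predictor subset S.

    `r2` maps each non-empty subset of predictors to the R^2 of the regression
    that uses exactly those predictors (R^2 of the empty model is taken as 0).
    C(S) = (-1)^(|S|+1) * sum over T subset of S of (-1)^|T| * R2(P minus T).
    The components sum to the R^2 of the full model.
    """
    full = frozenset(predictors)

    def r2_of(subset: FrozenSet[str]) -> float:
        return r2[subset] if subset else 0.0

    components: Dict[FrozenSet[str], float] = {}
    for size in range(1, len(full) + 1):
        for members in combinations(sorted(full), size):
            s = frozenset(members)
            total = 0.0
            # Sum over every subset T of S (including the empty set and S itself).
            for t_size in range(size + 1):
                for dropped in combinations(sorted(s), t_size):
                    total += (-1) ** t_size * r2_of(full - frozenset(dropped))
            components[s] = (-1) ** (size + 1) * total
    return components
```

    A component can come out negative when predictors suppress one another. This is a known property of commonality analysis, not a bug in the arithmetic.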