443 research outputs found

Optical maps in guided genome assembly

Author: Leinonen Miika
Publication venue: Helsingfors universitet
Publication date: 01/01/2019

With the introduction of DNA sequencing over 40 years ago, we have been able to take a peek at our genetic material. Even though we have had a long time to develop sequencing strategies further, we are still unable to read the whole genome in one go. Instead, we gather smaller pieces of the genetic material, which we then use to reconstruct the original genome in a process called genome assembly. Genome assembly often yields multiple long sequences, called contigs, that represent different regions of the genome. Even though a genome often consists of a few separate DNA molecules (chromosomes), the obtained contigs outnumber them substantially, meaning our reconstruction of the genome is not perfect. The resulting contigs can afterwards be refined by ordering, orienting and scaffolding them using additional information about the genome, which is often done manually. The assembly process can also be guided automatically with the additional information, and in this thesis we introduce a method that uses optical maps to help us assemble the genome more accurately. A noticeable improvement of this method is the unification of the contigs, i.e. we are left with fewer but longer contigs. We use an existing genome assembler called Kermit, which is designed to accept genetic maps as auxiliary long-range information. Our contribution is the development of an assembly pipeline that provides Kermit with a similar kind of information via optical maps. The initial results of our experiments show that the proposed genome assembly scheme can take advantage of optical maps effectively already during the assembly process to guide the reconstruction of a genome.

Helsingin yliopiston digitaalinen arkisto

HGGA : hierarchical guided genome assembler

Author: Salmela Leena
Walve Riku
Publication date: 07/05/2022

Background: De novo genome assembly typically produces a set of contigs instead of the complete genome. Thus additional data such as genetic linkage maps, optical maps, or Hi-C data are needed to resolve the complete structure of the genome. Most of the previous work uses the additional data to order and orient contigs. Results: Here we introduce a framework to guide genome assembly with additional data. Our approach is based on clustering the reads, such that each read in each cluster originates from nearby positions in the genome according to the additional data. These sets are then assembled independently and the resulting contigs are further assembled in a hierarchical manner. We implemented our approach for genetic linkage maps in a tool called HGGA. Conclusions: Our experiments on simulated and real Pacific Biosciences long reads and genetic linkage maps show that HGGA produces a more contiguous assembly with fewer contigs and from 1.2 to 9.8 times higher NGA50 or N50 than a plain assembly of the reads and 1.03 to 6.5 times higher NGA50 or N50 than a previous approach integrating genetic linkage maps with contig assembly. Furthermore, the correctness of the assembly remains similar or improves compared to an assembly using only the read data. Peer reviewed
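The contiguity gains above are stated in terms of N50 and NGA50. As a rough illustration of the simpler of the two, the following sketch computes N50 from a list of contig lengths. The function name and the example lengths are hypothetical and are not data from the paper; NGA50 additionally requires aligning contigs to a reference and is not shown.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (illustrative only).
plain = [40, 30, 20, 10, 10, 5, 5]   # many short contigs
guided = [80, 30, 10]                # fewer, longer contigs, same total

print(n50(plain), n50(guided))
```

Both example assemblies have the same total length, so the difference in N50 reflects only how fragmented each one is.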

PubMed Central

Helsingin yliopiston digitaalinen arkisto

Disentangled long-read de Bruijn graphs via optical maps

Author: Alipanahi Bahar
Boucher Christina
Muggli Martin
Puglisi Simon J.
Salmela Leena
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2017

Peer reviewed

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

A clone-free, single molecule map of the domestic cow (Bos taurus) genome.

Author: Bechner Michael
Goldstein Steve
Hernandez-Ortiz Juan
Medrano Juan F
Pape Louise
Patino Diego
Place Michael
Potamousis Konstantinos
Ravindran Prabu
Rincon Gonzalo
Schwartz David C
Zhou Shiguo
Publication venue: eScholarship, University of California
Publication date: 28/08/2015

Background: The cattle (Bos taurus) genome was originally selected for sequencing due to its economic importance and unique biology as a model organism for understanding other ruminants, or mammals. Currently, there are two cattle genome sequence assemblies (UMD3.1 and Btau4.6) from groups using dissimilar assembly algorithms, which were complemented by genetic and physical map resources. However, past comparisons between these assemblies revealed substantial differences. Consequently, such discordances have engendered ambiguities when using reference sequence data, impacting genomic studies in cattle and motivating construction of a new optical map resource, BtOM1.0, to guide comparisons and improvements to the current sequence builds. Accordingly, our comprehensive comparisons of BtOM1.0 against the UMD3.1 and Btau4.6 sequence builds tabulate large-to-immediate scale discordances requiring mediation. Results: The optical map, BtOM1.0, spanning the B. taurus genome (Hereford breed, L1 Dominette 01449) was assembled from an optical map dataset consisting of 2,973,315 (439 X; raw dataset size before assembly) single molecule optical maps (Rmaps; 1 Rmap = 1 restriction mapped DNA molecule) generated by the Optical Mapping System. The BamHI map spans 2,575.30 Mb and comprises 78 optical contigs assembled by a combination of iterative (using the reference sequence: UMD3.1) and de novo assembly techniques. BtOM1.0 is a high-resolution physical map featuring an average restriction fragment size of 8.91 Kb. Comparisons of BtOM1.0 vs. UMD3.1, or Btau4.6, revealed that Btau4.6 presented far more discordances (7,463) vs. UMD3.1 (4,754). Overall, we found that Btau4.6 presented almost twice as many discordances as UMD3.1 across most of the 6 categories of sequence vs. map discrepancies, which are: COMPLEX (misassembly), DELs (extraneous sequences), INSs (missing sequences), ITs (Inverted/Translocated sequences), ECs (extra restriction cuts) and MCs (missing restriction cuts). Conclusion: Alignments of UMD3.1 and Btau4.6 to BtOM1.0 reveal discordances commensurate with previous reports, and affirm the NCBI's current designation of UMD3.1 sequence assembly as the "reference assembly" and the Btau4.6 as the "alternate assembly." The cattle genome optical map, BtOM1.0, when used as a comprehensive and largely independent guide, will greatly assist improvements to existing sequence builds, and later serve as an accurate physical scaffold for studies concerning the comparative genomics of cattle breeds.
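Comparing a sequence build against a restriction map such as the BamHI map above usually starts by digesting the sequence in silico, so that both sides are ordered lists of fragment lengths. The sketch below is only an illustration of that idea, not the pipeline used in the paper. The function name and the toy sequence are hypothetical; the BamHI recognition site GGATCC and its cut position after the first G are standard.

```python
def in_silico_digest(sequence, site="GGATCC", cut_offset=1):
    """Return the ordered restriction fragment lengths of a linear sequence.

    The defaults model BamHI (G^GATCC). The site is palindromic, so
    scanning one strand is enough to find every cut.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment lengths are the gaps between consecutive cut positions,
    # including both ends of the molecule.
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


# Toy sequence with two BamHI sites (illustrative only).
toy = "AAAA" + "GGATCC" + "TTTTTTTT" + "GGATCC" + "CC"
fragments = in_silico_digest(toy)
mean_fragment = sum(fragments) / len(fragments)
print(fragments, mean_fragment)
```

On real data the resulting fragment list would be compared with the optical map, where extra or missing cuts and size differences correspond to the EC, MC, INS and DEL categories listed in the abstract.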

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Optical map guided genome assembly

Author: A Gurevich
A Samad
A Valouev
AK-Y Leung
B Alipanahi
BK Stöcker
DE Jarvis
ET Dimalanta
FJ Sedlazeck
H Li
HC Lin
JM Shelton
LM Mendelowitz
MD Muggli
MS Waterman
N Daccord
N Nagarajan
R Walve
S Beier
S Koren
S Vij
W Pan
Y Dong
Publication date: 06/07/2020

Background: The long reads produced by third generation sequencing technologies have significantly boosted the results of genome assembly but still, genome-wide assemblies solely based on read data cannot be produced. Thus, for example, optical mapping data has been used to further improve genome assemblies but it has mostly been applied in a post-processing stage after contig assembly. Results: We propose OpticalKermit, which directly integrates genome wide optical maps into contig assembly. We show how genome wide optical maps can be used to localize reads on the genome and then we adapt the Kermit method, which originally incorporated genetic linkage maps to the miniasm assembler, to use this information in contig assembly. Our experimental results show that incorporating genome wide optical maps to the contig assembly of miniasm increases NGA50 while the number of misassemblies decreases or stays the same. Furthermore, when compared to the Canu assembler, OpticalKermit produces an assembly with almost three times higher NGA50 with a lower number of misassemblies on real A. thaliana reads. Conclusions: OpticalKermit successfully incorporates optical mapping data directly to contig assembly of eukaryotic genomes. Our results show that this is a promising approach to improve the contiguity of genome assemblies. Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Reevaluating Assembly Evaluations with Feature Response Curves: GAGE and Assemblathons

Author: Mishra Bud
Narzisi Giuseppe
Vezzi Francesco
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 03/10/2012

In just the last decade, a multitude of bio-technologies and software pipelines have emerged to revolutionize genomics. To further their central goal, they aim to accelerate and improve the quality of de novo whole-genome assembly starting from short DNA reads. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information. Furthermore, in the absence of any metric that captures the most fundamental "features" of a high-quality assembly, there is no obvious recipe for users to select the most desirable assembler/assembly. International competitions such as Assemblathons or GAGE tried to identify the best assembler(s) and their features. Somewhat circuitously, the only available approach to gauge de novo assemblies and assemblers relies solely on the availability of a high-quality fully assembled reference genome sequence. Still worse, reference-guided evaluations are often difficult to analyze, leading to conclusions that are difficult to interpret. In this paper, we circumvent many of these issues by relying upon a tool, dubbed FRCbam, which is capable of evaluating de novo assemblies from the read layouts even when no reference exists. We extend the FRCurve approach to cases where layout information may have been obscured, as is true in many de Bruijn graph-based algorithms. As a by-product, FRCurve now expands its applicability to a much wider class of assemblers -- thus, identifying higher-quality members of this group, their inter-relations as well as sensitivity to carefully selected features, with or without the support of a reference sequence or layout for the reads. The paper concludes by reevaluating several recently conducted assembly competitions and the datasets that have resulted from them. Comment: Submitted to PLoS One.
Supplementary material available at http://www.nada.kth.se/~vezzi/publications/supplementary.pdf and http://cs.nyu.edu/mishra/PUBLICATIONS/12.supplementaryFRC.pd
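The abstract does not spell out how a feature response curve is built. The sketch below follows the commonly described construction and should be read as an assumption: for each feature threshold, contigs are taken from longest to shortest while their cumulative feature count stays within the threshold, and the covered fraction of an estimated genome size is reported. The function name, the input layout and the numbers are hypothetical and do not reflect FRCbam's actual interface.

```python
def feature_response_curve(contigs, genome_size, thresholds):
    """Compute (threshold, coverage) points of a feature response curve.

    contigs: list of (length, feature_count) pairs, where features are
    suspicious regions flagged in the read layout (assumed given).
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    points = []
    for phi in thresholds:
        features = 0
        covered = 0
        for length, count in ordered:
            # Stop at the first contig that would exceed the feature budget.
            if features + count > phi:
                break
            features += count
            covered += length
        points.append((phi, covered / genome_size))
    return points


# Hypothetical assembly: three contigs over an estimated 1000 bp genome.
curve = feature_response_curve(
    contigs=[(500, 2), (300, 0), (200, 5)],
    genome_size=1000,
    thresholds=[0, 2, 7],
)
print(curve)
```

Under this construction, a curve that rises quickly indicates that most of the genome is covered by contigs containing few flagged features.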

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare

Kermit: Guided Long Read Assembly using Coloured Overlap Graphs

Author: Rastas Pasi
Salmela Leena
Walve Riku
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Publication date: 01/01/2018

With long reads getting even longer and cheaper, large scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. Therefore they provide long range information that has the potential to greatly aid in de novo assembly. Previously linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps in assembly; instead, large amounts of manual labour are needed to order the contigs into chromosomes. We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and reducing the number of misassemblies when compared to de novo assembly.
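The abstract does not describe the cleaning step in detail. One plausible reading of a coloured overlap graph, sketched here purely as an assumption, is that each read carries the colours (ordered map bins) of the markers it contains, and an overlap edge is dropped when the two reads have no equal or neighbouring colours. All names, the `max_gap` parameter and the data are hypothetical and are not taken from Kermit.

```python
def clean_overlaps(overlaps, colours, max_gap=1):
    """Drop overlap edges whose endpoint reads have incompatible colours.

    overlaps: list of (read_a, read_b) edges of the overlap graph.
    colours:  dict mapping a read to a set of integer bin indices along
              the map; reads without markers are simply absent.
    """
    kept = []
    for a, b in overlaps:
        ca, cb = colours.get(a), colours.get(b)
        if not ca or not cb:
            # No map evidence for one of the reads: keep the edge.
            kept.append((a, b))
            continue
        # Compatible if some pair of colours is equal or adjacent.
        if any(abs(x - y) <= max_gap for x in ca for y in cb):
            kept.append((a, b))
    return kept


# Hypothetical reads: r3 maps far from r2, r4 has no markers.
colours = {"r1": {1}, "r2": {1, 2}, "r3": {7}}
overlaps = [("r1", "r2"), ("r2", "r3"), ("r3", "r4")]
cleaned = clean_overlaps(overlaps, colours)
print(cleaned)
```

Removing the r2 to r3 edge is the kind of simplification that can prevent a repeat-induced overlap from joining distant regions, which matches the effect on the assembly graph reported in the abstract.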

Dagstuhl Research Online Publication Server

Kermit : Guided Long Read Assembly using Coloured Overlap Graphs

Author: Rastas Pasi Miikka Antero
Salmela Leena Maija
Walve Riku Mikael
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2018

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

ALGORITHMS FOR THE ALIGNMENT AND VISUALIZATION OF GENOME MAPPING DATA WITH APPLICATIONS TO STRUCTURAL VARIANT DETECTION

Author: Mendelowitz Lee M.
Publication date: 01/01/2015

Optical mapping and nanocoding are single molecule restriction mapping systems for interrogating genomic structure at a scale that cannot currently be achieved using DNA sequencing methods. In these mapping experiments, large DNA molecules of approximately 500 kb are stretched, immobilized or confined, and then digested with a restriction endonuclease that cuts or nicks the DNA at its cognate sequence. The cut/nick sites are then observed through fluorescent microscopy and machine vision is used to estimate the length of the DNA fragments between consecutive sites. This produces, for each molecule, a barcode-like pattern comprising the ordered list of restriction fragment lengths. Despite the promise of the optical mapping and nanocoding systems, there are few open source tools for working with the data generated by these platforms. Most analyses rely on custom in-house software pipelines using proprietary software. In this dissertation we present open source software tools for the alignment and visualization of restriction mapping data. In this work we first present a review of the optical mapping and nanocoding systems and provide an overview of the current methods for aligning and assembling consensus restriction maps and their related applications. Next, we present the Maligner software for the alignment of a query restriction pattern to a reference pattern. Alignment is a fundamental problem which is the first step in many downstream analyses, such as consensus map assembly or structural variant calling. The Maligner software features both a sensitive dynamic programming implementation and a faster but less sensitive index based mode of alignment. We compare the Maligner software to other available tools for the task of aligning a sequence contig assembly to a reference optical map and for aligning single molecule maps to a reference. Next, we present a portable data visualization web application for visualizing pairwise alignments of restriction maps.
Finally, we present updates to the Maligner software to support partial alignments of single molecule maps, allowing for the clustering of compatible split map alignments to identify structural variants.
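To make the dynamic programming idea concrete, the following is a minimal sketch of aligning a query list of fragment lengths inside a reference list. It is not Maligner's scoring model: the cost function (absolute size difference plus a fixed penalty per unmatched site), the parameter names and the example maps are all assumptions chosen for illustration.

```python
def align_fragments(query, reference, max_merge=2, site_penalty=3.0):
    """Fit-align query fragment lengths inside a reference restriction map.

    Up to max_merge consecutive fragments may be merged on either side,
    which models missed or extra cut sites. Returns (cost, end), where
    end is the number of reference fragments consumed up to the
    alignment's right boundary.
    """
    m, n = len(query), len(reference)
    inf = float("inf")
    # cost[i][j]: best cost of aligning query[:i] ending at reference[:j].
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        cost[0][j] = 0.0  # the query may start at any reference site
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            for di in range(1, min(max_merge, i) + 1):
                q_len = sum(query[i - di:i])
                for dj in range(1, min(max_merge, j) + 1):
                    r_len = sum(reference[j - dj:j])
                    # Sizing error plus a penalty for each merged-away site.
                    step = abs(q_len - r_len) + site_penalty * (di + dj - 2)
                    cand = cost[i - di][j - dj] + step
                    if cand < cost[i][j]:
                        cost[i][j] = cand
    # The query may also end at any reference site.
    end = min(range(n + 1), key=lambda j: cost[m][j])
    return cost[m][end], end


# Hypothetical maps in kb: the query misses the site between 8 and 30.
reference = [10, 25, 8, 30, 12, 40]
query = [25, 38, 12]
result = align_fragments(query, reference)
print(result)
```

The cubic-style inner loops are acceptable for a sketch; the index based mode mentioned in the abstract exists precisely because exhaustive dynamic programming becomes slow on large references.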

Digital Repository at the University of Maryland