Search CORE

189 research outputs found

On Greedy Algorithms for Binary de Bruijn Sequences

Author: Chang Zuling
Ezerman Martianus Frederic
Fahreza Adamas Aqsa
Publication date: 01/01/2020

We propose a general greedy algorithm for binary de Bruijn sequences, called the Generalized Prefer-Opposite (GPO) Algorithm, and its modifications. By identifying specific feedback functions and initial states, we demonstrate that most previously known greedy algorithms that generate binary de Bruijn sequences are particular cases of our new algorithm.
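
For readers unfamiliar with the greedy constructions the abstract refers to, here is a minimal sketch of one classic example, the prefer-one rule. It is not the GPO algorithm from the paper; the function name `prefer_one` and the choice of an all-zero initial state are illustrative assumptions.

```python
def prefer_one(n: int) -> str:
    """Return a cyclic binary de Bruijn sequence of order n (length 2**n).

    Classic prefer-one greedy rule (NOT the paper's GPO algorithm):
    start from n zeros and append 1 whenever the resulting n-bit window
    is new, otherwise append 0 if that window is new, otherwise stop.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    seq = [0] * n                 # assumed initial state: n zeros
    seen = {tuple(seq)}           # n-bit windows used so far
    while True:
        for bit in (1, 0):        # greedy preference: try 1 before 0
            window = tuple(seq[len(seq) - n + 1:] + [bit])
            if window not in seen:
                seen.add(window)
                seq.append(bit)
                break
        else:
            break                 # neither bit gives a new window
    # The linear string has length 2**n + n - 1; keeping the first 2**n
    # bits gives the cyclic form of the sequence.
    return "".join(map(str, seq[:2 ** n]))


print(prefer_one(3))  # 00011101
```

Read cyclically, every 3-bit word appears exactly once in the printed string.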

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

A machine learning approach to DNA shotgun sequence assembly

Author: Constantinescu Radu-Ionut
Publication date: 01/01/2016

Wits Institutional Repository on DSPACE

A maximum likelihood approach to genome assembly

Author: Baruzzo Giacomo
Publication date: 08/04/2022

De novo genome assembly is the bioinformatics problem of reconstructing the original molecule from its subsequences, with no prior knowledge of the DNA. Inspired by the maximum likelihood approach, a new experimental approach was recently developed. In this thesis, this new stochastic approach is implemented in a software assembler for the first time. Parallel software has been developed to obtain a first experimental validation of the model, testing also some artificial dataope

Padua Thesis and Dissertation Archive

Synteny Paths for Assembly Graphs Comparison

Author: Kolmogorov Mikhail
Polevikov Evgeny
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Workshop on Algorithms in Bioinformatics (WABI 2019)
Publication date: 01/01/2019

Despite recent developments in long-read sequencing technologies, it is still difficult to produce complete assemblies of eukaryotic genomes in an automated fashion. Genome assembly software typically outputs assembled fragments (contigs) along with assembly graphs that encode all possible layouts of these contigs. A graph representation of the assembled genome can be useful for gene discovery, haplotyping, structural variation analysis and other applications. To facilitate the development of new graph-based approaches, it is important to develop algorithms for comparing and evaluating assembly graphs produced by different software. In this work, we introduce synteny paths: maximal paths of homologous sequence between the compared assembly graphs. We describe Asgan, an algorithm for efficient synteny path decomposition, and use it to evaluate assembly graphs of various bacterial assemblies produced by different approaches. We then apply Asgan to discover structural variations between the assemblies of 15 Drosophila genomes, and show that synteny paths are robust to contig fragmentation. The Asgan tool is freely available at: https://github.com/epolevikov/Asgan

Dagstuhl Research Online Publication Server

SOPRA: Scaffolding algorithm for paired reads via statistical optimization

Author: Dayarian Adel
Michael Todd P
Sengupta Anirvan M
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome. Results: We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs, with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for the previous generation of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor quality results in the current context. SOPRA circumvents this problem by treating all the constraints on an equal footing when solving the optimization problem, with the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing constraints is iterated until one reaches a core set of consistent constraints. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. Conclusions: Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 Kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.
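
The abstract reports scaffold length as N50. As a reminder of how that statistic is commonly computed, here is a minimal sketch under the usual definition (the largest length L such that scaffolds of length at least L cover at least half of the total assembled bases). The function name `n50` and the sample lengths are hypothetical and are not taken from SOPRA.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Common definition (an assumption here, not SOPRA's code): sort the
    lengths in decreasing order and return the length at which the
    running total first reaches half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:   # integer test for running >= total / 2
            return length
    return ordered[-1]             # reached only if every length is 0


# Hypothetical scaffold lengths in base pairs.
print(n50([2, 3, 4, 5, 6]))  # 5
```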

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Towards a theory of patches

Author: Amir Amihood
Paryenty Haim
Publication venue: Elsevier B.V.
Publication date: 30/04/2012

Many applications have a need for indexing unstructured data. It turns out that a similar ad-hoc method is being used in many of them: that of considering small particles of the data. In this paper we formalize this concept as a tiling problem and consider the efficiency of dealing with this model in the pattern matching setting. We present an efficient algorithm for the one-dimensional tiling problem, and the one-dimensional tiled pattern matching problem. We prove the two-dimensional problem is hard and then develop an approximation algorithm with an approximation ratio converging to 2. We show that other two-dimensional versions of the problem are also hard, regardless of the number of neighbors a tile has.

Elsevier - Publisher Connector