Simple gene assembly as a rewriting of directed overlap-inclusion graphs

Author: Azimi S
Harju T
Langille M
Petre I
Publication venue: 'Elsevier BV'
Publication date: 27/10/2022

The simple intramolecular model for gene assembly in ciliates consists of three molecular operations: simple ld, simple hi and simple dlad. Mathematical models in terms of signed permutations and signed strings proved limited in capturing some of the combinatorial details of the simple gene assembly process. Brijder and Hoogeboom introduced a new model in terms of overlap-inclusion graphs, which could describe two of the three operations of the model and their combinatorial properties. To capture the third operation, we extended their framework to directed overlap-inclusion (DOI) graphs in Azimi et al. (2011) [1]. In this paper we introduce DOI graph-based rewriting rules that capture all three operations of the simple gene assembly model and prove that they are equivalent to the string-based formalization of the model.

UTUPub

The Fibers and Range of Reduction Graphs in Ciliates

Author: A. Bergeron
A. Ehrenfeucht
Hendrik Jan Hoogeboom
J. Setubal
P. Pevzner
R. Brijder
Robert Brijder
S. Hannenhalli
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/02/2007

The biological process of gene assembly has been modeled based on three types of string rewriting rules, called string pointer rules, defined on so-called legal strings. It has been shown that reduction graphs for legal strings, which are based on the notion of breakpoint graph in the theory of sorting by reversal, provide valuable insights into the gene assembly process. We characterize which legal strings obtain the same reduction graph (up to isomorphism), and moreover we characterize which graphs are (isomorphic to) reduction graphs. Comment: 24 pages, 13 figures

arXiv.org e-Print Archive

Crossref
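
The abstract above mentions string pointer rules on legal strings without spelling them out. The following is a minimal sketch of how such rules are usually formulated in the gene assembly literature; the definitions used here are an assumption and are not taken from the abstract itself. Pointers are encoded as nonzero integers, with `-p` standing for the barred (inverted) form of `p`. The function names `is_legal`, `snr`, `spr` and `sdr` are illustrative only.

```python
from collections import Counter


def is_legal(s):
    """A string is treated as legal if every pointer occurs exactly twice,
    counting p and its barred form -p as the same pointer (assumed definition)."""
    counts = Counter(abs(x) for x in s)
    return 0 not in counts and all(c == 2 for c in counts.values())


def invert(s):
    """Inversion: reverse the string and toggle the bar on every symbol."""
    return [-x for x in reversed(s)]


def snr(s, p):
    """String negative rule: u1 p p u2 -> u1 u2 (two adjacent equal occurrences)."""
    for i in range(len(s) - 1):
        if abs(s[i]) == p and s[i] == s[i + 1]:
            return s[:i] + s[i + 2:]
    return None  # rule not applicable


def spr(s, p):
    """String positive rule: u1 p u2 -p u3 -> u1 inv(u2) u3."""
    idx = [i for i, x in enumerate(s) if abs(x) == p]
    if len(idx) != 2 or s[idx[0]] != -s[idx[1]]:
        return None  # rule not applicable
    i, j = idx
    return s[:i] + invert(s[i + 1:j]) + s[j + 1:]


def sdr(s, p, q):
    """String double rule: u1 p u2 q u3 p u4 q u5 -> u1 u4 u3 u2 u5."""
    ip = [i for i, x in enumerate(s) if abs(x) == p]
    iq = [i for i, x in enumerate(s) if abs(x) == q]
    if len(ip) != 2 or len(iq) != 2:
        return None
    (i1, i2), (j1, j2) = ip, iq
    # Both occurrences of each pointer must carry the same sign,
    # and the two pointers must overlap in the order p q p q.
    if s[i1] != s[i2] or s[j1] != s[j2] or not (i1 < j1 < i2 < j2):
        return None
    u1, u2, u3 = s[:i1], s[i1 + 1:j1], s[j1 + 1:i2]
    u4, u5 = s[i2 + 1:j2], s[j2 + 1:]
    return u1 + u4 + u3 + u2 + u5


if __name__ == "__main__":
    s = [2, 3, -2, 3]
    print(is_legal(s))
    step = spr(s, 2)
    print(step)
    print(spr(step, 3))
```

Under these assumed definitions, a legal string is reduced successfully when repeated rule applications reach the empty string, as in the two `spr` steps of the usage example.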

Safe and complete contig assembly via omnitigs

Author: A Bankevich
A Guénoche
AR Rubinov
AS Motahari
C Kingsford
D Haussler
DR Zerbino
E Kapun
ES Lander
G Bresler
G Narzisi
I Lysov
JD Kececioglu
JR Miller
JT Simpson
K Lam
K Sahlin
L Salmela
M Boetzer
N Nagarajan
N Vyahhi
P Medvedev
PA Pevzner
R Chikhi
R Luo
R Uricaru
RM Idury
SL Salzberg
Publication date: 16/08/2016

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs -- a set of strings that are promised to appear in any genome that could have generated the reads. From the introduction of contigs 20 years ago, assemblers have tried to obtain longer and longer contigs, but the following question was never solved: given a genome graph G (e.g. a de Bruijn, or a string graph), what are all the strings that can be safely reported from G as contigs? In this paper we finally answer this question, and also give a polynomial time algorithm to find them. Our experiments show that these strings, which we call omnitigs, are 66% to 82% longer on average than the popular unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in unitigs. Comment: Full version of the paper in the proceedings of RECOMB 201

arXiv.org e-Print Archive

Crossref
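
The abstract above compares omnitigs against unitigs. As background, the following is a minimal sketch of the unitig baseline only: it builds an edge-centric de Bruijn graph from reads and reports maximal non-branching paths. It is not the omnitig algorithm from the paper. It assumes error-free reads, ignores reverse complements and k-mer multiplicities, and the function names `kmer_set` and `unitigs` are illustrative.

```python
from collections import defaultdict


def kmer_set(reads, k):
    """Collect the distinct k-mers of all reads. Each k-mer acts as an edge
    from its (k-1)-length prefix to its (k-1)-length suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def unitigs(reads, k):
    """Return the strings spelled by maximal non-branching paths (unitigs)."""
    kmers = kmer_set(reads, k)
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    for kmer in sorted(kmers):
        out_edges[kmer[:-1]].append(kmer)
        in_deg[kmer[1:]] += 1

    def is_internal(node):
        # A node with exactly one incoming and one outgoing edge
        # can only sit in the middle of a unitig.
        return in_deg[node] == 1 and len(out_edges[node]) == 1

    result, used = [], set()
    nodes = sorted(set(out_edges) | set(in_deg))

    # Start a path at every edge leaving a branching or terminal node.
    for node in nodes:
        if is_internal(node):
            continue
        for kmer in out_edges[node]:
            path, nxt = kmer, kmer[1:]
            used.add(kmer)
            while is_internal(nxt):
                edge = out_edges[nxt][0]
                used.add(edge)
                path += edge[-1]
                nxt = edge[1:]
            result.append(path)

    # Any edge still unused lies on an isolated cycle of internal nodes.
    for kmer in sorted(kmers):
        if kmer in used:
            continue
        path, nxt = kmer, kmer[1:]
        used.add(kmer)
        while out_edges[nxt][0] not in used:
            edge = out_edges[nxt][0]
            used.add(edge)
            path += edge[-1]
            nxt = edge[1:]
        result.append(path)

    return sorted(result)


if __name__ == "__main__":
    # Two reads that share a prefix and then diverge at a branching node.
    print(unitigs(["AACGT", "AACGA"], 3))
```

In the usage example the two reads agree on `AACG` and then diverge, so the shared prefix and the two one-edge branches come out as separate unitigs. Per the abstract, omnitigs are on average longer than such strings.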

Canonical, Stable, General Mapping using Context Schemes

Author: Haussler David
Novak Adam
Paten Benedict
Rosen Yohei
Publication venue: 'Oxford University Press (OUP)'
Publication date: 11/06/2015

Motivation: Sequence mapping is the cornerstone of modern genomics. However, most existing sequence mapping algorithms are insufficiently general. Results: We introduce context schemes: a method that allows the unambiguous recognition of a reference base in a query sequence by testing the query for substrings from an algorithmically defined set. Context schemes only map when there is a unique best mapping, and define this criterion uniformly for all reference bases. Mappings under context schemes can also be made stable, so that extension of the query string (e.g. by increasing read length) will not alter the mapping of previously mapped positions. Context schemes are general in several senses. They natively support the detection of arbitrary complex, novel rearrangements relative to the reference. They can scale over orders of magnitude in query sequence length. Finally, they are trivially extensible to more complex reference structures, such as graphs, that incorporate additional variation. We demonstrate empirically the existence of high performance context schemes, and present efficient context scheme mapping algorithms. Availability and Implementation: The software test framework created for this work is available from https://registry.hub.docker.com/u/adamnovak/sequence-graphs/. Contact: [email protected] Supplementary Information: Six supplementary figures and one supplementary section are available with the online version of this article. Comment: Submission for Bioinformatics

arXiv.org e-Print Archive

PubMed Central

eScholarship - University of California