Sparse Dynamic Programming I: Linear Cost Functions

Author: Eppstein David
Galil Zvi
Giancarlo Raffaele
Italiano Giuseppe F.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1989

We consider dynamic programming solutions to a number of different recurrences for sequence comparison and for RNA secondary structure prediction. These recurrences are defined over a number of points that is quadratic in the input size; however, only a sparse set matters for the result. We give efficient algorithms for these problems when the weight functions used in the recurrences are linear. Our algorithms reduce the best known bounds by a factor almost linear in the density of the problems: when the problems are sparse, this results in a substantial speed-up.
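To make the idea of sparsity concrete, here is a minimal sketch of the classical Hunt-Szymanski-style approach to longest common subsequence, which touches only the matching points (i, j) with a[i] == b[j] instead of the full quadratic table. It is an illustration of sparse dynamic programming in general, not the algorithms of this paper; the function name `sparse_lcs_length` is made up for the example.

```python
from bisect import bisect_left
from collections import defaultdict


def sparse_lcs_length(a, b):
    """Length of the LCS of a and b, visiting only matching points."""
    # positions[ch] lists the indices j with b[j] == ch, in increasing order.
    positions = defaultdict(list)
    for j, ch in enumerate(b):
        positions[ch].append(j)

    # tails[k] is the smallest index in b at which a common subsequence
    # of length k + 1 can end, given the prefix of a processed so far.
    tails = []
    for ch in a:
        # Visit the matches of this row in decreasing j, so that two
        # matches from the same row are never chained together.
        for j in reversed(positions.get(ch, [])):
            k = bisect_left(tails, j)
            if k == len(tails):
                tails.append(j)
            else:
                tails[k] = j
    return len(tails)
```

With r matching points, the loop performs r binary searches, so the work is proportional to r log n rather than to the product of the input lengths.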

Columbia University Academic Commons

A basic analysis toolkit for biological sequences

Author: Alessandro Siragusa
Enrico Siragusa
Filippo Utro
Raffaele Giancarlo
Publication venue: BioMed Central
Publication date: 01/01/2007

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance via z-score computation. None of the algorithms is new; although they are generally regarded as fundamental for sequence analysis, they have not previously been implemented in a single, consistent software package. Our main contribution is therefore to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available under the GNU GPL.
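As an illustration of what "global alignment with affine gap costs" computes, the following is a minimal Python sketch of the standard three-table recurrence. It is not the BATS API (the library itself is C/C++ and Perl); the function name and the default scoring parameters are assumptions chosen for the example, with a gap of length L scored as gap_open + L * gap_extend.

```python
def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best global alignment score of a and b under affine gap costs."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_extend * i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_extend * j

    first_gap = gap_open + gap_extend  # cost of starting a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + first_gap,
                          Y[i - 1][j] + first_gap,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + first_gap,
                          X[i][j - 1] + first_gap,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

Keeping separate tables for "ends in a match", "ends in a gap in b" and "ends in a gap in a" is what lets extending a gap cost less than opening one.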

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Palermo

Sparse Dynamic Programming on DAGs with Small Width

Author: Chikhi Rayan
Gagie Travis
Kuosmanen Anna
Mäkinen Veli
Paavilainen Topi
Tomescu Alexandru I.
Publication date: 01/05/2019

The minimum path cover problem asks us to find a minimum-cardinality set of paths that cover all the nodes of a directed acyclic graph (DAG). We study the case when the size k of a minimum path cover is small, that is, when the DAG has a small width. This case is motivated by applications in pan-genomics, where the genomic variation of a population is expressed as a DAG. We observe that classical alignment algorithms exploiting sparse dynamic programming can be extended to the sequence-against-DAG case by mimicking the algorithm for sequences on each path of a minimum path cover and handling an evaluation order anomaly with reachability queries. Namely, we introduce a general framework for DAG-extensions of sparse dynamic programming. This framework produces algorithms that are slower than their counterparts on sequences only by a factor k. We illustrate this on two classical problems extended to DAGs: longest increasing subsequence and longest common subsequence. For the former, we obtain an algorithm with running time O(k|E| log |V|). This matches the optimal solution to the classical problem variant when the input sequence is modeled as a path. We obtain an analogous result for the longest common subsequence problem. We then apply this technique to the co-linear chaining problem, which is a generalization of the above two problems. The algorithm for this problem turns out to be more involved, needing further ingredients, such as an FM-index tailored for large alphabets and a two-dimensional range search tree modified to support range maximum queries. We also study a general sequence-to-DAG alignment formulation that allows affine gap costs in the sequence.
The main ingredient of the proposed framework is a new algorithm for finding a minimum path cover of a DAG (V, E) in O(k|E| log |V|) time, improving all known time-bounds when k is small and the DAG is not too dense. In addition to boosting the sparse dynamic programming framework, an immediate consequence of this new minimum path cover algorithm is an improved space/time tradeoff for reachability queries in arbitrary directed graphs. Peer reviewed
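For reference, the classical sequence case that the DAG algorithm is said to match (the input modeled as a single path, so k = 1) can be sketched as follows. This is the textbook O(n log n) longest increasing subsequence algorithm with predecessor links, not the DAG framework of the paper; the function name is chosen for the example.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tail_values = []  # tail_values[k]: smallest tail of a run of length k + 1
    tail_index = []   # position in seq of that tail
    parent = [-1] * len(seq)  # predecessor links for reconstruction

    for i, x in enumerate(seq):
        k = bisect_left(tail_values, x)
        if k > 0:
            parent[i] = tail_index[k - 1]
        if k == len(tail_values):
            tail_values.append(x)
            tail_index.append(i)
        else:
            tail_values[k] = x
            tail_index[k] = i

    # Walk the predecessor links back from the tail of the longest run.
    result = []
    i = tail_index[-1] if tail_index else -1
    while i != -1:
        result.append(seq[i])
        i = parent[i]
    return result[::-1]
```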

Helsingin yliopiston digitaalinen arkisto

Dynamic String Alignment

Author: Charalampopoulos Panagiotis
Kociumaka Tomasz
Mozes Shay
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Publication date: 01/01/2020

Dagstuhl Research Online Publication Server

Read alignment using deep neural networks

Author: Shrestha Akash
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2019

2019 Spring. Includes bibliographical references. Read alignment is the process of mapping short DNA sequences to a reference genome. With the advent of continually evolving "next generation" sequencing technologies, the need for sequence alignment tools appeared. Many scientific communities and the companies marketing the sequencing technologies developed a whole spectrum of read aligners/mappers for different error profiles and read length characteristics. Among the most recent successfully marketed sequencing technologies are Oxford Nanopore and PacBio SMRT sequencing, which are considered top players because of their extremely long reads and low cost. However, the reads may contain errors at a rate of up to 20%, and these errors are generally not uniformly distributed. To deal with that error rate and read length, proximity-preserving hashing techniques, such as Minhash and Minimizers, were used to quickly map a read to the target region of the reference sequence. A variant of global or local alignment dynamic programming is then used to give the final alignment. In this research work, we train a Deep Neural Network (DNN) to yield a hashing scheme for the highly erroneous long reads, which is deemed superior to Minhash for mapping the reads. We implemented that idea to build a read alignment tool: DNNAligner. We evaluated the performance of our aligner against read aligners currently popular in the bioinformatics community: minimap2, bwa-mem and graphmap. Our results show that the performance of DNNAligner is comparable to other tools without any code optimization or integration of other advanced features. Moreover, the DNN exhibits superior performance in comparison with Minhash on neighborhood classification.
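The minimizer technique mentioned above can be sketched in a few lines: from every window of w consecutive k-mers, keep the smallest one, so that similar sequences tend to share selected k-mers. This is a simplified illustration under stated assumptions, not the scheme used by DNNAligner or any of the named tools; it orders k-mers lexicographically, whereas practical implementations typically order them by a hash value, and the function name is made up for the example.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected; ties go to the leftmost occurrence.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda t: (window[t], t))
        picked.add((start + offset, window[offset]))
    return sorted(picked)
```

Adjacent windows often select the same k-mer, which is why the sketch collects results in a set: the output is usually much smaller than the full list of k-mers.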

Mountain Scholar (Digital Collections of Colorado and Wyoming)

A Faster Subquadratic Algorithm for the Longest Common Increasing Subsequence Problem

Author: Agrawal Anadi
Gawrychowski Paweł
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st International Symposium on Algorithms and Computation (ISAAC 2020)
Publication date: 01/01/2020

The Longest Common Increasing Subsequence (LCIS) is a variant of the classical Longest Common Subsequence (LCS), in which we additionally require the common subsequence to be strictly increasing. While the well-known "Four Russians" technique can be used to find LCS in subquadratic time, it does not seem applicable to LCIS. Recently, Duraj [STACS 2020] used a completely different method based on the combinatorial properties of LCIS to design an $\mathcal{O}(n^2(\log\log n)^2/\log^{1/6}n)$ time algorithm. We show that an approach based on exploiting tabulation can be used to construct an asymptotically faster $\mathcal{O}(n^2 \log\log n/\sqrt{\log n})$ time algorithm. As our solution avoids using the specific combinatorial properties of LCIS, it can also be adapted for the Longest Common Weakly Increasing Subsequence (LCWIS).
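For orientation, the quadratic baseline that these subquadratic results improve on can be sketched as below. This is the textbook O(nm) dynamic program for LCIS, not the tabulation-based algorithm of the paper; the function name is chosen for the example.

```python
def lcis_length(a, b):
    """Length of the longest common strictly increasing subsequence."""
    # best[j]: length of the longest common increasing subsequence of
    # the prefix of a processed so far and b that ends exactly at b[j].
    best = [0] * len(b)
    for x in a:
        # current: best length over earlier positions of b whose value
        # is smaller than x, i.e. what x could extend.
        current = 0
        for j, y in enumerate(b):
            if y < x and best[j] > current:
                current = best[j]
            elif y == x and current + 1 > best[j]:
                best[j] = current + 1
    return max(best, default=0)
```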

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Chaining with Overlaps Revisited

Author: Mäkinen Veli
Sahlin Kristoffer
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2020

Chaining algorithms aim to form a semi-global alignment of two sequences based on a set of anchoring local alignments as input. Depending on the optimization criteria and the exact definition of a chain, there are several O(n log n) time algorithms to solve this problem optimally, where n is the number of input anchors. In this paper, we focus on a formulation allowing the anchors to overlap in a chain. This formulation was studied by Shibuya and Kurochkin (WABI 2003), but their algorithm comes with no proof of correctness. We revisit and modify their algorithm to consider a strict definition of the precedence relation on anchors, adding the derivation required to establish the correctness of the resulting algorithm, which runs in O(n log^2 n) time on anchors formed by exact matches. With the more relaxed definition of the precedence relation considered by Shibuya and Kurochkin, or when anchors are non-nested, such as matches of uniform length (k-mers), the algorithm takes O(n log n) time. We also establish a connection between chaining with overlaps and the widely studied longest common subsequence problem.
2012 ACM Subject Classification: Theory of computation → Pattern matching; Theory of computation → Dynamic programming; Applied computing → Genomics. Peer reviewed
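A simplified picture of chaining, assuming anchors are exact matches given as hypothetical (x, y, length) triples with positive length and that anchors in a chain may not overlap, is the quadratic dynamic program below. It is a baseline sketch only: it uses neither the overlap-allowing formulation nor the O(n log n) data structures discussed in the abstract, and the score (total anchor length) is an assumption made for the example.

```python
def best_chain_score(anchors):
    """Maximum total length of a chain of non-overlapping anchors.

    Each anchor (x, y, length) matches seq1[x:x+length] to
    seq2[y:y+length]. Anchor p may precede anchor q only if p ends
    before q starts in both sequences.
    """
    order = sorted(anchors)  # any valid predecessor has a smaller x
    score = [0] * len(order)
    for i, (xi, yi, li) in enumerate(order):
        score[i] = li  # a chain consisting of this anchor alone
        for j in range(i):
            xj, yj, lj = order[j]
            if xj + lj <= xi and yj + lj <= yi:
                score[i] = max(score[i], score[j] + li)
    return max(score, default=0)
```

Replacing the inner loop with a range-maximum query over the y coordinate is the usual route from this quadratic sketch to O(n log n) time.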

arXiv.org e-Print Archive

Publikationer från Stockholms universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Helsingin yliopiston digitaalinen arkisto