
8 research outputs found

Optimal Parallel Construction of Minimal Suffix and Factor Automata

Authors: Breslauer Dany; Hariharan Ramesh
Publication venue: Aarhus University Library
Publication date: 01/01/1995

This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well-known relation between suffix and factor automata and suffix trees.

CiteSeerX

Tidsskrift.dk (Det Kongelige Bibliotek)

MPG.PuRe

Efficient text fingerprinting via Parikh mapping

Authors: Amir Amihood; Apostolico Alberto; Landau Gad M.; Satta Giorgio
Publication venue: Elsevier B.V.
Publication date: 01/01/2003

We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S′ is a substring of S, then the fingerprint of S′ is the subset φ of Σ of precisely the symbols appearing in S′. In this paper we show efficient methods of answering various queries on fingerprint statistics. Our preprocessing is done in time O(n|Σ| log n log |Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ ⊆ Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n).
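
The fingerprint definition in this abstract can be illustrated with a brute-force sketch. This is not the paper's method: it enumerates every substring directly instead of using the preprocessing the abstract describes, and the function names (`fingerprint`, `distinct_fingerprints`, `count_fingerprints_of_size`) are hypothetical, chosen only for illustration.

```python
def fingerprint(substring: str) -> frozenset:
    """Fingerprint of a substring: the set of symbols appearing in it."""
    return frozenset(substring)


def distinct_fingerprints(s: str) -> set:
    """Collect the fingerprints of all non-empty substrings of s.

    Brute force over all start positions; for each start, the symbol set
    is grown one character at a time, so every substring is visited once.
    """
    found = set()
    for start in range(len(s)):
        symbols = set()
        for end in range(start, len(s)):
            symbols.add(s[end])
            found.add(frozenset(symbols))
    return found


def count_fingerprints_of_size(s: str, k: int) -> int:
    """Naive version of query (1): number of distinct fingerprints of size k."""
    return sum(1 for f in distinct_fingerprints(s) if len(f) == k)


if __name__ == "__main__":
    text = "abac"
    print(sorted("".join(sorted(f)) for f in distinct_fingerprints(text)))
    print(count_fingerprints_of_size(text, 2))
```

The sketch visits a quadratic number of substrings, so it is only suitable for checking small examples by hand; the abstract's O(1) query time depends on its preprocessing step, which is not reproduced here.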

Elsevier - Publisher Connector

Open Access Repository

Archivio istituzionale della ricerca - Università di Padova

Efficient CRCW-PRAM Algorithms Combining Multiple Autonomous Databases

Author: Apostolico Alberto
Publication venue: Purdue University (bepress)
Publication date: 18/02/1991

Purdue E-Pubs

Fast parallel algorithms for approximate string matching

Author: Jiang Yi
Publication venue: University of Montana, Maureen and Mike Mansfield Library
Publication date: 01/01/1992

University of Montana

Human Genome Analysis

Author: Kratochvíl Jan
Publication venue: Vysoká škola báňská - Technická univerzita Ostrava
Publication date: 01/01/2019

This thesis describes the implementation of suffix automata used for string searching in long DNA sequences. The first part introduces DNA sequencing and mapping. A theoretical part follows, describing the suffix tree and suffix array data structures, which are widely used for searching in text. The next part introduces suffix automata, followed by compact suffix automata and the design and implementation of this structure. The implementation focuses on splitting the input string into several substrings, with a suffix automaton constructed for each substring. Several experiments were conducted on the implementation of this data structure, and their results are summarized in the conclusion.
460 - Katedra informatiky
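
As background for the suffix automata mentioned in this abstract, here is a minimal sketch of the standard sequential online construction. It is not the thesis's implementation: it builds one plain (non-compact) automaton for the whole string, without the splitting into substrings described above, and the class and method names are hypothetical.

```python
class SuffixAutomaton:
    """Plain suffix automaton built by the standard online construction."""

    def __init__(self, text: str):
        self.next = [{}]      # transitions of each state: symbol -> state
        self.link = [-1]      # suffix link of each state (-1 for the root)
        self.length = [0]     # length of the longest string reaching each state
        self.last = 0         # state that corresponds to the whole prefix read so far
        for ch in text:
            self._extend(ch)

    def _new_state(self, length: int, link: int, transitions: dict) -> int:
        self.next.append(transitions)
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def _extend(self, ch: str) -> None:
        cur = self._new_state(self.length[self.last] + 1, -1, {})
        p = self.last
        # Add the new transition along the suffix-link chain where it is missing.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps q's transitions but a shorter length.
                clone = self._new_state(
                    self.length[p] + 1, self.link[q], dict(self.next[q])
                )
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern: str) -> bool:
        """Return True if pattern occurs as a substring of the indexed text."""
        state = 0
        for ch in pattern:
            state = self.next[state].get(ch)
            if state is None:
                return False
        return True


if __name__ == "__main__":
    automaton = SuffixAutomaton("GATTACA")
    print(automaton.contains("TTAC"), automaton.contains("TAG"))
```

A query walks one transition per pattern symbol, so a lookup takes time proportional to the pattern length regardless of the text length, which is what makes the structure attractive for searching long sequences.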

DSpace at VSB Technical University of Ostrava

Large-scale methods in computational genomics

Author: Kalyanaraman Anantharaman
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2006

The explosive growth in biological sequence data coupled with the design and deployment of increasingly high throughput sequencing technologies has created a need for methods capable of processing large-scale sequence data in a time and cost effective manner. In this dissertation, we address this need through the development of faster algorithms, space-efficient methods, and high-performance parallel computing techniques for some key problems in computational genomics. The first problem addressed is the clustering of DNA sequences based on a measure of sequence similarity. Our clustering method: (i) guarantees linear space complexity, in contrast to the quadratic memory requirements of previously developed methods; (ii) identifies sequence pairs containing long maximal matches in the decreasing order of their maximal match lengths in run-time proportional to the sum of input and output sizes; (iii) provides heuristics to significantly reduce the number of pairs evaluated for checking sequence similarity without affecting quality; and (iv) has parallel strategies that provide linear speedup and a proportionate reduction in space per processor. Our approach has significantly enhanced the problem size reach while also drastically reducing the time to solution. The next problem we address is the de novo detection of genomic repeats called Long Terminal Repeat (LTR) retrotransposons. Our algorithm guarantees linear space complexity and produces high quality candidates for prediction in run-time proportional to the sum of input and output sizes. Validation of our approach on the yeast genome demonstrates both superior quality and performance results when compared to previously developed software. In a genome assembly project, fragments sequenced from a target genome are computationally assembled into numerous supersequences called contigs, which are then ordered and oriented into scaffolds. In this dissertation, we introduce a new problem called retroscaffolding for scaffolding contigs based on the knowledge of their LTR retrotransposon content. Through identification of sequencing gaps that span LTR retrotransposons, retroscaffolding provides a mechanism for prioritizing sequencing gaps for finishing purposes. While most of the problems addressed here have been studied previously, the main contribution in this dissertation is the development of methods that can scale to the largest available sequence collections.

Digital Repository @ Iowa State University (ISU)

Parallel Construction of a Suffix Tree With Applications

Authors: Apostolico A.; Iliopoulos C.; Landau G. M.; Schieber B.; Vishkin U.
Publication venue: Purdue University (bepress)
Publication date: 30/09/1987

Purdue E-Pubs