8 research outputs found

Fully-online Construction of Suffix Trees for Multiple Texts

Author: Arimura Hiroki
Inenaga Shunsuke
Takagi Takuya
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Publication date: 01/01/2016

We consider fully-online construction of indexing data structures for multiple texts. Let T = {T_1, ..., T_K} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts, in which, once a new character has been appended to the k-th text T_k, the previous texts T_1, ..., T_{k-1} remain static. Our fully-online scenario arises when we maintain dynamic indexes for multi-sensor data. Let N and sigma denote the total length of texts in T and the alphabet size, respectively. We first show that the algorithm by Blumer et al. [Theoretical Computer Science, 40:31-55, 1985] to construct the directed acyclic word graph (DAWG) for T can readily be extended to our fully-online setting, retaining O(N log sigma)-time and O(N)-space complexities. Then, we give a sophisticated fully-online algorithm which constructs the suffix tree for T in O(N log sigma) time and O(N) space. A key idea of this algorithm is synchronized maintenance of the DAWG and the suffix tree.
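
As a rough illustration of the DAWG part of the abstract above, the following Python sketch maintains one DAWG (suffix automaton) over several texts and accepts a new character for any text at any time. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `FullyOnlineDawg`, `append`, and `contains` are hypothetical, transitions are stored in hash maps (so each step costs expected constant time rather than the O(log sigma) of a balanced-tree representation), and the suffix tree maintenance described in the abstract is not covered.

```python
class _State:
    """One DAWG state: outgoing transitions, suffix link, longest string length."""
    __slots__ = ("next", "link", "length")

    def __init__(self, length, link):
        self.next = {}
        self.link = link
        self.length = length


class FullyOnlineDawg:
    """Hypothetical sketch: one DAWG shared by several growing texts."""

    def __init__(self):
        self.states = [_State(0, -1)]  # state 0 is the root (empty string)
        self.last = {}                 # text id -> state of the whole text so far

    def _clone(self, p, q, ch):
        # Split q: the clone keeps the strings of length <= len(p) + 1.
        st = self.states
        clone = len(st)
        st.append(_State(st[p].length + 1, st[q].link))
        st[clone].next = dict(st[q].next)
        while p != -1 and st[p].next.get(ch) == q:
            st[p].next[ch] = clone
            p = st[p].link
        st[q].link = clone
        return clone

    def append(self, text_id, ch):
        """Append character ch to the text identified by text_id."""
        st = self.states
        p = self.last.get(text_id, 0)
        if ch in st[p].next:
            # The extended text already occurs in some text of the collection.
            q = st[p].next[ch]
            if st[q].length != st[p].length + 1:
                q = self._clone(p, q, ch)
            self.last[text_id] = q
            return
        cur = len(st)
        st.append(_State(st[p].length + 1, 0))
        # Add the new transition along the suffix-link chain.
        while p != -1 and ch not in st[p].next:
            st[p].next[ch] = cur
            p = st[p].link
        if p != -1:
            q = st[p].next[ch]
            if st[q].length == st[p].length + 1:
                st[cur].link = q
            else:
                st[cur].link = self._clone(p, q, ch)
        self.last[text_id] = cur

    def contains(self, pattern):
        """Return True if pattern is a substring of at least one text."""
        s = 0
        for ch in pattern:
            s = self.states[s].next.get(ch)
            if s is None:
                return False
        return True


# Interleaved appends: text 0 grows to "abab", text 1 grows to "bab".
dawg = FullyOnlineDawg()
for text_id, ch in [(0, "a"), (1, "b"), (0, "b"), (1, "a"),
                    (0, "a"), (1, "b"), (0, "b")]:
    dawg.append(text_id, ch)
```

Keeping one "last" state per text is what allows appends to arrive in any order in this sketch: a split only ever moves the shorter strings of a state into the clone, so the state holding a whole text stays valid when other texts grow.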

Dagstuhl Research Online Publication Server

Practical methods for constructing suffix trees

Author: A. Andersson
A. Apostolico
A. Blumer
A. Crauser
A. Delcher
C.-F. Cheung
E. Hunt
E.M. McCreight
G. Manzini
G. Navarro
J.M. Patel
J.S. Vitter
Jignesh M. Patel
L. Devroye
M. Farach-Colton
M.I. Abouelhoda
R. Apweiler
R. Giegerich
Richard A. Hankins
S. Kurtz
Sandeep Tata
Yuanyuan Tian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Sequence datasets are ubiquitous in modern life-science applications, and querying sequences is a common and critical operation in many of these applications. The suffix tree is a versatile data structure that can be used to evaluate a wide variety of queries on sequence datasets, including evaluating exact and approximate string matches, and finding repeat patterns. However, methods for constructing suffix trees are often very time-consuming, especially for suffix trees that are large and do not fit in the available main memory. Even when the suffix tree fits in memory, it turns out that the processor cache behavior of theoretically optimal suffix tree construction methods is poor, resulting in poor performance. Currently, there are a large number of algorithms for constructing suffix trees, but the practical tradeoffs in using these algorithms for different scenarios are not well characterized.

Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/47869/1/778_2005_Article_154.pd

CiteSeerX

Crossref

Deep Blue Documents at the University of Michigan

31st International Symposium on Theoretical Aspects of Computer Science: STACS '14, March 5th to March 8th, 2014, Lyon, France

Author: STACS <31 2014, Lyon>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/03/2014

Digitale Bibliothek Thüringen

Declarative Querying For Biological Sequences.

Author: Tata Sandeep
Publication date: 01/01/2007

Life science research labs today manage increasing volumes of sequence data. Much of the data management and querying today is accomplished procedurally using Perl, Python, or Java programs that integrate data from different sources and query tools. The dangers of this procedural approach are well known to the database community: (a) severe limitations on the ability to rapidly express queries and (b) inefficient query plans due to the lack of sophisticated optimization tools. This situation is likely to get worse with advances in high-throughput technologies that make it easier to quickly produce vast amounts of sequence data. The need for a declarative and efficient system to manage and query biological sequence data is urgent. To address this need, we designed the Periscope/SQ system. Periscope/SQ extends current relational systems to enable sophisticated queries on sequence data and can optimize and execute these queries efficiently. This thesis describes the problems that need to be solved to make it possible to build the Periscope/SQ system. First, we describe the algebraic framework which forms the backbone of Periscope/SQ. Second, we describe algorithms to construct large-scale suffix tree indexes for efficiently answering sequence queries. Third, we describe techniques for selectivity estimation and optimization in the context of queries over biological sequences. Next, we demonstrate how some of the techniques developed for Periscope/SQ can be applied to produce a powerful mining algorithm that we call FLAME. Finally, we describe GeneFinder, a biological application built on top of Periscope/SQ. GeneFinder is currently being used to predict the targets of transcription factors. Today, genomic and proteomic sequences are the most abundantly available source of high-quality biological data. By making it possible to declaratively and efficiently query vast amounts of sequence data, Periscope/SQ opens the door to vast improvements in the pace of bioinformatics research.

Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/55670/2/tatas_1.pd

Deep Blue Documents at the University of Michigan

On Provable Security for Complex Systems

Author: Achenbach Dirk
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016

We investigate the contribution of cryptographic proofs of security to a systematic security engineering process. To this end, we study how to model and prove security for concrete applications in three practical domains: computer networks, data outsourcing, and electronic voting. We conclude that cryptographic proofs of security can benefit a security engineering process in formulating requirements, influencing design, and identifying constraints for the implementation.

KITopen