Search CORE

140,425 research outputs found

Prospects and limitations of full-text index structures in genome analysis

Author: Dawyndt Peter
De Baets Bernard
Fack Veerle
Vyverman Michaël
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012
Field of study

The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared

Ghent University Academic Bibliography

PubMed Central

Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit

Author: Gilbert Anna C.
Tropp Joel A.
Publication venue
Publication date: 01/01/2007
Field of study

This paper demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with

m

nonzero entries in dimension

d

given

{rm O}(m ln d)

random linear measurements of that signal. This is a massive improvement over previous results, which require

{rm O}(m^{2})

measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems

CiteSeerX

Caltech Authors

Proof of Convergence and Performance Analysis for Sparse Recovery via Zero-point Attracting Projection

Author: Chen Laming
Gu Yuantao
Wang Xiaohan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

A recursive algorithm named Zero-point Attracting Projection (ZAP) is proposed recently for sparse signal reconstruction. Compared with the reference algorithms, ZAP demonstrates rather good performance in recovery precision and robustness. However, any theoretical analysis about the mentioned algorithm, even a proof on its convergence, is not available. In this work, a strict proof on the convergence of ZAP is provided and the condition of convergence is put forward. Based on the theoretical analysis, it is further proved that ZAP is non-biased and can approach the sparse solution to any extent, with the proper choice of step-size. Furthermore, the case of inaccurate measurements in noisy scenario is also discussed. It is proved that disturbance power linearly reduces the recovery precision, which is predictable but not preventable. The reconstruction deviation of

p

-compressible signal is also provided. Finally, numerical simulations are performed to verify the theoretical analysis.Comment: 29 pages, 6 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

CiNCT: Compression and retrieval for massive vehicular trajectories via relative movement labeling

Author: Ishikawa Yoshiharu
Koide Satoshi
Tadokoro Yukihiro
Xiao Chuan
Publication venue
Publication date: 29/09/2017
Field of study

In this paper, we present a compressed data structure for moving object trajectories in a road network, which are represented as sequences of road edges. Unlike existing compression methods for trajectories in a network, our method supports pattern matching and decompression from an arbitrary position while retaining a high compressibility with theoretical guarantees. Specifically, our method is based on FM-index, a fast and compact data structure for pattern matching. To enhance the compression, we incorporate the sparsity of road networks into the data structure. In particular, we present the novel concepts of relative movement labeling and PseudoRank, each contributing to significant reductions in data size and query processing time. Our theoretical analysis and experimental studies reveal the advantages of our proposed method as compared to existing trajectory compression methods and FM-index variants

arXiv.org e-Print Archive

Crossref

Content Based Image Retrieval System Using NOHIS-tree

Author: Taileb Mounira
Publication venue
Publication date: 01/01/2012
Field of study

Content-based image retrieval (CBIR) has been one of the most important research areas in computer vision. It is a widely used method for searching images in huge databases. In this paper we present a CBIR system called NOHIS-Search. The system is based on the indexing technique NOHIS-tree. The two phases of the system are described and the performance of the system is illustrated with the image database ImagEval. NOHIS-Search system was compared to other two CBIR systems; the first that using PDDP indexing algorithm and the second system is that using the sequential search. Results show that NOHIS-Search system outperforms the two other systems.Comment: 6 pages, 10th International Conference on Advances in Mobile Computing & Multimedia (MoMM2012

arXiv.org e-Print Archive

Crossref

Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index

Author: Kianfar Kiavash
Luo Haochen
Pockrandt Christopher
Reinert Knut
Torkamandi Bahman
Publication venue
Publication date: 05/03/2018
Field of study

Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. Bidirectional indices have opened new possibilities in this regard allowing the search to start from anywhere within the pattern and extend in both directions. In particular, use of search schemes (partitioning the pattern and searching the pieces in certain orders with given bounds on errors) can yield significant speed-ups. However, finding optimal search schemes is a difficult combinatorial optimization problem. Here for the first time, we propose a mixed integer program (MIP) capable to solve this optimization problem for Hamming distance with given number of pieces. Our experiments show that the optimal search schemes found by our MIP significantly improve the performance of search in bidirectional FM-index upon previous ad-hoc solutions. For example, approximate matching of 101-bp Illumina reads (with two errors) becomes 35 times faster than standard backtracking. Moreover, despite being performed purely in the index, the running time of search using our optimal schemes (for up to two errors) is comparable to the best state-of-the-art aligners, which benefit from combining search in index with in-text verification using dynamic programming. As a result, we anticipate a full-fledged aligner that employs an intelligent combination of search in the bidirectional FM-index using our optimal search schemes and in-text verification using dynamic programming outperforms today's best aligners. The development of such an aligner, called FAMOUS (Fast Approximate string Matching using OptimUm search Schemes), is ongoing as our future work

arXiv.org e-Print Archive

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)