140,425 research outputs found

    Prospects and limitations of full-text index structures in genome analysis

    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
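As a rough illustration of what a full-text index structure does, the sketch below builds a suffix array, one of the classic index structures, and uses binary search to locate a pattern. The function names and the naive construction are assumptions made for this example; they are not taken from the article, and practical tools use far more memory- and time-efficient constructions.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text` in lexicographic order."""
    # Naive construction by sorting the suffixes directly; fine for a small example.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of `pattern` in `text` using binary search."""
    m = len(pattern)

    def prefix(rank):
        # The first m characters of the suffix with the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes ranked in [first, lo) start with the pattern.
    return sorted(suffix_array[first:lo])
```

The index is built once per text and then reused for every query, which is the memory-time trade-off the abstract refers to: extra space for the array in exchange for logarithmic-time lookups.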

    Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit

    This paper demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with $m$ nonzero entries in dimension $d$ given ${\rm O}(m \ln d)$ random linear measurements of that signal. This is a massive improvement over previous results, which require ${\rm O}(m^{2})$ measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.
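The greedy structure of OMP can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the helper names (`dot`, `solve_linear_system`, `gaussian_columns`, `omp`) are hypothetical, the least-squares step is solved through the normal equations, and the selected columns are assumed to be linearly independent.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def gaussian_columns(num_measurements, dimension, seed=0):
    """Hypothetical helper: columns of a random Gaussian measurement matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_measurements)]
            for _ in range(dimension)]


def omp(columns, y, sparsity):
    """Greedy recovery sketch; columns[j] is the j-th column of the measurement matrix."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        # Greedy step: pick the unused column most correlated with the residual.
        scores = [abs(dot(col, residual)) for col in columns]
        for j in support:
            scores[j] = -1.0
        best = max(range(len(columns)), key=lambda j: scores[j])
        support.append(best)
        # Least-squares fit of y on the chosen columns (normal equations).
        gram = [[dot(columns[i], columns[j]) for j in support] for i in support]
        rhs = [dot(columns[i], y) for i in support]
        coeffs = solve_linear_system(gram, rhs)
        # The new residual is orthogonal to every chosen column.
        residual = [yk - sum(c * columns[j][k] for c, j in zip(coeffs, support))
                    for k, yk in enumerate(y)]
    x = [0.0] * len(columns)
    for j, c in zip(support, coeffs):
        x[j] = c
    return x
```

To experiment, one could form `y` from a sparse signal and the output of `gaussian_columns`, then call `omp(columns, y, m)`; whether recovery succeeds depends on the number of measurements, as the abstract's bound indicates.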

    Proof of Convergence and Performance Analysis for Sparse Recovery via Zero-point Attracting Projection

    A recursive algorithm named Zero-point Attracting Projection (ZAP) was recently proposed for sparse signal reconstruction. Compared with the reference algorithms, ZAP demonstrates rather good performance in recovery precision and robustness. However, no theoretical analysis of the algorithm, not even a proof of its convergence, is available. In this work, a strict proof of the convergence of ZAP is provided and the condition of convergence is put forward. Based on the theoretical analysis, it is further proved that ZAP is non-biased and can approach the sparse solution to any extent, with the proper choice of step-size. Furthermore, the case of inaccurate measurements in a noisy scenario is also discussed. It is proved that disturbance power linearly reduces the recovery precision, which is predictable but not preventable. The reconstruction deviation of a $p$-compressible signal is also provided. Finally, numerical simulations are performed to verify the theoretical analysis. Comment: 29 pages, 6 figures

    CiNCT: Compression and retrieval for massive vehicular trajectories via relative movement labeling

    In this paper, we present a compressed data structure for moving object trajectories in a road network, which are represented as sequences of road edges. Unlike existing compression methods for trajectories in a network, our method supports pattern matching and decompression from an arbitrary position while retaining a high compressibility with theoretical guarantees. Specifically, our method is based on the FM-index, a fast and compact data structure for pattern matching. To enhance the compression, we incorporate the sparsity of road networks into the data structure. In particular, we present the novel concepts of relative movement labeling and PseudoRank, each contributing to significant reductions in data size and query processing time. Our theoretical analysis and experimental studies reveal the advantages of our proposed method as compared to existing trajectory compression methods and FM-index variants.
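For readers unfamiliar with the FM-index that this method builds on, the following sketch shows plain backward search for counting pattern occurrences. It is the textbook FM-index only, not the paper's relative movement labeling or PseudoRank; the function names are made up for this example, each character is assumed to stand for one road edge, and the terminator `$` is assumed to be absent from the text and smaller than every other symbol.

```python
def build_fm_index(text, terminator="$"):
    """Build an uncompressed FM-index: first-column offsets and occurrence counts."""
    # Assumption: the terminator does not occur in text and sorts before all symbols.
    text = text + terminator
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = [text[i - 1] for i in suffix_array]
    alphabet = sorted(set(text))
    # first[c] = number of characters in text that are strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count occurrences of `pattern` by backward search over the index."""
    first, occ, n = index
    lo, hi = 0, n  # half-open interval of suffix-array ranks
    for c in reversed(pattern):
        if c not in first:
            return 0
        # Narrow the interval to suffixes that start with c followed by the part matched so far.
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

The occurrence table here is stored in full for clarity. Compressed variants replace it with succinct rank structures, which is where most of the space savings come from.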

    Content Based Image Retrieval System Using NOHIS-tree

    Content-based image retrieval (CBIR) has been one of the most important research areas in computer vision. It is a widely used method for searching images in huge databases. In this paper we present a CBIR system called NOHIS-Search. The system is based on the indexing technique NOHIS-tree. The two phases of the system are described and the performance of the system is illustrated with the image database ImagEval. The NOHIS-Search system was compared with two other CBIR systems: the first uses the PDDP indexing algorithm and the second uses sequential search. Results show that the NOHIS-Search system outperforms the other two systems. Comment: 6 pages, 10th International Conference on Advances in Mobile Computing & Multimedia (MoMM2012)

    Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index

    Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. Bidirectional indices have opened new possibilities in this regard, allowing the search to start from anywhere within the pattern and extend in both directions. In particular, use of search schemes (partitioning the pattern and searching the pieces in certain orders with given bounds on errors) can yield significant speed-ups. However, finding optimal search schemes is a difficult combinatorial optimization problem. Here, for the first time, we propose a mixed integer program (MIP) capable of solving this optimization problem for Hamming distance with a given number of pieces. Our experiments show that the optimal search schemes found by our MIP significantly improve the performance of search in the bidirectional FM-index over previous ad hoc solutions. For example, approximate matching of 101-bp Illumina reads (with two errors) becomes 35 times faster than standard backtracking. Moreover, despite being performed purely in the index, the running time of search using our optimal schemes (for up to two errors) is comparable to the best state-of-the-art aligners, which benefit from combining search in the index with in-text verification using dynamic programming. As a result, we anticipate that a full-fledged aligner that employs an intelligent combination of search in the bidirectional FM-index using our optimal search schemes and in-text verification using dynamic programming will outperform today's best aligners. The development of such an aligner, called FAMOUS (Fast Approximate string Matching using OptimUm search Schemes), is ongoing as our future work.
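The idea of partitioning a pattern into pieces can be illustrated with the much simpler pigeonhole filter: if a pattern is split into k + 1 pieces and matches with at most k mismatches, at least one piece must match exactly. The sketch below is not the paper's MIP-derived search schemes and does not use a bidirectional FM-index; it searches the pieces with `str.find` and verifies candidates in the text by Hamming distance. All function names are assumptions made for this example.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def approximate_matches(text, pattern, max_errors):
    """Start positions where `pattern` matches `text` with at most `max_errors` mismatches."""
    m = len(pattern)
    pieces = max_errors + 1
    # Split the pattern into max_errors + 1 pieces of nearly equal length.
    bounds = [i * m // pieces for i in range(pieces + 1)]
    candidates = set()
    for p in range(pieces):
        start, end = bounds[p], bounds[p + 1]
        piece = pattern[start:end]
        # Every exact occurrence of a piece proposes one candidate alignment.
        pos = text.find(piece)
        while pos != -1:
            candidate = pos - start
            if 0 <= candidate <= len(text) - m:
                candidates.add(candidate)
            pos = text.find(piece, pos + 1)
    # In-text verification: keep candidates within the error bound.
    return sorted(c for c in candidates
                  if hamming_distance(text[c:c + m], pattern) <= max_errors)


def brute_force_matches(text, pattern, max_errors):
    """Reference implementation that checks every alignment."""
    m = len(pattern)
    return [c for c in range(len(text) - m + 1)
            if hamming_distance(text[c:c + m], pattern) <= max_errors]
```

Search schemes generalize this filter by also bounding how many errors are allowed while each piece is being searched, so that most of the pruning happens inside the index rather than during verification.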