
    Prospects and limitations of full-text index structures in genome analysis

    The combination of incessant advances in sequencing technology, which produce large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article gives a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, their interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.

    Engineering Parallel String Sorting

    We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines. Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
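
For orientation, the multikey quicksort mentioned in this abstract can be sketched sequentially as follows. This is a minimal illustration of the textbook technique (three-way partitioning on one character position at a time), not the parallel variant the paper engineers. The function name `multikey_quicksort` and the middle-element pivot choice are assumptions made for this sketch.

```python
def multikey_quicksort(strings, depth=0):
    """Sort strings by three-way partitioning on the character at `depth`.

    Sequential, out-of-place sketch; hypothetical helper, not the paper's code.
    """
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # Strings that end before `depth` get -1 so they sort first.
        return ord(s[depth]) if depth < len(s) else -1

    # Assumption: the middle element is used as the pivot source.
    pivot = char_at(strings[len(strings) // 2])
    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]

    # Strings in `equal` share one more character, so advance one position.
    # If pivot is -1, they have all ended at `depth` and are identical.
    if pivot >= 0:
        equal = multikey_quicksort(equal, depth + 1)

    return (multikey_quicksort(less, depth)
            + equal
            + multikey_quicksort(greater, depth))
```

The design point the sketch shows is that strings in the middle partition are never compared again on characters they are already known to share, which is what makes the method attractive for inputs with long common prefixes.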

    Concepts and their Use for Modelling Objects and References in Programming Languages

    In this paper, a new programming construct, called a concept, is introduced. A concept is a pair of two classes: a reference class and an object class. Instances of the reference class are passed by value and are intended to represent objects. Instances of the object class are passed by reference. An approach to programming where concepts are used instead of classes is called concept-oriented programming (CoP). In CoP, objects are represented and accessed indirectly by means of references. The structure of concepts describes a hierarchical space with a virtual address system. The paper describes this new approach to programming, including such mechanisms as reference resolution, complex references, method interception, dual methods, life-cycle management, inheritance, and polymorphism. Comment: 43 pages. Related papers: http://conceptoriented.com

    Testing Embedded Memories in Telecommunication Systems

    Extensive system testing is now mandatory to achieve high product quality. Telecommunication systems are particularly sensitive to this requirement; to remain competitive, manufacturers need to combine reduced costs, shorter life cycles, advanced technologies, and high quality. Moreover, strict reliability constraints usually impose very low fault latencies and a high degree of fault detection for both permanent and transient faults. This article analyzes major problems related to testing complex telecommunication systems, with particular emphasis on their memory modules, which are often critical from the reliability point of view. In particular, advanced BIST-based solutions are analyzed, and two significant industrial case studies are presented.

    Lightweight LCP Construction for Very Large Collections of Strings

    The longest common prefix array is a very useful data structure that, combined with the suffix array and the Burrows-Wheeler transform, makes it possible to efficiently compute combinatorial properties of a string that are useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are large collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP) for the simultaneous computation of the longest common prefix array and the Burrows-Wheeler transform of a very large collection of strings of any length. The computation accesses data on disk only via sequential scans, and the total disk space usage never exceeds twice the output size, excluding the disk space required for the input. Moreover, extLCP can also compute the suffix array of the strings of the collection without needing any further data structure. Finally, we test our algorithm on real data and compare our results with another tool capable of working in external memory on large collections of strings. Comment: This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ The final version of this manuscript is in press in Journal of Discrete Algorithms
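
To make the three structures named in this abstract concrete, here is a small in-memory sketch for a single string. It is not extLCP: it ignores external memory and string collections entirely. It uses a naive suffix sort and the linear-time LCP construction of Kasai et al., which is a swapped-in technique. The function names, and the assumption that the text ends with a unique sentinel `$` smaller than every other character, belong to this illustration only.

```python
def suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    # Kasai-style construction: lcp[k] is the length of the longest common
    # prefix of the suffixes starting at sa[k-1] and sa[k]; lcp[0] is 0.
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h-1 characters
        else:
            h = 0
    return lcp


def bwt(text, sa):
    # The BWT takes the character preceding each suffix in sorted order;
    # index -1 wraps around to the sentinel for the suffix at position 0.
    return "".join(text[i - 1] for i in sa)


text = "banana$"  # assumption: '$' is a unique, smallest sentinel
sa = suffix_array(text)
print(sa)                   # [6, 5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 0, 1, 3, 0, 0, 2]
print(bwt(text, sa))        # annb$aa
```

The naive suffix sort costs far more than specialised constructions and holds everything in RAM, which is exactly the limitation that motivates external-memory approaches for very large collections.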