1,943 research outputs found

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

    An Open Guide to Data Structures and Algorithms

    Get PDF
    This textbook serves as a gentle introduction for undergraduates to theoretical concepts in data structures and algorithms in computer science while providing coverage of practical implementation (coding) issues. The field of computer science (CS) supports a multitude of essential technologies in science, engineering, and communication as a social medium. The varied and interconnected nature of computer technology permeates countless career paths making CS a popular and growing major program. Mastery of the science behind computer science relies on an understanding of the theory of algorithms and data structures. These concepts underlie the fundamental tradeoffs that dictate performance in terms of speed, memory usage, and programming complexity that separate novice programmers from professional practitioners

    Cross-Document Pattern Matching

    Get PDF
    We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time bounds that either do not depend at all on the pattern size or depend on it in a very limited way (doubly logarithmic). As a side result, we propose an improved solution to the weighted level ancestor problem

    Parallel Sparse Matrix-Matrix Multiplication

    Get PDF
    The thesis investigates the BLAS-3 routine of sparse matrix-matrix multiplication (SpGEMM) based on the outer product method. Sev- eral algorithmic approaches have been implemented and empirically an- alyzed. The experiments have shown that an algorithm presented by Gustavson [22] outperforms other alternatives. In this work we propose optimization techniques that improve the scalability and the cache efficiency of the Gustavson’s algorithm for large matrices. Our approach succeeded to reduce the cache misses by more than a factor of five and to improve the net running time by 30% with some instances. The thesis also presents an algorithm for flops estima- tion, which can be used to determine an upper bound for the density of the result matrix. Furthermore, the work analyzes and empirically evaluates techniques for parallelization of the multiplication in a shared memory model by using Intel TBB and OpenMP. We investigate the cache efficiency of the algorithm in a parallel setting and compare several approaches for load balancing of the computation

    Advancing Protocol Diversity in Network Security Monitoring

    Get PDF
    With information technology entering new fields and levels of deployment, e.g., in areas of energy, mobility, and production, network security monitoring needs to be able to cope with those environments and their evolution. However, state-of-the-art Network Security Monitors (NSMs) typically lack the necessary flexibility to handle the diversity of the packet-oriented layers below the abstraction of TCP/IP connections. In this work, we advance the software architecture of a network security monitor to facilitate the flexible integration of lower-layer protocol dissectors while maintaining required performance levels. We proceed in three steps: First, we identify the challenges for modular packet-level analysis, present a refined NSM architecture to address them and specify requirements for its implementation. Second, we evaluate the performance of data structures to be used for protocol dispatching, implement the proposed design into the popular open-source NSM Zeek and assess its impact on the monitor performance. Our experiments show that hash-based data structures for dispatching introduce a significant overhead while array-based approaches qualify for practical application. Finally, we demonstrate the benefits of the proposed architecture and implementation by migrating Zeek\u27s previously hard-coded stack of link and internet layer protocols to the new interface. Furthermore, we implement dissectors for non-IP based industrial communication protocols and leverage them to realize attack detection strategies from recent applied research. We integrate the proposed architecture into the Zeek open-source project and publish the implementation to support the scientific community as well as practitioners, promoting the transfer of research into practice
    • …
    corecore