
    Computing Covers under Substring Consistent Equivalence Relations

    Covers are a kind of quasiperiodicity in strings. A string $C$ is a cover of another string $T$ if every position of $T$ is inside some occurrence of $C$ in $T$. The shortest and longest cover arrays of $T$ hold the lengths of the shortest and longest covers of each prefix of $T$, respectively. The literature has proposed linear-time algorithms that compute the longest and shortest cover arrays taking border arrays as input. An equivalence relation $\approx$ over strings is called a substring consistent equivalence relation (SCER) iff $X \approx Y$ implies (1) $|X| = |Y|$ and (2) $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we generalize the notion of covers to SCERs and prove that the existing algorithms that compute the shortest cover array and the longest cover array of a string $T$ under the identity relation work for any SCER, taking the accordingly generalized border arrays as input. Comment: 16 pages
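
    As a quick illustration of the definitions above, here is a minimal Python sketch (not the paper's linear-time algorithm) that checks whether a string C covers a string T under a pluggable equivalence relation; the function name is_cover and the order-isomorphism example relation are illustrative choices, not taken from the paper.

```python
# Minimal sketch (not the paper's algorithm): checks whether C is a cover of T
# under a pluggable substring consistent equivalence relation (SCER).
# The SCER is passed as `equiv(X, Y)`; the identity relation is plain equality.

def is_cover(C, T, equiv=lambda x, y: x == y):
    m, n = len(C), len(T)
    if m == 0 or m > n:
        return False
    # Collect starting positions where an occurrence of C begins (under equiv).
    starts = [i for i in range(n - m + 1) if equiv(T[i:i + m], C)]
    if not starts:
        return False
    # Every position of T must lie inside some occurrence T[i : i + m).
    covered_up_to = 0  # first position of T not yet covered
    for i in starts:
        if i > covered_up_to:        # gap before this occurrence
            return False
        covered_up_to = max(covered_up_to, i + m)
    return covered_up_to >= n


def order_iso(x, y):
    # Example of a non-identity SCER: order-isomorphism (characters of x and y
    # appear in the same relative order).
    return [sorted(x).index(c) for c in x] == [sorted(y).index(c) for c in y]


if __name__ == "__main__":
    print(is_cover("aba", "abababa"))              # True under identity
    print(is_cover("ab", "abc", equiv=order_iso))  # True: "ab" ~ "bc"
```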

    Comparison of Knuth Morris Pratt and Boyer Moore algorithms for a web-based dictionary of computer terms

    Computer science students need a dictionary of computer terms to deepen their understanding of lectures. In developing the dictionary application, the fastest and most memory-efficient string matching algorithm must be chosen. The algorithms compared are the Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM) algorithms. Based on previous research, the KMP algorithm has better performance than other string matching algorithms. However, other studies have concluded that the BM algorithm has better performance. In addition, the Zhu-Takaoka algorithm is more efficient than the KMP algorithm in dictionary development, and the BM algorithm uses the same search concept as the Zhu-Takaoka algorithm. The fastest and most efficient algorithm in this study is determined using the Exponential Comparison Method (ECM). ECM sets criteria for search time and for memory usage during the search process. Comparing the two algorithms, the BM algorithm accounts for 37.9% of the search time and the KMP algorithm for 62.1%; for search memory usage, the KMP algorithm accounts for 50.6% and the BM algorithm for 49.4%. The total ECM score shows that the BM algorithm is 0.55% better than the KMP algorithm
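
    For reference, below is a minimal Python sketch of the two search strategies compared in the study; the KMP routine is standard, while the Boyer-Moore side is simplified to the Horspool bad-character variant rather than the full BM used in the paper, and the text and pattern are made-up examples.

```python
# Sketch of the two pattern-matching approaches compared in the study.
# KMP is shown in full; for Boyer-Moore only the bad-character (Horspool)
# simplification is sketched, not the full BM with the good-suffix rule.

def kmp_search(text, pattern):
    """Return all start indices of `pattern` in `text` (KMP)."""
    if not pattern:
        return []
    # Failure function: length of the longest proper border of pattern[:i+1].
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text, never moving backwards in it.
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


def horspool_search(text, pattern):
    """Return all start indices of `pattern` in `text` (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character shift table: distance from last occurrence to pattern end.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        i += shift.get(text[i + m - 1], m)
    return hits


if __name__ == "__main__":
    text = "boyer moore and knuth morris pratt are string matching algorithms"
    print(kmp_search(text, "string"))       # [39]
    print(horspool_search(text, "string"))  # [39]
```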

    Distillating knowledge about SCOTCH

    The design of the Scotch library for static mapping, graph partitioning and sparse matrix ordering is highly modular, so as to allow users and potential contributors to tweak it and easily add new static mapping, graph bipartitioning, vertex separation or graph ordering methods to match their particular needs. The purpose of this tutorial is twofold. It will start with a description of the interface of Scotch, presenting its visible objects and data structures. Then, we will step into the API mirror and have a look at the inside: the internal representation of graphs, mappings and orderings, and the basic sequential and parallel building blocks, such as graph induction and graph coarsening, which can be re-used by third-party software. As an example, we will show how to add a simple genetic algorithm routine to the graph bipartitioning methods
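
    The following standalone Python sketch gives a flavor of the kind of genetic algorithm bipartitioning routine mentioned above; it does not use the Scotch API or data structures, the graph is a plain edge list, and all function names and parameters are illustrative assumptions.

```python
import random

# Standalone sketch of a genetic algorithm for graph bipartitioning, in the
# spirit of the routine the tutorial adds to the bipartitioning methods.
# This does NOT use the Scotch API; the graph is a plain edge list.

def cut_size(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def fitness(edges, part, alpha=2.0):
    """Cut size plus a penalty for imbalance between the two parts."""
    imbalance = abs(sum(part) * 2 - len(part))
    return cut_size(edges, part) + alpha * imbalance


def ga_bipartition(n, edges, pop_size=40, generations=200, seed=0):
    rng = random.Random(seed)
    # Each individual is a 0/1 assignment of the n vertices to the two parts.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(edges, p))
        survivors = pop[: pop_size // 2]            # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1            # single-bit mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda p: fitness(edges, p))


if __name__ == "__main__":
    # 6-vertex graph: two triangles joined by a single edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    part = ga_bipartition(6, edges)
    print(part, "cut =", cut_size(edges, part))     # typically finds cut = 1
```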

    Lightweight BWT and LCP merging via the gap algorithm

    Recently, Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014] proposed a simple and elegant algorithm to merge the Burrows-Wheeler transforms of a collection of strings. In this paper we show that their algorithm can be improved so that, in addition to the BWTs, it also merges the Longest Common Prefix (LCP) arrays. Because of its small memory footprint, this new algorithm can be used for the final merge of BWT and LCP arrays computed by a faster but memory-intensive construction algorithm
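
    To make the objects involved concrete, here is a naive Python sketch that builds the BWT and LCP array of a string collection directly from the sorted set of all suffixes; this is emphatically not the Gap algorithm (which never materializes the suffixes and works in small space), only an illustration of what the merged output looks like. The sentinel convention and function name are assumptions for the sketch.

```python
# Naive reference sketch (NOT the Gap algorithm): builds the BWT and LCP
# array of a collection of strings directly from all their sorted suffixes.

SENTINEL = "\x00"   # assumed terminator, lexicographically smallest character

def bwt_lcp_of_collection(strings):
    texts = [s + SENTINEL for s in strings]
    # All suffixes of all strings, tagged with their string index as tiebreak.
    suffixes = [(t[i:], k, i) for k, t in enumerate(texts) for i in range(len(t))]
    suffixes.sort(key=lambda x: (x[0], x[1]))
    bwt, lcp, prev = [], [], None
    for suf, k, i in suffixes:
        # BWT character: the character cyclically preceding this suffix.
        bwt.append(texts[k][i - 1] if i > 0 else texts[k][-1])
        # LCP with the previous suffix in sorted order (0 for the first one).
        if prev is None:
            lcp.append(0)
        else:
            l = 0
            while l < len(prev) and l < len(suf) and prev[l] == suf[l]:
                l += 1
            lcp.append(l)
        prev = suf
    return "".join(bwt), lcp


if __name__ == "__main__":
    bwt, lcp = bwt_lcp_of_collection(["banana", "ananas"])
    print(bwt.replace(SENTINEL, "$"))
    print(lcp)
```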

    Towards a secure and efficient search over encrypted cloud data

    Cloud computing enables new types of services where computational and network resources are available online through the Internet. One of the most popular services of cloud computing is data outsourcing. For reasons of cost and convenience, public as well as private organizations can now outsource their large amounts of data to the cloud and enjoy the benefits of remote storage and management. At the same time, the confidentiality of remotely stored data on an untrusted cloud server is a big concern. To reduce these concerns, sensitive data, such as personal health records, emails, income tax and financial reports, are usually outsourced in encrypted form using well-known cryptographic techniques. Although encrypted data storage protects remote data from unauthorized access, it complicates some basic, yet essential, data utilization services such as plaintext keyword search. A simple solution of downloading the data, decrypting it and searching locally is clearly inefficient, since storing data in the cloud is meaningless unless it can be easily searched and utilized. Thus, cloud services should enable efficient search on encrypted data to provide the benefits of a first-class cloud computing environment. This dissertation is concerned with developing novel searchable encryption techniques that allow the cloud server to perform multi-keyword ranked search as well as substring search incorporating position information. We present results that we have accomplished in this area, including a comprehensive evaluation of existing solutions and searchable encryption schemes for ranked search and substring position search
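
    As a toy illustration of the general searchable-encryption idea (exact keyword match via deterministic tokens, not the ranked multi-keyword or substring-position schemes developed in the dissertation), consider the following Python sketch; the token/build_index/search names and the record contents are hypothetical.

```python
import hmac, hashlib, os

# Toy sketch of searchable symmetric encryption for exact keyword search.
# Each keyword is replaced by a deterministic HMAC token, so the server can
# match tokens against an outsourced index without learning the keywords.

def token(key, keyword):
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()


def build_index(key, documents):
    """documents: {doc_id: [keywords]} -> {token: [doc_ids]} held by the server."""
    index = {}
    for doc_id, keywords in documents.items():
        for kw in keywords:
            index.setdefault(token(key, kw), []).append(doc_id)
    return index


def search(index, trapdoor):
    # The server only ever sees opaque trapdoors, never plaintext keywords.
    return index.get(trapdoor, [])


if __name__ == "__main__":
    key = os.urandom(32)                         # client-side secret key
    docs = {"r1": ["tax", "income"], "r2": ["health", "income"]}
    index = build_index(key, docs)               # outsourced to the cloud
    print(search(index, token(key, "income")))   # ['r1', 'r2']
```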

    Probabilistic Record Linkage with Elliptic Curve Operations

    Federated query processing for an electronic health record infrastructure enables large epidemiology studies using data integrated from geographically dispersed medical institutions. However, government imposed privacy regulations prohibit disclosure of a patient's health record outside the context of clinical care, thereby making it difficult to determine which records correspond to the same entity in the process of query aggregation. Privacy-preserving record linkage is an actively pursued research area to facilitate the linkage of database records under the constraints of regulations that do not allow the linkage agents to learn sensitive identities of record owners. In earlier works, scalability has been shown to be possible using traditional cryptographic transformations such as Pohlig-Hellman ciphers, precomputations, data parallelism, and probabilistic key reuse approaches. This work proposes further optimizations to improve the runtime of a linkage exercise by adopting elliptic curve based transformations that are mostly additive and multiplicative, instead of exponentiations. The elliptic curve operations are used to improve the precomputation time, eliminate memory intensive comparisons of encrypted values and introduce data structures to detect negative comparisons. This method of record linkage is able to link data sets of the order of a million rows within 15 minutes. The approach has been gauged using synthetic and real world demographics data with parametric studies. We have also assessed the residual privacy risk of the proposed approach
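
    The commutative-blinding idea behind such elliptic curve transformations can be illustrated with a toy Python sketch: each party multiplies a curve point derived from a record identifier by its own secret scalar, and because scalar multiplication commutes, doubly blinded values can be compared without revealing the identifier. The tiny curve below is a standard textbook example chosen for illustration only; it is not the paper's parameters or protocol, and the scalars and "identifier point" are made up.

```python
# Toy elliptic curve arithmetic over F_17 (textbook curve y^2 = x^3 + 2x + 2),
# used only to show that scalar multiplication commutes, which is what lets
# two linkage parties blind the same identifier in either order.

P_MOD, A = 17, 2
G = (5, 1)            # a generator of this curve group (group order 19)

def inv(x):
    return pow(x, P_MOD - 2, P_MOD)      # modular inverse via Fermat

def add(p, q):
    """Add two points (None is the point at infinity)."""
    if p is None: return q
    if q is None: return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                       # opposite points cancel
    if p == q:
        s = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD      # tangent slope
    else:
        s = (y2 - y1) * inv((x2 - x1) % P_MOD) % P_MOD   # chord slope
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def mul(k, p):
    """Double-and-add scalar multiplication k*p."""
    result = None
    while k:
        if k & 1:
            result = add(result, p)
        p = add(p, p)
        k >>= 1
    return result

if __name__ == "__main__":
    a, b = 7, 5                           # the two parties' secret scalars
    point = mul(3, G)                     # stand-in for a hashed identifier
    assert mul(a, mul(b, point)) == mul(b, mul(a, point))
    print("doubly blinded values match regardless of blinding order")
```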

    Matching Statistics Speed up BWT Construction
