Search CORE

16 research outputs found

A New Class of Searchable and Provably Highly Compressible String Transformations

Author: Giancarlo Raffaele
Manzini Giovanni
Rosone Giovanna
Sciortino Marinella
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)
Publication date: 01/01/2019
Field of study

The Burrows-Wheeler Transform is a string transformation that plays a fundamental role for the design of self-indexing compressed data structures. Over the years, researchers have successfully extended this transformation outside the domains of strings. However, efforts to find non-trivial alternatives of the original, now 25 years old, Burrows-Wheeler string transformation have met limited success. In this paper we bring new lymph to this area by introducing a whole new family of transformations that have all the "myriad virtues" of the BWT: they can be computed and inverted in linear time, they produce provably highly compressible strings, and they support linear time pattern search directly on the transformed string. This new family is a special case of a more general class of transformations based on context adaptive alphabet orderings, a concept introduced here. This more general class includes also the Alternating BWT, another invertible string transforms recently introduced in connection with a generalization of Lyndon words

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Dagstuhl Research Online Publication Server

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Archivio istituzionale della ricerca - Università di Palermo

A new class of string transformations for compressed text indexing

Author: Giancarlo R.
Manzini G.
Restivo A.
Rosone G.
Sciortino M.
Publication venue: Elsevier Inc.
Publication date: 01/01/2023
Field of study

Introduced about thirty years ago in the field of data compression, the Burrows-Wheeler Transform (BWT) is a string transformation that, besides being a booster of the performance of memoryless compressors, plays a fundamental role in the design of efficient self-indexing compressed data structures. Finding other string transformations with the same remarkable properties of BWT has been a challenge for many researchers for a long time. In this paper, we introduce a whole class of new string transformations, called local orderings-based transformations, which have all the “myriad virtues” of BWT. As a further result, we show that such new string transformations can be used for the construction of the recently introduced r-index, which makes them suitable also for highly repetitive collections. In this context, we consider the problem of finding, for a given string, the BWT variant that minimizes the number of runs in the transformed string

Archivio istituzionale della ricerca - Università di Palermo

The Alternating BWT: An algorithmic perspective

Author: Giancarlo R.
Manzini G.
Restivo A.
Rosone G.
Sciortino M.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression. It has become a fundamental tool for designing self-indexing data structures, with important applications in several areas in science and engineering. The Alternating Burrows-Wheeler Transform (ABWT) is another transformation recently introduced in Gessel et al. (2012) [21] and studied in the field of Combinatorics on Words. It is analogous to the BWT, except that it uses an alternating lexicographical order instead of the usual one. Building on results in Giancarlo et al. (2018) [23], where we have shown that BWT and ABWT are part of a larger class of reversible transformations, here we provide a combinatorial and algorithmic study of the novel transform ABWT. We establish a deep analogy between BWT and ABWT by proving they are the only ones in the above mentioned class to be rank-invertible, a novel notion guaranteeing efficient invertibility. In addition, we show that the backward-search procedure can be efficiently generalized to the ABWT; this result implies that also the ABWT can be used as a basis for efficient compressed full text indices. Finally, we prove that the ABWT can be efficiently computed by using a combination of the Difference Cover suffix sorting algorithm (K\ue4rkk\ue4inen et al., 2006 [28]) with a linear time algorithm for finding the minimal cyclic rotation of a word with respect to the alternating lexicographical order

Archivio Istituzionale della Ricerca- Università del Piemonte Orientale

Variable-order reference-free variant discovery with the Burrows-Wheeler Transform

Author: Pisanti Nadia
Prezza Nicola
Rosone Giovanna
Sciortino Marinella
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

International audienceBackground: In [Prezza et al., AMB 2019], a new reference-free and alignment-free framework for the detection of SNPs was suggested and tested. The framework, based on the Burrows-Wheeler Transform (BWT), significantly improves sensitivity and precision of previous de Bruijn graphs based tools by overcoming several of their limitations, namely: (i) the need to establish a fixed value, usually small, for the order k, (ii) the loss of important information such as k-mer coverage and adjacency of k-mers within the same read, and (iii) bad performance in repeated regions longer than k bases. The preliminary tool, however, was able to identify only SNPs and it was too slow and memory consuming due to the use of additional heavy data structures (namely, the Suffix and LCP arrays), besides the BWT. Results: In this paper, we introduce a new algorithm and the corresponding tool ebwt2InDel that (i) extend the framework of [Prezza et al., AMB 2019] to detect also INDELs, and (ii) implements recent algorithmic findings that allow to perform the whole analysis using just the BWT, thus reducing the working space by one order of magnitude and allowing the analysis of full genomes. Finally, we describe a simple strategy for effectively parallelizing our tool for SNP detection only. On a 24-cores machine, the parallel version of our tool is one order of magnitude faster than the sequential one. The tool ebwt2InDel is available at github.com/nicolaprezza/ebwt2InDel. Conclusions: Results on a synthetic dataset covered at 30x (Human chromosome 1) show that our tool is indeed able to find up to 83% of the SNPs and 72% of the existing INDELs. These percentages considerably improve the 71% of SNPs and 51% of INDELs found by the state-of-the art tool based on de Bruijn graphs. We furthermore repor

INRIA a CCSD electronic archive server

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Algorithms and Lower Bounds for Ordering Problems on Strings

Author: Gibney Daniel
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2021
Field of study

This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text-indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transformation of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Mathematics & Statistics 2017 APR Self-Study & Documents

Author: University of New Mexico
Publication venue: UNM Digital Repository
Publication date: 01/04/2017
Field of study

UNM Mathematics & Statistics APR self-study report, review team report, response report, and initial action plan for Spring 2017, fulfilling requirements of the Higher Learning Commission

Efficient Data-Oblivious Computation

Author: Nayak Kartik Ravidas
Publication venue
Publication date: 01/01/2018
Field of study

The rapid increase in the amount of data stored by cloud servers has resulted in growing privacy concerns for users. First, although keeping data encrypted at all times is an attractive approach to privacy, encryption may preclude mining and learning useful patterns from data. Second, companies are unable to distribute proprietary programs to other parties without risking the loss of their private code when those programs are reverse engineered. A challenge underlying both those problems is that how data is accessed — even when that data is encrypted — can leak secret information. Oblivious RAM is a well studied cryptographic primitive that can be used to solve the underlying challenge of hiding data-access patterns. In this dissertation, we improve Oblivious RAMs and oblivious algorithms asymptotically. We then show how to apply our novel oblivious algorithms to build systems that enable privacy-preserving computation on encrypted data and program obfuscation. Specifically, the first part of this dissertation shows two efficient Oblivious RAM algorithms: 1) The first algorithm achieves sub-logarithmic bandwidth blowup while only incurring an inexpensive XOR computation for performing Private Information Retrieval operations, and 2) The second algorithm is the first perfectly-secure Oblivious Parallel RAM with

O(\log^3 N )

bandwidth blowup,

O((\log m + \log \log N)\log N)

depth blowup, and

O(1)

space blowup when the PRAM has

m

CPUs and stores

N

blocks of data. The second part of this dissertation describes two systems — HOP and GraphSC — that address the problem of computing on private data and the distribution of proprietary programs. HOP is a system that achieves simulation-secure obfuscation of RAM programs assuming secure hardware. It is the first prototype implementation of a provably secure virtual black-box (VBB) obfuscation scheme in any model under any assumptions. GraphSC is a system that allows cloud servers to run a class of data-mining and machine-learning algorithms over users’ data without learning anything about that data. GraphSC brings efficient, parallel secure computation to programmers by allowing them to express computation tasks using the GraphLab abstraction. It is backed by the first non-trivial parallel oblivious algorithms that outperform generic Oblivious RAMs

Digital Repository at the University of Maryland

Novel Approaches to Preserving Utility in Privacy Enhancing Technologies

Author: Mohammady Meisam
Publication venue
Publication date: 01/10/2020
Field of study

Significant amount of individual information are being collected and analyzed today through a wide variety of applications across different industries. While pursuing better utility by discovering knowledge from the data, an individual’s privacy may be compromised during an analysis: corporate networks monitor their online behavior, advertising companies collect and share their private information, and cybercriminals cause financial damages through security breaches. To this end, the data typically goes under certain anonymization techniques, e.g., CryptoPAn [Computer Networks’04], which replaces real IP addresses with prefix-preserving pseudonyms, or Differentially Private (DP) [ICALP’06] techniques which modify the answer to a query by adding a zero-mean noise distributed according to, e.g., a Laplace distribution. Unfortunately, most such techniques either are vulnerable to adversaries with prior knowledge, e.g., some network flows in the data, or require heavy data sanitization or perturbation, both of which may result in a significant loss of data utility. Therefore, the fundamental trade-off between privacy and utility (i.e., analysis accuracy) has attracted significant attention in various settings [ICALP’06, ACM CCS’14]. In line with this track of research, in this dissertation we aim to build utility-maximized and privacy-preserving tools for Internet communications. Such tools can be employed not only by dissidents and whistleblowers, but also by ordinary Internet users on a daily basis. To this end, we combine the development of practical systems with rigorous theoretical analysis, and incorporate techniques from various disciplines such as computer networking, cryptography, and statistical analysis. During the research, we proposed three different frameworks in some well-known settings outlined in the following. First, we propose The Multi-view Approach to preserve both privacy and utility in network trace anonymization, Second, The R2DP Approach which is a novel technique on differentially private mechanism design with maximized utility, and Third, The DPOD Approach that is a novel framework on privacy preserving Anomaly detection in the outsourcing setting

Concordia University Research Repository

Distributed Query Execution With Strong Privacy Guarantees

Author: Papadimitriou Antonios
Publication venue: ScholarlyCommons
Publication date: 01/01/2017
Field of study

As the Internet evolves, we find more applications that involve data originating from multiple sources, and spanning machines located all over the world. Such wide distribution of sensitive data increases the risk of information leakage, and may sometimes inhibit useful applications. For instance, even though banks could share data to detect systemic threats in the US financial network, they hesitate to do so because it can leak business secrets to their competitors. Encryption is an effective way to preserve data confidentiality, but eliminates all processing capabilities. Some approaches enable processing on encrypted data, but they usually have security weaknesses, such as data leakage through side-channels, or require expensive cryptographic computations. In this thesis, we present techniques that address the above limitations. First, we present an efficient symmetric homomorphic encryption scheme, which can aggregate encrypted data at an unprecedented scale. Second, we present a way to efficiently perform secure computations on distributed graphs. To accomplish this, we express large computations as a series of small, parallelizable vertex programs, whose state is safely transferred between vertices using a new cryptographic protocol. Finally, we propose using differential privacy to strengthen the security of trusted processors: noise is added to the side-channels, so that no adversary can extract useful information about individual users. Our experimental results suggest that the presented techniques achieve order-of-magnitude performance improvements over previous approaches, in scenarios such as the business intelligence application of a large corporation and the detection of systemic threats in the US financial network

ScholarlyCommons@Penn