Search CORE

17,464 research outputs found

Recommended from our members

Parallel data compression

Author: Hirschberg Daniel S.
Stauffer Lynn M.
Publication venue: eScholarship, University of California
Publication date: 01/05/1991
Field of study

Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested

eScholarship - University of California

Handling Massive N-Gram Datasets Efficiently

Author: Pibiri Giulio Ermanno
Venturini Rossano
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/06/2018
Field of study

This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is computing the probability distribution of the strings from a large textual source. Regarding the problem of indexing, we describe compressed, exact and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art solutions and related software packages. In particular, we present a compressed trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Since the number of words following a given context is typically very small in natural languages, we lower the space of representation to compression levels that were never achieved before. Despite the significant savings in space, our technique introduces a negligible penalty at query time. Regarding the problem of estimation, we present a novel algorithm for estimating modified Kneser-Ney language models, that have emerged as the de-facto choice for language modeling in both academia and industry, thanks to their relatively low perplexity performance. Estimating such models from large textual sources poses the challenge of devising algorithms that make a parsimonious use of the disk. The state-of-the-art algorithm uses three sorting steps in external memory: we show an improved construction that requires only one sorting step thanks to exploiting the properties of the extracted n-gram strings. With an extensive experimental analysis performed on billions of n-grams, we show an average improvement of 4.5X on the total running time of the state-of-the-art approach.Comment: Published in ACM Transactions on Information Systems (TOIS), February 2019, Article No: 2

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

When private set intersection meets big data : an efficient and scalable protocol

Author: Changyu Dong
Hewlett Packard Labs
Liqun Chen
Zikai Wen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

Large scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when data being protected is big. Efficiency and scalability become critical criteria for privacy preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol, the security of the two variants is analyzed and proved in the semi-honest model and the malicious model respectively. A prototype of the basic protocol has been built. We report the result of performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

High accuracy computation with linear analog optical systems: a critical study

Author: Athale Ravindra A.
Psaltis Demetri
Publication venue: Optical Society of America
Publication date: 15/09/1986
Field of study

High accuracy optical processors based on the algorithm of digital multiplication by analog convolution (DMAC) are studied for ultimate performance limitations. Variations of optical processors that perform high accuracy vector-vector inner products are studied in abstract and with specific examples. It is concluded that the use of linear analog optical processors in performing digital computations with DMAC leads to impractical requirements for the accuracy of analog optical systems and the complexity of postprocessing electronics

Caltech Authors

Quantum Random Access Codes with Shared Randomness

Author: Ambainis Andris
Leung Debbie
Mancinska Laura
Ozols Maris
Publication venue
Publication date: 01/01/2008
Field of study

We consider a communication method, where the sender encodes n classical bits into 1 qubit and sends it to the receiver who performs a certain measurement depending on which of the initial bits must be recovered. This procedure is called (n,1,p) quantum random access code (QRAC) where p > 1/2 is its success probability. It is known that (2,1,0.85) and (3,1,0.79) QRACs (with no classical counterparts) exist and that (4,1,p) QRAC with p > 1/2 is not possible. We extend this model with shared randomness (SR) that is accessible to both parties. Then (n,1,p) QRAC with SR and p > 1/2 exists for any n > 0. We give an upper bound on its success probability (the known (2,1,0.85) and (3,1,0.79) QRACs match this upper bound). We discuss some particular constructions for several small values of n. We also study the classical counterpart of this model where n bits are encoded into 1 bit instead of 1 qubit and SR is used. We give an optimal construction for such codes and find their success probability exactly--it is less than in the quantum case. Interactive 3D quantum random access codes are available on-line at http://home.lanet.lv/~sd20008/racs .Comment: 51 pages, 33 figures. New sections added: 1.2, 3.5, 3.8.2, 5.4 (paper appears to be shorter because of smaller margins). Submitted as M.Math thesis at University of Waterloo by M

arXiv.org e-Print Archive

CiteSeerX

From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz

Author: Biswas Rupak
Hadfield Stuart
O'Gorman Bryan
Rieffel Eleanor G.
Venturelli Davide
Wang Zhihui
Publication venue: 'MDPI AG'
Publication date: 01/02/2019
Field of study

The next few years will be exciting as prototype universal quantum processors emerge, enabling implementation of a wider variety of algorithms. Of particular interest are quantum heuristics, which require experimentation on quantum hardware for their evaluation, and which have the potential to significantly expand the breadth of quantum computing applications. A leading candidate is Farhi et al.'s Quantum Approximate Optimization Algorithm, which alternates between applying a cost-function-based Hamiltonian and a mixing Hamiltonian. Here, we extend this framework to allow alternation between more general families of operators. The essence of this extension, the Quantum Alternating Operator Ansatz, is the consideration of general parametrized families of unitaries rather than only those corresponding to the time-evolution under a fixed local Hamiltonian for a time specified by the parameter. This ansatz supports the representation of a larger, and potentially more useful, set of states than the original formulation, with potential long-term impact on a broad array of application areas. For cases that call for mixing only within a desired subspace, refocusing on unitaries rather than Hamiltonians enables more efficiently implementable mixers than was possible in the original framework. Such mixers are particularly useful for optimization problems with hard constraints that must always be satisfied, defining a feasible subspace, and soft constraints whose violation we wish to minimize. More efficient implementation enables earlier experimental exploration of an alternating operator approach to a wide variety of approximate optimization, exact optimization, and sampling problems. Here, we introduce the Quantum Alternating Operator Ansatz, lay out design criteria for mixing operators, detail mappings for eight problems, and provide brief descriptions of mappings for diverse problems.Comment: 51 pages, 2 figures. Revised to match journal pape

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Memory-based parallel data output controller

Author: Niswander J. K.
Stattel R. J.
Publication venue
Publication date: 06/03/1984
Field of study

A memory-based parallel data output controller employs associative memories and memory mapping to decommutate multiple channels of telemetry data. The output controller contains a random access memory (RAM) which has at least as many address locations as there are channels. A word counter addresses the RAM which provides as it outputs an encoded peripheral device number and a MSB/LSB-first flag. The encoded device number and a bit counter address a second RAM which contains START and STOP flags to pick out the required bits from the specified word number. The LSB/MSB, START and STOP flags, along with the serial input digital data go to a control block which selectively fills a shift register used to drive the parallel data output bus

NASA Technical Reports Server

Side-Information Coding with Turbo Codes and its Application to Quantum Key Distribution

Author: Cerf Nicolas J.
Nguyen Kim-Chi
Van Assche Gilles
Publication venue
Publication date: 01/01/2004
Field of study

Turbo coding is a powerful class of forward error correcting codes, which can achieve performances close to the Shannon limit. The turbo principle can be applied to the problem of side-information source coding, and we investigate here its application to the reconciliation problem occurring in a continuous-variable quantum key distribution protocol.Comment: 3 pages, submitted to ISITA 200

arXiv.org e-Print Archive

CiteSeerX

DI-fusion