17,464 research outputs found
Recommended from our members
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested
Handling Massive N-Gram Datasets Efficiently
This paper deals with the two fundamental problems concerning the handling of
large n-gram language models: indexing, that is compressing the n-gram strings
and associated satellite data without compromising their retrieval speed; and
estimation, that is computing the probability distribution of the strings from
a large textual source. Regarding the problem of indexing, we describe
compressed, exact and lossless data structures that achieve, at the same time,
high space reductions and no time degradation with respect to state-of-the-art
solutions and related software packages. In particular, we present a compressed
trie data structure in which each word following a context of fixed length k,
i.e., its preceding k words, is encoded as an integer whose value is
proportional to the number of words that follow such context. Since the number
of words following a given context is typically very small in natural
languages, we lower the space of representation to compression levels that were
never achieved before. Despite the significant savings in space, our technique
introduces a negligible penalty at query time. Regarding the problem of
estimation, we present a novel algorithm for estimating modified Kneser-Ney
language models, that have emerged as the de-facto choice for language modeling
in both academia and industry, thanks to their relatively low perplexity
performance. Estimating such models from large textual sources poses the
challenge of devising algorithms that make a parsimonious use of the disk. The
state-of-the-art algorithm uses three sorting steps in external memory: we show
an improved construction that requires only one sorting step thanks to
exploiting the properties of the extracted n-gram strings. With an extensive
experimental analysis performed on billions of n-grams, we show an average
improvement of 4.5X on the total running time of the state-of-the-art approach.Comment: Published in ACM Transactions on Information Systems (TOIS), February
2019, Article No: 2
When private set intersection meets big data : an efficient and scalable protocol
Large scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when data being protected is big. Efficiency and scalability become critical criteria for privacy preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol, the security of the two variants is analyzed and proved in the semi-honest model and the malicious model respectively. A prototype of the basic protocol has been built. We report the result of performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode
High accuracy computation with linear analog optical systems: a critical study
High accuracy optical processors based on the algorithm of digital multiplication by analog convolution (DMAC) are studied for ultimate performance limitations. Variations of optical processors that perform high accuracy vector-vector inner products are studied in abstract and with specific examples. It is concluded that the use of linear analog optical processors in performing digital computations with DMAC leads to impractical requirements for the accuracy of analog optical systems and the complexity of postprocessing electronics
Quantum Random Access Codes with Shared Randomness
We consider a communication method, where the sender encodes n classical bits
into 1 qubit and sends it to the receiver who performs a certain measurement
depending on which of the initial bits must be recovered. This procedure is
called (n,1,p) quantum random access code (QRAC) where p > 1/2 is its success
probability. It is known that (2,1,0.85) and (3,1,0.79) QRACs (with no
classical counterparts) exist and that (4,1,p) QRAC with p > 1/2 is not
possible.
We extend this model with shared randomness (SR) that is accessible to both
parties. Then (n,1,p) QRAC with SR and p > 1/2 exists for any n > 0. We give an
upper bound on its success probability (the known (2,1,0.85) and (3,1,0.79)
QRACs match this upper bound). We discuss some particular constructions for
several small values of n.
We also study the classical counterpart of this model where n bits are
encoded into 1 bit instead of 1 qubit and SR is used. We give an optimal
construction for such codes and find their success probability exactly--it is
less than in the quantum case.
Interactive 3D quantum random access codes are available on-line at
http://home.lanet.lv/~sd20008/racs .Comment: 51 pages, 33 figures. New sections added: 1.2, 3.5, 3.8.2, 5.4 (paper
appears to be shorter because of smaller margins). Submitted as M.Math thesis
at University of Waterloo by M
From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz
The next few years will be exciting as prototype universal quantum processors
emerge, enabling implementation of a wider variety of algorithms. Of particular
interest are quantum heuristics, which require experimentation on quantum
hardware for their evaluation, and which have the potential to significantly
expand the breadth of quantum computing applications. A leading candidate is
Farhi et al.'s Quantum Approximate Optimization Algorithm, which alternates
between applying a cost-function-based Hamiltonian and a mixing Hamiltonian.
Here, we extend this framework to allow alternation between more general
families of operators. The essence of this extension, the Quantum Alternating
Operator Ansatz, is the consideration of general parametrized families of
unitaries rather than only those corresponding to the time-evolution under a
fixed local Hamiltonian for a time specified by the parameter. This ansatz
supports the representation of a larger, and potentially more useful, set of
states than the original formulation, with potential long-term impact on a
broad array of application areas. For cases that call for mixing only within a
desired subspace, refocusing on unitaries rather than Hamiltonians enables more
efficiently implementable mixers than was possible in the original framework.
Such mixers are particularly useful for optimization problems with hard
constraints that must always be satisfied, defining a feasible subspace, and
soft constraints whose violation we wish to minimize. More efficient
implementation enables earlier experimental exploration of an alternating
operator approach to a wide variety of approximate optimization, exact
optimization, and sampling problems. Here, we introduce the Quantum Alternating
Operator Ansatz, lay out design criteria for mixing operators, detail mappings
for eight problems, and provide brief descriptions of mappings for diverse
problems.Comment: 51 pages, 2 figures. Revised to match journal pape
Memory-based parallel data output controller
A memory-based parallel data output controller employs associative memories and memory mapping to decommutate multiple channels of telemetry data. The output controller contains a random access memory (RAM) which has at least as many address locations as there are channels. A word counter addresses the RAM which provides as it outputs an encoded peripheral device number and a MSB/LSB-first flag. The encoded device number and a bit counter address a second RAM which contains START and STOP flags to pick out the required bits from the specified word number. The LSB/MSB, START and STOP flags, along with the serial input digital data go to a control block which selectively fills a shift register used to drive the parallel data output bus
Side-Information Coding with Turbo Codes and its Application to Quantum Key Distribution
Turbo coding is a powerful class of forward error correcting codes, which can
achieve performances close to the Shannon limit. The turbo principle can be
applied to the problem of side-information source coding, and we investigate
here its application to the reconciliation problem occurring in a
continuous-variable quantum key distribution protocol.Comment: 3 pages, submitted to ISITA 200
- ā¦