17,464 research outputs found

    Handling Massive N-Gram Datasets Efficiently

    Get PDF
    This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is computing the probability distribution of the strings from a large textual source. Regarding the problem of indexing, we describe compressed, exact and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art solutions and related software packages. In particular, we present a compressed trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Since the number of words following a given context is typically very small in natural languages, we lower the space of representation to compression levels that were never achieved before. Despite the significant savings in space, our technique introduces a negligible penalty at query time. Regarding the problem of estimation, we present a novel algorithm for estimating modified Kneser-Ney language models, that have emerged as the de-facto choice for language modeling in both academia and industry, thanks to their relatively low perplexity performance. Estimating such models from large textual sources poses the challenge of devising algorithms that make a parsimonious use of the disk. The state-of-the-art algorithm uses three sorting steps in external memory: we show an improved construction that requires only one sorting step thanks to exploiting the properties of the extracted n-gram strings. With an extensive experimental analysis performed on billions of n-grams, we show an average improvement of 4.5X on the total running time of the state-of-the-art approach.Comment: Published in ACM Transactions on Information Systems (TOIS), February 2019, Article No: 2

    When private set intersection meets big data : an efficient and scalable protocol

    Get PDF
    Large scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when data being protected is big. Efficiency and scalability become critical criteria for privacy preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol, the security of the two variants is analyzed and proved in the semi-honest model and the malicious model respectively. A prototype of the basic protocol has been built. We report the result of performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode

    High accuracy computation with linear analog optical systems: a critical study

    Get PDF
    High accuracy optical processors based on the algorithm of digital multiplication by analog convolution (DMAC) are studied for ultimate performance limitations. Variations of optical processors that perform high accuracy vector-vector inner products are studied in abstract and with specific examples. It is concluded that the use of linear analog optical processors in performing digital computations with DMAC leads to impractical requirements for the accuracy of analog optical systems and the complexity of postprocessing electronics

    Quantum Random Access Codes with Shared Randomness

    Full text link
    We consider a communication method, where the sender encodes n classical bits into 1 qubit and sends it to the receiver who performs a certain measurement depending on which of the initial bits must be recovered. This procedure is called (n,1,p) quantum random access code (QRAC) where p > 1/2 is its success probability. It is known that (2,1,0.85) and (3,1,0.79) QRACs (with no classical counterparts) exist and that (4,1,p) QRAC with p > 1/2 is not possible. We extend this model with shared randomness (SR) that is accessible to both parties. Then (n,1,p) QRAC with SR and p > 1/2 exists for any n > 0. We give an upper bound on its success probability (the known (2,1,0.85) and (3,1,0.79) QRACs match this upper bound). We discuss some particular constructions for several small values of n. We also study the classical counterpart of this model where n bits are encoded into 1 bit instead of 1 qubit and SR is used. We give an optimal construction for such codes and find their success probability exactly--it is less than in the quantum case. Interactive 3D quantum random access codes are available on-line at http://home.lanet.lv/~sd20008/racs .Comment: 51 pages, 33 figures. New sections added: 1.2, 3.5, 3.8.2, 5.4 (paper appears to be shorter because of smaller margins). Submitted as M.Math thesis at University of Waterloo by M

    From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz

    Full text link
    The next few years will be exciting as prototype universal quantum processors emerge, enabling implementation of a wider variety of algorithms. Of particular interest are quantum heuristics, which require experimentation on quantum hardware for their evaluation, and which have the potential to significantly expand the breadth of quantum computing applications. A leading candidate is Farhi et al.'s Quantum Approximate Optimization Algorithm, which alternates between applying a cost-function-based Hamiltonian and a mixing Hamiltonian. Here, we extend this framework to allow alternation between more general families of operators. The essence of this extension, the Quantum Alternating Operator Ansatz, is the consideration of general parametrized families of unitaries rather than only those corresponding to the time-evolution under a fixed local Hamiltonian for a time specified by the parameter. This ansatz supports the representation of a larger, and potentially more useful, set of states than the original formulation, with potential long-term impact on a broad array of application areas. For cases that call for mixing only within a desired subspace, refocusing on unitaries rather than Hamiltonians enables more efficiently implementable mixers than was possible in the original framework. Such mixers are particularly useful for optimization problems with hard constraints that must always be satisfied, defining a feasible subspace, and soft constraints whose violation we wish to minimize. More efficient implementation enables earlier experimental exploration of an alternating operator approach to a wide variety of approximate optimization, exact optimization, and sampling problems. Here, we introduce the Quantum Alternating Operator Ansatz, lay out design criteria for mixing operators, detail mappings for eight problems, and provide brief descriptions of mappings for diverse problems.Comment: 51 pages, 2 figures. Revised to match journal pape

    Memory-based parallel data output controller

    Get PDF
    A memory-based parallel data output controller employs associative memories and memory mapping to decommutate multiple channels of telemetry data. The output controller contains a random access memory (RAM) which has at least as many address locations as there are channels. A word counter addresses the RAM which provides as it outputs an encoded peripheral device number and a MSB/LSB-first flag. The encoded device number and a bit counter address a second RAM which contains START and STOP flags to pick out the required bits from the specified word number. The LSB/MSB, START and STOP flags, along with the serial input digital data go to a control block which selectively fills a shift register used to drive the parallel data output bus

    Side-Information Coding with Turbo Codes and its Application to Quantum Key Distribution

    Full text link
    Turbo coding is a powerful class of forward error correcting codes, which can achieve performances close to the Shannon limit. The turbo principle can be applied to the problem of side-information source coding, and we investigate here its application to the reconciliation problem occurring in a continuous-variable quantum key distribution protocol.Comment: 3 pages, submitted to ISITA 200
    • ā€¦
    corecore