
    Pseudo-Deterministic Streaming

    A pseudo-deterministic algorithm is a (randomized) algorithm which, when run multiple times on the same input, with high probability outputs the same result on all executions. Classic streaming algorithms, such as those for finding heavy hitters, approximate counting, ℓ_2 approximation, and finding a nonzero entry in a vector (for turnstile algorithms), are not pseudo-deterministic. For example, in the instance of finding a nonzero entry in a vector, for any known low-space algorithm A, there exists a stream x so that running A twice on x (using different randomness) would with high probability result in two different entries as the output. In this work, we study whether it is inherent that these algorithms output different values on different executions. That is, we ask whether these problems have low-memory pseudo-deterministic algorithms. For instance, we show that there is no low-memory pseudo-deterministic algorithm for finding a nonzero entry in a vector (given in a turnstile fashion), and also that there is no low-dimensional pseudo-deterministic sketching algorithm for ℓ_2 norm estimation. We also exhibit problems which do have low-memory pseudo-deterministic algorithms but no low-memory deterministic algorithm, such as outputting a nonzero row of a matrix, or outputting a basis for the row-span of a matrix. We also investigate multi-pseudo-deterministic algorithms: algorithms which with high probability output one of a few options. We show the first lower bounds for such algorithms. This implies that there are streaming problems such that every low-space algorithm for the problem must have inputs where there are many valid outputs, all with a significant probability of being outputted.
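    As a toy illustration of the definition (not one of the low-space turnstile algorithms the paper studies), the sketch below contrasts a randomized procedure for finding a nonzero entry, which typically returns different indices on different runs, with a trivially pseudo-deterministic one that always returns the same index; the vector, seeds, and function names are illustrative assumptions.

```python
import random

def find_nonzero_randomized(x, seed):
    """Sample uniformly random coordinates until a nonzero entry is found.
    Toy illustration only: assumes random access to the final vector x,
    unlike the low-space turnstile algorithms studied in the paper."""
    rng = random.Random(seed)
    n = len(x)
    while True:
        i = rng.randrange(n)
        if x[i] != 0:
            return i  # different seeds typically return different indices

def find_nonzero_pseudo_deterministic(x, seed):
    """A trivially pseudo-deterministic answer: always the smallest nonzero
    index, regardless of the randomness used (ignores the seed entirely)."""
    return next(i for i, v in enumerate(x) if v != 0)

x = [0, 3, 0, 7, 0, 2, 0, 9]  # many valid answers: indices 1, 3, 5, 7
print(find_nonzero_randomized(x, seed=1),
      find_nonzero_randomized(x, seed=2))          # usually two different indices
print(find_nonzero_pseudo_deterministic(x, seed=1),
      find_nonzero_pseudo_deterministic(x, seed=2))  # always the same index
```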

    Dynamics on expanding spaces: modeling the emergence of novelties

    Novelties are part of our daily lives. We constantly adopt new technologies, conceive new ideas, meet new people, and experiment with new situations. Occasionally, we as individuals, in a complicated cognitive and sometimes fortuitous process, come up with something that is not only new to us, but to our entire society, so that what is a personal novelty can turn into an innovation at a global level. Innovations occur throughout social, biological and technological systems and, though we perceive them as a very natural ingredient of our human experience, little is known about the processes determining their emergence. Still, the statistical occurrence of innovations shows striking regularities that represent a starting point to get a deeper insight into the whole phenomenology. This paper represents a small step in that direction, focusing on reviewing the scientific attempts to effectively model the emergence of the new and its regularities, with an emphasis on more recent contributions: from the plain Simon's model dating back to the 1950s, to the newest model of Polya's urn with triggering of one novelty by another. What seems to be key in the successful modelling schemes proposed so far is the idea of looking at evolution as a path in a complex space, whether physical, conceptual, biological, or technological, whose structure and topology get continuously reshaped and expanded by the occurrence of the new. Mathematically, it is very interesting to look at the consequences of the interplay between the "actual" and the "possible", and this is the aim of this short review.
    Comment: 25 pages, 10 figures
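    For concreteness, the following is a minimal simulation sketch of a Polya urn with triggering in the spirit of the model discussed above; the reinforcement parameter rho, the triggering parameter nu, and the exact update rule (add nu + 1 balls of brand-new colours when a colour is drawn for the first time) follow the commonly cited formulation and are assumptions rather than a transcription of the review.

```python
import random

def urn_with_triggering(steps, rho=4, nu=2, n0=10, seed=0):
    """Simulate a Polya urn with triggering (sketch of the commonly cited rule):
    draw a ball, reinforce its colour with `rho` extra copies, and the first
    time a colour is drawn (a novelty) add `nu + 1` balls of brand-new colours."""
    rng = random.Random(seed)
    urn = list(range(n0))          # initial colours 0 .. n0-1
    next_colour = n0
    seen = set()
    sequence = []
    for _ in range(steps):
        c = rng.choice(urn)
        sequence.append(c)
        urn.extend([c] * rho)      # reinforcement of the drawn colour
        if c not in seen:          # the drawn colour is a novelty
            seen.add(c)
            urn.extend(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1  # the space of the "possible" expands
    return sequence, len(seen)

seq, distinct = urn_with_triggering(steps=10_000)
print(f"{distinct} distinct colours after {len(seq)} draws")  # Heaps'-like sublinear growth
```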

    Weighted Reservoir Sampling from Distributed Streams

    We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, is well studied, and admits tight upper and lower bounds on message complexity. For weighted sampling with replacement, there is a simple reduction to unweighted sampling with replacement. However, in many applications the stream has only a few heavy items which may dominate a random sample when chosen with replacement. Weighted sampling without replacement (weighted SWOR) eludes this issue, since such heavy items can be sampled at most once. In this work, we present the first message-optimal algorithm for weighted SWOR from a distributed stream. Our algorithm also has optimal space and time complexity. As an application of our algorithm for weighted SWOR, we derive the first distributed streaming algorithms for tracking heavy hitters with residual error. Here the goal is to identify stream items that contribute significantly to the residual stream, once the heaviest items are removed. Residual heavy hitters generalize the notion of ℓ_1 heavy hitters and are important in streams that have a skewed distribution of weights. In addition to the upper bound, we also provide a lower bound on the message complexity that is nearly tight, up to a log(1/ε) factor. Finally, we use our weighted sampling algorithm to improve the message complexity of distributed L_1 tracking, also known as count tracking, which is a widely studied problem in distributed streaming. We also derive a tight message lower bound, which closes the message complexity of this fundamental problem.
    Comment: To appear in PODS 201
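    The paper's message-optimal distributed algorithm is not reproduced here; the centralized sketch below only illustrates what a weighted SWOR sample is, using the classic key-based reservoir method of Efraimidis and Spirakis (assign each item the key u^(1/w) with u uniform in (0, 1) and keep the k largest keys). The stream contents and parameter names are illustrative.

```python
import heapq
import random

def weighted_swor(stream, k, seed=0):
    """Weighted sampling without replacement over a stream of (item, weight)
    pairs, via the Efraimidis-Spirakis key method: each item gets the key
    u**(1/w), and the k items with the largest keys form the sample.
    Centralized sketch, not the paper's message-optimal distributed algorithm."""
    rng = random.Random(seed)
    heap = []                                  # min-heap of (key, item)
    for item, w in stream:
        if w <= 0:
            continue                           # skip non-positive weights
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

stream = [("a", 10.0), ("b", 1.0), ("c", 1.0), ("d", 5.0), ("e", 0.5)]
print(weighted_swor(stream, k=2))              # heavy items appear at most once
```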

    Probabilistic linkage without personal information successfully linked national clinical datasets

    BACKGROUND: Probabilistic linkage can link patients from different clinical databases without the need for personal information. If accurate linkage can be achieved, it would accelerate the use of linked datasets to address important clinical and public health questions. OBJECTIVE: We developed a step-by-step process for probabilistic linkage of national clinical and administrative datasets without personal information, and validated it against deterministic linkage using patient identifiers. STUDY DESIGN AND SETTING: We used electronic health records from the National Bowel Cancer Audit (NBOCA) and Hospital Episode Statistics (HES) databases for 10,566 bowel cancer patients undergoing emergency surgery in the English National Health Service. RESULTS: Probabilistic linkage linked 81.4% of NBOCA records to HES, versus 82.8% using deterministic linkage. No systematic differences were seen between patients who were and were not linked, and regression models for mortality and length of hospital stay according to patient and tumour characteristics were not sensitive to the linkage approach. CONCLUSION: Probabilistic linkage was successful in linking national clinical and administrative datasets for patients undergoing a major surgical procedure. It allows analysts outside highly secure data environments to undertake linkage while minimising costs and delays, protecting data security, and maintaining linkage quality.
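    The paper's exact step-by-step process is not reproduced here; the sketch below only illustrates the Fellegi-Sunter style match-weight scoring that probabilistic linkage without identifiers generally relies on, where agreement on a field contributes log2(m/u) and disagreement contributes log2((1-m)/(1-u)). The field names, m/u probabilities, example records, and threshold are purely illustrative assumptions.

```python
import math

# Illustrative m/u probabilities per linkage field (assumed values, not the
# paper's): m = P(fields agree | true match), u = P(fields agree | non-match).
FIELDS = {
    "date_of_diagnosis": (0.95, 0.01),
    "hospital_code":     (0.90, 0.05),
    "tumour_site":       (0.85, 0.10),
}

def match_weight(rec_a, rec_b):
    """Fellegi-Sunter style score: sum log2(m/u) over agreeing fields and
    log2((1-m)/(1-u)) over disagreeing fields."""
    score = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a.get(field) == rec_b.get(field):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

a = {"date_of_diagnosis": "2016-03-01", "hospital_code": "RJ1", "tumour_site": "C18"}
b = {"date_of_diagnosis": "2016-03-01", "hospital_code": "RJ1", "tumour_site": "C20"}
print(match_weight(a, b))  # accept the pair as a link if above a chosen threshold
```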

    Deterministic Optical Fock State Generation

    We present a scheme for the deterministic generation of N-photon Fock states from N three-level atoms in a high-finesse optical cavity. The method applies an external laser pulse that generates an N-photon output state while adiabatically keeping the atom-cavity system within a subspace of optically dark states. We present analytical estimates of the error due to amplitude leakage from these dark states for general N, and compare them with explicit results of numerical simulations for N ≤ 5. The method is shown to provide a robust source of N-photon states under a variety of experimental conditions and is suitable for experimental implementation using a cloud of cold atoms magnetically trapped in a cavity. The resulting N-photon states have potential applications in fundamental studies of non-classical states and in quantum information processing.
    Comment: 25 pages, 9 figures
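    As a minimal illustration of the dark-state idea for the simplest case N = 1 (the standard vacuum-STIRAP single-photon source, with the usual three-level notation assumed here rather than the paper's: initial state |u>, excited state |e>, final state |g>, laser Rabi frequency Omega(t) driving |u>-|e>, cavity coupling g on |e>-|g>), the instantaneous dark state is

```latex
% Single-atom (N = 1) vacuum-STIRAP dark state; notation assumed, not the
% paper's N-atom construction, which generalizes this dark subspace.
\[
  \lvert D(t)\rangle \;\propto\; g\,\lvert u, 0\rangle \;-\; \Omega(t)\,\lvert g, 1\rangle .
\]
% |D(t)> contains no excited-state component |e>, so adiabatically increasing
% Omega(t) transfers the system from |u,0> to |g,1>, emitting one photon while
% suppressing spontaneous emission; residual error arises from amplitude
% leaking out of the dark state, as the paper analyses for general N.
```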