9 research outputs found

    Affirmative sampling: theory and applications

    Affirmative Sampling is a practical and efficient novel algorithm for obtaining random samples of distinct elements from a data stream. Its most salient feature is that the size S of the sample grows, in expectation, with the (unknown) number n of distinct elements in the data stream. Since every distinct element has the same probability of being sampled, and the sample is larger when the "diversity" (the number of distinct elements) is greater, the samples that Affirmative Sampling delivers are more representative than those produced by any scheme where the sample size is fixed a priori, hence its name. Our algorithm is straightforward to implement, and several implementations already exist. This work has been supported by funds from the MOTION Project (Project PID2020-112581GB-C21) of the Spanish Ministry of Science & Innovation MCIN/AEI/10.13039/501100011033, and by Princeton University and its Department of Computer Science.
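
    A minimal Python sketch of the core idea, based only on the behavior the abstract describes: the class name, the parameter k, and the use of SHA-256 as the hash function are illustrative assumptions, not details taken from the paper.

```python
import hashlib

def h(x):
    """Hash an element to a pseudo-random float in [0, 1)."""
    digest = hashlib.sha256(repr(x).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class AffirmativeSample:
    """Sketch of a sample whose size grows with the stream's diversity.

    The sample holds the k distinct elements with the largest hash values
    seen so far, and never evicts an element that once ranked in that
    top-k.  The expected sample size therefore grows with the (unknown)
    number n of distinct elements in the stream.
    """

    def __init__(self, k):
        self.k = k
        self.sample = {}  # element -> hash value; only ever grows

    def process(self, x):
        if x in self.sample:
            return                      # not a new distinct element
        hx = h(x)
        if len(self.sample) < self.k:
            self.sample[x] = hx         # warm-up: take the first k elements
        else:
            # k-th largest hash among sampled elements (a heap would be faster)
            kth = sorted(self.sample.values(), reverse=True)[self.k - 1]
            if hx > kth:
                self.sample[x] = hx     # sample grows; nothing is evicted
```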

    Fiat–Shamir Transformation of Multi-Round Interactive Proofs (Extended Version)

    The celebrated Fiat–Shamir transformation turns any public-coin interactive proof into a non-interactive one, which inherits the main security properties (in the random oracle model) of the interactive version. While originally considered in the context of 3-move public-coin interactive proofs, i.e., so-called Σ-protocols, it is now applied to multi-round protocols as well. Unfortunately, the security loss for a (2μ+1)-move protocol is, in general, approximately Q^μ, where Q is the number of oracle queries performed by the attacker. In general, this is the best one can hope for, as it is easy to see that this loss applies to the μ-fold sequential repetition of Σ-protocols, but it raises the question of whether certain (natural) classes of interactive proofs feature a milder security loss. In this work, we give positive and negative results on this question. On the positive side, we show that for (k_1,…,k_μ)-special-sound protocols (which cover a broad class of use cases), the knowledge error degrades linearly in Q, instead of Q^μ. On the negative side, we show that for t-fold parallel repetitions of typical (k_1,…,k_μ)-special-sound protocols with t ≥ μ (and assuming for simplicity that t and Q are integer multiples of μ), there is an attack that results in a security loss of approximately (1/2)·Q^μ/μ^{μ+t}.
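
    To make the mechanism concrete, here is a hedged Python sketch of how challenges are derived in a Fiat–Shamir-transformed multi-round protocol; the function name fs_challenge, the length-prefixed encoding, and the use of SHA-256 in place of the random oracle are illustrative assumptions, not the paper's construction.

```python
import hashlib

def fs_challenge(statement: bytes, prover_messages: list) -> bytes:
    """Derive the next verifier challenge non-interactively.

    In the Fiat-Shamir transformation of a (2μ+1)-move public-coin
    protocol, the i-th challenge c_i is a hash of the statement and the
    prover messages a_1, ..., a_i sent so far: a cheating prover can only
    "reroll" a challenge by changing an earlier message and spending
    another oracle query, which is the intuition behind the Q^μ loss.
    """
    oracle = hashlib.sha256()
    oracle.update(statement)
    for m in prover_messages:
        oracle.update(len(m).to_bytes(8, "big"))  # length-prefix: unambiguous encoding
        oracle.update(m)
    return oracle.digest()

# Example: the two challenges of a 5-move (μ = 2) protocol.
stmt = b"public statement"
a1 = b"first prover message"
c1 = fs_challenge(stmt, [a1])
a2 = b"second prover message"
c2 = fs_challenge(stmt, [a1, a2])
```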

    Random sampling with a reservoir


    Event detection and reservoir sampling in wireless sensor networks

    The aim of this thesis is to analyze the temperature values sent over a wireless sensor network, using two different implementations. The first applies the Shewhart algorithm to every transmitted value in real time, whereas the second accumulates values and extracts a sample via a reservoir sampling algorithm; the maximum (or the average) of that sample is then fed as input to the Shewhart algorithm. Finally, the effectiveness of the two implementations is compared by means of the Hamming distance.
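
    A hedged Python sketch of the second implementation described above: classic reservoir sampling (Algorithm R) to collect a sample, followed by a simple Shewhart-style control check. The reservoir size k, the 3-sigma control limits, the function names, and the temperature values are assumptions for illustration.

```python
import random
import statistics

def reservoir_sample(stream, k):
    """Algorithm R: uniform random sample of k items from a stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)      # uniform in [0, i]
            if j < k:
                reservoir[j] = item       # each item kept with probability k/(i+1)
    return reservoir

def shewhart_alarm(value, history, n_sigma=3.0):
    """Flag a value outside mean +/- n_sigma * stddev of past readings."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(value - mu) > n_sigma * sigma

# Feed the maximum (or mean) of each reservoir into the control check.
history = [21.0, 21.4, 20.9, 21.2, 21.1]        # hypothetical past temperatures
window = [21.3, 21.0, 35.5, 21.2, 21.1, 21.4]   # hypothetical new readings
sample = reservoir_sample(window, k=4)
print(shewhart_alarm(max(sample), history))
```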

    Sampling Algorithms for Evolving Datasets

    Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up the processing of analytic queries and data-mining tasks, to enhance query optimization, and to facilitate information integration. Most of the existing work on database sampling focuses on how to create or exploit a random sample of a static database, that is, a database that does not change over time. The assumption of a static database, however, severely limits the applicability of these techniques in practice, where data is often not static but continuously evolving. In order to maintain the statistical validity of the sample, any changes to the database have to be appropriately reflected in the sample. In this thesis, we study efficient methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions, updates, and deletions. We consider instances of the maintenance problem that arise when sampling from an evolving set, from an evolving multiset, from the distinct items in an evolving multiset, or from a sliding window over a data stream. Our algorithms completely avoid any accesses to the base data and can be several orders of magnitude faster than algorithms that do rely on such expensive accesses. The improved efficiency of our algorithms comes at virtually no cost: the resulting samples are provably uniform and only a small amount of auxiliary information is associated with the sample. We show that the auxiliary information not only facilitates efficient maintenance, but it can also be exploited to derive unbiased, low-variance estimators for counts, sums, averages, and the number of distinct items in the underlying dataset. In addition to sample maintenance, we discuss methods that greatly improve the flexibility of random sampling from a system's point of view. More specifically, we initiate the study of algorithms that resize a random sample upwards or downwards. Our resizing algorithms can be exploited to dynamically control the size of the sample when the dataset grows or shrinks; they facilitate resource management and help to avoid under- or oversized samples. Furthermore, in large-scale databases with data being distributed across several remote locations, it is usually infeasible to reconstruct the entire dataset for the purpose of sampling. To address this problem, we provide efficient algorithms that directly combine the local samples maintained at each location into a sample of the global dataset. We also consider a more general problem, where the global dataset is defined as an arbitrary set or multiset expression involving the local datasets, and provide efficient solutions based on hashing.
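
    One well-known approach to this maintenance problem is random pairing, where each insertion may "compensate" an earlier uncompensated deletion so the sample stays uniform without any access to the base data. The sketch below is a hedged illustration of that idea for an evolving set, not necessarily the exact algorithm of the thesis; the class and method names and the bound k are assumptions.

```python
import random

class RandomPairingSample:
    """Sketch of random-pairing-style maintenance of a bounded-size sample."""

    def __init__(self, k):
        self.k = k
        self.sample = []
        self.size = 0    # current size of the underlying dataset
        self.c_in = 0    # uncompensated deletions that were in the sample
        self.c_out = 0   # uncompensated deletions that were not

    def insert(self, item):
        self.size += 1
        if self.c_in + self.c_out > 0:
            # Pair this insertion with a previous deletion.
            if random.random() < self.c_in / (self.c_in + self.c_out):
                self.sample.append(item)
                self.c_in -= 1
            else:
                self.c_out -= 1
        else:
            # No pending deletions: standard reservoir step.
            if len(self.sample) < self.k:
                self.sample.append(item)
            elif random.random() < self.k / self.size:
                self.sample[random.randrange(self.k)] = item

    def delete(self, item):
        self.size -= 1
        if item in self.sample:
            self.sample.remove(item)
            self.c_in += 1
        else:
            self.c_out += 1
```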

    Engineering truly automated data integration and translation systems

    This thesis presents an automated, data-driven integration process for relational databases. Whereas previous integration methods assumed a large amount of user involvement as well as the availability of database meta-data, we make no use of meta-data and require little end-user input. This is done using a novel join- and translation-finding algorithm that searches for the proper key/foreign-key relationships while inferring the instance transformations from one database to another. Because we rely only on the relations that bind the attributes together, we make no use of the database schema information. A novel searching method allows us to search the database for relevant objects without requiring server-side indexes or cooperative databases.
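
    As an illustration of the instance-based flavor of such join discovery (a simplified stand-in, not the thesis's actual algorithm), one can score candidate key/foreign-key pairs by value containment between columns; the function names, table encoding, and 0.9 threshold are assumptions.

```python
def containment(fk_values, key_values):
    """Fraction of candidate foreign-key values appearing in the candidate key."""
    fk, key = set(fk_values), set(key_values)
    return len(fk & key) / len(fk) if fk else 0.0

def candidate_joins(table_a, table_b, threshold=0.9):
    """Rank column pairs whose instance overlap suggests a key/foreign-key join.

    table_a, table_b: dicts mapping column name -> list of values.
    Returns (column_a, column_b, score) triples at or above the threshold.
    """
    matches = []
    for col_a, vals_a in table_a.items():
        for col_b, vals_b in table_b.items():
            score = containment(vals_a, vals_b)
            if score >= threshold:
                matches.append((col_a, col_b, score))
    return sorted(matches, key=lambda t: -t[2])

# Hypothetical example: orders.customer_id should point into customers.id.
customers = {"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]}
orders = {"order_id": [10, 11], "customer_id": [2, 4]}
print(candidate_joins(orders, customers))
```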

    A note on sampling a tape-file
