    Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals

    Max-stable random sketches can be computed efficiently on fast-streaming, non-negative data sets using only sequential access to the data. They can be used to answer point queries and Lp-norm queries for the signal. There is an intriguing connection between the so-called p-stable (or sum-stable) sketches and max-stable sketches. Rigorous performance guarantees, in the form of error-probability estimates, are derived, and the algorithmic implementation is discussed.
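    A minimal sketch of a max-linear sketch for a non-negative signal $f$, assuming i.i.d. standard $p$-Fréchet weights $Z_j(i)$ with $P(Z \le x) = \exp(-x^{-p})$. Then $E_j = \max_i f(i) Z_j(i)$ is distributed as $\|f\|_p Z$, so $E_j^{-p}$ is exponential with rate $\|f\|_p^p$ and $(k-1)/\sum_j E_j^{-p}$ is an unbiased estimate of $\|f\|_p^p$. The class name, the per-coordinate seeding scheme, and the max-update model $f(i) \leftarrow \max(f(i), v)$ are illustrative assumptions, not necessarily the construction used in the paper. Because all sketches built with the same seed share the weights, an elementwise max of two sketches is the sketch of the pointwise maximum of the signals, whose $\ell_1$ norm is the dominance norm $\sum_i \max(f(i), g(i))$.

```python
import random


def frechet(rng: random.Random, alpha: float) -> float:
    """Draw a standard alpha-Frechet variable, P(Z <= x) = exp(-x**-alpha)."""
    e = 0.0
    while e == 0.0:  # avoid 0 ** negative in the measure-zero case E = 0
        e = rng.expovariate(1.0)  # E = -log(U) is standard exponential
    return e ** (-1.0 / alpha)  # E**(-1/alpha) is alpha-Frechet


class MaxStableSketch:
    """k max-linear projections E_j = max_i f(i) * Z_j(i) of a non-negative signal f."""

    def __init__(self, k: int, p: float, seed: int = 0) -> None:
        self.k, self.p, self.seed = k, p, seed
        self.E = [0.0] * k  # sketch of the all-zero signal

    def _z(self, j: int, i: int) -> float:
        # Regenerate Z_j(i) from a deterministic seed instead of storing a k x n matrix.
        return frechet(random.Random(f"{self.seed}-{j}-{i}"), self.p)

    def update(self, i: int, v: float) -> None:
        """Apply the max-update f(i) <- max(f(i), v), v >= 0, in O(k) time."""
        for j in range(self.k):
            self.E[j] = max(self.E[j], v * self._z(j, i))

    def merge_max(self, other: "MaxStableSketch") -> "MaxStableSketch":
        """Sketch of the pointwise max of two signals (requires equal k, p, seed)."""
        assert (self.k, self.p, self.seed) == (other.k, other.p, other.seed)
        out = MaxStableSketch(self.k, self.p, self.seed)
        out.E = [max(a, b) for a, b in zip(self.E, other.E)]
        return out

    def lp_norm_p(self) -> float:
        """Unbiased estimate of sum_i f(i)**p, using E_j**-p ~ Exponential(rate=sum_i f(i)**p)."""
        if not any(self.E):
            return 0.0  # all-zero signal
        return (self.k - 1) / sum(e ** (-self.p) for e in self.E)
```

    For example, with `p=1.0`, sketching f = (1, 2, 3, 4) gives an estimate of 10, and merging it with the sketch of g = (4, 0.5, 3, 1) gives an estimate of the dominance norm 4 + 2 + 3 + 4 = 13. The relative standard error is roughly $1/\sqrt{k}$.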

    Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information

    Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and to meet resource constraints on bandwidth or battery power. Estimators that are applied to the sample facilitate fast approximate processing of queries posed over the original data, and the value of the sample hinges on the quality of these estimators. Our work targets data sets such as request and traffic logs and sensor measurements, where data is repeatedly collected over multiple instances: time periods, locations, or snapshots. We are interested in queries that span multiple instances, such as distinct counts and distance measures over selected records. These queries are used for applications ranging from planning to anomaly and change detection. Unbiased low-variance estimators are particularly effective because the relative error decreases with the number of selected record keys. The Horvitz-Thompson estimator, known to minimize variance for sampling with "all or nothing" outcomes (which reveal either the exact value or no information about the estimated quantity), is not optimal for multi-instance operations, for which an outcome may provide partial information. We present a general, principled methodology for the derivation of (Pareto) optimal unbiased estimators over sampled instances and aim to understand its potential. We demonstrate significant improvement in the estimate accuracy of fundamental queries for common sampling schemes. Comment: This is a full version of a PODS 2011 paper.
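    For reference, a minimal sketch of the Horvitz-Thompson baseline that the abstract contrasts against, assuming independent (Poisson) sampling of keys with known inclusion probabilities; the function names are hypothetical. For a multi-instance query such as the sum over keys of $\max(v_1, v_2)$, the HT estimator only uses a key when it is sampled in both instances (probability $p_1 p_2$), so a key sampled in just one instance, which still bounds the max from below, is treated as "nothing". The paper's (Pareto) optimal estimators that exploit this partial information are not reproduced here.

```python
import random
from typing import Dict


def poisson_sample(values: Dict[str, float], prob: Dict[str, float],
                   rng: random.Random) -> Dict[str, float]:
    """Keep each key independently with its inclusion probability prob[key]."""
    return {key: v for key, v in values.items() if rng.random() < prob[key]}


def ht_sum(sample: Dict[str, float], prob: Dict[str, float]) -> float:
    """Horvitz-Thompson estimate of sum(values): each sampled value is weighted by 1/p."""
    return sum(v / prob[key] for key, v in sample.items())


def ht_sum_of_max(s1: Dict[str, float], s2: Dict[str, float],
                  p1: Dict[str, float], p2: Dict[str, float]) -> float:
    """HT estimate of sum_key max(v1[key], v2[key]) for two independently sampled
    instances: the max is revealed exactly only when the key is in both samples,
    so each such key is weighted by 1 / (p1 * p2); all other keys contribute 0."""
    return sum(max(s1[key], s2[key]) / (p1[key] * p2[key])
               for key in s1.keys() & s2.keys())
```

    Averaging `ht_sum_of_max` over many independent sampling rounds converges to the true sum of maxima, which illustrates unbiasedness, but its variance grows as $1/(p_1 p_2)$.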

    Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour

    Rapport, the close and harmonious relationship in which interaction partners are "in sync" with each other, has been shown to result in smoother social interactions, improved collaboration, and improved interpersonal outcomes. In this work, we are the first to investigate automatic prediction of low rapport during natural interactions within small groups. This task is challenging given that rapport manifests only in subtle non-verbal signals that are, in addition, subject to the influences of group dynamics as well as interpersonal idiosyncrasies. We record videos of unscripted discussions of three to four people using a multi-view camera system and microphones. We analyse a rich set of non-verbal signals for rapport detection, namely facial expressions, hand motion, gaze, speaker turns, and speech prosody. Using facial features, we can detect low rapport with an average precision of 0.7 (chance level at 0.25), while incorporating prior knowledge of participants' personalities even enables early prediction without a drop in performance. We further provide a detailed analysis of different feature sets and of the amount of information contained in different temporal segments of the interactions. Comment: 12 pages, 6 figures
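    The abstract reports detection performance as average precision (AP) against a chance level of 0.25. For AP, a random ranking scores roughly the fraction of positive examples, so the quoted chance level presumably reflects the share of low-rapport samples (an inference, not stated in the abstract). The sketch below is a plain AP computation with a hypothetical `average_precision` helper; it is not the authors' evaluation code.

```python
import random
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k at which a positive (label 1) appears."""
    # Rank by descending score; Python's sort stays stable with reverse=True.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k  # precision among the top k items
    return total / hits if hits else 0.0
```

    With 100 positives among 400 items, random scores give an AP close to the 0.25 prevalence (slightly above it for finite sample sizes), while a perfect ranking gives 1.0.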

    Norm, Point, and Distance Estimation Over Multiple Signals Using Max-Stable Distributions


    AMS Without 4-Wise Independence on Product Domains

    In their seminal work, Alon, Matias, and Szegedy introduced several sketching techniques, including showing that 4-wise independence is sufficient to obtain good approximations of the second frequency moment. In this work, we show that their sketching technique can be extended to product domains $[n]^k$ by using the product of 4-wise independent functions on $[n]$. Our work extends that of Indyk and McGregor, who showed the result for $k = 2$. Their primary motivation was the problem of identifying correlations in data streams. In their model, a stream of pairs $(i,j) \in [n]^2$ arrives, giving a joint distribution $(X,Y)$, and they find approximation algorithms for how close the joint distribution is to the product of the marginal distributions under various metrics, which naturally corresponds to how close $X$ and $Y$ are to being independent. By using our technique, we obtain a new result for the problem of approximating the $\ell_2$ distance between the joint distribution and the product of the marginal distributions for $k$-ary vectors, instead of just pairs, in a single pass. Our analysis gives a randomized algorithm that is a $(1 \pm \epsilon)$ approximation (with probability $1-\delta$) and requires space logarithmic in $n$ and $m$ and proportional to $3^k$.
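    The abstract does not spell out the estimator, so the following is a minimal sketch of one natural reading, under stated assumptions. Each coordinate $t$ gets its own sign hash $s_t : [n] \to \{\pm 1\}$ built from a random degree-3 polynomial modulo the prime $2^{61}-1$ (4-wise independent values; the low-bit mapping to $\pm 1$ has a bias of order $2^{-61}$), and a tuple gets the product sign $s(x) = \prod_t s_t(x_t)$. Because this sign factorizes, the sketch of the product of marginals $Q$ is the product of $k$ one-dimensional sketches, so $\|P - Q\|_2^2$ for the joint distribution $P$ can be estimated in one pass as $\big(\sum_x P(x) s(x) - \prod_t \sum_i Q_t(i) s_t(i)\big)^2$, combined by a median of means. The names `FourWiseSign`, `ProductAMS`, and `estimate_l2_dependence` and the group sizes are hypothetical.

```python
import random
import statistics
from typing import Iterable, Sequence

PRIME = (1 << 61) - 1  # Mersenne prime modulus for the polynomial hash


class FourWiseSign:
    """+/-1 hash from a random degree-3 polynomial mod PRIME (4-wise independent values)."""

    def __init__(self, rng: random.Random) -> None:
        self.coef = [rng.randrange(PRIME) for _ in range(4)]

    def __call__(self, x: int) -> int:
        h = 0
        for c in self.coef:  # Horner's rule
            h = (h * x + c) % PRIME
        return 1 if h & 1 else -1  # low bit; bias ~ 1/PRIME


class ProductAMS:
    """One counter over [n]^k with sign s(x) = prod_t s_t(x_t)."""

    def __init__(self, k: int, rng: random.Random) -> None:
        self.signs = [FourWiseSign(rng) for _ in range(k)]
        self.joint = 0.0        # sum over the stream of s(x)
        self.marg = [0.0] * k   # per coordinate t: sum over the stream of s_t(x_t)
        self.m = 0              # stream length

    def update(self, x: Sequence[int]) -> None:
        prod = 1
        for t, xt in enumerate(x):
            s = self.signs[t](xt)
            prod *= s
            self.marg[t] += s
        self.joint += prod
        self.m += 1

    def estimate(self) -> float:
        """(sum_x (P(x) - Q(x)) s(x))**2, unbiased for ||P - Q||_2^2 up to hash bias."""
        q = 1.0
        for b in self.marg:
            q *= b / self.m  # sketch of marginal t; the product sketches Q
        return (self.joint / self.m - q) ** 2


def estimate_l2_dependence(stream: Iterable[Sequence[int]], k: int,
                           groups: int = 5, per_group: int = 40, seed: int = 0) -> float:
    """Median over `groups` of the mean of `per_group` independent counters."""
    rng = random.Random(seed)
    counters = [[ProductAMS(k, rng) for _ in range(per_group)] for _ in range(groups)]
    for x in stream:
        for row in counters:
            for c in row:
                c.update(x)
    return statistics.median(statistics.fmean(c.estimate() for c in row) for row in counters)
```

    As a check: on the stream of pairs $(i, i)$ for $i \in \{0,1,2,3\}$, the exact value is $\|P - Q\|_2^2 = 4(1/4 - 1/16)^2 + 12(1/16)^2 = 0.1875$, and on the full $4 \times 4$ grid of pairs it is 0, which the sketch reproduces exactly because the signs factorize.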