10,723 research outputs found
Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals
Max-stable random sketches can be computed efficiently on fast streaming
positive data sets by using only sequential access to the data. They can be
used to answer point and Lp-norm queries for the signal. There is an intriguing
connection between the so-called p-stable (or sum-stable) and the max-stable
sketches. Rigorous performance guarantees through error-probability estimates
are derived and the algorithmic implementation is discussed
Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information
Random sampling is an essential tool in the processing and transmission of
data. It is used to summarize data too large to store or manipulate and meet
resource constraints on bandwidth or battery power. Estimators that are applied
to the sample facilitate fast approximate processing of queries posed over the
original data and the value of the sample hinges on the quality of these
estimators.
Our work targets data sets such as request and traffic logs and sensor
measurements, where data is repeatedly collected over multiple {\em instances}:
time periods, locations, or snapshots.
We are interested in queries that span multiple instances, such as distinct
counts and distance measures over selected records. These queries are used for
applications ranging from planning to anomaly and change detection.
Unbiased low-variance estimators are particularly effective as the relative
error decreases with the number of selected record keys.
The Horvitz-Thompson estimator, known to minimize variance for sampling with
"all or nothing" outcomes (which reveals exacts value or no information on
estimated quantity), is not optimal for multi-instance operations for which an
outcome may provide partial information.
We present a general principled methodology for the derivation of (Pareto)
optimal unbiased estimators over sampled instances and aim to understand its
potential. We demonstrate significant improvement in estimate accuracy of
fundamental queries for common sampling schemes.Comment: This is a full version of a PODS 2011 pape
Detecting Low Rapport During Natural Interactions in Small Groups from Non-Verbal Behaviour
Rapport, the close and harmonious relationship in which interaction partners
are "in sync" with each other, was shown to result in smoother social
interactions, improved collaboration, and improved interpersonal outcomes. In
this work, we are first to investigate automatic prediction of low rapport
during natural interactions within small groups. This task is challenging given
that rapport only manifests in subtle non-verbal signals that are, in addition,
subject to influences of group dynamics as well as inter-personal
idiosyncrasies. We record videos of unscripted discussions of three to four
people using a multi-view camera system and microphones. We analyse a rich set
of non-verbal signals for rapport detection, namely facial expressions, hand
motion, gaze, speaker turns, and speech prosody. Using facial features, we can
detect low rapport with an average precision of 0.7 (chance level at 0.25),
while incorporating prior knowledge of participants' personalities can even
achieve early prediction without a drop in performance. We further provide a
detailed analysis of different feature sets and the amount of information
contained in different temporal segments of the interactions.Comment: 12 pages, 6 figure
AMS Without 4-Wise Independence on Product Domains
In their seminal work, Alon, Matias, and Szegedy introduced several sketching
techniques, including showing that 4-wise independence is sufficient to obtain
good approximations of the second frequency moment. In this work, we show that
their sketching technique can be extended to product domains by using
the product of 4-wise independent functions on . Our work extends that of
Indyk and McGregor, who showed the result for . Their primary motivation
was the problem of identifying correlations in data streams. In their model, a
stream of pairs arrive, giving a joint distribution ,
and they find approximation algorithms for how close the joint distribution is
to the product of the marginal distributions under various metrics, which
naturally corresponds to how close and are to being independent. By
using our technique, we obtain a new result for the problem of approximating
the distance between the joint distribution and the product of the
marginal distributions for -ary vectors, instead of just pairs, in a single
pass. Our analysis gives a randomized algorithm that is a
approximation (with probability ) that requires space logarithmic in
and and proportional to
- …