Recording the dynamics of unscripted human interactions in the wild is
challenging due to the delicate trade-offs between several factors: participant
privacy, ecological validity, data fidelity, and logistical overhead. To
address these, following a 'datasets for the community by the community' ethos,
we propose the Conference Living Lab (ConfLab): a new concept for multimodal
multisensor data collection of in-the-wild free-standing social conversations.
For the first instantiation of ConfLab described here, we organized a real-life
professional networking event at a major international conference. Involving 48
conference attendees, the dataset captures a diverse mix of status,
acquaintance, and networking motivations. Our capture setup improves upon the
data fidelity of prior in-the-wild datasets while retaining privacy
sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view,
and custom wearable sensors with onboard recording of body motion (full 9-axis
IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based
proximity. Additionally, we developed custom solutions for distributed hardware
synchronization at acquisition, and time-efficient continuous annotation of
body keypoints and actions at high sampling rates. Our benchmarks showcase some
of the open research tasks related to in-the-wild privacy-preserving social
data analysis: keypoint detection from overhead camera views, skeleton-based
no-audio speaker detection, and F-formation detection.

Comment: v2 is the version submitted to the NeurIPS 2022 Datasets and Benchmarks Track.