Augmenting and Sharing Memory with eyeBlog
eyeBlog is an automatic personal video recording and publishing system. It consists of ECSGlasses [1], a pair of glasses augmented with a wireless eye contact and glyph sensing camera, and a web application that visualizes the video from the ECSGlasses camera as chronologically delineated blog entries. The blog format allows for easy annotation, grading, cataloging, and searching of video segments by the wearer or anyone else with internet access. eyeBlog reduces the editing effort of video bloggers by recording video only when something of interest is registered by the camera. Interest is determined by a combination of independent methods. For example, recording can be triggered automatically upon detection of eye contact towards the wearer of the glasses, allowing all face-to-face interactions to be recorded. Recording can also be triggered by the detection of image patterns, such as glyphs, in the frame of the camera. This allows the wearer to record their interactions with any object that has an associated unique marker. Finally, the user can manually initiate recording by pressing a button.
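The recording policy reduces to a disjunction of independent triggers. A minimal sketch, with stub detectors standing in for the ECSGlasses' eye contact and glyph sensing (all names below are illustrative, not from the paper):

```python
from typing import Callable, Iterable

Frame = bytes  # placeholder type for a camera frame

def should_record(frame: Frame,
                  detectors: Iterable[Callable[[Frame], bool]],
                  button_pressed: bool) -> bool:
    """Record whenever any independent interest trigger fires:
    eye contact towards the wearer, a glyph in the camera frame,
    or a manual button press."""
    return button_pressed or any(detect(frame) for detect in detectors)

# Stub detectors standing in for the on-board sensing:
detect_eye_contact = lambda f: False   # would fire on eye contact
detect_glyph = lambda f: True          # would fire on a known marker
print(should_record(b"frame", [detect_eye_contact, detect_glyph], False))  # True
```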
PrivacEye: Privacy-Preserving Head-Mounted Eye Tracking Using Egocentric Scene Image and Eye Movement Features
Eyewear devices, such as augmented reality displays, increasingly integrate
eye tracking, but the first-person camera required to map a user's gaze to the
visual scene can pose a significant threat to user and bystander privacy. We
present PrivacEye, a method to detect privacy-sensitive everyday situations and
automatically enable and disable the eye tracker's first-person camera using a
mechanical shutter. To close the shutter in privacy-sensitive situations, the
method uses a deep representation of the first-person video combined with rich
features that encode users' eye movements. To open the shutter without visual
input, PrivacEye detects changes in users' eye movements alone to gauge changes
in the "privacy level" of the current situation. We evaluate our method on a
first-person video dataset recorded in daily life situations of 17
participants, annotated by themselves for privacy sensitivity, and show that
our method is effective in preserving privacy in this challenging setting. Comment: 10 pages, 6 figures, supplementary material
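Operationally this is a two-state controller: while the shutter is open, both scene and gaze features are available to decide when to close it; once closed, the camera provides no input, so only eye movements can signal when it is safe to reopen. A minimal sketch, with stub classifiers and thresholds that are assumptions rather than the paper's trained models:

```python
from enum import Enum

class Shutter(Enum):
    OPEN = 0
    CLOSED = 1

# Stub detectors: each returns the estimated probability that the
# current situation is privacy-sensitive (placeholders, not the
# paper's scene representation or eye movement features).
def sensitive_scene_and_gaze(scene_feat, gaze_feat) -> float:
    return 0.0

def sensitive_gaze_only(gaze_feat) -> float:
    return 0.0

CLOSE_THRESHOLD = 0.5  # assumed decision thresholds
OPEN_THRESHOLD = 0.5

def step(state: Shutter, scene_feat, gaze_feat) -> Shutter:
    if state is Shutter.OPEN:
        # First-person video is available: fuse scene and eye features.
        if sensitive_scene_and_gaze(scene_feat, gaze_feat) > CLOSE_THRESHOLD:
            return Shutter.CLOSED
    else:
        # Shutter closed, no visual input: rely on eye movements alone.
        if sensitive_gaze_only(gaze_feat) < OPEN_THRESHOLD:
            return Shutter.OPEN
    return state
```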
Covariance Intersection to Improve the Robustness of the Photoplethysmogram Derived Respiratory Rate
Respiratory rate (RR) can be estimated from the photoplethysmogram (PPG)
recorded by optical sensors in wearable devices. The fusion of estimates from
different PPG features has led to increased accuracy, but it has also reduced
the number of available final estimates due to the discarding of unreliable data.
We propose a novel, tunable fusion algorithm using covariance intersection to
estimate the RR from PPG (CIF). The algorithm is adaptive to the number of
available feature estimates and takes each estimate's trustworthiness into
account. In a benchmarking experiment using the CapnoBase dataset with
reference RR from capnography, we compared the CIF against the state-of-the-art
Smart Fusion (SF) algorithm. The median root mean square error was 1.4
breaths/min for the CIF and 1.8 breaths/min for the SF. The CIF significantly
increased the retention rate distribution of all recordings from 0.46 to 0.90
(p < 0.001). The agreement with the reference RR was high, with a Pearson's
correlation coefficient of 0.94, a bias of 0.3 breaths/min, and limits of
agreement of -4.6 and 5.2 breaths/min. In addition, the algorithm was
computationally efficient. Therefore, CIF could contribute to a more robust RR
estimation from wearable PPG recordings. Comment: accepted to EMBC 202
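Covariance intersection fuses estimates whose error cross-correlations are unknown by taking a convex combination of their inverse variances. A minimal scalar sketch; the inverse-variance-proportional weights below are one simple tuning choice and an assumption, not necessarily the paper's trustworthiness weighting:

```python
from typing import Sequence, Tuple

def covariance_intersection(estimates: Sequence[float],
                            variances: Sequence[float]) -> Tuple[float, float]:
    """Fuse scalar estimates with unknown cross-correlations.

    Covariance intersection forms a convex combination of the inverse
    variances: 1/var_f = sum_i w_i / var_i and
    x_f = var_f * sum_i w_i * x_i / var_i, with w_i >= 0, sum w_i = 1.
    """
    inv = [1.0 / v for v in variances]
    weights = [iv / sum(inv) for iv in inv]  # assumed weighting scheme
    fused_var = 1.0 / sum(w * iv for w, iv in zip(weights, inv))
    fused_x = fused_var * sum(w * iv * x
                              for w, iv, x in zip(weights, inv, estimates))
    return fused_x, fused_var

# Example: three RR feature estimates (breaths/min) with their variances.
rr, var = covariance_intersection([14.0, 16.5, 15.2], [1.0, 4.0, 2.0])
print(f"fused RR = {rr:.2f} breaths/min, variance = {var:.2f}")
```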
Sensing, interpreting, and anticipating human social behaviour in the real world
Low-level nonverbal social signals like glances, utterances, facial expressions, and body language are central to human communicative situations and have been shown to be connected to important high-level constructs such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in, e.g., education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, the anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting, and anticipating nonverbal behaviour in social interactions.

First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant, as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection that exploits the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the calibration drift that occurs during daily-life usage of mobile eye trackers.

Second, we improve the interpretation of social signals in terms of higher-level social behaviours. In particular, we propose the first dataset and method for emotion recognition from the bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions and to investigate a cross-dataset evaluation setting for the emergent leadership detection task.

Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to share attention with humans more seamlessly, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.

As nonverbal signals, gaze, facial expressions, body language, and prosody play a central role in human communication. Numerous studies have linked them to important concepts such as emotions, turn-taking, leadership, or the quality of the relationship between two people. For machines to effectively support people in their daily social lives, automatic methods for sensing, interpreting, and anticipating nonverbal behaviour are necessary. Although research to date has achieved encouraging results in controlled studies, the automatic analysis of nonverbal behaviour in less controlled situations remains a challenge. Moreover, the anticipation of nonverbal behaviour in social situations has hardly been investigated. The goal of this thesis is to bring the vision of automatically understanding social situations a step closer to reality.

This thesis makes important contributions to the automatic recognition of human gaze behaviour in everyday situations. Although many social interactions take place in groups, unsupervised methods for eye contact detection have so far existed only for dyadic interactions. We present a new approach to eye contact detection in groups that requires no manual annotations, exploiting the statistical connection between gaze and speaking behaviour. Daily activities are a challenge for mobile eye trackers, since shifts of these devices can degrade their calibration. In this work, we use engagement with mobile devices to correct for the effect of such shifts. Beyond sensing, this thesis also improves the interpretation of social signals. We publish the first dataset as well as the first method for emotion recognition in dyadic interactions without the use of specialised equipment. Furthermore, we present the first study on the automatic detection of low rapport in group interactions and conduct the first cross-dataset evaluation for the detection of emergent leadership. The thesis concludes with the first approaches to the anticipation of gaze behaviour in social interactions. Gaze behaviour has the special property of serving both as a social signal and as the orientation of visual perception. The ability to anticipate gaze behaviour therefore opens up the possibility for machines to blend more seamlessly into social interactions, and to warn people when they are about to overlook important aspects of their environment. We present methods for the anticipation of gaze behaviour in the context of interactions with mobile devices during daily activities, as well as during dyadic interactions via video calls.
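The multi-person eye contact idea lends itself to a compact illustration: since listeners tend to look at the current speaker, clusters of the wearer's gaze directions can be matched to interaction partners by how strongly each cluster co-occurs with each partner's speaking turns. A minimal sketch under these assumptions (the clustering choice, matching step, and all names are illustrative, not the thesis's actual pipeline):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def eye_contact_targets(gaze: np.ndarray, speaking: np.ndarray) -> np.ndarray:
    """Unsupervised assignment of gaze-direction clusters to partners.

    gaze: (T, 2) array of gaze directions per frame.
    speaking: (T, P) binary speaking activity of P interaction partners.
    Returns a (T,) array with the partner id looked at in each frame,
    exploiting the tendency to look at whoever is speaking.
    """
    n_partners = speaking.shape[1]
    labels = KMeans(n_clusters=n_partners, n_init=10).fit_predict(gaze)
    onehot = np.eye(n_partners)[labels]        # (T, C) cluster activity
    cooc = onehot.T @ speaking                 # (C, P) co-occurrence counts
    rows, cols = linear_sum_assignment(-cooc)  # one-to-one, max co-occurrence
    cluster_to_partner = np.empty(n_partners, dtype=int)
    cluster_to_partner[rows] = cols
    return cluster_to_partner[labels]
```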
ConfLab: A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions in the Wild
Recording the dynamics of unscripted human interactions in the wild is
challenging due to the delicate trade-offs between several factors: participant
privacy, ecological validity, data fidelity, and logistical overheads. To
address these, following a 'datasets for the community by the community' ethos,
we propose the Conference Living Lab (ConfLab): a new concept for multimodal
multisensor data collection of in-the-wild free-standing social conversations.
For the first instantiation of ConfLab described here, we organized a real-life
professional networking event at a major international conference. Involving 48
conference attendees, the dataset captures a diverse mix of status,
acquaintance, and networking motivations. Our capture setup improves upon the
data fidelity of prior in-the-wild datasets while retaining privacy
sensitivity: 8 videos (1920x1080, 60 fps) from a non-invasive overhead view,
and custom wearable sensors with onboard recording of body motion (full 9-axis
IMU), privacy-preserving low-frequency audio (1250 Hz), and Bluetooth-based
proximity. Additionally, we developed custom solutions for distributed hardware
synchronization at acquisition, and time-efficient continuous annotation of
body keypoints and actions at high sampling rates. Our benchmarks showcase some
of the open research tasks related to in-the-wild privacy-preserving social
data analysis: keypoints detection from overhead camera views, skeleton-based
no-audio speaker detection, and F-formation detection. Comment: v2 is the version submitted to the NeurIPS 2022 Datasets and Benchmarks Track
A Modular Approach for Synchronized Wireless Multimodal Multisensor Data Acquisition in Highly Dynamic Social Settings
Existing data acquisition literature for human behavior research provides
wired solutions, mainly for controlled laboratory setups. In uncontrolled
free-standing conversation settings, where participants are free to walk
around, these solutions are unsuitable. While wireless solutions are employed
in the broadcasting industry, they can be prohibitively expensive. In this
work, we propose a modular and cost-effective wireless approach for
synchronized multisensor data acquisition of social human behavior. Our core
idea involves a cost-accuracy trade-off by using Network Time Protocol (NTP) as
a source reference for all sensors. While commonly used as a reference in
ubiquitous computing, NTP is widely considered to be insufficiently accurate as
a reference for video applications, where Precision Time Protocol (PTP) or
Global Positioning System (GPS) based references are preferred. We argue and
show, however, that the latency introduced by using NTP as a source reference
is adequate for human behavior research, and the subsequent cost and modularity
benefits are a desirable trade-off for applications in this domain. We also
describe one instantiation of the approach deployed in a real-world experiment
to demonstrate the practicality of our setup in the wild. Comment: 9 pages, 8 figures, Proceedings of the 28th ACM International Conference on Multimedia (MM '20), October 12-16, 2020, Seattle, WA, USA. First two authors contributed equally
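The core of the approach is to let every wireless sensor node stamp its samples against a common NTP reference rather than a shared wire or a PTP-capable network. A minimal sketch using the ntplib package; the server, polling interval, and class structure are illustrative assumptions, not the paper's implementation:

```python
import time
import ntplib

def current_ntp_offset(server: str = "pool.ntp.org") -> float:
    """Query an NTP server and return the estimated local clock offset
    in seconds (positive if the local clock is behind the reference)."""
    response = ntplib.NTPClient().request(server, version=3)
    return response.offset

class NTPStampedSensor:
    """Stamp sensor samples against the shared NTP reference so that
    independently recording wireless devices can be aligned offline.
    Re-querying the offset periodically bounds local clock drift."""

    def __init__(self, resync_interval_s: float = 60.0):
        self.resync_interval_s = resync_interval_s
        self._offset = current_ntp_offset()
        self._last_sync = time.time()

    def timestamp(self) -> float:
        if time.time() - self._last_sync > self.resync_interval_s:
            self._offset = current_ntp_offset()
            self._last_sync = time.time()
        return time.time() + self._offset
```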