Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition
We present a unified framework for understanding human social behaviors in
raw image sequences. Our model jointly detects multiple individuals, infers
their social actions, and estimates the collective actions with a single
feed-forward pass through a neural network. We propose a single architecture
that does not rely on external detection algorithms but rather is trained
end-to-end to generate dense proposal maps that are refined via a novel
inference scheme. The temporal consistency is handled via a person-level
matching Recurrent Neural Network. The complete model takes as input a sequence
of frames and outputs detections along with the estimates of individual actions
and collective activities. We demonstrate state-of-the-art performance of our
algorithm on multiple publicly available benchmarks.
Event-based Face Detection and Tracking in the Blink of an Eye
We present the first purely event-based method for face detection using the
high temporal resolution of an event-based camera. We rely on a feature that
has never been used for this task: the detection of eye blinks. Eye blinks are
a unique, natural dynamic signature of human faces that is captured well by
event-based sensors, which respond to relative changes in luminance. Although
an eye blink can be captured with conventional cameras, we show that the
dynamics of eye blinks, combined with the fact that both eyes blink
simultaneously, make it possible to derive a robust methodology for face
detection at a low computational cost and high temporal resolution. We show
that eye blinks have a unique temporal signature that can be easily detected by
correlating the acquired local activity with a generic temporal model of eye
blinks that has been generated from a wide population of users. We furthermore
show that once the face is reliably detected it is possible to apply a
probabilistic framework to track the spatial position of a face for each
incoming event while updating the position of trackers. Results are shown for
several indoor and outdoor experiments. We will also release an annotated data
set that can be used for future work on the topic.
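
The correlation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the local event activity has already been binned into a 1-D array of event counts, and the function name `blink_score` and the template values used in the usage note are hypothetical. The sketch slides a blink template over the activity trace and reports the best normalized correlation.

```python
import numpy as np


def blink_score(activity, template):
    """Slide `template` over `activity` and return (best_score, best_offset).

    Both inputs are 1-D sequences of event counts per time bin (an assumption
    of this sketch). The score is the normalized cross-correlation, in [-1, 1].
    """
    a = np.asarray(activity, dtype=float)
    t = np.asarray(template, dtype=float)
    n = len(t)
    # Standardize the template once; the epsilon avoids division by zero.
    t = (t - t.mean()) / (t.std() + 1e-12)

    best_score, best_offset = -1.0, 0
    for i in range(len(a) - n + 1):
        w = a[i:i + n]
        # Standardize each window so the score ignores gain and offset.
        w = (w - w.mean()) / (w.std() + 1e-12)
        score = float(np.dot(w, t) / n)
        if score > best_score:
            best_score, best_offset = score, i
    return best_score, best_offset
```

For example, with a made-up template `[0, 1, 4, 2, 1, 0.5, 0]` embedded in an otherwise silent trace, `blink_score` returns the offset at which the template was placed and a score close to 1. A detector along these lines would presumably threshold the score and require two nearby regions to respond at the same time, reflecting the two-eye constraint mentioned in the abstract.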
- …