Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition
A major emerging challenge is how to protect people's privacy as cameras and
computer vision are increasingly integrated into our daily lives, including in
smart devices inside homes. A potential solution is to capture and record just
the minimum amount of information needed to perform a task of interest. In this
paper, we propose a fully-coupled two-stream spatiotemporal architecture for
reliable human action recognition on extremely low resolution (e.g., 12x16
pixel) videos. We provide an efficient method to extract spatial and temporal
features and to aggregate them into a robust feature representation for an
entire action video sequence. We also consider how to incorporate high
resolution videos during training in order to build better low resolution
action recognition models. We evaluate on two publicly-available datasets,
showing significant improvements over the state-of-the-art.
Comment: 9 pages, 5 figures, published in WACV 201
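The abstract's core idea, extracting spatial (appearance) and temporal (motion) features from extremely low-resolution video and aggregating them over a clip, can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions: average pooling stands in for the capture of 12x16-pixel video, frame differencing stands in for the learned temporal stream, and clip-level averaging stands in for the paper's feature aggregation; none of these are the paper's actual architecture.

```python
import numpy as np

def downsample(frame, out_h=12, out_w=16):
    """Average-pool a grayscale frame to an extremely low resolution.
    out_h/out_w match the 12x16 setting in the abstract; the pooling
    scheme itself is an illustrative assumption."""
    h, w = frame.shape
    return frame.reshape(out_h, h // out_h, out_w, w // out_w).mean(axis=(1, 3))

def two_stream_features(video):
    """Toy stand-in for coupled spatial/temporal streams.
    Spatial cue: per-frame low-res appearance, averaged over the clip.
    Temporal cue: absolute frame-to-frame differences (a crude motion map).
    Both are flattened and concatenated into one clip descriptor."""
    low = np.stack([downsample(f) for f in video])                # (T, 12, 16)
    spatial = low.mean(axis=0).ravel()                            # appearance summary
    temporal = np.abs(np.diff(low, axis=0)).mean(axis=0).ravel()  # motion summary
    return np.concatenate([spatial, temporal])

video = np.random.rand(8, 120, 160)  # 8 frames; 120x160 divides evenly into 12x16
feat = two_stream_features(video)
print(feat.shape)  # (384,) = 2 * 12 * 16
```

A real implementation would replace the hand-crafted cues with learned convolutional streams, but the shapes illustrate why 12x16 video still carries usable motion information.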
Privacy-Preserving Action Recognition via Motion Difference Quantization
The widespread use of smart computer vision systems in our personal spaces
has led to an increased consciousness about the privacy and security risks that
these systems pose. On the one hand, we want these systems to assist in our
daily lives by understanding their surroundings, but on the other hand, we want
them to do so without capturing any sensitive information. Towards this
direction, this paper proposes a simple, yet robust privacy-preserving encoder
called BDQ for the task of privacy-preserving human action recognition that is
composed of three modules: Blur, Difference, and Quantization. First, the input
scene is passed to the Blur module to smoothen the edges. This is followed by
the Difference module to apply a pixel-wise intensity subtraction between
consecutive frames to highlight motion features and suppress obvious high-level
privacy attributes. Finally, the Quantization module is applied to the motion
difference frames to remove the low-level privacy attributes. The BDQ
parameters are optimized in an end-to-end fashion via adversarial training such
that it learns to allow action recognition attributes while inhibiting privacy
attributes. Our experiments on three benchmark datasets show that the proposed
encoder design can achieve state-of-the-art trade-off when compared with
previous works. Furthermore, we show that the trade-off achieved is at par with
the DVS sensor-based event cameras. Code available at:
https://github.com/suakaw/BDQ_PrivacyAR
Comment: ECCV 202
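The three-module Blur-Difference-Quantization pipeline described in the abstract can be sketched as below. This is a hedged toy version: a fixed box blur and a uniform quantizer stand in for the paper's adversarially learned modules, and the parameter values (kernel size, number of levels) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def box_blur(frame, k=3):
    """Blur module: smooth edges with a simple k x k box filter
    (assumption: a fixed kernel replaces the paper's learned blur)."""
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros_like(frame, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1]]
    return out / (k * k)

def bdq_encode(frames, levels=4):
    """Blur -> Difference -> Quantization, in the module order the
    abstract gives; here the parameters are fixed, not learned."""
    blurred = np.stack([box_blur(f) for f in frames])
    diff = np.diff(blurred, axis=0)  # motion features; static content cancels out
    lo, hi = diff.min(), diff.max()
    q = np.round((diff - lo) / (hi - lo + 1e-8) * (levels - 1))  # coarse levels
    return q

frames = np.random.rand(5, 32, 32)
encoded = bdq_encode(frames)
print(encoded.shape)  # (4, 32, 32): one coarse motion map per frame pair
```

The privacy intuition is visible even in this sketch: differencing removes everything static (faces, clothing, background), and quantization discards the fine intensity detail that survives differencing.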
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., cocktail party) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose due to crowdedness and presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts, presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising microphone, accelerometer, Bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
Anyone here? Smart embedded low-resolution omnidirectional video sensor to measure room occupancy
In this paper, we present a room occupancy sensing solution with unique
properties: (i) It is based on an omnidirectional vision camera, capturing rich
scene information over a wide angle, which makes it possible to count the people
in a room and even estimate their positions. (ii) Although it uses a camera
input, no privacy issues arise because of its extremely low image resolution,
which renders people unrecognisable. (iii) The neural network inference runs
entirely on a
low-cost processing platform embedded in the sensor, reducing the privacy risk
even further. (iv) Limited manual data annotation is needed, because of the
self-training scheme we propose. Such a smart room occupancy rate sensor can be
used in e.g. meeting rooms and flex-desks. Indeed, by encouraging flex-desking,
the required office space can be reduced significantly. In some cases, however,
a flex-desk that has been reserved remains unoccupied without an update in the
reservation system. A similar problem occurs with meeting rooms, which are
often under-occupied. By optimising the occupancy rate a huge reduction in
costs can be achieved. Therefore, in this paper, we develop such a system, which
determines the number of people present at office flex-desks and in meeting rooms.
Using an omnidirectional camera mounted in the ceiling, combined with a person
detector, the company can intelligently update the reservation system based on
the measured occupancy. Next to the optimisation and embedded implementation of
such a self-training omnidirectional people detection algorithm, in this work
we propose a novel approach that combines spatial and temporal image data,
improving the performance of our system on extremely low-resolution images.
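The combination of spatial and temporal image data mentioned above can be sketched in a minimal way: feed the detector the current low-resolution frame stacked with a short-term motion map, so it sees both appearance and movement. This is an illustrative assumption about one simple way to fuse the two cues, not the paper's actual network input.

```python
import numpy as np

def spatio_temporal_input(frames, t):
    """Build a 2-channel detector input for timestep t:
    channel 0 = current low-res frame (spatial cue),
    channel 1 = absolute difference to the previous frame (temporal cue)."""
    motion = np.abs(frames[t] - frames[t - 1])
    return np.stack([frames[t], motion])  # shape (2, H, W)

frames = np.random.rand(10, 20, 20)  # low-resolution omnidirectional crops
x = spatio_temporal_input(frames, t=5)
print(x.shape)  # (2, 20, 20)
```

At such low resolutions a single frame is often ambiguous, so even a crude motion channel can help a person detector distinguish occupants from furniture.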
Action recognition using single-pixel time-of-flight detection
Action recognition is a challenging task that plays an important role in many robotic systems, which depend heavily on visual input feeds. However, due to privacy concerns, it is important to find methods that can recognise actions without using a visual feed. In this paper, we propose a concept for detecting actions while preserving the test subject's privacy. Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. The data trace recording one action contains a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at a 1 GHz repetition rate. Information about both the distance to the object and its shape is embedded in the traces. We apply machine learning in the form of recurrent neural networks for data analysis and demonstrate successful action recognition. The experimental results show that our proposed method achieves on average 96.47% accuracy on the actions walking forward, walking backwards, sitting down, standing up and hand waving, using recurrent neural networks.
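How distance information ends up embedded in a single-pixel voltage trace can be illustrated with a toy time-of-flight simulation. This sketch assumes an idealised Gaussian return pulse and ignores shape effects and noise; the 1 GHz sampling rate is taken from the abstract, everything else (pulse width, peak-based estimator) is an illustrative assumption.

```python
import numpy as np

C = 3e8     # speed of light, m/s
RATE = 1e9  # 1 GHz sampling, as stated in the abstract

def simulate_trace(distance_m, n_samples=2000):
    """Toy single-pixel trace: a Gaussian return pulse whose delay
    encodes the round-trip time to the object."""
    delay_s = 2 * distance_m / C            # round trip: out and back
    center = delay_s * RATE                 # delay in sample units
    t = np.arange(n_samples)
    return np.exp(-0.5 * ((t - center) / 5.0) ** 2)

def estimate_distance(trace):
    """Recover distance from the index of the peak sample."""
    return np.argmax(trace) / RATE * C / 2

trace = simulate_trace(30.0)
print(round(estimate_distance(trace), 2))  # → 30.0
```

In the paper's setting an RNN consumes the raw sequence of such traces; the point of the sketch is only that geometry is recoverable from pulse timing, which is what makes non-imaging action recognition plausible.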
Artificial Intelligence Enabled Methods for Human Action Recognition using Surveillance Videos
Computer vision applications have been attracting researchers and academia, all the more so as cloud computing resources enable such applications. Analysing video surveillance footage has become an important research area due to its widespread applications. For instance, CCTV cameras are used in public places to monitor situations and identify instances of theft or crime. With thousands of such surveillance videos streaming simultaneously, manual analysis is a very tedious and time-consuming task. There is a need for an automated approach that performs the analysis and delivers notifications or findings to the officers concerned. Such an approach is very useful to police and investigation agencies to ascertain facts, recover evidence and even exploit digital forensics. In this context, this paper throws light on different methods of human action recognition (HAR) using machine learning (ML) and deep learning (DL), which come under Artificial Intelligence (AI). It also reviews methods for privacy-preserving action recognition and Generative Adversarial Networks (GANs). Finally, the paper lists different datasets used for human action recognition research and gives an account of research gaps that can help in pursuing further research in the area of human action recognition.
Bullying10K: A Large-Scale Neuromorphic Dataset towards Privacy-Preserving Bullying Recognition
The prevalence of violence in daily life poses significant threats to
individuals' physical and mental well-being. Using surveillance cameras in
public spaces has proven effective in proactively deterring and preventing such
incidents. However, concerns regarding privacy invasion have emerged due to
their widespread deployment. To address this problem, we leverage Dynamic
Vision Sensor (DVS) cameras to detect violent incidents while preserving
privacy, since they capture pixel brightness variations instead of static
imagery. We introduce
the Bullying10K dataset, encompassing various actions, complex movements, and
occlusions from real-life scenarios. It provides three benchmarks for
evaluating different tasks: action recognition, temporal action localization,
and pose estimation. With 10,000 event segments, totaling 12 billion events and
255 GB of data, Bullying10K contributes significantly by balancing violence
detection and personal privacy preservation. It also poses a new challenge for
neuromorphic datasets and will serve as a valuable resource for training and
developing privacy-protecting video systems. Bullying10K opens new
possibilities for innovative approaches in these domains.
Comment: Accepted at the 37th Conference on Neural Information Processing
Systems (NeurIPS 2023) Track on Datasets and Benchmarks
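The privacy property of DVS cameras, capturing pixel brightness variations rather than static imagery, can be made concrete with a small software approximation. This sketch emits an event wherever the log-intensity change between consecutive frames exceeds a threshold; the threshold value and the (timestep, x, y, polarity) tuple layout are illustrative assumptions, not a real sensor's event format.

```python
import numpy as np

def frames_to_events(frames, threshold=0.2):
    """Approximate a DVS camera in software: emit an event where the
    log-intensity change between consecutive frames exceeds a threshold.
    Static appearance (and much identifying detail) produces no events."""
    logf = np.log(frames + 1e-6)
    events = []
    for t in range(1, len(frames)):
        d = logf[t] - logf[t - 1]
        ys, xs = np.nonzero(np.abs(d) > threshold)
        for y, x in zip(ys, xs):
            events.append((t, int(x), int(y), 1 if d[y, x] > 0 else -1))
    return events  # (timestep, x, y, polarity) tuples

frames = np.ones((3, 4, 4)) * 0.5
frames[1, 2, 2] = 0.9  # a single pixel brightens, then returns to baseline
evts = frames_to_events(frames)
print(evts)  # → [(1, 2, 2, 1), (2, 2, 2, -1)]
```

Only the changing pixel generates events, first with positive then negative polarity, which is why a static face or background leaves almost no trace in an event stream while motion remains fully visible.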