Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges
Pedestrian detection has become a cornerstone for several high-level tasks,
including autonomous driving, intelligent transportation, and traffic
surveillance. There are several works focussed on pedestrian detection using
visible images, mainly in the daytime. However, the task becomes far more
challenging when the environmental conditions change to poor lighting or nighttime.
Recently, new ideas have been spurred to use alternative sources, such as Far
InfraRed (FIR) temperature sensor feeds for detecting pedestrians in low-light
conditions. This study comprehensively reviews recent developments in low-light
pedestrian detection approaches. It systematically categorizes and analyses
various algorithms from region-based to non-region-based and graph-based
learning methodologies by highlighting their methodologies, implementation
issues, and challenges. It also outlines the key benchmark datasets that can be
used for research and development of advanced pedestrian detection algorithms,
particularly in low-light situations.
ActAR: Actor-Driven Pose Embeddings for Video Action Recognition
Human action recognition (HAR) in videos is one of the core tasks of video
understanding. Based on video sequences, the goal is to recognize actions
performed by humans. While HAR has received much attention in the visible
spectrum, action recognition in infrared videos is little studied. Accurate
recognition of human actions in the infrared domain is a highly challenging
task because of the redundant and indistinguishable texture features present in
the sequence. Furthermore, in some cases, challenges arise from the irrelevant
information induced by the presence of multiple active persons not contributing
to the actual action of interest. However, most existing methods follow a
standard paradigm that does not take these challenges into account, in part
because the recognition task itself is ambiguously defined in some
cases. In this paper, we propose a new method that simultaneously learns to
efficiently recognize human actions in the infrared spectrum, while
automatically identifying the key-actors performing the action without using
any prior knowledge or explicit annotations. Our method is composed of three
stages. In the first stage, optical flow-based key-actor identification is
performed. Then for each key-actor, we estimate key-poses that will guide the
frame selection process. A scale-invariant encoding process along with embedded
pose filtering is performed in order to enhance the quality of action
representations. Experimental results on the InfAR dataset show that our proposed
model achieves promising recognition performance and learns useful action
representations.
Physical Adversarial Attacks for Surveillance: A Survey
Modern automated surveillance techniques are heavily reliant on deep learning
methods. Despite the superior performance, these learning systems are
inherently vulnerable to adversarial attacks - maliciously crafted inputs that
are designed to mislead, or trick, models into making incorrect predictions. An
adversary can physically change their appearance by wearing adversarial
t-shirts, glasses, or hats or by specific behavior, to potentially avoid
various forms of detection, tracking and recognition of surveillance systems;
and obtain unauthorized access to secure properties and assets. This poses a
severe threat to the security and safety of modern surveillance systems. This
paper reviews recent attempts and findings in learning and designing physical
adversarial attacks for surveillance applications. In particular, we propose a
framework to analyze physical adversarial attacks and provide a comprehensive
survey of physical adversarial attacks on four key surveillance tasks:
detection, identification, tracking, and action recognition under this
framework. Furthermore, we review and analyze strategies to defend against the
physical adversarial attacks and the methods for evaluating the strengths of
the defense. The insights in this paper present an important step in building
resilience within surveillance systems to physical adversarial attacks.
Meta-Transformer: A Unified Framework for Multimodal Learning
Multimodal learning aims to build models that can process and relate
information from multiple modalities. Despite years of development in this
field, it still remains challenging to design a unified network for processing
various modalities (e.g., natural language, 2D images, 3D point
clouds, audio, video, time series, tabular data) due to the inherent gaps among
them. In this work, we propose a framework, named Meta-Transformer, that
leverages a frozen encoder to perform multimodal perception without
any paired multimodal training data. In Meta-Transformer, the raw input data
from various modalities are mapped into a shared token space, allowing a
subsequent encoder with frozen parameters to extract high-level semantic
features of the input data. Composed of three main components: a unified data
tokenizer, a modality-shared encoder, and task-specific heads for downstream
tasks, Meta-Transformer is the first framework to perform unified learning
across 12 modalities with unpaired data. Experiments on different benchmarks
reveal that Meta-Transformer can handle a wide range of tasks including
fundamental perception (text, image, point cloud, audio, video), practical
application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph,
tabular, and time-series). Meta-Transformer indicates a promising future for
developing unified multimodal intelligence with transformers. Code will be
available at https://github.com/invictus717/MetaTransformer
Comment: Project website: https://kxgong.github.io/meta_transformer
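The three components named in the abstract (a data tokenizer, a modality-shared encoder with frozen parameters, and task-specific heads) can be illustrated with a toy sketch. This is not the authors' implementation: the token width `DIM`, the two tokenizers, and the classes `FrozenEncoder` and `LinearHead` are hypothetical names invented here, and a mean-pool plus a fixed linear map stands in for the transformer encoder purely to keep the example short and dependency-free.

```python
import random

DIM = 4  # hypothetical width of the shared token space


def tokenize_text(text):
    """Toy text tokenizer (assumption): one DIM-wide token per word."""
    return [[(sum(map(ord, word)) * (i + 1)) % 7 / 7.0 for i in range(DIM)]
            for word in text.split()]


def tokenize_series(values):
    """Toy time-series tokenizer (assumption): one token per window of DIM samples."""
    tokens = []
    for start in range(0, len(values), DIM):
        chunk = list(values[start:start + DIM])
        tokens.append(chunk + [0.0] * (DIM - len(chunk)))  # zero-pad the last window
    return tokens


class FrozenEncoder:
    """Stand-in for the shared encoder: weights are fixed at construction.

    A real system would use a pretrained transformer; here a mean-pool
    followed by a fixed linear map keeps the sketch minimal.
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Tuples make the weights immutable, mirroring "frozen parameters".
        self.weights = tuple(
            tuple(rng.uniform(-1, 1) for _ in range(DIM)) for _ in range(DIM)
        )

    def __call__(self, tokens):
        pooled = [sum(tok[j] for tok in tokens) / len(tokens) for j in range(DIM)]
        return [sum(w * x for w, x in zip(row, pooled)) for row in self.weights]


class LinearHead:
    """Task-specific head: the only part that would be trained per task."""

    def __init__(self, n_outputs, seed):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1, 1) for _ in range(DIM)]
                        for _ in range(n_outputs)]

    def __call__(self, features):
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]


encoder = FrozenEncoder()            # shared across all modalities
text_head = LinearHead(3, seed=1)    # e.g. a 3-class text task
series_head = LinearHead(2, seed=2)  # e.g. a 2-class time-series task

text_logits = text_head(encoder(tokenize_text("pedestrian at night")))
series_logits = series_head(encoder(tokenize_series([0.1, 0.4, 0.2, 0.9, 0.5])))
```

The point of the sketch is the data flow: each modality has its own tokenizer, every token sequence lands in the same `DIM`-wide space, and one encoder instance serves both inputs while only the heads differ per task.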