Non-contact Multimodal Indoor Human Monitoring Systems: A Survey
Indoor human monitoring systems leverage a wide range of sensors, including
cameras, radio devices, and inertial measurement units, to collect extensive
data from users and the environment. These sensors contribute diverse data
modalities, such as video feeds from cameras, received signal strength
indicators and channel state information from WiFi devices, and three-axis
acceleration data from inertial measurement units. In this context, we present
a comprehensive survey of multimodal approaches for indoor human monitoring
systems, with a specific focus on their relevance in elderly care. Our survey
primarily highlights non-contact technologies, particularly cameras and radio
devices, as key components in the development of indoor human monitoring
systems. Throughout this article, we explore well-established techniques for
extracting features from multimodal data sources. Our exploration extends to
methodologies for fusing these features and harnessing multiple modalities to
improve the accuracy and robustness of machine learning models. Furthermore, we
conduct a comparative analysis across different data modalities in diverse human
monitoring tasks and undertake a comprehensive examination of existing
multimodal datasets. This extensive survey not only highlights the significance
of indoor human monitoring systems but also affirms their versatile
applications. In particular, we emphasize their critical role in enhancing the
quality of elderly care, offering valuable insights into the development of
non-contact monitoring solutions suited to the needs of aging populations.
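
As a concrete illustration of the feature-extraction-and-fusion pipeline such surveys describe, the following is a minimal sketch, not taken from the survey itself, of feature-level (early) fusion of WiFi channel state information (CSI) and IMU acceleration for an activity classifier; the feature choices and the classifier are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def csi_features(csi_amp):
        # csi_amp: (time, subcarriers) CSI amplitude matrix from a WiFi device
        return np.hstack([csi_amp.mean(axis=0), csi_amp.std(axis=0)])

    def imu_features(acc):
        # acc: (time, 3) three-axis acceleration from an inertial measurement unit
        mag = np.linalg.norm(acc, axis=1)  # per-sample acceleration magnitude
        return np.array([mag.mean(), mag.std(), mag.max(), mag.min()])

    def early_fusion(csi_amp, acc):
        # Feature-level (early) fusion: concatenate per-modality feature vectors
        return np.hstack([csi_features(csi_amp), imu_features(acc)])

    # X: one fused feature vector per time window, y: activity labels
    # clf = RandomForestClassifier(n_estimators=100).fit(X, y)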
RGB-D And Thermal Sensor Fusion: A Systematic Literature Review
In the last decade, the computer vision field has seen significant progress
in multimodal data fusion and learning, where multiple sensors, including
depth, infrared, and visual, are used to capture the environment across diverse
spectral ranges. Despite these advancements, there has been no systematic and
comprehensive evaluation of fusing RGB-D and thermal modalities to date. While
autonomous driving using LiDAR, radar, RGB, and other sensors has garnered
substantial research interest, along with the fusion of RGB and depth
modalities, the integration of thermal cameras and, specifically, the fusion of
RGB-D and thermal data, has received comparatively less attention. This might
be partly due to the limited number of publicly available datasets for such
applications. This paper provides a comprehensive review of both
state-of-the-art and traditional methods used in fusing RGB-D and thermal
camera data for various applications, such as site inspection, human tracking,
fault detection, and others. The reviewed literature has been categorised into
technical areas, such as 3D reconstruction, segmentation, object detection,
available datasets, and other related topics. Following a brief introduction
and an overview of the methodology, the study delves into calibration and
registration techniques, then examines thermal visualisation and 3D
reconstruction, before discussing the application of classic feature-based
techniques as well as modern deep learning approaches. The paper concludes with
a discourse on current limitations and potential future research directions. It
is hoped that this survey will serve as a valuable reference for researchers
looking to familiarise themselves with the latest advancements and contribute
to the RGB-DT research field.
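
To make the calibration-and-registration step concrete, here is a minimal sketch, under assumed pinhole intrinsics K_rgb and K_t and thermal-from-RGB extrinsics R, t (all hypothetical names, not taken from the review), that reprojects RGB-D pixels into the thermal camera so the two modalities can be fused per pixel.

    import numpy as np

    def register_depth_to_thermal(depth, K_rgb, K_t, R, t):
        # depth: (H, W) depth map in metres, aligned to the RGB camera
        # K_rgb, K_t: 3x3 intrinsics; R, t: extrinsics mapping RGB-frame
        # points into the thermal frame (e.g. from a heated-target calibration)
        H, W = depth.shape
        u, v = np.meshgrid(np.arange(W), np.arange(H))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
        # Back-project every RGB-D pixel to a 3D point in the RGB camera frame
        pts = np.linalg.inv(K_rgb) @ pix * depth.reshape(1, -1)
        # Transform into the thermal camera frame and project
        proj = K_t @ (R @ pts + t.reshape(3, 1))
        uv_t = proj[:2] / np.clip(proj[2], 1e-6, None)
        return uv_t.T.reshape(H, W, 2)  # thermal pixel coords per RGB pixel

Sampling the thermal image at the returned coordinates yields a thermal channel aligned to the RGB-D frame.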
Introduction to multimodal scene understanding
A fundamental goal of computer vision is to discover the semantic information within a given scene, commonly referred to as scene understanding. The overall goal is to find a mapping that derives semantic information from sensor data, an extremely challenging task, partly due to ambiguities in the appearance of the data. However, the majority of the scene understanding tasks tackled so far involve visual modalities only. In this book, we aim to provide an overview of recent advances in algorithms and applications that involve multiple sources of information for scene understanding. In this context, deep learning models are particularly suitable for combining multiple modalities and, indeed, many contributions rely on such architectures to take advantage of all data streams and obtain optimal performance. We conclude this book's introduction with a concise description of the remaining chapters, which are focused on providing an understanding of the state of the art, open problems, and future directions related to multimodal scene understanding as a scientific discipline.
Anti-social behavior detection in audio-visual surveillance systems
In this paper we propose a general-purpose framework for the detection of unusual events. The proposed system is based on the unsupervised method for unusual scene detection in webcam images introduced in [1]. We extend that algorithm to accommodate data from different modalities and introduce the concept of time-space blocks. In addition, we evaluate early and late fusion techniques for our audio-visual data features. Experimental results on 192 hours of data show that fusing audio and video data outperforms using a single modality.
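
For reference, a minimal sketch of the two fusion strategies evaluated in such work; the classifier here is an illustrative stand-in, not the paper's method.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # X_audio, X_video: per-block audio and video feature matrices; y: labels

    def early_fusion_fit(X_audio, X_video, y):
        # Early fusion: concatenate features, train a single classifier
        return LogisticRegression(max_iter=1000).fit(
            np.hstack([X_audio, X_video]), y)

    def late_fusion_fit(X_audio, X_video, y):
        # Late fusion: one classifier per modality; average scores at test time
        clf_a = LogisticRegression(max_iter=1000).fit(X_audio, y)
        clf_v = LogisticRegression(max_iter=1000).fit(X_video, y)
        return lambda Xa, Xv: ((clf_a.predict_proba(Xa) +
                                clf_v.predict_proba(Xv)) / 2).argmax(axis=1)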
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Simultaneous Localization and Mapping (SLAM) consists in the concurrent
construction of a model of the environment (the map), and the estimation of the
state of the robot moving within it. The SLAM community has made astonishing
progress over the last 30 years, enabling large-scale real-world applications,
and witnessing a steady transition of this technology to industry. We survey
the current state of SLAM. We start by presenting what is now the de facto
standard formulation for SLAM. We then review related work, covering a broad
set of topics including robustness and scalability in long-term mapping, metric
and semantic representations for mapping, theoretical performance guarantees,
active SLAM and exploration, and other new frontiers. This paper simultaneously
serves as a position paper and a tutorial for users of SLAM. By
looking at the published research with a critical eye, we delineate open
challenges and new research issues that still deserve careful scientific
investigation. The paper also contains the authors' take on two questions that
often animate discussions during robotics conferences: Do robots need SLAM? and
Is SLAM solved?
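
The de facto standard formulation mentioned above is maximum a posteriori estimation over a factor graph, which under Gaussian noise reduces to nonlinear least squares; in LaTeX notation (symbols chosen here for illustration):

    X^{\star} = \arg\max_{X} \, p(X \mid Z)
              = \arg\min_{X} \sum_{k} \lVert h_k(X_k) - z_k \rVert^{2}_{\Omega_k}

where X collects the robot poses and the map, z_k is the k-th measurement with measurement model h_k acting on the variable subset X_k, and Omega_k is that measurement's information matrix.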
"Reading Between the Heat": Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection
Stress impacts our physical and mental health as well as our social life. A
passive and contactless indoor stress monitoring system can unlock numerous
important applications such as workplace productivity assessment, smart homes,
and personalized mental health monitoring. While the thermal signatures from a
user's body captured by a thermal camera can provide important information
about the "fight-flight" response of the sympathetic and parasympathetic
nervous system, relying solely on thermal imaging for training a stress
prediction model often leads to overfitting and, consequently, suboptimal
performance. This paper addresses this challenge by introducing ThermaStrain, a
novel co-teaching framework that achieves high stress-prediction performance by
transferring knowledge from the wearable modality to the contactless thermal
modality. During training, ThermaStrain uses a wearable electrodermal
activity (EDA) sensor as a teacher, guiding the model to generate
representations from thermal videos that emulate the stress-indicative
representations of the EDA sensor. During testing, only thermal sensing is
used: stress-indicative patterns are extracted from the thermal data
together with the emulated EDA representations to
improve stress assessment. The study collected a comprehensive dataset with
thermal video and EDA data under various stress conditions and distances.
ThermaStrain achieves an F1 score of 0.8293 in binary stress classification,
outperforming the thermal-only baseline approach by over 9%. Extensive
evaluations highlight ThermaStrain's effectiveness in recognizing
stress-indicative attributes, its adaptability across distances and stress
scenarios, its real-time executability on edge platforms, its applicability to
multi-individual sensing, its ability to function under limited visibility and
in unfamiliar conditions, and the advantages of its co-teaching approach.
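
As a sketch of what such a co-teaching objective can look like, an illustrative reconstruction rather than ThermaStrain's actual architecture or loss, the thermal branch is trained both to classify stress and to emulate the EDA embedding, so only the thermal branch is needed at test time.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CoTeachSketch(nn.Module):
        # Hypothetical encoders; the real model's layers are not specified here
        def __init__(self, thermal_dim, eda_dim, embed_dim=64, n_classes=2):
            super().__init__()
            self.thermal_enc = nn.Sequential(
                nn.Linear(thermal_dim, 128), nn.ReLU(),
                nn.Linear(128, embed_dim))
            self.eda_enc = nn.Linear(eda_dim, embed_dim)  # wearable "teacher"
            self.head = nn.Linear(embed_dim, n_classes)

        def forward(self, thermal, eda=None):
            z_t = self.thermal_enc(thermal)   # emulated EDA representation
            logits = self.head(z_t)
            if eda is None:                   # test time: thermal only
                return logits
            z_e = self.eda_enc(eda).detach()  # teacher embedding, no gradient
            # Alignment term pulls thermal features toward EDA features
            return logits, F.mse_loss(z_t, z_e)

    # Training step (y: stress labels, lam: alignment weight, assumed values):
    # logits, align = model(thermal_batch, eda_batch)
    # loss = F.cross_entropy(logits, y) + lam * align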