Intelligent Multi-Modal Sensing-Communication Integration: Synesthesia of Machines
In the era of sixth-generation (6G) wireless communications, integrated
sensing and communications (ISAC) is recognized as a promising solution to
upgrade the physical system by endowing wireless communications with sensing
capability. Existing ISAC is mainly oriented to static scenarios with
radio-frequency (RF) sensors being the primary participants, thus lacking a
comprehensive environment feature characterization and facing a severe
performance bottleneck in dynamic environments. To date, extensive surveys on
ISAC have been conducted but are limited to summarizing RF-based radar sensing.
Currently, some research efforts have been devoted to exploring multi-modal
sensing-communication integration but still lack a comprehensive review.
Therefore, we generalize the concept of ISAC inspired by human synesthesia to
establish a unified framework of intelligent multi-modal sensing-communication
integration and provide a comprehensive review under such a framework in this
paper. The resulting framework, termed Synesthesia of Machines (SoM), offers the
clearest understanding of such intelligent integration and details its paradigm
for the first time. We
commence by justifying the necessity of the new paradigm. Subsequently, we
offer a definition of SoM and zoom into the detailed paradigm, which is
summarized as three operation modes. To facilitate SoM research, we overview
the prerequisite of SoM research, i.e., mixed multi-modal (MMM) datasets. Then,
we introduce the mapping relationships between multi-modal sensing and
communications. Afterward, we review the technologies underlying
SoM-enhance-based and SoM-concert-based applications. To corroborate the
superiority of SoM, we also present simulation results related to dual-function
waveform and predictive beamforming design. Finally, we propose some potential
directions to inspire future research efforts.
Comment: This paper has been accepted by IEEE Communications Surveys & Tutorials.
Non-contact Multimodal Indoor Human Monitoring Systems: A Survey
Indoor human monitoring systems leverage a wide range of sensors, including
cameras, radio devices, and inertial measurement units, to collect extensive
data from users and the environment. These sensors contribute diverse data
modalities, such as video feeds from cameras, received signal strength
indicators and channel state information from WiFi devices, and three-axis
acceleration data from inertial measurement units. In this context, we present
a comprehensive survey of multimodal approaches for indoor human monitoring
systems, with a specific focus on their relevance in elderly care. Our survey
primarily highlights non-contact technologies, particularly cameras and radio
devices, as key components in the development of indoor human monitoring
systems. Throughout this article, we explore well-established techniques for
extracting features from multimodal data sources. Our exploration extends to
methodologies for fusing these features and harnessing multiple modalities to
improve the accuracy and robustness of machine learning models. Furthermore, we
conduct comparative analysis across different data modalities in diverse human
monitoring tasks and undertake a comprehensive examination of existing
multimodal datasets. This extensive survey not only highlights the significance
of indoor human monitoring systems but also demonstrates their wide range of
applications. In particular, we emphasize their critical role in enhancing the
quality of elderly care, offering valuable insights into the development of
non-contact monitoring solutions applicable to the needs of aging populations.
Comment: 19 pages, 5 figures
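The feature-extraction and fusion steps the survey covers can be illustrated with a minimal sketch. It assumes simple hand-crafted statistics (mean and standard deviation of the three-axis acceleration magnitude, and the temporal variation of per-subcarrier CSI amplitude) and two textbook fusion strategies: feature-level (early) fusion and decision-level (late) fusion. The function names, features, and toy values are hypothetical and are not taken from any specific system in the survey.

```python
import math
import statistics


def accel_features(window: list[tuple[float, float, float]]) -> list[float]:
    """Mean and std of the acceleration magnitude over a window of (ax, ay, az) samples."""
    mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in window]
    return [statistics.fmean(mags), statistics.pstdev(mags)]


def csi_features(window: list[list[float]]) -> list[float]:
    """Mean and max over subcarriers of the temporal std of CSI amplitude.

    Each element of `window` is one packet's amplitude vector (one value per subcarrier);
    a person moving near the WiFi link tends to increase this temporal variation.
    """
    n_sub = len(window[0])
    per_sub = [statistics.pstdev([pkt[k] for pkt in window]) for k in range(n_sub)]
    return [statistics.fmean(per_sub), max(per_sub)]


def early_fusion(*feature_vectors: list[float]) -> list[float]:
    """Feature-level fusion: concatenate per-modality features into one model input."""
    return [x for fv in feature_vectors for x in fv]


def late_fusion(probs: list[list[float]], weights: list[float]) -> list[float]:
    """Decision-level fusion: weighted average of per-modality class probabilities."""
    total = sum(weights)
    return [sum(w * p[c] for w, p in zip(weights, probs)) / total
            for c in range(len(probs[0]))]


# Toy windows (hypothetical values): a device at rest and a static CSI link.
accel_window = [(0.0, 0.0, 9.81)] * 4
csi_window = [[1.0, 2.0, 3.0]] * 4
fused = early_fusion(accel_features(accel_window), csi_features(csi_window))

# Hypothetical per-modality outputs for classes ["fall", "no fall"], trusted equally.
decision = late_fusion([[0.9, 0.1], [0.6, 0.4]], [1.0, 1.0])
```

Early fusion lets a single model learn cross-modal interactions but needs time-aligned inputs; late fusion tolerates a missing or unreliable modality more gracefully, since its weight can simply be lowered.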
From active to passive spatial acoustic sensing and applications
The active acoustic sensing system emits modulated acoustic waves and analyzes the reflected signals; it dominates acoustic spatial sensing. The passive acoustic sensing system, on the other hand, receives and analyzes natural sounds directly. It performs well on semantic tasks but poorly on spatial sensing. In this dissertation, we bridge three gaps in existing systems: the gap between the assumptions of signal processing algorithms and the real acoustic environment, the gap between powerful active spatial sensing and limited passive spatial sensing, and the gap between semantic features and spatial information. We advance acoustic sensing system design and extend its functionality through three novel systems.
First, we develop DeepRange, a fully active spatial sensing system that adapts easily to real environments. We develop an effective mechanism to generate synthetic training data that captures noise, speaker/mic distortion, and interference in the signals, which removes the need to collect a large volume of data. We then design a deep range neural network (DRNet) to estimate distance from raw acoustic signals. Its design is inspired by the signal-processing insight that an ultra-long convolution kernel helps combat noise and interference. Although the model is trained entirely on synthetic data, it robustly achieves sub-centimeter error on real data across various environments, background noise, interference, and mobile phone models.
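DRNet itself is a learned model and is not reproduced here. The signal-processing intuition it cites, that a very long convolution kernel suppresses noise and interference, can be illustrated with a classical matched filter, whose kernel is the entire probe signal. The sketch below is such a baseline, not DeepRange, and rests on stated assumptions: a 48 kHz sample rate, a 10 ms linear chirp from 17 to 20 kHz as the probe, a single reflector, and a synthetic recording built from a delayed echo plus white noise and a 1 kHz tone interferer that is stronger than the echo. Speaker/mic distortion is not modeled. All names and parameter values are hypothetical.

```python
import math
import random

FS = 48_000        # sample rate in Hz (assumed, common for phone audio)
C_SOUND = 343.0    # speed of sound in m/s (assumed, air at about 20 °C)


def chirp(f0: float, f1: float, duration: float) -> list[float]:
    """Linear chirp sweeping f0 -> f1 Hz; a hypothetical probe signal."""
    n = round(duration * FS)
    k = (f1 - f0) / duration  # sweep rate in Hz/s
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / FS for i in range(n))]


def synth_rx(probe: list[float], delay: int, length: int, amp: float = 0.5,
             noise_std: float = 0.3, tone_hz: float = 1_000.0, tone_amp: float = 1.0,
             seed: int = 0) -> list[float]:
    """Synthetic recording: attenuated, delayed echo + white noise + a tone interferer.

    Speaker/mic distortion is NOT modeled in this sketch.
    """
    rng = random.Random(seed)
    rx = [rng.gauss(0.0, noise_std) + tone_amp * math.sin(2 * math.pi * tone_hz * i / FS)
          for i in range(length)]
    for i, s in enumerate(probe):
        rx[delay + i] += amp * s
    return rx


def estimate_delay(probe: list[float], rx: list[float]) -> int:
    """Matched filter: slide the full-length probe (a len(probe)-tap kernel) over rx."""
    lags = range(len(rx) - len(probe) + 1)
    return max(lags, key=lambda lag: sum(p * rx[lag + i] for i, p in enumerate(probe)))


probe = chirp(17_000.0, 20_000.0, 0.010)               # 10 ms probe -> 480-tap kernel
true_distance = 1.0                                    # metres to a single reflector
true_delay = round(2 * true_distance / C_SOUND * FS)   # round-trip delay in samples
rx = synth_rx(probe, true_delay, length=2_000)
est_distance = estimate_delay(probe, rx) / FS * C_SOUND / 2
```

Under these assumptions one sample of lag corresponds to 343 / (2 × 48000) ≈ 3.6 mm of range, and the 480-tap correlation accumulates the echo's energy while averaging down both the noise and the out-of-band tone.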
Second, we develop a fused active and passive spatial sensing system for speech separation, called Spatial Aware Multi-task learning-based Separation (SAMS). We leverage both active and passive sensing to improve angle-of-arrival (AoA) estimation and jointly optimize the semantic task and the spatial task. During teleconferencing, SAMS simultaneously estimates the target user's spatial location and extracts their speech. We first generate fine-grained spatial embeddings from the user's voice and an inaudible tracking sound, which contain the user's position and rich multipath information. We then develop a deep neural network with multi-task learning to jointly optimize source separation and localization. We significantly speed up inference to provide a real-time guarantee.
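SAMS learns its spatial embeddings with a neural network, which the sketch below does not attempt. For orientation, here is the classical two-microphone AoA baseline: find the inter-microphone delay τ that maximizes the cross-correlation and convert it with θ = arcsin(cτ/d). The broadband Gaussian source, 15 cm microphone spacing, 48 kHz sample rate, and all function names are assumptions chosen for illustration.

```python
import math
import random

FS = 48_000          # sample rate in Hz (assumed)
C_SOUND = 343.0      # speed of sound in m/s (assumed)
MIC_SPACING = 0.15   # hypothetical distance between the two microphones, in metres


def simulate_mics(angle_deg: float, n: int = 4_000, noise_std: float = 0.5,
                  seed: int = 1) -> tuple[list[float], list[float]]:
    """Far-field broadband source; mic 2 hears it `lag` whole samples after mic 1."""
    rng = random.Random(seed)
    src = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lag = round(MIC_SPACING * math.sin(math.radians(angle_deg)) / C_SOUND * FS)
    mic1 = [s + rng.gauss(0.0, noise_std) for s in src]
    mic2 = [(src[i - lag] if 0 <= i - lag < n else 0.0) + rng.gauss(0.0, noise_std)
            for i in range(n)]
    return mic1, mic2


def estimate_aoa(mic1: list[float], mic2: list[float]) -> float:
    """Angle of arrival in degrees from the cross-correlation peak between two mics."""
    max_lag = math.ceil(MIC_SPACING / C_SOUND * FS)  # only physically possible lags
    n = len(mic1)

    def xcorr(lag: int) -> float:
        # Sum over indices where both mic1[i] and mic2[i + lag] exist.
        return sum(mic1[i] * mic2[i + lag] for i in range(max(0, -lag), min(n, n - lag)))

    best = max(range(-max_lag, max_lag + 1), key=xcorr)
    ratio = C_SOUND * best / FS / MIC_SPACING
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))


est = estimate_aoa(*simulate_mics(30.0))
```

Because the delay is resolved only to whole samples (about 7 mm of path difference here), a source at 30° is estimated at about 28.5°; this quantization is one practical limit of the plain baseline.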
Finally, we deeply fuse semantic features and spatial cues to combat interference and noise in real environments and to enable depth sensing in a fully passive setup. Inspired by the “flash-to-bang” phenomenon (i.e., hearing thunder after seeing lightning), we propose FBDepth to measure the depth of a sound source. We formulate the problem as an audio-visual event localization task for collision events. Specifically, FBDepth first aligns the video and audio tracks to locate the target object and the target sound at a coarse granularity. Based on observations of moving objects' trajectories, it then estimates the intersection of the optical flow before and after the collision to locate the video event in time. Finally, it feeds the estimated timestamp of the video event, together with the other modalities, into the final depth estimation. We use a mobile phone to collect 3.6K+ video clips involving 24 different objects at distances of up to 60 m. FBDepth outperforms monocular and stereo methods, especially at long range.
Computer Science
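The depth relation behind the flash-to-bang idea follows from the speed gap between light and sound: light covers 60 m in about 0.2 µs, while sound needs about 0.17 s, so depth ≈ c_sound × (t_audio − t_video). The sketch below is a simplified stand-in, not FBDepth's pipeline: it replaces optical-flow estimation with a tracked 1-D object position, fits a least-squares line to the frames before and after the collision, and takes their intersection as the sub-frame video event time. The 30 fps frame rate, the trajectory, the 343 m/s speed of sound, and all function names are hypothetical.

```python
C_SOUND = 343.0  # m/s, assumed speed of sound in air (about 20 °C)


def fit_line(ts: list[float], xs: list[float]) -> tuple[float, float]:
    """Least-squares fit x = slope * t + intercept."""
    n = len(ts)
    mt, mx = sum(ts) / n, sum(xs) / n
    slope = sum((t - mt) * (x - mx) for t, x in zip(ts, xs)) / sum((t - mt) ** 2 for t in ts)
    return slope, mx - slope * mt


def collision_time(pre: tuple[float, float], post: tuple[float, float]) -> float:
    """Time at which the pre- and post-collision trajectory lines intersect."""
    (a1, b1), (a2, b2) = pre, post
    return (b2 - b1) / (a1 - a2)


def flash_to_bang_depth(t_video: float, t_audio: float) -> float:
    """Depth from the audio lag, neglecting the light's travel time."""
    return C_SOUND * (t_audio - t_video)


# Hypothetical event: collision at 0.105 s, filmed at 30 fps, object 20 m away.
FPS = 30.0
t_pre = [k / FPS for k in range(0, 4)]        # frames before the impact
t_post = [k / FPS for k in range(4, 8)]       # frames after the impact
x_pre = [100.0 + 400.0 * t for t in t_pre]    # pixel position approaching the surface
x_post = [163.0 - 200.0 * t for t in t_post]  # pixel position after bouncing back
t_video = collision_time(fit_line(t_pre, x_pre), fit_line(t_post, x_post))
t_audio = 0.105 + 20.0 / C_SOUND              # impact-sound onset in the audio track
depth = flash_to_bang_depth(t_video, t_audio)
```

Snapping the event to the nearest frame (0.1 s) would instead shift t_video by 5 ms, i.e. about 1.7 m of depth error at 343 m/s, which is why sub-frame timing of the video event matters.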
Walking Speed Detection from 5G prototype System
While most RF-sensing approaches proposed in the literature rely on short-distance, indoor, point-to-point instrumentation, practical large-scale deployment of RF sensing suggests using ubiquitously available cellular systems. In particular, the 5th generation of the wireless communication standard (5G) is envisioned as a universal means of communication, including for Internet of Things devices.
This thesis investigates the device-free environmental perception capabilities of a 5G prototype system in two cases, walking speed detection and human presence detection, and compares the former with acceleration-based sensing. It analyzes how well a 5G system can recognize common human activities and detect presence near the transceiver devices using device-free instrumentation, i.e., detecting the activities of people who carry no devices by exploiting environmental RF noise. The thesis first reviews existing and related literature. It then describes the implementation and evaluation of walking speed and presence detection in detail. The evaluation uses a prototype 5G system with 52 OFDM carriers over a 12.48 MHz bandwidth at 3.45 GHz; we consider the impact of the number and choice of channels and compare the recognition performance with acceleration-based sensing. We conclude that in realistic settings with five subjects, accurate recognition of activities and environmental situations can be a reliable implicit service of future 5G installations.
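The prototype parameters quoted above fix a few quantities that shape what device-free sensing can resolve, and the short calculation below derives them. The 1.4 m/s walking speed and the monostatic Doppler formula f_D = 2v/λ are assumptions for illustration; for a bistatic link the observed shift depends on geometry and cannot exceed this value. Variable and function names are hypothetical.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s

BANDWIDTH_HZ = 12.48e6   # from the thesis abstract
N_CARRIERS = 52          # from the thesis abstract
CARRIER_HZ = 3.45e9      # from the thesis abstract

subcarrier_spacing_hz = BANDWIDTH_HZ / N_CARRIERS  # 240 kHz
symbol_duration_s = 1.0 / subcarrier_spacing_hz    # useful OFDM symbol length, no cyclic prefix
wavelength_m = C_LIGHT / CARRIER_HZ                # about 8.7 cm
path_resolution_m = C_LIGHT / BANDWIDTH_HZ         # coarse path-length resolution c / B


def max_doppler_hz(speed_mps: float) -> float:
    """Upper bound on the Doppler shift from a reflector moving at speed_mps (2v / lambda)."""
    return 2.0 * speed_mps / wavelength_m


def speed_from_doppler(doppler_hz: float) -> float:
    """Inverse of the monostatic approximation: v = f_D * lambda / 2."""
    return doppler_hz * wavelength_m / 2.0


walking_doppler = max_doppler_hz(1.4)  # hypothetical walking speed of 1.4 m/s
```

With roughly 24 m of path-length resolution, a room-scale scene falls within a single range bin, which suggests that walking speed is better read from the temporal (Doppler or amplitude) variation of the channel than from ranging; a Doppler shift of a few tens of hertz is also far below the 240 kHz subcarrier spacing.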
Multi-User Gesture Recognition with Radar Technology
The aim of this work is the development of a radar system for consumer applications. It is capable of tracking multiple people in a room and offers a touchless human-machine interface for purposes ranging from entertainment to hygiene.