Multi-sensor data fusion techniques for RPAS detect, track and avoid
Accurate and robust tracking of objects is of growing interest in the computer vision community. The ability of a multi-sensor system to detect and track objects, and to accurately predict their future trajectory, is critical in mission- and safety-critical applications. Remotely Piloted Aircraft Systems (RPAS) are currently not equipped to routinely access all classes of airspace, since certified Detect-and-Avoid (DAA) systems are yet to be developed. Such capabilities can be achieved by incorporating both cooperative and non-cooperative DAA functions, as well as by providing enhanced communications, navigation and surveillance (CNS) services. DAA is highly dependent on the performance of CNS systems for Detection, Tracking and Avoidance (DTA) tasks and maneuvers. In order to perform effective detection of objects, a number of high-performance, reliable and accurate avionics sensors and systems are adopted, including non-cooperative sensors (visual and thermal cameras, laser radar (LIDAR) and acoustic sensors) and cooperative systems (Automatic Dependent Surveillance-Broadcast (ADS-B) and Traffic Collision Avoidance System (TCAS)). In this paper these candidate sensors and systems are fully exploited in a Multi-Sensor Data Fusion (MSDF) architecture. An Unscented Kalman Filter (UKF) and a more advanced Particle Filter (PF) are adopted to estimate the state vector of the objects for maneuvering and non-maneuvering DTA tasks. Furthermore, an artificial neural network is conceptualised to exploit statistical learning methods, combining the information obtained from the UKF and PF. After describing the MSDF architecture, the key mathematical models for data fusion are presented. Conceptual studies are carried out on visual and thermal image fusion architectures.
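The simplest building block behind a multi-sensor data fusion architecture like the one above is precision-weighted combination of independent measurements. The sketch below is illustrative only, not the paper's method; the sensor pairing and noise variances are made-up values.

```python
# Minimal sketch: inverse-variance (precision-weighted) fusion of two
# independent Gaussian measurements of the same quantity, e.g. a range
# estimate from a visual camera and one from LIDAR. Values are illustrative.

def fuse(z1, var1, z2, var2):
    """Fuse two independent Gaussian measurements of the same quantity."""
    w1 = 1.0 / var1                        # precision of sensor 1
    w2 = 1.0 / var2                        # precision of sensor 2
    z = (w1 * z1 + w2 * z2) / (w1 + w2)    # precision-weighted mean
    var = 1.0 / (w1 + w2)                  # fused variance < either input
    return z, var

# Camera says 102 m (variance 4), LIDAR says 100 m (variance 1):
z, var = fuse(102.0, 4.0, 100.0, 1.0)
# The fused estimate leans toward the more precise LIDAR measurement.
```

A full UKF or PF applies this same precision-weighting idea recursively through a motion model; the point here is only that fusing sensors reduces uncertainty below that of any single sensor.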
Multi-camera and Multi-modal Sensor Fusion, an Architecture Overview
Proceedings of: Fourth International Workshop on User-Centric Technologies and Applications (CONTEXTS 2010), Valencia, 7-10 September 2010. This paper outlines an architecture for multi-camera and multi-modal sensor fusion. We define a high-level architecture in which image sensors such as standard color, thermal, and time-of-flight cameras can be fused with high-accuracy location systems based on UWB, WiFi, Bluetooth or RFID technologies. This architecture is especially well suited for indoor environments, where such heterogeneous sensors usually coexist. The main advantage of such a system is that a combined, non-redundant output is provided for all detected targets. In its simplest form, the fused output includes the location of each target, with additional features depending on the sensors involved in the target detection, e.g., location plus thermal information. This way, a surveillance or context-aware system obtains more accurate and complete information than by using only one kind of technology. This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, SINPROB, CAM CONTEXTS S2009/TIC-1485 and DPS2008-07029-C02-02.
Architecture, Protocols, and Algorithms for Location-Aware Services in Beyond 5G Networks
The automotive and railway industries are rapidly transforming with a strong
drive towards automation and digitalization, with the goal of increased
convenience, safety, efficiency, and sustainability. Since assisted and fully
automated automotive and train transport services increasingly rely on
vehicle-to-everything communications, and high-accuracy real-time positioning,
it is necessary to continuously maintain high-accuracy localization, even in
occlusion scenes such as tunnels, urban canyons, or areas covered by dense
foliage. In this paper, we review the 5G positioning framework of the 3rd
Generation Partnership Project in terms of methods and architecture and propose
enhancements to meet the stringent requirements imposed by the transport
industry. In particular, we highlight the benefit of fusing cellular and sensor
measurements and discuss required architecture and protocol support for
achieving this at the network side. We also propose a positioning framework to
fuse cellular network measurements with measurements by onboard sensors. We
illustrate the viability of the proposed fusion-based positioning approach
using a numerical example. Comment: 7 pages, 5 figures, accepted for publication in IEEE Communications Standards Magazine.
Online Video Instance Segmentation via Robust Context Fusion
Video instance segmentation (VIS) aims at classifying, segmenting and
tracking object instances in video sequences. Recent transformer-based neural
networks have demonstrated their powerful capability of modeling
spatio-temporal correlations for the VIS task. Relying on video- or clip-level
input, they suffer from high latency and computational cost. We propose a
robust context fusion network to tackle VIS in an online fashion, which
predicts instance segmentation frame-by-frame with a few preceding frames. To
acquire the precise and temporal-consistent prediction for each frame
efficiently, the key idea is to fuse effective and compact context from
reference frames into the target frame. Considering the different effects of
reference and target frames on the target prediction, we first summarize
contextual features through importance-aware compression. A transformer encoder
is adopted to fuse the compressed context. Then, we leverage an
order-preserving instance embedding to convey the identity-aware information
and correspond the identities to predicted instance masks. We demonstrate that
our robust fusion network achieves the best performance among existing online
VIS methods and is even better than previously published clip-level methods on
the Youtube-VIS 2019 and 2021 benchmarks. In addition, visual objects often
have acoustic signatures that are naturally synchronized with them in
audio-bearing video recordings. By leveraging the flexibility of our context
fusion network on multi-modal data, we further investigate the influence of audio on this dense video prediction task, which has not been discussed in existing works. We build an Audio-Visual Instance Segmentation dataset and demonstrate that acoustic signals in in-the-wild scenarios can benefit the VIS task.
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
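As described above, an event camera outputs a stream of events encoding the time, pixel location and sign of each brightness change. One common preprocessing step (one of many; not specific to this survey) is to accumulate events over a time window into a 2D polarity histogram, an "event frame", that frame-based algorithms can consume. The resolution and example events below are made up for illustration.

```python
import numpy as np

def events_to_frame(events, width, height):
    """Accumulate (t, x, y, polarity) events into a signed 2D histogram."""
    frame = np.zeros((height, width), dtype=np.int32)
    for t, x, y, p in events:  # p is +1 (brightness up) or -1 (brightness down)
        frame[y, x] += p
    return frame

# Three events: two positive at pixel (x=2, y=3), one negative at (x=5, y=1).
events = [(0.001, 2, 3, +1), (0.002, 2, 3, +1), (0.003, 5, 1, -1)]
frame = events_to_frame(events, width=8, height=8)
```

Richer representations (time surfaces, voxel grids, learned embeddings) preserve the microsecond timing that this simple histogram discards.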
Combining heterogeneous inputs for the development of adaptive and multimodal interaction systems
In this paper we present a novel framework for the integration of visual sensor networks and speech-based interfaces. Our proposal follows the standard reference architecture in fusion systems (JDL), and combines different techniques related to Artificial Intelligence, Natural Language Processing and User Modeling to provide an enhanced interaction with their users. Firstly, the framework integrates a Cooperative Surveillance Multi-Agent System (CS-MAS), which includes several types of autonomous agents working in a coalition to track and make inferences on the positions of the targets. Secondly, enhanced conversational agents facilitate human-computer interaction by means of speech interaction. Thirdly, a statistical methodology allows modeling the user's conversational behavior, which is learned from an initial corpus and improved with the knowledge acquired from successive interactions. A technique is proposed to facilitate the multimodal fusion of these information sources and to use the result to decide the next system action. This work was supported in part by Projects MEyC TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02, CAM CONTEXTS S2009/TIC-1485.
Robust sensor fusion in real maritime surveillance scenarios
8 pages, 14 figures.-- Proceedings of: 13th International Conference on Information Fusion (FUSION 2010), Edinburgh, Scotland, UK, Jul 26-29, 2010. This paper presents the design and evaluation of a sensor fusion system for maritime surveillance. The system must exploit the complementary AIS-radar sensing technologies to synthesize a reliable surveillance picture, using a highly efficient implementation to operate in dense scenarios. The paper highlights the realistic effects taken into account for robust data combination and system scalability. This work was supported in part by a national project with NUCLEO CC, and research projects CICYT TEC2008-06732-C02-02/TEC, CICYT TIN2008-06742-C02-02/TSI, SINPROB, CAM CONTEXTS S2009/TIC-1485 and DPS2008-07029-C02-02.
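A core step in combining AIS reports with radar tracks is data association: deciding which report belongs to which track. The sketch below shows gated nearest-neighbour association, a standard textbook approach rather than the system described in the paper; the planar coordinates and the 500 m gate are assumptions for illustration.

```python
import math

def associate(radar_tracks, ais_reports, gate=500.0):
    """Match each AIS report to the closest radar track within the gate.

    radar_tracks / ais_reports map ids to (x, y) positions in metres.
    Reports with no track inside the gate are left unmatched.
    """
    matches = {}
    for rid, (rx, ry) in ais_reports.items():
        best, best_d = None, gate
        for tid, (tx, ty) in radar_tracks.items():
            d = math.hypot(rx - tx, ry - ty)  # Euclidean distance to track
            if d < best_d:
                best, best_d = tid, d
        if best is not None:
            matches[rid] = best
    return matches

tracks = {"T1": (0.0, 0.0), "T2": (2000.0, 0.0)}
reports = {"A1": (100.0, 50.0), "A2": (5000.0, 5000.0)}
matched = associate(tracks, reports)  # A2 falls outside every gate
```

Operational systems replace the Euclidean gate with a statistical (e.g. Mahalanobis) gate and resolve conflicting matches globally, which matters in the dense scenarios the paper targets.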