CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, and
their resolution is low for practical applications.
Actually, only visible cameras are deployed in many practical systems, and the
newly designed neuromorphic cameras may have different resolutions. The latest
neuromorphic sensors can output high-definition event streams, but it is very
difficult to achieve strict alignment between events and frames on both spatial
and temporal views. Therefore, how to achieve accurate tracking with unaligned
neuromorphic and visible sensors is a valuable but unresearched problem. In
this work, we formally propose the task of object tracking using unaligned
neuromorphic and visible cameras. We build the first unaligned frame-event
dataset CRSOT collected with a specially built data acquisition system, which
contains 1,030 high-definition RGB-Event video pairs (304,974 video frames). In
addition, we propose a novel unaligned object tracking framework that can
realize robust tracking even using the loosely aligned RGB-Event data.
Specifically, we extract the template and search regions of RGB and Event data
and feed them into a unified ViT backbone for feature embedding. Then, we
propose uncertainty perception modules to encode the RGB and Event features,
respectively, and a modality uncertainty fusion module to
aggregate the two modalities. These three branches are jointly optimized in the
training phase. Extensive experiments demonstrate that our tracker can
combine the two modalities for high-performance tracking even without
strict temporal and spatial alignment. The source code, dataset, and
pre-trained models will be released at
https://github.com/Event-AHU/Cross_Resolution_SOT. Comment: In Peer Review
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
Tracking using bio-inspired event cameras has drawn more and more attention
in recent years. Existing works either utilize aligned RGB and event data for
accurate tracking or directly learn an event-based tracker. The first category
incurs a higher inference cost, and the second is easily affected by
noisy events or sparse spatial resolution. In this paper, we propose a novel
hierarchical knowledge distillation framework that can fully utilize
multi-modal / multi-view information during training to facilitate knowledge
transfer, enabling us to achieve high-speed and low-latency visual tracking
during testing by using only event signals. Specifically, a teacher
Transformer-based multi-modal tracking framework is first trained by feeding
the RGB frame and event stream simultaneously. Then, we design a new
hierarchical knowledge distillation strategy which includes pairwise
similarity, feature representation, and response maps-based knowledge
distillation to guide the learning of the student Transformer network.
Moreover, since existing event-based tracking datasets are all low-resolution,
we propose the first large-scale high-resolution dataset named EventVOT. It contains 1141 videos and covers a wide
range of categories such as pedestrians, vehicles, UAVs, ping pongs, etc.
Extensive experiments on both low-resolution (FE240hz, VisEvent, COESOT), and
our newly proposed high-resolution EventVOT dataset fully validated the
effectiveness of our proposed method. The dataset, evaluation toolkit, and
source code are available on
https://github.com/Event-AHU/EventVOT_Benchmark
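The three distillation terms named in the abstract above (pairwise similarity, feature representation, and response maps) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of cosine similarity, the mean-squared-error and KL-divergence choices, the loss weights, and the temperature are all assumptions made for illustration.

```python
import numpy as np

def pairwise_similarity(feats):
    """Cosine similarity between every pair of token features, shape (N, N)."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return normed @ normed.T

def softmax(x, temperature=1.0):
    """Numerically stable softmax over a 1-D array."""
    z = x / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(t_feats, s_feats, t_resp, s_resp,
                      weights=(1.0, 1.0, 1.0), temperature=2.0):
    """Hypothetical three-term loss between a teacher and a student tracker.

    t_feats, s_feats: (N, C) token features from teacher and student.
    t_resp, s_resp:   (H, W) response maps from the two tracking heads.
    """
    # Similarity level: match the token-to-token similarity structure.
    sim_loss = np.mean((pairwise_similarity(t_feats)
                        - pairwise_similarity(s_feats)) ** 2)
    # Feature level: match the features themselves (assumed MSE).
    feat_loss = np.mean((t_feats - s_feats) ** 2)
    # Response level: KL(teacher || student) over the flattened response maps.
    p = softmax(t_resp.ravel(), temperature)
    q = softmax(s_resp.ravel(), temperature)
    resp_loss = float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
    w_sim, w_feat, w_resp = weights
    return w_sim * sim_loss + w_feat * feat_loss + w_resp * resp_loss
```

In such a setup the teacher would see both RGB frames and events while the student sees events only, and this loss would be added to the student's ordinary tracking loss during training.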
A Temporal Densely Connected Recurrent Network for Event-based Human Pose Estimation
The event camera is an emerging bio-inspired vision sensor that reports per-pixel
brightness changes asynchronously. It offers the notable advantages of high
dynamic range, high-speed response, and low power budget, which enable it to best
capture local motions in uncontrolled environments. This motivates us to unlock
the potential of event cameras for human pose estimation, which has rarely been
explored. Because of the paradigm shift from conventional frame-based cameras,
however, event signals in a time interval contain very limited information:
event cameras capture only the moving body parts and ignore static ones, so some
parts appear incomplete or vanish entirely within the time interval. This paper
proposes a novel densely connected recurrent architecture to address the
problem of incomplete information. With this recurrent architecture, we can
explicitly model not only sequential but also non-sequential geometric
consistency across time steps, accumulating information from previous frames to
recover the entire human body and achieve stable and accurate human pose
estimation from event data. Moreover, to better evaluate our model, we collect
a large-scale multimodal event-based dataset with human pose
annotations, which is, to the best of our knowledge, by far the most
challenging one. The experimental results on two public datasets and our own dataset
demonstrate the effectiveness and strength of our approach. The code will be
made available online to facilitate future research.
Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric
Combining color and event cameras (also called Dynamic Vision Sensors,
DVS) for robust object tracking has emerged as a research topic in recent
years. Existing color-event tracking frameworks usually contain multiple
scattered modules, including feature extraction, fusion, matching, and
interactive learning, which may lead to low efficiency and high computational
complexity. In this paper, we propose a single-stage backbone network for
Color-Event Unified Tracking (CEUTrack), which achieves the above functions
simultaneously. Given the event points and RGB frames, we first transform the
points into voxels and crop the template and search regions for both
modalities, respectively. Then, these regions are projected into tokens and
fed in parallel into the unified Transformer backbone network. The output
features are fed into a tracking head for target object localization. Our
proposed CEUTrack is simple, effective, and efficient, achieving over 75
FPS and new SOTA performance. To better validate the effectiveness of our model
and address the data deficiency of this task, we also propose a generic and
large-scale benchmark dataset for color-event tracking, termed COESOT, which
contains 90 categories and 1354 video sequences. Additionally, a new evaluation
metric named BOC is proposed in our evaluation toolkit to evaluate the
prominence with respect to the baseline methods. We hope the newly proposed
method, dataset, and evaluation metric provide a better platform for
color-event-based tracking. The dataset, toolkit, and source code will be
released on: https://github.com/Event-AHU/COESOT
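The step of transforming event points into voxels, mentioned in the abstract above, can be sketched as follows. This is a common dense voxel-grid construction and not necessarily the one used in CEUTrack: the function name, the (x, y, t, polarity) column layout, the uniform time binning, and the signed polarity accumulation are assumptions.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) voxel grid.

    events: (N, 4) array with columns x, y, t, polarity (+1 or -1).
    Each event adds its polarity to the voxel at its pixel and time bin.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = events[:, 3].astype(np.float32)
    # Normalise timestamps to [0, 1] so the bins span the whole event window.
    t_span = t.max() - t.min()
    t_norm = (t - t.min()) / t_span if t_span > 0 else np.zeros_like(t)
    # The last timestamp maps to 1.0, so clamp it into the final bin.
    bins = np.minimum((t_norm * num_bins).astype(np.int64), num_bins - 1)
    # np.add.at handles repeated indices, unlike plain fancy-index assignment.
    np.add.at(grid, (bins, y, x), p)
    return grid
```

A grid built this way can be cropped into template and search regions with ordinary array slicing, in the same way as an RGB frame.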
A Survey of Computer Vision Methods for 2D Object Detection from Unmanned Aerial Vehicles
The spread of Unmanned Aerial Vehicles (UAVs) in the last decade has revolutionized many application fields. The most investigated research topics focus on increasing autonomy during operational campaigns, environmental monitoring, surveillance, maps, and labeling. To achieve such complex goals, a high-level module is exploited to build semantic knowledge leveraging the outputs of the low-level module that takes data acquired from multiple sensors and extracts information concerning what is sensed. All in all, the detection of the objects is undoubtedly the most important low-level task, and the most employed sensors to accomplish it are by far RGB cameras due to costs, dimensions, and the wide literature on RGB-based object detection. This survey presents recent advancements in 2D object detection for the case of UAVs, focusing on the differences, strategies, and trade-offs between the generic problem of object detection and the adaptation of such solutions to UAV operations. Moreover, it proposes a new taxonomy that considers different height intervals and is driven by the methodological approaches introduced by state-of-the-art works rather than by hardware, physical, and/or technological constraints.
Advances in Automated Driving Systems
Electrification, automation of vehicle control, digitalization and new mobility are the mega-trends in automotive engineering, and they are strongly connected. While many demonstrations for highly automated vehicles have been made worldwide, many challenges remain in bringing automated vehicles to the market for private and commercial use. The main challenges are as follows: reliable machine perception; accepted standards for vehicle-type approval and homologation; verification and validation of the functional safety, especially at SAE level 3+ systems; legal and ethical implications; acceptance of vehicle automation by occupants and society; interaction between automated and human-controlled vehicles in mixed traffic; human–machine interaction and usability; manipulation, misuse and cyber-security; the system costs of hard- and software and development efforts. This Special Issue was prepared in the years 2021 and 2022 and includes 15 papers with original research related to recent advances in the aforementioned challenges. The topics of this Special Issue cover: Machine perception for SAE L3+ driving automation; Trajectory planning and decision-making in complex traffic situations; X-by-Wire system components; Verification and validation of SAE L3+ systems; Misuse, manipulation and cybersecurity; Human–machine interactions, driver monitoring and driver-intention recognition; Road infrastructure measures for the introduction of SAE L3+ systems; Solutions for interactions between human- and machine-controlled vehicles in mixed traffic
BIO-INSPIRED MOTION PERCEPTION: FROM GANGLION CELLS TO AUTONOMOUS VEHICLES
Animals are remarkable at navigation, even in extreme situations. Through motion perception, animals compute their own movements (egomotion) and find other objects (prey, predator, obstacles) and their motions in the environment. Analogous to animals, artificial systems such as robots also need to know where they are relative to structure and segment obstacles to avoid collisions. Even though substantial progress has been made in the development of artificial visual systems, they still struggle to achieve robust and generalizable solutions. To this end, I propose a bio-inspired framework that narrows the gap between natural and artificial systems.
The standard approaches in robot motion perception seek to reconstruct a three-dimensional model of the scene and then use this model to estimate egomotion and object segmentation. However, the scene reconstruction process is data-heavy and computationally expensive and fails to deal with high-speed and dynamic scenarios. On the contrary, biological visual systems excel in the aforementioned difficult situation by extracting only minimal information sufficient for motion perception tasks. I derive minimalist/purposive ideas from biological processes throughout this thesis and develop mathematical solutions for robot motion perception problems.
In this thesis, I develop a full range of solutions that utilize bio-inspired motion representation and learning approaches for motion perception tasks. Particularly, I focus on egomotion estimation and motion segmentation tasks. I have four main contributions: 1. First, I introduce NFlowNet, a neural network to estimate normal flow (bio-inspired motion filters). Normal flow estimation presents a new avenue for solving egomotion in a robust and qualitative framework. 2. Utilizing normal flow, I propose the DiffPoseNet framework to estimate egomotion by formulating the qualitative constraint in a differentiable optimization layer, which allows for end-to-end learning. 3. Further, utilizing a neuromorphic event camera, a retina-inspired vision sensor, I develop 0-MMS, a model-based optimization approach that employs event spikes to segment the scene into multiple moving parts in high-speed dynamic lighting scenarios. 4. To improve the precision of event-based motion perception across time, I develop SpikeMS, a novel bio-inspired learning approach that fully capitalizes on the rich temporal information in event spikes
Towards Interoperable Research Infrastructures for Environmental and Earth Sciences
This open access book summarises the latest developments on data management in the EU H2020 ENVRIplus project, which brought together more than 20 environmental and Earth science research infrastructures into a single community. It provides readers with a systematic overview of the common challenges faced by research infrastructures and how a ‘reference model guided’ engineering approach can be used to achieve greater interoperability among such infrastructures in the environmental and earth sciences. The 20 contributions in this book are structured in 5 parts on the design, development, deployment, operation and use of research infrastructures. Part one provides an overview of the state of the art of research infrastructure and relevant e-Infrastructure technologies, part two discusses the reference model guided engineering approach, the third part presents the software and tools developed for common data management challenges, the fourth part demonstrates the software via several use cases, and the last part discusses the sustainability and future directions
Vision-based legged robot navigation: localisation, local planning, learning
Recent advances in legged locomotion control have enabled legged robots to walk up staircases, go deep into underground caves, and walk in the forest. Nevertheless, achieving this autonomously is still a challenge. Navigating and accomplishing missions in the wild relies not only on robust low-level controllers but also on higher-level representations and perceptual systems that are aware of the robot's capabilities.
This thesis addresses the navigation problem for legged robots. The contributions are four systems designed to exploit unique characteristics of these platforms, from the sensing setup to their advanced mobility skills over different terrain. The systems address localisation, scene understanding, and local planning, and advance the capabilities of legged robots in challenging environments.
The first contribution tackles localisation with multi-camera setups available on legged platforms. It proposes a strategy to actively switch between the cameras and stay localised while operating in a visual teach and repeat context---in spite of transient changes in the environment. The second contribution focuses on local planning, effectively adding a safety layer for robot navigation. The approach uses a local map built on-the-fly to generate efficient vector field representations that enable fast and reactive navigation. The third contribution demonstrates how to improve local planning in natural environments by learning robot-specific traversability from demonstrations. The approach leverages classical and learning-based methods to enable online, onboard traversability learning. These systems are demonstrated via different robot deployments on industrial facilities, underground mines, and parklands.
The thesis concludes by presenting a real-world application: an autonomous forest inventory system with legged robots. This last contribution presents a mission planning system for autonomous surveying as well as a data analysis pipeline to extract forestry attributes. The approach was experimentally validated in a field campaign in Finland, evidencing the potential that legged platforms offer for future applications in the wild