2,637 research outputs found
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
Effective Target Aware Visual Navigation for UAVs
In this paper we propose an effective vision-based navigation method that
allows a multirotor vehicle to simultaneously reach a desired goal pose in the
environment while constantly facing a target object or landmark. Standard
techniques such as Position-Based Visual Servoing (PBVS) and Image-Based Visual
Servoing (IBVS) in some cases (e.g., while the multirotor is performing fast
maneuvers) do not allow to constantly maintain the line of sight with a target
of interest. Instead, we compute the optimal trajectory by solving a non-linear
optimization problem that minimizes the target re-projection error while
meeting the UAV's dynamic constraints. The desired trajectory is then tracked
by means of a real-time Non-linear Model Predictive Controller (NMPC): this
implicitly allows the multirotor to satisfy both the required constraints. We
successfully evaluate the proposed approach in many real and simulated
experiments, making an exhaustive comparison with a standard approach.Comment: Conference paper at "European Conference on Mobile Robotics" (ECMR)
201
A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System
This paper presents a novel multimodal human activity recognition system. It
uses a two-stream decision level fusion of vision and inertial sensors. In the
first stream, raw RGB frames are passed to a part affinity field-based pose
estimation network to detect the keypoints of the user. These keypoints are
then pre-processed and inputted in a sliding window fashion to a specially
designed convolutional neural network for the spatial feature extraction
followed by regularized LSTMs to calculate the temporal features. The outputs
of LSTM networks are then inputted to fully connected layers for
classification. In the second stream, data obtained from inertial sensors are
pre-processed and inputted to regularized LSTMs for the feature extraction
followed by fully connected layers for the classification. At this stage, the
SoftMax scores of two streams are then fused using the decision level fusion
which gives the final prediction. Extensive experiments are conducted to
evaluate the performance. Four multimodal standard benchmark datasets (UP-Fall
detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD) are used for experimentations.
The accuracies obtained by the proposed system are 96.9 %, 97.6 %, 98.7 %, and
95.9 % respectively on the UP-Fall Detection, UTDMHAD, Berkeley-MHAD, and
C-MHAD datasets. These results are far superior than the current
state-of-the-art methods
HETEROGENEOUS MULTI-SENSOR FUSION FOR 2D AND 3D POSE ESTIMATION
Sensor fusion is a process in which data from different sensors is combined to acquire an output that cannot be obtained from individual sensors. This dissertation first considers a 2D image level real world problem from rail industry and proposes a novel solution using sensor fusion, then proceeds further to the more complicated 3D problem of multi sensor fusion for UAV pose estimation.
One of the most important safety-related tasks in the rail industry is an early detection of defective rolling stock components. Railway wheels and wheel bearings are two components prone to damage due to their interactions with the brakes and railway track, which makes them a high priority when rail industry investigates improvements to current detection processes. The main contribution of this dissertation in this area is development of a computer vision method for automatically detecting the defective wheels that can potentially become a replacement for the current manual inspection procedure. The algorithm fuses images taken by wayside thermal and vision cameras and uses the outcome for the wheel defect detection. As a byproduct, the process will also include a method for detecting hot bearings from the same images. We evaluate our algorithm using simulated and real data images from UPRR in North America and it will be shown in this dissertation that using sensor fusion techniques the accuracy of the malfunction detection can be improved.
After the 2D application, the more complicated 3D application is addressed. Precise, robust and consistent localization is an important subject in many areas of science such as vision-based control, path planning, and SLAM. Each of different sensors employed to estimate the pose have their strengths and weaknesses. Sensor fusion is a known approach that combines the data measured by different sensors to achieve a more accurate or complete pose estimation and to cope with sensor outages. In this dissertation, a new approach to 3D pose estimation for a UAV in an unknown GPS-denied environment is presented. The proposed algorithm fuses the data from an IMU, a camera, and a 2D LiDAR to achieve accurate localization. Among the employed sensors, LiDAR has not received proper attention in the past; mostly because a 2D LiDAR can only provide pose estimation in its scanning plane and thus it cannot obtain full pose estimation in a 3D environment. A novel method is introduced in this research that enables us to employ a 2D LiDAR to improve the full 3D pose estimation accuracy acquired from an IMU and a camera. To the best of our knowledge 2D LiDAR has never been employed for 3D localization without a prior map and it is shown in this dissertation that our method can significantly improve the precision of the localization algorithm. The proposed approach is evaluated and justified by simulation and real world experiments
Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes
- …