
    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of each brightness change. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low-latency, high-speed, and high-dynamic-range settings. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
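    To make the per-event encoding described above concrete (time, pixel location, and polarity sign), it can be sketched as a minimal data structure with a frame-like accumulation helper. The field names and the `accumulate` function are illustrative assumptions, not taken from any particular sensor SDK:

```python
from dataclasses import dataclass

@dataclass
class Event:
    t: float       # timestamp in seconds (microsecond resolution)
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for a brightness increase, -1 for a decrease

def accumulate(events, width, height):
    """Accumulate signed event counts into a frame-like 2D grid.

    This is one of the simplest event representations used by
    frame-based downstream algorithms: each event adds its polarity
    to the cell at its pixel location.
    """
    grid = [[0] * width for _ in range(height)]
    for e in events:
        grid[e.y][e.x] += e.polarity
    return grid

# Three events: two positive at (x=3, y=2), one negative at (x=1, y=0)
events = [Event(1e-6, 3, 2, +1), Event(2e-6, 3, 2, +1), Event(5e-6, 1, 0, -1)]
img = accumulate(events, width=5, height=4)
```

    Note that unlike a frame, the event stream is sparse and asynchronous; accumulation into a grid is only one of several representations discussed in the survey.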

    Effective Target Aware Visual Navigation for UAVs

    In this paper we propose an effective vision-based navigation method that allows a multirotor vehicle to reach a desired goal pose in the environment while constantly facing a target object or landmark. Standard techniques such as Position-Based Visual Servoing (PBVS) and Image-Based Visual Servoing (IBVS) in some cases (e.g., while the multirotor is performing fast maneuvers) cannot constantly maintain the line of sight with the target of interest. Instead, we compute the optimal trajectory by solving a non-linear optimization problem that minimizes the target re-projection error while meeting the UAV's dynamic constraints. The desired trajectory is then tracked by means of a real-time Non-linear Model Predictive Controller (NMPC), which implicitly allows the multirotor to satisfy both sets of constraints. We successfully evaluate the proposed approach in many real and simulated experiments, making an exhaustive comparison with a standard approach. Comment: Conference paper at the European Conference on Mobile Robotics (ECMR) 201
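    The visibility term that such an optimizer penalizes can be sketched in a simplified planar form: the angle between the camera optical axis and the line of sight to the target. The yaw-only simplification and the function name are assumptions for illustration; the paper's actual cost minimizes the target re-projection error in the image plane:

```python
import math

def bearing_error(uav_xy, uav_yaw, target_xy):
    """Angle between the camera optical axis (assumed aligned with the
    UAV yaw) and the line of sight to the target, wrapped to [-pi, pi].
    A trajectory optimizer would penalize this term, together with
    dynamic-feasibility costs, at every point along the trajectory."""
    dx = target_xy[0] - uav_xy[0]
    dy = target_xy[1] - uav_xy[1]
    line_of_sight = math.atan2(dy, dx)
    err = line_of_sight - uav_yaw
    return math.atan2(math.sin(err), math.cos(err))  # wrap to [-pi, pi]

# Facing the target exactly -> zero error; target 90 degrees to the left -> pi/2
e0 = bearing_error((0.0, 0.0), 0.0, (1.0, 0.0))
e1 = bearing_error((0.0, 0.0), 0.0, (0.0, 1.0))
```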

    A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System

    This paper presents a novel multimodal human activity recognition system that uses a two-stream decision-level fusion of vision and inertial sensor data. In the first stream, raw RGB frames are passed to a part-affinity-field-based pose estimation network to detect the keypoints of the user. These keypoints are pre-processed and fed in a sliding-window fashion to a specially designed convolutional neural network for spatial feature extraction, followed by regularized LSTMs that compute the temporal features. The outputs of the LSTM networks are then passed to fully connected layers for classification. In the second stream, data obtained from the inertial sensors are pre-processed and fed to regularized LSTMs for feature extraction, followed by fully connected layers for classification. Finally, the softmax scores of the two streams are fused at the decision level to give the final prediction. Extensive experiments are conducted to evaluate the performance. Four standard multimodal benchmark datasets (UP-Fall Detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD) are used. The accuracies obtained by the proposed system are 96.9%, 97.6%, 98.7%, and 95.9% on the UP-Fall Detection, UTD-MHAD, Berkeley-MHAD, and C-MHAD datasets, respectively. These results are far superior to current state-of-the-art methods.
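    The decision-level fusion step can be sketched as a weighted average of the two streams' per-class softmax scores, with the final prediction taken as the argmax of the fused vector. The equal weighting below is an illustrative assumption, not the weighting reported in the paper:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decision_level_fusion(vision_scores, inertial_scores, w=0.5):
    """Fuse per-class softmax scores from the two streams.

    Returns (predicted class index, fused score vector). `w` weights
    the vision stream; 1 - w weights the inertial stream.
    """
    fused = [w * a + (1.0 - w) * b
             for a, b in zip(vision_scores, inertial_scores)]
    return fused.index(max(fused)), fused

# Vision stream favors class 0, inertial stream strongly favors class 1
pred, fused = decision_level_fusion(softmax([2.0, 1.0, 0.1]),
                                    softmax([0.5, 2.5, 0.2]))
```

    Because both inputs are probability vectors, the fused vector also sums to one, so it can be read as a combined class distribution.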

    Heterogeneous Multi-Sensor Fusion for 2D and 3D Pose Estimation

    Sensor fusion is a process in which data from different sensors are combined to acquire an output that cannot be obtained from the individual sensors alone. This dissertation first considers a 2D image-level real-world problem from the rail industry and proposes a novel sensor-fusion solution, then proceeds to the more complicated 3D problem of multi-sensor fusion for UAV pose estimation. One of the most important safety-related tasks in the rail industry is the early detection of defective rolling stock components. Railway wheels and wheel bearings are two components prone to damage due to their interactions with the brakes and railway track, which makes them a high priority when the rail industry investigates improvements to current detection processes. The main contribution of this dissertation in this area is the development of a computer vision method for automatically detecting defective wheels that can potentially replace the current manual inspection procedure. The algorithm fuses images taken by wayside thermal and vision cameras and uses the outcome for wheel defect detection. As a byproduct, the process also includes a method for detecting hot bearings from the same images. We evaluate our algorithm using simulated and real data images from UPRR in North America, and it is shown in this dissertation that, using sensor fusion techniques, the accuracy of the malfunction detection can be improved. After the 2D application, the more complicated 3D application is addressed. Precise, robust, and consistent localization is an important subject in many areas such as vision-based control, path planning, and SLAM. Each of the different sensors employed to estimate the pose has its strengths and weaknesses. Sensor fusion is a known approach that combines the data measured by different sensors to achieve a more accurate or complete pose estimate and to cope with sensor outages.
In this dissertation, a new approach to 3D pose estimation for a UAV in an unknown GPS-denied environment is presented. The proposed algorithm fuses the data from an IMU, a camera, and a 2D LiDAR to achieve accurate localization. Among the employed sensors, LiDAR has not received proper attention in the past, mostly because a 2D LiDAR can only provide pose estimates in its scanning plane and thus cannot obtain a full pose estimate in a 3D environment. A novel method is introduced in this research that enables us to employ a 2D LiDAR to improve the accuracy of the full 3D pose estimate acquired from an IMU and a camera. To the best of our knowledge, a 2D LiDAR has never been employed for 3D localization without a prior map, and it is shown in this dissertation that our method can significantly improve the precision of the localization algorithm. The proposed approach is evaluated and justified by simulation and real-world experiments.
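    As a simple illustration of why adding a third sensor helps, independent estimates of the same scalar quantity (say, an altitude implied separately by the camera and by the LiDAR scan plane) can be fused by inverse-variance weighting. This generic estimator is a sketch of the principle, not the dissertation's actual filter:

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of independent scalar estimates.

    `estimates` is a list of (value, variance) pairs. Returns the fused
    value and its variance; the fused variance is always smaller than
    any individual one, so each added sensor tightens the estimate.
    """
    num = sum(x / var for x, var in estimates)
    den = sum(1.0 / var for _, var in estimates)
    return num / den, 1.0 / den

# Two sensors of equal quality: the fused value is the mean and the
# variance halves; a third sensor would shrink it further.
value, variance = fuse_estimates([(10.0, 1.0), (12.0, 1.0)])
```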

    Semantic Visual Localization

    Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.
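    Once descriptors are learned, localization reduces to matching a query descriptor against a database of 3D descriptors. A cosine-similarity nearest-neighbor lookup is sketched below; the matching rule and function names are assumptions for illustration, and the paper's retrieval details may differ:

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def localize(query, database):
    """Return the index of the database descriptor most similar to the
    query under cosine similarity (brute-force nearest neighbor)."""
    return max(range(len(database)), key=lambda i: cosine(query, database[i]))

# The query is closest in direction to the second database descriptor
best = localize([1.0, 0.1, 0.0],
                [[0.0, 1.0, 0.0], [0.9, 0.2, 0.1], [0.0, 0.0, 1.0]])
```

    In practice, large-scale systems replace the brute-force loop with an approximate nearest-neighbor index; the robustness claimed above comes from the descriptors themselves, not the matching rule.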