Motion Segmentation from a Moving Monocular Camera

Authors: Huang Yuxiang; Zelek John
Publication date: 24/09/2023

Identifying and segmenting moving objects from a moving monocular camera is difficult when there is unknown camera motion, different types of object motions and complex scene structures. To tackle these challenges, we take advantage of two popular branches of monocular motion segmentation approaches, point trajectory based and optical flow based methods, by synergistically fusing these two highly complementary motion cues at object level. By doing this, we are able to model various complex object motions in different scene structures at once, which has not been achieved by existing methods. We first obtain object-specific point trajectories and an optical flow mask for each common object in the video by leveraging recent foundational models in object recognition, segmentation and tracking. We then construct two robust affinity matrices representing the pairwise object motion affinities throughout the whole video using epipolar geometry and the motion information provided by optical flow. Finally, co-regularized multi-view spectral clustering is used to fuse the two affinity matrices and obtain the final clustering. Our method shows state-of-the-art performance on the KT3DMoSeg dataset, which contains complex motions and scene structures. Being able to identify moving objects allows us to remove them for map building when using visual SLAM or SFM.

Comment: Accepted by IROS 2023 Workshop on Robotic Perception And Mapping: Frontier Vision and Learning Technique
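The abstract says the motion affinities are built with epipolar geometry but gives no formula. As a minimal sketch of the kind of residual such an approach could start from, the following computes the Sampson distance of a point correspondence under a fundamental matrix. The choice of the Sampson distance, the toy camera (normalised coordinates, pure sideways translation, so F = [t]_x) and all function names are assumptions for illustration, not the authors' method.

```python
def cross_matrix(t):
    """Return the 3x3 skew-symmetric matrix [t]_x, so that [t]_x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def sampson_distance(F, p1, p2):
    """First-order geometric error of the correspondence p1 -> p2 under F."""
    x1 = [p1[0], p1[1], 1.0]
    x2 = [p2[0], p2[1], 1.0]
    Fx1 = mat_vec(F, x1)
    Ftx2 = mat_vec(transpose(F), x2)
    residual = sum(x2[i] * Fx1[i] for i in range(3))  # x2^T F x1
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    return residual ** 2 / denom


# Assumed toy camera: it translates along x only, so static points move horizontally.
F = cross_matrix((1.0, 0.0, 0.0))
static_error = sampson_distance(F, (0.1, 0.2), (0.3, 0.2))  # consistent with camera motion
moving_error = sampson_distance(F, (0.1, 0.2), (0.3, 0.5))  # has extra vertical motion
```

A trajectory whose correspondences keep a small error is consistent with the camera motion alone, while a large error suggests independent object motion. How such residuals would be turned into pairwise affinities is not specified in the abstract.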

arXiv.org e-Print Archive

Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency

Authors: Im Sunghoon; Kweon In So; Lee Seokju; Lin Stephen
Publication date: 04/02/2021

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision. Our technical contributions are three-fold. First, we highlight the fundamental difference between inverse and forward projection while modeling the individual motion of each rigid object, and propose a geometrically correct projection pipeline using a neural forward projection module. Second, we design a unified instance-aware photometric and geometric consistency loss that holistically imposes self-supervisory signals for every background and object region. Lastly, we introduce a general-purpose auto-annotation scheme using any off-the-shelf instance segmentation and optical flow models to produce video instance segmentation maps that will be utilized as input to our training pipeline. These proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI and Cityscapes datasets, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code, dataset, and models are available at https://github.com/SeokjuLee/Insta-DM.

Comment: Accepted to AAAI 2021. Code/dataset/models are available at https://github.com/SeokjuLee/Insta-DM. arXiv admin note: substantial text overlap with arXiv:1912.0935
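To make the forward-projection idea concrete, here is a minimal sketch of warping a single pixel: back-project it with its depth, apply a rigid 6-DoF motion, and project it into the next view. It assumes a simple pinhole camera with hypothetical intrinsics, and it does not reproduce the paper's neural forward projection module.

```python
def backproject(pixel, depth, K):
    """Lift a pixel (u, v) with known depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = K
    u, v = pixel
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def rigid_transform(point, R, t):
    """Apply rotation R (3x3 nested lists) and translation t to a 3D point."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))


def project(point, K):
    """Project a 3D camera-frame point to pixel coordinates."""
    fx, fy, cx, cy = K
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def forward_warp(pixel, depth, K, R, t):
    """Move a source pixel to where it lands in the target view."""
    return project(rigid_transform(backproject(pixel, depth, K), R, t), K)


# Hypothetical intrinsics (fx, fy, cx, cy) and motions, chosen only for illustration.
K = (100.0, 100.0, 64.0, 48.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

same = forward_warp((70.0, 50.0), 5.0, K, IDENTITY, (0.0, 0.0, 0.0))
shifted = forward_warp((70.0, 50.0), 5.0, K, IDENTITY, (0.5, 0.0, 0.0))
```

With the identity motion the pixel stays where it is. A 0.5 unit sideways translation at depth 5 shifts it by fx * 0.5 / 5 = 10 pixels. In an instance-aware setting, each object region would use its own R and t instead of the background motion.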

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

DGIST Library Institutional Repository

EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity

Authors: Jiang Zijie; Okutomi Masatoshi
Publication date: 03/09/2023

Self-supervised monocular scene flow estimation, aiming to understand both 3D structures and 3D motions from two temporally consecutive monocular images, has received increasing attention for its simple and economical sensor setup. However, the accuracy of current methods suffers from the bottleneck of less-efficient network architecture and lack of motion rigidity for regularization. In this paper, we propose a superior model named EMR-MSF by borrowing the advantages of network architecture design under the scope of supervised learning. We further impose explicit and robust geometric constraints with an elaborately constructed ego-motion aggregation module where a rigidity soft mask is proposed to filter out dynamic regions for stable ego-motion estimation using static regions. Moreover, we propose a motion consistency loss along with a mask regularization loss to fully exploit static regions. Several efficient training strategies are integrated including a gradient detachment technique and an enhanced view synthesis process for better performance. Our proposed method outperforms the previous self-supervised works by a large margin and catches up to the performance of supervised methods. On the KITTI scene flow benchmark, our approach improves the SF-all metric of the state-of-the-art self-supervised monocular method by 44% and demonstrates superior performance across sub-tasks including depth and visual odometry, amongst other self-supervised single-task or multi-task methods.

Comment: To appear at ICCV 202

arXiv.org e-Print Archive

Automatic vehicle detection and tracking in aerial video

Author: Xiyan Chen (6015296)
Publication date: 01/01/2016

This thesis is concerned with the challenging tasks of automatic and real-time vehicle detection and tracking from aerial video. The aim of this thesis is to build an automatic system that can accurately localise any vehicles that appear in aerial video frames and track the target vehicles with trackers. Vehicle detection and tracking have many applications and this has been an active area of research during recent years; however, it is still a challenge to deal with certain realistic environments. This thesis develops vehicle detection and tracking algorithms which enhance the robustness of detection and tracking beyond the existing approaches. The vehicle detection system proposed in this thesis is based on different object categorisation approaches, using colour and texture features in both point and area template forms. The thesis also proposes a novel Self-Learning Tracking and Detection approach, which is an extension to the existing Tracking Learning Detection (TLD) algorithm. There are a number of challenges in vehicle detection and tracking. The most difficult challenge of detection is distinguishing and clustering the target vehicle from the background objects and noise. Under certain conditions, the images captured from Unmanned Aerial Vehicles (UAVs) are also blurred; for example, turbulence may make the vehicle shake during flight. This thesis tackles these challenges by applying integrated multiple feature descriptors for real-time processing. In this thesis, three vehicle detection approaches are proposed: the HSV-GLCM feature approach, the ISM-SIFT feature approach and the FAST-HoG approach. The general vehicle detection approaches used have highly flexible implicit shape representations. They are based on training samples in both positive and negative sets and use updated classifiers to distinguish the targets.
It has been found that the detection results attained by using HSV-GLCM texture features can be affected by blurring problems; the proposed detection algorithms can further segment the edges of the vehicles from the background. Using the point descriptor feature can solve the blurring problem; however, the large amount of information contained in point descriptors can lead to processing times that are too long for real-time applications. The FAST-HoG approach, which combines the point feature and the shape feature, is therefore proposed. This new approach speeds up processing enough to attain real-time performance. Finally, a detection approach using HoG with the FAST feature is also proposed. The HoG approach is widely used in object recognition, as it has a strong ability to represent the shape vector of the object. However, the original HoG feature is sensitive to the orientation of the target; this method improves the algorithm by inserting the direction vectors of the targets. For the tracking process, a novel tracking approach, an extension of the TLD algorithm, was proposed in order to track multiple targets. The extended approach upgrades the original system, which can track only a single target that must be selected before the detection and tracking process. The greatest challenge to vehicle tracking is long-term tracking. The target object can change its appearance during the process, and illumination and scale changes can also occur. The original TLD algorithm assumed that tracking can make errors and that the accumulation of these errors could cause tracking failure, so it proposed a learning approach between the tracking and the detection, adding a pair of inspectors (positive and negative) to constantly estimate errors. This thesis extends the TLD approach with a new detection method in order to achieve multiple-target tracking.
A Forward and Backward Tracking approach has been proposed to eliminate tracking errors and other problems such as occlusion. The main purpose of the proposed tracking system is to learn the features of the targets during tracking and re-train the detection classifier for further processes. This thesis puts particular emphasis on vehicle detection and tracking in different extreme scenarios such as crowded highway vehicle detection, blurred images and changes in the appearance of the targets. Compared with currently existing detection and tracking approaches, the proposed approaches demonstrate a robust increase in accuracy in each scenario.
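The abstract names a Forward and Backward Tracking approach without describing it. A minimal sketch of the generic forward-backward consistency check follows, under the assumption that a point is tracked to the next frame, tracked back again, and rejected if it does not return close to where it started. The tracker functions and the threshold are hypothetical stand-ins, not the thesis implementation.

```python
import math


def forward_backward_error(point, track_forward, track_backward):
    """Distance between a point and its position after a forward then backward track."""
    forward_point = track_forward(point)
    returned_point = track_backward(forward_point)
    return math.dist(point, returned_point)


def filter_reliable(points, track_forward, track_backward, max_error=1.0):
    """Keep only the points whose forward-backward error is within max_error pixels."""
    return [p for p in points
            if forward_backward_error(p, track_forward, track_backward) <= max_error]


# Hypothetical toy trackers: every point moves 5 pixels to the right between frames.
def toy_forward(p):
    return (p[0] + 5.0, p[1])


def toy_backward(p):
    # Stand-in for an occlusion: the backward track drifts for points beyond x = 100.
    if p[0] > 100.0:
        return (p[0] - 5.0, p[1] + 8.0)
    return (p[0] - 5.0, p[1])


points = [(10.0, 20.0), (98.0, 40.0)]
reliable = filter_reliable(points, toy_forward, toy_backward)
```

Here the first point returns exactly to its start and is kept, while the second comes back 8 pixels off and is discarded. In practice the two tracker callables would wrap a real point tracker run in both temporal directions.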

Loughborough University Institutional Repository

Holoscopic 3D perception for autonomous vehicles

Author: Cao Chuqi
Publication venue: Brunel University London
Publication date: 01/01/2021

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.

Autonomous mobile platforms are going to be a huge part of future transportation, and autonomous navigation is the critical part of such platforms. An autonomous mobile platform navigates by perceiving the environment through the sensors mounted on the vehicle and acting on the data it receives from these sensors by making sense of the environment and surroundings. As a result, an autonomous mobile platform consists of localisation (positioning) and path planning, both of which require very accurate sensor measurements. In terms of accuracy, sensors can generally be divided into two groups: (a) high-accuracy sensors, such as state-of-the-art LiDAR and vision sensors, e.g. the Mobileye sensor; and (b) low-accuracy sensors, such as GPS (accurate to within 2-10 metres) and IMU (which suffers from drift), which could be fused to improve the other method of positioning. These are expensive processes owing to offline map creation. To deal with low-accuracy sensors, researchers normally use very complex models, which again run into performance reliability and consistency issues. Furthermore, it is commonly believed that perception and situation cognisance are important for navigating safely and autonomously, and there has been a great deal of research on AI-enabled perception, such as Mobileye and Tesla cars, which use 2D cameras for perception. In this research, an innovative method is proposed that uses a rich vision sensor, the holoscopic 3D camera, for environment perception with artificial intelligence algorithms to observe road objects and learn their 3D behaviour for reliable detection and recognition. The sensor provides rich 3D cubic visual information about the environment, including the very valuable “depth information” that imitates the third coordinate of the real world.
To learn the objects, different AI algorithms are studied, and in particular a deep learning model is proposed that provides reasonably good results. To evaluate the innovative holoscopic 3D sensor, we applied it to a face recognition challenge under different facial expressions, where 2D images are considered to fail. The holoscopic 3D sensor, however, delivered outstanding performance, recognising faces under different expressions after training only on the neutral face with a simple AI algorithm. We then designed and developed a holoscopic perception database of 200000 frames for autonomous cars. The experimental results are promising: AI algorithms, particularly deep learning algorithms, learn effectively from holoscopic 3D content compared to traditional 2D images, even though those DL models were designed for visual features, whereas holoscopic 3D images also contain motion data that should be exploited.

Brunel University Research Archive

USegScene: Unsupervised Learning of Depth, Optical Flow and Ego-Motion with Semantic Guidance and Coupled Networks

Authors: Burgard Wolfram; Vertens Johan
Publication date: 15/07/2022

In this paper we propose USegScene, a framework for semantically guided unsupervised learning of depth, optical flow and ego-motion estimation for stereo camera images using convolutional neural networks. Our framework leverages semantic information for improved regularization of depth and optical flow maps, multimodal fusion and occlusion filling considering dynamic rigid object motions as independent SE(3) transformations. Furthermore, complementary to pure photometric matching, we propose matching of semantic features, pixel-wise classes and object instance borders between the consecutive images. In contrast to previous methods, we propose a network architecture that jointly predicts all outputs using shared encoders and allows passing information across the task-domains, e.g., the prediction of optical flow can benefit from the prediction of the depth. Furthermore, we explicitly learn the depth and optical flow occlusion maps inside the network, which are leveraged in order to improve the predictions in the respective regions. We present results on the popular KITTI dataset and show that our approach outperforms other methods by a large margin.
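Treating each rigid object motion as an independent SE(3) transformation means every object carries its own rotation and translation, which can be composed with the camera motion. The following is a minimal sketch of that bookkeeping with 4x4 homogeneous matrices. All names and numeric values are hypothetical, and nothing here reflects the network described in the paper.

```python
import math


def se3(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def rot_z(angle):
    """Rotation about the z axis by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose(A, B):
    """Return the transform that applies B first, then A."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def inverse(T):
    """Invert a rigid transform: the rotation is transposed, the translation is -R^T t."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return se3(Rt, new_t)


def apply(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return tuple(sum(T[i][j] * p[j] for j in range(3)) + T[i][3] for i in range(3))


# Hypothetical motions: the camera's effect on static points, and one object's own motion.
camera_motion = se3(rot_z(0.0), [0.0, 0.0, -1.0])
object_motion = se3(rot_z(math.pi / 2), [2.0, 0.0, 0.0])

point = (1.0, 0.0, 5.0)
moved = apply(compose(camera_motion, object_motion), point)
round_trip = compose(object_motion, inverse(object_motion))
```

A background point would be moved by `camera_motion` alone, while a point on the object is moved by the composition, which is what lets object regions be handled separately from the static scene.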

arXiv.org e-Print Archive

Review of constraints on vision-based gesture recognition for human–computer interaction

Authors: Bhuyan M. K.; Chakraborty Biplab Ketan; MacDorman Karl F.; Sarma Debajit
Publication venue: Institution of Engineering and Technology (IET)
Publication date: 01/01/2018

The ability of computers to recognise hand gestures visually is essential for progress in human-computer interaction. Gesture recognition has applications ranging from sign language to medical assistance to virtual reality. However, gesture recognition is extremely challenging not only because of its diverse contexts, multiple interpretations, and spatio-temporal variations but also because of the complex non-rigid properties of the hand. This study surveys major constraints on vision-based gesture recognition occurring in detection and pre-processing, representation and feature extraction, and recognition. Current challenges are explored in detail.

Crossref

IUPUIScholarWorks

Directory of Open Access Journals