
    Learning to Interpret Fluid Type Phenomena via Images

    Get PDF
    Learning to interpret fluid-type phenomena via images is a long-standing and challenging problem in computer vision. The problem becomes even more challenging when the fluid medium is highly dynamic and refractive due to its transparent nature. Here, we consider imaging through refractive fluid media such as water and air. For water, we design novel supervised learning-based algorithms to recover its 3D surface as well as the highly distorted patterns beneath its surface. For air, we design a state-of-the-art unsupervised learning algorithm to predict the distortion-free image given a short sequence of turbulent images. Specifically, we design a deep neural network that estimates the depth and normal maps of a fluid surface by analyzing the refractive distortion of a reference background pattern. To recover underwater images severely degraded by the refractive distortions caused by water surface fluctuations, we present the distortion-guided network (DG-Net) for restoring distortion-free underwater images. The key idea is to use a distortion map, which models the pixel displacement caused by water refraction, to guide network training. Furthermore, we present a novel unsupervised network to recover the latent distortion-free image. The key idea is to model non-rigid distortions as deformable grids. Our network consists of a grid deformer that estimates the distortion field and an image generator that outputs the distortion-free image. By leveraging the positional encoding operator, we can simplify the network structure while maintaining fine spatial details in the recovered images. We also develop a combinational deep neural network that simultaneously recovers the latent distortion-free image and reconstructs the transparent, dynamic 3D fluid surface. Through extensive experiments on simulated and real captured fluid images, we demonstrate that our proposed deep neural networks outperform the current state of the art on these tasks.
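    The abstract above describes a distortion map that encodes per-pixel displacement caused by water refraction. As a purely illustrative sketch (my own assumption of how such a map could be applied, not the authors' DG-Net implementation; the function name and the meaning of the flow field are hypothetical), the snippet below undoes a known displacement field with a backward warp using PyTorch's grid_sample.

```python
# Minimal sketch: backward-warp a distorted frame using a per-pixel displacement
# map. Assumption: flow[p] tells where the distortion-free pixel p landed in the
# distorted frame, so we sample the distorted image at (p + flow[p]).
import torch
import torch.nn.functional as F

def unwarp_with_distortion_map(distorted, flow):
    """distorted: (N, C, H, W) image; flow: (N, 2, H, W) displacement in pixels (x, y)."""
    n, _, h, w = distorted.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(n, -1, -1, -1)
    coords = base + flow                                  # where each pixel originated
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0         # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)       # (N, H, W, 2)
    return F.grid_sample(distorted, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# Sanity check: a zero distortion map returns the input unchanged.
img = torch.rand(1, 3, 64, 64)
restored = unwarp_with_distortion_map(img, torch.zeros(1, 2, 64, 64))
assert torch.allclose(restored, img, atol=1e-5)
```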

    Human robot interaction in a crowded environment

    No full text
    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
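    The abstract above fuses multiple visual cues in a Bayesian framework to identify the commanding person. Below is a deliberately simplified naive-Bayes style illustration of such cue fusion (my own sketch with hypothetical cue names; the thesis' full Bayesian network, including contextual feedback, is more elaborate).

```python
# Simplified cue fusion: combine independent per-person cue likelihoods into a
# normalized score of "this person is addressing the robot".
import numpy as np

def fuse_cues(likelihoods, priors=None):
    """likelihoods: (num_people, num_cues) array of P(cue observed | person is commanding).
    Returns a normalized posterior-like score per person, assuming cue independence."""
    n = likelihoods.shape[0]
    priors = np.full(n, 1.0 / n) if priors is None else np.asarray(priors, dtype=float)
    scores = priors * np.prod(likelihoods, axis=1)   # prior times product of cue likelihoods
    return scores / scores.sum()

# Three people; hypothetical cues = [facing the camera, hand raised, close to robot].
cues = np.array([
    [0.9, 0.8, 0.7],   # person 0: strong evidence on all cues
    [0.6, 0.1, 0.5],   # person 1: no gesture detected
    [0.2, 0.3, 0.9],   # person 2: nearby but looking away
])
print(fuse_cues(cues))  # person 0 receives the highest score
```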

    Smart environment monitoring through micro unmanned aerial vehicles

    Get PDF
    In recent years, the improvements of small-scale Unmanned Aerial Vehicles (UAVs) in terms of flight time, automatic control, and remote transmission are promoting the development of a wide range of practical applications. In aerial video surveillance, the monitoring of broad areas still presents many challenges due to the need to perform different tasks in real-time, including mosaicking, change detection, and object detection. In this thesis work, a small-scale UAV-based vision system to maintain regular surveillance over target areas is proposed. The system works in two modes. The first mode monitors an area of interest over several flights. During the first flight, it creates an incremental geo-referenced mosaic of the area of interest and classifies all known elements (e.g., persons) found on the ground using a previously trained, improved Faster R-CNN architecture. In subsequent reconnaissance flights, the system searches for any changes (e.g., the disappearance of persons) that may have occurred in the mosaic using a histogram equalization and RGB-Local Binary Pattern (RGB-LBP) based algorithm; if changes are found, the mosaic is updated. The second mode performs real-time classification with the same improved Faster R-CNN model, which is useful for time-critical operations. Thanks to different design features, the system works in real-time and performs mosaicking and change detection at low altitude, thus allowing the classification of even small objects. The proposed system was tested on the whole set of challenging video sequences contained in the UAV Mosaicking and Change Detection (UMCD) dataset and on other public datasets. The evaluation of the system using well-known performance metrics has shown remarkable results in terms of mosaic creation and updating, as well as change detection and object detection.
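    The change-detection step above combines histogram equalization with an RGB-LBP descriptor. The following sketch shows one plausible way to compare a mosaic patch against the same patch from a later flight (an assumption on my part; the function names, the chi-square distance, and the threshold are illustrative and not taken from the thesis).

```python
# Sketch: per-channel histogram equalization + 8-neighbour LBP histograms,
# compared with a chi-square distance to flag changed patches.
import numpy as np
import cv2

def rgb_lbp_histogram(img_bgr):
    """img_bgr: uint8 BGR patch. Returns concatenated normalized LBP histograms."""
    hists = []
    for c in range(3):
        ch = cv2.equalizeHist(np.ascontiguousarray(img_bgr[:, :, c]))
        padded = np.pad(ch.astype(np.int16), 1, mode="edge")
        center = padded[1:-1, 1:-1]
        code = np.zeros_like(center, dtype=np.uint8)
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                   (1, 1), (1, 0), (1, -1), (0, -1)]
        for bit, (dy, dx) in enumerate(offsets):
            neigh = padded[1 + dy:padded.shape[0] - 1 + dy,
                           1 + dx:padded.shape[1] - 1 + dx]
            code |= ((neigh >= center).astype(np.uint8) << bit)
        hist = np.bincount(code.ravel(), minlength=256).astype(np.float64)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)

def changed(patch_a, patch_b, threshold=0.25):
    """Flag a change when the chi-square distance between LBP histograms is large."""
    ha, hb = rgb_lbp_histogram(patch_a), rgb_lbp_histogram(patch_b)
    chi2 = 0.5 * np.sum((ha - hb) ** 2 / (ha + hb + 1e-12))
    return chi2 > threshold

# Toy usage: identical patches are not flagged; a large zeroed block typically is.
a = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
b = a.copy(); b[16:48, 16:48] = 0
print(changed(a, a), changed(a, b))
```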

    Underwater image and video dehazing with pure haze region segmentation

    Get PDF
    Underwater scenes captured by cameras are plagued by poor contrast and spectral distortion, which are the result of the scattering and absorptive properties of water. In this paper we present a novel dehazing method that improves visibility in images and videos by detecting and segmenting image regions that contain only water. The colour of these regions, which we refer to as pure haze regions, is similar to the haze that is removed during the dehazing process. Moreover, we propose a semantic white balancing approach for illuminant estimation that uses the dominant colour of the water to address the spectral distortion present in underwater scenes. To validate the results of our method and compare them to those obtained with state-of-the-art approaches, we perform extensive subjective evaluation tests using images captured in a variety of water types and underwater videos captured onboard an underwater vehicle.
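    The semantic white balancing described above estimates the illuminant from the dominant colour of the water. Here is a minimal sketch of that idea under my own assumptions (gray-world style channel gains computed from pixels inside a pure haze mask); it is not the authors' exact algorithm, and the function name and mask source are hypothetical.

```python
# Sketch: estimate the illuminant from water-only ("pure haze") pixels, then
# rescale the channels of the whole frame toward a neutral grey.
import numpy as np

def white_balance_from_haze(img, haze_mask):
    """img: float32 RGB in [0, 1]; haze_mask: boolean (H, W) marking water-only pixels."""
    illuminant = img[haze_mask].mean(axis=0)          # dominant water colour
    gains = illuminant.mean() / (illuminant + 1e-6)   # per-channel correction gains
    return np.clip(img * gains, 0.0, 1.0)

# Toy usage: a blue-green cast (typical of water) is reduced after balancing.
img = np.full((4, 4, 3), (0.2, 0.6, 0.7), dtype=np.float32)
mask = np.ones((4, 4), dtype=bool)
print(white_balance_from_haze(img, mask)[0, 0])  # roughly equal channels
```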

    Optical Imaging and Image Restoration Techniques for Deep Ocean Mapping: A Comprehensive Survey

    Get PDF
    Visual systems are receiving increasing attention in underwater applications. While the photogrammetric and computer vision literature so far has largely targeted shallow water applications, deep sea mapping research has recently also come into focus. The majority of the seafloor, and of Earth's surface, is located in the deep ocean below 200 m depth, and is still largely uncharted. Here, on top of the general image quality degradation caused by water absorption and scattering, additional artificial illumination of the survey areas is mandatory, since they otherwise reside in permanent darkness where no sunlight reaches. This creates unintended non-uniform lighting patterns in the images and non-isotropic scattering effects close to the camera. If not compensated properly, such effects dominate seafloor mosaics and can obscure the actual seafloor structures. Moreover, cameras must be protected from the high water pressure, e.g. by housings with thick glass ports, which can lead to refractive distortions in images. Additionally, no satellite navigation is available to support localization. All these issues render deep sea visual mapping a challenging task, and most of the developed methods and strategies cannot be directly transferred to the seafloor at several kilometers depth. In this survey we provide a state-of-the-art review of deep ocean mapping, starting from existing systems and challenges, and discussing shallow and deep water models and corresponding solutions. Finally, we identify open issues for future lines of research.
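    To make the absorption and scattering degradation mentioned above concrete, the snippet below simulates a commonly used simplified underwater image formation model (direct signal attenuated exponentially with range plus a backscatter term). This is a sketch under my own assumptions; the coefficients are illustrative and are not values taken from this survey.

```python
# Simplified model: I = J * exp(-beta * d) + B * (1 - exp(-beta * d)),
# where J is the scene reflectance, d the range, beta the per-channel
# attenuation, and B the veiling (backscatter) light.
import numpy as np

def underwater_image(scene, depth_m, beta=(0.15, 0.06, 0.04), backscatter=(0.05, 0.25, 0.30)):
    """scene: float RGB reflectance in [0, 1]; depth_m: (H, W) camera-to-scene range in metres."""
    beta = np.asarray(beta, dtype=np.float32)
    b = np.asarray(backscatter, dtype=np.float32)
    transmission = np.exp(-beta * depth_m[..., None])       # per-channel transmission
    return scene * transmission + b * (1.0 - transmission)  # attenuated signal + veiling light

# Toy usage: a grey target at 10 m looks dimmer and blue-green shifted.
scene = np.full((2, 2, 3), 0.5, dtype=np.float32)
depth = np.full((2, 2), 10.0, dtype=np.float32)
print(underwater_image(scene, depth)[0, 0])
```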

    Effective Visual Odometry Improvement Methods Based on Deep Learning

    Get PDF
    Ph.D. dissertation, Seoul National University Graduate School, Department of Electrical and Information Engineering, August 2020 (advisor: Beom Hee Lee).
    Understanding the three-dimensional environment is one of the most important issues in robotics and computer vision. For this purpose, sensors such as lidar, ultrasound, infrared devices, inertial measurement units (IMUs), and cameras are used, either individually or combined through sensor fusion. Among these sensors, research on visual sensors, which can obtain a lot of information at low cost, has been particularly active in recent years. Understanding of the 3D environment using cameras includes depth restoration, optical/scene flow estimation, and visual odometry (VO). Among them, VO estimates the location of a camera and maps the surrounding environment while a camera-equipped robot or person travels. This technology must precede other tasks such as path planning and collision avoidance, and it can be applied to practical applications such as autonomous driving, augmented reality (AR), unmanned aerial vehicle (UAV) control, and 3D modeling. So far, various VO algorithms have been proposed. Early VO research filtered robot poses and map features; because such filtering is computationally heavy and accumulates errors, keyframe-based methods were studied next. Traditional VO can be divided into feature-based methods and direct methods. Feature-based methods obtain the pose transformation between two images through feature extraction and matching, while direct methods compare pixel intensities directly and seek the pose that minimizes the sum of photometric errors. Recently, owing to the development of deep learning, many studies have applied deep learning to VO. Deep learning-based VO, like other fields applying deep learning to images, first extracts convolutional neural network (CNN) features and then calculates the pose transformation between images. It can be divided into supervised and unsupervised approaches: in supervised learning-based VO, a neural network is trained using ground-truth poses, whereas unsupervised learning-based methods learn poses using only image sequences, without ground-truth values. While existing papers show decent performance, the image datasets used in these studies are all composed of high-quality, clear images obtained with expensive cameras. There are also algorithms that can operate only if non-image information such as exposure time, nonlinear response functions, and camera parameters is provided. For VO to be more widely applied to real-world problems, odometry estimation should still be possible even when the datasets are imperfect. Therefore, in this dissertation, two methods are proposed to improve VO performance using deep learning. First, I adopt a super-resolution (SR) technique to improve the performance of VO on low-resolution, noisy images. Existing SR techniques have mainly focused on increasing image resolution rather than on execution time; however, real-time operation is very important for VO, so the SR network is designed considering execution time, resolution increase, and noise reduction. Running VO on images passed through this SR network yields higher performance than using the original images. Experimental results on the TUM dataset show that the proposed method outperforms conventional VO and other SR methods. Second, I propose a fully unsupervised learning-based VO that simultaneously performs odometry estimation, single-view depth estimation, and camera intrinsic parameter estimation using a dataset consisting only of image sequences. Existing unsupervised learning-based VO requires both the images and the intrinsic parameters of the camera that captured them. Building on this line of work, I propose a method that additionally estimates the camera parameters with a deep intrinsic network; the intrinsic parameters are estimated under two assumptions derived from the properties of camera parameters. Experiments on the KITTI dataset show results comparable to those of the conventional method that is given the intrinsic parameters.
    Contents: 1 Introduction; 2 Mathematical Preliminaries of Visual Odometry (feature-based, direct, and learning-based VO); 3 Error Improvement in Visual Odometry Using Super-resolution; 4 Visual Odometry Enhancement Method Using Fully Unsupervised Learning; 5 Conclusion and Future Work; Bibliography; Abstract (in Korean).
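    Both parts of the dissertation above build on the photometric reprojection error used in direct and unsupervised VO. As a hedged sketch of that core computation (a generic view-synthesis loss assuming a pinhole camera; the function name, shapes, and toy intrinsics are mine and not taken from the dissertation), the code below warps a source frame into the target view using depth, relative pose, and intrinsics and measures the L1 photometric difference.

```python
# Sketch of a view-synthesis photometric loss: back-project target pixels with
# predicted depth, transform them by the relative pose, project into the source
# frame, bilinearly sample it, and compare with the target image.
import torch
import torch.nn.functional as F

def photometric_loss(target, source, depth, pose, K):
    """target, source: (1, 3, H, W); depth: (1, 1, H, W); pose: (4, 4) target->source; K: (3, 3)."""
    _, _, h, w = target.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack((xs, ys, torch.ones_like(xs)), dim=0).reshape(3, -1)   # homogeneous pixels
    cam = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)                    # back-project to 3D
    cam_h = torch.cat((cam, torch.ones(1, h * w)), dim=0)                     # homogeneous 3D points
    proj = K @ (pose @ cam_h)[:3]                                             # transform and project
    uv = proj[:2] / proj[2].clamp(min=1e-6)
    grid = torch.stack((2 * uv[0] / (w - 1) - 1, 2 * uv[1] / (h - 1) - 1), dim=-1)
    warped = F.grid_sample(source, grid.reshape(1, h, w, 2),
                           mode="bilinear", padding_mode="border", align_corners=True)
    return (warped - target).abs().mean()                                     # L1 photometric error

# With identity pose and unit depth, warping a frame onto itself gives ~zero loss.
frame = torch.rand(1, 3, 32, 32)
print(photometric_loss(frame, frame, torch.ones(1, 1, 32, 32), torch.eye(4),
                       torch.tensor([[30.0, 0.0, 16.0], [0.0, 30.0, 16.0], [0.0, 0.0, 1.0]])))
```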

    Survey on video anomaly detection in dynamic scenes with moving cameras

    Full text link
    The increasing popularity of compact and inexpensive cameras, e.g., dash cameras, body cameras, and cameras equipped on robots, has sparked a growing interest in detecting anomalies within dynamic scenes recorded by moving cameras. However, existing reviews primarily concentrate on Video Anomaly Detection (VAD) methods assuming static cameras. The VAD literature with moving cameras remains fragmented, lacking comprehensive reviews to date. To address this gap, we endeavor to present the first comprehensive survey on Moving Camera Video Anomaly Detection (MC-VAD). We delve into the research papers related to MC-VAD, critically assessing their limitations and highlighting associated challenges. Our exploration encompasses three application domains: security, urban transportation, and marine environments, which in turn cover six specific tasks. We compile an extensive list of 25 publicly available datasets spanning four distinct environments: underwater, water surface, ground, and aerial. We summarize the types of anomalies these datasets correspond to or contain, and present five main categories of approaches for detecting such anomalies. Lastly, we identify future research directions and discuss novel contributions that could advance the field of MC-VAD. With this survey, we aim to offer a valuable reference for researchers and practitioners striving to develop and advance state-of-the-art MC-VAD methods.