Real-time smart and standalone vision/IMU navigation sensor
In this paper, we present a smart, standalone, multi-platform stereo vision/IMU-based navigation system providing ego-motion estimation. The real-time visual odometry algorithm runs on a nano-ITX single-board computer (SBC) with a 1.9 GHz CPU and a 16-core GPU. High-resolution 1.2-megapixel stereo images provide high-quality data. Tracking of up to 750 features at 5 fps is made possible by a minimal but efficient feature detection, stereo matching, and feature tracking scheme that runs on the GPU. Furthermore, the feature tracking algorithm is assisted by a 100 Hz IMU whose accelerometer and gyroscope data provide inertial feature prediction, improving execution speed and tracking efficiency. In a space mission context, we demonstrate the robustness and accuracy of the 6-degrees-of-freedom trajectories generated in real time by our visual odometry algorithm. Performance evaluations are comparable to ground-truth measurements from an external motion capture system.
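The gyroscope-aided feature prediction mentioned above can be sketched with the standard first-order rotational optical-flow equations for a pinhole camera. This is a minimal illustration, not the authors' implementation: the function name, parameters, and the rotation-only (translation-ignored) assumption are all illustrative.

```python
def predict_feature(u, v, fx, fy, cx, cy, wx, wy, wz, dt):
    """Predict a feature's next image position from gyroscope rates
    (wx, wy, wz in rad/s) over a time step dt, using the first-order
    rotational optical-flow model for a pinhole camera with focal
    lengths (fx, fy) and principal point (cx, cy). Translation-induced
    flow is ignored in this sketch."""
    # Normalized image coordinates
    x = (u - cx) / fx
    y = (v - cy) / fy
    # Rotational component of the image-motion (interaction) matrix
    dx = (x * y * wx - (1 + x * x) * wy + y * wz) * dt
    dy = ((1 + y * y) * wx - x * y * wy - x * wz) * dt
    return u + fx * dx, v + fy * dy
```

Such a prediction seeds the tracker's search window near the feature's expected position, which is what makes the GPU tracking stage both faster and more robust.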
Efficient Image Processing Framework on Embedded Systems for Unmanned Aerial Vehicles
Department of Mechanical Engineering
Unmanned aerial vehicles (UAVs) are widely used in areas such as exploration, transportation, and rescue thanks to their light weight, low cost, high mobility, and intelligence. Such an intelligent system consists of highly integrated embedded hardware built around a microprocessor that performs specific tasks by running algorithms and processing data. In particular, image processing is one of the core technologies for tasks such as target tracking, positioning, and visual servoing with a vision system. However, it often imposes a heavy computational burden, so an additional micro PC must typically be used alongside the flight computer to process image data, and even then the controller's performance is constrained by limits on power, size, and weight. Efficient image processing techniques that account for computing load and hardware resources are therefore needed for real-time operation on embedded systems.
The objective of this thesis research is to develop an efficient image processing framework for embedded systems that uses neural networks and various optimized computation techniques to balance computing speed, resource usage, and accuracy. Image processing techniques have been proposed and tested for managing computing resources and running high-performance missions on embedded systems. Commodity graphics processing units (GPUs) are used for parallel computing to accelerate processing, while multiple cores of the central processing unit (CPU) handle multi-threaded data transfer between the CPU and the GPU. Several methods are proposed to minimize computing load. The first is a visualization of a convolutional neural network (CNN) that performs localization and detection simultaneously. The second is region proposal for the CNN's input area through simple image processing, which lets the algorithm avoid full-frame processing. Finally, surplus computing resources are saved by controlling transient performance, for example by limiting the FPS.
These optimization methods have been applied experimentally to a ground vehicle and quadrotor UAVs, verifying that they save CPU and memory resources in embedded environments. In addition, they support tasks such as object detection, path planning, and obstacle avoidance. Through these optimizations and algorithms, the embedded system shows a number of improvements over the existing one. Given how readily useful algorithms can be ported to embedded systems in this way, the methods developed in this research can be further applied to a variety of practical applications.
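The region-proposal idea described above, cropping the CNN's input to the active part of the frame via simple image processing, can be sketched as follows. This is a hedged illustration of the general technique, not the thesis code; the function name and padding value are made up for the example.

```python
def propose_region(mask, pad=8):
    """Bounding-box region proposal from a binary activity mask
    (e.g. a thresholded frame difference). Returns (top, left,
    bottom, right) of a padded crop for the CNN, or None when the
    mask is empty so CNN inference can be skipped entirely."""
    h, w = len(mask), len(mask[0])
    rows = [r for r in range(h) if any(mask[r])]
    cols = [c for c in range(w) if any(mask[r][c] for r in range(h))]
    if not rows:
        return None  # nothing active: no CNN pass this frame
    top, bottom = max(rows[0] - pad, 0), min(rows[-1] + pad, h - 1)
    left, right = max(cols[0] - pad, 0), min(cols[-1] + pad, w - 1)
    return top, left, bottom, right
```

Feeding only this crop to the network is what trades a little cheap pixel arithmetic for a large reduction in CNN compute on the embedded board.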
Development of a Visual Odometry System as a Location Aid for Self-Driving Cars
Knowing the exact position of a robot and the trajectory it describes is essential in the automotive field. Over the years, various sensors and techniques have been developed for this purpose, and they are surveyed in this work.
In this project, two cameras on board the vehicle are used as sensors to perceive the environment. The proposed algorithm is based solely on visual odometry: by analyzing the sequence of images captured by the cameras, without prior knowledge of the environment and without using other sensors, it aims to obtain an accurate estimate of the vehicle's position and orientation. The proposal has been validated on the KITTI benchmark and compared against other state-of-the-art visual odometry techniques.
Techniques for marker-less detection of people and marker-based detection of robots within RGB-Depth camera networks
OpenPTrack is a state-of-the-art solution for people detection and tracking. In this work we extended some of its functionalities (detection from highly tilted cameras) and introduced new ones (an automatic ground-plane equation calculator). We also tested the feasibility and behaviour of a mobile camera mounted on a people-following robot and dynamically registered in the OPT network through a fiducial cubic marker.
Visual Odometry Based on Structural Matching of Local Invariant Features Using Stereo Camera Sensor
This paper describes a novel sensor system to estimate the motion of a stereo camera. Local invariant image features are matched between pairs of frames and linked into image trajectories at video rate, providing so-called visual odometry, i.e., motion estimates from visual input alone. Our proposal conducts two matching sessions: the first between sets of features associated with the images of a stereo pair, and the second between sets of features associated with consecutive frames. With respect to previously proposed approaches, the main novelty of this proposal is that both matching sessions are conducted by a fast matching algorithm that combines absolute and relative feature constraints. Finding the largest set of mutually consistent matches is equivalent to finding the maximum-weighted clique on a graph. The stereo matching allows the scene view to be represented as a graph that emerges from the features of the accepted clique. The frame-to-frame matching, in turn, defines a graph whose vertices are features in 3D space. The efficiency of the approach is increased by minimizing the geometric and algebraic errors to estimate the final displacement of the stereo camera between consecutively acquired frames. The proposed approach has been tested for mobile robot navigation in real environments and with different features. Experimental results demonstrate the performance of the proposal, which could be applied in both industrial and service robot fields.
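The mutual-consistency idea above, accepting only matches that jointly preserve pairwise 3D distances, can be sketched with a greedy approximation of the maximum clique on the compatibility graph. This is an illustrative sketch of the general technique, not the paper's algorithm (which maximizes a weighted clique); names and the tolerance are assumptions.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def consistent_matches(pts_a, pts_b, tol=0.05):
    """Greedy approximation of the largest mutually consistent match
    set. Candidate match i pairs pts_a[i] with pts_b[i]; two matches
    are compatible when they preserve pairwise 3D distance (rigidity).
    Returns the indices of the accepted matches."""
    n = len(pts_a)
    compat = [[abs(dist(pts_a[i], pts_a[j]) - dist(pts_b[i], pts_b[j])) < tol
               for j in range(n)] for i in range(n)]
    # Seed with the match compatible with the most others, grow greedily
    order = sorted(range(n), key=lambda i: -sum(compat[i]))
    clique = []
    for i in order:
        if all(compat[i][j] for j in clique):
            clique.append(i)
    return sorted(clique)
```

Outlier matches violate the rigidity constraint against most inliers, so they fail to join the clique and are rejected before motion estimation.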
Convolutional Neural Network Architecture Study for Aerial Visual Localization
In unmanned aerial navigation, the ability to determine the aircraft's location is essential for safe flight. The Global Positioning System (GPS) is the default modern application for geospatial location determination. GPS is extremely robust, very accurate, and has essentially solved aerial localization. Unfortunately, the signals from all Global Navigation Satellite Systems (GNSS), including GPS, can be jammed or spoofed. In response, it is essential to develop alternative systems that could supplement navigation systems in the event of a lost GNSS signal. Public and governmental satellites have provided large amounts of high-resolution satellite imagery, which could be exploited through machine learning to provide onboard navigation equipment with a geospatial location solution. Deep learning and Convolutional Neural Networks (CNNs) have provided significant advances in specific image processing algorithms. This thesis discusses the performance of CNN architectures with various hyperparameters and industry-leading model designs for visual aerial localization. The localization algorithm is trained and tested on satellite imagery of a localized area of 150 square kilometers. Three hyperparameters are the focus: initializations, optimizers, and finishing layers. The five model architectures are MobileNet V2, Inception V3, ResNet 50, Xception, and DenseNet 201. The hyperparameter analysis demonstrates that specific initializations, optimizers, and finishing layers can significantly affect the training of a CNN architecture for this task. The lessons learned from the hyperparameter analysis were carried into the CNN comparison study. After all models were trained for 150 epochs, they were evaluated on the test set. The Xception model with pretrained initialization outperformed all other models, with a Root Mean Squared (RMS) error of only 85 meters.
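The RMS localization error used to score the models above can be computed as follows. A minimal sketch, assuming predictions and ground truth are 2D coordinates in a local metric frame; the function name is illustrative, not from the thesis.

```python
import math

def rms_error_meters(predicted, truth):
    """Root-mean-square Euclidean position error over a test set of
    (x, y) coordinates in meters, the metric by which the CNN
    localization models are compared."""
    errs = [math.hypot(p[0] - t[0], p[1] - t[1])
            for p, t in zip(predicted, truth)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))
```

Because squared errors are averaged before the root, a few large misses dominate this metric, which makes it a conservative measure for a safety-oriented navigation aid.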
Application of augmented reality and robotic technology in broadcasting: A survey
As an innovative technique, Augmented Reality (AR) has been gradually deployed in the broadcast, videography, and cinematography industries. Virtual graphics generated by AR are dynamic and overlaid on the surfaces of the environment, so the original appearance can be greatly enhanced in comparison with traditional broadcasting. In addition, AR enables broadcasters to interact with augmented virtual 3D models on a broadcasting scene in order to enhance the performance of the broadcast. Recently, advanced robotic technologies have been deployed in camera shooting systems to create a robotic cameraman, so that the performance of AR broadcasting can be further improved; these developments are highlighted in the paper.
Cartographer slam method for optimization with an adaptive multi-distance scan scheduler
This paper presents the use of Google's simultaneous localization and mapping (SLAM) technique, namely Cartographer, together with an adaptive multistage distance scheduler (AMDS) to improve processing speed. This approach optimizes the processing speed of SLAM, which is known to degrade as the map grows due to a larger scan matcher. The adaptive method was successfully tested in an actual vehicle to map roads in real time. The AMDS performs a local pose correction by controlling the LiDAR sensor scan range and the scan matcher search window with the help of scheduling algorithms. The scheduling algorithms manage the SLAM process so that it swaps between short and long distances during map data collection. As a result, the algorithms efficiently achieve performance comparable to short-distance LiDAR scans while maintaining the accuracy of the full LiDAR distance. By swapping the sensor's scan distance, and adaptively limiting the scan matcher's search size to handle the different scan sizes, pose-generation time is improved by approximately 16% compared with a fixed scan distance, while maintaining similar accuracy.
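The swapping behaviour described above can be sketched as a tiny scheduler that mostly emits short-range scans (small, fast scan matcher) with a periodic full-range scan to retain long-distance accuracy. This is a toy illustration in the spirit of the AMDS; the interval, ranges, and search-window scale are invented for the example, not taken from the paper.

```python
def amds_schedule(frame_idx, long_every=5, short_range=30.0, long_range=100.0):
    """Return (scan_range_m, search_window_scale) for a frame.
    Most frames use a short scan range and a shrunken scan-matcher
    search window; every `long_every`-th frame uses the full range
    and full window to anchor long-distance accuracy."""
    if frame_idx % long_every == 0:
        return long_range, 1.0   # full range, full search window
    return short_range, 0.4      # short range, reduced search window
```

The scan matcher's cost grows with both scan size and search-window size, so keeping most frames on the short setting is where the reported speedup comes from.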
Real-time High Resolution Fusion of Depth Maps on GPU
A system for live high quality surface reconstruction using a single moving
depth camera on a commodity hardware is presented. High accuracy and real-time
frame rate is achieved by utilizing graphics hardware computing capabilities
via OpenCL and by using sparse data structure for volumetric surface
representation. Depth sensor pose is estimated by combining serial texture
registration algorithm with iterative closest points algorithm (ICP) aligning
obtained depth map to the estimated scene model. Aligned surface is then fused
into the scene. Kalman filter is used to improve fusion quality. Truncated
signed distance function (TSDF) stored as block-based sparse buffer is used to
represent surface. Use of sparse data structure greatly increases accuracy of
scanned surfaces and maximum scanning area. Traditional GPU implementation of
volumetric rendering and fusion algorithms were modified to exploit sparsity to
achieve desired performance. Incorporation of texture registration for sensor
pose estimation and Kalman filter for measurement integration improved accuracy
and robustness of scanning process