994 research outputs found

    Map-Based Localization for Unmanned Aerial Vehicle Navigation

    Get PDF
    Unmanned Aerial Vehicles (UAVs) require precise pose estimation when navigating in indoor and GNSS-denied / GNSS-degraded outdoor environments. The possibility of crashing in these environments is high, as spaces are confined, with many moving obstacles. There are many solutions for localization in GNSS-denied environments, and many different technologies are used. Common solutions involve setting up or using existing infrastructure, such as beacons, Wi-Fi, or surveyed targets. These solutions were avoided because the cost should be proportional to the number of users, not the coverage area. Heavy and expensive sensors, for example a high-end IMU, were also avoided. Given these requirements, a camera-based localization solution was selected for the sensor pose estimation. Several camera-based localization approaches were investigated. Map-based localization methods were shown to be the most efficient because they close loops using a pre-existing map, thus the amount of data and the amount of time spent collecting data are reduced as there is no need to re-observe the same areas multiple times. This dissertation proposes a solution to address the task of fully localizing a monocular camera onboard a UAV with respect to a known environment (i.e., it is assumed that a 3D model of the environment is available) for the purpose of navigation for UAVs in structured environments. Incremental map-based localization involves tracking a map through an image sequence. When the map is a 3D model, this task is referred to as model-based tracking. A by-product of the tracker is the relative 3D pose (position and orientation) between the camera and the object being tracked. State-of-the-art solutions advocate that tracking geometry is more robust than tracking image texture because edges are more invariant to changes in object appearance and lighting. However, model-based trackers have been limited to tracking small simple objects in small environments. An assessment was performed in tracking larger, more complex building models, in larger environments. A state-of-the art model-based tracker called ViSP (Visual Servoing Platform) was applied in tracking outdoor and indoor buildings using a UAVs low-cost camera. The assessment revealed weaknesses at large scales. Specifically, ViSP failed when tracking was lost, and needed to be manually re-initialized. Failure occurred when there was a lack of model features in the cameras field of view, and because of rapid camera motion. Experiments revealed that ViSP achieved positional accuracies similar to single point positioning solutions obtained from single-frequency (L1) GPS observations standard deviations around 10 metres. These errors were considered to be large, considering the geometric accuracy of the 3D model used in the experiments was 10 to 40 cm. The first contribution of this dissertation proposes to increase the performance of the localization system by combining ViSP with map-building incremental localization, also referred to as simultaneous localization and mapping (SLAM). Experimental results in both indoor and outdoor environments show sub-metre positional accuracies were achieved, while reducing the number of tracking losses throughout the image sequence. It is shown that by integrating model-based tracking with SLAM, not only does SLAM improve model tracking performance, but the model-based tracker alleviates the computational expense of SLAMs loop closing procedure to improve runtime performance. Experiments also revealed that ViSP was unable to handle occlusions when a complete 3D building model was used, resulting in large errors in its pose estimates. The second contribution of this dissertation is a novel map-based incremental localization algorithm that improves tracking performance, and increases pose estimation accuracies from ViSP. The novelty of this algorithm is the implementation of an efficient matching process that identifies corresponding linear features from the UAVs RGB image data and a large, complex, and untextured 3D model. The proposed model-based tracker improved positional accuracies from 10 m (obtained with ViSP) to 46 cm in outdoor environments, and improved from an unattainable result using VISP to 2 cm positional accuracies in large indoor environments. The main disadvantage of any incremental algorithm is that it requires the camera pose of the first frame. Initialization is often a manual process. The third contribution of this dissertation is a map-based absolute localization algorithm that automatically estimates the camera pose when no prior pose information is available. The method benefits from vertical line matching to accomplish a registration procedure of the reference model views with a set of initial input images via geometric hashing. Results demonstrate that sub-metre positional accuracies were achieved and a proposed enhancement of conventional geometric hashing produced more correct matches - 75% of the correct matches were identified, compared to 11%. Further the number of incorrect matches was reduced by 80%

    Towards topological mapping with vision-based simultaneous localization and map building

    Full text link
    Although the theory of Simultaneous Localization and Map Building (SLAM) is well developed, there are many challenges to overcome when incorporating vision sensors into SLAM systems. Visual sensors have different properties when compared to range finding sensors and therefore require different considerations. Existing vision-based SLAM algorithms extract point landmarks, which are required for SLAM algorithms such as the Kalman filter. Under this restriction, the types of image features that can be used are limited and the full advantages of vision not realized. This thesis examines the theoretical formulation of the SLAM problem and the characteristics of visual information in the SLAM domain. It also examines different representations of uncertainty, features and environments. It identifies the necessity to develop a suitable framework for vision-based SLAM systems and proposes a framework called VisionSLAM, which utilizes an appearance-based landmark representation and topological map structure to model metric relations between landmarks. A set of Haar feature filters are used to extract image structure statistics, which are robust against illumination changes, have good uniqueness property and can be computed in real time. The algorithm is able to resolve and correct false data associations and is robust against random correlation resulting from perceptual aliasing. The algorithm has been tested extensively in a natural outdoor environment

    Online Audio-Visual Multi-Source Tracking and Separation: A Labeled Random Finite Set Approach

    Get PDF
    The dissertation proposes an online solution for separating an unknown and time-varying number of moving sources using audio and visual data. The random finite set framework is used for the modeling and fusion of audio and visual data. This enables an online tracking algorithm to estimate the source positions and identities for each time point. With this information, a set of beamformers can be designed to separate each desired source and suppress the interfering sources

    Computer Vision for Fish Monitoring: Challenges and Possibilities

    Get PDF
    This master's thesis focuses on the evaluation and exploration of detection and tracking algorithms for fish in a dense underwater environment. The primary objectives were to achieve precise and accurate fish detection and to track fish over an extended period. The thesis explores the performance of two object detection algorithms, YOLOv4 and YOLOv8, as well as their integration with the DeepSORT tracking algorithm. The algorithms were trained and evaluated using a dataset collected from a densely populated underwater fish tank. The dataset was manually annotated using bounding box annotation techniques to accurately label the objects of interest. The results demonstrated the effectiveness of both YOLOv4 and YOLOv8 in detecting fish in densely populated environments. However, YOLOv8 achieved a significantly higher mAP50-95 score, indicating better localization and detection accuracy. It proved more adept at precisely locating the position of detected fish, leading to improved overall detection performance. In terms of fish tracking the combination of DeepSORT and YOLOv8 showed the best overall performance, as evidenced by higher MOTA and IDF1 scores, and lower MOTP scores. However, tracking individual fish over extended periods presented challenges due to occlusions and rapid trajectory changes, leading to a high number of identity switches. By evaluating and exploring the effectiveness of detection and tracking algorithms, this thesis contributes to the advancement of fish monitoring techniques in aquaculture. The findings provide valuable insights into the performance of YOLOv4 and YOLOv8 and the potential of DeepSORT for accurate and reliable fish detection and tracking. The results and methodologies presented in this study lay the groundwork for further research and development in the field, aiming to enhance fish welfare, optimize resource management, and improve efficiency in aquaculture practices

    Enhancing RGB-D SLAM Using Deep Learning

    Get PDF

    Integration and coordination in a cognitive vision system

    Get PDF
    In this paper, we present a case study that exemplifies general ideas of system integration and coordination. The application field of assistant technology provides an ideal test bed for complex computer vision systems including real-time components, human-computer interaction, dynamic 3-d environments, and information retrieval aspects. In our scenario the user is wearing an augmented reality device that supports her/him in everyday tasks by presenting information that is triggered by perceptual and contextual cues. The system integrates a wide variety of visual functions like localization, object tracking and recognition, action recognition, interactive object learning, etc. We show how different kinds of system behavior are realized using the Active Memory Infrastructure that provides the technical basis for distributed computation and a data- and eventdriven integration approach

    Human robot interaction in a crowded environment

    No full text
    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people‟s movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot‟s interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7]
    • …
    corecore