
    Visual navigation and path tracking using street geometry information for image alignment and servoing

    Single-camera navigation systems need information from other sensors or from the work environment to produce reliable and accurate position measurements, so providing trustworthy, accurate, and readily available information in the environment is very important. This work highlights that the well-described streets of urban environments can be exploited by drones for navigation and path tracking, so the benefit of such structures is not limited to automated driving cars. While the drone position is continuously computed using visual odometry, scene matching against selected landmarks is used to correct the position drift. The drone path is defined by several waypoints, and the landmarks centered on those waypoints are carefully chosen at street intersections. The known street geometry and dimensions are used to estimate the image scale and orientation, which are necessary for image alignment, to compensate for the visual odometry drift, and to pass closer to the landmark center during the visual servoing process. The probabilistic Hough transform is used to detect and extract the street borders. The system is realized in a simulation environment consisting of the Robot Operating System (ROS), the 3D dynamic simulator Gazebo, and an IRIS drone model. The results demonstrate the efficiency of the suggested system, with a position RMS error of 1.4 m.
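
    The border-extraction step can be pictured concretely. The sketch below is not the paper's implementation; it is a minimal OpenCV example, assuming an aerial BGR frame and hand-picked Canny/Hough thresholds, of how the probabilistic Hough transform can recover street-border segments from an edge map. The segments' angles and the known street width could then serve to estimate image orientation and metric scale for alignment.

        # Minimal sketch (not the paper's code): street-border extraction with the
        # probabilistic Hough transform, assuming an aerial BGR frame from the drone.
        import cv2
        import numpy as np

        def detect_street_borders(frame_bgr):
            """Return line segments (x1, y1, x2, y2) approximating street borders."""
            gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            edges = cv2.Canny(gray, 50, 150)            # edge map of the aerial view
            lines = cv2.HoughLinesP(edges,
                                    rho=1,              # 1-pixel distance resolution
                                    theta=np.pi / 180,  # 1-degree angular resolution
                                    threshold=80,       # minimum votes for a line
                                    minLineLength=60,   # discard short fragments
                                    maxLineGap=10)      # bridge small gaps along a border
            return [] if lines is None else [tuple(l[0]) for l in lines]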

    SAILenv: Learning in Virtual Visual Environments Made Simple

    Recently, researchers in Machine Learning, Computer Vision scientists, engineers, and others have shown a growing interest in 3D simulators as a means to artificially create experimental settings that are very close to those of the real world. However, most of the existing platforms for interfacing algorithms with 3D environments are designed to set up navigation-related experiments, to study physical interactions, or to handle ad-hoc cases that are not meant to be customized, sometimes lacking a strongly photorealistic appearance and an easy-to-use software interface. In this paper, we present a novel platform, SAILenv, that is specifically designed to be simple and customizable, and that allows researchers to experiment with visual recognition in virtual 3D scenes. A few lines of code are needed to interface any algorithm with the virtual world, and non-3D-graphics experts can easily customize the 3D environment itself, exploiting a collection of photorealistic objects. Our framework yields pixel-level semantic and instance labeling and depth, and, to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine. The client-server communication operates at a low level, avoiding the overhead of HTTP-based data exchanges. We perform experiments using a state-of-the-art object detector trained on real-world images, showing that it is able to recognize the photorealistic 3D objects of our environment. The computational burden of the optical flow compares favourably with the estimation performed using modern GPU-based convolutional networks or more classic implementations. We believe that the scientific community will benefit from the ease of use and high quality of our framework when evaluating newly proposed algorithms in their own customized realistic conditions. Comment: 8 pages, 7 figures, submitted to ICPR 202
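
    To make the contrast with HTTP-based exchanges concrete, the following is a purely hypothetical sketch of a low-level, binary client-server exchange; it does not reflect SAILenv's actual API, request opcodes, frame sizes, or wire format, all of which are assumptions here.

        # Hypothetical sketch (not the SAILenv API): a raw TCP request for one RGB
        # frame plus a per-pixel label map, avoiding HTTP framing overhead.
        import socket
        import struct
        import numpy as np

        def fetch_frame(host="127.0.0.1", port=8085, width=256, height=192):
            """Request one frame and its pixel-wise labels over a raw socket."""
            with socket.create_connection((host, port)) as sock:
                sock.sendall(b"FRAME")                       # assumed request opcode
                header = sock.recv(8)                        # two little-endian uint32 sizes
                rgb_size, label_size = struct.unpack("<II", header)
                payload = b""
                while len(payload) < rgb_size + label_size:  # read until both buffers arrive
                    chunk = sock.recv(65536)
                    if not chunk:
                        raise ConnectionError("server closed the connection early")
                    payload += chunk
                rgb = np.frombuffer(payload[:rgb_size], dtype=np.uint8)
                labels = np.frombuffer(payload[rgb_size:rgb_size + label_size], dtype=np.uint8)
                return rgb.reshape(height, width, 3), labels.reshape(height, width)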

    Development and training of an Artificial Neural Network in Unity3D, including a game to interact with it and observe its performance

    Final project of the Bachelor's Degree in Video Game Design and Development. Course code: VJ1241. Academic year: 2019/2020. This document presents the final report of the Video Game Design and Development project. The work consists of creating an AI with an Artificial Neural Network (ANN) to face the player in different game modes, and then training it as well as possible using reinforcement learning. We then introduce our AI into the different game environments the player may encounter. All of this is done using the Unity game engine.
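
    As a rough conceptual analogue of the training described above (in Python/NumPy rather than Unity/C#, and not the project's actual code), the sketch below updates a single-layer action-value approximator with a standard Q-learning step; the state size, action count, and hyperparameters are illustrative assumptions.

        # Conceptual analogue only: a one-layer Q-value approximator updated with the
        # standard Q-learning rule; not the Unity3D implementation of the thesis.
        import numpy as np

        rng = np.random.default_rng(0)
        n_state, n_action = 8, 4
        W = rng.normal(scale=0.1, size=(n_state, n_action))  # linear Q(s, a) = s . W[:, a]

        def q_values(state):
            return state @ W

        def q_learning_step(state, action, reward, next_state, done, lr=0.01, gamma=0.99):
            """Move Q(s, a) toward reward + gamma * max_a' Q(s', a') for one transition."""
            target = reward + (0.0 if done else gamma * np.max(q_values(next_state)))
            td_error = target - q_values(state)[action]
            W[:, action] += lr * td_error * state             # gradient of Q w.r.t. its weight column
            return td_error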

    Rapid Development of the Seeker Free-Flying Inspector Guidance, Navigation, and Control System

    Seeker is an automated extravehicular free-flying inspector CubeSat designed and built in-house at the Johnson Space Center (JSC). As a Class 1E project funded by the International Space Station (ISS) Program, Seeker had a streamlined process to flight certification, but the vehicle had to be designed, developed, tested, and delivered within approximately one year after authority to proceed (ATP) and within a $1.8 million budget. These constraints necessitated an expedited Guidance, Navigation, and Control (GNC) development schedule. Development began with a navigation sensor trade study using Linear Covariance (LinCov) analysis and a rapid sensor downselection process, resulting in the use of commercial off-the-shelf (COTS) sensors that could be procured quickly and subjected to in-house environmental testing to qualify them for flight. A neural network was used to enable a COTS camera to provide bearing measurements for visual navigation. The GNC flight software (FSW) team followed lean development practices and leveraged the Core Flight Software (CFS) architecture to rapidly develop the GNC system, tune its parameters, and verify performance in simulation. This pace was anchored by several Hardware-Software Integration (HSI) milestones, which forced the Seeker GNC team to develop the interfaces, both between hardware and software and between the GNC domains, early in the project and enabled a timely delivery.
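
    As a way to picture the bearing measurement mentioned above, the sketch below (not Seeker flight software) converts a neural-network detection, reduced to a pixel centroid, into azimuth and elevation angles under an assumed pinhole camera model; the intrinsics are placeholder values.

        # Minimal sketch, not Seeker FSW: a detected pixel centroid mapped to a
        # bearing (azimuth/elevation) using an assumed pinhole camera model.
        import math

        def pixel_to_bearing(u, v, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
            """Convert a pixel centroid (u, v) to azimuth/elevation in radians."""
            azimuth = math.atan2(u - cx, fx)    # positive to the right of boresight
            elevation = math.atan2(cy - v, fy)  # positive above boresight (image v grows downward)
            return azimuth, elevation

        # Example: a detection centered at pixel (400, 180) in a 640x480 image
        az, el = pixel_to_bearing(400, 180)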

    Adaptive user interface for vehicle swarm control

    An algorithm to automatically generate behaviors for robotic vehicles has been created and tested in a laboratory setting. This system is designed to be applied in situations where a large number of robotic vehicles must be controlled by a single operator. The system learns which behaviors the operator typically issues and offers these behaviors to the operator in future missions. The algorithm uses the symbolic clustering method Gram-ART to generate these behaviors. Gram-ART has been shown to be successful at clustering standard symbolic problems such as the mushroom dataset and the Unix commands dataset. The algorithm was tested by having users complete exploration and tracking missions. Users were brought in for two sessions of testing. In the first session, they familiarized themselves with the testing interface and generated training information for Gram-ART. In the second session, the users ran missions with and without the generated behaviors to determine what effect the generated behaviors had on the users' performance. Through these human tests, missions with generated behaviors enabled are shown to have reduced operator workload compared with those without. Missions with generated behaviors required fewer button presses than those without while maintaining a similar or greater level of mission success. Users also responded positively in a survey after the second session. Most users' responses indicated that the generated behaviors increased their ability to complete the missions --Abstract, page iii

    Computer vision in target pursuit using a UAV

    Research in target pursuit using Unmanned Aerial Vehicles (UAVs) has gained attention in recent years, primarily due to the decreasing cost of and increasing demand for small UAVs in many sectors. In computer vision, target pursuit is a complex problem, as it involves solving many sub-problems typically concerned with the detection, tracking, and following of the object of interest. At present, the majority of related existing methods are developed in computer simulation under the assumption of ideal environmental factors, while the remaining few practical methods are mainly developed to track and follow simple objects that contain monochromatic colours with very little texture variance. Current research on this topic lacks practical vision-based approaches. The aim of this research is therefore to fill the gap by developing a real-time algorithm capable of following a person continuously given only a photo input. As this research considers the whole procedure as an autonomous system, the drone is activated automatically upon receiving a photo of a person through Wi-Fi. This means that the whole system can be triggered by simply emailing a single photo from any device anywhere. This is done by first implementing image fetching to automatically connect to Wi-Fi, download the image, and decode it. Then, human detection is performed to extract a template from the upper body of the person, and the intended target is acquired using both human detection and template matching. Finally, target pursuit is achieved by tracking the template continuously while sending motion commands to the drone. In the target pursuit system, detection is mainly accomplished using a proposed human detection method that is capable of detecting, extracting, and segmenting the human body figure robustly from the background without prior training. This involves detecting the face, head, and shoulders separately, mainly using gradient maps. Tracking is mainly accomplished using a proposed generic, non-learning template matching method, which combines intensity template matching with a colour histogram model and employs a three-tier system for template management. A flight controller is also developed; it supports three types of control: keyboard, mouse, and text messages. Furthermore, the drone is programmed with three different modes: standby, sentry, and search. To improve the detection and tracking of colour objects, this research also proposes several colour-related methods. One of them is a colour model for colour detection which consists of three colour components: hue, purity, and brightness. Hue represents the colour angle, purity represents the colourfulness, and brightness represents intensity. The model can be represented in three different geometric shapes: sphere, hemisphere, and cylinder, and each of these shapes has two variations. Experimental results have shown that the target pursuit algorithm is capable of identifying and following the target person robustly given only a photo input. This is evidenced by the live tracking and mapping of the intended targets wearing different clothing in both indoor and outdoor environments. Additionally, the various methods developed in this research could enhance the performance of practical vision-based applications, especially in the detection and tracking of objects.
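
    The template-matching and colour-histogram fusion described above can be sketched roughly as follows; this is an illustration with OpenCV and assumed fusion weights and histogram bins, not the thesis implementation, and the template histogram is assumed to be precomputed and normalized in the same way as the patch histogram.

        # Rough sketch, not the thesis code: intensity template matching fused with a
        # colour-histogram similarity score, as the abstract describes.
        import cv2
        import numpy as np

        def track_score(frame_bgr, template_bgr, template_hist):
            """Locate the template and blend correlation with colour-histogram similarity."""
            gray_f = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            gray_t = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2GRAY)
            result = cv2.matchTemplate(gray_f, gray_t, cv2.TM_CCOEFF_NORMED)
            _, max_val, _, max_loc = cv2.minMaxLoc(result)    # best intensity match

            h, w = gray_t.shape
            patch = frame_bgr[max_loc[1]:max_loc[1] + h, max_loc[0]:max_loc[0] + w]
            patch_hist = cv2.calcHist([patch], [0, 1, 2], None, [8, 8, 8],
                                      [0, 256, 0, 256, 0, 256])
            cv2.normalize(patch_hist, patch_hist)
            colour_sim = cv2.compareHist(template_hist, patch_hist, cv2.HISTCMP_CORREL)

            return 0.6 * max_val + 0.4 * colour_sim, max_loc  # assumed fusion weights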

    VISION-BASED URBAN NAVIGATION PROCEDURES FOR VERBALLY INSTRUCTED ROBOTS

    The work presented in this thesis is part of a project in instruction-based learning (IBL) for mobile robots, where a robot is designed that can be instructed by its users through unconstrained natural language. The robot uses vision guidance to follow route instructions in a miniature town model. The aim of the work presented here was to determine the functional vocabulary of the robot in the form of "primitive procedures". In contrast to previous work in the field of instructable robots, this was done following a "user-centred" approach, where the main concern was to create primitive procedures that can be directly associated with natural language instructions. To achieve this, a corpus of human-to-human natural language instructions was collected and analysed. A set of primitive actions was found with which the collected corpus could be represented. These primitive actions were then implemented as robot-executable procedures. Natural language instructions are under-specified when they are destined to be executed by a robot. This is because instructors omit information that they consider "commonsense" and rely on the listener's sensory-motor capabilities to determine the details of the task execution. In this thesis the under-specification problem is solved by determining the missing information, either during the learning of new routes or during their execution by the robot. During learning, the missing information is determined by imitating the commonsense approach human listeners take to achieve the same purpose. During execution, missing information, such as the location of road layout features mentioned in route instructions, is determined from the robot's view by using image template matching. The original contribution of this thesis, in both these methods, lies in the fact that they are driven by the natural language examples found in the corpus collected for the IBL project. During the testing phase, a high success rate of primitive calls, when these were considered individually, showed that the under-specification problem had overall been solved. A novel method for testing the primitive procedures, as part of complete route descriptions, is also proposed in this thesis. This was done by comparing the performance of human subjects when driving the robot, following route descriptions, with the performance of the robot when executing the same route descriptions. The results obtained from this comparison clearly indicated where errors occur from the time when a human speaker gives a route description to the time when the task is executed by a human listener or by the robot. Finally, a software speed controller is proposed in this thesis to control the wheel speeds of the robot used in this project. The controller employs PI (Proportional and Integral) and PID (Proportional, Integral and Derivative) control and provides a good alternative to expensive hardware.
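
    A minimal sketch of the kind of software PI/PID wheel-speed loop mentioned at the end of the abstract is shown below; the gains and sample period are placeholders rather than the values tuned in the thesis.

        # Minimal discrete PID sketch (PI is the special case kd = 0); gains and the
        # sample period dt are placeholder assumptions, not the thesis values.
        class PID:
            def __init__(self, kp, ki, kd, dt):
                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                self.integral = 0.0
                self.prev_error = 0.0

            def update(self, setpoint, measured):
                """Return the control output for one sample of the wheel-speed loop."""
                error = setpoint - measured
                self.integral += error * self.dt
                derivative = (error - self.prev_error) / self.dt
                self.prev_error = error
                return self.kp * error + self.ki * self.integral + self.kd * derivative

        # Example: one 50 Hz update of a left-wheel speed loop
        left_wheel = PID(kp=0.8, ki=0.3, kd=0.05, dt=0.02)
        command = left_wheel.update(setpoint=1.0, measured=0.85)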

    Identification of aircrew tasks for using direct voice input (DVI) to reduce pilot workload in the AH-64D Apache Longbow

    Advances in helicopter design continue to saturate the pilot's visual channel and produce remarkable increases in cognitive workload for the pilot. This study investigates the potential implementation of Direct Voice Input (DVI) as an alternative control for interacting with onboard systems of the AH-64D Apache, in an attempt to reduce pilot workload during a hands-on-the-controls, eyes-out condition. The intent is to identify AH-64D cockpit tasks performed through Multi-Purpose Displays (MPDs) that, when converted to DVI, will provide the greatest reduction in task execution time and workload. A brief description of the applicable AH-64D audio and visual displays is provided. A review of current trends in state-of-the-art voice recognition technology is presented, as well as previous and current voice-input cockpit identification studies. To identify tasks in the AH-64D, a methodology was developed consisting of a detailed analysis of the aircraft's mission and on-board systems. A pilot questionnaire was developed and administered to operational AH-64D pilots to gather their input on DVI implementation. Findings indicate DVI would be most useful for displaying selected MPD pages and performing tasks pertaining to the Tactical Situation Display (TSD), weapons, and communications. Six of the candidate DVI tasks were performed in the AH-64D simulator using the manual input method and a simulated voice input method. Two different pilots made objective and subjective evaluations. Task execution times and workload ratings were lower using the simulated means of voice input. Overall, DVI shows limited potential for workload reduction and warrants further simulator testing before proceeding to the flight environment.

    Learning in vision and robotics

    I present my work on learning from video and robotic input. This is an important problem with numerous potential applications. The use of machine learning makes it possible to obtain models which can handle noise and variation without explicitly programming them. It also raises the possibility of robots which can interact more seamlessly with humans rather than only exhibiting hard-coded behaviors. I will present my work in two areas: video action recognition and robot navigation. First, I present a video action recognition method which represents actions in video by sequences of retinotopic appearance and motion detectors, learns such models automatically from training data, and allows actions in new video to be recognized and localized completely automatically. Second, I present a new method which allows a mobile robot to learn word meanings from a combination of robot sensor measurements and sentential descriptions corresponding to a set of robotically driven paths. These word meanings support automatic driving from sentential input and the generation of sentential descriptions of new paths. Finally, I also present work on a new action recognition dataset, and comparisons of the performance of recent methods on this dataset and others.