90 research outputs found
Ambient awareness on a sidewalk for visually impaired
Safe navigation by avoiding obstacles is vital for visually impaired while walking on a sidewalk. There are both static and dynamic obstacles to avoid. Detection, monitoring, and estimating the threat posed by obstacles remain challenging. Also, it is imperative that the design of the system must be energy efficient and low cost. An additional challenge in designing an interactive system capable of providing useful feedback is to minimize users\u27 cognitive load. We started the development of the prototype system through classifying obstacles and providing feedback. To overcome the limitations of the classification-based system, we adopted the image annotation framework in describing the scene, which may or may not include the obstacles. Both solutions partially solved the safe navigation but were found to be ineffective in providing meaningful feedback and issues with the diurnal cycle. To address such limitations, we introduce the notion of free-path and threat level imposed by the static or dynamic obstacles. This solution reduced the overhead of obstacle detection and helped in designing meaningful feedback. Affording users a natural conversation through an interactive dialog enabled interface was found to promote safer navigation. In this dissertation, we modeled the free-path and threat level using a reinforcement learning (RL) framework.We built the RL model in the Gazebo robot simulation environment and implanted that in a handheld device. A natural conversation model was created using data collected through a Wizard of OZ approach. The RL model and conversational agent model together resulted in the handheld assistive device called Augmented Guiding Torch (AGT). The AGT provides improved mobility over white cane by providing ambient awareness through natural conversation. It can inform the visually impaired about the obstacles which are helpful to be warned about ahead of time, e.g., construction site, scooter, crowd, car, bike, or big hole. Using the RL framework, the robot avoided over 95% obstacles. The visually impaired avoided over 85% obstacles with the help of AGT on a 500 feet U-shape sidewalk. Findings of this dissertation support the effectiveness of augmented guiding through RL for navigation and obstacle avoidance of visually impaired users
Visual-Inertial Sensor Fusion Models and Algorithms for Context-Aware Indoor Navigation
Positioning in navigation systems is predominantly performed by Global Navigation Satellite Systems (GNSSs). However, while GNSS-enabled devices have become commonplace for outdoor navigation, their use for indoor navigation is hindered due to GNSS signal degradation or blockage. For this, development of alternative positioning approaches and techniques for navigation systems is an ongoing research topic. In this dissertation, I present a new approach and address three major navigational problems: indoor positioning, obstacle detection, and keyframe detection. The proposed approach utilizes inertial and visual sensors available on smartphones and are focused on developing: a framework for monocular visual internal odometry (VIO) to position human/object using sensor fusion and deep learning in tandem; an unsupervised algorithm to detect obstacles using sequence of visual data; and a supervised context-aware keyframe detection.
The underlying technique for monocular VIO is a recurrent convolutional neural network for computing six-degree-of-freedom (6DoF) in an end-to-end fashion and an extended Kalman filter module for fine-tuning the scale parameter based on inertial observations and managing errors. I compare the results of my featureless technique with the results of conventional feature-based VIO techniques and manually-scaled results. The comparison results show that while the framework is more effective compared to featureless method and that the accuracy is improved, the accuracy of feature-based method still outperforms the proposed approach.
The approach for obstacle detection is based on processing two consecutive images to detect obstacles. Conducting experiments and comparing the results of my approach with the results of two other widely used algorithms show that my algorithm performs better; 82% precision compared with 69%. In order to determine the decent frame-rate extraction from video stream, I analyzed movement patterns of camera and inferred the context of the user to generate a model associating movement anomaly with proper frames-rate extraction. The output of this model was utilized for determining the rate of keyframe extraction in visual odometry (VO). I defined and computed the effective frames for VO and experimented with and used this approach for context-aware keyframe detection. The results show that the number of frames, using inertial data to infer the decent frames, is decreased
Sample-Efficient Training of Robotic Guide Using Human Path Prediction Network
Training a robot that engages with people is challenging, because it is
expensive to involve people in a robot training process requiring numerous data
samples. This paper proposes a human path prediction network (HPPN) and an
evolution strategy-based robot training method using virtual human movements
generated by the HPPN, which compensates for this sample inefficiency problem.
We applied the proposed method to the training of a robotic guide for visually
impaired people, which was designed to collect multimodal human response data
and reflect such data when selecting the robot's actions. We collected 1,507
real-world episodes for training the HPPN and then generated over 100,000
virtual episodes for training the robot policy. User test results indicate that
our trained robot accurately guides blindfolded participants along a goal path.
In addition, by the designed reward to pursue both guidance accuracy and human
comfort during the robot policy training process, our robot leads to improved
smoothness in human motion while maintaining the accuracy of the guidance. This
sample-efficient training method is expected to be widely applicable to all
robots and computing machinery that physically interact with humans
Implementation of a Blind navigation method in outdoors/indoors areas
According to WHO statistics, the number of visually impaired people is
increasing annually. One of the most critical necessities for visually impaired
people is the ability to navigate safely. This paper proposes a navigation
system based on the visual slam and Yolo algorithm using monocular cameras. The
proposed system consists of three steps: obstacle distance estimation, path
deviation detection, and next-step prediction. Using the ORB-SLAM algorithm,
the proposed method creates a map from a predefined route and guides the users
to stay on the route while notifying them if they deviate from it.
Additionally, the system utilizes the YOLO algorithm to detect obstacles along
the route and alert the user. The experimental results, obtained by using a
laptop camera, show that the proposed system can run in 30 frame per second
while guiding the user within predefined routes of 11 meters in indoors and
outdoors. The accuracy of the positioning system is 8cm, and the system
notifies the users if they deviate from the predefined route by more than 60
cm.Comment: 14 pages, 6 figures and 6 table
Analysis of Navigation Assistants for Blind and Visually Impaired People: A Systematic Review
Over the last few decades, the development in the field of navigation and routing devices has become a hindering task for the researchers to develop smart and intelligent guiding mechanism at indoor and outdoor locations for blind and visually impaired people (BVIPs). The existing research need to be analysed from a historical perception including early research on the first electronic travel aids to the use of modern artificial vision models for the navigation of BVIPs. Diverse approaches such as: e-cane or guide dog, infrared-based cane, laser based walker and many others are proposed for the navigation of BVIPs. But most of these techniques have limitations such as: infrared and ultrasonic based assistance has short range capacities for object detection. While laser based assistance can harm other people if it directly hit them on their eyes or any other part of the body. These trade-offs are critical to bring this technology in practice.To systematically assess, analyze, and identify the primary studies in this specialized field and provide an overview of the trends and empirical evidence in the proposed field. This systematic research work is performed by defining a set of relevant keywords, formulating four research questions, defining selection criteria for the articles, and synthesizing the empirical evidence in this area. Our pool of studies include 191 most relevant articles to the proposed field reported between 2011 and 2020 (a portion of 2020 is included). This systematic mapping will help the researchers, engineers, and practitioners to make more authentic decisions for finding gaps in the available navigation assistants and suggest a new and enhanced smart assistant application accordingly to ensure safety and accurate guidance of the BVIPs. This research work have several implications in particular the impact of reducing fatalities and major injuries of BVIPs.Qatar University [IRCC-2020-009]
Enhancing image captioning with depth information using a Transformer-based framework
Captioning images is a challenging scene-understanding task that connects
computer vision and natural language processing. While image captioning models
have been successful in producing excellent descriptions, the field has
primarily focused on generating a single sentence for 2D images. This paper
investigates whether integrating depth information with RGB images can enhance
the captioning task and generate better descriptions. For this purpose, we
propose a Transformer-based encoder-decoder framework for generating a
multi-sentence description of a 3D scene. The RGB image and its corresponding
depth map are provided as inputs to our framework, which combines them to
produce a better understanding of the input scene. Depth maps could be ground
truth or estimated, which makes our framework widely applicable to any RGB
captioning dataset. We explored different fusion approaches to fuse RGB and
depth images. The experiments are performed on the NYU-v2 dataset and the
Stanford image paragraph captioning dataset. During our work with the NYU-v2
dataset, we found inconsistent labeling that prevents the benefit of using
depth information to enhance the captioning task. The results were even worse
than using RGB images only. As a result, we propose a cleaned version of the
NYU-v2 dataset that is more consistent and informative. Our results on both
datasets demonstrate that the proposed framework effectively benefits from
depth information, whether it is ground truth or estimated, and generates
better captions. Code, pre-trained models, and the cleaned version of the
NYU-v2 dataset will be made publically available.Comment: 19 pages, 5 figures, 13 table
Deep reinforcement learning for multi-modal embodied navigation
Ce travail se concentre sur une tâche de micro-navigation en plein air où le but est de naviguer
vers une adresse de rue spécifiée en utilisant plusieurs modalités (par exemple, images, texte
de scène et GPS). La tâche de micro-navigation extérieure s’avère etre un défi important pour
de nombreuses personnes malvoyantes, ce que nous démontrons à travers des entretiens et
des études de marché, et nous limitons notre définition des problèmes à leurs besoins. Nous
expérimentons d’abord avec un monde en grille partiellement observable (Grid-Street et Grid
City) contenant des maisons, des numéros de rue et des régions navigables. Ensuite, nous
introduisons le Environnement de Trottoir pour la Navigation Visuelle (ETNV), qui contient
des images panoramiques avec des boîtes englobantes pour les numéros de maison, les portes
et les panneaux de nom de rue, et des formulations pour plusieurs tâches de navigation. Dans
SEVN, nous formons un modèle de politique pour fusionner des observations multimodales
sous la forme d’images à résolution variable, de texte visible et de données GPS simulées afin
de naviguer vers une porte d’objectif. Nous entraînons ce modèle en utilisant l’algorithme
d’apprentissage par renforcement, Proximal Policy Optimization (PPO). Nous espérons que
cette thèse fournira une base pour d’autres recherches sur la création d’agents pouvant aider
les membres de la communauté des gens malvoyantes à naviguer le monde.This work focuses on an Outdoor Micro-Navigation (OMN) task in which the goal is to
navigate to a specified street address using multiple modalities including images, scene-text,
and GPS. This task is a significant challenge to many Blind and Visually Impaired (BVI)
people, which we demonstrate through interviews and market research. To investigate the
feasibility of solving this task with Deep Reinforcement Learning (DRL), we first introduce
two partially observable grid-worlds, Grid-Street and Grid City, containing houses, street
numbers, and navigable regions. In these environments, we train an agent to find specific
houses using local observations under a variety of training procedures. We parameterize
our agent with a neural network and train using reinforcement learning methods. Next, we
introduce the Sidewalk Environment for Visual Navigation (SEVN), which contains panoramic
images with labels for house numbers, doors, and street name signs, and formulations for
several navigation tasks. In SEVN, we train another neural network model using Proximal
Policy Optimization (PPO) to fuse multi-modal observations in the form of variable resolution
images, visible text, and simulated GPS data, and to use this representation to navigate to
goal doors. Our best model used all available modalities and was able to navigate to over 100
goals with an 85% success rate. We found that models with access to only a subset of these
modalities performed significantly worse, supporting the need for a multi-modal approach to
the OMN task. We hope that this thesis provides a foundation for further research into the
creation of agents to assist members of the BVI community to safely navigate
- …