
    Review of Machine Vision-Based Electronic Travel Aids

    Visually impaired people have navigation and mobility problems on the road. To date, many approaches have been developed to help them navigate using different sensing techniques. This paper reviews several machine vision-based Electronic Travel Aids (ETAs) and compares them with those using other sensing techniques. The functionalities of machine vision-based ETAs are classified from low-level image processing, such as detecting road regions and obstacles, to high-level functionalities, such as recognizing digital tags and text. In addition, the characteristics of ETA systems for blind people are discussed in particular.

    Text Extraction From Natural Scene: Methodology And Application

    With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications of image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction is a challenging problem, due to the cluttered backgrounds of natural scenes and the multiple patterns of scene text itself. To solve these problems, this dissertation proposes a framework for scene text extraction. Scene text extraction in our framework is divided into two components, detection and recognition. Scene text detection finds the regions containing text in camera-captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidate text strings from cluttered backgrounds in natural scenes. Then text structural analysis is performed to design effective text structural features for distinguishing text from non-text outliers among the candidate text strings. Scene text recognition transforms image-based text in detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification among a set of text character categories. We design robust and discriminative feature representations for STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of our proposed framework, which obtains better performance than previously published methods.
Our proposed scene text extraction framework is applied to four scenarios: 1) reading printed labels on grocery packages for hand-held object recognition; 2) combining with car detection to localize license plates in camera-captured natural scene images; 3) reading indicative signage for assistive navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scenes. The proposed prototype systems and associated evaluation results show that our framework is able to solve the challenges in real applications.

    Ability of head-mounted display technology to improve mobility in people with low vision: a systematic review

    Purpose: The purpose of this study was to undertake a systematic literature review on how vision enhancements, implemented using head-mounted displays (HMDs), can improve mobility, orientation, and associated aspects of visual function in people with low vision. Methods: The databases Medline, CINAHL, Scopus, and Web of Science were searched for potentially relevant studies. Publications from all years until November 2018 were identified based on predefined inclusion and exclusion criteria. The data were tabulated and synthesized to produce a systematic review. Results: The search identified 28 relevant papers describing the performance of vision enhancement techniques on mobility and associated visual tasks. Simplifying visual scenes improved obstacle detection and object recognition but decreased walking speed. Minification techniques increased the size of the visual field by 3 to 5 times and improved visual search performance. However, the impact of minification on mobility has not been studied extensively. Clinical trials with commercially available devices recorded poor results relative to conventional aids. Conclusions: The effects of current vision enhancements using HMDs are mixed. They appear to reduce mobility efficiency but improve obstacle detection and object recognition. The review highlights the lack of controlled studies with robust study designs. To support the evidence base, well-designed trials with larger sample sizes that represent different types of impairments and real-life scenarios are required. Future work should focus on identifying the needs of people with different types of vision impairment and providing targeted enhancements. Translational Relevance: This literature review examines the evidence regarding the ability of HMD technology to improve mobility in people with sight loss.

    The Eye: A Light Weight Mobile Application for Visually Challenged People Using Improved YOLOv5l Algorithm

    The eye is an essential sensory organ that allows us to perceive our surroundings at a glance. Losing this sense can result in numerous challenges in daily life. However, society is designed for the majority, which can create even more difficulties for visually impaired individuals. Therefore, empowering them and promoting self-reliance are crucial. To address this need, we propose a new Android application called “The Eye” that utilizes Machine Learning (ML)-based object detection techniques to recognize objects in real time using a smartphone camera or a camera attached to a stick. The article proposes an improved YOLOv5l algorithm to improve object detection in visual applications. YOLOv5l has a larger model size and captures more complex features and details, leading to enhanced object detection accuracy compared to smaller variants like YOLOv5s and YOLOv5m. The primary enhancement in the improved YOLOv5l algorithm is the integration of L1 and L2 regularization techniques. These techniques prevent overfitting and improve generalization by adding a regularization term to the loss function during training. Our approach combines image processing and text-to-speech conversion modules to produce reliable results. The Android text-to-speech module is then used to convert the object recognition results into an audio output. According to the experimental results, the improved YOLOv5l has higher detection accuracy than the original YOLOv5 and can detect small, multiple, and overlapped targets with higher accuracy. This study contributes to the advancement of technology to help visually impaired individuals become more self-sufficient and confident. DOI: 10.28991/ESJ-2023-07-05-011
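The abstract above says that L1 and L2 regularization add a penalty term to the loss function during training. As a rough illustration of that general idea only, the following Python sketch adds both penalties to a plain scalar loss. It is not the authors' YOLOv5l implementation: the function name `regularized_loss`, the flat list of weights, and the coefficient values are all assumptions made for this example.

```python
def regularized_loss(data_loss, weights, l1_coeff=1e-5, l2_coeff=1e-4):
    """Return data_loss plus L1 and L2 penalties on the weights.

    Hypothetical helper for illustration; the coefficients are assumed
    values, not ones reported in the paper.
    """
    # L1 penalty: sum of absolute weight values (encourages sparsity).
    l1_term = sum(abs(w) for w in weights)
    # L2 penalty: sum of squared weight values (discourages large weights).
    l2_term = sum(w * w for w in weights)
    return data_loss + l1_coeff * l1_term + l2_coeff * l2_term


if __name__ == "__main__":
    # Toy example: a detection loss of 1.0 and two model weights.
    total = regularized_loss(1.0, [1.0, -2.0], l1_coeff=0.1, l2_coeff=0.01)
    print(f"regularized loss: {total:.4f}")
```

In a real training loop the penalties would be computed over the model's parameter tensors, and the L2 part is often handled by the optimizer's weight-decay setting instead of being added to the loss by hand.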

    Development of a text reading system on video images

    Since the early days of computer science, researchers have sought to devise a machine that could automatically read text to help people with visual impairments. The problem of extracting and recognising text in document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective-aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time.
It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve on the current state of the art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground truth of text regions.
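The abstract above describes tracking a homography that maps each text region into a fronto-parallel view. As a rough illustration of what applying a 3x3 homography to image points involves, the following Python sketch may help. The function name `apply_homography` and the example matrix are hypothetical and not taken from the work itself; neither the UKF tracking nor the estimation of the rectifying homography is shown.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography H (a list of three rows).

    Each point (x, y) is lifted to homogeneous coordinates (x, y, 1),
    multiplied by H, and divided by the resulting third coordinate.
    """
    mapped = []
    for x, y in points:
        xh = H[0][0] * x + H[0][1] * y + H[0][2]
        yh = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if w == 0:
            raise ValueError("point maps to infinity under this homography")
        mapped.append((xh / w, yh / w))
    return mapped


if __name__ == "__main__":
    # Assumed example matrix with a small perspective term in the last row.
    H = [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.001, 0.0, 1.0],
    ]
    # Corners of a hypothetical text bounding box in image coordinates.
    corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
    for src, dst in zip(corners, apply_homography(H, corners)):
        print(src, "->", dst)
```

A rectifying homography of the kind the abstract mentions would be chosen so that the mapped corners of a text line form an axis-aligned rectangle; this sketch only shows the point mapping once such a matrix is known.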

    Iterative Design and Prototyping of Computer Vision Mediated Remote Sighted Assistance

    Remote sighted assistance (RSA) is an emerging navigational aid for people with visual impairments (PVI). Using scenario-based design to illustrate our ideas, we developed a prototype showcasing potential applications for computer vision to support RSA interactions. We reviewed the prototype, which demonstrates real-world navigation scenarios, with an RSA expert, and then iteratively refined the prototype based on feedback. We reviewed the refined prototype with 12 RSA professionals to evaluate the desirability and feasibility of the prototyped computer vision concepts. The RSA expert and professionals were engaged by, and reacted insightfully and constructively to, the proposed design ideas. We discuss what we learned about key resources, goals, and challenges of the RSA prosthetic practice through our iterative prototype review, as well as implications for the design of RSA systems and the integration of computer vision technologies into RSA.