
    VISION-BASED URBAN NAVIGATION PROCEDURES FOR VERBALLY INSTRUCTED ROBOTS

    The work presented in this thesis is part of a project in instruction-based learning (IBL) for mobile robots, where a robot is designed that can be instructed by its users through unconstrained natural language. The robot uses vision guidance to follow route instructions in a miniature town model. The aim of the work presented here was to determine the functional vocabulary of the robot in the form of "primitive procedures". In contrast to previous work in the field of instructable robots, this was done following a "user-centred" approach, where the main concern was to create primitive procedures that can be directly associated with natural language instructions. To achieve this, a corpus of human-to-human natural language instructions was collected and analysed. A set of primitive actions was found with which the collected corpus could be represented. These primitive actions were then implemented as robot-executable procedures. Natural language instructions are under-specified when destined to be executed by a robot, because instructors omit information that they consider "commonsense" and rely on the listener's sensory-motor capabilities to determine the details of the task execution. In this thesis the under-specification problem is solved by determining the missing information, either during the learning of new routes or during their execution by the robot. During learning, the missing information is determined by imitating the commonsense approach human listeners take to achieve the same purpose. During execution, missing information, such as the location of road layout features mentioned in route instructions, is determined from the robot's view by using image template matching. The original contribution of this thesis, in both these methods, lies in the fact that they are driven by the natural language examples found in the corpus collected for the IBL project. During the testing phase, a high success rate of primitive calls, when these were considered individually, showed that the under-specification problem has, overall, been solved. A novel method for testing the primitive procedures, as part of complete route descriptions, is also proposed in this thesis. This was done by comparing the performance of human subjects when driving the robot, following route descriptions, with the performance of the robot when executing the same route descriptions. The results obtained from this comparison clearly indicated where errors occur from the time when a human speaker gives a route description to the time when the task is executed by a human listener or by the robot. Finally, a software speed controller is proposed in this thesis to control the wheel speeds of the robot used in this project. The controller employs PI (Proportional and Integral) and PID (Proportional, Integral and Differential) control and provides a good alternative to expensive hardware controllers.
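
    A minimal sketch of the kind of software speed controller described above, in Python; the gains, sampling interval, and update interface here are illustrative assumptions, not the thesis's actual design. Setting kd to zero gives the PI variant.

        class PIDController:
            """Discrete-time PID controller; use kd=0.0 for PI control."""

            def __init__(self, kp, ki, kd, dt):
                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                self.integral = 0.0
                self.prev_error = 0.0

            def update(self, setpoint, measured):
                # Error between desired and measured wheel speed.
                error = setpoint - measured
                self.integral += error * self.dt
                derivative = (error - self.prev_error) / self.dt
                self.prev_error = error
                # Control output, e.g. a motor drive correction.
                return (self.kp * error
                        + self.ki * self.integral
                        + self.kd * derivative)

        # Example: PI control (kd = 0) of one wheel, updated at 20 Hz.
        controller = PIDController(kp=0.8, ki=0.2, kd=0.0, dt=0.05)
        command = controller.update(setpoint=1.0, measured=0.85)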

    Detection of immovable objects on visually impaired people walking aids

    One consequence of visual impairment (blindness) is a reduced ability in activities related to orientation and mobility. A blind person uses a stick as a tool to identify the objects that surround him/her. The objective of this research is to develop a tool for blind people that is able to recognize what object is in front of them while they are walking. An attached camera obtains an image of an object, which is then processed using the template matching method to identify and trace the object; the result is then compared with the training data. The output is produced in the form of a sound that corresponds to the object. The result of this research is that the best slope and distance for the template matching method to properly detect immovable objects are 90 degrees and 2 meters.
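
    As a rough illustration of the template matching step described above, the OpenCV-based sketch below matches a stored object template against a camera frame; the file names and the acceptance threshold are assumptions for illustration, and the paper's full pipeline (training-data comparison, sound output) is not reproduced here.

        import cv2

        # Load a camera frame and a stored object template (hypothetical file names).
        frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
        template = cv2.imread("object_template.png", cv2.IMREAD_GRAYSCALE)

        # Slide the template over the frame and score every position.
        result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)

        # Accept the match only above an assumed confidence threshold.
        if max_val > 0.8:
            h, w = template.shape
            print(f"Object found at {max_loc}, size {w}x{h}, score {max_val:.2f}")
        else:
            print("No object detected")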

    Rancang Bangun Sistem Security Door Lock Berbasis PI Camera dan Sensor Sidik Jari (Design and Implementation of a Security Door Lock System Based on a Pi Camera and a Fingerprint Sensor)

    In some locations, security systems are a priority. One such location is the laboratory, where many valuable research tools and pieces of equipment are kept. To improve the security system, laboratory access is equipped with a digital security system. In this paper, we created a multi-feature security door lock, combining a fingerprint recognition system with an image matching method using cameras. An ATMega2560 microcontroller is used as the main controller. Image processing is powered by a Raspberry Pi and a Pi camera device. The security door lock system is also equipped with a data logger. By using this system, the laboratory can be accessed easily, simply and safely. Keywords: image processing, microcontroller, Raspberry Pi.
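
    The multi-feature gate logic described above might look roughly like the following Python sketch; read_fingerprint and match_image are hypothetical stand-ins for the fingerprint sensor check and the Pi-camera matching step, and the unlock print stands in for the signal to the ATMega2560.

        import time

        access_log = []  # data logger: one entry per access attempt

        def read_fingerprint():
            """Hypothetical stand-in for the fingerprint sensor check."""
            return True

        def match_image():
            """Hypothetical stand-in for Pi-camera capture plus image matching."""
            return True

        def try_unlock():
            fingerprint_ok = read_fingerprint()
            image_ok = match_image()
            granted = fingerprint_ok and image_ok
            # Log every attempt with a timestamp and both check results.
            access_log.append((time.time(), fingerprint_ok, image_ok, granted))
            if granted:
                print("unlock")  # in the real system, signal the door controller
            return granted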

    Object Detection Based on Template Matching through Use of Best-So-Far ABC

    Best-so-far ABC is a modified version of the artificial bee colony (ABC) algorithm used for optimization tasks. This algorithm is one of the swarm intelligence (SI) algorithms proposed in recent literature, in which the results demonstrated that best-so-far ABC can produce higher-quality solutions with faster convergence than either the ordinary ABC or the current state-of-the-art ABC-based algorithm. In this work, we apply the best-so-far ABC-based approach to object detection based on template matching, using the difference between the RGB level histograms of the target object and the template object as the objective function. Results confirm that the proposed method was successful both in detecting objects and in optimizing the time used to reach the solution.
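
    A minimal sketch of the objective function described above, assuming a NumPy HxWx3 image representation: the difference between the RGB histograms of the template and of a candidate window, which the best-so-far ABC would minimise over candidate positions (the ABC search loop itself is omitted, and the bin count is an assumption).

        import numpy as np

        def rgb_histogram(image, bins=32):
            """Concatenated per-channel histograms, normalised to sum to 1."""
            hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
                     for c in range(3)]
            h = np.concatenate(hists).astype(float)
            return h / h.sum()

        def objective(image, template, x, y):
            """Histogram difference between the template and the window at (x, y);
            a candidate solution score for the best-so-far ABC to minimise."""
            th, tw = template.shape[:2]
            window = image[y:y + th, x:x + tw]
            if window.shape[:2] != (th, tw):  # window falls outside the image
                return np.inf
            return np.abs(rgb_histogram(window) - rgb_histogram(template)).sum()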

    Spoken Language and Vision for Adaptive Human-Robot Cooperation


    Vision-Based Urban Navigation Procedures for Verbally Instructed Robots

    When humans explain a task to be executed by a robot, they decompose it into chunks of actions. These form a chain of search-and-act sensory-motor loops that exit when a condition is met. In this paper we investigate the nature of these chunks in an urban visual navigation context, and propose a method for implementing the corresponding robot primitives such as "take the n-th turn right/left". These primitives make use of a "short-lived" internal map updated as the robot moves along. The recognition and localisation of intersections is done in the map using task-guided template matching. This approach takes advantage of the content of human instructions to save computation time and improve robustness.
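
    A minimal sketch of such a primitive, under an assumed robot interface (move_forward_step, update_local_map, detect_intersection_in_map, and turn are hypothetical helpers, not the paper's actual API): drive and update the short-lived map until the n-th intersection on the named side is found, then turn.

        def take_nth_turn(n, direction, robot):
            """Drive until the n-th intersection on the given side, then turn.

            Task-guided template matching: because the instruction names the
            side, only templates for intersections opening to that side need
            to be matched against the short-lived map.
            """
            seen = 0
            while seen < n:
                robot.move_forward_step()
                robot.update_local_map()  # map built and updated while moving
                if robot.detect_intersection_in_map(direction):
                    seen += 1
            robot.turn(direction)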

    MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS

    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal unconstrained natural instructions given to robots in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn and interact with a human in a natural, unconstrained way. The corpus-centred design approach is formalised and developed in detail. It requires the developer to record a human during interaction and analyse the recordings to find instruction primitives, which are then implemented in a robot. The focus of this work has been on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that uses timing and semantics to group, match and unify gesture and language. The algorithm always achieves correct pairings on the corpus and initiates questions to the user in ambiguous cases or when information is missing. The domain of card games has been investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is on the translation of rule-based instructions; most multi-modal interfaces to date have only considered sequential instructions. A combination of frame-based reasoning, a knowledge base organised as an ontology, and a problem-solver engine is used to store these rules. The understanding of rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system for the agent implementation is also described, together with tests that confirm the implementation by playing back the corpus. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed. The tests showed that the rate of errors caused by sentences not being covered by the grammar does not decrease at an acceptable rate when new grammar is introduced; this was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways.
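
    The timing side of such an integration algorithm might be sketched as follows; the data representation and the max_gap threshold are illustrative assumptions, not the thesis's actual algorithm. Each deictic word in the speech stream is paired with the gesture whose time interval is closest to it, and unmatched words are flagged for a clarification question.

        def pair_speech_with_gestures(words, gestures, max_gap=1.5):
            """Pair each deictic word with the temporally closest gesture.

            words:    list of (word, time) tuples, e.g. ("this", 2.1)
            gestures: list of (gesture_id, start_time, end_time) tuples
            max_gap:  assumed maximum offset in seconds for a valid pairing
            """
            pairs = []
            for word, t in words:
                # Distance 0 if the word falls inside the gesture's interval,
                # otherwise the gap to the nearer end of the interval.
                def gap(g):
                    _, start, end = g
                    if start <= t <= end:
                        return 0.0
                    return min(abs(t - start), abs(t - end))
                best = min(gestures, key=gap, default=None)
                if best is not None and gap(best) <= max_gap:
                    pairs.append((word, best[0]))
                else:
                    pairs.append((word, None))  # ambiguous: ask the user
            return pairs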

    Development of Cognitive Capabilities in Humanoid Robots

    Building intelligent systems with a human level of competence is the ultimate grand challenge for science and technology in general, and especially for the computational intelligence community. Recent theories in autonomous cognitive systems have focused on the close integration (grounding) of communication with perception, categorisation and action. Cognitive systems are essential for integrated multi-platform systems that are capable of sensing and communicating. This thesis presents a cognitive system for a humanoid robot that integrates abilities such as object detection and recognition, merged with natural language understanding and refined motor controls. The work includes three studies: (1) the generic manipulation of objects using the NMFT algorithm, successfully testing the extension of NMFT to control robot behaviour; (2) the development of a robotic simulator; (3) robotic simulation experiments showing that a humanoid robot is able to acquire complex behavioural, cognitive and linguistic skills through individual and social learning. The robot is able to learn to handle and manipulate objects autonomously, to cooperate with human users, and to adapt its abilities to changes in internal and environmental conditions. The model and the experimental results reported in this thesis emphasise the importance of embodied cognition, i.e. the physical interaction between the humanoid robot's body and the environment.

    Visual Speech Recognition

    In recent years, visual speech recognition has received more attention from researchers than in the past. Because of the lack of visual processing work on the recognition of Arabic vocabulary, we started to research this field. Audio speech recognition is concerned with the acoustic characteristics of the signal, but there are many situations in which the audio signal is weak or does not exist; this is discussed in Chapter 2. The visual recognition process focuses on features extracted from video of the speaker, and these features are classified using several techniques. The most important feature to be extracted is motion. By segmenting the motion of the speaker's lips, an algorithm can manipulate it in such a way as to recognize the word being said. But motion segmentation is not the only problem facing the speech recognition process: segmenting the lips themselves is an early step, so before lip motion can be segmented the lips must be segmented first, and a new approach for lip segmentation is proposed in this thesis. Sometimes the motion feature needs another feature to support recognition of the spoken word, so another new algorithm is proposed in this thesis that performs motion segmentation using the Abstract Difference Image computed from an image series, supported by correlation for registering the images in the series, to recognize ten Arabic words: the words "one" to "ten". The algorithm also uses the Hu invariant set of features to describe the Abstract Difference Image, and uses three different recognition methods to recognize the words. The CLAHE method is used by our algorithm as a filtering technique to handle lighting problems. Our algorithm, based on extracting difference details from a series of images to recognize the word, achieved an overall result of 55.8%, which is an adequate result when the algorithm is integrated into an audio-visual system.
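
    A rough sketch of the feature pipeline described above, assuming OpenCV: CLAHE to reduce lighting variation, an accumulated absolute-difference image over the frame series standing in for the Abstract Difference Image, and Hu invariant moments as the feature vector. Correlation-based registration and the three recognition methods are omitted, and the CLAHE parameters are assumptions.

        import cv2
        import numpy as np

        def hu_features_from_frames(frames):
            """Hu-moment features of the difference image of a lip sequence.

            frames: list of equal-sized grayscale lip-region images, assumed
            already registered (e.g. by correlation-based alignment).
            """
            clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
            eq = [clahe.apply(f) for f in frames]  # reduce lighting variation
            # Accumulate absolute frame-to-frame change across the series.
            adi = np.zeros_like(eq[0], dtype=np.float64)
            for prev, curr in zip(eq, eq[1:]):
                adi += cv2.absdiff(curr, prev).astype(np.float64)
            adi = cv2.normalize(adi, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
            # Seven Hu invariant moments describe the motion pattern.
            return cv2.HuMoments(cv2.moments(adi)).flatten()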