
    Virtual sensors for human concepts—Building detection by an outdoor mobile robot

    In human–robot communication it is often important to relate robot sensor readings to concepts used by humans. We suggest the use of a virtual sensor (one or several physical sensors with a dedicated signal processing unit for the recognition of real world concepts) and a method with which the virtual sensor can learn from a set of generic features. The virtual sensor robustly establishes the link between sensor data and a particular human concept. In this work, we present a virtual sensor for building detection that uses vision and machine learning to classify the image content in a particular direction as representing buildings or non-buildings. The virtual sensor is trained on a diverse set of image data, using features extracted from grey level images. The features are based on edge orientation, the configurations of these edges, and on grey level clustering. To combine these features, the AdaBoost algorithm is applied. Our experiments with an outdoor mobile robot show that the method is able to separate buildings from nature with a high classification rate, and to extrapolate well to images collected under different conditions. Finally, the virtual sensor is applied on the mobile robot, combining its classifications of sub-images from a panoramic view with spatial information (in the form of location and orientation of the robot) in order to communicate the likely locations of buildings to a remote human operator.
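
    A minimal sketch of the classification step described above: an AdaBoost ensemble trained on generic grey-level features (an edge-orientation histogram weighted by edge strength, plus simple intensity statistics). The exact feature definitions and the use of scikit-learn are illustrative assumptions rather than the paper's implementation.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    def extract_features(grey_image, n_bins=8):
        """Edge-orientation histogram plus coarse grey-level statistics."""
        gy, gx = np.gradient(grey_image.astype(float))
        magnitude = np.hypot(gx, gy)
        orientation = np.arctan2(gy, gx)                      # range [-pi, pi]
        hist, _ = np.histogram(orientation, bins=n_bins,
                               range=(-np.pi, np.pi), weights=magnitude)
        hist = hist / (hist.sum() + 1e-9)                     # normalise
        return np.concatenate([hist, [grey_image.mean(), grey_image.std()]])

    def train_virtual_sensor(sub_images, labels):
        """labels: 1 = building, 0 = non-building, one per sub-image."""
        X = np.stack([extract_features(img) for img in sub_images])
        # AdaBoost over decision stumps combines the weak per-feature cues
        return AdaBoostClassifier(n_estimators=100).fit(X, labels)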

    Spatial language driven robot

    This dissertation investigates methods to enable a robot to interact with humans using spatial language. A prototype human-robot interaction system using spatial language, running on an autonomous robot, is proposed in the dissertation. The system includes two complementary parts. One is to control the robot with natural spatial language so that it finds a target object and fetches it. The other is to generate a natural spatial language description of a target object in the robot's working environment. The first task is called spatial language grounding and the second is called spatial language generation. Spatial language grounding and generation are both end-to-end processes, meaning the system determines its output solely from the human's natural language command during the interaction and the raw perception data collected from the environment. Furniture recognizers are designed so that the robot can perceive its environment during the tasks. A hierarchical system is designed to translate the human spatial language into a symbolic grounding model and then into robot actions. To reduce ambiguity in the interaction, a human demonstration system is designed to collect the spatial concepts of the human user for building robot behavior policies under different grounding models. A language generation system trained on a corpus of real human spatial language is proposed to automatically produce spatial descriptions of the location of a target object. All the modules in the system are evaluated in a physical environment and in a 3D robot simulator developed on ROS and GAZEBO. Includes bibliographical references.
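
    An illustrative sketch (not the dissertation's hierarchy) of the grounding idea: a spatial command such as "the cup left of the table" is resolved against detected objects via a symbolic relation, yielding a fetch target. The object detections, coordinate convention, and relation definition below are hypothetical placeholders.

    from dataclasses import dataclass

    @dataclass
    class Detection:
        label: str
        x: float   # metres, robot frame (x forward, y left)
        y: float

    def left_of(candidate, anchor):
        # "Left of" in the robot's frame: larger y than the anchor, similar depth
        return candidate.y > anchor.y and abs(candidate.x - anchor.x) < 1.0

    RELATIONS = {"left of": left_of}

    def ground_command(target, relation, anchor_label, detections):
        """Return a detection satisfying '<target> <relation> <anchor>'."""
        anchors = [d for d in detections if d.label == anchor_label]
        candidates = [d for d in detections if d.label == target]
        for anchor in anchors:
            for cand in candidates:
                if RELATIONS[relation](cand, anchor):
                    return cand
        return None

    scene = [Detection("cup", 2.0, 0.8), Detection("cup", 2.1, -0.5),
             Detection("table", 2.0, 0.0)]
    print(ground_command("cup", "left of", "table", scene))  # cup at (2.0, 0.8)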

    A fuzzy logic approach to localisation in wireless local area networks

    This thesis examines the use and value of fuzzy sets, fuzzy logic and fuzzy inference in wireless positioning systems and solutions. Various fuzzy-related techniques and methodologies are reviewed and investigated, including a comprehensive review of fuzzy-based positioning and localisation systems. The thesis is aimed at the development of a novel positioning technique which enhances well-known k-nearest-neighbour (kNN) and fingerprinting algorithms with received signal strength (RSS) measurements. A fuzzy inference system is put forward for the generation of weightings for selected nearest neighbours and the elimination of outliers. In this study, Monte Carlo simulations of the proposed multivariable fuzzy localisation (MVFL) system showed a significant improvement in the root mean square error (RMSE) of position estimation, compared with well-known localisation algorithms. The simulation outcomes were confirmed empirically in laboratory tests under various scenarios. The proposed technique uses the available indoor wireless local area network (WLAN) infrastructure and requires no additional hardware or modification to the network, nor any active user participation. The thesis aims to benefit practitioners and academic researchers working on positioning systems.
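
    A minimal sketch of the underlying idea: kNN fingerprinting on RSS vectors in which the usual uniform average is replaced by distance-based (fuzzy) membership weights. The Gaussian membership function and synthetic fingerprint map below are assumed stand-ins; the thesis derives its weights from a full fuzzy inference system.

    import numpy as np

    def estimate_position(rss, fingerprints, positions, k=4, sigma=5.0):
        """rss: measured RSS vector; fingerprints: (n_points, n_aps) map;
        positions: (n_points, 2) reference coordinates in metres."""
        d = np.linalg.norm(fingerprints - rss, axis=1)   # distance in signal space
        nearest = np.argsort(d)[:k]
        # Fuzzy membership: closer fingerprints get exponentially larger weight
        w = np.exp(-(d[nearest] / sigma) ** 2)
        w /= w.sum()
        return w @ positions[nearest]                    # weighted centroid

    rng = np.random.default_rng(0)
    fp = rng.uniform(-90, -40, size=(50, 6))     # 50 reference points, 6 access points
    pos = rng.uniform(0, 20, size=(50, 2))
    print(estimate_position(fp[3] + rng.normal(0, 2, 6), fp, pos))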

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval. Published or submitted for publication.

    BWIBots: A platform for bridging the gap between AI and human–robot interaction research

    Recent progress in both AI and robotics has enabled the development of general purpose robot platforms that are capable of executing a wide variety of complex, temporally extended service tasks in open environments. This article introduces a novel, custom-designed multi-robot platform for research on AI, robotics, and especially human–robot interaction for service robots. Called BWIBots, the robots were designed as a part of the Building-Wide Intelligence (BWI) project at the University of Texas at Austin. The article begins with a description of, and justification for, the hardware and software design decisions underlying the BWIBots, with the aim of informing the design of such platforms in the future. It then presents an overview of various research contributions that have enabled the BWIBots to better (a) execute action sequences to complete user requests, (b) efficiently ask questions to resolve user requests, (c) understand human commands given in natural language, and (d) understand human intention from afar. The article concludes with a look forward towards future research opportunities and applications enabled by the BWIBot platform.

    Mobile robot navigation using a vision based approach

    PhD Thesis. This study addresses the issue of vision based mobile robot navigation in a partially cluttered indoor environment using a mapless navigation strategy. The work focuses on two key problems, namely vision based obstacle avoidance and a vision based reactive navigation strategy. The estimation of optical flow plays a key role in vision based obstacle avoidance problems; however, the current view is that this technique is too sensitive to noise and distortion under real conditions. Accordingly, practical applications in real time robotics remain scarce. This dissertation presents a novel methodology for vision based obstacle avoidance, using a hybrid architecture. This integrates an appearance-based obstacle detection method into an optical flow architecture based upon a behavioural control strategy that includes a new arbitration module. This enhances the overall performance of conventional optical flow based navigation systems, enabling a robot to successfully move around without experiencing collisions. Behaviour based approaches have become the dominant methodologies for designing control strategies for robot navigation. Two different behaviour based navigation architectures have been proposed for the second problem, using monocular vision as the primary sensor together with a 2-D range finder. Both utilize an accelerated version of the Scale Invariant Feature Transform (SIFT) algorithm. The first architecture employs a qualitative-based control algorithm to steer the robot towards a goal whilst avoiding obstacles, whereas the second employs an intelligent control framework. This allows the components of soft computing to be integrated into the proposed SIFT-based navigation architecture, preserving the same set of behaviours and system structure as the previously defined architecture. The intelligent framework incorporates a novel distance estimation technique using the scale parameters obtained from the SIFT algorithm. The technique employs scale parameters and a corresponding zooming factor as inputs to train a neural network, which results in the determination of physical distance. Furthermore, a fuzzy controller is designed and integrated into this framework so as to estimate linear velocity, and a neural network based solution is adopted to estimate the steering direction of the robot. As a result, this intelligent approach allows the robot to successfully complete its task in a smooth and robust manner without experiencing collision. MS Robotics Studio software was used to simulate the systems, and a modified Pioneer 3-DX mobile robot was used for real-time implementation. Several realistic scenarios were developed and comprehensive experiments conducted to evaluate the performance of the proposed navigation systems. KEY WORDS: Mobile robot navigation using vision, Mapless navigation, Mobile robot architecture, Distance estimation, Vision for obstacle avoidance, Scale Invariant Feature Transforms, Intelligent framework.
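
    An illustrative sketch of the scale-based distance estimation idea: a small neural network regressor mapping a SIFT keypoint scale and a zooming factor to a physical distance. The synthetic training data and the use of scikit-learn's MLPRegressor are assumptions, not the thesis implementation.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    # Synthetic training set: distance roughly inversely proportional to keypoint scale
    scale = rng.uniform(2.0, 20.0, 500)
    zoom = rng.uniform(1.0, 3.0, 500)
    distance = 30.0 * zoom / scale + rng.normal(0, 0.1, 500)   # metres

    X = np.column_stack([scale, zoom])
    model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000,
                         random_state=0).fit(X, distance)

    # At run time, the keypoint scale reported by the (accelerated) SIFT matcher
    # plus the current zooming factor give a distance estimate for the landmark.
    print(model.predict([[8.0, 1.5]]))   # estimated distance in metres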

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at a very high resolution. However, the deployment of a large number of unattended sensor nodes in hostile environments, frequent changes of environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues of spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
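
    A toy sketch of the rough-set machinery behind this kind of uncertainty quantification: objects (sensor observations) are partitioned into equivalence classes by their discretised attributes, and the lower/upper approximations of a target pattern measure how precisely the attributes determine it. The decision table below is hypothetical.

    from collections import defaultdict

    # Each row: (discretised sensor attributes, decision), e.g. (temp, humidity) -> event
    table = [(("high", "low"), 1), (("high", "low"), 1), (("high", "low"), 0),
             (("low", "high"), 0), (("low", "high"), 0), (("high", "high"), 1)]

    # Partition observations into equivalence classes by attribute values
    classes = defaultdict(list)
    for i, (attrs, _) in enumerate(table):
        classes[attrs].append(i)

    target = {i for i, (_, d) in enumerate(table) if d == 1}   # observations with event = 1

    lower = set().union(*(set(c) for c in classes.values() if set(c) <= target))
    upper = set().union(*(set(c) for c in classes.values() if set(c) & target))

    # Accuracy of approximation: 1.0 means no uncertainty, smaller means rougher
    print(len(lower) / len(upper))   # 0.25 for this toy table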

    Tactile Sensing for Assistive Robotics


    Joint Perceptual Learning and Natural Language Acquisition for Autonomous Robots

    Understanding how children learn the components of their mother tongue and the meanings of each word has long fascinated linguists and cognitive scientists. Equally, robots face a similar challenge in understanding language and perception to allow for a natural and effortless human-robot interaction. Acquiring such knowledge is a challenging task, unless this knowledge is preprogrammed, which is no easy task either, nor does it solve the problem of language differences between individuals or of learning the meanings of new words. In this thesis, the problem of bootstrapping knowledge in language and vision for autonomous robots is addressed through novel techniques in grammar induction and word grounding to the perceptual world. The learning is achieved in a cognitively plausible loosely-supervised manner from raw linguistic and visual data. The visual data is collected using different robotic platforms deployed in real-world and simulated environments and equipped with different sensing modalities, while the linguistic data is collected using online crowdsourcing tools and volunteers. The presented framework does not rely on any particular robot or any specific sensors; rather it is flexible to what the modalities of the robot can support. The learning framework is divided into three processes. First, the perceptual raw data is clustered into a number of Gaussian components to learn the ‘visual concepts’. Second, frequent co-occurrence of words and visual concepts is used to learn the language grounding, and finally, the learned language grounding and visual concepts are used to induce probabilistic grammar rules to model the language structure. In this thesis, the visual concepts refer to: (i) people’s faces and the appearance of their garments; (ii) objects and their perceptual properties; (iii) pairwise spatial relations; (iv) the robot actions; and (v) human activities. The visual concepts are learned by first processing the raw visual data to find people and objects in the scene using state-of-the-art techniques in human pose estimation, object segmentation and tracking, and activity analysis. Once found, the concepts are learned incrementally using a combination of techniques: Incremental Gaussian Mixture Models and a Bayesian Information Criterion to learn simple visual concepts such as object colours and shapes; spatio-temporal graphs and topic models to learn more complex visual concepts, such as human activities and robot actions. Language grounding is enabled by seeking frequent co-occurrence between words and learned visual concepts. Finding the correct language grounding is formulated as an integer programming problem to find the best many-to-many matches between words and concepts. Grammar induction refers to the process of learning a formal grammar (usually as a collection of re-write rules or productions) from a set of observations. In this thesis, Probabilistic Context Free Grammar rules are generated to model the language by mapping natural language sentences to learned visual concepts, as opposed to traditional supervised grammar induction techniques where the learning is only made possible by using manually annotated training examples on large datasets. The learning framework attains its cognitive plausibility from a number of sources. First, the learning is achieved by providing the robot with pairs of raw linguistic and visual inputs in a “show-and-tell” procedure akin to how human children learn about their environment.
Second, no prior knowledge is assumed about the meaning of words or the structure of the language, except that there are different classes of words (corresponding to observable actions, spatial relations, and objects and their observable properties). Third, the knowledge in both language and vision is obtained in an incremental manner where the gained knowledge can evolve to adapt to new observations without the need to revisit previously seen observations. Fourth, the robot learns about the visual world first, then it learns how it maps to language, which aligns with the findings of cognitive studies on language acquisition in human infants that suggest children come to develop considerable cognitive understanding about their environment in the pre-linguistic period of their lives. It should be noted that this work does not claim to be modelling how humans learn about objects in their environments, but rather it is inspired by it. For validation, four different datasets are used which contain temporally aligned video clips of people or robots performing activities, and sentences describing these video clips. The video clips are collected using four robotic platforms: three robot arms in simple block-world scenarios and a mobile robot deployed in a challenging real-world office environment observing different people performing complex activities. The linguistic descriptions for these datasets are obtained using Amazon Mechanical Turk and volunteers. The analysis performed on these datasets suggests that the learning framework is suitable for learning from complex real-world scenarios. The experimental results show that the learning framework enables (i) acquiring correct visual concepts from visual data; (ii) learning the word grounding for each of the extracted visual concepts; (iii) inducing correct grammar rules to model the language structure; (iv) using the gained knowledge to understand previously unseen linguistic commands; and (v) using the gained knowledge to generate well-formed natural language descriptions of novel scenes.
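
    A minimal sketch of one step of the framework described above: choosing the number of Gaussian components for a visual concept (here, object colours) by the Bayesian Information Criterion. The batch GaussianMixture and the synthetic colour features are simplifying assumptions; the thesis uses an incremental GMM over real perceptual data.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Synthetic colour features (e.g. HSV-like triples) drawn from three underlying colours
    colours = np.vstack([rng.normal(m, 0.05, size=(100, 3))
                         for m in ([0.1, 0.8, 0.6], [0.5, 0.7, 0.5], [0.9, 0.6, 0.7])])

    best_k, best_bic, best_model = None, np.inf, None
    for k in range(1, 8):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(colours)
        bic = gmm.bic(colours)              # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, gmm

    print(best_k)   # expected to recover the three colour concepts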

    Hierarchical Modelling and Recognition of Activities of Daily Living

    Activity recognition is becoming an increasingly important task in artificial intelligence. Successful activity recognition systems must be able to model and recognise activities ranging from simple short activities spanning a few seconds to complex longer activities spanning minutes or hours. We define activities as a set of qualitatively interesting interactions between people, objects and the environment. Accurate activity recognition is a desirable task in many scenarios such as surveillance, smart environments, robotic vision, etc. In the domain of robotic vision specifically, there is now an increasing interest in autonomous robots that are able to operate without human intervention for long periods of time. The goal of this research is to build activity recognition approaches for such systems that are able to model and recognise simple short activities as well as complex longer activities arising from long-term autonomous operation of intelligent systems. The research makes the following key contributions: 1. We present a qualitative and quantitative representation to model simple activities as observed by autonomous systems. 2. We present a hierarchical framework to efficiently model complex activities that comprise many sub-activities at varying levels of granularity. Simple activities are modelled using a discriminative model where a combined feature space, consisting of qualitative and quantitative spatio-temporal features, is generated in order to encode various aspects of the activity. Qualitative features are computed using qualitative spatio-temporal relations between human subjects and objects in order to abstractly represent the simple activity. Unlike current state-of-the-art approaches, our approach uses significantly fewer assumptions and does not require any knowledge about object types, their affordances, or the constituent activities of an activity. The optimal and most discriminating features are then extracted, using an entropy-based feature selection process, to best represent the training data. A novel approach for building models of complex long-term activities is presented as well. The proposed approach builds a hierarchical activity model from mark-up of activities acquired from multiple annotators in a video corpus. Multiple human annotators identify activities at different levels of conceptual granularity. Our method automatically infers a ‘part-of’ hierarchical activity model from this data using semantic similarity of textual annotations and temporal consistency. We then consolidate hierarchical structures learned from different training videos into a generalised hierarchical model represented as an extended grammar describing the overall activity. We then describe an inference mechanism to interpret new instances of activities. Simple short activity classes are first recognised using our previously learned generalised model. Given a test video, simple activities are detected as a stream of temporally complex low-level actions. We then use the learned extended grammar to infer the higher-level activities as a hierarchy over the low-level action input stream. We make use of three publicly available datasets to validate our two approaches to modelling simple to complex activities. These datasets have been annotated by multiple annotators through crowd-sourcing and in-house annotations. They consist of daily activity videos such as ‘cleaning microwave’, ‘having lunch in a restaurant’, ‘working in an office’, etc.
The activities in these datasets have all been marked up at multiple levels of abstraction by multiple annotators; however, no information on the ‘part-of’ relationship between activities is provided. The complexity of the videos and their annotations allows us to demonstrate the effectiveness of the proposed methods.
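
    A minimal sketch of an entropy-based feature selection step over a combined qualitative/quantitative feature space: rank features by mutual information with the activity label and keep the most discriminating ones. The synthetic data and the use of scikit-learn's mutual_info_classif are assumptions, offered only to illustrate the kind of criterion involved.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(0)
    n_samples, n_features = 200, 20
    X = rng.normal(size=(n_samples, n_features))          # one feature vector per clip
    y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(int)         # labels depend on two features

    scores = mutual_info_classif(X, y, random_state=0)    # entropy-based relevance score
    top = np.argsort(scores)[::-1][:5]                    # keep the 5 most informative
    print(top)             # features 3 and 7 should rank near the top
    X_selected = X[:, top]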