
    Grounding semantics in robots for Visual Question Answering

    In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.

    Combining depth and intensity images to produce enhanced object detection for use in a robotic colony

    Robotic colonies that can communicate with each other and interact with their ambient environments can be utilized for a wide range of research and industrial applications. However, among the problems that these colonies face is that of isolating objects within an environment. Robotic colonies that can isolate objects within the environment can not only map that environment in detail, but also interact with that ambient space. Many object recognition techniques exist; however, these are often complex and computationally expensive, leading to overly complex implementations. In this paper a simple model is proposed to isolate objects, which can then be recognized and tagged. The model uses 2D and 3D perspectives of the perceptual data to produce a probability map of the outline of an object, thereby addressing the defects that exist with 2D and 3D image techniques. Some of the defects addressed are low-level illumination and objects at similar depths. These issues may not be completely solved; however, the model will provide results confident enough for use in a robotic colony.

    Indoor wireless communications and applications

    Chapter 3 addresses challenges in radio link and system design in indoor scenarios. Given that most human activities take place in indoor environments, the need to support ubiquitous indoor data connectivity and location/tracking services has become even more important than in previous decades. The specific technical challenges addressed in this chapter are (i) modelling complex indoor radio channels for effective antenna deployment, (ii) the potential of millimeter-wave (mm-wave) radios for supporting higher data rates, and (iii) feasible indoor localisation and tracking techniques, which are summarised in three dedicated sections of this chapter.

    Role Playing Learning for Socially Concomitant Mobile Robot Navigation

    In this paper, we present the Role Playing Learning (RPL) scheme for a mobile robot to navigate socially with its human companion in populated environments. Neural networks (NN) are constructed to parameterize a stochastic policy that directly maps sensory data collected by the robot to its velocity outputs, while respecting a set of social norms. An efficient simulative learning environment is built with maps and pedestrian trajectories collected from a number of real-world crowd data sets. In each learning iteration, a robot equipped with the NN policy is created virtually in the learning environment to play the role of an accompanied pedestrian and navigate towards a goal in a socially concomitant manner. We therefore call this process Role Playing Learning, which is formulated under a reinforcement learning (RL) framework. The NN policy is optimized end-to-end using Trust Region Policy Optimization (TRPO), taking into account the imperfections in the robot's sensor measurements. Simulative and experimental results are provided to demonstrate the efficacy and superiority of our method.

    Cognitive visual tracking and camera control

    Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene, and to generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, serves to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.