
    InLoc: Indoor Visual Localization with Dense Matching and View Synthesis

    We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map. The contributions of this work are three-fold. First, we develop a new large-scale visual localization method targeted for indoor environments. The method proceeds along three steps: (i) efficient retrieval of candidate poses that ensures scalability to large-scale environments, (ii) pose estimation using dense matching rather than local features to deal with textureless indoor scenes, and (iii) pose verification by virtual view synthesis to cope with significant changes in viewpoint, scene layout, and occluders. Second, we collect a new dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario. Third, we demonstrate that our method significantly outperforms current state-of-the-art indoor localization approaches on this new challenging data

    Qualitative Distances and Qualitative Description of Images for Indoor Scene Description and Recognition in Robotics

    The automatic extraction of knowledge from the world by a robotic system, in the way that human beings interpret their environment through their senses, is still an unsolved task in Artificial Intelligence. A robotic agent is in contact with the world through its sensors and other electronic components, which obtain and process mainly numerical information. Sonar, infrared and laser sensors obtain distance information. Webcams obtain digital images that are represented internally as matrices of red, green and blue (RGB) colour coordinate values. All these numerical values obtained from the environment need later interpretation in order to provide the knowledge the robotic agent requires to carry out a task. Similarly, light wavelengths with specific amplitudes are captured by the cone cells of human eyes, which also yields stimuli without meaning. However, the information that human beings can describe and remember from what they see is expressed using words, that is, qualitatively. The research work done in this thesis tries to narrow the gap between the acquisition of low-level information by robot sensors and the need to obtain high-level or qualitative information for enhancing human-machine communication and for applying logical reasoning processes based on concepts. Moreover, qualitative concepts can be given meaning by relating them to others. They can be used for reasoning by applying qualitative models that have been developed in the last twenty years for describing and interpreting metrical and mathematical concepts such as orientation, distance, velocity, acceleration, and so on. They can also be understood by human users, both when written and when read aloud. 
The first contribution presented is the definition of a method for obtaining fuzzy distance patterns (which include qualitative distances such as near, far, very far and so on) from the data obtained by any kind of distance sensor incorporated in a mobile robot, together with the definition of a factor to measure the dissimilarity between those fuzzy patterns. Both have been applied to the integration of the distances obtained by the sonar and laser distance sensors incorporated in a Pioneer 2 dx mobile robot and, as a result, special obstacles such as glass windows, mirrors, and so on have been detected. Moreover, the fuzzy distance patterns provided have also been defuzzified in order to obtain a smooth robot speed, and used to classify orientation reference systems as open (defining an open space to be explored) or closed. The second contribution presented is the definition of a model for qualitative image description (QID) based on qualitative models of shape, colour, topology and orientation. This model can qualitatively describe any kind of digital image and is independent of the image segmentation method used. The QID model has been tested in two scenarios in robotics: (i) the description of digital images captured by the camera of a Pioneer 2 dx mobile robot and (ii) the description of digital images of tile mosaics taken by an industrial camera located on a platform used by a robot arm to assemble tile mosaics. In order to provide a formal and explicit meaning to the qualitative description of the images generated, a Description Logic (DL) based ontology has been designed and presented as the third contribution. Our approach can automatically process any random image and obtain a set of DL-axioms that describe it visually and spatially. Objects included in the images are classified according to the ontology schema using a DL reasoner. Tests have been carried out using digital images captured by a webcam incorporated in a Pioneer 2 dx mobile robot. 
The images taken correspond to the corridors of a building at University Jaume I, and the objects within them have been classified into walls, floor, office doors and fire extinguishers under different illumination conditions and from different observer viewpoints. The final contribution is the definition of similarity measures between qualitative descriptions of shape, colour, topology and orientation, and the integration of those measures into the definition of a general similarity measure between two qualitative descriptions of images. These similarity measures have been applied to: (i) extract objects with similar shapes from the MPEG7 CE Shape-1 library; (ii) assemble tile mosaics by qualitative shape and colour similarity matching; (iii) compare images of tile compositions; and (iv) compare images of natural landmarks in a mobile robot world for their recognition.
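
The fuzzy distance patterns and the dissimilarity factor mentioned in this abstract can be illustrated with a minimal sketch. The thesis's actual membership functions and dissimilarity formula are not given here, so everything below is an assumption: the trapezoidal shapes, the breakpoints in metres, the function names, and the choice of a mean absolute difference as the dissimilarity.

```python
from typing import Dict, List, Tuple

# Hypothetical trapezoidal fuzzy sets over distance in metres: (a, b, c, d).
# The labels come from the abstract; the breakpoints are illustrative assumptions.
FUZZY_SETS: Dict[str, Tuple[float, float, float, float]] = {
    "near": (0.0, 0.0, 0.5, 1.0),
    "far": (0.5, 1.0, 2.0, 3.0),
    "very far": (2.0, 3.0, 10.0, 10.0),
}


def membership(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 1 on [b, c], linear ramps on [a, b] and [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify(distance: float) -> Dict[str, float]:
    """Map one sensor reading to a membership degree per qualitative label."""
    return {label: membership(distance, *abcd) for label, abcd in FUZZY_SETS.items()}


def fuzzy_pattern(readings: List[float]) -> List[Dict[str, float]]:
    """A fuzzy distance pattern: one fuzzified entry per sensor reading."""
    return [fuzzify(r) for r in readings]


def dissimilarity(p: List[Dict[str, float]], q: List[Dict[str, float]]) -> float:
    """Assumed dissimilarity: mean absolute difference of membership degrees."""
    if not p or len(p) != len(q):
        raise ValueError("patterns must be non-empty and of equal length")
    total = sum(abs(fp[label] - fq[label]) for fp, fq in zip(p, q) for label in FUZZY_SETS)
    return total / (len(p) * len(FUZZY_SETS))
```

Under these assumptions, comparing the pattern built from sonar readings with the one built from laser readings in the same directions gives a value near zero when the sensors agree. A large value would flag a direction where they disagree, which is one plausible way a glass window or mirror could show up.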

    Describing images using qualitative models and description logics

    Special Issue: Qualitative spatial and temporal reasoning: emerging applications, trends, and directions. Our approach describes any digital image qualitatively by detecting regions/objects inside it and describing their visual characteristics (shape and colour) and their spatial characteristics (orientation and topology) by means of qualitative models. The description obtained is translated into a description logic (DL) based ontology, which gives a formal and explicit meaning to the qualitative tags representing the visual features of the objects in the image and the spatial relations between them. For any image, our approach obtains a set of individuals that are classified using a DL reasoner according to the descriptions of our ontology.

    Embodied Question Answering

    We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligently navigate to explore the environment, gather information through first-person (egocentric) vision, and then answer the question ("orange"). This challenging task requires a range of AI skills -- active perception, language understanding, goal-driven navigation, commonsense reasoning, and grounding of language into actions. In this work, we develop the environments, end-to-end-trained reinforcement learning agents, and evaluation protocols for EmbodiedQA. Comment: 20 pages, 13 figures. Webpage: https://embodiedqa.org

    The robot's vista space : a computational 3D scene analysis

    Swadzba A. The robot's vista space : a computational 3D scene analysis. Bielefeld (Germany): Bielefeld University; 2011. The space that can be explored quickly from a fixed viewpoint without locomotion is known as the vista space. In indoor environments, single rooms and room parts follow this definition. The vista space plays an important role in situations with agent-agent interaction, as it is the directly surrounding environment in which the interaction takes place. A collaborative interaction of the partners in and with the environment requires that both partners know where they are, what spatial structures they are talking about, and what scene elements they are going to manipulate. This thesis focuses on the analysis of a robot's vista space. Mechanisms for extracting relevant spatial information are developed which enable the robot to recognize which place it is in, to detect the scene elements the human partner is talking about, and to segment scene structures the human is changing. These abilities are addressed by the proposed holistic, aligned, and articulated modeling approach. For a smooth human-robot interaction, the computed models should be aligned to the partner's representations. Therefore, the design of the computational models is based on the combination of psychological results from studies on human scene perception with basic physical properties of the perceived scene and the perception itself. The holistic modeling realizes a categorization of room percepts based on the observed 3D spatial layout. Room layouts have room-type-specific features, and fMRI studies have shown that some of the human brain areas active in scene recognition are sensitive to the 3D geometry of a room. With the aligned modeling, the robot is able to extract the hierarchical scene representation underlying a scene description given by a human tutor. Furthermore, it is able to ground the inferred scene elements in its own visual perception of the scene. 
This modeling follows the assumption that cognition and language schematize the world in the same way. This is visible in the fact that a scene depiction mainly consists of relations between an object and its supporting structure or between objects located on the same supporting structure. Last, the articulated modeling equips the robot with a methodology for articulated scene part extraction and fast background learning under the short and disturbed observation conditions typical of human-robot interaction scenarios. Articulated scene parts are detected without a model by observing scene changes caused by their manipulation. Change detection and background learning are closely coupled because change is defined phenomenologically as variation of structure. This means that change detection involves a comparison of currently visible structures with a representation in memory. In range sensing, this comparison can be implemented neatly as a subtraction of these two representations. The three modeling approaches enable the robot to enrich its visual perceptions of the surrounding environment, the vista space, with semantic information about meaningful spatial structures useful for further interaction with the environment and the human partner.
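
The idea of detecting change by subtracting a remembered range representation from the current one can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's method: depth images are plain nested lists of metres, and the threshold, the learning rate `alpha`, the running-average background update and both function names are hypothetical.

```python
from typing import List

DepthImage = List[List[float]]


def detect_changes(background: DepthImage, current: DepthImage,
                   threshold: float = 0.05) -> List[List[bool]]:
    """Mark pixels whose depth differs from the background model by more than threshold."""
    return [
        [abs(cur - bg) > threshold for bg, cur in zip(bg_row, cur_row)]
        for bg_row, cur_row in zip(background, current)
    ]


def update_background(background: DepthImage, current: DepthImage,
                      changed: List[List[bool]], alpha: float = 0.5) -> DepthImage:
    """Assumed background learning: blend in the current depth only at unchanged pixels."""
    return [
        [bg if ch else (1.0 - alpha) * bg + alpha * cur
         for bg, cur, ch in zip(bg_row, cur_row, ch_row)]
        for bg_row, cur_row, ch_row in zip(background, current, changed)
    ]
```

Keeping the changed pixels out of the update reflects the coupling the abstract describes: the memory representation is only refined where the scene is judged stable, so a manipulated object does not get absorbed into the background.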

    3D Reconstruction of Indoor Corridor Models Using Single Imagery and Video Sequences

    In recent years, 3D indoor modeling has gained more attention due to its role in the decision-making process of maintaining the status and managing the security of building indoor spaces. In this thesis, the problem of continuous indoor corridor space modeling has been tackled through two approaches. The first approach develops a modeling method based on middle-level perceptual organization. The second approach develops a visual Simultaneous Localisation and Mapping (SLAM) system with model-based loop closure. In the first approach, the image space was searched for a corridor layout that can be converted into a geometrically accurate 3D model. The Manhattan rule assumption was adopted, and indoor corridor layout hypotheses were generated through a random rule-based intersection of physical line segments in the image and virtual rays of orthogonal vanishing points. Volumetric reasoning, correspondences to physical edges, the orientation map and the geometric context of an image are all considered for scoring layout hypotheses. This approach provides physically plausible solutions when facing objects or occlusions in a corridor scene. In the second approach, Layout SLAM is introduced. Layout SLAM performs camera localization while mapping layout corners and normal point features in 3D space. Here, a new feature matching cost function was proposed that considers both local and global context information. In addition, a rotation compensation variable makes Layout SLAM robust against the accumulation of camera orientation errors. Moreover, layout model matching of keyframes ensures accurate loop closures that prevent misassociation of newly visited landmarks with previously visited scene parts. The comparison of generated single image-based 3D models to ground truth models showed that average ratio differences in widths, heights and lengths were 1.8%, 3.7% and 19.2% respectively. 
Moreover, Layout SLAM achieved a maximum absolute trajectory error of 2.4 m in position and 8.2 degrees in orientation over a path of approximately 318 m on the RAWSEEDS data set. Loop closing performed strongly for Layout SLAM and provided 3D indoor corridor layouts with displacement errors of less than 1.05 m in length and less than 20 cm in width and height over a path of approximately 315 m on the York University data set. The proposed methods can successfully generate 3D indoor corridor models compared to their major counterparts.
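
The trajectory figures quoted above are maximum errors over a path. As a rough illustration of how such numbers can be computed, the sketch below takes estimated and ground-truth poses that are already time-matched and expressed in the same frame. That is an assumption: trajectory evaluations commonly align the two trajectories first, and the thesis's exact protocol is not described here. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def max_position_error(estimated: Sequence[Point], ground_truth: Sequence[Point]) -> float:
    """Largest Euclidean distance between matched estimated and true positions (metres)."""
    return max(math.dist(e, g) for e, g in zip(estimated, ground_truth))


def max_heading_error_deg(estimated: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Largest absolute heading difference in degrees, wrapped into [-180, 180)."""
    return max(
        abs((e - g + 180.0) % 360.0 - 180.0)
        for e, g in zip(estimated, ground_truth)
    )
```

The wrap in `max_heading_error_deg` matters near the 0/360 boundary: headings of 350 and 10 degrees differ by 20 degrees, not 340.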

    CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation

    Beyond novel view synthesis, Neural Radiance Fields are useful for applications that interact with the real world. In this paper, we use them as an implicit map of a given scene and propose a camera relocalization algorithm tailored for this representation. The proposed method makes it possible to compute, in real time, the precise position of a device during its navigation using a single RGB camera. In contrast with previous work, we do not rely on pose regression or photometric alignment but rather use dense local features obtained through volumetric rendering, which are specialized on the scene with a self-supervised objective. As a result, our algorithm is more accurate than competitors, is able to operate in dynamic outdoor environments with changing lighting conditions, and can be readily integrated in any volumetric neural renderer. Comment: Accepted to ICCV 202

    Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language

    Falomir Z, Kluth T. Qualitative spatial logic descriptors from 3D indoor scenes to generate explanations in natural language. Cognitive Processing. 2018;19(2):265-284. The challenge of describing 3D real scenes is tackled in this paper using qualitative spatial descriptors. A key point to study is which qualitative descriptors to use and how these qualitative descriptors must be organized to produce a suitable cognitive explanation. In order to find answers, a survey test was carried out with human participants, who openly described a scene containing some pieces of furniture. The data obtained in this survey are analysed and, taking this into account, the QSn3D computational approach was developed, which uses an Xbox 360 Kinect to obtain 3D data from a real indoor scene. Object features are computed on these 3D data to identify objects in indoor scenes. The object orientation is computed, and qualitative spatial relations between the objects are extracted. These qualitative spatial relations are the input to a grammar which applies saliency rules obtained from the survey study and generates cognitive natural language descriptions of scenes. Moreover, these qualitative descriptors can be expressed as first-order logical facts in Prolog for further reasoning. Finally, a validation study is carried out to test whether the descriptions provided by the QSn3D approach are human readable. The obtained results show that their acceptability is higher than 82%.
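
To make the step from 3D data to Prolog facts concrete, here is a minimal sketch of deriving qualitative left/right relations from object centroids and printing them as Prolog-style facts. It is not the QSn3D implementation. The predicate names `left_of` and `right_of`, the tolerance, the sensor-frame convention (x grows to the right) and the function name are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Centroid = Tuple[float, float, float]  # (x, y, z) in metres, assumed sensor frame


def spatial_facts(objects: Dict[str, Centroid], tol: float = 0.1) -> List[str]:
    """Emit Prolog-style facts for pairwise left/right relations along the x axis."""
    facts = []
    names = sorted(objects)  # fixed order so the output is deterministic
    for a in names:
        for b in names:
            if a == b:
                continue
            dx = objects[a][0] - objects[b][0]
            if dx < -tol:
                facts.append(f"left_of({a}, {b}).")
            elif dx > tol:
                facts.append(f"right_of({a}, {b}).")
            # within tol: no left/right fact is asserted for this pair
    return facts


if __name__ == "__main__":
    scene = {"chair": (-1.0, 0.0, 2.0), "table": (0.5, 0.0, 2.0)}
    for fact in spatial_facts(scene):
        print(fact)
```

Facts in this form could be loaded into a Prolog interpreter for further reasoning, for example by adding a rule that treats `right_of` as the inverse of `left_of`.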