271 research outputs found

    360º Indoors image processing for 3D model reconstruction

    In the modern age of computer technology we are pushing at the limits of our reality. One human aspiration accompanying these advances is to digitise the vast amount of information present in our world. An important source of such information is the 3-dimensional space in which we live, especially the indoor environments we frequently occupy, such as living spaces. With the proliferation of photographic devices, the development of cheap omnidirectional cameras has attracted considerable interest, so it is now easy to obtain spatial data of interior spaces in the form of equirectangular images. In this project we study the problem of 3D indoor model reconstruction from spherical images. We study it under perspective-based methods, since it is possible to convert from one representation to the other. We first formally specify the problem to be solved, survey several different specifications along with reconstruction methods for some of them, and choose one specification for our use case. Most of the methods require feature extraction and matching, followed by multi-view geometry estimation. We continue the study of these methods in the experimentation phase: we propose hypotheses relevant to the different steps, perform experiments, and draw our conclusions. We finish our work by implementing a very simple system solving this problem, making use of the ASIFT feature extractor, the FLANN kD-tree feature matcher, and OpenCV's essential matrix estimation algorithm.
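    The spherical-to-perspective conversion that the abstract relies on can be sketched with plain geometry: a pixel of a pinhole image defines a viewing ray, whose longitude and latitude index a pixel of the equirectangular panorama linearly. The function below is an illustrative sketch, not the thesis's implementation; the camera convention (looking down +z, x right, y down) and the function name are assumptions.

    ```python
    import math

    def persp_ray_to_equirect(x, y, fov_deg, w_p, h_p, w_e, h_e):
        """Map pixel (x, y) of a w_p x h_p perspective image with horizontal
        field of view fov_deg to the corresponding pixel of a w_e x h_e
        equirectangular panorama (camera looking down +z, x right, y down)."""
        # Focal length in pixels from the horizontal field of view
        f = (w_p / 2) / math.tan(math.radians(fov_deg) / 2)
        # Viewing ray through the pixel, in camera coordinates
        vx = x - w_p / 2
        vy = y - h_p / 2
        vz = f
        # Longitude (yaw) in [-pi, pi] and latitude (pitch) in [-pi/2, pi/2]
        lon = math.atan2(vx, vz)
        lat = math.atan2(vy, math.hypot(vx, vz))
        # Equirectangular images map longitude/latitude linearly to pixels
        u = (lon / math.pi + 1.0) * w_e / 2
        v = (lat / (math.pi / 2) + 1.0) * h_e / 2
        return u, v
    ```

    The inverse mapping (panorama pixel to perspective pixel) follows by running the same trigonometry backwards; iterating it over a virtual pinhole view is the standard way to cut perspective crops out of an equirectangular image before feeding them to perspective-based reconstruction.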

    Language Guided Localization and Navigation

    Embodied tasks that require active perception are key to improving language grounding models and creating holistic social agents. In this dissertation we explore four multi-modal embodied perception tasks, which require localization or navigation of an agent in an unknown temporal or 3D space with limited information about the environment. We first explore how an agent can be guided by language to navigate a temporal space using reinforcement learning, in a manner similar to navigating a 3D space. Next, we explore how to teach an agent to navigate using only self-supervised learning from passive data. In this task we remove the complexity of language and explore a topological-map and graph-network based strategy for navigation. We then present the Where Are You? (WAY) dataset, which contains over 6k dialogs of two humans performing a localization task. On top of this dataset we design three tasks which push the envelope of current visual language-grounding tasks by introducing a multi-agent setup in which agents are required to use active perception to communicate, navigate, and localize. We specifically focus on modeling one of these tasks, Localization from Embodied Dialog (LED). The LED task involves taking a natural language dialog between two agents -- an observer and a locator -- and predicting the location of the observer agent. We find that a topological graph map of the environments is a successful representation for modeling the complex relational structure of the dialog and observer locations. We validate our approach against several state-of-the-art multi-modal baselines and show that a multi-modal transformer with large-scale pre-training outperforms all other models. We additionally introduce a novel analysis pipeline on this model for the LED and the Vision Language Navigation (VLN) tasks to diagnose and reveal limitations and failure modes of these types of models.

    Learning cognitive maps: Finding useful structure in an uncertain world

    In this chapter we describe the central mechanisms that influence how people learn about large-scale space. We focus particularly on how these mechanisms enable people to cope effectively both with the uncertainty inherent in a constantly changing world and with the high information content of natural environments. The major lessons are that humans get by with a "less is more" approach to building structure, and that they are able to adapt quickly to environmental changes thanks to a range of general-purpose mechanisms. By looking at abstract principles, rather than concrete implementation details, we show that the study of human learning can provide valuable lessons for robotics. Finally, these issues are discussed in the context of an implementation on a mobile robot. © 2007 Springer-Verlag Berlin Heidelberg