3,560 research outputs found

    Holistic indoor scene understanding by context supported instance segmentation

    Get PDF
    Intelligent robots require advanced vision capabilities to perceive and interact with the real physical world. While computer vision has made great strides in recent years, its predominant paradigm still builds deep-learning networks or handcrafted features to perform semantic labeling and instance segmentation separately and independently. However, the two tasks should be synergistically unified in the recognition flow, since they are complementary in scene understanding. This dissertation presents the detection of instances at multiple levels of scene understanding: representations that enable intelligent systems not only to recognize what is seen (e.g., does that pixel represent a chair?), but also to predict contextual information about the complete 3D scene as a whole (e.g., how big is the chair? Is the chair placed next to a table?). More specifically, it presents a flow of understanding from local information to global fitness. First, we investigate the 3D geometric information of instances and present a new approach for generating tight cuboids for objects. Then, we take advantage of trained semantic labeling networks by using their intermediate layer outputs as per-category local detectors; instance hypotheses are generated to help traditional optimization methods reach higher instance segmentation accuracy. After that, to carry the local detection results toward holistic scene understanding, our method optimizes object instance segmentation by considering both spatial fitness and relational compatibility. The context information is implemented using graphical models that represent scene-level object placement in three ways: horizontal, vertical, and non-placement hanging relations. Finally, the context information is incorporated into a network structure: a deep learning-based re-inferencing framework is proposed to boost any pixel-level labeling output, using our local collaborative object presence (LoCOP) feature as global-to-local guidance. This dissertation demonstrates that uniting pixel-level detection and instance segmentation not only significantly improves overall performance for localized and individualized analysis, but also paves the way for holistic scene understanding.
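    To make the "spatial fitness plus relational compatibility" idea concrete, here is a minimal sketch, assuming toy categories, scores, relations, and weights that are purely illustrative and not the dissertation's code: candidate instance hypotheses are scored by their local detector fitness plus the compatibility of the horizontal/vertical/hanging relations among the selected instances, and the best-scoring subset is kept.

```python
from itertools import combinations

# Each hypothesis: (category, spatial fitness of the proposed cuboid/mask).
hypotheses = {
    "chair_0": ("chair", 0.82),
    "table_0": ("table", 0.91),
    "lamp_0":  ("lamp",  0.10),   # weak local evidence
}

# Assumed pairwise compatibility for the three placement relations the
# abstract names: horizontal support, vertical stacking, hanging.
compatibility = {
    ("chair", "table", "horizontal"): 0.9,
    ("lamp",  "table", "vertical"):   0.7,
}

# Candidate relations proposed from geometry (illustrative).
relations = [("chair_0", "table_0", "horizontal"),
             ("lamp_0",  "table_0", "vertical")]

def scene_score(selected, beta=0.5, gamma=0.5):
    """Unary spatial fitness + weighted relational compatibility,
    minus a per-instance penalty so weak hypotheses get rejected."""
    unary = sum(hypotheses[h][1] for h in selected)
    pairwise = sum(
        compatibility.get((hypotheses[a][0], hypotheses[b][0], rel), 0.0)
        for a, b, rel in relations if a in selected and b in selected
    )
    return unary + beta * pairwise - gamma * len(selected)

# Exhaustively pick the subset of hypotheses with the best joint score.
best = max(
    (s for r in range(1, len(hypotheses) + 1)
     for s in combinations(hypotheses, r)),
    key=scene_score,
)
print(best, round(scene_score(best), 2))   # ('chair_0', 'table_0') 1.18
```

    In a real system the exhaustive subset search would be replaced by the optimization machinery the dissertation describes; the sketch only shows how relational context can veto or reinforce locally detected instances.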

    Efficient Belief Propagation for Perception and Manipulation in Clutter

    Full text link
    Autonomous service robots are required to perform tasks in common human indoor environments. To achieve the goals associated with these tasks, the robot should continually perceive and reason about its environment, and plan to manipulate objects, which we term goal-directed manipulation. Perception remains the most challenging stage, as common indoor environments typically pose problems in recognizing objects under the occlusions and physical interactions inherent among them. Despite recent progress in the field of robot perception, accommodating perceptual uncertainty due to partial observations remains challenging and must be addressed to achieve the desired autonomy. In this dissertation, we address the problem of perception under uncertainty for robot manipulation in cluttered environments using generative inference methods. Specifically, we aim to enable robots to perceive partially observable environments by maintaining an approximate probability distribution, as a belief, over possible scene hypotheses. This belief representation captures uncertainty resulting from inter-object occlusions and physical interactions, which are inherently present in cluttered indoor environments. The research efforts presented in this thesis are directed toward developing appropriate state representations and inference techniques to generate and maintain such a belief over contextually plausible scene states. We focus on providing the following features to generative inference while addressing the challenges posed by occlusions: 1) generating and maintaining plausible scene hypotheses, 2) reducing the inference search space, which typically grows exponentially with the number of objects in a scene, and 3) preserving scene hypotheses over continual observations. To generate and maintain plausible scene hypotheses, we propose physics-informed scene estimation methods that combine a Newtonian physics engine with a particle-based generative inference framework. The proposed variants of our method, with and without a Monte Carlo step, showed promising results in generating and maintaining plausible hypotheses under complete occlusions. We show that estimating such scenarios would not be possible with the commonly adopted 3D registration methods, which lack the notion of physical context that our method provides. To scale up the context-informed inference to accommodate a larger number of objects, we describe a factorization of the scene state into objects and object parts to perform collaborative particle-based inference. This resulted in the Pull Message Passing for Nonparametric Belief Propagation (PMPNBP) algorithm, which caters to the high-dimensional, multimodal nature of cluttered scenes while remaining computationally tractable. We demonstrate that PMPNBP is orders of magnitude faster than the state-of-the-art Nonparametric Belief Propagation method. Additionally, we show that PMPNBP successfully estimates poses of articulated objects under various simulated occlusion scenarios. To extend our PMPNBP algorithm to tracking object states over continuous observations, we explore ways to propose and preserve hypotheses effectively over time. This resulted in an augmentation-selection method, where hypotheses are drawn from various proposals, followed by the selection, using PMPNBP, of a subset that best explains the current state of the objects. We discuss and analyze our augmentation-selection method alongside its counterparts in the belief propagation literature. Furthermore, we develop an inference pipeline for pose estimation and tracking of articulated objects in clutter. In this pipeline, the message passing module with the augmentation-selection method is informed by segmentation heatmaps from a trained neural network. In our experiments, we show that our proposed pipeline can effectively maintain belief and track articulated objects over a sequence of observations under occlusion.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163159/1/kdesingh_1.pd
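    The core idea behind a "pull" message update is that the receiving node draws its own candidate states and pulls weights for them by evaluating a pairwise potential against the sender's weighted particles, rather than pushing samples from the sender. The following is a minimal 1-D sketch under that assumption; it is not the PMPNBP implementation, and the Gaussian potential and particle counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_potential(x_recv, x_send, sigma=0.2):
    """Illustrative Gaussian compatibility between 1-D node states."""
    return np.exp(-0.5 * ((x_recv - x_send) / sigma) ** 2)

def pull_message(recv_particles, send_particles, send_weights):
    """Weight each receiver particle by marginalizing over sender particles."""
    w = np.array([
        np.sum(send_weights * pairwise_potential(xr, send_particles))
        for xr in recv_particles
    ])
    return w / w.sum()

# Sender's belief: particles near 1.0; receiver proposes candidates broadly.
send_particles = rng.normal(1.0, 0.1, size=200)
send_weights = np.full(200, 1.0 / 200)
recv_particles = rng.uniform(-1.0, 3.0, size=200)

recv_weights = pull_message(recv_particles, send_particles, send_weights)
estimate = np.sum(recv_weights * recv_particles)
print(f"receiver estimate ~ {estimate:.2f}")  # lands near 1.0
```

    In the actual algorithm the nodes are object and object-part poses, the potentials encode articulation and contact constraints, and the pull formulation is what keeps the update tractable as the number of particles and nodes grows.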

    Symbol Emergence in Robotics: A Survey

    Full text link
    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. To understand human social interactions and to develop a robot that can smoothly communicate with human users in the long term, it is crucially important to understand the dynamics of symbol systems. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-the-art research topics concerning SER, e.g., multimodal categorization, word discovery, and double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.Comment: submitted to Advanced Robotics
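    As a highly simplified stand-in for the multimodal categorization step (the surveyed methods typically use multimodal latent Dirichlet allocation rather than the Gaussian mixture assumed here), the sketch below concatenates per-modality feature vectors and clusters them without labels, so that each cluster plays the role of an emergent category.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Toy features for two object categories observed through three modalities.
def make_object(center):
    visual = rng.normal(center, 0.1, size=4)
    haptic = rng.normal(center, 0.1, size=3)
    audio  = rng.normal(center, 0.1, size=2)
    return np.concatenate([visual, haptic, audio])

observations = np.array([make_object(0.0) for _ in range(30)] +
                        [make_object(1.0) for _ in range(30)])

# Cluster without labels; each component acts as an emergent "symbol".
gmm = GaussianMixture(n_components=2, random_state=0).fit(observations)
print(gmm.predict(observations[:5]), gmm.predict(observations[-5:]))
```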

    SEMANTIC MODELING OF UTILITY NETWORKS: IMPLEMENTATION OF USE CASES FOR DEHRADUN CITY

    Get PDF
    The current, ongoing boom in the fields of Building Information Modeling (BIM) and 3D GIS is being widely explored for a vast range of urban applications, analyses, and simulations. Large numbers of 3D city models are created from various data sources, and substantial studies have been carried out for above-surface features in 3D city models, providing relevant information for various spatial analyses of urban systems and the environment. Related research has explored numerous applications of 3D GIS such as disaster management, city administration, urban and environmental planning, and environmental studies. Utility infrastructures (overhead, subsurface, or on-surface) play a critical role in urban space, yet they are often not considered in 3D city models. The OGC CityGML standard underpins these applications by integrating geographic information, semantics, and the various interdependencies with the 3D city model. Moreover, comparative studies of existing network data models identify the Utility Network ADE as a well-suited approach for data modelling. This research proposes a methodology for 3D semantic modelling of a subsurface water supply network, along with its various subsurface, overhead, and on-surface utility network components, with the help of the OGC CityGML Utility Network ADE. The study is conducted for the Dehradun area. As a result, the semantically modelled utility network data is used for the successful implementation of use cases such as 1) areas affected by utility failure, 2) street space affected by utility maintenance, and 3) visualisation. These cases are investigated together with the urban space.
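    A minimal sketch of the first use case, areas affected by utility failure, is given below. It deliberately abstracts away the CityGML Utility Network ADE data model: the network topology, component names, and flow directions are assumed for illustration, and the analysis reduces to finding everything downstream of the failed component in a directed graph.

```python
from collections import defaultdict, deque

# Edges point in the direction of flow: source -> pipes -> service areas.
edges = [
    ("reservoir", "main_pipe"),
    ("main_pipe", "pump_station"),
    ("pump_station", "pipe_A"),
    ("pump_station", "pipe_B"),
    ("pipe_A", "ward_3"),
    ("pipe_B", "ward_7"),
]

graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)

def affected_by_failure(failed_component):
    """Breadth-first search for all features downstream of the failure."""
    affected, queue = set(), deque([failed_component])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected

print(affected_by_failure("pipe_A"))        # {'ward_3'}
print(affected_by_failure("pump_station"))  # both pipes and both wards
```

    In practice the graph would be populated from the semantically modelled ADE features, so that the affected nodes can be mapped back to the city objects and street spaces they serve.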

    A Study on Robustness and Semantic Understanding of Visual Models

    Get PDF
    Vision models have grown in popularity and performance on many tasks since the emergence of large-scale datasets, improved access to computational resources, and new model architectures like the transformer. However, it is still not well understood whether these models can be deployed in the real world. Because these models are black-box architectures, we do not fully understand what they are truly learning; such an understanding would lead to better improvements for real-world scenarios. Motivated by this, we benchmark these impressive visual models using newly proposed datasets and tasks that probe their robustness and general understanding, using semantics as both a probe and an area of improvement. We first propose a new task of graphical representation for video, using language as a semantic signal to enable fast and interpretable video understanding through cross-attention between language and video. We then explore the robustness of video action-recognition models: given real-world shifts from the original video distribution deep learning models are trained on, where do models fail, and how can these failures be addressed? Next, we explore the robustness of video-language models for text-to-video retrieval: given real-world shifts in either the video or the text distribution the models were trained on, how do models fail, and where can improvements be made? Findings in this work indicate that visual-language models may struggle with human-level understanding, so we next benchmark visual-language models on conceptual understanding of object relations, attribute-object relations, and context-object relations by proposing new datasets. Across all works in this dissertation, we empirically identify both weaknesses and strengths of large vision models and potential areas of improvement. Through this research, we aim to contribute to the advancement of computer vision model understanding, paving the way for more robust and generalizable models that can effectively handle real-world scenarios.
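    The kind of robustness measurement described for text-to-video retrieval can be illustrated with a short sketch, assuming synthetic paired embeddings rather than the dissertation's benchmark data: Recall@k is computed from embedding similarities on clean videos and again on videos perturbed to mimic a distribution shift.

```python
import numpy as np

rng = np.random.default_rng(2)

def unit(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def recall_at_k(text_emb, video_emb, k=5):
    """Fraction of text queries whose matching video ranks in the top k."""
    sims = text_emb @ video_emb.T              # cosine sims for unit-norm rows
    ranks = np.argsort(-sims, axis=1)
    hits = [i in ranks[i, :k] for i in range(len(text_emb))]
    return float(np.mean(hits))

# Toy paired embeddings; the "shifted" videos get extra noise to mimic a
# real-world corruption or domain change.
n, d = 100, 64
video = rng.normal(size=(n, d))
text = unit(video + rng.normal(scale=0.3, size=(n, d)))
clean = unit(video)
shifted = unit(video + rng.normal(scale=1.0, size=(n, d)))

print("Recall@5 clean:  ", recall_at_k(text, clean))
print("Recall@5 shifted:", recall_at_k(text, shifted))
```

    The gap between the two Recall@5 numbers is the robustness signal: a model whose retrieval quality collapses under the shift is the failure mode the dissertation sets out to characterize.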

    Web-Based Management of Public Buildings: A Workflow Based on Integration of BIM and IoT Sensors with a Web–GIS Portal

    Get PDF
    In this paper, we present the final results from the research project "Urban Abacus of Building Energy Performances (Abaco Urbano Energetico degli Edifici – AUREE)", aimed at supporting the renovation process and energy efficiency enhancement of urban building stocks. The crux of the AUREE project is a Web–GIS GeoBlog portal with customized semantic dashboards aimed at sharing information on the urban built environment and promoting the participation of local stakeholders in its improvement. As the latest development of this research, a workflow that integrates the AUREE portal with BIM authoring and an open-source IoT platform is implemented and applied to an experimental case study concerning a public building in Carbonia (Italy): the headquarters of the Sotacarbo Sustainable Energy Research Center. The presented results prove that it is possible to create a valid open system, accessible to both specialist and unskilled users, that guides, through a progressive deepening of knowledge, common end-users toward properly conscious "energy behaviors", as well as public administrations and decision-makers toward sustainable facility management. In the future, the proposed open system could also serve as an effective tool to support the rising "energy communities".
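    The glue between the IoT platform and the portal can be pictured with a minimal sketch; the endpoint URL, element GUID, and payload shape below are hypothetical assumptions, not the AUREE project's actual interfaces. Sensor readings are keyed by the BIM element they monitor and forwarded to the portal over REST.

```python
import json
import urllib.request

# Hypothetical readings, each tied to a BIM element GUID from the model.
readings = [
    {"element_guid": "2O2Fr$t4X7Zf8NOew3FLKI", "sensor": "temp",  "value": 21.4},
    {"element_guid": "2O2Fr$t4X7Zf8NOew3FLKI", "sensor": "power", "value": 135.0},
]

def push_readings(portal_url, readings):
    """POST the readings as JSON to the portal's (assumed) ingestion endpoint."""
    body = json.dumps({"readings": readings}).encode("utf-8")
    req = urllib.request.Request(
        portal_url, data=body,
        headers={"Content-Type": "application/json"}, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example call against a placeholder URL (requires a real portal to run):
# push_readings("https://example.org/auree/api/readings", readings)
```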