600 research outputs found

    Robust Scene Estimation for Goal-directed Robotic Manipulation in Unstructured Environments

    Full text link
    To make autonomous robots "taskable" so that they function properly and interact fluently with human partners, they must be able to perceive and understand the semantic aspects of their environments. More specifically, they must know what objects exist and where they are in the unstructured human world. Progresses in robot perception, especially in deep learning, have greatly improved for detecting and localizing objects. However, it still remains a challenge for robots to perform a highly reliable scene estimation in unstructured environments that is determined by robustness, adaptability and scale. In this dissertation, we address the scene estimation problem under uncertainty, especially in unstructured environments. We enable robots to build a reliable object-oriented representation that describes objects present in the environment, as well as inter-object spatial relations. Specifically, we focus on addressing following challenges for reliable scene estimation: 1) robust perception under uncertainty results from noisy sensors, objects in clutter and perceptual aliasing, 2) adaptable perception in adverse conditions by combined deep learning and probabilistic generative methods, 3) scalable perception as the number of objects grows and the structure of objects becomes more complex (e.g. objects in dense clutter). Towards realizing robust perception, our objective is to ground raw sensor observations into scene states while dealing with uncertainty from sensor measurements and actuator control . Scene states are represented as scene graphs, where scene graphs denote parameterized axiomatic statements that assert relationships between objects and their poses. To deal with the uncertainty, we present a pure generative approach, Axiomatic Scene Estimation (AxScEs). AxScEs estimates a probabilistic distribution across plausible scene graph hypotheses describing the configuration of objects. By maintaining a diverse set of possible states, the proposed approach demonstrates the robustness to the local minimum in the scene graph state space and effectiveness for manipulation-quality perception based on edit distance on scene graphs. To scale up to more unstructured scenarios and be adaptable to adversarial scenarios, we present Sequential Scene Understanding and Manipulation (SUM), which estimates the scene as a collection of objects in cluttered environments. SUM is a two-stage method that leverages the accuracy and efficiency from convolutional neural networks (CNNs) with probabilistic inference methods. Despite the strength from CNNs, they are opaque in understanding how the decisions are made and fragile for generalizing beyond overfit training samples in adverse conditions (e.g., changes in illumination). The probabilistic generative method complements these weaknesses and provides an avenue for adaptable perception. To scale up to densely cluttered environments where objects are physically touching with severe occlusions, we present GeoFusion, which fuses noisy observations from multiple frames by exploring geometric consistency at object level. Geometric consistency characterizes geometric compatibility between objects and geometric similarity between observations and objects. It reasons about geometry at the object-level, offering a fast and reliable way to be robust to semantic perceptual aliasing. The proposed approach demonstrates greater robustness and accuracy than the state-of-the-art pose estimation approach.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163060/1/zsui_1.pd

    Toward knowledge-based automatic 3D spatial topological modeling from LiDAR point clouds for urban areas

    Get PDF
    Le traitement d'un très grand nombre de données LiDAR demeure très coûteux et nécessite des approches de modélisation 3D automatisée. De plus, les nuages de points incomplets causés par l'occlusion et la densité ainsi que les incertitudes liées au traitement des données LiDAR compliquent la création automatique de modèles 3D enrichis sémantiquement. Ce travail de recherche vise à développer de nouvelles solutions pour la création automatique de modèles géométriques 3D complets avec des étiquettes sémantiques à partir de nuages de points incomplets. Un cadre intégrant la connaissance des objets à la modélisation 3D est proposé pour améliorer la complétude des modèles géométriques 3D en utilisant un raisonnement qualitatif basé sur les informations sémantiques des objets et de leurs composants, leurs relations géométriques et spatiales. De plus, nous visons à tirer parti de la connaissance qualitative des objets en reconnaissance automatique des objets et à la création de modèles géométriques 3D complets à partir de nuages de points incomplets. Pour atteindre cet objectif, plusieurs solutions sont proposées pour la segmentation automatique, l'identification des relations topologiques entre les composants de l'objet, la reconnaissance des caractéristiques et la création de modèles géométriques 3D complets. (1) Des solutions d'apprentissage automatique ont été proposées pour la segmentation sémantique automatique et la segmentation de type CAO afin de segmenter des objets aux structures complexes. (2) Nous avons proposé un algorithme pour identifier efficacement les relations topologiques entre les composants d'objet extraits des nuages de points afin d'assembler un modèle de Représentation Frontière. (3) L'intégration des connaissances sur les objets et la reconnaissance des caractéristiques a été développée pour inférer automatiquement les étiquettes sémantiques des objets et de leurs composants. Afin de traiter les informations incertitudes, une solution de raisonnement automatique incertain, basée sur des règles représentant la connaissance, a été développée pour reconnaître les composants du bâtiment à partir d'informations incertaines extraites des nuages de points. (4) Une méthode heuristique pour la création de modèles géométriques 3D complets a été conçue en utilisant les connaissances relatives aux bâtiments, les informations géométriques et topologiques des composants du bâtiment et les informations sémantiques obtenues à partir de la reconnaissance des caractéristiques. Enfin, le cadre proposé pour améliorer la modélisation 3D automatique à partir de nuages de points de zones urbaines a été validé par une étude de cas visant à créer un modèle de bâtiment 3D complet. L'expérimentation démontre que l'intégration des connaissances dans les étapes de la modélisation 3D est efficace pour créer un modèle de construction complet à partir de nuages de points incomplets.The processing of a very large set of LiDAR data is very costly and necessitates automatic 3D modeling approaches. In addition, incomplete point clouds caused by occlusion and uneven density and the uncertainties in the processing of LiDAR data make it difficult to automatic creation of semantically enriched 3D models. This research work aims at developing new solutions for the automatic creation of complete 3D geometric models with semantic labels from incomplete point clouds. A framework integrating knowledge about objects in urban scenes into 3D modeling is proposed for improving the completeness of 3D geometric models using qualitative reasoning based on semantic information of objects and their components, their geometric and spatial relations. Moreover, we aim at taking advantage of the qualitative knowledge of objects in automatic feature recognition and further in the creation of complete 3D geometric models from incomplete point clouds. To achieve this goal, several algorithms are proposed for automatic segmentation, the identification of the topological relations between object components, feature recognition and the creation of complete 3D geometric models. (1) Machine learning solutions have been proposed for automatic semantic segmentation and CAD-like segmentation to segment objects with complex structures. (2) We proposed an algorithm to efficiently identify topological relationships between object components extracted from point clouds to assemble a Boundary Representation model. (3) The integration of object knowledge and feature recognition has been developed to automatically obtain semantic labels of objects and their components. In order to deal with uncertain information, a rule-based automatic uncertain reasoning solution was developed to recognize building components from uncertain information extracted from point clouds. (4) A heuristic method for creating complete 3D geometric models was designed using building knowledge, geometric and topological relations of building components, and semantic information obtained from feature recognition. Finally, the proposed framework for improving automatic 3D modeling from point clouds of urban areas has been validated by a case study aimed at creating a complete 3D building model. Experiments demonstrate that the integration of knowledge into the steps of 3D modeling is effective in creating a complete building model from incomplete point clouds

    On Space Syntax as a Configurational Theory of Architecture from a Situated Observer's Viewpoint

    Get PDF
    A confi gurational theory of architecture (CTA) from a situated observer’s viewpoint (SOV) is discussed. It includes the levels of description-proper, representation, and interpretation. It takes a bottom-up approach because a situated observer, who is on the ground with a building, typically builds her understanding of the building using immediately available elements, called perceptual primitives. Evidence from geometry, psychology/cognition, and spatial reasoning suggests that the level of description-proper of a CTA from a SOV must include unambiguously defi ned perceptual primitives and their perceivable elementary topological and projective relations. Subsequently, in the levels of representation and interpretation any complex relational properties of buildings must be constructed and their meanings must be explained using these perceptual primitives. Early space syntax (SS), with its foundations defi ned using such perceptual primitives as convex space and axial lines, helps capture the structure of visual experience of buildings but has limitations regarding a CTA from a SOV. More recently, SS theorists have revised the foundations of SS using much simpler perceptual primitives in an attempt to integrate the apparently disparate techniques of SS into a coherent mathematical system. As a result, they have eliminated many limitations of early SS regarding a CTA from a SOV. However, in order to become a CTA from a SOV, SS will still need to explain the importance of these newly defi ned perceptual primitives, and - provide a framework for confi gurational studies using the mathematical system developed based on these primitives

    A topological solution to object segmentation and tracking

    Full text link
    The world is composed of objects, the ground, and the sky. Visual perception of objects requires solving two fundamental challenges: segmenting visual input into discrete units, and tracking identities of these units despite appearance changes due to object deformation, changing perspective, and dynamic occlusion. Current computer vision approaches to segmentation and tracking that approach human performance all require learning, raising the question: can objects be segmented and tracked without learning? Here, we show that the mathematical structure of light rays reflected from environment surfaces yields a natural representation of persistent surfaces, and this surface representation provides a solution to both the segmentation and tracking problems. We describe how to generate this surface representation from continuous visual input, and demonstrate that our approach can segment and invariantly track objects in cluttered synthetic video despite severe appearance changes, without requiring learning.Comment: 21 pages, 6 main figures, 3 supplemental figures, and supplementary material containing mathematical proof

    The weight of phonetic substance in the structure of sound inventories

    Get PDF
    In the research field initiated by Lindblom & Liljencrants in 1972, we illustrate the possibility of giving substance to phonology, predicting the structure of phonological systems with nonphonological principles, be they listener-oriented (perceptual contrast and stability) or speaker-oriented (articulatory contrast and economy). We proposed for vowel systems the Dispersion-Focalisation Theory (Schwartz et al., 1997b). With the DFT, we can predict vowel systems using two competing perceptual constraints weighted with two parameters, respectively λ and α. The first one aims at increasing auditory distances between vowel spectra (dispersion), the second one aims at increasing the perceptual salience of each spectrum through formant proximities (focalisation). We also introduced new variants based on research in physics - namely, phase space (λ,α) and polymorphism of a given phase, or superstructures in phonological organisations (Vallée et al., 1999) which allow us to generate 85.6% of 342 UPSID systems from 3- to 7-vowel qualities. No similar theory for consonants seems to exist yet. Therefore we present in detail a typology of consonants, and then suggest ways to explain plosive vs. fricative and voiceless vs. voiced consonants predominances by i) comparing them with language acquisition data at the babbling stage and looking at the capacity to acquire relatively different linguistic systems in relation with the main degrees of freedom of the articulators; ii) showing that the places “preferred” for each manner are at least partly conditioned by the morphological constraints that facilitate or complicate, make possible or impossible the needed articulatory gestures, e.g. the complexity of the articulatory control for voicing and the aerodynamics of fricatives. A rather strict coordination between the glottis and the oral constriction is needed to produce acceptable voiced fricatives (Mawass et al., 2000). We determine that the region where the combinations of Ag (glottal area) and Ac (constriction area) values results in a balance between the voice and noise components is indeed very narrow. We thus demonstrate that some of the main tendencies in the phonological vowel and consonant structures of the world’s languages can be explained partly by sensorimotor constraints, and argue that actually phonology can take part in a theory of Perception-for-Action-Control

    Efficient Belief Propagation for Perception and Manipulation in Clutter

    Full text link
    Autonomous service robots are required to perform tasks in common human indoor environments. To achieve goals associated with these tasks, the robot should continually perceive, reason its environment, and plan to manipulate objects, which we term as goal-directed manipulation. Perception remains the most challenging aspect of all stages, as common indoor environments typically pose problems in recognizing objects under inherent occlusions with physical interactions among themselves. Despite recent progress in the field of robot perception, accommodating perceptual uncertainty due to partial observations remains challenging and needs to be addressed to achieve the desired autonomy. In this dissertation, we address the problem of perception under uncertainty for robot manipulation in cluttered environments using generative inference methods. Specifically, we aim to enable robots to perceive partially observable environments by maintaining an approximate probability distribution as a belief over possible scene hypotheses. This belief representation captures uncertainty resulting from inter-object occlusions and physical interactions, which are inherently present in clutterred indoor environments. The research efforts presented in this thesis are towards developing appropriate state representations and inference techniques to generate and maintain such belief over contextually plausible scene states. We focus on providing the following features to generative inference while addressing the challenges due to occlusions: 1) generating and maintaining plausible scene hypotheses, 2) reducing the inference search space that typically grows exponentially with respect to the number of objects in a scene, 3) preserving scene hypotheses over continual observations. To generate and maintain plausible scene hypotheses, we propose physics informed scene estimation methods that combine a Newtonian physics engine within a particle based generative inference framework. The proposed variants of our method with and without a Monte Carlo step showed promising results on generating and maintaining plausible hypotheses under complete occlusions. We show that estimating such scenarios would not be possible by the commonly adopted 3D registration methods without the notion of a physical context that our method provides. To scale up the context informed inference to accommodate a larger number of objects, we describe a factorization of scene state into object and object-parts to perform collaborative particle-based inference. This resulted in the Pull Message Passing for Nonparametric Belief Propagation (PMPNBP) algorithm that caters to the demands of the high-dimensional multimodal nature of cluttered scenes while being computationally tractable. We demonstrate that PMPNBP is orders of magnitude faster than the state-of-the-art Nonparametric Belief Propagation method. Additionally, we show that PMPNBP successfully estimates poses of articulated objects under various simulated occlusion scenarios. To extend our PMPNBP algorithm for tracking object states over continuous observations, we explore ways to propose and preserve hypotheses effectively over time. This resulted in an augmentation-selection method, where hypotheses are drawn from various proposals followed by the selection of a subset using PMPNBP that explained the current state of the objects. We discuss and analyze our augmentation-selection method with its counterparts in belief propagation literature. Furthermore, we develop an inference pipeline for pose estimation and tracking of articulated objects in clutter. In this pipeline, the message passing module with the augmentation-selection method is informed by segmentation heatmaps from a trained neural network. In our experiments, we show that our proposed pipeline can effectively maintain belief and track articulated objects over a sequence of observations under occlusion.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/163159/1/kdesingh_1.pd

    The geometry of visual experience

    Get PDF
    In this thesis I examine what the geometry of visual experience is. This is a question which to some has had an a priori answer, and to some its relevance to philosophy has been considered questionable. I argue that the question is of philosophical concern partly because it generates an interesting form of the argument from illusion against Direct Realism, and partly because it concerns the phenomenal character of visual experience. I consider and offer objections to a number of arguments, both a priori and empirical, for the conclusion that the geometry of visual experience is not the same as the geometry of the physical environment. Three proposals are argued against: that visual experience underdetermines a geometry; that the geometry is of the spherical non-Euclidean type; and that the geometry is of the hyperbolic non- Euclidean type

    Generating Optimal Curves for Necklace Maps

    Get PDF
    • …
    corecore