37,730 research outputs found

    Audio Visual Language Maps for Robot Navigation

    Full text link
    While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. In this work, we propose Audio-Visual-Language Maps (AVLMaps), a unified 3D spatial map representation for storing cross-modal information from audio, visual, and language cues. AVLMaps integrate the open-vocabulary capabilities of multimodal foundation models pre-trained on Internet-scale data by fusing their features into a centralized 3D voxel grid. In the context of navigation, we show that AVLMaps enable robot systems to index goals in the map based on multimodal queries, e.g., textual descriptions, images, or audio snippets of landmarks. In particular, the addition of audio information enables robots to more reliably disambiguate goal locations. Extensive experiments in simulation show that AVLMaps enable zero-shot multimodal goal navigation from multimodal prompts and provide 50% better recall in ambiguous scenarios. These capabilities extend to mobile robots in the real world - navigating to landmarks referring to visual, audio, and spatial concepts. Videos and code are available at: https://avlmaps.github.io.Comment: Project page: https://avlmaps.github.io

    Learning Representations in Model-Free Hierarchical Reinforcement Learning

    Full text link
    Common approaches to Reinforcement Learning (RL) are seriously challenged by large-scale applications involving huge state spaces and sparse delayed reward feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address this scalability issue by learning action selection policies at multiple levels of temporal abstraction. Abstraction can be had by identifying a relatively small set of states that are likely to be useful as subgoals, in concert with the learning of corresponding skill policies to achieve those subgoals. Many approaches to subgoal discovery in HRL depend on the analysis of a model of the environment, but the need to learn such a model introduces its own problems of scale. Once subgoals are identified, skills may be learned through intrinsic motivation, introducing an internal reward signal marking subgoal attainment. In this paper, we present a novel model-free method for subgoal discovery using incremental unsupervised learning over a small memory of the most recent experiences (trajectories) of the agent. When combined with an intrinsic motivation learning mechanism, this method learns both subgoals and skills, based on experiences in the environment. Thus, we offer an original approach to HRL that does not require the acquisition of a model of the environment, suitable for large-scale applications. We demonstrate the efficiency of our method on two RL problems with sparse delayed feedback: a variant of the rooms environment and the first screen of the ATARI 2600 Montezuma's Revenge game

    Perceiving Smellscapes

    Get PDF
    We perceive smells as perduring complex entities within a distal array that might be conceived of as smellscapes. However, the philosophical orthodoxy of Odor Theories has been to deny that smells are perceived as having a distal location. Recent challenges have been mounted to Odor Theories’ veracity in handling the timescale of olfactory perception, how it individuates odors as a distal entities, and their claim that olfactory perception is not spatial. The paper does not aim to dispute these criticisms. Rather, what will be shown is that Molecular Structure Theory, a refinement of Odor Theory, can be further developed to handle these challenges. The theory is further refined by focusing on distal perception that requires considering the perceptual object as mereologically complex persisting odor against a background scene conceived of as a smellscape. What will be offered is an expansion of Molecular Structure Theory to account for distal smell perception within natural environments

    A half century of progress towards a unified neural theory of mind and brain with applications to autonomous adaptive agents and mental disorders

    Full text link
    Invited article for the book Artificial Intelligence in the Age of Neural Networks and Brain Computing R. Kozma, C. Alippi, Y. Choe, and F. C. Morabito, Eds. Cambridge, MA: Academic PressThis article surveys some of the main design principles, mechanisms, circuits, and architectures that have been discovered during a half century of systematic research aimed at developing a unified theory that links mind and brain, and shows how psychological functions arise as emergent properties of brain mechanisms. The article describes a theoretical method that has enabled such a theory to be developed in stages by carrying out a kind of conceptual evolution. It also describes revolutionary computational paradigms like Complementary Computing and Laminar Computing that constrain the kind of unified theory that can describe the autonomous adaptive intelligence that emerges from advanced brains. Adaptive Resonance Theory, or ART, is one of the core models that has been discovered in this way. ART proposes how advanced brains learn to attend, recognize, and predict objects and events in a changing world that is filled with unexpected events. ART is not, however, a “theory of everything” if only because, due to Complementary Computing, different matching and learning laws tend to support perception and cognition on the one hand, and spatial representation and action on the other. The article mentions why a theory of this kind may be useful in the design of autonomous adaptive agents in engineering and technology. It also notes how the theory has led to new mechanistic insights about mental disorders such as autism, medial temporal amnesia, Alzheimer’s disease, and schizophrenia, along with mechanistically informed proposals about how their symptoms may be ameliorated

    A study of existing Ontologies in the IoT-domain

    Get PDF
    Several domains have adopted the increasing use of IoT-based devices to collect sensor data for generating abstractions and perceptions of the real world. This sensor data is multi-modal and heterogeneous in nature. This heterogeneity induces interoperability issues while developing cross-domain applications, thereby restricting the possibility of reusing sensor data to develop new applications. As a solution to this, semantic approaches have been proposed in the literature to tackle problems related to interoperability of sensor data. Several ontologies have been proposed to handle different aspects of IoT-based sensor data collection, ranging from discovering the IoT sensors for data collection to applying reasoning on the collected sensor data for drawing inferences. In this paper, we survey these existing semantic ontologies to provide an overview of the recent developments in this field. We highlight the fundamental ontological concepts (e.g., sensor-capabilities and context-awareness) required for an IoT-based application, and survey the existing ontologies which include these concepts. Based on our study, we also identify the shortcomings of currently available ontologies, which serves as a stepping stone to state the need for a common unified ontology for the IoT domain.Comment: Submitted to Elsevier JWS SI on Web semantics for the Internet/Web of Thing

    A tesselated probabilistic representation for spatial robot perception and navigation

    Get PDF
    The ability to recover robust spatial descriptions from sensory information and to efficiently utilize these descriptions in appropriate planning and problem-solving activities are crucial requirements for the development of more powerful robotic systems. Traditional approaches to sensor interpretation, with their emphasis on geometric models, are of limited use for autonomous mobile robots operating in and exploring unknown and unstructured environments. Here, researchers present a new approach to robot perception that addresses such scenarios using a probabilistic tesselated representation of spatial information called the Occupancy Grid. The Occupancy Grid is a multi-dimensional random field that maintains stochastic estimates of the occupancy state of each cell in the grid. The cell estimates are obtained by interpreting incoming range readings using probabilistic models that capture the uncertainty in the spatial information provided by the sensor. A Bayesian estimation procedure allows the incremental updating of the map using readings taken from several sensors over multiple points of view. An overview of the Occupancy Grid framework is given, and its application to a number of problems in mobile robot mapping and navigation are illustrated. It is argued that a number of robotic problem-solving activities can be performed directly on the Occupancy Grid representation. Some parallels are drawn between operations on Occupancy Grids and related image processing operations

    YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection

    Full text link
    Vehicle perception systems strive to achieve comprehensive and rapid visual interpretation of their surroundings for improved safety and navigation. We introduce YOLO-BEV, an efficient framework that harnesses a unique surrounding cameras setup to generate a 2D bird's-eye view of the vehicular environment. By strategically positioning eight cameras, each at a 45-degree interval, our system captures and integrates imagery into a coherent 3x3 grid format, leaving the center blank, providing an enriched spatial representation that facilitates efficient processing. In our approach, we employ YOLO's detection mechanism, favoring its inherent advantages of swift response and compact model structure. Instead of leveraging the conventional YOLO detection head, we augment it with a custom-designed detection head, translating the panoramically captured data into a unified bird's-eye view map of ego car. Preliminary results validate the feasibility of YOLO-BEV in real-time vehicular perception tasks. With its streamlined architecture and potential for rapid deployment due to minimized parameters, YOLO-BEV poses as a promising tool that may reshape future perspectives in autonomous driving systems
    corecore