653 research outputs found

    Learning efficient haptic shape exploration with a rigid tactile sensor array

    Haptic exploration is a key skill for both robots and humans to discriminate and handle unknown objects or to recognize familiar objects. Its active nature is evident in humans who from early on reliably acquire sophisticated sensory-motor capabilities for active exploratory touch and directed manual exploration that associates surfaces and object properties with their spatial locations. This is in stark contrast to robotics. In this field, the relative lack of good real-world interaction models - along with very restricted sensors and a scarcity of suitable training data to leverage machine learning methods - has so far rendered haptic exploration a largely underdeveloped skill. In the present work, we connect recent advances in recurrent models of visual attention with previous insights about the organisation of human haptic search behavior, exploratory procedures and haptic glances for a novel architecture that learns a generative model of haptic exploration in a simulated three-dimensional environment. The proposed algorithm simultaneously optimizes main perception-action loop components: feature extraction, integration of features over time, and the control strategy, while continuously acquiring data online. We perform a multi-module neural network training, including a feature extractor and a recurrent neural network module aiding pose control for storing and combining sequential sensory data. The resulting haptic meta-controller for the rigid 16×16 tactile sensor array moving in a physics-driven simulation environment, called the Haptic Attention Model, performs a sequence of haptic glances, and outputs corresponding force measurements. The resulting method has been successfully tested with four different objects. It achieved results close to 100% while performing object contour exploration that has been optimized for its own sensor morphology.

    A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data

    Neural-symbolic approaches to machine learning incorporate the advantages from both connectionist and symbolic methods. Typically, these models employ a first module based on a neural architecture to extract features from complex data. Then, these features are processed as symbols by a symbolic engine that provides reasoning, concept structures, composability, better generalization and out-of-distribution learning among other possibilities. However, neural approaches to the grounding of symbols in sensory data, albeit powerful, still require heavy training and tedious labeling for the most part. This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex spatial sensory data. The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept. Following his suggestion, the model extracts atomic features from raw data by computing elemental sequential comparisons in a stream of multivariate numerical values. Higher-level constructs are built from these features by subjecting them to further comparisons in a recursive process. At any stage in the recursion, a concept structure may be obtained from these constructs and features by means of Formal Concept Analysis. Results show that the model is able to produce fairly rich yet human-readable conceptual representations without training. Additionally, the concept structures obtained through the model (i) present high composability, which potentially enables the generation of 'unseen' concepts, (ii) allow formal reasoning, and (iii) have inherent abilities for generalization and out-of-distribution learning. Consequently, this method may offer an interesting angle to current neural-symbolic research. Future work is required to develop a training methodology so that the model can be tested against a larger dataset.
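
    The abstract above describes two steps that a small example may make more concrete: sequential comparisons over a multivariate stream, followed by Formal Concept Analysis. The following Python sketch is an illustration only, not the paper's implementation. The function names, the `up`/`down`/`same` encoding of a difference, and the toy stream are all assumptions, and the recursive construction of higher-level features is omitted.

```python
def difference_features(stream):
    """Compare each sample with its predecessor, channel by channel.

    Hypothetical encoding: every comparison becomes one attribute such as
    'ch0:up'. Each consecutive pair of samples yields one FCA "object",
    represented by its set of attributes.
    """
    objects = []
    for prev, curr in zip(stream, stream[1:]):
        attrs = set()
        for channel, (a, b) in enumerate(zip(prev, curr)):
            trend = "up" if b > a else "down" if b < a else "same"
            attrs.add(f"ch{channel}:{trend}")
        objects.append(frozenset(attrs))
    return objects


def formal_concepts(objects):
    """Enumerate all formal concepts as (extent, intent) pairs.

    The intents of a formal context are exactly the intersections of object
    attribute sets, plus the full attribute set (the empty intersection).
    This brute-force closure is only suitable for tiny contexts.
    """
    all_attrs = frozenset().union(*objects)
    intents = {all_attrs}
    for obj in objects:
        # Intersect the new object with every intent found so far.
        intents |= {obj & intent for intent in intents}
    concepts = []
    for intent in intents:
        # Extent: indices of all objects that carry every attribute in the intent.
        extent = frozenset(i for i, obj in enumerate(objects) if intent <= obj)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# Toy two-channel stream (made-up values, not data from the paper).
stream = [(0, 0), (1, 0), (2, 1), (1, 1)]
concepts = formal_concepts(difference_features(stream))
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

    Each printed pair is one node of the concept lattice: a set of comparison steps together with the difference attributes they all share. No training is involved, which matches the claim in the abstract, although the real model builds further comparison levels on top of these atomic features.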

    Deep Learning: Our Miraculous Year 1990-1991

    In 2020, we will celebrate that many of the basic ideas behind the deep learning revolution were published three decades ago within fewer than 12 months in our "Annus Mirabilis" or "Miraculous Year" 1990-1991 at TU Munich. Back then, few people were interested, but a quarter century later, neural networks based on these ideas were on over 3 billion devices such as smartphones, and used many billions of times per day, consuming a significant fraction of the world's compute. Comment: 37 pages, 188 references, based on work of 4 Oct 201

    Neurocognitive Informatics Manifesto.

    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. This position paper gives examples of neurocognitive inspirations and promising directions in this area.

    Emotional Storyteller for Vision Impaired and Hearing-Impaired Children

    Tellie is an innovative mobile app designed to offer an immersive and emotionally enriched storytelling experience for children who are visually and hearing impaired. It achieves this through four main objectives. Text Extraction utilizes the CRAFT model and a combination of Convolutional Neural Networks (CNNs), Connectionist Temporal Classification (CTC), and Long Short-Term Memory (LSTM) networks to accurately extract and recognize text from images in storybooks. Recognition of Emotions in Sentences employs BERT to detect and distinguish emotions at the sentence level, including happiness, anger, sadness, and surprise. Conversion of Text to Human Natural Audio with Emotion transforms text into emotionally expressive audio using Tacotron2 and WaveGlow, enhancing the synthesized speech with emotional styles to create engaging audio narratives. Conversion of Text to Sign Language caters to the Deaf and hard-of-hearing community: Tellie translates text into sign language using CNNs, ensuring alignment with real sign language expressions. These objectives combine to create Tellie, a groundbreaking app that empowers visually and hearing-impaired children with access to captivating storytelling experiences, promoting accessibility and inclusivity through the harmonious integration of language, creativity, and technology. This research demonstrates the potential of advanced technologies in fostering inclusive and emotionally engaging storytelling for all children.

    MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning via Interactive Perception

    A holistic understanding of object properties across diverse sensory modalities (e.g., visual, audio, and haptic) is essential for tasks ranging from object categorization to complex manipulation. Drawing inspiration from cognitive science studies that emphasize the significance of multi-sensory integration in human perception, we introduce MOSAIC (Multimodal Object property learning with Self-Attention and Interactive Comprehension), a novel framework designed to facilitate the learning of unified multi-sensory object property representations. While it is undeniable that visual information plays a prominent role, we acknowledge that many fundamental object properties extend beyond the visual domain to encompass attributes like texture, mass distribution, or sounds, which significantly influence how we interact with objects. In MOSAIC, we leverage this profound insight by distilling knowledge from multimodal foundation models and aligning these representations not only across vision but also haptic and auditory sensory modalities. Through extensive experiments on a dataset where a humanoid robot interacts with 100 objects across 10 exploratory behaviors, we demonstrate the versatility of MOSAIC in two task families: object categorization and object-fetching tasks. Our results underscore the efficacy of MOSAIC's unified representations, showing competitive performance in category recognition through a simple linear probe setup and excelling in the fetch object task under zero-shot transfer conditions. This work pioneers the application of sensory grounding in foundation models for robotics, promising a significant leap in multi-sensory perception capabilities for autonomous systems. We have released the code, datasets, and additional results: https://github.com/gtatiya/MOSAIC. Comment: Accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA), May 13 to 17, 2024; Yokohama, Japan

    Determining object geometry with compliance and simple sensors
