Learning efficient haptic shape exploration with a rigid tactile sensor array
Haptic exploration is a key skill for both robots and humans to discriminate
and handle unknown objects or to recognize familiar objects. Its active nature
is evident in humans who from early on reliably acquire sophisticated
sensory-motor capabilities for active exploratory touch and directed manual
exploration that associates surfaces and object properties with their spatial
locations. This is in stark contrast to robotics. In this field, the relative
lack of good real-world interaction models - along with very restricted sensors
and a scarcity of suitable training data to leverage machine learning methods -
has so far rendered haptic exploration a largely underdeveloped skill. In the
present work, we connect recent advances in recurrent models of visual
attention with previous insights about the organisation of human haptic search
behavior, exploratory procedures, and haptic glances in a novel architecture
that learns a generative model of haptic exploration in a simulated
three-dimensional environment. The proposed algorithm simultaneously optimizes
the main components of the perception-action loop: feature extraction,
integration of features over time, and the control strategy, while continuously acquiring data
online. We train a multi-module neural network comprising a feature extractor
and a recurrent neural network module that aids pose control by storing and
combining sequential sensory data. The resulting haptic meta-controller, called
the Haptic Attention Model, moves the rigid tactile sensor array through a
physics-driven simulation environment, performing a sequence of haptic glances
that yield the corresponding force measurements. The resulting
method has been successfully tested with four different objects, achieving
strong discrimination results while performing object contour exploration
optimized for its own sensor morphology.
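A minimal sketch can make this loop concrete. The PyTorch outline below is our illustration, not the authors' architecture: the module names, dimensions, flat per-glance force reading, and three-dimensional pose interface are all assumptions. It wires together the three jointly optimized components the abstract names: glance feature extraction, recurrent integration over time, and a pose-control head alongside the classification output.

```python
# Hypothetical sketch of a haptic-glance perception-action loop
# (assumptions throughout, not the authors' code).
import torch
import torch.nn as nn

class HapticAttentionSketch(nn.Module):
    def __init__(self, taxels=16, feat_dim=64, hidden=128, n_objects=4):
        super().__init__()
        # Feature extraction from one haptic glance: a force reading of
        # the rigid sensor array concatenated with the sensor pose.
        self.glance_net = nn.Sequential(
            nn.Linear(taxels + 3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        # Recurrent module: integrates sequential glances over time.
        self.rnn = nn.GRUCell(feat_dim, hidden)
        # Control strategy: where to touch next (e.g., x, y, orientation).
        self.pose_head = nn.Linear(hidden, 3)
        # Perception: object identity from the integrated belief state.
        self.class_head = nn.Linear(hidden, n_objects)

    def forward(self, forces, pose, h):
        feat = self.glance_net(torch.cat([forces, pose], dim=-1))
        h = self.rnn(feat, h)
        return self.pose_head(h), self.class_head(h), h

# Usage: roll out a fixed number of glances; in the real system a physics
# simulator would return the force readings for each commanded pose.
model = HapticAttentionSketch()
h = torch.zeros(1, 128)
pose = torch.zeros(1, 3)
for _ in range(5):                 # five haptic glances
    forces = torch.rand(1, 16)     # stub for simulated sensor output
    pose, logits, h = model(forces, pose, h)
print(logits.softmax(-1))          # belief over the four objects
```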
A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data
Neural-symbolic approaches to machine learning incorporate the advantages
from both connectionist and symbolic methods. Typically, these models employ a
first module based on a neural architecture to extract features from complex
data. Then, these features are processed as symbols by a symbolic engine that
provides reasoning, concept structures, composability, better generalization
and out-of-distribution learning among other possibilities. However, neural
approaches to the grounding of symbols in sensory data, albeit powerful, still
require heavy training and tedious labeling for the most part. This paper
presents a new symbolic-only method for the generation of hierarchical concept
structures from complex spatial sensory data. The approach is based on
Bateson's notion of difference as the key to the genesis of an idea or a
concept. Following his suggestion, the model extracts atomic features from raw
data by computing elemental sequential comparisons in a stream of multivariate
numerical values. Higher-level constructs are built from these features by
subjecting them to further comparisons in a recursive process. At any stage in
the recursion, a concept structure may be obtained from these constructs and
features by means of Formal Concept Analysis. Results show that the model is
able to produce fairly rich yet human-readable conceptual representations
without training. Additionally, the concept structures obtained through the
model (i) present high composability, which potentially enables the generation
of 'unseen' concepts, (ii) allow formal reasoning, and (iii) have inherent
abilities for generalization and out-of-distribution learning. Consequently,
this method may offer an interesting angle to current neural-symbolic research.
Future work is required to develop a training methodology so that the model can
be tested against a larger dataset.
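To make the two stages concrete, here is a toy Python sketch under loud assumptions: the 'up'/'down'/'same' difference vocabulary, the tiny two-variable stream, and the brute-force closure over attribute subsets are ours, and practical Formal Concept Analysis implementations use far more efficient algorithms (e.g., NextClosure). Stage one extracts atomic difference features by sequential comparison; stage two derives formal concepts, i.e. (extent, intent) pairs, from the resulting object-attribute context.

```python
# Toy illustration (assumptions throughout, not the paper's model).
from itertools import chain, combinations

def difference_features(stream):
    """Stage 1: compare each sample with its predecessor, per variable."""
    feats = []
    for prev, cur in zip(stream, stream[1:]):
        feats.append(tuple(
            "up" if c > p else "down" if c < p else "same"
            for p, c in zip(prev, cur)
        ))
    return feats

def formal_concepts(context):
    """Stage 2: context {object: set(attributes)} -> (extent, intent) pairs."""
    attrs = set().union(*context.values())
    concepts = set()
    for sub in chain.from_iterable(combinations(attrs, r)
                                   for r in range(len(attrs) + 1)):
        # Extent: all objects carrying every attribute in the subset.
        extent = {o for o, a in context.items() if set(sub) <= a}
        # Intent: closure, i.e. all attributes shared by that extent.
        intent = (set.intersection(*(context[o] for o in extent))
                  if extent else set(attrs))
        concepts.add((frozenset(extent), frozenset(intent)))
    return sorted(concepts, key=lambda c: len(c[0]))

stream = [(1, 5), (2, 5), (3, 4), (2, 4)]   # tiny two-variable sensory stream
feats = difference_features(stream)          # e.g. ('up', 'same'), ...
context = {f"t{i}": {f"v{j}:{d}" for j, d in enumerate(f)}
           for i, f in enumerate(feats)}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```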
Deep Learning: Our Miraculous Year 1990-1991
In 2020, we will celebrate that many of the basic ideas behind the deep
learning revolution were published three decades ago within fewer than 12
months in our "Annus Mirabilis" or "Miraculous Year" 1990-1991 at TU Munich.
Back then, few people were interested, but a quarter century later, neural
networks based on these ideas were on over 3 billion devices such as
smartphones, and used many billions of times per day, consuming a significant
fraction of the world's compute.
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. This position paper gives examples of neurocognitive inspirations and promising directions in this area.
Emotional Storyteller for Vision Impaired and Hearing-Impaired Children
Tellie is an innovative mobile app designed to offer an immersive and emotionally enriched storytelling experience for children who are visually or hearing impaired. It achieves this through four main objectives: (1) text extraction, which utilizes the CRAFT model together with Convolutional Neural Networks (CNNs), Connectionist Temporal Classification (CTC), and Long Short-Term Memory (LSTM) networks to accurately extract and recognize text from images in storybooks; (2) recognition of emotions in sentences, which employs BERT to detect and distinguish sentence-level emotions, including happiness, anger, sadness, and surprise; (3) conversion of text to natural, emotionally expressive audio, which uses Tacotron2 and WaveGlow to enhance the synthesized speech with emotional styles and create engaging audio narratives; and (4) conversion of text to sign language, which uses CNNs to serve the Deaf and hard-of-hearing community and ensure alignment with real sign-language expressions. Together, these objectives make Tellie a groundbreaking app that empowers visually and hearing-impaired children with access to captivating storytelling experiences, promoting accessibility and inclusivity through the harmonious integration of language, creativity, and technology. This research demonstrates the potential of advanced technologies in fostering inclusive and emotionally engaging storytelling for all children.
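A hypothetical wiring of the four stages might look as follows; this is a sketch, not the authors' code. The OCR, TTS, and sign-rendering stages are stubbed, and a generic Hugging Face emotion checkpoint stands in for the paper's fine-tuned BERT:

```python
# Sketch of Tellie's pipeline (stage wiring is an assumption).
from transformers import pipeline

# Stage 2: sentence-level emotion detection. The checkpoint is a public
# stand-in, not the paper's model.
emotion = pipeline("text-classification",
                   model="bhadresh-savani/distilbert-base-uncased-emotion")

def extract_text(image):          # Stage 1: CRAFT + CNN/LSTM/CTC OCR (stub)
    raise NotImplementedError

def speak(sentence, label):       # Stage 3: Tacotron2 + WaveGlow TTS (stub)
    print(f"[{label}] {sentence}")

def sign(sentence):               # Stage 4: CNN-based sign rendering (stub)
    print(f"(signing) {sentence}")

def narrate(sentences):
    for s in sentences:
        label = emotion(s)[0]["label"]   # e.g. 'joy', 'anger', 'sadness'
        speak(s, label)                  # emotionally styled narration
        sign(s)                          # parallel sign-language track

narrate(["The little fox laughed with joy.",
         "Then the storm came, and she was afraid."])
```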
MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning via Interactive Perception
A holistic understanding of object properties across diverse sensory
modalities (e.g., visual, audio, and haptic) is essential for tasks ranging
from object categorization to complex manipulation. Drawing inspiration from
cognitive science studies that emphasize the significance of multi-sensory
integration in human perception, we introduce MOSAIC (Multimodal Object
property learning with Self-Attention and Interactive Comprehension), a novel
framework designed to facilitate the learning of unified multi-sensory object
property representations. While it is undeniable that visual information plays
a prominent role, we acknowledge that many fundamental object properties extend
beyond the visual domain to encompass attributes like texture, mass
distribution, or sounds, which significantly influence how we interact with
objects. In MOSAIC, we leverage this profound insight by distilling knowledge
from multimodal foundation models and aligning these representations not only
across vision but also haptic and auditory sensory modalities. Through
extensive experiments on a dataset where a humanoid robot interacts with 100
objects across 10 exploratory behaviors, we demonstrate the versatility of
MOSAIC in two task families: object categorization and object-fetching tasks.
Our results underscore the efficacy of MOSAIC's unified representations,
showing competitive performance in category recognition through a simple linear
probe setup and excelling in the object-fetching task under zero-shot transfer
conditions. This work pioneers the application of sensory grounding in
foundation models for robotics, promising a significant leap in multi-sensory
perception capabilities for autonomous systems. We have released the code,
datasets, and additional results: https://github.com/gtatiya/MOSAIC.
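The distillation-and-alignment idea can be sketched as follows; this is our reading, not the released MOSAIC code, and the encoder sizes, the InfoNCE-style objective, and the frozen "teacher" embedding are all assumptions. Small haptic and audio encoders are trained to match a frozen foundation-model embedding of the same interaction episode, producing a unified space on which a simple linear probe can then do category recognition:

```python
# Sketch of multi-sensory alignment by distillation (assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))
    def forward(self, x):
        # Unit-normalize so embeddings live on the same hypersphere.
        return F.normalize(self.net(x), dim=-1)

haptic_enc, audio_enc = ModalityEncoder(48), ModalityEncoder(128)

def alignment_loss(student_emb, teacher_emb, temp=0.07):
    """InfoNCE-style loss pulling each student embedding toward the
    frozen teacher embedding of the same interaction episode."""
    logits = student_emb @ teacher_emb.t() / temp
    targets = torch.arange(len(logits))
    return F.cross_entropy(logits, targets)

# One hypothetical training step over a batch of 8 episodes; the teacher
# stands in for a frozen CLIP-like foundation-model embedding.
teacher = F.normalize(torch.randn(8, 512), dim=-1)
loss = (alignment_loss(haptic_enc(torch.randn(8, 48)), teacher)
        + alignment_loss(audio_enc(torch.randn(8, 128)), teacher))
loss.backward()   # gradients flow only into the student encoders
```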