6 research outputs found

    Visually Indicated Sounds

    Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.
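    The pipeline described in this abstract (per-frame visual features, a recurrent network predicting sound features, then example-based waveform synthesis) can be pictured with the minimal sketch below. The feature dimensions, layer sizes, and the nearest-neighbour retrieval used for the example-based step are illustrative assumptions, not the authors' exact configuration.

        # Minimal sketch of the pipeline described above: an LSTM predicts per-frame
        # sound features from per-frame visual features, and an example-based step
        # replaces each predicted frame with the waveform of its nearest exemplar.
        # All dimensions and the retrieval rule are illustrative assumptions.
        import torch
        import torch.nn as nn

        class VideoToSoundFeatures(nn.Module):
            def __init__(self, visual_dim=512, hidden_dim=256, sound_dim=42):
                super().__init__()
                self.rnn = nn.LSTM(visual_dim, hidden_dim, batch_first=True)
                self.head = nn.Linear(hidden_dim, sound_dim)

            def forward(self, frame_features):           # (batch, time, visual_dim)
                hidden, _ = self.rnn(frame_features)     # (batch, time, hidden_dim)
                return self.head(hidden)                 # (batch, time, sound_dim)

        def example_based_synthesis(predicted, exemplar_features, exemplar_waveforms):
            """Pick, for each predicted feature frame, the waveform of the closest exemplar."""
            distances = torch.cdist(predicted, exemplar_features)  # (time, n_exemplars)
            nearest = distances.argmin(dim=1)
            return [exemplar_waveforms[i] for i in nearest.tolist()]

        if __name__ == "__main__":
            model = VideoToSoundFeatures()
            frames = torch.randn(1, 30, 512)             # 30 frames of dummy visual features
            sound_features = model(frames)
            print(sound_features.shape)                  # torch.Size([1, 30, 42])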

    Knock-Knock: Acoustic Object Recognition using Stacked Denoising Autoencoders

    This paper presents a successful application of deep learning to object recognition based on acoustic data. Previously employed approaches that rely on handcrafted features to describe the acoustic data have two shortcomings: the resulting representation may not be widely applicable, and it risks capturing only characteristics that are insignificant for the task. In contrast, multilayer/deep learning architectures require no predefined feature representation format: features can be learned from raw sensor data without defining discriminative characteristics a priori. In this paper, stacked denoising autoencoders are applied to train a deep learning model. Thirty different objects were successfully classified in our experiment; each object was knocked 120 times with a marker pen to obtain the auditory data. By employing the proposed deep learning framework, a high accuracy of 91.50% was achieved. A traditional method using handcrafted features with a shallow classifier was taken as a benchmark, and its recognition rate was only 58.22%. Interestingly, a recognition rate of 82.00% was achieved when using a shallow classifier with raw acoustic data as input. In addition, we show that the time taken to classify one object using deep learning is far less (by a factor of more than 6) than with the traditional method. We also explore how different model parameters in our deep architecture affect the recognition performance.
    Comment: 6 pages, 10 figures, Neurocomputing
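    A rough sketch of how stacked denoising autoencoders can be pretrained greedily on raw acoustic frames and then stacked under a classifier is shown below. The layer widths, corruption level, and training settings are assumptions chosen for illustration, not the configuration reported in the paper.

        # Illustrative sketch: greedy layer-wise pretraining of denoising autoencoders
        # on raw acoustic frames, then stacking the encoders under a linear classifier.
        # Layer widths, noise level, and training settings are assumptions.
        import torch
        import torch.nn as nn

        class DenoisingAutoencoder(nn.Module):
            def __init__(self, in_dim, hidden_dim, noise_std=0.1):
                super().__init__()
                self.noise_std = noise_std
                self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
                self.decoder = nn.Linear(hidden_dim, in_dim)

            def forward(self, x):
                corrupted = x + self.noise_std * torch.randn_like(x)  # corrupt the input
                return self.decoder(self.encoder(corrupted))          # reconstruct the clean input

        def pretrain_layer(dae, data, epochs=20, lr=1e-3):
            """Train one layer to denoise/reconstruct its input; return the clean codes."""
            optimizer = torch.optim.Adam(dae.parameters(), lr=lr)
            loss_fn = nn.MSELoss()
            for _ in range(epochs):
                loss = loss_fn(dae(data), data)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            with torch.no_grad():
                return dae.encoder(data)

        if __name__ == "__main__":
            raw_frames = torch.randn(256, 1024)   # dummy raw-audio frames (1024 samples each)
            widths = [1024, 512, 128]             # assumed encoder widths
            codes, encoders = raw_frames, []
            for in_dim, out_dim in zip(widths[:-1], widths[1:]):
                dae = DenoisingAutoencoder(in_dim, out_dim)
                codes = pretrain_layer(dae, codes)        # greedy, one layer at a time
                encoders.append(dae.encoder)
            classifier = nn.Sequential(*encoders, nn.Linear(widths[-1], 30))  # 30 object classes
            print(classifier(raw_frames).shape)           # torch.Size([256, 30])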

    Behavior-grounded multi-sensory object perception and exploration by a humanoid robot

    Infants use exploratory behaviors to learn about the objects around them. Psychologists have theorized that behaviors such as touching, pressing, lifting, and dropping enable infants to form grounded object representations. For example, scratching an object can provide information about its roughness, while lifting it can provide information about its weight. In a sense, the exploratory behavior acts as a "question" to the object, which is subsequently "answered" by the sensory stimuli produced during the execution of the behavior. In contrast, most object representations used by robots today rely solely on computer vision or laser scan data, gathered through passive observation. Such disembodied approaches to robotic perception may be useful for recognizing an object using a 3D model database, but will nevertheless fail to infer object properties that cannot be detected using vision alone. To bridge this gap, this dissertation introduces a framework for object perception and exploration in which the robot's representation of objects is grounded in its own sensorimotor experience with them. In this framework, an object is represented by sensorimotor contingencies that span a diverse set of exploratory behaviors and sensory modalities. The results from several large-scale experimental studies show that the behavior-grounded object representation enables a robot to solve a wide variety of tasks, including recognition of objects based on the stimuli that they produce, object grouping and sorting, and learning category labels that describe objects and their properties.
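    One way to picture the behavior-grounded representation described above is as a table of sensorimotor contingencies per object, with recognition based on similarity over the (behavior, modality) contexts two objects share. The sketch below uses hypothetical names, dummy feature vectors, and a simple nearest-neighbor rule; it is an illustration of the idea, not the dissertation's actual implementation.

        # Minimal sketch of a behavior-grounded object representation: each object is
        # described by the sensory features it produced under each (behavior, modality)
        # pair. Feature extraction and the nearest-neighbor recognition rule are
        # illustrative assumptions.
        from collections import defaultdict
        import numpy as np

        class GroundedObjectModel:
            def __init__(self):
                # (behavior, modality) -> list of feature vectors observed for this object
                self.contingencies = defaultdict(list)

            def record(self, behavior, modality, features):
                self.contingencies[(behavior, modality)].append(np.asarray(features))

            def distance(self, other):
                """Average distance over the (behavior, modality) contexts both objects share."""
                shared = self.contingencies.keys() & other.contingencies.keys()
                if not shared:
                    return float("inf")
                dists = [
                    np.linalg.norm(np.mean(self.contingencies[k], axis=0)
                                   - np.mean(other.contingencies[k], axis=0))
                    for k in shared
                ]
                return float(np.mean(dists))

        def recognize(query, known_objects):
            """Return the label of the known object most similar to the query."""
            return min(known_objects, key=lambda label: query.distance(known_objects[label]))

        if __name__ == "__main__":
            mug, can = GroundedObjectModel(), GroundedObjectModel()
            mug.record("tap", "audio", [0.9, 0.1])
            can.record("tap", "audio", [0.2, 0.8])
            mug.record("lift", "proprioception", [0.3])
            can.record("lift", "proprioception", [0.7])
            query = GroundedObjectModel()
            query.record("tap", "audio", [0.85, 0.15])
            print(recognize(query, {"mug": mug, "can": can}))   # -> "mug"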

    Interactive learning of the acoustic properties of household objects

    Human beings can perceive object properties such as size, weight, and material type based solely on the sounds that the objects make when an action is performed on them. In order to be successful, the household robots of the near future must also be capable of learning and reasoning about the acoustic properties of everyday objects. Such an ability would allow a robot to detect and classify various interactions with objects that occur outside of the robot's field of view. This paper presents a framework that allows a robot to infer the object and the type of behavioral interaction performed with it from the sounds generated by the object during the interaction. The framework is evaluated on a 7-d.o.f. Barrett WAM robot which performs grasping, shaking, dropping, pushing, and tapping behaviors on 36 different household objects. The results show that the robot can learn models that can be used to recognize objects (and behaviors performed on objects) from the sounds generated during the interaction. In addition, the robot can use the learned models to estimate the similarity between two objects in terms of their acoustic properties.
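    A minimal sketch of such a framework might summarize each interaction sound with a fixed-length spectral descriptor and train separate classifiers for object identity and behavior type. The averaged log-spectrum features and k-nearest-neighbor classifiers below are assumptions chosen for illustration, not necessarily the method used in the paper.

        # Minimal sketch of sound-based recognition: summarize each interaction sound
        # with an averaged log-magnitude spectrum and fit separate k-NN classifiers for
        # object identity and behavior type. Features and classifier are illustrative
        # assumptions, not necessarily the paper's method.
        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        def sound_descriptor(waveform, frame_len=1024):
            """Fixed-length descriptor: log-magnitude spectrum averaged over frames."""
            n_frames = len(waveform) // frame_len
            frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
            spectra = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
            return spectra.mean(axis=0)

        def train_recognizers(sounds, object_labels, behavior_labels):
            """Fit one classifier for the object and one for the behavior that produced the sound."""
            X = np.stack([sound_descriptor(s) for s in sounds])
            object_clf = KNeighborsClassifier(n_neighbors=3).fit(X, object_labels)
            behavior_clf = KNeighborsClassifier(n_neighbors=3).fit(X, behavior_labels)
            return object_clf, behavior_clf

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            # Dummy one-second recordings standing in for robot-object interaction sounds.
            sounds = [rng.standard_normal(44100) for _ in range(10)]
            objects = ["cup", "box"] * 5
            behaviors = ["grasp", "shake", "drop", "push", "tap"] * 2
            object_clf, behavior_clf = train_recognizers(sounds, objects, behaviors)
            query = sound_descriptor(sounds[0]).reshape(1, -1)
            print(object_clf.predict(query), behavior_clf.predict(query))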