6 research outputs found
Visually Indicated Sounds
Objects make distinctive sounds when they are hit or scratched. These sounds
reveal aspects of an object's material properties, as well as the actions that
produced them. In this paper, we propose the task of predicting what sound an
object makes when struck as a way of studying physical interactions within a
visual scene. We present an algorithm that synthesizes sound from silent videos
of people hitting and scratching objects with a drumstick. This algorithm uses
a recurrent neural network to predict sound features from videos and then
produces a waveform from these features with an example-based synthesis
procedure. We show that the sounds predicted by our model are realistic enough
to fool participants in a "real or fake" psychophysical experiment, and that
they convey significant information about material properties and physical
interactions.
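The abstract describes a two-stage pipeline: a recurrent network predicts sound features from video frames, and an example-based procedure then retrieves exemplar waveform snippets matching those features. The second stage can be caricatured as a nearest-neighbor lookup over a database of (feature, snippet) pairs. Everything below (function names, feature dimensions, the random toy data) is an illustrative sketch, not the authors' code.

```python
import numpy as np

def example_based_synthesis(pred_feats, db_feats, db_snippets):
    """For each predicted feature frame, copy the waveform snippet whose
    stored feature vector is closest (a simplified stand-in for the
    paper's example-based matching step)."""
    out = []
    for f in pred_feats:
        idx = np.argmin(np.linalg.norm(db_feats - f, axis=1))
        out.append(db_snippets[idx])
    return np.concatenate(out)

# Toy demo with random "features" and short waveform chunks.
rng = np.random.default_rng(0)
db_feats = rng.normal(size=(50, 8))       # 50 exemplar feature vectors
db_snippets = rng.normal(size=(50, 256))  # their matching waveform chunks
pred = rng.normal(size=(10, 8))           # "RNN-predicted" features, 10 frames
wave = example_based_synthesis(pred, db_feats, db_snippets)
print(wave.shape)  # (2560,)
```

A real system would use learned spectrogram features and overlap-add smoothing between snippets; the sketch only shows the retrieve-and-concatenate idea.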
Knock-Knock: Acoustic Object Recognition using Stacked Denoising Autoencoders
This paper presents a successful application of deep learning for object
recognition based on acoustic data. Previously employed approaches rely on
handcrafted features to describe the acoustic data; their shortcomings include
limiting how widely the learned representation can be applied and risking the
capture of only insignificant characteristics for the task. In contrast, there
is no need to define the feature representation
format when using multilayer/deep learning architecture methods: features can
be learned from raw sensor data without defining discriminative characteristics
a-priori. In this paper, stacked denoising autoencoders are applied to train a
deep learning model. In our experiment, thirty different objects were
successfully classified; each object was knocked 120 times with a marker pen
to obtain the auditory data. By employing the proposed deep
learning framework, a high accuracy of 91.50% was achieved. A traditional
method using handcrafted features with a shallow classifier was taken as a
benchmark and the attained recognition rate was only 58.22%. Interestingly, a
recognition rate of 82.00% was achieved when using a shallow classifier with
raw acoustic data as input. In addition, we could show that the time taken to
classify one object using deep learning was far less (by a factor of more than
6) than utilizing the traditional method. It was also explored how different
model parameters in our deep architecture affect the recognition performance.
Comment: 6 pages, 10 figures, Neurocomputing
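The core building block of the stacked architecture described above is a single denoising autoencoder: corrupt the input, encode it, and train the decoder to reconstruct the clean input; stacking repeats this on the hidden codes of the previous layer. The following is a minimal one-layer sketch with sigmoid units, tied weights, and masking noise; the hyperparameters and toy data are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_dae(X, hidden, noise=0.3, lr=0.1, epochs=200):
    """One denoising-autoencoder layer: corrupt the input with masking
    noise, then minimize squared reconstruction error against the
    *clean* input. Returns the encoder parameters (W, b)."""
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden))
    b, c = np.zeros(hidden), np.zeros(d)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)  # masking corruption
        H = sig(X_noisy @ W + b)                     # encode
        R = sig(H @ W.T + c)                         # decode (tied weights)
        err = R - X
        dR = err * R * (1 - R)
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (X_noisy.T @ dH + dR.T @ H) / n    # tied-weight gradient
        c -= lr * dR.mean(axis=0)
        b -= lr * dH.mean(axis=0)
    return W, b

X = rng.random((64, 20))                 # stand-in acoustic feature frames
W, b = train_dae(X, hidden=8)
codes = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # learned features for a classifier
print(codes.shape)  # (64, 8)
```

To build a stack, the codes of one trained layer become the training input of the next, after which the whole network can be fine-tuned with labels.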
Behavior-grounded multi-sensory object perception and exploration by a humanoid robot
Infants use exploratory behaviors to learn about the objects around them. Psychologists have theorized that behaviors such as touching, pressing, lifting, and dropping enable infants to form grounded object representations. For example, scratching an object can provide information about its roughness, while lifting it can provide information about its weight. In a sense, the exploratory behavior acts as a "question" to the object, which is subsequently "answered" by the sensory stimuli produced during the execution of the behavior. In contrast, most object representations used by robots today rely solely on computer vision or laser scan data, gathered through passive observation. Such disembodied approaches to robotic perception may be useful for recognizing an object using a 3D model database, but will nevertheless fail to infer object properties that cannot be detected using vision alone. To bridge this gap, this dissertation introduces a framework for object perception and exploration in which the robot's representation of objects is grounded in its own sensorimotor experience with them. In this framework, an object is represented by sensorimotor contingencies that span a diverse set of exploratory behaviors and sensory modalities. The results from several large-scale experimental studies show that the behavior-grounded object representation enables a robot to solve a wide variety of tasks including recognition of objects based on the stimuli that they produce, object grouping and sorting, and learning category labels that describe objects and their properties.
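One simple way to realize the behavior-grounded representation sketched in this abstract is to store, for every known object, one feature vector per exploratory behavior, and to recognize a new object by pooling distances across all behaviors. The object names, behaviors, and pooling rule below are hypothetical illustrations, not the dissertation's actual model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical behavior-grounded representation: each known object maps
# every exploratory behavior to a stored sensory feature vector.
behaviors = ["scratch", "lift", "drop"]
objects = {name: {b: rng.normal(loc=i, scale=0.1, size=4) for b in behaviors}
           for i, name in enumerate(["cup", "ball", "book"])}

def recognize(observed):
    """Sum feature distances across all exploratory behaviors and pick
    the closest stored object (one simple way to pool modalities)."""
    def total_dist(model):
        return sum(np.linalg.norm(model[b] - observed[b]) for b in behaviors)
    return min(objects, key=lambda o: total_dist(objects[o]))

# Observations close to "ball" (loc=1.0) under every behavior.
obs = {b: np.full(4, 1.0) + 0.05 * rng.normal(size=4) for b in behaviors}
print(recognize(obs))  # ball
```

Because the distance is summed over behaviors, an object that is ambiguous under one behavior (say, lifting) can still be disambiguated by another (say, scratching), which is the point of spanning multiple exploratory behaviors.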
Development of robust visual perception supported by auditory perception in agile legged robots, and its application to autonomous navigation
TÜBİTAK EEEAG Project, 01.10.201
Interactive learning of the acoustic properties of household objects
Abstract — Human beings can perceive object properties such as size, weight, and material type based solely on the sounds that the objects make when an action is performed on them. In order to be successful, the household robots of the near future must also be capable of learning and reasoning about the acoustic properties of everyday objects. Such an ability would allow a robot to detect and classify various interactions with objects that occur outside of the robot's field of view. This paper presents a framework that allows a robot to infer the object and the type of behavioral interaction performed with it from the sounds generated by the object during the interaction. The framework is evaluated on a 7-d.o.f. Barrett WAM robot which performs grasping, shaking, dropping, pushing and tapping behaviors on 36 different household objects. The results show that the robot can learn models that can be used to recognize objects (and behaviors performed on objects) from the sounds generated during the interaction. In addition, the robot can use the learned models to estimate the similarity between two objects in terms of their acoustic properties.
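The recognition task described in this abstract reduces to a standard pattern: extract spectral features from the interaction sound, then classify against per-object models. The sketch below uses a crude mean-magnitude-spectrum feature and a nearest-centroid classifier; the feature choice, object names, and synthetic tones are assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)

def spectral_features(wave, frame=128):
    """Crude spectral summary of an interaction sound: mean magnitude
    spectrum over fixed-length frames (a stand-in for the richer
    acoustic features a real system would use)."""
    n = len(wave) // frame * frame
    frames = wave[:n].reshape(-1, frame)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def nearest_centroid(train, labels, query):
    """Classify a sound by the closest per-object mean feature vector."""
    classes = sorted(set(labels))
    cents = {c: np.mean([f for f, l in zip(train, labels) if l == c], axis=0)
             for c in classes}
    return min(classes, key=lambda c: np.linalg.norm(cents[c] - query))

# Toy data: two "objects" whose impact sounds differ in dominant frequency.
t = np.arange(1024)
sounds = [np.sin(2 * np.pi * f * t / 1024) + 0.1 * rng.normal(size=1024)
          for f in (5, 5, 40, 40)]
feats = [spectral_features(s) for s in sounds]
labels = ["mug", "mug", "box", "box"]
query = spectral_features(np.sin(2 * np.pi * 40 * t / 1024))
print(nearest_centroid(feats, labels, query))  # box
```

Extending this to also recognize the behavior (grasp, shake, drop, push, tap) amounts to training a second classifier of the same shape over behavior labels instead of object labels.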