Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans form a symbol system and
acquire semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
communicate smoothly with human users over the long term both require an
understanding of the dynamics of symbol systems. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe state-of-the-art research
topics in SER, e.g., multimodal categorization, word discovery, and double
articulation analysis, which enable a robot to obtain words and their embodied
meanings from raw sensory-motor information, including visual, haptic, and
auditory information and acoustic speech signals, in a fully unsupervised
manner. Finally, we suggest future directions for research in SER.
Comment: submitted to Advanced Robotics
Objects that Sound
In this paper, our objectives are, first, networks that can embed audio and
visual inputs into a common space suitable for cross-modal retrieval;
and second, a network that can localize the object that sounds in an image,
given the audio signal. We achieve both these objectives by training from
unlabelled video using only audio-visual correspondence (AVC) as the objective
function. This is a form of cross-modal self-supervision from video.
To this end, we design new network architectures that can be trained for
cross-modal retrieval and localizing the sound source in an image, by using the
AVC task. We make the following contributions: (i) show that audio and visual
embeddings can be learnt that enable both within-mode (e.g. audio-to-audio) and
between-mode retrieval; (ii) explore various architectures for the AVC task,
including those for the visual stream that ingest a single image, or multiple
images, or a single image and multi-frame optical flow; (iii) show that the
semantic object that sounds within an image can be localized (using only the
sound, no motion or flow information); and (iv) give a cautionary tale on how
to avoid undesirable shortcuts in the data preparation.
Comment: Appears in: European Conference on Computer Vision (ECCV) 2018
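To make the AVC objective concrete, the PyTorch sketch below embeds an image and an audio clip with two small encoders and classifies, from the distance between the L2-normalized embeddings, whether the pair comes from the same video; negatives are formed by mismatching pairs within the batch. The encoder layers, input sizes, and the tiny distance-based head are placeholders, not the paper's exact architecture.

```python
# Sketch of training with audio-visual correspondence (AVC) as the only
# supervisory signal. Architectures are placeholders.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_ch, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        # L2-normalized embeddings live in a shared retrieval space.
        return nn.functional.normalize(self.net(x), dim=1)

vision = Encoder(in_ch=3)   # RGB frame
audio = Encoder(in_ch=1)    # log-spectrogram treated as an image
head = nn.Linear(1, 2)      # tiny classifier on the cross-modal distance

frames = torch.randn(8, 3, 224, 224)
spectra = torch.randn(8, 1, 257, 200)
# Positives: aligned pairs. Negatives: audio rolled to a different video.
labels = torch.cat([torch.ones(8), torch.zeros(8)]).long()
a = torch.cat([audio(spectra), audio(spectra.roll(1, dims=0))])
v = vision(frames).repeat(2, 1)

dist = (v - a).norm(dim=1, keepdim=True)   # distance in the common space
loss = nn.functional.cross_entropy(head(dist), labels)
loss.backward()
```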
Unsupervised Learning of Artistic Styles with Archetypal Style Analysis
In this paper, we introduce an unsupervised learning approach to
automatically discover, summarize, and manipulate artistic styles from large
collections of paintings. Our method is based on archetypal analysis, which is
an unsupervised learning technique akin to sparse coding with a geometric
interpretation. When applied to deep image representations from a collection of
artworks, it learns a dictionary of archetypal styles, which can be easily
visualized. After training the model, the style of a new image, which is
characterized by local statistics of deep visual features, is approximated by a
sparse convex combination of archetypes. This enables us to interpret which
archetypal styles are present in the input image, and in which proportion.
Finally, our approach allows us to manipulate the coefficients of the latent
archetypal decomposition, and achieve various special effects such as style
enhancement, transfer, and interpolation between multiple archetypes.
Comment: Accepted at NIPS 2018, Montréal, Canada
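For intuition, archetypal analysis seeks X ≈ A B X, where the rows of A (per-sample coefficients) and of B (archetypes as convex combinations of samples) lie on the probability simplex. The NumPy sketch below solves this with a plain alternating projected-gradient loop; the step size and iteration count are arbitrary, and a dedicated solver would be used in practice rather than this illustrative routine.

```python
# Illustrative archetypal analysis: X ~ A @ B @ X with rows of A and B
# constrained to the probability simplex.
import numpy as np

def project_simplex(V):
    # Euclidean projection of each row of V onto the probability simplex.
    U = np.sort(V, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    ks = np.arange(1, V.shape[1] + 1)
    rho = np.sum(U * ks > css, axis=1)
    theta = css[np.arange(len(V)), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, k, iters=300, lr=1e-3):
    n = X.shape[0]
    rng = np.random.default_rng(0)
    A = project_simplex(rng.random((n, k)))   # sample-to-archetype weights
    B = project_simplex(rng.random((k, n)))   # archetype-to-sample weights
    for _ in range(iters):
        Z = B @ X                              # current archetypes
        A = project_simplex(A - lr * (A @ Z - X) @ Z.T)
        R = A @ (B @ X) - X                    # residual with updated A
        B = project_simplex(B - lr * A.T @ R @ X.T)
    return A, B @ X

X = np.random.default_rng(1).normal(size=(100, 20))
A, Z = archetypal_analysis(X, k=5)
print(A.shape, Z.shape, np.allclose(A.sum(axis=1), 1.0))
```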
Learning Representations in Model-Free Hierarchical Reinforcement Learning
Common approaches to Reinforcement Learning (RL) are seriously challenged by
large-scale applications involving huge state spaces and sparse delayed reward
feedback. Hierarchical Reinforcement Learning (HRL) methods attempt to address
this scalability issue by learning action selection policies at multiple levels
of temporal abstraction. Abstraction can be achieved by identifying a relatively
small set of states that are likely to be useful as subgoals, in concert with
the learning of corresponding skill policies to achieve those subgoals. Many
approaches to subgoal discovery in HRL depend on the analysis of a model of the
environment, but the need to learn such a model introduces its own problems of
scale. Once subgoals are identified, skills may be learned through intrinsic
motivation, introducing an internal reward signal marking subgoal attainment.
In this paper, we present a novel model-free method for subgoal discovery using
incremental unsupervised learning over a small memory of the most recent
experiences (trajectories) of the agent. When combined with an intrinsic
motivation learning mechanism, this method learns both subgoals and skills,
based on experiences in the environment. Thus, we offer an original approach to
HRL that does not require the acquisition of a model of the environment, making
it suitable for large-scale applications. We demonstrate the efficiency of our
method on two RL problems with sparse delayed feedback: a variant of the rooms
environment and the first screen of the Atari 2600 Montezuma's Revenge game.
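A minimal sketch of the idea follows, with all details (feature space, cluster count, attainment radius) assumed for illustration: incrementally cluster a small memory of recently visited states, treat the centroids as candidate subgoals, and emit an intrinsic reward when the agent reaches one.

```python
# Sketch of model-free subgoal discovery via incremental clustering of
# recent experiences, plus an intrinsic reward on subgoal attainment.
# Hyperparameters are illustrative, not the paper's exact procedure.
import numpy as np
from collections import deque
from sklearn.cluster import MiniBatchKMeans

STATE_DIM = 4
memory = deque(maxlen=1000)            # small memory of recent experiences
clusterer = MiniBatchKMeans(n_clusters=6, random_state=0, n_init=3)

def observe(state):
    """Store a visited state and refresh the subgoal clusters."""
    memory.append(np.asarray(state, dtype=float))
    if len(memory) >= clusterer.n_clusters:
        clusterer.partial_fit(np.stack(list(memory)[-32:]))

def intrinsic_reward(state, radius=0.5):
    """+1 when the agent comes within `radius` of a discovered subgoal."""
    if not hasattr(clusterer, "cluster_centers_"):
        return 0.0
    d = np.linalg.norm(clusterer.cluster_centers_ - state, axis=1)
    return 1.0 if d.min() < radius else 0.0

rng = np.random.default_rng(0)
for _ in range(200):                   # stand-in for an environment loop
    observe(rng.normal(size=STATE_DIM))
print(intrinsic_reward(rng.normal(size=STATE_DIM)))
```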