    Encoding of phonology in a recurrent neural model of grounded speech

    We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model that processes images and their spoken descriptions and projects the visual and auditory representations into the same semantic space. We perform a number of analyses of how information about individual phonemes is encoded in the MFCC features extracted from the speech signal and in the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination, we show that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retained at the top recurrent layer. We further find that the attention mechanism following the top recurrent layer significantly attenuates the encoding of phonology and makes the utterance embeddings much more invariant to synonymy. Moreover, a hierarchical clustering of phoneme representations learned by the network shows an organizational structure of phonemes similar to those proposed in linguistics. Comment: Accepted at CoNLL 201
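
    The hierarchical clustering analysis mentioned above can be illustrated with a minimal sketch. It is not the authors' code: it assumes each phoneme is summarized by a single mean activation vector, uses Euclidean distance with average linkage (an assumed choice), and the phoneme labels and 2-D vectors are hypothetical toy data. The function returns the merges in order, which is the information a dendrogram displays.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vectors[a], vectors[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)


def agglomerate(vectors):
    """Agglomerative clustering with average linkage (assumed choice).

    vectors: dict mapping a phoneme label to a hypothetical mean
    activation vector. Returns the merges as (cluster, cluster) pairs
    of sorted label tuples, in the order they happen.
    """
    clusters = [(label,) for label in sorted(vectors)]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        merges.append((c1, c2))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


# Hypothetical toy "activations": two vowels close together, two stops close together.
toy = {
    "i": [0.0, 1.0], "e": [0.2, 0.9],
    "p": [5.0, 0.0], "b": [5.1, 0.3],
}
for a, b in agglomerate(toy):
    print(a, "+", b)
```

    On this toy data the two vowels merge first, then the two stops, and finally the two groups; with real layer activations, the resulting tree is what would be compared against linguistic phoneme classes.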

    Ongoing Emergence: A Core Concept in Epigenetic Robotics

    We propose ongoing emergence as a core concept in epigenetic robotics. Ongoing emergence refers to the continuous development and integration of new skills and is exhibited when six criteria are satisfied: (1) continuous skill acquisition, (2) incorporation of new skills with existing skills, (3) autonomous development of values and goals, (4) bootstrapping of initial skills, (5) stability of skills, and (6) reproducibility. In this paper we: (a) provide a conceptual synthesis of ongoing emergence based on previous theorizing, (b) review current research in epigenetic robotics in light of ongoing emergence, (c) provide prototypical examples of ongoing emergence from infant development, and (d) outline computational issues relevant to creating robots exhibiting ongoing emergence.

    Emerging Linguistic Functions in Early Infancy

    This paper presents results from experimental studies on early language acquisition in infants and attempts to interpret them within the framework of the Ecological Theory of Language Acquisition (ETLA) recently proposed by Lacerda et al. (2004a). From this perspective, the infant’s first steps in acquiring the ambient language are seen as a consequence of the infant’s general capacity to represent sensory input and of the infant’s interaction with other actors in its immediate ecological environment. On the basis of available experimental evidence, it will be argued that ETLA offers a productive alternative to traditional descriptive views of the language acquisition process by presenting an operative model of how early linguistic function may emerge through interaction.

    Learning weakly supervised multimodal phoneme embeddings

    Recent works have explored deep architectures for learning multimodal speech representations (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing lip movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging speech and visual modalities for phone recognition. Mono-task learning consists of applying a Siamese network to the concatenation of the two modalities, while multi-task learning receives several different combinations of modalities at training time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings and show that cross-modal visual input can improve the discriminability of phonological features that are visually discernible (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio alone.
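
    The Siamese setup with same-different side information can be sketched as follows, under loudly stated assumptions: the shared encoder is a hypothetical single linear layer, the mono-task input is the plain concatenation of audio and lip features, and the loss is one generic cosine-based same/different objective ((1 - cos)/2 for "same" pairs, cos² for "different" pairs). The paper's actual architecture and loss may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def encode(features, weights):
    """Hypothetical shared encoder: one linear layer (rows of `weights`)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def mono_task_input(audio, visual):
    """Mono-task setting: concatenate the two modalities into one vector."""
    return audio + visual


def same_different_loss(x1, x2, same, weights):
    """Assumed cosine loss on a pair passed through the same (shared) encoder.

    "Same" pairs are pulled toward cos = 1; "different" pairs toward cos = 0.
    """
    c = cosine(encode(x1, weights), encode(x2, weights))
    return (1.0 - c) / 2.0 if same else c ** 2


# Toy usage with an identity encoder (hypothetical values).
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
tok_a = mono_task_input([1.0, 0.0], [1.0, 0.0])  # a token of word 1
tok_b = mono_task_input([1.0, 0.0], [1.0, 0.0])  # another token of word 1
tok_c = mono_task_input([0.0, 1.0], [0.0, 1.0])  # a token of word 2
print(same_different_loss(tok_a, tok_b, True, W))   # near 0: well-placed "same" pair
print(same_different_loss(tok_a, tok_c, False, W))  # 0.0: orthogonal "different" pair
```

    Because both members of a pair pass through the same `weights`, minimizing this loss over many lexical same/different pairs shapes a single embedding space; the multi-task variant described above would instead feed the encoder several modality combinations during training, which this sketch does not model.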

    Introduction: The Third International Conference on Epigenetic Robotics

    This paper summarizes the paper and poster contributions to the Third International Workshop on Epigenetic Robotics. The focus of this workshop is the cross-disciplinary interaction of developmental psychology and robotics; the general goal in this area is to create robotic models of the psychological development of various behaviors. The term "epigenetic" is used in much the same sense as "developmental", and while we could call our topic "developmental robotics", developmental robotics can be seen as having a broader interdisciplinary emphasis. Our focus in this workshop is on the interaction of developmental psychology and robotics, and we use the phrase "epigenetic robotics" to capture this focus.

    Evaluating computational models of infant phonetic learning across languages

    In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning from naturalistic speech, and tested it on a single phone contrast. Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and test each on three phone contrasts from different languages, comparing the results to infants' discrimination patterns. The five models display varying degrees of agreement with empirical observations, showing that our approach can help decide between candidate mechanisms for early phonetic learning and providing insight into which aspects of the models are critical for capturing infants' perceptual development. Comment: 7 pages, 1 figure
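
    One common way to score whether a model discriminates a phone contrast is an ABX test: for tokens A and X from one category and B from the other, the model succeeds when X is closer to A than to B. The minimal sketch below assumes each token is already a fixed-length vector (real evaluations usually compare frame sequences, e.g. with dynamic time warping, which is omitted here) and uses Euclidean distance; the two categories are hypothetical toy data, and this is not necessarily the exact evaluation used in the paper.

```python
import math


def dist(u, v):
    """Euclidean distance between two token representations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def abx_score(cat_a, cat_b):
    """Fraction of (A, B, X) triplets where X (same category as A) is
    closer to A than to B. 0.5 is chance, 1.0 is perfect; ties count
    as half a success. cat_a needs at least two tokens."""
    total, correct = 0, 0.0
    for i, a in enumerate(cat_a):
        for j, x in enumerate(cat_a):
            if i == j:
                continue  # A and X must be different tokens
            for b in cat_b:
                da, db = dist(a, x), dist(b, x)
                correct += 1.0 if da < db else 0.5 if da == db else 0.0
                total += 1
    return correct / total


def symmetric_abx(cat_a, cat_b):
    """Average the score over both directions of the contrast."""
    return (abx_score(cat_a, cat_b) + abx_score(cat_b, cat_a)) / 2


cat_r = [[0.0, 0.0], [0.1, 0.0]]  # hypothetical tokens of phone /r/
cat_l = [[1.0, 1.0], [1.1, 1.0]]  # hypothetical tokens of phone /l/
print(symmetric_abx(cat_r, cat_l))  # 1.0: categories well separated
```

    A score near 0.5 means the representation does not separate the two phones, while a score near 1.0 means it does; comparing such scores across contrasts and training languages is one way to set a model's behavior beside infants' discrimination patterns.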

    An emergentist perspective on the origin of number sense

    Zorzi, Marco; Testolin, Alberto