
    The psychological reality of rhythm classes: Perceptual studies

    Linguists have traditionally classified languages into three rhythm classes, namely stress-timed, syllable-timed and mora-timed languages. However, this classification has remained controversial for various reasons: the search for reliable acoustic cues to the different rhythm types has long remained elusive; some languages are claimed to belong to none of the three classes; and few perceptual studies have bolstered the notion. We have previously proposed an acoustic/phonetic model of the different types of linguistic rhythm, and of their categorisation as such by listeners. Here, we present perceptual experiments that directly test the notion of rhythm classes, our model's predictions, and the question of intermediate languages. Language discrimination experiments were run using a speech resynthesis technique to ensure that only rhythmic cues were available to the subjects. The languages investigated were English, Dutch, Spanish, Catalan and Polish. Our results are consistent with the idea that English and Dutch are stress-timed and Spanish and Catalan are syllable-timed, but Polish seems to differ from every other language studied and may thus constitute a new rhythm class. We propose that perceptual studies tapping the ability to discriminate languages' rhythm are the proper way to generate more empirical data relevant to rhythm typology.
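
    To illustrate what "only rhythmic cues" can mean in such resynthesis, here is a toy sketch of one kind of delexicalisation (an assumption for illustration, not necessarily the exact technique used in this work): every consonant is mapped to one consonant and every vowel to one vowel while segment durations are preserved, so segmental and lexical information is removed but the durational pattern remains.

    ```python
    # Toy delexicalisation sketch: the phone inventory, the /s/-/a/ mapping and the
    # (phone, duration) input format are illustrative assumptions, not the paper's code.
    VOWELS = set("aeiouy")

    def delexicalize(segments):
        """segments: list of (phone, duration_ms) pairs for one utterance."""
        # Keep each segment's duration, but collapse its identity to C or V.
        return [("a" if phone.lower() in VOWELS else "s", dur) for phone, dur in segments]

    # delexicalize([("b", 70), ("e", 120), ("b", 60), ("e", 140)])
    # -> [("s", 70), ("a", 120), ("s", 60), ("a", 140)]
    ```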

    Joint influences of libertarian culture and the peasantry

    "All peasants are a bit anarchist," a farmer from Peyrelevade, in Corrèze, told me. What does that mean: that they are all rather individualistic (every man for himself in this barbaric world), or that they refuse to be pushed around? The word "anarchist" is not politically neutral, and I must admit, so as not to mislead you about my own neutrality, that I rather like it. The farmer who used it was, moreover, a member of the Confédération Paysanne, a union that is not in the majority in Limousin, my study region, and that tends to advocate left-wing or ecological positions. I focused in particular on several communes of the Plateau de Millevaches where a libertarian milieu is present; knowing this, hearing the word "anarchist" from a livestock farmer seems less surprising. After explaining why I chose this field site, how a libertarian presence makes itself visible, and how I approached the farmers, I propose to examine how the peasantry can become imbued with the libertarian milieu. These field observations make it possible to analyse one way of being a peasant, a very local, literal version of the definition of "paysan".

    Where is the length effect? A cross-linguistic study.

    Many models of speech production assume that one cannot begin to articulate a word before all its segmental units are inserted into the articulatory plan. Moreover, some of these models assume that segments are serially inserted from left to right. As a consequence, latencies to name words should increase with word length. In a series of five experiments, however, we showed that the time to name a picture or retrieve a word associated with a symbol is not affected by the length of the word. Experiments 1 and 2 used French materials and participants, while Experiments 3, 4, and 5 were conducted with English materials and participants. These results are discussed in relation to current models of speech production, and previous reports of length effects are re-evaluated in light of these findings. We conclude that if words are encoded serially, then articulation can start before an entire phonological word has been encoded.

    A destressing "deafness" in French?

    French is a language in which accent falls obligatorily on the last syllable of every content word. In contrast, Spanish uses accent to distinguish different lexical items (e.g., bébe vs. bebé). Two populations of subjects were tested on the same materials to study whether such linguistic differences have an impact on the perceptual capacities of listeners. In Experiment 1, using an ABX paradigm, we find that French subjects have a surprising deficit compared to Spanish subjects in making accent distinctions. In Experiment 2, we find that Spanish subjects cannot ignore irrelevant differences in accent in a phoneme-based ABX task, whereas French subjects have no difficulty at all. In Experiment 3, we replicate the basic French finding, and find that Spanish subjects benefit from redundant accent information even when phonemic information alone is sufficient to perform the task. In our final Experiment 4, we show that French subjects can hear the acoustic correlates of accent; their problem seems to arise at the level of short-term memory. Implications for language-specific processing and acquisition are discussed.
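
    For readers unfamiliar with the paradigm, the ABX decision can be stated as a toy rule (my illustrative framing, not the authors' experimental scripts): the listener hears stimuli A and B, then a target X, and must say whether X matches A or B; a trial is correct when the response names X's true source.

    ```python
    # Toy ABX scoring only: the trial representation is an illustrative assumption.
    def score_abx(trials):
        """trials: list of (response, true_match) pairs, each 'A' or 'B'."""
        correct = sum(resp == truth for resp, truth in trials)
        return correct / len(trials)  # proportion of correct ABX judgements

    # score_abx([("A", "A"), ("B", "A"), ("B", "B")])  # -> 0.67
    ```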

    Learning weakly supervised multimodal phoneme embeddings

    Recent works have explored deep architectures for learning multimodal speech representations (e.g. audio and images, articulation and audio) in a supervised way. Here we investigate the role of combining different speech modalities, i.e. audio and visual information representing the lip movements, in a weakly supervised way using Siamese networks and lexical same-different side information. In particular, we ask whether one modality can benefit from the other to provide a richer representation for phone recognition in a weakly supervised setting. We introduce mono-task and multi-task methods for merging the speech and visual modalities for phone recognition: mono-task learning consists of applying a Siamese network to the concatenation of the two modalities, while multi-task learning receives several different combinations of modalities at training time. We show that multi-task learning enhances discriminability for visual and multimodal inputs while minimally impacting auditory inputs. Furthermore, we present a qualitative analysis of the obtained phone embeddings, and show that cross-modal visual input can improve the discriminability of phonological features that are visually discernible (rounding, open/close, labial place of articulation), resulting in representations that are closer to abstract linguistic features than those based on audio only.
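
    A minimal sketch of the mono-task idea, under stated assumptions (the feature dimensions, layer sizes and the contrastive margin loss are mine, not taken from the paper): a single encoder is applied to the concatenated audio and lip features of two word tokens, and the lexical same/different label drives the loss.

    ```python
    # Sketch only: dimensions, layers and margin are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SiameseEncoder(nn.Module):
        def __init__(self, audio_dim=40, visual_dim=20, embed_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(audio_dim + visual_dim, 500), nn.ReLU(),
                nn.Linear(500, embed_dim),
            )

        def forward(self, audio, visual):
            # Mono-task setting: the two modalities are concatenated before encoding.
            return self.net(torch.cat([audio, visual], dim=-1))

    def contrastive_loss(z1, z2, same, margin=1.0):
        """z1, z2: embeddings of two word tokens; same: 1.0 if same word, else 0.0."""
        d = 1 - F.cosine_similarity(z1, z2, dim=-1)
        # Pull same-word pairs together, push different-word pairs beyond the margin.
        return (same * d + (1 - same) * F.relu(margin - d)).mean()
    ```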

    Occlusion resistant learning of intuitive physics from videos

    To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive-physics benchmark. We apply our method to a second dataset with increasing levels of occlusion, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.
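
    As a rough illustration of the interaction-network component (the state and effect sizes, the MLP shapes, and the absence of a recurrent state and of the renderer are all simplifying assumptions of mine, not the paper's architecture), one propagation step computes pairwise effects between object states, sums the incoming effects per object, and updates each object's latent state.

    ```python
    # Rough sketch: recurrence over time and occlusion-aware rendering are omitted.
    import torch
    import torch.nn as nn

    class InteractionStep(nn.Module):
        def __init__(self, state_dim=4, effect_dim=32):
            super().__init__()
            self.relation = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(),
                                          nn.Linear(64, effect_dim))
            self.update = nn.Sequential(nn.Linear(state_dim + effect_dim, 64), nn.ReLU(),
                                        nn.Linear(64, state_dim))

        def forward(self, states):  # states: (n_objects, state_dim) latent object states
            n = states.size(0)
            senders = states.unsqueeze(0).expand(n, n, -1)    # state of object j, for every i
            receivers = states.unsqueeze(1).expand(n, n, -1)  # state of object i, repeated
            effects = self.relation(torch.cat([receivers, senders], dim=-1))
            effects = effects.sum(dim=1)                      # aggregate incoming effects per object
            return states + self.update(torch.cat([states, effects], dim=-1))
    ```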

    Epenthetic vowels in Japanese: A perceptual illusion?

    In four cross-linguistic experiments comparing French and Japanese hearers, we found that the phonotactic properties of Japanese (a very reduced set of syllable types) induce Japanese listeners to perceive "illusory" vowels inside consonant clusters in VCCV stimuli. In Experiments 1 and 2, we used a continuum of stimuli ranging from no vowel (e.g. ebzo) to a full vowel between the consonants (e.g. ebuzo). Japanese, but not French, participants reported the presence of a vowel [u] between the consonants, even in stimuli with no vowel. A speeded ABX discrimination paradigm was used in Experiments 3 and 4, and revealed that Japanese participants had trouble discriminating between VCCV and VCuCV stimuli. French participants, in contrast, had problems discriminating items that differ in vowel length (ebuzo vs. ebuuzo), a distinctive contrast in Japanese but not in French. We conclude that models of speech perception have to be revised to account for phonotactically based assimilations.

    A Temporal Coherence Loss Function for Learning Unsupervised Acoustic Embeddings

    We train neural networks of varying depth with a loss function that constrains the output representations to have a temporal profile resembling that of phonemes. We show that a simple loss function which maximizes the dissimilarity between nearby frames and long-distance frames helps to construct a speech embedding that improves phoneme discriminability, both within and across speakers, even though the loss function only uses within-speaker information. However, with too deep an architecture, this loss function leads to overfitting, suggesting the need for more data and/or regularization.
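
    A minimal sketch of such a loss, under assumptions of mine (the abstract does not give the exact form, so the frame offsets, cosine distance and hinge margin are illustrative): each frame embedding is kept close to its immediate neighbour and pushed away from a frame several steps later.

    ```python
    # Sketch only: offsets, distance and margin are assumptions, not the paper's values.
    import torch
    import torch.nn.functional as F

    def temporal_coherence_loss(embeddings, near_offset=1, far_offset=15, margin=0.5):
        """embeddings: (T, D) frame-level network outputs for one utterance (T > far_offset)."""
        T = embeddings.size(0)
        anchors = embeddings[: T - far_offset]
        near = embeddings[near_offset : T - far_offset + near_offset]  # neighbouring frames
        far = embeddings[far_offset:]                                  # distant frames
        d_near = 1 - F.cosine_similarity(anchors, near, dim=1)
        d_far = 1 - F.cosine_similarity(anchors, far, dim=1)
        # Keep neighbours closer than distant frames by at least the margin.
        return torch.relu(margin + d_near - d_far).mean()
    ```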

    Perceptual adjustment to time-compressed speech: A cross-linguistic study

    Previous research has shown that, when hearers listen to artificially speeded speech, their performance improves over the course of 10-15 sentences, as if their perceptual system were "adapting" to these fast rates of speech. In this paper, we further investigate the mechanisms responsible for such effects. In Experiment 1, we report that, for bilingual speakers of Catalan and Spanish, exposure to compressed sentences in either language improves performance on sentences in the other language. Experiment 2 reports that this Catalan/Spanish transfer of performance occurs even in monolingual speakers of Spanish who do not understand Catalan. In Experiment 3, we study another pair of languages, namely English and French, and report no transfer of adaptation between these two languages for English-French bilinguals. Experiment 4, with monolingual English speakers, assesses transfer of adaptation from French, Dutch, and English toward English. Here we find that there is no adaptation from French and intermediate adaptation from Dutch. We discuss the locus of the adaptation to compressed speech and relate our findings to other cross-linguistic studies in speech perception.
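
    For concreteness, here is a minimal sketch of how time-compressed stimuli can be produced with current tools (a generic pitch-preserving time stretch using librosa; the compression rate and the tooling are my assumptions, not the resynthesis method used in the original experiments).

    ```python
    # Pitch-preserving time compression; the 45% target duration is an illustrative choice.
    import librosa
    import soundfile as sf

    y, sr = librosa.load("sentence.wav", sr=None)                 # original sentence
    compressed = librosa.effects.time_stretch(y, rate=1 / 0.45)   # rate > 1 shortens the signal
    sf.write("sentence_compressed.wav", compressed, sr)           # ~45% of the original duration
    ```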

    Developmental Psychology: A Precursor of Moral Judgment in Human Infants?

    Human infants evaluate social interactions well before they can speak, and they show a preference for characters that help others over characters that are uncooperative or that hinder others.