Creating and exploiting multimodal annotated corpora
The paper presents a project of the Laboratoire Parole et Langage which aims at collecting, annotating and exploiting a corpus of spoken French in a multimodal perspective. The project directly meets a present need in linguistics, where a growing number of researchers are becoming aware that a theory of communication which aims at describing real interactions must take the complexity of these interactions into account. To do so, however, linguists need access to spoken corpora annotated in different fields. The paper presents the annotation schemes used at the LPL in phonetics, morphology and syntax, prosody, and gestuality, together with the type of linguistic description made from the annotations, illustrated by two examples.
Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods, however, often
require labeled auxiliary data (gender, speaker identity, age, or other emotional
descriptors), which is expensive to collect for emotion recognition. This study
proposes the use of ladder networks for emotion
recognition, which rely on an unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, achieving superior performance to fully
supervised single-task learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus
evaluations, and between 16.1% and 74.1% for cross-corpus evaluations,
highlighting the power of the architecture.
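The concordance correlation coefficient used as the evaluation metric above is a standard, well-defined quantity (Lin's CCC). A minimal sketch of its computation, not the paper's own code, could look like:

```python
from statistics import fmean

def ccc(x, y):
    """Concordance correlation coefficient between predictions x and
    ground-truth attribute scores y (population statistics)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    # CCC penalizes both low correlation and mean/scale mismatch
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Unlike Pearson correlation, CCC also penalizes systematic shifts between the predicted and reference attribute scales, which is why it is the usual metric for dimensional emotion recognition.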
Creating Comparable Multimodal Corpora for Nordic Languages
Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 153-160.
© 2011 The editors and contributors.
Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/16955
Timing Relationships between Speech and Co-Verbal Gestures in Spontaneous French
Several studies have described the links between gesture and speech in terms of timing, most of them concentrating on the production of hand gestures during speech or during pauses (Beattie & Aboudan, 1994; Nobe, 2000). Other studies have focused on the anticipation, synchronization or delay of gestures with respect to their co-occurrence with speech (Schegloff, 1984; McNeill, 1992, 2005; Kipp, 2003; Loehr, 2004; Chui, 2005; Kida & Faraco, 2008; Leonard & Cummins, 2009), and the present paper contributes to this debate. We studied the timing relationships between iconic gestures and their lexical affiliates (Kipp, Neff et al., 2001) in a corpus of French conversational speech involving 6 speakers, annotated both in Praat (Boersma & Weenink, 2009) and Anvil (Kipp, 2001). The timing relationships we observed concerned the position of the gesture stroke relative to that of the lexical affiliate and the Intonation Phrase, as well as the position of the Gesture Phrase relative to that of the Intonation Phrase. The main results show that although gesture and speech co-occur, gestures generally start before the related speech segment.
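The core measurement described above, the offset between a gesture stroke and its lexical affiliate, can be sketched as follows; the function and its interval representation are illustrative, not the authors' actual tooling:

```python
def stroke_affiliate_offsets(strokes, affiliates):
    """For paired (onset, offset) intervals in seconds, return the
    stroke onset minus the affiliate onset for each pair.  A negative
    value means the gesture started before the related speech segment,
    the pattern the abstract reports as the general case."""
    return [s_on - a_on
            for (s_on, _), (a_on, _) in zip(strokes, affiliates)]
```

In practice the interval boundaries would be read from the Praat TextGrid and Anvil annotation tiers rather than passed in by hand.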
Final FLaReNet deliverable: Language Resources for the Future - The Future of Language Resources
Language Technologies (LT), together with their backbone, Language Resources (LR), provide essential support to the challenge of Multilingualism and ICT of the future. The main task of language technologies is to bridge language barriers and to help create a new environment where information flows smoothly across frontiers and languages, whatever the country and language of origin. To achieve this goal, all players involved need to act as a community able to join forces on a set of shared priorities. Until now, however, the field of Language Resources and Technology has suffered from an excess of individuality and fragmentation, lacking coherence concerning the priorities for the field and the direction in which to move, not to mention a common timeframe. The context encountered by the FLaReNet project was thus an active field in need of a coherence that can only come from sharing common priorities and endeavours. FLaReNet has contributed to creating this coherence by gathering a wide community of experts and involving them in the definition of an exhaustive set of recommendations.
Exploiting 'Subjective' Annotations
Many interesting phenomena in conversation can only be annotated as a subjective task, requiring interpretative judgements from annotators. This leads to data annotated with lower levels of agreement, not only due to errors in the annotation but also due to differences in how annotators interpret conversations. This paper constitutes an attempt to find out how subjective annotations with a low level of agreement can profitably be used for machine-learning purposes. We analyse the (dis)agreements between annotators for two different cases in a multimodal annotated corpus and explicitly relate the results to the way machine-learning algorithms perform on the annotated data. Finally, we present two new concepts, namely 'subjective entity' classifiers and 'consensus objective' classifiers, and give recommendations for using subjective data in machine-learning applications.
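Inter-annotator (dis)agreement of the kind analysed above is commonly quantified with chance-corrected measures such as Cohen's kappa; a minimal sketch (illustrative, not necessarily the paper's own agreement analysis) is:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' labels over the same items:
    observed agreement corrected for agreement expected by chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # chance agreement from each annotator's marginal label frequencies
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (observed - expected) / (1 - expected)
```

Low kappa on a subjective task need not mean the annotations are useless; as the abstract argues, systematic annotator differences can themselves be modelled, for instance by training per-annotator classifiers.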