Analysis of Information and Mathematical Support for the Recognition of Human Affective States
This article presents an analytical survey of research in the field of affective computing. This area, a branch of artificial intelligence, studies methods, algorithms, and systems for analyzing the affective states of a person interacting with other people, computer systems, or robots. In data mining, affect is understood as the manifestation of psychological reactions to a triggering event; these reactions may unfold over a short or a long period and vary in the intensity of the experience. Affects in this field are divided into four types: affective emotions, basic emotions, moods, and affective disorders. Affective states manifest themselves in verbal data and in nonverbal characteristics of behavior: the acoustic and linguistic features of speech, facial expressions, gestures, and body postures. The survey offers a comparative analysis of the existing information resources for the automatic recognition of human affective states, using emotions, sentiment, aggression, and depression as examples. The few existing Russian-language affective databases are still substantially inferior in volume and quality to electronic resources in other world languages. This makes it necessary to consider a wide range of additional approaches, methods, and algorithms applicable under conditions of limited training and test data, and it poses the problem of developing new approaches to data augmentation, transfer learning of models, and adaptation of foreign-language resources. The article describes methods for analyzing unimodal visual, acoustic, and linguistic information, as well as multimodal approaches to the recognition of affective states. A multimodal approach to the automatic analysis of affective states improves recognition accuracy relative to unimodal solutions. The survey also notes a trend in current research: neural network methods are gradually displacing classical deterministic methods thanks to better recognition quality and efficient processing of large volumes of data. An advantage of multi-task hierarchical approaches is the ability to extract new types of knowledge, including knowledge about the influence, correlation, and interaction of several affective states with one another, which can potentially improve recognition quality. Finally, potential requirements for affective state analysis systems under development and the main directions of further research are presented.
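As a concrete illustration of the multi-task hierarchical approaches mentioned above, the following minimal PyTorch sketch shows a shared encoder with one classification head per affective state. All dimensions, task names, and class counts are illustrative assumptions, not details taken from any system described in the survey.

```python
# Minimal sketch of a multi-task affective-state classifier: one shared
# encoder, one head per affect type. Dimensions and tasks are assumptions.
import torch
import torch.nn as nn

class MultiTaskAffectModel(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64,
                 n_emotions=7, n_sentiment=3, n_aggression=2):
        super().__init__()
        # Shared representation learned jointly across all affect tasks
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Separate heads predict each affective state from the shared code
        self.emotion_head = nn.Linear(hidden_dim, n_emotions)
        self.sentiment_head = nn.Linear(hidden_dim, n_sentiment)
        self.aggression_head = nn.Linear(hidden_dim, n_aggression)

    def forward(self, x):
        z = self.encoder(x)
        return {
            "emotion": self.emotion_head(z),
            "sentiment": self.sentiment_head(z),
            "aggression": self.aggression_head(z),
        }

model = MultiTaskAffectModel()
features = torch.randn(4, 128)   # batch of 4 utterance-level feature vectors
logits = model(features)         # dict of per-task logits
# Training would minimize a weighted sum of per-task cross-entropy losses
```

Training such a model against a joint loss forces the shared encoder to capture correlations between affective states, which is the kind of cross-affect knowledge the survey refers to.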
A Neural Network Architecture for Children's Audio-Visual Emotion Recognition
Detecting and understanding emotions are critical for our daily activities. As emotion recognition (ER) systems develop, we start looking at more difficult cases than just acted adult audio-visual speech. In this work, we investigate the automatic classification of the audio-visual emotional speech of children, which presents several challenges, including the lack of publicly available annotated datasets and the low performance of state-of-the-art audio-visual ER systems. In this paper, we present a new corpus of children's audio-visual emotional speech that we collected. We then propose a neural network solution that improves the utilization of the temporal relationships between the audio and video modalities in cross-modal fusion for children's audio-visual emotion recognition. We select a state-of-the-art neural network architecture as a baseline and present several modifications focused on deeper learning of the cross-modal temporal relationships using attention. In experiments with our proposed approach and the selected baseline model, we observe a relative improvement in performance of 2%. Finally, we conclude that focusing more on the cross-modal temporal relationships may be beneficial for building ER systems for child-machine communication and for environments where qualified professionals work with children.
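The following is a minimal sketch of the kind of attention-based cross-modal fusion described above, where frames of one modality attend over the other. The layer sizes, sequence lengths, and the audio-queries-video direction are assumptions for illustration, not the paper's actual architecture.

```python
# Illustrative cross-modal attention between audio and video sequences.
# Shapes and the fusion direction are assumptions, not the paper's design.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_seq, video_seq):
        # Query: audio frames; key/value: video frames.
        # Each audio frame is enriched with temporally relevant visual context.
        fused, _ = self.attn(audio_seq, video_seq, video_seq)
        return self.norm(audio_seq + fused)  # residual connection

audio = torch.randn(2, 100, 256)  # (batch, audio frames, feature dim)
video = torch.randn(2, 25, 256)   # (batch, video frames, feature dim)
out = CrossModalAttention()(audio, video)  # shape: (2, 100, 256)
```

A symmetric block with video as the query could be stacked alongside, so that each modality learns which moments of the other one matter.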
Automatic Speech Emotion Recognition of Younger School Age Children
This paper provides an extended description of a database of emotional speech in the Russian language from younger school age (8–12-year-old) children and reports the results of validating the database with classical machine learning algorithms, such as Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP). The validation follows standard procedures and scenarios similar to those used for other well-known databases of children's acted emotional speech. Performance evaluation of automatic multiclass recognition on the four emotion classes (Neutral/Calm, Joy, Sadness, Anger) shows that both SVM and MLP outperform the results of perceptual tests. Moreover, the results of automatic recognition on the test dataset that was used in the perceptual test are even better. These results prove that the emotions in the database can be reliably recognized both by experts and automatically, using classical machine learning algorithms such as SVM and MLP, which can serve as baselines for comparing emotion recognition systems based on more sophisticated modern machine learning methods and deep neural networks. The results also confirm that this database can be a valuable resource for researchers studying affective reactions in speech communication during child-computer interaction in the Russian language and can be used to develop various applications in edutainment, health care, and other areas.
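Below is a hedged sketch of the classical validation setup described above, with SVM and MLP classifiers over four emotion classes. The synthetic data and the 88-dimensional feature vectors (an eGeMAPS-style size) stand in for the real acoustic features and corpus, which are not reproduced here.

```python
# Sketch of an SVM/MLP baseline for 4-class emotion recognition on
# utterance-level acoustic features. Data here is synthetic placeholder.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 88))      # placeholder acoustic feature vectors
y = rng.integers(0, 4, size=400)    # 0=Neutral(Calm), 1=Joy, 2=Sadness, 3=Anger

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [
    ("SVM", make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))),
    ("MLP", make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(64,),
                                        max_iter=500, random_state=0))),
]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))
```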
Strategies of Speech Interaction between Adults and Preschool Children with Typical and Atypical Development
The goal of this research is to study the speech strategies of adults' interactions with 4–7-year-old children. The participants are “mother–child” dyads with typically developing (TD, n = 40) children, children with autism spectrum disorders (ASD, n = 20) and Down syndrome (DS, n = 10), and “experimenter–orphan” pairs (n = 20). Spectrographic, linguistic, phonetic, and perceptual analyses (n = 465 listeners) of children's speech and mothers' speech (MS) are performed. Listeners (n = 10) analyze the audio recordings, and experts (n = 5) analyze elements of nonverbal behavior from the video recordings. Differences are revealed in the speech behavior strategies of mothers during interactions with TD children, children with ASD, and children with DS. Different strategies of “mother–child” interaction are described, depending on the severity of the child's developmental disorders and the child's age. The features of MS addressed to TD children with low levels of speech formation are also used in MS directed to children with atypical development. The acoustic features of MS that correlate with a high level of speech development in TD children do not show a similar correlation in dyads with ASD and DS children. The perceptual and phonetic features of the speech of children in all groups are described.
Emotion, age, and gender classification in children's speech by humans and machines
In this article, we present the first child emotional speech corpus in Russian, called EmoChildRu, collected from 3- to 7-year-old children. The base corpus includes over 20,000 recordings (approximately 30 h) collected from 120 children. Audio recordings were carried out in three controlled settings that create different emotional states in children: playing with a standard set of toys; repeating words after a toy parrot in a game-store setting; and watching a cartoon and retelling the story. The corpus is designed to study how emotional state is reflected in the characteristics of voice and speech and to support studies of the formation of emotional states in ontogenesis. A portion of the corpus is annotated for three emotional states (comfort, discomfort, neutral). Additional data include the results of adult listeners' analysis of child speech, questionnaires, and annotation for gender and age in months. We also provide several baselines comparing human and machine estimation on this corpus for the prediction of age, gender, and comfort state. While the acoustics-based automatic systems show higher performance in age estimation, they do not reach human perception levels in comfort state and gender classification. The comparative results indicate the importance and necessity of developing further linguistic models for discrimination. The work was supported by the Russian Foundation for Basic Research (grant nos. 10-00-000.24, 15-06-07852, and 16-37-60100), the Russian Foundation for Basic Research DHSS (grant no. 17-06-00503), a grant of the President of Russia (project no. MD-254.2017.8), the Government of Russia (grant no. 074-U01), Bogazici University (project BAP 16A01P4), and the BAGEP Award of the Science Academy.
Bridging Social Sciences and AI for Understanding Child Behaviour
Child behaviour is a topic of wide scientific interest among many different disciplines, including the social and behavioural sciences and artificial intelligence (AI). In this workshop, we aimed to connect researchers from these fields to address topics such as the use of AI to better understand and model child behavioural and developmental processes, challenges and opportunities for AI in large-scale child behaviour analysis, and implementing explainable ML/AI on sensitive child data. The workshop served as a successful first step towards this goal and attracted contributions from different research disciplines on the analysis of child behaviour. This paper provides a summary of the activities of the workshop and of the accepted papers and abstracts.