11 research outputs found

    An Analytical Review of Audio-Visual Systems for Detecting Personal Protective Equipment on the Human Face

    Since 2019, countries around the world have faced the rapid spread of the pandemic caused by the COVID-19 coronavirus infection, which the global community continues to fight to this day. Despite the evident effectiveness of personal respiratory protective equipment against coronavirus infection, many people neglect to wear protective face masks in public places. Therefore, to monitor compliance and promptly detect violations of public health rules, modern information technologies are needed that can detect protective masks on people's faces from video and audio information. The article presents an analytical review of existing and emerging intelligent information technologies for bimodal analysis of the voice and facial characteristics of a person wearing a mask. There are many studies on mask detection from video images, and a significant number of publicly available corpora contain images of faces both with and without masks, collected in various ways. Research and development aimed at detecting personal respiratory protective equipment from the acoustic characteristics of human speech is still rather scarce, since this direction only began to develop during the pandemic caused by the COVID-19 coronavirus infection. Existing systems help prevent the spread of coronavirus infection by recognizing the presence or absence of face masks, and such systems also assist in the remote diagnosis of COVID-19 by detecting the first symptoms of the viral infection from acoustic characteristics. However, a number of problems in the automatic diagnosis of COVID-19 symptoms and the detection of masks on people's faces remain unsolved. First of all, this is the low accuracy of mask and coronavirus detection, which prevents automatic diagnosis without the presence of experts (medical personnel). Many systems cannot operate in real time, making it impossible to control and monitor the wearing of protective masks in public places. Furthermore, most existing systems cannot be embedded into a smartphone so that users can test for coronavirus infection anywhere. Another major problem is collecting data from patients infected with COVID-19, since many people are unwilling to share confidential information.

    Neural network-based method for visual recognition of driver’s voice commands using attention mechanism

    Visual speech recognition, or automated lip-reading, is actively applied to speech-to-text translation. Video data proves useful in multimodal speech recognition systems, particularly when acoustic data is degraded or not available at all. The main purpose of this study is to improve driver command recognition by analyzing visual information, so as to reduce touch interaction with various vehicle systems (multimedia and navigation systems, phone calls, etc.) while driving. We propose a method for automated lip-reading of the driver's speech while driving, based on a deep neural network with the 3DResNet18 architecture. Extending the network with a bidirectional LSTM model and an attention mechanism achieves higher recognition accuracy at a slight cost in performance. Two variants of neural network architectures for visual speech recognition are proposed and investigated. The first architecture recognized the driver's voice commands with an accuracy of 77.68 %, which is 5.78 % lower than the second architecture's 83.46 %. System performance, measured by the real-time factor (RTF), is 0.076 for the first architecture and 0.183 for the second, more than twice as high. The proposed method was tested on data from the multimodal RUSAVIC corpus recorded in a car. The results of the study can be used in audio-visual speech recognition systems, which are recommended in high-noise conditions, for example, when driving a vehicle. In addition, the analysis performed allows choosing the optimal visual speech recognition model for subsequent incorporation into an assistive system based on a mobile device.
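
    For reference, the real-time factor (RTF) cited above is conventionally computed as processing time divided by the duration of the processed signal, so both variants run faster than real time (RTF < 1). The following minimal PyTorch sketch shows the general shape of such a visual speech recognizer (a 3D-CNN frontend, a bidirectional LSTM, and attention pooling); the layer sizes, kernel shapes, and command vocabulary are illustrative assumptions, not the exact 3DResNet18 configuration evaluated in the paper.

```python
# Minimal sketch of a lip-reading command classifier: 3D-CNN frontend,
# bidirectional LSTM, additive attention pooling. All sizes are assumptions.
import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    def __init__(self, num_commands: int = 62, hidden: int = 256):
        super().__init__()
        # Spatio-temporal frontend over grayscale mouth-region crops.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 4, 4)),  # keep the time axis, pool space
        )
        self.lstm = nn.LSTM(64 * 4 * 4, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)     # additive attention scores
        self.head = nn.Linear(2 * hidden, num_commands)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, 1, frames, height, width)
        feats = self.frontend(clips)                    # (B, C, T, 4, 4)
        b, c, t, h, w = feats.shape
        seq = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.lstm(seq)                         # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(out), dim=1)  # (B, T, 1)
        pooled = (weights * out).sum(dim=1)             # attention pooling over time
        return self.head(pooled)                        # command logits

model = LipReadingNet()
logits = model(torch.randn(2, 1, 29, 88, 88))  # e.g. two 29-frame mouth crops
print(logits.shape)  # torch.Size([2, 62])
```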

    Analysis of Information and Mathematical Support for Recognition of Human Affective States

    Π’ ΡΡ‚Π°Ρ‚ΡŒΠ΅ прСдставлСн аналитичСский ΠΎΠ±Π·ΠΎΡ€ исслСдований Π² области Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… вычислСний. Π­Ρ‚ΠΎ Π½Π°ΠΏΡ€Π°Π²Π»Π΅Π½ΠΈΠ΅ являСтся ΡΠΎΡΡ‚Π°Π²Π»ΡΡŽΡ‰Π΅ΠΉ искусствСнного ΠΈΠ½Ρ‚Π΅Π»Π»Π΅ΠΊΡ‚Π°, ΠΈ ΠΈΠ·ΡƒΡ‡Π°Π΅Ρ‚ ΠΌΠ΅Ρ‚ΠΎΠ΄Ρ‹, Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΡ‹ ΠΈ систСмы для Π°Π½Π°Π»ΠΈΠ·Π° Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний Ρ‡Π΅Π»ΠΎΠ²Π΅ΠΊΠ° ΠΏΡ€ΠΈ Π΅Π³ΠΎ взаимодСйствии с Π΄Ρ€ΡƒΠ³ΠΈΠΌΠΈ людьми, ΠΊΠΎΠΌΠΏΡŒΡŽΡ‚Π΅Ρ€Π½Ρ‹ΠΌΠΈ систСмами ΠΈΠ»ΠΈ Ρ€ΠΎΠ±ΠΎΡ‚Π°ΠΌΠΈ. Π’ области ΠΈΠ½Ρ‚Π΅Π»Π»Π΅ΠΊΡ‚ΡƒΠ°Π»ΡŒΠ½ΠΎΠ³ΠΎ Π°Π½Π°Π»ΠΈΠ·Π° Π΄Π°Π½Π½Ρ‹Ρ… ΠΏΠΎΠ΄ Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΎΠΌ подразумСваСтся проявлСниС психологичСских Ρ€Π΅Π°ΠΊΡ†ΠΈΠΉ Π½Π° Π²ΠΎΠ·Π±ΡƒΠΆΠ΄Π°Π΅ΠΌΠΎΠ΅ событиС, ΠΊΠΎΡ‚ΠΎΡ€ΠΎΠ΅ ΠΌΠΎΠΆΠ΅Ρ‚ ΠΏΡ€ΠΎΡ‚Π΅ΠΊΠ°Ρ‚ΡŒ ΠΊΠ°ΠΊ Π² краткосрочном, Ρ‚Π°ΠΊ ΠΈ Π² долгосрочном ΠΏΠ΅Ρ€ΠΈΠΎΠ΄Π΅, Π° Ρ‚Π°ΠΊΠΆΠ΅ ΠΈΠΌΠ΅Ρ‚ΡŒ Ρ€Π°Π·Π»ΠΈΡ‡Π½ΡƒΡŽ ΠΈΠ½Ρ‚Π΅Π½ΡΠΈΠ²Π½ΠΎΡΡ‚ΡŒ ΠΏΠ΅Ρ€Π΅ΠΆΠΈΠ²Π°Π½ΠΈΠΉ. АффСкты Π² рассматриваСмой области Ρ€Π°Π·Π΄Π΅Π»Π΅Π½Ρ‹ Π½Π° 4 Π²ΠΈΠ΄Π°: Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Π΅ эмоции, Π±Π°Π·ΠΎΠ²Ρ‹Π΅ эмоции, настроСниС ΠΈ Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Π΅ расстройства. ΠŸΡ€ΠΎΡΠ²Π»Π΅Π½ΠΈΠ΅ Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний отраТаСтся Π² Π²Π΅Ρ€Π±Π°Π»ΡŒΠ½Ρ‹Ρ… Π΄Π°Π½Π½Ρ‹Ρ… ΠΈ Π½Π΅Π²Π΅Ρ€Π±Π°Π»ΡŒΠ½Ρ‹Ρ… характСристиках повСдСния: акустичСских ΠΈ лингвистичСских характСристиках Ρ€Π΅Ρ‡ΠΈ, ΠΌΠΈΠΌΠΈΠΊΠ΅, ТСстах ΠΈ ΠΏΠΎΠ·Π°Ρ… Ρ‡Π΅Π»ΠΎΠ²Π΅ΠΊΠ°. Π’ ΠΎΠ±Π·ΠΎΡ€Π΅ приводится ΡΡ€Π°Π²Π½ΠΈΡ‚Π΅Π»ΡŒΠ½Ρ‹ΠΉ Π°Π½Π°Π»ΠΈΠ· ΡΡƒΡ‰Π΅ΡΡ‚Π²ΡƒΡŽΡ‰Π΅Π³ΠΎ ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΠΎΠ½Π½ΠΎΠ³ΠΎ обСспСчСния для автоматичСского распознавания Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний Ρ‡Π΅Π»ΠΎΠ²Π΅ΠΊΠ° Π½Π° ΠΏΡ€ΠΈΠΌΠ΅Ρ€Π΅ эмоций, сСнтимСнта, агрСссии ΠΈ дСпрСссии. НСмногочислСнныС русскоязычныС Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Π΅ Π±Π°Π·Ρ‹ Π΄Π°Π½Π½Ρ‹Ρ… ΠΏΠΎΠΊΠ° сущСствСнно ΡƒΡΡ‚ΡƒΠΏΠ°ΡŽΡ‚ ΠΏΠΎ ΠΎΠ±ΡŠΠ΅ΠΌΡƒ ΠΈ качСству элСктронным рСсурсам Π½Π° Π΄Ρ€ΡƒΠ³ΠΈΡ… ΠΌΠΈΡ€ΠΎΠ²Ρ‹Ρ… языках, Ρ‡Ρ‚ΠΎ обуславливаСт Π½Π΅ΠΎΠ±Ρ…ΠΎΠ΄ΠΈΠΌΠΎΡΡ‚ΡŒ рассмотрСния ΡˆΠΈΡ€ΠΎΠΊΠΎΠ³ΠΎ спСктра Π΄ΠΎΠΏΠΎΠ»Π½ΠΈΡ‚Π΅Π»ΡŒΠ½Ρ‹Ρ… ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ΠΎΠ², ΠΌΠ΅Ρ‚ΠΎΠ΄ΠΎΠ² ΠΈ Π°Π»Π³ΠΎΡ€ΠΈΡ‚ΠΌΠΎΠ², примСняСмых Π² условиях ΠΎΠ³Ρ€Π°Π½ΠΈΡ‡Π΅Π½Π½ΠΎΠ³ΠΎ объСма ΠΎΠ±ΡƒΡ‡Π°ΡŽΡ‰ΠΈΡ… ΠΈ тСстовых Π΄Π°Π½Π½Ρ‹Ρ…, ΠΈ ставит Π·Π°Π΄Π°Ρ‡Ρƒ Ρ€Π°Π·Ρ€Π°Π±ΠΎΡ‚ΠΊΠΈ Π½ΠΎΠ²Ρ‹Ρ… ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ΠΎΠ² ΠΊ Π°ΡƒΠ³ΠΌΠ΅Π½Ρ‚Π°Ρ†ΠΈΠΈ Π΄Π°Π½Π½Ρ‹Ρ…, пСрСносу обучСния ΠΌΠΎΠ΄Π΅Π»Π΅ΠΉ ΠΈ Π°Π΄Π°ΠΏΡ‚Π°Ρ†ΠΈΠΈ иноязычных рСсурсов. Π’ ΡΡ‚Π°Ρ‚ΡŒΠ΅ приводится описаниС ΠΌΠ΅Ρ‚ΠΎΠ΄ΠΎΠ² Π°Π½Π°Π»ΠΈΠ·Π° одномодальной Π²ΠΈΠ·ΡƒΠ°Π»ΡŒΠ½ΠΎΠΉ, акустичСской ΠΈ лингвистичСской ΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΠΈ, Π° Ρ‚Π°ΠΊΠΆΠ΅ ΠΌΠ½ΠΎΠ³ΠΎΠΌΠΎΠ΄Π°Π»ΡŒΠ½Ρ‹Ρ… ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ΠΎΠ² ΠΊ Ρ€Π°ΡΠΏΠΎΠ·Π½Π°Π²Π°Π½ΠΈΡŽ Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний. ΠœΠ½ΠΎΠ³ΠΎΠΌΠΎΠ΄Π°Π»ΡŒΠ½Ρ‹ΠΉ ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ ΠΊ автоматичСскому Π°Π½Π°Π»ΠΈΠ·Ρƒ Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний позволяСт ΠΏΠΎΠ²Ρ‹ΡΠΈΡ‚ΡŒ Ρ‚ΠΎΡ‡Π½ΠΎΡΡ‚ΡŒ распознавания рассматриваСмых явлСний ΠΎΡ‚Π½ΠΎΡΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎ ΠΎΠ΄Π½ΠΎΠΌΠΎΠ΄Π°Π»ΡŒΠ½Ρ‹Ρ… Ρ€Π΅ΡˆΠ΅Π½ΠΈΠΉ. Π’ ΠΎΠ±Π·ΠΎΡ€Π΅ ΠΎΡ‚ΠΌΠ΅Ρ‡Π΅Π½Π° тСндСнция соврСмСнных исслСдований, Π·Π°ΠΊΠ»ΡŽΡ‡Π°ΡŽΡ‰Π°ΡΡΡ Π² Ρ‚ΠΎΠΌ, Ρ‡Ρ‚ΠΎ нСйросСтСвыС ΠΌΠ΅Ρ‚ΠΎΠ΄Ρ‹ постСпСнно Π²Ρ‹Ρ‚Π΅ΡΠ½ΡΡŽΡ‚ классичСскиС Π΄Π΅Ρ‚Π΅Ρ€ΠΌΠΈΠ½ΠΈΡ€ΠΎΠ²Π°Π½Π½Ρ‹Π΅ ΠΌΠ΅Ρ‚ΠΎΠ΄Ρ‹ благодаря Π»ΡƒΡ‡ΡˆΠ΅ΠΌΡƒ качСству распознавания состояний ΠΈ ΠΎΠΏΠ΅Ρ€Π°Ρ‚ΠΈΠ²Π½ΠΎΠΉ ΠΎΠ±Ρ€Π°Π±ΠΎΡ‚ΠΊΠ΅ большого объСма Π΄Π°Π½Π½Ρ‹Ρ…. Π’ ΡΡ‚Π°Ρ‚ΡŒΠ΅ Ρ€Π°ΡΡΠΌΠ°Ρ‚Ρ€ΠΈΠ²Π°ΡŽΡ‚ΡΡ ΠΌΠ΅Ρ‚ΠΎΠ΄Ρ‹ Π°Π½Π°Π»ΠΈΠ·Π° Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний. 
ΠŸΡ€Π΅ΠΈΠΌΡƒΡ‰Π΅ΡΡ‚Π²ΠΎΠΌ использования ΠΌΠ½ΠΎΠ³ΠΎΠ·Π°Π΄Π°Ρ‡Π½Ρ‹Ρ… иСрархичСских ΠΏΠΎΠ΄Ρ…ΠΎΠ΄ΠΎΠ² являСтся Π²ΠΎΠ·ΠΌΠΎΠΆΠ½ΠΎΡΡ‚ΡŒ ΠΈΠ·Π²Π»Π΅ΠΊΠ°Ρ‚ΡŒ Π½ΠΎΠ²Ρ‹Π΅ Ρ‚ΠΈΠΏΡ‹ Π·Π½Π°Π½ΠΈΠΉ, Π² Ρ‚ΠΎΠΌ числС ΠΎ влиянии, коррСляции ΠΈ взаимодСйствии Π½Π΅ΡΠΊΠΎΠ»ΡŒΠΊΠΈΡ… Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний Π΄Ρ€ΡƒΠ³ Π½Π° Π΄Ρ€ΡƒΠ³Π°, Ρ‡Ρ‚ΠΎ ΠΏΠΎΡ‚Π΅Π½Ρ†ΠΈΠ°Π»ΡŒΠ½ΠΎ Π²Π»Π΅Ρ‡Π΅Ρ‚ ΠΊ ΡƒΠ»ΡƒΡ‡ΡˆΠ΅Π½ΠΈΡŽ качСства распознавания. ΠŸΡ€ΠΈΠ²ΠΎΠ΄ΡΡ‚ΡΡ ΠΏΠΎΡ‚Π΅Π½Ρ†ΠΈΠ°Π»ΡŒΠ½Ρ‹Π΅ трСбования ΠΊ Ρ€Π°Π·Ρ€Π°Π±Π°Ρ‚Ρ‹Π²Π°Π΅ΠΌΡ‹ΠΌ систСмам Π°Π½Π°Π»ΠΈΠ·Π° Π°Ρ„Ρ„Π΅ΠΊΡ‚ΠΈΠ²Π½Ρ‹Ρ… состояний ΠΈ основныС направлСния Π΄Π°Π»ΡŒΠ½Π΅ΠΉΡˆΠΈΡ… исслСдований

    A Multimodal User Interface for an Assistive Robotic Shopping Cart

    This paper presents the research and development of the prototype of the assistive mobile information robot (AMIR). The main features of the presented prototype are voice and gesture-based interfaces with Russian speech and sign language recognition and synthesis techniques, and a high degree of robot autonomy. The AMIR prototype is intended to be used as a robotic cart for shopping in grocery stores and/or supermarkets. Among the main topics covered in this paper are the presentation of the interface (three modalities), the single-handed gesture recognition system (based on a collected database of Russian sign language elements), and the technical description of the robotic platform (architecture, navigation algorithm). The use of multimodal interfaces, namely the speech and gesture modalities, makes human-robot interaction natural and intuitive, while sign language recognition allows hearing-impaired people to use this robotic cart. The AMIR prototype has promising prospects for real usage in supermarkets, both due to its assistive capabilities and its multimodal user interface.

    Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices

    Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recognition. Hand gestures are a form of non-verbal communication and can be used as a very important part of modern human–computer interaction systems. Currently, audio and video modalities are easily accessible by sensors of mobile devices. However, there is no out-of-the-box solution for automatic audio-visual speech and gesture recognition. This study introduces two deep neural network-based model architectures: one for AVSR and one for gesture recognition. The main novelty regarding audio-visual speech recognition lies in fine-tuning strategies for both visual and acoustic features and in the proposed end-to-end model, which considers three modality fusion approaches: prediction-level, feature-level, and model-level. The main novelty in gesture recognition lies in a unique set of spatio-temporal features, including those that consider lip articulation information. As there are no available datasets for the combined task, we evaluated our methods on two different large-scale corpora—LRW and AUTSL—and outperformed existing methods on both audio-visual speech recognition and gesture recognition tasks. We achieved AVSR accuracy for the LRW dataset equal to 98.76% and gesture recognition rate for the AUTSL dataset equal to 98.56%. The results obtained demonstrate not only the high performance of the proposed methodology, but also the fundamental possibility of recognizing audio-visual speech and gestures by sensors of mobile devices.
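
    The three fusion approaches named above can be contrasted in a short sketch. The PyTorch code below abstracts each modality encoder into a fixed-size feature vector; all dimensions, the transformer mixing block, and the class count are illustrative assumptions rather than the models proposed in the paper.

```python
# Sketches of prediction-level (late), feature-level (early), and
# model-level (intermediate) modality fusion. Sizes are placeholders.
import torch
import torch.nn as nn

AUDIO_DIM, VIDEO_DIM, NUM_CLASSES = 128, 256, 500

def prediction_level_fusion(p_audio: torch.Tensor, p_video: torch.Tensor,
                            w: float = 0.5) -> torch.Tensor:
    # Late fusion: weight and combine per-modality class probabilities.
    return w * p_audio + (1.0 - w) * p_video

class FeatureLevelFusion(nn.Module):
    # Early fusion: concatenate modality features, classify jointly.
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(AUDIO_DIM + VIDEO_DIM, NUM_CLASSES)

    def forward(self, f_audio: torch.Tensor, f_video: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([f_audio, f_video], dim=-1))

class ModelLevelFusion(nn.Module):
    # Intermediate fusion: a trainable block lets the modality
    # representations interact before the final classifier.
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.proj_a = nn.Linear(AUDIO_DIM, hidden)
        self.proj_v = nn.Linear(VIDEO_DIM, hidden)
        self.mixer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                                batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, f_audio: torch.Tensor, f_video: torch.Tensor) -> torch.Tensor:
        tokens = torch.stack([self.proj_a(f_audio), self.proj_v(f_video)], dim=1)
        mixed = self.mixer(tokens)            # cross-modal interaction
        return self.head(mixed.mean(dim=1))   # pool the two modality tokens

f_a, f_v = torch.randn(2, AUDIO_DIM), torch.randn(2, VIDEO_DIM)
print(FeatureLevelFusion()(f_a, f_v).shape)   # torch.Size([2, 500])
print(ModelLevelFusion()(f_a, f_v).shape)     # torch.Size([2, 500])
```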

    An Intelligent Gesture-Based User Interface for Controlling an Assistive Mobile Information Robot

    Tento člΓ‘nek pΕ™edstavuje uΕΎivatelskΓ© rozhranΓ­ zaloΕΎenΓ© na gestech pro robotickΓ½ nΓ‘kupnΓ­ vozΓ­k. VozΓ­k je navrΕΎen jako mobilnΓ­ robotickΓ‘ platforma, kterΓ‘ pomΓ‘hΓ‘ zΓ‘kaznΓ­kΕ―m v obchodech a supermarketech. Mezi hlavnΓ­ funkce patΕ™Γ­: navigace v obchodΔ›, poskytovΓ‘nΓ­ informacΓ­ o dostupnosti a umΓ­stΔ›nΓ­ a pΕ™eprava zakoupenΓ©ho zboΕΎΓ­. Jednou z dΕ―leΕΎitΓ½ch vlastnostΓ­ vyvinutΓ©ho rozhranΓ­ je gestickΓ‘ modalita, pΕ™esnΔ›ji Ε™ečeno ruskΓ½ systΓ©m rozpoznΓ‘vΓ‘nΓ­ prvkΕ― znakovΓ©ho jazyka. Pojem design rozhranΓ­, stejnΔ› jako strategie interakce, jsou prezentovΓ‘ny ve vΓ½vojovΓ½ch diagramech, byl učinΔ›n pokus demonstrovat gestickou modalitu jako pΕ™irozenou součÑst pomocnΓ©ho informačnΓ­ho robota. KromΔ› toho je v člΓ‘nku uveden krΓ‘tkΓ½ pΕ™ehled mobilnΓ­ch robotΕ― a je poskytnuta technika rozpoznΓ‘vΓ‘nΓ­ gest zaloΕΎenΓ‘ na CNN. MoΕΎnost rozpoznΓ‘vΓ‘nΓ­ ruskΓ©ho znakovΓ©ho jazyka mΓ‘ velkΓ½ vΓ½znam kvΕ―li relativnΔ› velkΓ©mu počtu rodilΓ½ch mluvčích.This article presents a gesture-based user interface for a robotic shopping trolley. The trolley is designed as a mobile robotic platform helping customers in shops and supermarkets. Among the main functions are: navigating through the store, providing information on availability and location, and transporting the items bought. One of important features of the developed interface is the gestural modality, or, more precisely, Russian sign language elements recognition system. The notion of the interface design, as well as interaction strategy, are presented in flowcharts, it was made an attempt to demonstrate the gestural modality as a natural part of an assistive information robot. Besides, a short overview of mobile robots is given in the paper, and CNN-based technique of gesture recognition is provided. The Russian sign language recognition option is of high importance due to a relatively large number of native speakers (signers). Β© 2020, Springer Nature Switzerland AG

    A Review of Recent Advances on Deep Learning Methods for Audio-Visual Speech Recognition

    This article provides a detailed review of recent advances in audio-visual speech recognition (AVSR) methods that have been developed over the last decade (2013–2023). Despite the recent success of audio speech recognition systems, the problem of audio-visual (AV) speech decoding remains challenging. In comparison to the previous surveys, we mainly focus on the important progress brought by the introduction of deep learning (DL) to the field and skip the description of long-known traditional β€œhand-crafted” methods. In addition, we also discuss the recent application of DL toward AV speech fusion and recognition. We first discuss the main AV datasets used in the literature for AVSR experiments, since we consider it a data-driven machine learning (ML) task. We then consider the methodology used for visual speech recognition (VSR). Subsequently, we also consider recent AV methodology advances. We then separately discuss the evolution of the core AVSR methods, pre-processing and augmentation techniques, and modality fusion strategies. We conclude the article with a discussion on the current state of AVSR and provide our vision for future research.

    EMOLIPS: Towards Reliable Emotional Speech Lip-Reading

    In this article, we present a novel approach for emotional speech lip-reading (EMOLIPS). This two-level approach to emotional speech-to-text recognition based on visual data processing is motivated by human perception and recent developments in multimodal deep learning. The proposed approach first uses visual speech data to determine the type of speech emotion; the speech data are then processed by one of several emotional lip-reading models trained from scratch. This essentially resolves the multi-emotional lip-reading issue associated with most real-life scenarios. We implemented these models as a combination of an EMO-3DCNN-GRU architecture for emotion recognition and a 3DCNN-BiLSTM architecture for automatic lip-reading, and evaluated them on the CREMA-D and RAVDESS emotional speech corpora. In addition, this article provides a detailed review of recent advances in automated lip-reading and emotion recognition developed over the last 5 years (2018–2023). In comparison to existing research, we mainly focus on the valuable progress brought by the introduction of deep learning to the field and skip the description of traditional approaches. By taking the emotional features of the pronounced audio-visual speech into account, EMOLIPS significantly improves state-of-the-art phrase recognition accuracy, reaching 91.9% and 90.9% on RAVDESS and CREMA-D, respectively. Moreover, we present an extensive experimental investigation that demonstrates how different emotions (happiness, anger, disgust, fear, sadness, and neutral), valence classes (positive, neutral, and negative), and a binary grouping (emotional and neutral) affect automatic lip-reading.
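
    The two-level routing described above can be sketched as follows. The placeholder clip classifiers stand in for the EMO-3DCNN-GRU and 3DCNN-BiLSTM models, and the phrase vocabulary size is an illustrative assumption; only the emotion-conditioned routing reflects the described approach.

```python
# Two-level pipeline sketch: an emotion classifier routes each clip to an
# emotion-specific lip-reading model. Internals are simplified stand-ins.
import torch
import torch.nn as nn

EMOTIONS = ["happiness", "anger", "disgust", "fear", "sadness", "neutral"]
NUM_PHRASES = 10  # placeholder phrase vocabulary size

def make_clip_net(num_out: int) -> nn.Module:
    # Placeholder clip classifier; stands in for the 3D-CNN-based models.
    return nn.Sequential(
        nn.Conv3d(1, 16, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool3d(1),
        nn.Flatten(),
        nn.Linear(16, num_out),
    )

class TwoLevelLipReader(nn.Module):
    def __init__(self):
        super().__init__()
        self.emotion_net = make_clip_net(len(EMOTIONS))  # level 1: emotion
        self.phrase_nets = nn.ModuleDict(                # level 2: per-emotion
            {emo: make_clip_net(NUM_PHRASES) for emo in EMOTIONS}
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (1, 1, frames, height, width); batch of one keeps routing simple
        emotion = EMOTIONS[int(self.emotion_net(clip).argmax(dim=-1))]
        return self.phrase_nets[emotion](clip)           # phrase logits

model = TwoLevelLipReader()
print(model(torch.randn(1, 1, 25, 64, 64)).shape)  # torch.Size([1, 10])
```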

    Multimodal Personality Traits Assessment (MuPTA) Corpus: The Impact of Spontaneous and Read Speech

    Automatic personality traits assessment (PTA) provides high-level, intelligible predictive inputs for subsequent critical downstream tasks, such as job interview recommendations and mental healthcare monitoring. In this work, we introduce a novel Multimodal Personality Traits Assessment (MuPTA) corpus. Our MuPTA corpus is unique in that it contains both spontaneous and read speech collected in the mid-resourced Russian language. We present a novel audio-visual approach for PTA that is used to set up baseline results on this corpus. We further analyze the impact of spontaneous and read speech types on PTA predictive performance. We find that for the audio modality, the PTA predictive performance on short signals is almost equal regardless of the speech type, while PTA using the video modality is more accurate on spontaneous speech than on read speech regardless of the signal length.
