Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Traditional convolutional layers extract features from patches of data by applying a non-linearity to an affine function of the input. We propose a model that enhances this feature extraction process for the case of sequential data by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. In doing so, we exploit the fact that a window containing a few frames of the sequential data is itself a sequence, and this additional structure may encapsulate valuable information. In addition, we allow for more steps of computation in the feature extraction process, which is potentially beneficial since an affine function followed by a non-linearity can result in overly simple features. Using our convolutional recurrent layers, we obtain improved performance on two audio classification tasks compared to traditional convolutional layers. TensorFlow code for the convolutional recurrent layers is publicly available at https://github.com/cruvadom/Convolutional-RNN
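The idea can be sketched in a few lines of Keras. The snippet below is a minimal illustration, not the authors' released code (see the repository above); the choice of a GRU cell and the window size and stride are placeholder assumptions.

```python
import tensorflow as tf

class ConvRNN1D(tf.keras.layers.Layer):
    """Convolutional-recurrent layer sketch: slide a window over the time
    axis and feed each window through a shared GRU; the GRU's final hidden
    state is the extracted feature for that window."""

    def __init__(self, units, window_size, stride=1, **kwargs):
        super().__init__(**kwargs)
        self.window_size = window_size
        self.stride = stride
        # One GRU shared across all windows (weight sharing, as in convolution).
        self.window_rnn = tf.keras.layers.TimeDistributed(
            tf.keras.layers.GRU(units))

    def call(self, inputs):
        # inputs: (batch, time, channels)
        # Slice overlapping windows along the time axis:
        # -> (batch, num_windows, window_size, channels)
        windows = tf.signal.frame(inputs, self.window_size, self.stride, axis=1)
        # Run the shared GRU over each window; its final state per window
        # yields features of shape (batch, num_windows, units).
        return self.window_rnn(windows)

# Example: 64 features per window of 9 frames over an 80-bin spectrogram.
x = tf.random.normal([2, 100, 80])
print(ConvRNN1D(64, window_size=9)(x).shape)  # (2, 92, 64)
```

Each window is treated as a short sequence in its own right, which is exactly the structure the abstract argues a plain affine-plus-non-linearity patch extractor ignores.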
Automatic recognition of speaker paralinguistic characteristics: approaches to improving classification quality
The ability of artificial systems to recognize paralinguistic signals, such as emotions, depression, or openness, is useful in various applications. However, the performance of such recognizers is not yet perfect. In this study we consider several directions that can significantly improve the performance of such systems. Firstly, we propose building speaker- or gender-specific emotion models, so that the emotion recognition (ER) procedure is combined with a gender or speaker identifier. The speaker- or gender-specific information is either included directly in the feature vector or used to create separate emotion recognition models for each gender or speaker. Secondly, since feature selection is an important part of any classification problem, we propose feature selection techniques based on a genetic algorithm or an information gain approach. Both methods result in higher performance than baseline methods that use no feature selection. Finally, we suggest analysing not only audio signals but also combined audio-visual cues. In our investigations, early fusion (feature-level fusion) has been used to combine the different modalities into a multimodal approach. The results obtained show that the multimodal approach outperforms single modalities on the considered corpora. The suggested methods have been evaluated on a number of emotional databases in three languages (English, German and Japanese), in both acted and non-acted settings. The results of the numerical experiments are also reported in the study.

The ability of artificial systems to recognize paralinguistic characteristics of a speaker, such as emotional state, the presence and degree of depression, or a person's openness, is useful for a wide range of applications. However, the performance of such systems is far from ideal. In this paper we propose approaches whose application substantially improves the performance of recognition systems. We describe a method for building adaptive emotion models that use the characteristics of a specific person to construct accurate models. We present algorithms for identifying the most informative characteristics of speech signals, which simultaneously maximize the accuracy of the task at hand and minimize the number of signal features used. Finally, we propose using combined audio-visual signals as inputs to the machine learning algorithm. These approaches were implemented and evaluated on nine emotional speech corpora. The results of the experiments indicate that the proposed approaches improve the quality of the solutions to the stated tasks with respect to the chosen criteria.
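Two of the ingredients above, information-gain-based feature selection and early (feature-level) fusion, are easy to illustrate. The sketch below uses scikit-learn's mutual-information scorer as a stand-in for information gain (the genetic-algorithm selector is not shown), and the data shapes, feature counts, and SVM classifier are placeholder assumptions rather than the authors' setup.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder per-utterance features for two modalities, plus emotion labels.
X_audio = rng.normal(size=(200, 120))   # e.g., prosodic/spectral functionals
X_video = rng.normal(size=(200, 40))    # e.g., facial-expression descriptors
y = rng.integers(0, 4, size=200)        # four emotion classes

# Early fusion: concatenate the modalities into one feature vector per sample.
X_fused = np.concatenate([X_audio, X_video], axis=1)

# Information-gain-style selection: keep the k features with the highest
# mutual information with the emotion label, then train a classifier on them.
model = make_pipeline(SelectKBest(mutual_info_classif, k=50), SVC())
model.fit(X_fused, y)
```

Selecting features after fusion, as here, lets the selector trade off audio against visual descriptors; selecting per modality before concatenation is an equally plausible variant.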
In search of the role’s footprints in client-therapist dialogues
The goal of this research is to identify a speaker's role via machine learning on broad acoustic parameters, in order to understand how an occupation, or a role, affects voice characteristics. The examined corpus consists of recordings taken under the same psychological paradigm (Process Work). Four interns took part in four genuine client-therapist treatment sessions, where each individual trained her therapeutic skills on a colleague who, in turn, participated as the client. This uniform setting provided a unique opportunity to examine how role affects a speaker's prosody. Using a collection of machine learning algorithms, we tested automatic classification of the role across sessions. Results based on the acoustic properties show high classification rates, suggesting that there are discriminative acoustic features of a speaker's role as either therapist or client.
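Hypothetically, the "classification of the role across sessions" step could look like the grouped cross-validation below; the feature dimensionality, classifier, and session layout are all placeholder assumptions, not the paper's actual protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
# Placeholder data: one row of broad acoustic descriptors (e.g., pitch,
# intensity, spectral statistics) per utterance; real features would come
# from an audio front end such as openSMILE or librosa.
X = rng.normal(size=(400, 30))
y = rng.integers(0, 2, size=400)        # 0 = client, 1 = therapist
groups = rng.integers(0, 4, size=400)   # session id per utterance

# Grouped cross-validation keeps each session's utterances in a single fold,
# so the classifier is always tested on sessions it never saw in training.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         groups=groups, cv=GroupKFold(n_splits=4))
print(scores.mean())
```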