Fast Bayesian People Detection
Template-based methods have been shown to be effective at solving the problem of tracking specific objects, but their large number of free parameters can make them slow to apply and hard to optimise globally. In this work, we propose a template-based method for tracking people with fixed cameras, which automatically detects the number of people in a frame, is robust to occlusions, and can run at near-real-time frame rates. We demonstrate the effectiveness of the method by comparing it to a state-of-the-art background segmentation algorithm and show that it offers a significant performance advantage.
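The Bayesian flavour of such a detector can be illustrated with a minimal sketch: a posterior over the number of people in a frame, obtained by combining a prior with per-count likelihoods. This is purely illustrative and is not the paper's implementation; the function name and the toy numbers are assumptions.

```python
# Hypothetical sketch (not the paper's method): Bayes' rule over the
# number of people present in a frame, given likelihoods for each count.

def posterior_people_count(prior, likelihoods):
    """Combine a prior over person counts with per-count likelihoods
    and return the normalised posterior distribution."""
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Uniform prior over 0..3 people; the likelihoods here favour 2 people.
post = posterior_people_count([0.25] * 4, [0.1, 0.2, 0.6, 0.1])
```

A real detector would derive the likelihoods from template match scores rather than fixing them by hand.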
Ariadne's Thread - Interactive Navigation in a World of Networked Information
This work-in-progress paper introduces an interface for the interactive visual exploration of the context of queries using the ArticleFirst database, a product of OCLC. We describe a workflow which allows the user to browse live entities associated with 65 million articles. In the online interface, each query leads to a specific network representation of the most prevalent entities: topics (words), authors, journals, and Dewey decimal classes linked to the set of terms in the query. This network represents the context of a query. Each of the network nodes is clickable: by clicking through, a user traverses a large space of articles along the dimensions of authors, journals, Dewey classes, and words simultaneously. We present different use cases of such an interface. This paper provides a link between the quest for maps of science and ongoing debates in HCI about the use of interactive information visualisation to empower users in their search.
Comment: CHI'15 Extended Abstracts, April 18-23, 2015, Seoul, Republic of Korea. ACM 978-1-4503-3146-3/15/0
Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning
One of the challenges in Speech Emotion Recognition (SER) "in the wild" is the large mismatch between training and test data (e.g. speakers and tasks). In order to improve the generalisation capabilities of the emotion models, we propose to use Multi-Task Learning (MTL) with gender and naturalness as auxiliary tasks in deep neural networks. This method was evaluated in within-corpus and various cross-corpus classification experiments that simulate conditions "in the wild". In comparison to state-of-the-art Single-Task Learning (STL) methods, we found that our proposed MTL method improved performance significantly. In particular, models using both gender and naturalness achieved greater gains than those using either gender or naturalness separately. This benefit was also found in the high-level representations of the feature space obtained from our proposed method, where discriminative emotional clusters could be observed.
Comment: Published in the proceedings of INTERSPEECH, Stockholm, September, 201
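The multi-task idea above can be sketched as a weighted combination of the main emotion loss with the two auxiliary losses. The function name and the weight values are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch of multi-task learning (weights/names are illustrative):
# the total training loss combines the main emotion task with gender and
# naturalness auxiliary tasks that regularise the shared representation.

def multi_task_loss(emotion_loss, gender_loss, naturalness_loss,
                    alpha=0.1, beta=0.1):
    """Weighted sum of the main-task loss and two auxiliary-task losses.
    alpha and beta control how strongly each auxiliary task influences
    the shared network parameters."""
    return emotion_loss + alpha * gender_loss + beta * naturalness_loss

total = multi_task_loss(1.5, 0.8, 0.6)  # 1.5 + 0.08 + 0.06 = 1.64
```

In a deep network, all three losses would be computed from task-specific output heads sitting on top of shared layers, and the weighted sum would be backpropagated jointly.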
Learning spectro-temporal features with 3D CNNs for speech emotion recognition
In this paper, we propose to use deep 3-dimensional convolutional networks (3D CNNs) in order to address the challenge of modelling spectro-temporal dynamics for speech emotion recognition (SER). Compared to a hybrid of Convolutional Neural Network and Long Short-Term Memory (CNN-LSTM), our proposed 3D CNNs simultaneously extract short-term and long-term spectral features with a moderate number of parameters. We evaluated our proposed and other state-of-the-art methods in a speaker-independent manner using aggregated corpora that give a large and diverse set of speakers. We found that 1) shallow temporal and moderately deep spectral kernels of a homogeneous architecture are optimal for the task; and 2) our 3D CNNs are more effective for spectro-temporal feature learning compared to other methods. Finally, we visualised the feature space obtained with our proposed method using t-distributed stochastic neighbour embedding (t-SNE) and could observe distinct clusters of emotions.
Comment: ACII, 2017, San Antoni
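The shape arithmetic behind a 3D convolution over spectro-temporal input can be sketched as follows. The input dimensions and kernel sizes here are illustrative assumptions, not the paper's architecture; the example merely shows how a shallow temporal kernel and a deeper spectral kernel shape the output volume.

```python
# Illustrative 3D-convolution shape arithmetic (not the paper's exact
# architecture): output size of a single valid 3D convolution.

def conv3d_output_shape(in_shape, kernel, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Output (depth, height, width) of a 3D convolution, using the
    standard formula (i + 2p - k) // s + 1 per dimension."""
    return tuple(
        (i + 2 * p - k) // s + 1
        for i, k, s, p in zip(in_shape, kernel, stride, padding)
    )

# A shallow temporal kernel (3) and deeper spectral kernels (9x9) over a
# hypothetical stack of 10 frames of 128x128 spectrogram patches:
out = conv3d_output_shape((10, 128, 128), kernel=(3, 9, 9))
# out == (8, 120, 120)
```

The same formula applied per axis explains the paper's finding in shape terms: temporal depth shrinks slowly under shallow kernels, while repeated spectral convolutions deepen the frequency-axis receptive field.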