59 research outputs found

A deep matrix factorization method for learning attribute representations

Author: Bousmalis Konstantinos
Schuller Bjoern W.
Trigeorgis George
Zafeiriou Stefanos
Publication date: 10/09/2015

Semi-Non-negative Matrix Factorization is a technique that learns a low-dimensional representation of a dataset that lends itself to a clustering interpretation. It is possible that the mapping between this new representation and our original data matrix contains rather complex hierarchical information with implicit lower-level hidden attributes that classical one-level clustering methodologies cannot interpret. In this work we propose a novel model, Deep Semi-NMF, that is able to learn such hidden representations, which lend themselves to a clustering interpretation according to different, unknown attributes of a given dataset. We also present a semi-supervised version of the algorithm, named Deep WSF, that allows the use of (partial) prior information for each of the known attributes of a dataset, so that the model can be used on datasets with mixed attribute knowledge. Finally, we show that our models are able to learn low-dimensional representations that are better suited not only for clustering but also for classification, outperforming Semi-Non-negative Matrix Factorization as well as other state-of-the-art methodologies and variants. Comment: Submitted to TPAMI (16-Mar-2015)

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Spiral - Imperial College Digital Repository

3D face morphable models "In-The-Wild"

Author: Antonakos Epameinondas
Booth James
Panagakis Yannis
Ploumpis Stylianos
Trigeorgis George
Zafeiriou Stefanos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral and expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions (in-the-wild). In this paper, we propose the first, to the best of our knowledge, in-the-wild 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an in-the-wild texture model. We show that the employment of such an in-the-wild texture model greatly simplifies the fitting procedure, because there is no need to optimise with regard to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM in arbitrary images. Finally, we have captured the first 3D facial database with relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard in-the-wild facial databases.

arXiv.org e-Print Archive

Crossref

Middlesex University Research Repository

University of Oulu Repository - Jultika

Spiral - Imperial College Digital Repository

Deep Canonical Time Warping

Author: George Trigeorgis
Nicolaou Mihalis
Schuller Bjorn
Zafeiriou Stefanos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Machine learning algorithms for the analysis of time-series often depend on the assumption that the utilised data are temporally aligned. Any temporal discrepancies arising in the data are certain to lead to ill-generalisable models, which in turn fail to correctly capture the properties of the task at hand. The temporal alignment of time-series is thus a crucial challenge manifesting in a multitude of applications. Nevertheless, the vast majority of algorithms oriented towards the temporal alignment of time-series are applied directly on the observation space, or utilise simple linear projections. Thus, they fail to capture complex, hierarchical non-linear representations which may prove to be beneficial towards the task of temporal alignment, particularly when dealing with multi-modal data (e.g., aligning visual and acoustic information). To this end, we present Deep Canonical Time Warping (DCTW), a method which automatically learns complex non-linear representations of multiple time-series, generated such that they are (i) highly correlated and (ii) temporally in alignment. By means of experiments on four real datasets, we show that the representations learnt via the proposed DCTW significantly outperform state-of-the-art methods in temporal alignment, elegantly handling scenarios with highly heterogeneous features, such as the temporal alignment of acoustic and visual features.

Goldsmiths Research Online

Crossref

University of Oulu Repository - Jultika

Spiral - Imperial College Digital Repository

Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network

Author: Brueckner Raymond
Marchi Erik
Nicolaou Mihalis A.
Ringeval Fabien
Schuller Björn
Trigeorgis George
Zafeiriou Stefanos
Publication date: 21/12/2015

The automatic recognition of spontaneous emotions from speech is a challenging task. On the one hand, acoustic features need to be robust enough to capture the emotional content for various styles of speaking, while on the other, machine learning algorithms need to be insensitive to outliers while being able to model the context. Whereas the latter has been tackled by the use of Long Short-Term Memory (LSTM) networks, the former is still under very active investigation, even though more than a decade of research has provided a large set of acoustic descriptors. In this paper, we propose a solution to the problem of 'context-aware' emotionally relevant feature extraction, by combining Convolutional Neural Networks (CNNs) with LSTM networks, in order to automatically learn the best representation of the speech signal directly from the raw time representation. In this novel work on so-called end-to-end speech emotion recognition, we show that the use of the proposed topology significantly outperforms the traditional approaches based on signal processing techniques for the prediction of spontaneous and natural emotions on the RECOLA database.

OPUS Augsburg

Goldsmiths Research Online

Crossref

Spiral - Imperial College Digital Repository

Dyadic Speech-based Affect Recognition using DAMI-P2C Parent-child Multimodal Interaction Dataset

Author: Chung Junyoung
Fujita Yusuke
Gordon Goren
Hoover-Dempsey Kathleen V
Ioffe Sergey
Kamphaus Randy W
McNab Katrina
Neuman Susan B
Park Hae Won
Rudovic O.
Sainath Tara N
Spaulding Samuel
Stewart Angela
Trigeorgis George
Zhang Zixing
Zhao Huijuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 20/08/2020

Automatic speech-based affect recognition of individuals in dyadic conversation is a challenging task, in part because of its heavy reliance on manual pre-processing. Traditional approaches frequently require hand-crafted speech features and segmentation of speaker turns. In this work, we design end-to-end deep learning methods to recognize each person's affective expression in an audio stream with two speakers, automatically discovering features and time regions relevant to the target speaker's affect. We integrate a local attention mechanism into the end-to-end architecture and compare the performance of three attention implementations: one mean pooling and two weighted pooling methods. Our results show that the proposed weighted-pooling attention solutions are able to learn to focus on the regions containing the target speaker's affective information and successfully extract the individual's valence and arousal intensity. Here we introduce and use a "dyadic affect in multimodal interaction - parent to child" (DAMI-P2C) dataset collected in a study of 34 families, where a parent and a child (3-7 years old) engage in reading storybooks together. In contrast to existing public datasets for affect recognition, each instance for both speakers in the DAMI-P2C dataset is annotated for the perceived affect by three labelers. To encourage more research on the challenging task of multi-speaker affect sensing, we make the annotated DAMI-P2C dataset publicly available, including acoustic features of the dyads' raw audio, affect annotations, and a diverse set of developmental, social, and demographic profiles of each dyad. Comment: Accepted by the 2020 International Conference on Multimodal Interaction (ICMI'20)

arXiv.org e-Print Archive

Crossref

Deep component analysis: algorithms and applications

Author: Trigeorgis George
Publication venue: Computing, Imperial College London
Publication date: 01/03/2018

Component Analysis (CA) methods have been crucial contributors to the large success of machine learning over the past decades. Although CA methods are predominantly linear models, depending on the formulation of the optimisation problem one can derive vastly different results tailored to different tasks. Such linear methods have the natural advantage of interpretability, as they can easily be reasoned about, and they are also easy to fit onto the available data. Unfortunately, as they are mostly linear models they cannot describe complex data distributions such as in-the-wild images of faces, videos, or even auditory signals. On the other hand, deep learning methodologies have excelled in modelling highly non-linear relations between highly heterogeneous data distributions. Nonetheless, they were mostly used as black boxes and usually required orders of magnitude more training samples than their linear counterparts to achieve similar performance. In this thesis we aim to combine the best of both worlds, that is, to incorporate the power of neural networks with the statistical intuition and the specially crafted ideas of component analysis methods. The resulting methodologies will have a diverse application set, solving problems from the areas of face clustering, time-series alignment, domain adaptation, and face alignment in a mainly unsupervised way. Open Access

Spiral - Imperial College Digital Repository

A deep matrix factorization method for learning attribute representations

Author: Bousmalis Konstantinos
Schuller Björn
Trigeorgis George
Zafeiriou Stefanos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/02/2016

Semi-Non-negative Matrix Factorization is a technique that learns a low-dimensional representation of a dataset that lends itself to a clustering interpretation. It is possible that the mapping between this new representation and our original data matrix contains rather complex hierarchical information with implicit lower-level hidden attributes that classical one-level clustering methodologies cannot interpret. In this work we propose a novel model, Deep Semi-NMF, that is able to learn such hidden representations, which lend themselves to a clustering interpretation according to different, unknown attributes of a given dataset. We also present a semi-supervised version of the algorithm, named Deep WSF, that allows the use of (partial) prior information for each of the known attributes of a dataset, so that the model can be used on datasets with mixed attribute knowledge. Finally, we show that our models are able to learn low-dimensional representations that are better suited not only for clustering but also for classification, outperforming Semi-Non-negative Matrix Factorization as well as other state-of-the-art methodologies and variants.

OPUS Augsburg

Spiral - Imperial College Digital Repository

DenseReg: fully convolutional dense shape regression in-the-wild

Author: Antonakos Epameinondas
Guler Riza Alp
Kokkinos Iasonas
Snape Patrick
Trigeorgis George
Zafeiriou Stefanos
Zhou Yuxiang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/03/2017

In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging manually annotated facial landmarks “in-the-wild”. We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground-truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly accurate ‘quantized regression’ architecture. Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such, our network can provide useful correspondence information as a stand-alone system, while when used as an initialization for Statistical Deformable Models we obtain landmark localization results that largely outperform the current state-of-the-art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks, and demonstrate its use for other correspondence estimation tasks, such as the human body and the human ear. DenseReg code is made available at http://alpguler.com/DenseReg.html along with supplementary materials.

HAL-CentraleSupelec

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

UCL Discovery

HAL Descartes

Spiral - Imperial College Digital Repository

Hal-Diderot

HAL-Rennes 1

Investment under uncertainty and volatility estimation risk

Author: George Dotsis
Raphael Nicholas Markellos
Trigeorgis L.
Vasiliki Makropoulou
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2012

This article considers the implications of volatility estimation risk in real options theory. We construct confidence intervals for critical project values and option prices. An empirical example in lease investment evaluation for an offshore petroleum tract shows that confidence intervals can be substantial when a limited amount of data is used to estimate volatility.

Crossref

University of East Anglia digital repository