Search CORE

18,908 research outputs found

Support vector machine regression for robust speaker verification in mismatching and forensic conditions

Author: González-Rodríguez Joaquín
López Moreno Ignacio
Mateos García Ismael
Ramos Daniel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-01793-3_50Proceedings of Third International Conference, ICB 2009, Alghero, ItalyIn this paper we propose the use of Support Vector Machine Regression (SVR) for robust speaker verification in two scenarios: i) strong mismatch in speech conditions and ii) forensic environment. The proposed approach seeks robustness to situations where a proper background database is reduced or not present, a situation typical in forensic cases which has been called database mismatch. For the mismatching condition scenario, we use the NIST SRE 2008 core task as a highly variable environment, but with a mostly representative background set coming from past NIST evaluations. For the forensic scenario, we use the Ahumada III database, a public corpus in Spanish coming from real authored forensic cases collected by Spanish Guardia Civil. We show experiments illustrating the robustness of a SVR scheme using a GLDS kernel under strong session variability, even when no session variability is applied, and especially in the forensic scenario, under database mismatch.This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-0

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes

Author: Anguera
Baby
Bagchi
Barker
Barker
Castro Martinez
DiBiase
Du
Emmanuel Vincent
Fletcher
Frigge
Fujita
Garofalo
Hermansky
Heymann
Hirsch
Hori
Jalalvand
Jon Barker
Kim
Loesch
Ma
Mestre
Mikolov
Moritz
Mostefa
Parihar
Pfeifenberger
Povey
Prudnikov
Renals
Ricard Marxer
Shinji Watanabe
Sivasankaran
Taal
Tachioka
Taghia
Veselý
Vincent
Vincent
Vu
Yoshioka
Zhao
Zhuang
Publication venue: 'Elsevier BV'
Publication date: 15/10/2016
Field of study

This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations

Crossref

INRIA a CCSD electronic archive server

White Rose Research Online

HAL-Rennes 1

Topographic controls of CH4 and N2O fluxes from temperate and boreal forest soils in eastern Canada.

Author: Moore Tim R.
Ullah Sami
Publication venue
Publication date: 01/01/2009
Field of study

Lancaster E-Prints

Machine learning for automatic analysis of affective behaviour

Author: Nicolaou Michael (Mihalis)
Publication venue: Computing, Imperial College London
Publication date: 01/03/2015
Field of study

The automated analysis of affect has been gaining rapidly increasing attention by researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence etc.). The work presented in this thesis focuses on several fundamental problems manifesting in the course towards the achievement of reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called ''Big-Data`` era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement for the analysis of emotion expressions continuously in time, instead of merely processing static images, thus unveiling the wide range of temporal dynamics related to human behaviour to researchers. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings more proximal to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, considered to capture a much wider spectrum of expressive variability than basic emotions, and most importantly model emotional states which are commonly expressed by humans in their everyday life. In the first part of this thesis, we attempt to demystify the quite unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods - an idea which has been taken on by other researchers in the field since. In order to experimentally evaluate this, we extend methods such as the Long Short-Term Memory Neural Networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) in order to exploit output relationships in learning. As it is shown, this increases the accuracy of machine learning models applied to this task. The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed real-time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, the varying reaction lags exhibited by each annotator as well as other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models. This deems the proper fusion of multiple annotations, and the inference of a clean, corrected version of the ``ground truth'' as one of the most significant challenges in the area. A highly important contribution of this thesis lies in the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while most importantly we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances. The integration of the temporal alignment process within the proposed private-shared space model deems DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate the temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario is when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in literature, and as we show in related experiments outperforms other, state-of-the-art methods for related tasks such as the fusion of multiple modalities under gross noise. Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a meaningful manner pertaining to the task-at-hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e. the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform other equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.Open Acces

Spiral - Imperial College Digital Repository

Immersive interconnected virtual and augmented reality : a 5G and IoT perspective

Author: Abadal Sergi
Cabellos-Aparicio Albert
De Turck Filip
Ergut Salih
Famaey Jeroen
Grimm Christoph
Jain Akshay
Kalem Gokhan
Liaskos Christos
Mach Marian
Mouhouche Belkacem
Papapetrou Evangelos
Sabol Tomas
Torres Vega Maria
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

Despite remarkable advances, current augmented and virtual reality (AR/VR) applications are a largely individual and local experience. Interconnected AR/VR, where participants can virtually interact across vast distances, remains a distant dream. The great barrier that stands between current technology and such applications is the stringent end-to-end latency requirement, which should not exceed 20 ms in order to avoid motion sickness and other discomforts. Bringing AR/VR to the next level to enable immersive interconnected AR/VR will require significant advances towards 5G ultra-reliable low-latency communication (URLLC) and a Tactile Internet of Things (IoT). In this article, we articulate the technical challenges to enable a future AR/VR end-to-end architecture, that combines 5G URLLC and Tactile IoT technology to support this next generation of interconnected AR/VR applications. Through the use of IoT sensors and actuators, AR/VR applications will be aware of the environmental and user context, supporting human-centric adaptations of the application logic, and lifelike interactions with the virtual environment. We present potential use cases and the required technological building blocks. For each of them, we delve into the current state of the art and challenges that need to be addressed before the dream of remote AR/VR interaction can become reality

UPCommons. Portal del coneixement obert de la UPC

Ghent University Academic Bibliography

Institutional Repository Universiteit Antwerpen

Halogenated very short lived substances in the tropical western Pacific region

Author: Atlas E. L.
Fuhlbruegge Steffen
Hepach Helmke
Krüger Kirstin
Quack Birgit
Raimund Stefan
Shi Qiang
Publication venue
Publication date: 01/01/2012
Field of study

OceanRep

Hippocampal and parahippocampal grey matter structural integrity assessed by multimodal imaging is associated with episodic memory in old age

Author: Brandmaier A.
Düzel S.
Köhncke Y.
Kühn S.
Lindenberger U.
Sander M.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2020
Field of study

MPG.PuRe

Underwater Fish Detection with Weak Multi-Domain Supervision

Author: Bradley Michael
Konovalov Dmitry A.
Marini Simone
Saleh Alzayat
Sankupellay Mangalam
Sheaves Marcus
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Given a sufficiently large training dataset, it is relatively easy to train a modern convolution neural network (CNN) as a required image classifier. However, for the task of fish classification and/or fish detection, if a CNN was trained to detect or classify particular fish species in particular background habitats, the same CNN exhibits much lower accuracy when applied to new/unseen fish species and/or fish habitats. Therefore, in practice, the CNN needs to be continuously fine-tuned to improve its classification accuracy to handle new project-specific fish species or habitats. In this work we present a labelling-efficient method of training a CNN-based fish-detector (the Xception CNN was used as the base) on relatively small numbers (4,000) of project-domain underwater fish/no-fish images from 20 different habitats. Additionally, 17,000 of known negative (that is, missing fish) general-domain (VOC2012) above-water images were used. Two publicly available fish-domain datasets supplied additional 27,000 of above-water and underwater positive/fish images. By using this multi-domain collection of images, the trained Xception-based binary (fish/not-fish) classifier achieved 0.17% false-positives and 0.61% false-negatives on the project's 20,000 negative and 16,000 positive holdout test images, respectively. The area under the ROC curve (AUC) was 99.94%.Comment: Published in the 2019 International Joint Conference on Neural Networks (IJCNN-2019), Budapest, Hungary, July 14-19, 2019, https://www.ijcnn.org/ , https://ieeexplore.ieee.org/document/885190

arXiv.org e-Print Archive

ResearchOnline@JCU

Crossref

ResearchOnline at James Cook University