BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition

Authors: El-Khamy Mostafa; Kim Jaeyoung; Lee Jungwon
Publication date: 21/02/2018

Despite the remarkable progress achieved in automatic speech recognition, recognizing far-field speech mixed with various noise sources remains a challenging task. In this paper, we introduce a novel student-teacher transfer learning framework, BridgeNet, which can improve distant speech recognition. There are two key features in BridgeNet. First, BridgeNet extends traditional student-teacher frameworks by providing multiple hints from a teacher network. Hints are not limited to the soft labels from a teacher network: the teacher's intermediate feature representations can better guide a student network to learn how to denoise or dereverberate noisy input. Second, the proposed recursive architecture in BridgeNet can iteratively improve denoising and recognition performance. The experimental results showed significant improvements on the distant speech recognition problem, with up to 13.24% relative WER reduction on the AMI corpus compared to a baseline neural network without the teacher's hints.

Comment: Accepted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018)
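
As a rough illustration of what "multiple hints" can mean in a student-teacher setup, the sketch below combines a soft-label term with a feature-matching term. It is a minimal sketch under stated assumptions, not the BridgeNet loss: the function names, the temperature, the `hint_weight` value, and the use of mean squared error on intermediate features are all hypothetical choices, and the recursive architecture is not modeled.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hinted_distillation_loss(student_logits, teacher_logits,
                             student_feat, teacher_feat,
                             temperature=2.0, hint_weight=0.5):
    """Soft-label cross-entropy plus a feature 'hint' penalty.

    All parameter names and default values are illustrative assumptions.
    """
    # Hint 1: teacher soft labels (cross-entropy against the student's outputs).
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    soft_label_loss = -(p_teacher * log_p_student).sum(axis=-1).mean()

    # Hint 2: teacher intermediate features (assumed here to be matched with MSE).
    hint_loss = np.mean((student_feat - teacher_feat) ** 2)

    return soft_label_loss + hint_weight * hint_loss
```

With identical logits, shifting every student feature by 1.0 raises this loss by exactly `hint_weight`, which makes the contribution of the feature hint easy to check in isolation.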

arXiv.org e-Print Archive

Crossref

Third Person References: Forms and Functions in Two Spoken Genres of Spanish

Author: Dumont Jenny
Publication venue: The Cupola: Scholarship at Gettysburg College
Publication date: 01/01/2016

This volume, a case study on the grammar of third person references in two genres of spoken Ecuadorian Spanish, examines from a discourse-analytic perspective how genre affects linguistic patterns and how researchers can look for and interpret genre effects. This marks a timely contribution to corpus linguistics, as many linguists are choosing to work with empirical data. Corpus-based approaches have many advantages and are useful in comparing different languages as well as varieties of the same language, but what is often overlooked in such comparisons is the genre of language under examination. As this case study shows, genre is an important factor in interpreting patterns and distributions of forms. The book also contributes toward theories of anaphora, referentiality and Preferred Argument Structure. It is relevant for scholars who work with referentiality, genre differences, third person references, and interactional linguistics, as well as those interested in Spanish morphosyntax. [From the Publisher] https://cupola.gettysburg.edu/books/1096/thumbnail.jp

Gettysburg College

Semi-Supervised Speech Emotion Recognition with Ladder Networks

Authors: Busso Carlos; Parthasarathy Srinivas
Publication date: 08/05/2019

Speech emotion recognition (SER) systems find applications in various fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. This problem can be solved by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. Another approach is to increase the generalization of the models. An effective way to achieve this goal is by regularizing the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labeled data (gender, speaker identity, age or other emotional descriptors), which is expensive to collect for emotion recognition. This study proposes the use of ladder networks for emotion recognition, which utilize an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. This auxiliary task does not require labels, so it is possible to train the framework in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, achieving better performance than fully supervised single-task learning (STL) and MTL baselines. The approach is implemented with several acoustic features, showing that ladder networks generalize significantly better in cross-corpus settings. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
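
For readers unfamiliar with the metric, the sketch below computes the concordance correlation coefficient using its standard definition (Lin's CCC) and a relative gain in percent. The function names are hypothetical, and the abstract does not say which exact implementation the authors used, so this is only an illustration of how such figures are typically derived.

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population statistics)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Penalizes both low correlation and any shift in mean or scale.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

def relative_gain(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline
```

Unlike Pearson correlation, CCC drops below 1 when predictions are perfectly correlated with the targets but offset from them, which is why it is a common choice for regression on emotional attributes.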

arXiv.org e-Print Archive

SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

Authors: Alameda-Pineda Xavier; Batrinca Ligia; Lanz Oswald; Lepri Bruno; Ricci Elisa; Sebe Nicu; Staiano Jacopo; Subramanian Ramanathan
Publication date: 23/06/2015

Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging because crowdedness and extreme occlusions make it difficult to extract behavioral cues such as target locations, speaking activity and head/body pose. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts that present difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

Comment: 14 pages, 11 figures

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

University of Canberra Research Repository

Connecting the Missing Link: Bringing Together Global Philanthropists and Global Community Philanthropy Organizations

Author: Ann Graham
Publication venue: Synergos Institute
Publication date: 06/06/2015

In a project begun in 2011, Synergos brought together individual philanthropists and leaders of community philanthropy organizations (CPOs) from around the world to learn about and understand the potentially transformative benefits of forming partnerships to address societal problems. This project has opened a number of doors, creating opportunities for community foundations and philanthropists to extend their reach and significantly increase the impact of their work. It has substantially raised awareness and has also created safe spaces for constructive dialogue on how to move forward in working together. These spaces can now be transformed into more practical "laboratories" to address community problems.

IssueLab