13,689 research outputs found
BridgeNets: Student-Teacher Transfer Learning Based on Recursive Neural Networks and its Application to Distant Speech Recognition
Despite the remarkable progress achieved on automatic speech recognition,
recognizing far-field speeches mixed with various noise sources is still a
challenging task. In this paper, we introduce novel student-teacher transfer
learning, BridgeNet which can provide a solution to improve distant speech
recognition. There are two key features in BridgeNet. First, BridgeNet extends
traditional student-teacher frameworks by providing multiple hints from a
teacher network. Hints are not limited to the soft labels from a teacher
network. Teacher's intermediate feature representations can better guide a
student network to learn how to denoise or dereverberate noisy input. Second,
the proposed recursive architecture in the BridgeNet can iteratively improve
denoising and recognition performance. The experimental results of BridgeNet
showed significant improvements in tackling the distant speech recognition
problem, where it achieved up to 13.24% relative WER reductions on AMI corpus
compared to a baseline neural network without teacher's hints.Comment: Accepted to 2018 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP 2018
Third Person References: Forms and Functions in Two Spoken Genres of Spanish
This volume, a case study on the grammar of third person references in two genres of spoken Ecuadorian Spanish, examines from a discourse-analytic perspective how genre affects linguistic patterns and how researchers can look for and interpret genre effects. This marks a timely contribution to corpus linguistics, as many linguists are choosing to work with empirical data. Corpus based approaches have many advantages and are useful in the comparison of different languages as well as varieties of the same language, but what is often overlooked in such comparisons is the genre of language under examination. As this case study shows, genre is an important factor in interpreting patterns and distributions of forms.
The book also contributes toward theories of anaphora, referentiality and Preferred Argument Structure. It is relevant for scholars who work with referentiality, genre differences, third person references, and interactional linguistics, as well as those interested in Spanish morphosyntax. [From the Publisher]https://cupola.gettysburg.edu/books/1096/thumbnail.jp
Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods often
require the use of labeled data which is computationally expensive to collect
for emotion recognition (gender, speaker identity, age or other emotional
descriptors). This study proposes the use of ladder networks for emotion
recognition, which utilizes an unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, achieving superior performance than fully
supervised single-task learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within corpus
evaluations, and between 16.1% and 74.1% for cross corpus evaluations,
highlighting the power of the architecture
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., cocktail party ) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose due to crowdedness and presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising the microphone, accelerometer, bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.Comment: 14 pages, 11 figure
Connecting the Missing Link: Bringing Together Global Philanthropists and Global Community Philanthropy Organizations
In a project begun in 2011, Synergos brought together individual philanthropists and leaders of community philanthropy organizations (CPOs) from around the world to learn about and understand the potentially transformative benefits of forming partnerships to address societal problems.This project has opened a number of doors to creating opportunities for community foundations and philanthropists to extend their reach as well as significantly increase the impact of their work. It has substantially raised awareness and has also created safe spaces for constructive dialogue on how to move forward in working together. These spaces can now be transformed into more practical "laboratories" to address community problems
- …