Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods often
require the use of labeled data which is computationally expensive to collect
for emotion recognition (gender, speaker identity, age or other emotional
descriptors). This study proposes the use of ladder networks for emotion
recognition, which utilize an unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, achieving better performance than fully
supervised single-task learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus
evaluations, and between 16.1% and 74.1% for cross-corpus evaluations,
highlighting the power of the architecture.
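For readers unfamiliar with the metric reported above, the following is a minimal sketch of the standard concordance correlation coefficient, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The function name `concordance_cc` and the toy values are illustrative assumptions and are not taken from the paper.

```python
import math
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    Uses population (divide-by-n) variances and covariance, as in the
    standard definition. Returns NaN when the denominator is zero.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    return 2.0 * cov / denom if denom else math.nan


# Toy values (hypothetical): a constant offset leaves Pearson correlation
# at 1 but lowers CCC, because CCC also penalizes the difference in means.
perfect = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

Unlike Pearson correlation, CCC penalizes systematic offsets between predictions and labels, which is one reason it is commonly used for regression on emotional attributes.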
GP-SUM. Gaussian Processes Filtering of non-Gaussian Beliefs
This work studies the problem of stochastic dynamic filtering and state
propagation with complex beliefs. The main contribution is GP-SUM, a filtering
algorithm tailored to dynamic systems and observation models expressed as
Gaussian Processes (GP), and to states represented as a weighted sum of
Gaussians. The key attribute of GP-SUM is that it does not rely on
linearizations of the dynamic or observation models, or on unimodal Gaussian
approximations of the belief, hence enables tracking complex state
distributions. The algorithm can be seen as a combination of a sampling-based
filter with a probabilistic Bayes filter. On the one hand, GP-SUM operates by
sampling the state distribution and propagating each sample through the dynamic
system and observation models. On the other hand, it achieves effective
sampling and accurate probabilistic propagation by relying on the GP form of
the system, and the sum-of-Gaussian form of the belief. We show that GP-SUM
outperforms several GP-Bayes and Particle Filters on a standard benchmark. We
also demonstrate its use in a pushing task, predicting with experimental
accuracy the naturally occurring non-Gaussian distributions. Comment: WAFR 2018, 16 pages, 7 figures
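The sample-and-propagate idea described in the abstract can be sketched roughly as follows, in one dimension. This is not the authors' implementation: `toy_dynamics` is a hypothetical hand-written function standing in for a GP dynamics model (it only mimics a GP by returning a predictive mean and variance), and the sketch covers the prediction step only, omitting the measurement update.

```python
import math
import random

# A belief is a list of (weight, mean, variance) Gaussian components.


def sample_belief(belief, n, rng):
    """Draw n samples from a weighted sum of 1-D Gaussians."""
    weights = [w for w, _, _ in belief]
    picks = rng.choices(belief, weights=weights, k=n)
    return [rng.gauss(m, math.sqrt(v)) for _, m, v in picks]


def toy_dynamics(x):
    """Hypothetical stand-in for a GP: predictive (mean, variance) at x."""
    return x + 0.5 * math.sin(x), 0.05 + 0.01 * x * x


def propagate(belief, dynamics, n_samples=50, seed=0):
    """Prediction step: each sample yields one Gaussian of equal weight."""
    rng = random.Random(seed)
    samples = sample_belief(belief, n_samples, rng)
    return [(1.0 / n_samples, *dynamics(x)) for x in samples]


def mixture_pdf(belief, x):
    """Density of the sum-of-Gaussians belief at x."""
    return sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in belief
    )


# A bimodal prior stays a sum of Gaussians after propagation, with no
# linearization of the dynamics and no collapse to a single Gaussian.
prior = [(0.5, -1.0, 0.1), (0.5, 1.0, 0.1)]
predicted = propagate(prior, toy_dynamics)
```

The propagated belief has one component per sample, so the number of samples controls the trade-off between cost and how finely the non-Gaussian shape is captured.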
An augmented three-pass system combination framework: DCU combination system for WMT 2010
This paper describes the augmented three-pass system combination framework of
the Dublin City University (DCU) MT group for the WMT 2010 system combination
task. The basic three-pass framework includes building individual confusion
networks (CNs), a super network, and a modified Minimum Bayes-risk (mConMBR)
decoder. The augmented parts for the WMT 2010 tasks include 1) a rescoring
component which is used to re-rank the N-best lists generated from the
individual CNs and the super network, 2) a new hypothesis alignment metric,
TERp, that is used to carry out English-targeted hypothesis alignment, and 3)
a greater variety of backbone-based CNs which are employed to increase the
diversity of the mConMBR decoding phase. We took part in the combination tasks
of English-to-Czech and French-to-English. Experimental results show that our
proposed combination framework achieved gains of 2.17 absolute points (13.36
relative points) and 1.52 absolute points (5.37 relative points) in BLEU score
on the English-to-Czech and French-to-English tasks, respectively, over the
best single system. We also achieved better performance on human evaluation.
Applying digital content management to support localisation
The retrieval and presentation of digital content such as that on the World Wide Web (WWW) is a substantial area of research. While recent years have seen huge expansion in the size of web-based archives that can be searched efficiently by commercial search engines, the presentation of potentially relevant content is still limited to ranked document lists represented by simple text snippets or image keyframe surrogates. There is expanding interest in techniques to personalise the presentation of content to improve the richness and effectiveness of the user experience. One of the most significant challenges to achieving this is the increasingly multilingual nature of this data, and the need to provide suitably localised responses to users based on this content. The Digital Content Management (DCM) track of the Centre for Next Generation Localisation (CNGL) is seeking to develop technologies to support advanced personalised access and presentation of information by combining elements from the existing research areas of Adaptive Hypermedia and Information Retrieval. The combination of these technologies is intended to produce significant improvements in the way users access information. We review key features of these technologies and introduce early ideas for how these technologies can support localisation and localised content before concluding with some impressions of future directions in DCM.