Towards Multilingual Automatic Dialogue Evaluation
The main limiting factor in the development of robust multilingual dialogue
evaluation metrics is the lack of multilingual data and the limited
availability of open-source multilingual dialogue systems. In this work, we
propose a workaround for this lack of data by leveraging a strong multilingual
pretrained LLM and augmenting existing English dialogue data using Machine
Translation. We empirically show that the naive approach of finetuning a
pretrained multilingual encoder model with translated data is insufficient to
outperform the strong baseline of finetuning a multilingual model with only
source data. Instead, the best approach consists of carefully curating the
translated data using MT Quality Estimation metrics and excluding low-quality
translations that hinder performance.
Comment: SIGDIAL2
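As an illustration of the curation step described above, the sketch below filters machine-translated dialogue pairs by a quality-estimation score. The `qe_score` placeholder and the 0.7 threshold are assumptions for illustration, not the paper's actual QE metric or cut-off.

```python
# Hypothetical curation of machine-translated dialogue data using an MT
# Quality Estimation (QE) score. qe_score and the threshold are illustrative
# placeholders, not the setup used in the paper.

def qe_score(source: str, translation: str) -> float:
    """Stand-in QE estimator returning a quality score in [0, 1].

    In practice this would be a trained QE model (e.g. a COMET-QE-style system).
    """
    return 1.0 if translation else 0.0

def curate(pairs, threshold=0.7):
    """Keep only translated pairs whose estimated quality reaches the threshold."""
    return [(src, tgt) for src, tgt in pairs if qe_score(src, tgt) >= threshold]

translated = [("How are you?", "Como estás?"), ("Good morning", "")]
print(curate(translated))  # the empty (low-quality) translation is dropped
```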
Disfluency Detection Across Domains
This paper focuses on disfluency detection across distinct domains using a large set of openSMILE features derived from the Interspeech 2013 Paralinguistic Challenge. Among the machine learning methods applied, SVMs achieved the best performance. Feature selection experiments revealed that the dimensionality of the larger feature set can be further reduced at the cost of a small performance degradation. Models trained on one corpus were tested on the other, revealing that models can be quite robust across corpora for this task, despite their distinct nature. We also conducted additional experiments aimed at disfluency prediction in the context of IVR systems, and the results reveal no substantial degradation in performance, encouraging the use of these models in IVR domains.
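A minimal sketch of the kind of pipeline described above, assuming precomputed openSMILE feature vectors; the synthetic data, the number of selected features, and the SVM settings are illustrative assumptions, not the paper's configuration.

```python
# Illustrative disfluency-detection pipeline over precomputed openSMILE
# features (feature extraction itself is not shown).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))    # stand-in for openSMILE feature vectors
y = rng.integers(0, 2, size=200)   # 1 = disfluent segment, 0 = fluent

# Reduce dimensionality before the SVM, echoing the feature-selection experiments.
model = make_pipeline(StandardScaler(),
                      SelectKBest(f_classif, k=100),
                      SVC(kernel="rbf"))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```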
Towards End-to-End Private Automatic Speaker Recognition
The development of privacy-preserving automatic speaker verification systems
has been the focus of a number of studies with the intent of allowing users to
authenticate themselves without risking the privacy of their voice. However,
current privacy-preserving methods assume that the template voice
representations (or speaker embeddings) used for authentication are extracted
locally by the user. This poses two important issues: first, knowledge of the
speaker embedding extraction model may create security and robustness
liabilities for the authentication system, as this knowledge might help
attackers in crafting adversarial examples able to mislead the system; second,
from the point of view of a service provider the speaker embedding extraction
model is arguably one of the most valuable components in the system and, as
such, disclosing it would be highly undesirable. In this work, we show how
speaker embeddings can be extracted while keeping both the speaker's voice and
the service provider's model private, using Secure Multiparty Computation.
Further, we show that it is possible to obtain reasonable trade-offs between
security and computational cost. This work is complementary to those showing
how authentication may be performed privately, and thus can be considered as
another step towards fully private automatic speaker recognition.
Comment: Accepted for publication at Interspeech 202
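To make the Secure Multiparty Computation setting concrete, the toy example below shows additive secret sharing, the basic SMC primitive in which no single party sees a value in the clear. It is only conceptual; the paper's protocol for private speaker-embedding extraction is far more involved.

```python
# Toy additive secret sharing over a prime field. Each party holds one share;
# values are only recoverable when all shares are combined.
import secrets

P = 2**61 - 1  # prime modulus for the toy field

def share(value: int, n_parties: int = 2):
    """Split a value into n additive shares that individually reveal nothing."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

x_shares = share(42)
y_shares = share(100)
# Addition can be done locally on the shares; neither party learns the
# other's input, yet the reconstructed result is correct.
z_shares = [(a + b) % P for a, b in zip(x_shares, y_shares)]
print(reconstruct(z_shares))  # 142
```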
Assessing User Expertise in Spoken Dialog System Interactions
Identifying the level of expertise of its users is important for a system
since it can lead to a better interaction through adaptation techniques.
Furthermore, this information can be used in offline processes of root cause
analysis. However, not much effort has been put into automatically identifying
the level of expertise of a user, especially in dialog-based interactions. In
this paper we present an approach based on a specific set of task-related
features. Based on the distribution of these features between the two classes,
Novice and Expert, we used Random Forests as our classification approach, and
a Support Vector Machine classifier for comparison. By applying these
approaches to data from a real system,
Let's Go, we obtained preliminary results that we consider positive, given the
difficulty of the task and the lack of competing approaches for comparison.
Comment: 10 page
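A short sketch of the Novice/Expert classifier comparison described above, using synthetic stand-in features rather than the Let's Go data; the specific feature count and hyperparameters are assumptions.

```python
# Illustrative Random Forest vs. SVM comparison for Novice/Expert
# classification from task-related interaction features (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))    # e.g. barge-ins, help requests, timing features
y = rng.integers(0, 2, size=300)  # 0 = Novice, 1 = Expert

rf = RandomForestClassifier(n_estimators=200, random_state=0)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

print("RF  cross-val accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("SVM cross-val accuracy:", cross_val_score(svm, X, y, cv=5).mean())
```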
Privacy-preserving Automatic Speaker Diarization
Automatic Speaker Diarization (ASD) is an enabling technology with numerous
applications, which deals with recordings of multiple speakers, raising special
concerns in terms of privacy. In fact, in remote settings, where recordings are
shared with a server, clients relinquish not only the privacy of their
conversation, but also of all the information that can be inferred from their
voices. However, to the best of our knowledge, the development of
privacy-preserving ASD systems has been overlooked thus far. In this work, we
tackle this problem using a combination of two cryptographic techniques, Secure
Multiparty Computation (SMC) and Secure Modular Hashing, and apply them to the
two main steps of a cascaded ASD system: speaker embedding extraction and
agglomerative hierarchical clustering. Our system is able to achieve a
reasonable trade-off between performance and efficiency, achieving real-time
factors of 1.1 and 1.6 for two different SMC security settings.
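For context, the sketch below shows the second step of such a cascaded system, agglomerative hierarchical clustering over already-extracted speaker embeddings, computed here in the clear on synthetic data; the paper runs this step under SMC and secure modular hashing, which this toy example does not reproduce.

```python
# Agglomerative hierarchical clustering of (toy) speaker embeddings, the
# clustering stage of a cascaded diarization system. Embeddings, threshold
# and linkage are illustrative; real systems typically score with cosine or
# PLDA distances.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
center_a = rng.normal(size=64)
center_b = rng.normal(size=64)
# Ten segments per synthetic speaker, tightly grouped around each centre.
embeddings = np.vstack([center_a + 0.1 * rng.normal(size=(10, 64)),
                        center_b + 0.1 * rng.normal(size=(10, 64))])

# Stop merging once clusters are farther apart than the distance threshold,
# so the number of speakers is not fixed in advance.
ahc = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0,
                              linkage="average")
labels = ahc.fit_predict(embeddings)
print(labels)  # segments grouped by (toy) speaker identity
```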
Privacy-oriented manipulation of speaker representations
Speaker embeddings are ubiquitous, with applications ranging from speaker
recognition and diarization to speech synthesis and voice anonymisation. The
amount of information held by these embeddings lends them versatility, but also
raises privacy concerns. Speaker embeddings have been shown to contain
information on age, sex, health and more, which speakers may want to keep
private, especially when this information is not required for the target task.
In this work, we propose a method for removing and manipulating private
attributes from speaker embeddings that leverages a Vector-Quantized
Variational Autoencoder architecture, combined with an adversarial classifier
and a novel mutual information loss. We validate our model on two attributes,
sex and age, and perform experiments with ignorant and fully-informed
attackers, and with in-domain and out-of-domain data.
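A highly simplified sketch of the adversarial attribute-removal idea follows. The paper combines a Vector-Quantized VAE, an adversarial classifier and a mutual-information loss; here only the adversarial component is shown, with a plain autoencoder standing in for the VQ-VAE and without the MI term, so the code is an assumption-laden illustration rather than the authors' model.

```python
# Adversarial removal of a private attribute (e.g. sex) from speaker
# embeddings: the adversary learns to predict the attribute from the latent
# code, while the autoencoder learns to reconstruct and to fool the adversary.
import torch
import torch.nn as nn

emb_dim, bottleneck = 192, 64
encoder = nn.Sequential(nn.Linear(emb_dim, bottleneck), nn.ReLU())
decoder = nn.Sequential(nn.Linear(bottleneck, emb_dim))
adversary = nn.Sequential(nn.Linear(bottleneck, 2))  # predicts the private attribute

opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

x = torch.randn(32, emb_dim)        # toy batch of speaker embeddings
attr = torch.randint(0, 2, (32,))   # private attribute labels

for _ in range(10):
    # 1) Update the adversary on detached latent codes.
    z = encoder(x).detach()
    adv_loss = nn.functional.cross_entropy(adversary(z), attr)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) Update the autoencoder: reconstruct well, but make the adversary fail.
    z = encoder(x)
    recon = nn.functional.mse_loss(decoder(z), x)
    fool = -nn.functional.cross_entropy(adversary(z), attr)
    (recon + 0.1 * fool).backward()
    opt_ae.step(); opt_ae.zero_grad()
```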
Improving ASR error detection with non-decoder based features
This study reports error detection experiments in large-vocabulary automatic speech recognition (ASR) systems using statistical classifiers. We explored new features gathered from knowledge sources other than the decoder itself: a binary feature that compares the outputs of two different ASR systems (word by word), a feature based on the number of hits of the hypothesized bigrams, obtained by queries entered into a very popular Web search engine, and finally a feature related to automatically inferred topics at the sentence and word levels. Experiments were conducted on a European Portuguese broadcast news corpus. The combination of baseline decoder-based features and two of these additional features led to significant improvements, from 13.87% to 12.16% classification error rate (CER) with a maximum entropy model, and from 14.01% to 12.39% CER with linear-chain conditional random fields, compared to a baseline using only decoder-based features.
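As a rough illustration of the classification setup, the sketch below fits a maximum-entropy model (logistic regression) over decoder-based features plus the additional feature types described above; all feature values and labels are synthetic placeholders, not the broadcast news data.

```python
# Maximum-entropy (logistic regression) error detector over combined
# decoder-based and non-decoder features (synthetic stand-in data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
decoder_feats = rng.normal(size=(n, 4))           # e.g. confidence, AM/LM scores
systems_agree = rng.integers(0, 2, size=(n, 1))   # do the two ASR outputs match?
web_hits = rng.normal(size=(n, 1))                # (log) hit count of the hypothesized bigram
topic_feat = rng.normal(size=(n, 1))              # topic-coherence feature

X = np.hstack([decoder_feats, systems_agree, web_hits, topic_feat])
y = rng.integers(0, 2, size=n)                    # 1 = misrecognised word, 0 = correct

maxent = LogisticRegression(max_iter=1000).fit(X, y)
print("classification error rate:", 1 - maxent.score(X, y))
```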
Epigenetics in Anxiety Disorders
Introduction: Anxiety disorders are among the most common psychiatric
disorders, characterized by excessive fear, anxiety, and threat avoidance.
They are one of the leading causes of disability worldwide, with a rising
prevalence, and are usually associated with comorbidities such as depression
and substance abuse. The pathogenesis of anxiety disorders is complex,
involving interactions between biological factors, environmental influences,
and psychological mechanisms.
Objectives: This dissertation aims to review the current literature on the
role of epigenetic factors in the development of anxiety disorders.
Methods: The literature search was conducted on PubMed between September 2021
and April 2022.
Results: The literature on the influence of epigenetics on the development of
anxiety disorders implicates genes that regulate the
hypothalamic-pituitary-adrenal axis, neurotransmitter systems, and neuronal
plasticity, and several studies suggest that disrupted expression of these
genes through epigenetic mechanisms contributes to the pathogenesis of
anxiety disorders.
Conclusion: Research on the contribution of epigenetic mechanisms to
susceptibility or resilience in the development of anxiety disorders is still
in its infancy, and more work is needed to clarify the details in this area.