194 research outputs found
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of controlled language for critical infrastructures protection (CIP). This project originates
from the need to coordinate and categorize the communications on CIP at the European level. These communications can be physically
represented by official documents, reports on incidents, informal communications and plain e-mail. We explore the application of
traditional library science tools for the construction of controlled languages in order to achieve our goal. Our starting point is an
analogous work done during the sixties in the field of nuclear science known as the Euratom Thesaurus.JRC.G.6-Security technology assessmen
A Review of Deep Learning Techniques for Speech Processing
The field of speech processing has undergone a transformative shift with the
advent of deep learning. The use of multiple processing layers has enabled the
creation of models capable of extracting intricate features from speech data.
This development has paved the way for unparalleled advancements in speech
recognition, text-to-speech synthesis, automatic speech recognition, and
emotion recognition, propelling the performance of these tasks to unprecedented
heights. The power of deep learning techniques has opened up new avenues for
research and innovation in the field of speech processing, with far-reaching
implications for a range of industries and applications. This review paper
provides a comprehensive overview of the key deep learning models and their
applications in speech-processing tasks. We begin by tracing the evolution of
speech processing research, from early approaches, such as MFCC and HMM, to
more recent advances in deep learning architectures, such as CNNs, RNNs,
transformers, conformers, and diffusion models. We categorize the approaches
and compare their strengths and weaknesses for solving speech-processing tasks.
Furthermore, we extensively cover various speech-processing tasks, datasets,
and benchmarks used in the literature and describe how different deep-learning
networks have been utilized to tackle these tasks. Additionally, we discuss the
challenges and future directions of deep learning in speech processing,
including the need for more parameter-efficient, interpretable models and the
potential of deep learning for multimodal speech processing. By examining the
field's evolution, comparing and contrasting different approaches, and
highlighting future directions and challenges, we hope to inspire further
research in this exciting and rapidly advancing field
Modelling talking human faces
This thesis investigates a number of new approaches for visual speech
synthesis using data-driven methods to implement a talking face.
The main contributions in this thesis are the following. The accuracy
of shared Gaussian process latent variable model (SGPLVM)
built using the active appearance model (AAM) and relative spectral
transform-perceptual linear prediction (RASTAPLP) features is improved
by employing a more accurate AAM. This is the first study
to report that using a more accurate AAM improves the accuracy of
SGPLVM. Objective evaluation via reconstruction error is performed
to compare the proposed approach against previously existing methods.
In addition, it is shown experimentally that the accuracy of AAM
can be improved by using a larger number of landmarks and/or larger
number of samples in the training data.
The second research contribution is a new method for visual speech
synthesis utilising a fully Bayesian method namely the manifold relevance
determination (MRD) for modelling dynamical systems through
probabilistic non-linear dimensionality reduction. This is the first time
MRD was used in the context of generating talking faces from the
input speech signal. The expressive power of this model is in the ability
to consider non-linear mappings between audio and visual features
within a Bayesian approach. An efficient latent space has been learnt
iii
Abstract iv
using a fully Bayesian latent representation relying on conditional nonlinear
independence framework. In the SGPLVM the structure of the
latent space cannot be automatically estimated because of using a maximum
likelihood formulation. In contrast to SGPLVM the Bayesian approaches
allow the automatic determination of the dimensionality of the
latent spaces. The proposed method compares favourably against several
other state-of-the-art methods for visual speech generation, which
is shown in quantitative and qualitative evaluation on two different
datasets.
Finally, the possibility of incremental learning of AAM for inclusion
in the proposed MRD approach for visual speech generation is
investigated. The quantitative results demonstrate that using MRD in
conjunction with incremental AAMs produces only slightly less accurate
results than using batch methods. These results support a way of
training this kind of models on computers with limited resources, for
example in mobile computing.
Overall, this thesis proposes several improvements to the current
state-of-the-art in generating talking faces from speech signal leading
to perceptually more convincing results
Negative vaccine voices in Swedish social media
Vaccinations are one of the most significant interventions to public health, but vaccine hesitancy creates concerns for a portion of the population in many countries, including Sweden. Since discussions on vaccine hesitancy are often taken on social networking sites, data from Swedish social media are used to study and quantify the sentiment among the discussants on the vaccination-or-not topic during phases of the COVID-19 pandemic. Out of all the posts analyzed a majority showed a stronger negative sentiment, prevailing throughout the whole of the examined period, with some spikes or jumps due to the occurrence of certain vaccine-related events distinguishable in the results. Sentiment analysis can be a valuable tool to track public opinions regarding the use, efficacy, safety, and importance of vaccination
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
Pragmatics and Prosody
Most of the papers collected in this book resulted from presentations and discussions undertaken during the V Lablita Workshop that took place at the Federal University of Minas Gerais, Brazil, on August 23-25, 2011. The workshop was held in conjunction with the II Brazilian Seminar on Pragmatics and Prosody. The guiding themes for the joint event were illocution, modality, attitude, information patterning and speech annotation. Thus, all papers presented here are concerned with theoretical and methodological issues related to the study of speech. Among the papers in this volume, there are different theoretical orientations, which are mirrored through the methodological designs of studies pursued. However, all papers are based on the analysis of actual speech, be it from corpora or from experimental contexts trying to emulate natural speech. Prosody is the keyword that comes out from all the papers in this publication, which indicates the high standing of this category in relation to studies that are geared towards the understanding of major elements that are constitutive of the structuring of speech
The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe
Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna, at the Austrian Academy of Science, on 12-13 February 2009
Methods in prosody
This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for the critical assessment of these methods. The aim of this volume is to bridge this gap by embracing original contributions, in which experts in the field assess, reflect, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study
- …