    Non-parallel articulatory-to-acoustic conversion using multiview-based time warping

    In this paper, we propose a novel algorithm called multiview temporal alignment by dependence maximisation in the latent space (TRANSIENCE) for the alignment of time series consisting of sequences of feature vectors that differ in both length and dimensionality. The proposed algorithm, which is based on the theory of multiview learning, can be seen as an extension of the well-known dynamic time warping (DTW) algorithm that, unlike DTW, allows the sequences to have different dimensionalities. Our algorithm attempts to find an optimal temporal alignment between pairs of nonaligned sequences by first projecting their feature vectors into a common latent space where both views are maximally similar. To do this, powerful, nonlinear deep neural network (DNN) models are employed. Then, the resulting sequences of embedding vectors are aligned using DTW. Finally, the alignment paths obtained in the previous step are applied to the original sequences to align them. In the paper, we explore several variants of the algorithm that mainly differ in the way the DNNs are trained. We evaluated the proposed algorithm on an articulatory-to-acoustic (A2A) synthesis task involving the generation of audible speech from motion data captured from the lips and tongue of healthy speakers using a technique known as permanent magnet articulography (PMA). In this task, our algorithm is applied during the training stage to align pairs of nonaligned speech and PMA recordings that are later used to train DNNs able to synthesize speech from PMA data. Our results show that the quality of the speech generated in the nonaligned scenario is comparable to that obtained in the parallel scenario.
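    The following is a minimal sketch of the three-step pipeline the abstract describes (encode each view into a shared latent space, align the embeddings with DTW, apply the path to the raw sequences). All names, dimensions, and the encoder architecture are illustrative assumptions, not taken from the paper's code, and the cross-view training objective is omitted.

```python
# Sketch of a TRANSIENCE-style alignment pipeline (illustrative only).
import numpy as np
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """Projects one view's feature vectors into a shared latent space."""
    def __init__(self, in_dim: int, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def dtw_path(a: np.ndarray, b: np.ndarray):
    """Classic DTW over Euclidean distances; returns the optimal warping path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Nonaligned sequences with different lengths and dimensionalities,
# e.g. PMA motion features (9-dim) vs. acoustic features (40-dim).
pma = np.random.randn(120, 9).astype(np.float32)
speech = np.random.randn(150, 40).astype(np.float32)

enc_pma, enc_speech = ViewEncoder(9), ViewEncoder(40)
# Training the encoders to maximise cross-view similarity is omitted;
# untrained encoders are used here purely to show the data flow.
with torch.no_grad():
    z_pma = enc_pma(torch.from_numpy(pma)).numpy()
    z_speech = enc_speech(torch.from_numpy(speech)).numpy()

path = dtw_path(z_pma, z_speech)
# Apply the path found in the latent space to the original sequences.
pma_aligned = np.stack([pma[i] for i, _ in path])
speech_aligned = np.stack([speech[j] for _, j in path])
```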

    On Joint Optimization of Automatic Speaker Verification and Anti-spoofing in the Embedding Space

    Biometric systems are exposed to spoofing attacks which may compromise their security, and voice biometrics based on automatic speaker verification (ASV) is no exception. To increase the robustness against such attacks, anti-spoofing systems have been proposed for the detection of replay, synthesis and voice conversion-based attacks. However, most proposed anti-spoofing techniques are loosely integrated with the ASV system. In this work, we develop a new integration neural network which jointly processes the embeddings extracted from the ASV and anti-spoofing systems in order to detect both zero-effort impostors and spoofing attacks. Moreover, we propose a new loss function based on the minimization of the area under the expected performance and spoofability curve (EPSC), referred to as the AUE, which allows us to optimize the integration neural network on the desired operating range in which the biometric system is expected to work. To evaluate our proposals, experiments were carried out on the recent ASVspoof 2019 corpus, including both logical access (LA) and physical access (PA) scenarios. The experimental results show that our proposal clearly outperforms some well-known techniques based on integration at the score and embedding levels. Specifically, our proposal achieves up to 23.62% and 22.03% relative equal error rate (EER) improvement over the best-performing baseline in the LA and PA scenarios, respectively, as well as relative gains of 27.62% and 29.15% on the AUE metric.
    Funding: Spanish Ministry of Science and Innovation Project No. PID2019-104206GB-I00/AEI/10.13039/50110001103
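    Below is a minimal sketch of the embedding-level integration idea: the ASV and anti-spoofing embeddings of a trial are concatenated and passed through a small network that emits a single joint score. The architecture, embedding sizes, and the use of binary cross-entropy are assumptions for brevity; the paper instead minimizes the AUE of the EPSC.

```python
# Illustrative embedding-level integration network (details are assumed).
import torch
import torch.nn as nn

ASV_DIM, CM_DIM = 192, 160  # assumed ASV / anti-spoofing embedding sizes

class IntegrationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ASV_DIM + CM_DIM, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1),  # joint verification score (logit)
        )

    def forward(self, asv_emb, cm_emb):
        return self.net(torch.cat([asv_emb, cm_emb], dim=-1)).squeeze(-1)

model = IntegrationNet()
# Stand-in batch: target trials labelled 1; zero-effort impostors and
# spoofing attacks both labelled 0.
asv_emb = torch.randn(8, ASV_DIM)
cm_emb = torch.randn(8, CM_DIM)
labels = torch.randint(0, 2, (8,)).float()

# The paper trains with an AUE-based loss over the EPSC; plain binary
# cross-entropy is used here only to keep the sketch short.
loss = nn.BCEWithLogitsLoss()(model(asv_emb, cm_emb), labels)
loss.backward()
```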

    PANACEA cough sound-based diagnosis of COVID-19 for the DiCOVA 2021 Challenge

    The COVID-19 pandemic has led to the saturation of public health services worldwide. In this scenario, the early diagnosis of SARS-CoV-2 infections can help to stop or slow the spread of the virus and to manage the demand upon health services. This is especially important when resources are also being stretched by heightened demand linked to other seasonal diseases, such as the flu. In this context, the organisers of the DiCOVA 2021 challenge have collected a database with the aim of diagnosing COVID-19 through the use of coughing audio samples. This work presents the details of the automatic system for COVID-19 detection from cough recordings presented by team PANACEA. This team consists of researchers from two European academic institutions and one company: EURECOM (France), University of Granada (Spain), and Biometric Vox S.L. (Spain). We developed several systems based on established signal processing and machine learning methods. Our best system employs a frontend based on Teager energy operator cepstral coefficients (TECCs) and a light gradient boosting machine (LightGBM) backend. The AUC obtained by this system on the test set is 76.31%, which corresponds to a 10% improvement over the official baseline.
    Comment: Accepted at INTERSPEECH 2021.
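    As a rough illustration of the frontend-plus-backend structure described above: per-recording frame-level cepstral features are mean-pooled into fixed-size vectors and classified with LightGBM, with AUC as the metric. Since TECC extraction is not available in standard libraries, librosa MFCCs stand in for the TECC frontend here, and the waveforms and labels are synthetic; everything in this sketch is an assumption, not the team's actual system.

```python
# Stand-in sketch: cepstral frontend + LightGBM backend, scored with AUC.
import numpy as np
import librosa
import lightgbm as lgb
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def fake_recording(seconds: float = 3.0, sr: int = 16000) -> np.ndarray:
    """Synthetic waveform standing in for a cough recording."""
    return rng.standard_normal(int(seconds * sr)).astype(np.float32)

def recording_features(y: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Frame-level cepstral features, mean-pooled to one vector per recording."""
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # MFCCs stand in for TECCs
    return feats.mean(axis=1)

# Synthetic dataset: 40 recordings with alternating binary COVID-19 labels.
X = np.stack([recording_features(fake_recording()) for _ in range(40)])
y = np.array([0, 1] * 20)

clf = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.05)
clf.fit(X[:30], y[:30])
print("AUC:", roc_auc_score(y[30:], clf.predict_proba(X[30:])[:, 1]))
```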