62 research outputs found
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
Many datasets have been designed to further the development of fake audio
detection, such as datasets of the ASVspoof and ADD challenges. However, these
datasets do not consider the situation in which the emotion of an
utterance has been changed from one state to another while other
information (e.g. speaker identity and content) remains the same.
Changing the emotion of an utterance can change its semantics, and speech
with tampered semantics may pose threats to people's lives. Therefore,
this paper reports our progress in developing EmoFake, an emotion fake
audio detection dataset in which the emotional state of the original
audio has been changed. The fake audio in EmoFake is generated by
open-source emotion voice conversion models. Furthermore, we propose a
method named Graph Attention networks using Deep Emotion embedding
(GADE) for the detection of emotion fake audio. Benchmark experiments are
conducted on this dataset. The results show that our dataset poses a
challenge to fake audio detection models trained on the LA dataset of
ASVspoof 2019, while the proposed GADE performs well in the face of
emotion fake audio.
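The abstract does not spell out the GADE architecture; as a rough illustration only, and assuming frame-level embeddings are treated as nodes of a fully connected graph conditioned on an utterance-level deep emotion embedding, a single graph-attention pass might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(node_feats, emo_embed, W, a):
    """One graph-attention pass over frame-level nodes (illustrative).

    node_feats: (N, d) frame embeddings, fully connected graph.
    emo_embed:  (d_e,) deep emotion embedding appended to every node.
    W: (d + d_e, h) projection; a: (2 * h,) attention vector.
    """
    # Condition every node on the utterance-level emotion embedding.
    x = np.concatenate([node_feats,
                        np.tile(emo_embed, (node_feats.shape[0], 1))], axis=1)
    h = x @ W                                    # (N, h) projected nodes
    N = h.shape[0]
    # Pairwise attention logits e_ij = a . [h_i ; h_j].
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            logits[i, j] = a @ np.concatenate([h[i], h[j]])
    alpha = softmax(logits, axis=1)              # attention coefficients
    return alpha @ h                             # attended node features
```

All names and shapes here are assumptions for illustration, not the paper's specification.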
Audio Deepfake Detection: A Survey
Audio deepfake detection is an emerging and active topic. A growing body
of literature has studied deepfake detection algorithms and achieved
effective performance, yet the problem is far from solved. Although some
reviews exist, there has been no comprehensive survey that provides
researchers with a systematic overview of these developments under a
unified evaluation. Accordingly, in this survey paper, we first highlight the
key differences across various types of deepfake audio, then outline and
analyse competitions, datasets, features, classifications, and evaluation of
state-of-the-art approaches. For each aspect, the basic techniques, advanced
developments and major challenges are discussed. In addition, we perform a
unified comparison of representative features and classifiers on ASVspoof 2021,
ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively.
The survey shows that future research should address the lack of
large-scale in-the-wild datasets, the poor generalization of existing
detection methods to unknown fake attacks, and the interpretability of
detection results.
Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection
Most research in fake audio detection (FAD) focuses on improving performance
on standard noise-free datasets. However, in actual situations, there is
usually noise interference, which will cause significant performance
degradation in FAD systems. To improve the noise robustness, we propose a
dual-branch knowledge distillation fake audio detection (DKDFAD) method.
Specifically, a parallel data flow of the clean teacher branch and the noisy
student branch is designed, and interactive fusion and response-based
teacher-student paradigms are proposed to guide the training of noisy data from
the data distribution and decision-making perspectives. In the noise branch,
speech enhancement is first introduced for denoising, which reduces the
interference of strong noise. The proposed interactive fusion combines
denoised features and noise features to reduce the impact of speech
distortion and to seek consistency with the data distribution of the
clean branch. The teacher-student paradigm maps the student's decision
space to the teacher's decision space, making noisy speech behave like
clean speech. In addition, a joint training method is used to optimize
the two branches to achieve global optimality. Experimental results on
multiple datasets show that the proposed method performs well in noisy
environments and maintains its performance in cross-dataset experiments.
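The abstract names two guidance signals: feature-level consistency with the clean branch and response-based teacher-student distillation. A minimal numpy sketch of such a combined loss follows; the weighting, temperature, and exact loss forms are assumptions, not the paper's:

```python
import numpy as np

def softened(z, t=1.0):
    """Temperature-softened class posteriors from logits."""
    e = np.exp((z - z.max()) / t)
    return e / e.sum()

def dual_branch_distill_loss(clean_feat, noisy_feat,
                             teacher_logits, student_logits,
                             lam=0.5, temperature=2.0):
    """Combine the two guidance signals: feature-level consistency with
    the clean branch (data-distribution view) and response-based
    distillation of the teacher's decisions (decision-making view)."""
    # Feature loss: pull the (enhanced) noisy features toward clean ones.
    feat_loss = np.mean((clean_feat - noisy_feat) ** 2)
    # Response loss: KL(teacher || student) over softened posteriors.
    p = softened(teacher_logits, temperature)
    q = softened(student_logits, temperature)
    kd_loss = np.sum(p * (np.log(p) - np.log(q)))
    return lam * feat_loss + (1 - lam) * kd_loss
```

When the student matches the teacher in both features and responses, the loss is zero, which is the "behave like clean speech" target the title describes.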
DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection
Auditory Attention Detection (AAD) aims to detect the target speaker from
brain signals in a multi-speaker environment. Although EEG-based AAD
methods have shown promising results in recent years, current approaches
primarily rely on traditional convolutional neural networks designed for
processing Euclidean data such as images. This makes it challenging to
handle EEG signals, which possess
non-Euclidean characteristics. In order to address this problem, this paper
proposes a dynamical graph self-distillation (DGSD) approach for AAD, which
does not require speech stimuli as input. Specifically, to effectively
represent the non-Euclidean properties of EEG signals, dynamical graph
convolutional networks are applied to represent the graph structure of EEG
signals, which can also extract crucial features related to auditory spatial
attention in EEG signals. In addition, to further improve AAD
performance, self-distillation, consisting of feature distillation and
hierarchical distillation strategies at each layer, is integrated. These
strategies leverage features and classification results from the deepest
network layers to guide the learning of shallow layers. Our experiments are
conducted on two publicly available datasets, KUL and DTU. Under a
1-second time window, we achieve accuracies of 90.0% and 79.6% on KUL and
DTU, respectively. We compare our DGSD method with competitive baselines,
and the experimental results indicate that our proposed DGSD method not
only outperforms the best reproducible baseline but also uses
approximately 100 times fewer trainable parameters.
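As a loose illustration of the "dynamical" part, the adjacency is computed from the EEG channel features themselves rather than from a fixed electrode montage; the paper's precise formulation is not reproduced here, so treat this as a hypothetical sketch:

```python
import numpy as np

def dynamic_graph_conv(X, W):
    """One dynamical graph convolution step over EEG channels (sketch).

    X: (C, d) per-channel features; W: (d, d_out) learnable projection.
    The adjacency A is recomputed from X at every forward pass, which is
    what makes the graph 'dynamical' rather than a fixed montage.
    """
    S = X @ X.T                                   # pairwise similarity
    S = S - S.max(axis=1, keepdims=True)          # numerical stability
    A = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)  # row-softmax
    return np.tanh(A @ X @ W)                     # aggregate, project
```

The self-distillation described in the abstract would then attach auxiliary losses so the deepest layer's features and predictions guide the shallow layers.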
Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features
Recently, pioneering research works have proposed a large number of acoustic
features (log power spectrogram, linear frequency cepstral coefficients,
constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining
good performance, and showing that different subbands have different
contributions to audio deepfake detection. However, this lacks an explanation
of the specific information in the subband, and these features also lose
information such as phase. In speech synthesis, fundamental frequency
(F0) information is used to improve the quality of synthetic speech, yet
the F0 of synthetic speech is still overly averaged and differs
significantly from that of real speech. F0 is therefore expected to serve
as important information for discriminating between bona fide and fake
speech, but it cannot be used directly due to the irregular distribution
of F0. Instead, the frequency band containing most of the F0 is
selected as the input feature. Meanwhile, to make full use of the phase and
full-band information, we also propose to use real and imaginary spectrogram
features as complementary input features and model the disjoint subbands
separately. Finally, the results of F0, real and imaginary spectrogram features
are fused. Experimental results on the ASVspoof 2019 LA dataset show that our
proposed system is very effective for the audio deepfake detection task,
achieving an equal error rate (EER) of 0.43%, which surpasses almost all
systems.
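A small sketch of the two input features described above: raw real and imaginary STFT parts (which preserve phase, unlike magnitude-only features) and a low-frequency subband expected to contain most of the F0. The frame size, window, and 400 Hz cutoff are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def real_imag_spectrogram(x, n_fft=512, hop=256):
    """Frame the signal and keep the raw real and imaginary STFT parts,
    so phase information is preserved."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    spec = np.fft.rfft(np.array(frames), axis=1)   # (T, n_fft//2 + 1)
    return spec.real, spec.imag

def f0_subband(real, imag, sr=16000, n_fft=512, fmax=400.0):
    """Keep only the low-frequency bins (<= fmax Hz) where F0 lives."""
    bins = int(fmax * n_fft / sr) + 1
    return real[:, :bins], imag[:, :bins]
```

In the paper's design the subband and full-band streams are modeled separately and their scores fused at the end.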
Dynamic Striking Distance and Electric Geometrical Model of Wind Turbine Blades Based on Lightning Physics
Lightning damage to wind turbine blades has been a key factor affecting the safe and reliable operation of wind farms. An electric geometrical model of wind turbine blades (EGMTB) is presented based on the traditional electric geometrical method and the physical process of the lightning leader. The concept of dynamic striking distance is introduced to clarify the physical meaning of striking distance, and a calculation method for the efficiency of the blade lightning protection system (LPS) is derived. Finally, the effectiveness of the EGMTB is validated by long-gap breakdown experiments on blades. The EGMTB is then used to analyze the factors influencing blade LPS efficiency. The results indicate that LPS efficiency decreases with decreasing lightning current and with a smaller angle between the blade and the horizontal, and that it can be improved by adding side lightning receptors. The EGMTB is intended to provide a theoretical basis for the lightning protection design and evaluation of wind turbine blades.
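The dynamic striking distance itself is the paper's contribution and is not reproduced in this abstract; for reference, the classical electrogeometric relation it generalizes is r = a·I^b, shown here with Love's widely used coefficients:

```python
def striking_distance(i_peak_ka, a=10.0, b=0.65):
    """Classical electrogeometric striking distance in metres for a
    lightning peak current in kA (r = a * I**b, Love's coefficients).
    The EGMTB replaces this static relation with a dynamic one derived
    from the physics of the lightning leader."""
    return a * i_peak_ka ** b
```

The trend reported in the abstract follows directly from this relation: a smaller peak current gives a shorter striking distance, so weak strokes can slip past the receptors and LPS interception efficiency drops.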
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
Over the past few decades, multimodal emotion recognition has made remarkable
progress with the development of deep learning. However, existing
technologies struggle to meet the demands of practical applications. To improve the
robustness, we launch a Multimodal Emotion Recognition Challenge (MER 2023) to
motivate global researchers to build innovative technologies that can further
accelerate and foster research. For this year's challenge, we present three
distinct sub-challenges: (1) MER-MULTI, in which participants recognize both
discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to
test videos for modality robustness evaluation; (3) MER-SEMI, which provides
large amounts of unlabeled samples for semi-supervised learning. In this paper,
we test a variety of multimodal features and provide a competitive baseline for
each sub-challenge. Our system achieves an F1 score of 77.57% and a mean
squared error (MSE) of 0.82 for MER-MULTI, an F1 score of 69.82% and an
MSE of 1.12 for MER-NOISE, and an F1 score of 86.75% for MER-SEMI.
Baseline code is available at https://github.com/zeroQiaoba/MER2023-Baseline.
Tirzepatide ameliorates spatial learning and memory impairment through modulation of aberrant insulin resistance and inflammation response in diabetic rats
Background: Memory impairment is a typical symptom in diabetes mellitus patients, followed by gradual cognitive deterioration, and there is no efficient treatment for it. The anti-diabetic incretin hormones glucose-dependent insulinotropic polypeptide (GIP) and glucagon-like peptide-1 (GLP-1) have been demonstrated to have highly neuroprotective benefits in animal models of Alzheimer's disease (AD). We investigated how the GLP-1/GIP dual agonist tirzepatide affects the impairment of spatial learning and memory caused by diabetes. Methods: Diabetic rats induced by a high-fat diet and streptozotocin injection were injected intraperitoneally with tirzepatide (1.35 mg/kg) once a week. The protective effects were assessed using the Morris water maze test, immunofluorescence, and Western blot analysis. Golgi staining was adopted to quantify dendritic spines. Results: Tirzepatide significantly improved impaired glucose tolerance, fasting blood glucose levels, and insulin levels in diabetic rats. Tirzepatide also dramatically alleviated spatial learning and memory impairment, inhibited Aβ accumulation, prevented structural damage, boosted the synthesis of synaptic proteins, and increased dendritic spine formation in the diabetic hippocampus. Furthermore, aberrant changes in signaling molecules related to inflammation pathways were normalized after tirzepatide treatment, and the PI3K/Akt/GSK3β signaling pathway was restored. Conclusion: Tirzepatide exerts a protective effect against spatial learning and memory impairment, potentially through regulating abnormal insulin resistance and inflammatory responses.
ADD 2023: the Second Audio Deepfake Detection Challenge
Audio deepfake detection is an emerging topic in the artificial intelligence
community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to
spur researchers around the world to build new innovative technologies that can
further accelerate and foster research on detecting and analyzing deepfake
speech utterances. Different from previous challenges (e.g. ADD 2022),
ADD 2023 moves beyond binary real/fake classification to localizing the
manipulated intervals in partially fake speech as well as pinpointing the
source responsible for generating any fake audio. Furthermore, ADD 2023
includes more rounds of evaluation for the fake audio game sub-challenge.
The challenge comprises three sub-challenges: audio fake game (FG),
manipulation region location (RL), and deepfake algorithm recognition
(AR). This paper describes the datasets, evaluation metrics, and
protocols, and reports some findings on the audio deepfake detection tasks.
Bayesian Optimization via Exact Penalty
Constrained optimization problems pose challenges when the objective function and constraints are nonconvex and their evaluation requires expensive black-box simulations. Recently, hybrid optimization methods that integrate statistical surrogate modeling with numerical optimization algorithms have shown great promise, as they inherit global convergence from statistical surrogate modeling and fast local convergence from numerical optimization algorithms. However, their computational efficiency falls short of practical needs under limited budgets and in the presence of equality constraints. In this article, we propose a novel hybrid optimization method, called exact penalty Bayesian optimization (EPBO), which employs Bayesian optimization within the exact penalty framework. We model the composite penalty function by a weighted sum of Gaussian processes, where the qualitative components of the constraint violations are smoothed by their predictive means. The proposed method features (i) closed-form acquisition functions, (ii) robustness to initial designs, (iii) the capability to start from infeasible points, and (iv) effective handling of equality constraints. We demonstrate the superiority of EPBO over state-of-the-art competitors on a suite of benchmark synthetic test problems and two real-world engineering design problems.
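The exact penalty construction at the heart of EPBO can be written down directly; a minimal sketch using the standard L1 exact-penalty form (the weight rho and notation are generic, not necessarily the article's):

```python
def exact_penalty(f, ineqs, eqs, rho):
    """Build P(x) = f(x) + rho * (sum_i max(0, g_i(x)) + sum_j |h_j(x)|),
    the L1 exact penalty of f subject to g_i(x) <= 0 and h_j(x) = 0.
    EPBO models this composite with a weighted sum of Gaussian processes
    and optimizes it via closed-form acquisition functions."""
    def P(x):
        return (f(x)
                + rho * sum(max(0.0, g(x)) for g in ineqs)
                + rho * sum(abs(h(x)) for h in eqs))
    return P
```

For a sufficiently large rho, minimizers of P coincide with constrained minimizers, which is why the method can start from infeasible points and handle equality constraints naturally.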