62 research outputs found
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
Many datasets have been designed to further the development of fake audio
detection, such as datasets of the ASVspoof and ADD challenges. However, these
datasets do not consider the situation in which the emotion of an
utterance has been changed from one state to another while other
information (e.g. speaker identity and content) remains the same.
Changing the emotion of an utterance can change its semantics, and speech
with tampered semantics may pose threats to people's lives. Therefore,
this paper reports our progress in developing EmoFake, an emotion fake
audio detection dataset in which the emotional state of the original
audio has been changed. The fake audio in EmoFake is generated by
open-source emotion voice conversion models. Furthermore, we propose a
method named Graph Attention networks using Deep Emotion embedding
(GADE) for the detection of emotion fake audio. Benchmark experiments are
conducted on this dataset. The results show that our dataset poses a
challenge to fake audio detection models trained on the LA dataset of
ASVspoof 2019, while the proposed GADE performs well in the face of
emotion fake audio.
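The abstract does not spell out the GADE architecture; as a rough illustration only, and assuming frame-level embeddings are treated as nodes of a fully connected graph conditioned on an utterance-level deep emotion embedding, a single graph-attention pass might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention(node_feats, emo_embed, W, a):
    """One graph-attention pass over frame-level nodes (illustrative).

    node_feats: (N, d) frame embeddings, fully connected graph.
    emo_embed:  (d_e,) deep emotion embedding appended to every node.
    W: (d + d_e, h) projection; a: (2 * h,) attention vector.
    """
    # Condition every node on the utterance-level emotion embedding.
    x = np.concatenate([node_feats,
                        np.tile(emo_embed, (node_feats.shape[0], 1))], axis=1)
    h = x @ W                                    # (N, h) projected nodes
    N = h.shape[0]
    # Pairwise attention logits e_ij = a . [h_i ; h_j].
    logits = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            logits[i, j] = a @ np.concatenate([h[i], h[j]])
    alpha = softmax(logits, axis=1)              # attention coefficients
    return alpha @ h                             # attended node features
```

All names and shapes here are assumptions for illustration, not the paper's specification.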
Audio Deepfake Detection: A Survey
Audio deepfake detection is an emerging and active topic. A growing body
of literature has studied deepfake detection algorithms and achieved
effective performance, yet the problem is far from solved. Although some
reviews exist, there has been no comprehensive survey that provides
researchers with a systematic overview of these developments under a
unified evaluation. Accordingly, in this survey paper, we first highlight the
key differences across various types of deepfake audio, then outline and
analyse competitions, datasets, features, classifications, and evaluation of
state-of-the-art approaches. For each aspect, the basic techniques, advanced
developments and major challenges are discussed. In addition, we perform a
unified comparison of representative features and classifiers on ASVspoof 2021,
ADD 2023 and In-the-Wild datasets for audio deepfake detection, respectively.
The survey shows that future research should address the lack of
large-scale in-the-wild datasets, the poor generalization of existing
detection methods to unknown fake attacks, and the interpretability of
detection results.
Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection
Most research in fake audio detection (FAD) focuses on improving performance
on standard noise-free datasets. However, in actual situations, there is
usually noise interference, which will cause significant performance
degradation in FAD systems. To improve the noise robustness, we propose a
dual-branch knowledge distillation fake audio detection (DKDFAD) method.
Specifically, a parallel data flow of the clean teacher branch and the noisy
student branch is designed, and interactive fusion and response-based
teacher-student paradigms are proposed to guide the training of noisy data from
the data distribution and decision-making perspectives. In the noise branch,
speech enhancement is first introduced for denoising, which reduces the
interference of strong noise. The proposed interactive fusion combines
denoised features and noise features to reduce the impact of speech
distortion and to seek consistency with the data distribution of the
clean branch. The teacher-student paradigm maps the student's decision
space to the teacher's decision space, making noisy speech behave like
clean speech. In addition, a joint training method is used to optimize
the two branches to achieve global optimality. Experimental results on
multiple datasets show that the proposed method performs well in noisy
environments and maintains its performance in cross-dataset experiments.
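The abstract names two guidance signals: feature-level consistency with the clean branch and response-based teacher-student distillation. A minimal numpy sketch of such a combined loss follows; the weighting, temperature, and exact loss forms are assumptions, not the paper's:

```python
import numpy as np

def softened(z, t=1.0):
    """Temperature-softened class posteriors from logits."""
    e = np.exp((z - z.max()) / t)
    return e / e.sum()

def dual_branch_distill_loss(clean_feat, noisy_feat,
                             teacher_logits, student_logits,
                             lam=0.5, temperature=2.0):
    """Combine the two guidance signals: feature-level consistency with
    the clean branch (data-distribution view) and response-based
    distillation of the teacher's decisions (decision-making view)."""
    # Feature loss: pull the (enhanced) noisy features toward clean ones.
    feat_loss = np.mean((clean_feat - noisy_feat) ** 2)
    # Response loss: KL(teacher || student) over softened posteriors.
    p = softened(teacher_logits, temperature)
    q = softened(student_logits, temperature)
    kd_loss = np.sum(p * (np.log(p) - np.log(q)))
    return lam * feat_loss + (1 - lam) * kd_loss
```

When the student matches the teacher in both features and responses, the loss is zero, which is the "behave like clean speech" target the title describes.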
DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection
Auditory Attention Detection (AAD) aims to detect the target speaker from
brain signals in a multi-speaker environment. Although EEG-based AAD
methods have shown promising results in recent years, current approaches
primarily rely on traditional convolutional neural networks designed for
processing Euclidean data such as images. This makes it challenging to
handle EEG signals, which possess
non-Euclidean characteristics. In order to address this problem, this paper
proposes a dynamical graph self-distillation (DGSD) approach for AAD, which
does not require speech stimuli as input. Specifically, to effectively
represent the non-Euclidean properties of EEG signals, dynamical graph
convolutional networks are applied to represent the graph structure of EEG
signals, which can also extract crucial features related to auditory spatial
attention in EEG signals. In addition, to further improve AAD
performance, self-distillation, consisting of feature distillation and
hierarchical distillation strategies at each layer, is integrated. These
strategies leverage features and classification results from the deepest
network layers to guide the learning of shallow layers. Our experiments are
conducted on two publicly available datasets, KUL and DTU. Under a
1-second time window, we achieve accuracies of 90.0% and 79.6% on KUL and
DTU, respectively. We compare our DGSD method with competitive baselines,
and the experimental results indicate that our proposed DGSD method not
only outperforms the best reproducible baseline but also uses
approximately 100 times fewer trainable parameters.
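As a loose illustration of the "dynamical" part, the adjacency is computed from the EEG channel features themselves rather than from a fixed electrode montage; the paper's precise formulation is not reproduced here, so treat this as a hypothetical sketch:

```python
import numpy as np

def dynamic_graph_conv(X, W):
    """One dynamical graph convolution step over EEG channels (sketch).

    X: (C, d) per-channel features; W: (d, d_out) learnable projection.
    The adjacency A is recomputed from X at every forward pass, which is
    what makes the graph 'dynamical' rather than a fixed montage.
    """
    S = X @ X.T                                   # pairwise similarity
    S = S - S.max(axis=1, keepdims=True)          # numerical stability
    A = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)  # row-softmax
    return np.tanh(A @ X @ W)                     # aggregate, project
```

The self-distillation described in the abstract would then attach auxiliary losses so the deepest layer's features and predictions guide the shallow layers.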
Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features
Recently, pioneering research works have proposed a large number of acoustic
features (log power spectrogram, linear frequency cepstral coefficients,
constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining
good performance, and showing that different subbands have different
contributions to audio deepfake detection. However, this lacks an explanation
of the specific information in the subband, and these features also lose
information such as phase. In speech synthesis, fundamental frequency
(F0) information is used to improve the quality of synthetic speech, yet
the F0 of synthetic speech is still overly averaged and differs
significantly from that of real speech. F0 is therefore expected to serve
as important information for discriminating between bona fide and fake
speech, but it cannot be used directly due to the irregular distribution
of F0. Instead, the frequency band containing most of the F0 is
selected as the input feature. Meanwhile, to make full use of the phase and
full-band information, we also propose to use real and imaginary spectrogram
features as complementary input features and model the disjoint subbands
separately. Finally, the results of F0, real and imaginary spectrogram features
are fused. Experimental results on the ASVspoof 2019 LA dataset show that our
proposed system is very effective for the audio deepfake detection task,
achieving an equal error rate (EER) of 0.43%, which surpasses almost all
systems.
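A small sketch of the two input features described above: raw real and imaginary STFT parts (which preserve phase, unlike magnitude-only features) and a low-frequency subband expected to contain most of the F0. The frame size, window, and 400 Hz cutoff are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def real_imag_spectrogram(x, n_fft=512, hop=256):
    """Frame the signal and keep the raw real and imaginary STFT parts,
    so phase information is preserved."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    spec = np.fft.rfft(np.array(frames), axis=1)   # (T, n_fft//2 + 1)
    return spec.real, spec.imag

def f0_subband(real, imag, sr=16000, n_fft=512, fmax=400.0):
    """Keep only the low-frequency bins (<= fmax Hz) where F0 lives."""
    bins = int(fmax * n_fft / sr) + 1
    return real[:, :bins], imag[:, :bins]
```

In the paper's design the subband and full-band streams are modeled separately and their scores fused at the end.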
Dynamic Striking Distance and Electric Geometrical Model of Wind Turbine Blades Based on Lightning Physics
Lightning damage to wind turbine blades has been a key factor affecting the safe and reliable operation of wind farms. An electric geometrical model of wind turbine blades (EGMTB) is presented based on the traditional electric geometrical method and the physical process of the lightning leader. The concept of dynamic striking distance is introduced to clarify the physical meaning of striking distance, and a calculation method for the efficiency of the blade lightning protection system (LPS) is derived. Finally, the effectiveness of the EGMTB is validated by long-gap breakdown experiments on blades. The EGMTB is then used to analyze the factors influencing blade LPS efficiency. The results indicate that LPS efficiency decreases with decreasing lightning current and with a smaller angle between the blade and the horizontal, and that it can be improved by adding side lightning receptors. The EGMTB is intended to provide a theoretical basis for the lightning protection design and evaluation of wind turbine blades.
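The dynamic striking distance itself is the paper's contribution and is not reproduced in this abstract; for reference, the classical electrogeometric relation it generalizes is r = a·I^b, shown here with Love's widely used coefficients:

```python
def striking_distance(i_peak_ka, a=10.0, b=0.65):
    """Classical electrogeometric striking distance in metres for a
    lightning peak current in kA (r = a * I**b, Love's coefficients).
    The EGMTB replaces this static relation with a dynamic one derived
    from the physics of the lightning leader."""
    return a * i_peak_ka ** b
```

The trend reported in the abstract follows directly from this relation: a smaller peak current gives a shorter striking distance, so weak strokes can slip past the receptors and LPS interception efficiency drops.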
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning
Over the past few decades, multimodal emotion recognition has made remarkable
progress with the development of deep learning. However, existing
technologies struggle to meet the demands of practical applications. To improve the
robustness, we launch a Multimodal Emotion Recognition Challenge (MER 2023) to
motivate global researchers to build innovative technologies that can further
accelerate and foster research. For this year's challenge, we present three
distinct sub-challenges: (1) MER-MULTI, in which participants recognize both
discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to
test videos for modality robustness evaluation; (3) MER-SEMI, which provides
large amounts of unlabeled samples for semi-supervised learning. In this paper,
we test a variety of multimodal features and provide a competitive baseline for
each sub-challenge. Our system achieves an F1 score of 77.57% and a mean
squared error (MSE) of 0.82 for MER-MULTI, an F1 score of 69.82% and an
MSE of 1.12 for MER-NOISE, and an F1 score of 86.75% for MER-SEMI.
Baseline code is available at https://github.com/zeroQiaoba/MER2023-Baseline.
Tirzepatide ameliorates spatial learning and memory impairment through modulation of aberrant insulin resistance and inflammation response in diabetic rats
Background: Memory impairment is a typical symptom in diabetes mellitus patients, followed by gradual cognitive deterioration, and there is no efficient treatment for it. The anti-diabetic incretin hormones glucose-dependent insulinotropic polypeptide (GIP) and glucagon-like peptide-1 (GLP-1) have been demonstrated to have highly neuroprotective benefits in animal models of Alzheimer's disease (AD). We investigated how the GLP-1/GIP dual agonist tirzepatide affects the impairment of spatial learning and memory caused by diabetes. Methods: Diabetic rats induced by a high-fat diet and streptozotocin injection were injected intraperitoneally with tirzepatide (1.35 mg/kg) once a week. The protective effects were assessed using the Morris water maze test, immunofluorescence, and Western blot analysis. Golgi staining was adopted to quantify dendritic spines. Results: Tirzepatide significantly improved impaired glucose tolerance, fasting blood glucose levels, and insulin levels in diabetic rats. Tirzepatide also dramatically alleviated spatial learning and memory impairment, inhibited Aβ accumulation, prevented structural damage, boosted the synthesis of synaptic proteins, and increased dendritic spine formation in the diabetic hippocampus. Furthermore, aberrant changes in signaling molecules related to inflammation pathways were normalized after tirzepatide treatment, and the PI3K/Akt/GSK3β signaling pathway was restored. Conclusion: Tirzepatide exerts a protective effect against spatial learning and memory impairment, potentially through regulating abnormal insulin resistance and inflammatory responses.
ADD 2023: the Second Audio Deepfake Detection Challenge
Audio deepfake detection is an emerging topic in the artificial intelligence
community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to
spur researchers around the world to build new innovative technologies that can
further accelerate and foster research on detecting and analyzing deepfake
speech utterances. Different from previous challenges (e.g. ADD 2022),
ADD 2023 moves beyond binary real/fake classification to localizing the
manipulated intervals in partially fake speech as well as pinpointing the
source responsible for generating any fake audio. Furthermore, ADD 2023
includes more rounds of evaluation for the fake audio game sub-challenge.
The challenge comprises three sub-challenges: audio fake game (FG),
manipulation region location (RL), and deepfake algorithm recognition
(AR). This paper describes the datasets, evaluation metrics, and
protocols, and reports some findings on the audio deepfake detection tasks.
Bayesian Optimization via Exact Penalty
Constrained optimization problems pose challenges when the objective function and constraints are nonconvex and their evaluation requires expensive black-box simulations. Recently, hybrid optimization methods that integrate statistical surrogate modeling with numerical optimization algorithms have shown great promise, as they inherit global convergence from statistical surrogate modeling and fast local convergence from numerical optimization algorithms. However, their computational efficiency falls short of practical needs under limited budgets and in the presence of equality constraints. In this article, we propose a novel hybrid optimization method, called exact penalty Bayesian optimization (EPBO), which employs Bayesian optimization within the exact penalty framework. We model the composite penalty function by a weighted sum of Gaussian processes, where the qualitative components of the constraint violations are smoothed by their predictive means. The proposed method features (i) closed-form acquisition functions, (ii) robustness to initial designs, (iii) the capability to start from infeasible points, and (iv) effective handling of equality constraints. We demonstrate the superiority of EPBO over state-of-the-art competitors on a suite of benchmark synthetic test problems and two real-world engineering design problems.
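The exact penalty construction at the heart of EPBO can be written down directly; a minimal sketch using the standard L1 exact-penalty form (the weight rho and notation are generic, not necessarily the article's):

```python
def exact_penalty(f, ineqs, eqs, rho):
    """Build P(x) = f(x) + rho * (sum_i max(0, g_i(x)) + sum_j |h_j(x)|),
    the L1 exact penalty of f subject to g_i(x) <= 0 and h_j(x) = 0.
    EPBO models this composite with a weighted sum of Gaussian processes
    and optimizes it via closed-form acquisition functions."""
    def P(x):
        return (f(x)
                + rho * sum(max(0.0, g(x)) for g in ineqs)
                + rho * sum(abs(h(x)) for h in eqs))
    return P
```

For a sufficiently large rho, minimizers of P coincide with constrained minimizers, which is why the method can start from infeasible points and handle equality constraints naturally.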