Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

Author: Tao Jianhua
Wang Chenglong
Yan Xinrui
Yi Jiangyan
Zhang Chu Yuan
Publication date: 13/09/2023

Recent strides in neural speech synthesis technologies, while enjoying widespread applications, have nonetheless introduced a series of challenges, spurring interest in the defence against the threat of misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area has certain limitations in scope. To address the gaps, we present our findings concerning the identification of the sources of synthesized speech in this paper. We investigate the existence of speech synthesis model fingerprints in the generated speech waveforms, with a focus on the acoustic model and the vocoder, and study the influence of each component on the fingerprint in the overall speech waveforms. Our research, conducted using the multi-speaker LibriTTS dataset, demonstrates two key insights: (1) vocoders and acoustic models impart distinct, model-specific fingerprints on the waveforms they generate, and (2) vocoder fingerprints are the more dominant of the two, and may mask the fingerprints from the acoustic model. These findings strongly suggest the existence of model-specific fingerprints for both the acoustic model and the vocoder, highlighting their potential utility in source identification applications. Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive

Audio Deepfake Detection: A Survey

Author: Tao Jianhua
Wang Chenglong
Yi Jiangyan
Zhang Chu Yuan
Zhang Xiaohui
Zhao Yan
Publication date: 28/08/2023

Audio deepfake detection is an emerging and active topic. A growing body of literature has studied deepfake detection algorithms and achieved effective performance, yet the problem is far from solved. Although some reviews exist, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on the ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection. The survey shows that future research should address the lack of large-scale datasets in the wild, the poor generalization of existing detection methods to unknown fake attacks, and the interpretability of detection results.

arXiv.org e-Print Archive

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Author: Fan Cunhang
Lv Zhao
Shao Shegang
Tao Jianhua
Wen Zhengqi
Xue Jun
Yi Jiangyan
Yuan Minmin
Zheng Chengshi
Publication date: 01/08/2022

Recently, pioneering research has proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance and showing that different subbands contribute differently to audio deepfake detection. However, this work does not explain the specific information carried in each subband, and these features also lose information such as phase. In speech synthesis, fundamental frequency (F0) information is used to improve the quality of synthetic speech, yet the F0 of synthetic speech is still too average and differs significantly from that of real speech. F0 is therefore expected to be important information for discriminating between bonafide and fake speech, but it cannot be used directly because of its irregular distribution. Instead, the frequency band containing most of the F0 is selected as the input feature. Meanwhile, to make full use of the phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and to model the disjoint subbands separately. Finally, the results of the F0, real and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.

arXiv.org e-Print Archive

ADD 2023: the Second Audio Deepfake Detection Challenge

Author: Fu Ruibo
Gu Hao
Li Haizhou
Lian Zheng
Liang Shan
Nie Shuai
Ren Yong
Tao Jianhua
Wang Chenglong
Wang Tao
Wen Zhengqi
Xu Le
Yan Xinrui
Yi Jiangyan
Zhang Chu Yuan
Zhang Xiaohui
Zhao Yan
Zhou Junzuo
Publication date: 23/05/2023

Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Unlike previous challenges (e.g. ADD 2022), ADD 2023 focuses on surpassing the constraints of binary real/fake classification by localizing the manipulated intervals in partially fake speech and pinpointing the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the fake audio game sub-challenge. The ADD 2023 challenge includes three subchallenges: audio fake game (FG), manipulation region location (RL) and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols. Some findings on audio deepfake detection tasks are also reported.

arXiv.org e-Print Archive

Complete convergence and complete moment convergence for arrays of rowwise ANA random variables

Author: A Adler
Bin Wang
DM Yuan
Haiwu Huang
JF Wang
Jiangyan Peng
K Budsaba
LX Zhang
MH Ko
PL Hsu
SH Sung
XC Zhou
XD Liu
Xiongtao Wu
XL Tan
Y Zhang
YF Wu
YS Chow
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Synthesis of Analcime Crystals and Simultaneous Potassium Extraction from Natrolite Syenite

Author: Changjiang Liu
Hongwen Ma
Jian Chen
Jiangyan Yuan
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Analcime single crystals were successfully synthesized from natrolite syenite powder (K2O 10.89%), and 92.6% of the potassium was extracted simultaneously, by means of soda roasting followed by an alkali-hydrothermal method. The effects of NaOH concentration, reaction temperature, and holding period on analcime formation and potassium extraction were investigated systematically. The results indicated that NaOH concentration plays an important role in determining the chemical composition and size distribution of the zeolites; by tuning the NaOH concentration, three different pure zeolites (i.e., phillipsite-Na, analcime, and sodalite) were prepared. In addition, a higher temperature could accelerate the dissolution of K+ ions and enhance the degree of crystallinity of the zeolite. The reactions involved in the analcime synthesis can be summarized as follows: sodium aluminum silicate dissolution → precipitation and dissolution of metastable zeolite-P → analcime nucleation → analcime growth. The extraction ratio of K+ is associated with the type of zeolite synthesized, among which analcime is the most effective at promoting potassium leaching from the zeolite lattice position. The optimal condition for analcime crystallization and K+ leaching was found to be 175°C for 4 h in 0.5 mol/L NaOH solution.

Directory of Open Access Journals

Optic Disc and Cup Segmentation in Retinal Images for Glaucoma Diagnosis by Locally Statistical Active Contour Model with Structure Prior

Author: Jiangyan Dai
Wei Zhou
Yuan Gao
Yugen Yi
Publication venue: 'Hindawi Limited'

Crossref

Recent Advances in Field Effect Transistor Biosensors: Designing Strategies and Applications for Sensitive Assay

Author: Jiangyan Yuan
Lei Liu
Lingli Wu
Ruisha Hao
Shengbin Lei
Publication venue: 'MDPI AG'
Publication date: 01/03/2023

In comparison with traditional clinical diagnosis methods, field-effect transistor (FET)-based biosensors have the advantages of fast response, easy miniaturization and integration for high-throughput screening, which demonstrates their great technical potential in the biomarker detection platform. This mini review mainly summarizes recent advances in FET biosensors. Firstly, the review gives an overview of the design strategies of biosensors for sensitive assay, including the structures of devices, functionalization methods and semiconductor materials used. Having established this background, the review then focuses on the following aspects: immunoassay based on a single biosensor for disease diagnosis; the efficient integration of FET biosensors into a large-area array, where multiplexing provides valuable insights for high-throughput testing options; and the integration of FET biosensors into microfluidics, which contributes to the rapid development of lab-on-chip (LOC) sensing platforms and the integration of biosensors with other types of sensors for multifunctional applications. Finally, we summarize the long-term prospects for the commercialization of FET sensing systems.

Directory of Open Access Journals

Dynamic Soft Real-Time Scheduling with Preemption Threshold for Streaming Media

Author: JiangYan Dai
Wenle Wang
Yuan Wang
Zhonghua Cao
Publication venue: 'Hindawi Limited'

Crossref

Synthesis of KAlSiO4 by Hydrothermal Processing on Biotite Syenite and Dissolution Reaction Kinetics

Author: Hongwen Ma
Jiangyan Yuan
Qian Guo
Xi Ma
Zheng Luo
Publication venue: 'MDPI AG'
Publication date: 30/12/2020

To make potassium from K-bearing rocks accessible to agriculture, biotite syenite powder was processed under mild alkaline hydrothermal conditions, and two types of KAlSiO4 were obtained successfully. The dissolution-precipitation process of silicate rocks is a significant process in lithospheric evolution. Its effective utilization will be of importance for realizing the comprehensive utilization of aluminosilicate minerals in nature. Two kinds of KAlSiO4 were precipitated in sequence during the dissolution process of biotite syenite. The crystal structures of the two kinds of KAlSiO4 were compared by Rietveld structure refinements. A kinetics model derived from geochemical research was adopted to describe the dissolution behavior. The reaction order and apparent activation energy in the temperature range of 240–300 °C were 2.992 and 97.41 kJ/mol, respectively. The higher dissolution reaction rate of K-feldspar mainly relies on the alkaline solution, which gives rise to a higher reaction order. During the dissolution-precipitation process of K-feldspar, two types of KAlSiO4 with different crystal structures were precipitated. This study provides novel green chemical routes for the comprehensive utilization of potassium-rich silicates.

Multidisciplinary Digital Publishing Institute