34 research outputs found

    Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

    Recent strides in neural speech synthesis technologies, while enjoying widespread applications, have nonetheless introduced a series of challenges, spurring interest in the defence against the threat of misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area has certain limitations in scope. To address the gaps, we present our findings concerning the identification of the sources of synthesized speech in this paper. We investigate the existence of speech synthesis model fingerprints in the generated speech waveforms, with a focus on the acoustic model and the vocoder, and study the influence of each component on the fingerprint in the overall speech waveforms. Our research, conducted using the multi-speaker LibriTTS dataset, demonstrates two key insights: (1) vocoders and acoustic models impart distinct, model-specific fingerprints on the waveforms they generate, and (2) vocoder fingerprints are the more dominant of the two, and may mask the fingerprints from the acoustic model. These findings strongly suggest the existence of model-specific fingerprints for both the acoustic model and the vocoder, highlighting their potential utility in source identification applications. Comment: Submitted to ICASSP 202

    Audio Deepfake Detection: A Survey

    Audio deepfake detection is an emerging and active topic. A growing body of literature has studied deepfake detection algorithms and achieved effective performance, yet the problem is far from solved. Although some reviews exist, there has been no comprehensive survey that provides researchers with a systematic overview of these developments with a unified evaluation. Accordingly, in this survey paper, we first highlight the key differences across various types of deepfake audio, then outline and analyse competitions, datasets, features, classifications, and evaluation of state-of-the-art approaches. For each aspect, the basic techniques, advanced developments and major challenges are discussed. In addition, we perform a unified comparison of representative features and classifiers on the ASVspoof 2021, ADD 2023 and In-the-Wild datasets for audio deepfake detection. The survey shows that future research should address the lack of large-scale in-the-wild datasets, the poor generalization of existing detection methods to unknown fake attacks, and the interpretability of detection results.

    Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

    Recently, pioneering research has proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance and showing that different subbands contribute differently to audio deepfake detection. However, this work does not explain what specific information each subband carries, and these features also lose information such as phase. In speech synthesis, fundamental frequency (F0) information is used to improve the quality of synthetic speech, yet the F0 of synthetic speech is still too average and differs significantly from that of real speech. F0 is therefore expected to be important information for discriminating between bonafide and fake speech, but it cannot be used directly because of its irregular distribution. Instead, the frequency band containing most of the F0 is selected as the input feature. Meanwhile, to make full use of the phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and model the disjoint subbands separately. Finally, the results of the F0, real and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
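
    The EER quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure can be computed from detector scores, not the evaluation code used by the authors. The function name `equal_error_rate`, the score convention (a higher score means more likely bona fide) and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector whose higher scores mean 'bona fide'.

    Hypothetical helper: sweeps every observed score as a candidate threshold
    and keeps the one where false acceptance and false rejection are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: a bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed utterance scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (made up): one spoof scores higher than one bona fide sample.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

    On a finite score set the two error rates rarely cross exactly, so this sketch reports their midpoint at the closest threshold; evaluation toolkits typically interpolate instead, which can give slightly different values.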

    ADD 2023: the Second Audio Deepfake Detection Challenge

    Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on surpassing the constraints of binary real/fake classification, and actually localizing the manipulated intervals in a partially fake speech as well as pinpointing the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the fake audio game sub-challenge. The ADD 2023 challenge includes three subchallenges: audio fake game (FG), manipulation region location (RL) and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols. Some findings are also reported in audio deepfake detection tasks

    Synthesis of Analcime Crystals and Simultaneous Potassium Extraction from Natrolite Syenite

    Analcime single crystals were successfully synthesized from natrolite syenite powder (K2O 10.89%) and 92.6% of potassium was extracted simultaneously by means of soda roasting followed by an alkali-hydrothermal method. Effects of NaOH concentration, reaction temperature, and holding period on the analcime formation and potassium extraction were investigated systematically. The results indicated that NaOH concentration plays an important role in determining the chemical composition and size distribution of zeolites; by tuning the NaOH concentration, three different pure zeolites (i.e., the phillipsite-Na, the analcime, and the sodalite) were prepared. Besides, a higher temperature could accelerate the dissolution of K+ ions and enhance the degree of crystallinity of the zeolite. The reactions involved in the analcime synthesis can be summarized as follows: sodium aluminum silicate dissolution → precipitation and dissolution of metastable zeolite-P → analcime nucleation → analcime growth. The extraction ratio of K+ is associated with the types of synthesized zeolites, among which analcime is the most effective in promoting potassium leaching out from the zeolite lattice position. The optimal condition for analcime crystallization and K+ leaching is found to be as follows: 175°C for 4 h in 0.5 mol/L NaOH solution.

    Recent Advances in Field Effect Transistor Biosensors: Designing Strategies and Applications for Sensitive Assay

    In comparison with traditional clinical diagnosis methods, field-effect transistor (FET)-based biosensors have the advantages of fast response, easy miniaturization and integration for high-throughput screening, which demonstrates their great technical potential in the biomarker detection platform. This mini review mainly summarizes recent advances in FET biosensors. Firstly, the review gives an overview of the design strategies of biosensors for sensitive assay, including the structures of devices, functionalization methods and semiconductor materials used. Having established this background, the review then focuses on the following aspects: immunoassay based on a single biosensor for disease diagnosis; the efficient integration of FET biosensors into a large-area array, where multiplexing provides valuable insights for high-throughput testing options; and the integration of FET biosensors into microfluidics, which contributes to the rapid development of lab-on-chip (LOC) sensing platforms and the integration of biosensors with other types of sensors for multifunctional applications. Finally, we summarize the long-term prospects for the commercialization of FET sensing systems.

    Synthesis of KAlSiO4 by Hydrothermal Processing on Biotite Syenite and Dissolution Reaction Kinetics

    To make potassium from K-bearing rocks accessible to agriculture, biotite syenite powder was processed under mild alkaline hydrothermal conditions, and two types of KAlSiO4 were obtained successfully. The dissolution-precipitation process of silicate rocks is a significant process in lithospheric evolution. Its effective utilization will be of importance for realizing the comprehensive utilization of aluminosilicate minerals in nature. Two kinds of KAlSiO4 were precipitated in sequence during the dissolution process of biotite syenite. The crystal structures of the two kinds of KAlSiO4 were compared by Rietveld structure refinements. A kinetics model derived from geochemical research was adopted to describe the dissolution behavior. The reaction order and apparent activation energy in the temperature range of 240–300 °C were 2.992 and 97.41 kJ/mol, respectively. The higher dissolution reaction rate of K-feldspar mainly relies on the alkaline solution, which gives rise to the higher reaction order. During the dissolution-precipitation process of K-feldspar, two types of KAlSiO4 with different crystal structures were precipitated. This study provides novel green chemical routes for the comprehensive utilization of potassium-rich silicates.
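
    To give a feel for what an apparent activation energy of 97.41 kJ/mol implies across the 240–300 °C range, the sketch below estimates how much the rate constant would grow between the two ends of that range. It assumes a plain Arrhenius temperature dependence, k = A·exp(−Ea/(RT)), which the abstract does not state explicitly; the function name `arrhenius_rate_ratio` is hypothetical, and the pre-exponential factor A cancels in the ratio.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def arrhenius_rate_ratio(ea_j_per_mol, t_low_c, t_high_c):
    """Ratio k(t_high) / k(t_low), assuming k = A * exp(-Ea / (R * T)).

    Temperatures are given in degrees Celsius and converted to kelvin.
    The pre-exponential factor A cancels, so it is not needed.
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp(ea_j_per_mol / R * (1.0 / t_low_k - 1.0 / t_high_k))


# Activation energy and temperature range taken from the abstract.
ratio = arrhenius_rate_ratio(97.41e3, 240.0, 300.0)
print(f"k(300 C) / k(240 C) = {ratio:.1f}")
```

    Under this assumption the rate constant rises by roughly an order of magnitude from 240 °C to 300 °C. The reaction order of 2.992 describes a separate concentration dependence and is not modelled here.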