5 research outputs found

    Faked Speech Detection with Zero Knowledge

    Full text link
    Audio is one of the most used ways of human communication, but at the same time it can be easily misused to trick people. With the revolution of AI, the related technologies are now accessible to almost everyone thus making it simple for the criminals to commit crimes and forgeries. In this work, we introduce a neural network method to develop a classifier that will blindly classify an input audio as real or mimicked; the word 'blindly' refers to the ability to detect mimicked audio without references or real sources. The proposed model was trained on a set of important features extracted from a large dataset of audios to get a classifier that was tested on the same set of features from different audios. The data was extracted from two raw datasets, especially composed for this work; an all English dataset and a mixed dataset (Arabic plus English). These datasets have been made available, in raw form, through GitHub for the use of the research community at https://github.com/SaSs7/Dataset. For the purpose of comparison, the audios were also classified through human inspection with the subjects being the native speakers. The ensued results were interesting and exhibited formidable accuracy.Comment: 14 pages, 4 figures (6 if you count subfigures), 2 table

    Replay detection in voice biometrics: an investigation of adaptive and non-adaptive front-ends

    Full text link
    Among various physiological and behavioural traits, speech has gained popularity as an effective mode of biometric authentication. Even though they are gaining popularity, automatic speaker verification systems are vulnerable to malicious attacks, known as spoofing attacks. Among various types of spoofing attacks, replay attack poses the biggest threat due to its simplicity and effectiveness. This thesis investigates the importance of 1) improving front-end feature extraction via novel feature extraction techniques and 2) enhancing spectral components via adaptive front-end frameworks to improve replay attack detection. This thesis initially focuses on AM-FM modelling techniques and their use in replay attack detection. A novel method to extract the sub-band frequency modulation (FM) component using the spectral centroid of a signal is proposed, and its use as a potential acoustic feature is also discussed. Frequency Domain Linear Prediction (FDLP) is explored as a method to obtain the temporal envelope of a speech signal. The temporal envelope carries amplitude modulation (AM) information of speech resonances. Several features are extracted from the temporal envelope and the FDLP residual signal. These features are then evaluated for replay attack detection and shown to have significant capability in discriminating genuine and spoofed signals. Fusion of AM and FM-based features has shown that AM and FM carry complementary information that helps distinguish replayed signals from genuine ones. The importance of frequency band allocation when creating filter banks is studied as well to further advance the understanding of front-ends for replay attack detection. Mechanisms inspired by the human auditory system that makes the human ear an excellent spectrum analyser have been investigated and integrated into front-ends. Spatial differentiation, a mechanism that provides additional sharpening to auditory filters is one of them that is used in this work to improve the selectivity of the sub-band decomposition filters. Two features are extracted using the improved filter bank front-end: spectral envelope centroid magnitude (SECM) and spectral envelope centroid frequency (SECF). These are used to establish the positive effect of spatial differentiation on discriminating spoofed signals. Level-dependent filter tuning, which allows the ear to handle a large dynamic range, is integrated into the filter bank to further improve the front-end. This mechanism converts the filter bank into an adaptive one where the selectivity of the filters is varied based on the input signal energy. Experimental results show that this leads to improved spoofing detection performance. Finally, deep neural network (DNN) mechanisms are integrated into sub-band feature extraction to develop an adaptive front-end that adjusts its characteristics based on the sub-band signals. A DNN-based controller that takes sub-band FM components as input, is developed to adaptively control the selectivity and sensitivity of a parallel filter bank to enhance the artifacts that differentiate a replayed signal from a genuine signal. This work illustrates gradient-based optimization of a DNN-based controller using the feedback from a spoofing detection back-end classifier, thus training it to reduce spoofing detection error. The proposed framework has displayed a superior ability in identifying high-quality replayed signals compared to conventional non-adaptive frameworks. All techniques proposed in this thesis have been evaluated on well-established databases on replay attack detection and compared with state-of-the-art baseline systems

    Exploring the use of Technology for Assessment and Intensive Treatment of Childhood Apraxia of Speech

    Get PDF
    Given the rapid advances in technology over the past decade, this thesis examines the potential for automatic speech recognition (ASR) technology to expedite the process of objective analysis of speech, particularly for lexical stress patterns in childhood apraxia of speech. This dissertation also investigates the potential for mobile technology to bridge the gap between current service delivery models in Australia and best practice treatment intensity for CAS. To address these two broad aims, this thesis describes three main projects. The first is a systematic literature review summarising the development, implementation and accuracy of automatic speech analysis tools when applied to evaluation and modification of children’s speech production skills. Guided by the results of the systematic review, the second project presents data on the accuracy and clinical utility of a custom-designed lexical stress classification tool, designed as part of a multi-component speech analysis system for a mobile therapy application, Tabby Talks, for use with children with CAS. The third project is a randomised control trial exploring the effect of different types of feedback on response to intervention for children with CAS. The intervention was designed to specifically explore the feasibility and effectiveness of using an app equipped with ASR technology to provide feedback on speech production accuracy during home practice sessions, simulating the common service delivery model in Australia. The thesis concludes with a discussion of future directions for technology-based speech assessment and intensive speech production practice, guidelines for future development of therapy tools that include more game-based practice activities and the contexts in which children can be transferred from predominantly clinician-delivered augmented feedback to ASR-delivered right/wrong feedback and continue to make optimal gains in acquisition and retention of speech production targets

    Advanced Biometrics with Deep Learning

    Get PDF
    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, etc., as a means of identity management have become commonplace nowadays for various applications. Biometric systems follow a typical pipeline, that is composed of separate preprocessing, feature extraction and classification. Deep learning as a data-driven representation learning approach has been shown to be a promising alternative to conventional data-agnostic and handcrafted pre-processing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm to unify preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality; namely, face biometrics, medical electronic signals (EEG and ECG), voice print, and others

    Selected Papers from the 5th International Electronic Conference on Sensors and Applications

    Get PDF
    This Special Issue comprises selected papers from the proceedings of the 5th International Electronic Conference on Sensors and Applications, held on 15–30 November 2018, on sciforum.net, an online platform for hosting scholarly e-conferences and discussion groups. In this 5th edition of the electronic conference, contributors were invited to provide papers and presentations from the field of sensors and applications at large, resulting in a wide variety of excellent submissions and topic areas. Papers which attracted the most interest on the web or that provided a particularly innovative contribution were selected for publication in this collection. These peer-reviewed papers are published with the aim of rapid and wide dissemination of research results, developments, and applications. We hope this conference series will grow rapidly in the future and become recognized as a new way and venue by which to (electronically) present new developments related to the field of sensors and their applications
    corecore