466 research outputs found

    Fast Blind Audio Copy-Move Detection and Localization Using Local Feature Tensors in Noise

    The increasing availability and ease of use of audio editing software make it possible to alter digital audio and create forgeries at low cost. A copy-move forgery (CMF) is one of the easiest and most popular audio forgeries: it is created by copying and pasting audio segments within the same recording, potentially followed by post-processing. Three main approaches to audio copy-move detection exist today: sample/frame comparison, acoustic feature coherence searching, and dynamic time warping. However, these approaches suffer from computational complexity and/or sensitivity to noise and post-processing. In this paper, we propose a new copy-move detection algorithm based on local feature tensors that reduces the detection and localization of transformed duplicates to a locality-sensitive-hash-like procedure. Experimental results on massive online real-time audio datasets reveal that the proposed technique effectively detects and localizes copy-move forgeries even when the forged speech segment is as short as a fraction of a second. The method is also computationally efficient and robust to audio processed with severe nonlinear transformations, such as resampling, filtering, jittering, compression, and cropping, even when contaminated with background noise and music. Hence, the proposed technique provides an efficient and reliable way of detecting copy-move forgery that increases the credibility of audio in practical forensic applications.
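    As a rough illustration of the hash-based matching idea (a sketch only, not the authors' local-feature-tensor algorithm), quantised MFCC frames can serve as locality-sensitive keys, with distant frame pairs that share a key flagged as copy-move candidates; the file name, quantisation step, and minimum gap below are assumed for illustration:

```python
import numpy as np
import librosa
from collections import defaultdict

def copy_move_candidates(path, q=4.0, min_gap=20):
    # Load audio at its native sample rate.
    y, sr = librosa.load(path, sr=None, mono=True)
    # One 13-dimensional MFCC vector per frame as a local feature.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Coarse quantisation makes near-duplicate frames hash to the
    # same key, giving a locality-sensitive-hash-like lookup.
    keys = np.round(mfcc / q).astype(int)
    buckets = defaultdict(list)
    for t in range(keys.shape[1]):
        buckets[tuple(keys[:, t])].append(t)
    # Frame pairs far apart in time that share a bucket are candidate
    # copy-move duplicates; runs of consecutive pairs localise them.
    return sorted((i, j)
                  for frames in buckets.values() if len(frames) > 1
                  for a, i in enumerate(frames)
                  for j in frames[a + 1:]
                  if j - i >= min_gap)

print(copy_move_candidates("suspect.wav")[:10])
```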

    ANALYSIS OF THE IMPACT OF DISTORTION ON SOUND RECORDINGS AS ANTI FORENSIC ACTIVITIES

    Anti-forensics in audio aims to complicate forensic investigations of sound recordings. Sound recordings can be altered or manipulated in various ways, including by applying distortion effects. Such distortion effects make it difficult for investigators to identify the owner of the original voice, and the analysis of distortion effects on sound recordings as an anti-forensic activity has not been widely studied. Distortion can be an effective anti-forensic technique because the resulting sound is noisy, making investigation difficult. In this study, testing was carried out using three types of distortion: hard clipping, hard overdrive, and odd harmonics. To determine the extent to which each type of distortion hinders identification of the original speaker, the variables that affect each type of distortion were set at low, medium, and high levels. Formant values from the original and distorted sound samples were compared and analysed using a one-way ANOVA to show whether the original sound and the three distorted voices were identical. The test was carried out on 10 sound samples. The ANOVA results show that hard clipping and odd harmonics distortion with variables at high levels can manipulate sound recordings to the point where the authenticity of a recording is difficult to recognise. By contrast, hard overdrive distortion at both high and low levels, and hard clipping and odd harmonics at low and medium levels, leave recordings that can still be identified.
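    A hedged sketch of this pipeline (the file names and LPC order are assumptions, and real formant tracking is more careful than this): apply hard clipping, estimate a first-formant value per sample via LPC root-finding, and compare the groups with a one-way ANOVA:

```python
import numpy as np
import librosa
from scipy.stats import f_oneway

def hard_clip(x, threshold=0.3):
    # Hard clipping flattens peaks beyond +/- threshold, adding strong
    # odd harmonics that mask speaker-specific spectral detail.
    return np.clip(x, -threshold, threshold)

def first_formant(y, sr, order=12):
    # Rough LPC-based estimate: roots of the LPC polynomial in the
    # upper half-plane map to resonances; the lowest is roughly F1.
    a = librosa.lpc(y, order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    freqs = np.sort(np.angle(roots) * sr / (2.0 * np.pi))
    return freqs[freqs > 90][0]  # skip near-DC artefacts

files = [f"sample_{i}.wav" for i in range(10)]  # hypothetical 10 samples
orig, clipped = [], []
for f in files:
    y, sr = librosa.load(f, sr=None)
    orig.append(first_formant(y, sr))
    clipped.append(first_formant(hard_clip(y), sr))

# A small p-value means the distorted formants differ significantly
# from the originals, i.e. the distortion hinders identification.
stat, p = f_oneway(orig, clipped)
print(f"F = {stat:.2f}, p = {p:.4f}")
```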

    Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture

    Synthetic voices and spliced audio clips have been generated to spoof Internet users and artificial intelligence (AI) technologies such as voice authentication. Existing research treats spoofing countermeasures as a binary classification problem: bonafide vs. spoof. This paper extends the existing Res2Net by incorporating the recent Conformer block to further exploit local patterns in acoustic features. Experimental results on the ASVspoof 2019 database show that the proposed SE-Res2Net-Conformer architecture improves spoofing countermeasure performance in the logical access scenario. In addition, this paper proposes to reformulate the existing audio splicing detection problem: instead of identifying complete spliced segments, it is more useful to detect the boundaries of the spliced segments. Moreover, a deep learning approach can be used to solve the problem, in contrast to previous signal processing techniques. (Accepted by the 13th International Symposium on Chinese Spoken Language Processing, ISCSLP 2022.)
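    The boundary reformulation can be made concrete with a small labelling sketch (the tolerance window and frame-level framing are assumptions, not the paper's exact scheme): each spliced segment contributes positive targets only near its two boundaries, which a frame-level deep classifier can then be trained to detect:

```python
import numpy as np

def boundary_labels(n_frames, splice_segments, tol=2):
    # Mark frames within `tol` frames of a splice boundary as positive;
    # everything else, including the interior of a spliced segment,
    # stays negative.
    y = np.zeros(n_frames, dtype=np.float32)
    for start, end in splice_segments:
        for b in (start, end):
            y[max(0, b - tol):min(n_frames, b + tol + 1)] = 1.0
    return y

# A 100-frame clip with one spliced segment spanning frames 40-60:
print(boundary_labels(100, [(40, 60)]).nonzero()[0])
```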

    An Examination of the Factors that Dictate the Relative Weighting of Feedback and Feedforward Input for Speech Motor Control

    Speech is arguably the most important form of human communication. Fluent speech production relies on auditory feedback for the planning, execution, and monitoring of speech movements. Auditory feedback is particularly important during the acquisition of speech; however, it has been suggested that over time speakers rely less on auditory feedback as they develop robust sensorimotor representations that allow speech motor commands to be executed in a feedforward manner. The studies reported in this thesis recorded speakers' vocal and neural responses to altered auditory feedback in order to explore the factors that dictate the relative importance of auditory feedback for speech motor control. More specifically, studies 1 through 3 examined how the role of auditory feedback changes throughout development, studies 4 and 5 examined the relationship between vocal variability and auditory feedback control, and study 6 examined how the predictability of auditory feedback errors influences the role of auditory feedback in speech motor control. The first study demonstrated that toddlers use auditory feedback to regulate their speech motor commands, supporting the long-held notion that auditory feedback is important during the acquisition of speech. Mapping out the developmental trajectory of vocal and event-related potential responses to altered auditory feedback, the second study demonstrated that vocal variability, rather than age, best predicts responses to altered auditory feedback; importantly, this suggests that the maturation of the speech motor control system is not strictly dependent on age. The third study demonstrated that children and adults show similar rates of sensorimotor adaptation, suggesting that once speech is acquired, speakers are proficient at using sensory information to modify the planning of future speech motor commands. However, since adults produced larger compensatory responses, these results also suggest that adults, possessing more precisely mapped sensorimotor representations, are more proficient at comparing incoming auditory feedback with the feedback predicted by those representations. Studies 4 and 5 demonstrated that vocal variability predicts the size of compensatory responses and of sensorimotor adaptation to changes in one's auditory feedback, respectively, and that increased variability is related to increased auditory feedback control of speech. Finally, the sixth study demonstrated that experimentally induced predictability and variability can be used to induce increases in feedforward and auditory feedback control, respectively. In conclusion, the results reported in this thesis demonstrate that age and vocal variability, both naturally occurring and experimentally induced, are important determinants of the role of auditory feedback in speech motor control.
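    The perturbation at the heart of these paradigms is easy to sketch offline (real experiments shift the speaker's feedback in near real time; the file name and shift size here are illustrative):

```python
import numpy as np
import librosa

# Shift the feedback signal up by one semitone (+100 cents), the kind
# of perturbation used in altered-auditory-feedback experiments.
y, sr = librosa.load("utterance.wav", sr=None)
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=1.0)

# A compensatory response shows up as an F0 change opposing the shift;
# a frame-wise F0 track (here via pYIN) lets one quantify it.
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
print(np.nanmean(f0))
```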

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2003), held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to the clinical diagnosis and classification of vocal pathologies.

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, and biomedical engineering methods for the analysis of voice signals and images as a support to the clinical diagnosis and classification of vocal pathologies. The workshop was sponsored by Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier), and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published collecting selected papers from the conference.

    Size discrimination of transient signals

    The importance of spectral cues in size discrimination of transient signals was investigated, and a model of this ability, tAIM, was created based on the biological principles of human hearing. A psychophysics experiment involving 40 participants found that the most important cue for size discrimination of transient signals, created by striking polystyrene spheres of different sizes, was similar to the cue used by listeners judging vowels: the relative positions of the resonances in the comparison signals. It proved possible to scale the sphere signals so as to mislead listeners into believing the signal source was a different size, but two methods of scaling signals to sound the same size as another were inconclusive, suggesting that transient signals cannot be scaled linearly in the way that has been shown possible for vowels. Filtering the signals in a number of different ways showed that the most important cue in size discrimination of transient signals is the difference between the most prominent resonances in the spectra of the comparison signals. A model of the auditory system using the dynamic compressive gammachirp filterbank, based on the well-known Auditory Image Model (AIM), was created to produce auditory images of transient signals that could be normalised for size. Transient-AIM (tAIM) uses the Mellin transform to produce images, which showed that size normalisation is possible because the spectral envelopes are similar across sphere sizes. tAIM was extended to carry out size discrimination of the spheres using the information contained within the Mellin images. There was a systematic association between Mellin phase and the size of objects of various shapes, which suggests that tAIM is able to infer object size from sound recordings of objects being struck.
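    The size-normalisation step can be illustrated with a simplified scale (Mellin-like) transform (a sketch under strong assumptions: tAIM actually operates on auditory images from the gammachirp filterbank, not a plain FFT spectrum). Resampling a spectrum onto a log-frequency axis turns a rescaling, i.e. a size change, into a shift, and the FFT magnitude of the log-spectrum is invariant to that shift:

```python
import numpy as np

def scale_transform_magnitude(x, sr, n_log=512):
    # Magnitude spectrum of the signal.
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    # Resample onto a logarithmic frequency axis: a dilation of the
    # spectrum becomes a translation along this axis.
    log_f = np.geomspace(freqs[1], freqs[-1], n_log)
    log_spec = np.interp(log_f, freqs, spec)
    # FFT magnitude is translation-invariant, so two strikes of
    # different-sized spheres should give similar outputs.
    return np.abs(np.fft.fft(log_spec))
```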

    Making music through real-time voice timbre analysis: machine learning and timbral control

    People can achieve rich musical expression through vocal sound (see, for example, human beatboxing, which achieves a wide timbral variety through a range of extended techniques). Yet the vocal modality is under-exploited as a controller for music systems. If we can analyse a vocal performance suitably in real time, then this information could be used to create voice-based interfaces with the potential for intuitive and fulfilling levels of expressive control. Conversely, many modern techniques for music synthesis do not imply any particular interface. Should a given parameter be controlled via a MIDI keyboard, or a slider/fader, or a rotary dial? Automatic vocal analysis could provide a fruitful basis for expressive interfaces to such electronic musical instruments. The principal questions in applying vocal-based control are how to extract musically meaningful information from the voice signal in real time, and how to convert that information suitably into control data. In this thesis we address these questions, with a focus on timbral control, and in particular we develop approaches that can be used with a wide variety of musical instruments by applying machine learning techniques to automatically derive the mappings between expressive audio input and control output. The vocal audio signal is construed to include a broad range of expression, in particular encompassing the extended techniques used in human beatboxing. The central contribution of this work is the application of supervised and unsupervised machine learning techniques to automatically map vocal timbre to synthesiser timbre and controls. Component contributions include a delayed decision-making strategy for low-latency sound classification, a regression-tree method to learn associations between regions of two unlabelled datasets, a fast estimator of multidimensional differential entropy, and a qualitative method for evaluating musical interfaces based on discourse analysis.
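    A toy version of the timbre-to-control mapping (a supervised sketch with hypothetical files and control targets; the thesis additionally covers unsupervised association between unlabelled datasets) might summarise each vocal example by its mean MFCC vector and fit a regression tree to synthesiser controls:

```python
import numpy as np
import librosa
from sklearn.tree import DecisionTreeRegressor

def timbre_features(path):
    # Summarise a vocal gesture by its mean MFCC vector; the real
    # system works frame by frame in real time.
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

# Hypothetical paired training data: vocal examples and the synthesiser
# control settings (e.g. filter cutoff, resonance) desired for each.
X = np.stack([timbre_features(p) for p in ("ex1.wav", "ex2.wav", "ex3.wav")])
Y = np.array([[0.2, 0.9], [0.7, 0.4], [0.5, 0.5]])

mapper = DecisionTreeRegressor(max_depth=4).fit(X, Y)
controls = mapper.predict(timbre_features("live_input.wav")[None, :])
print(controls)
```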