11 research outputs found

Securing Audio Watermarking System using Discrete Fourier Transform for Copyright Protection

Author: Miss Rushali J. Watane, Prof. Nitin R. Chopde
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/05/2015

With the recent growth in computer networks, and more specifically the World Wide Web, copyright protection of digital audio is becoming more and more necessary. Digital audio watermarking has drawn extensive attention as a means of copyright protection for audio data. Digital audio watermarking is a process of embedding watermarks into an audio signal to show authenticity and ownership. Our technique is based on embedding a watermark into the audio signal and extracting the watermark sequence. We propose a new watermarking system using the discrete Fourier transform (DFT) for audio copyright protection. The watermarks are embedded into the highest prominent peak of the magnitude spectrum of each non-overlapping frame. This watermarking system provides strong robustness against several kinds of attacks such as noise addition, cropping, re-sampling, re-quantization, and MP3 compression, and achieves similarity values ranging from 13 dB to 20 dB. In addition, the proposed system aims to achieve SNR (signal-to-noise ratio) values ranging from 20 dB to 28 dB. DOI: 10.17762/ijritcc2321-8169.15055
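The peak-based embedding described in this abstract can be illustrated with a minimal sketch. The abstract does not say how a bit is written into the peak, so the parity-quantization rule below (an even or odd multiple of a hypothetical `step`) is an assumption, as are all function names, the frame length, and the test signal. The naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def peak_bin(spec):
    """Index of the largest magnitude among bins 1 .. N/2 - 1."""
    return max(range(1, len(spec) // 2), key=lambda k: abs(spec[k]))

def embed_bit(frame, bit, step):
    """Assumed rule: force the peak magnitude to a multiple of `step`
    whose parity equals the bit."""
    spec = dft(frame)
    k = peak_bin(spec)
    mag = abs(spec[k])
    if mag == 0.0:
        return list(frame)          # silent frame: nothing to mark
    q = round(mag / step)
    if q % 2 != bit:
        q += 1
    if q == 0:
        q = 2                       # keep the peak from vanishing
    scale = q * step / mag
    spec[k] *= scale
    spec[-k] *= scale               # mirror bin, so the output stays real
    return idft(spec)

def extract_bit(frame, step):
    spec = dft(frame)
    return round(abs(spec[peak_bin(spec)]) / step) % 2

def embed(signal, bits, frame_len, step):
    """Write one bit into each non-overlapping frame; the tail is left as is."""
    out = []
    for i, bit in enumerate(bits):
        out.extend(embed_bit(signal[i * frame_len:(i + 1) * frame_len], bit, step))
    return out + list(signal[len(bits) * frame_len:])

def extract(signal, n_bits, frame_len, step):
    return [extract_bit(signal[i * frame_len:(i + 1) * frame_len], step)
            for i in range(n_bits)]

def snr_db(original, marked):
    noise = sum((o - m) ** 2 for o, m in zip(original, marked))
    if noise == 0.0:
        return math.inf
    return 10 * math.log10(sum(o * o for o in original) / noise)

# Hypothetical host signal: a dominant tone plus a weaker one.
frame_len = 16
bits = [1, 0, 1, 1]
host = [math.sin(2 * math.pi * 3 * t / frame_len)
        + 0.2 * math.sin(2 * math.pi * 5 * t / frame_len)
        for t in range(frame_len * len(bits))]
marked = embed(host, bits, frame_len, step=0.5)
recovered = extract(marked, len(bits), frame_len, step=0.5)
```

A larger `step` would make the parity survive stronger distortion at the cost of a lower SNR. The sketch does not model any of the attacks listed in the abstract.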

International Journal on Recent and Innovation Trends in Computing and Communication

Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics

Author: Battiato Sebastiano
Giudice Oliver
Puglisi Valerio Francesco
Publication date: 29/10/2023

Deep Audio Analyzer is an open source speech framework that aims to simplify the research and development process of neural speech processing pipelines, allowing users to conceive, compare and share results in a fast and reproducible way. This paper describes the core architecture designed to support several tasks of common interest in the audio forensics field, and shows how new tasks can be created to customize the framework. With Deep Audio Analyzer, forensic examiners (i.e. from Law Enforcement Agencies) and researchers will be able to visualize audio features, easily evaluate the performance of pretrained models, and create, export and share new audio analysis workflows by combining deep neural network models with a few clicks. One advantage of this tool is that it speeds up research and practical experimentation in the field of audio forensics analysis, while also improving experimental reproducibility through exported and shared pipelines. All features are developed as modules accessible to the user through a Graphical User Interface. Index Terms: Speech Processing, Deep Learning Audio, Deep Learning Audio Pipeline creation, Audio Forensics

arXiv.org e-Print Archive

Blind Detection of Copy-Move Forgery in Digital Audio Forensics

Author: Akram Sheeraz
Ali Zulfiqar
Bakhsh Sheikh Tahir
Imran Muhammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Although copy-move forgery is one of the most common fabrication techniques, blind detection of such tampering in digital audio is mostly unexplored. Unlike active techniques, blind forgery detection is challenging, because it does not embed a watermark or signature in an audio that is unknown in most of the real-life scenarios. Therefore, forgery localization becomes more challenging, especially when using blind methods. In this paper, we propose a novel method for blind detection and localization of copy-move forgery. One of the most crucial steps in the proposed method is a voice activity detection (VAD) module for investigating audio recordings to detect and localize the forgery. The VAD module is equally vital for the development of the copy-move forgery database, wherein audio samples are generated by using the recordings of various types of microphones. We employ a chaotic theory to copy and move the text in generated forged recordings to ensure forgery localization at any place in a recording. The VAD module is responsible for the extraction of words in a forged audio, and these words are analyzed by applying a 1-D local binary pattern operator. This operator provides the patterns of extracted words in the form of histograms. The forged parts (copy and move text) have similar histograms. An accuracy of 96.59% is achieved, and the proposed method is deemed robust against noise
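The histogram comparison mentioned in this abstract can be sketched as follows. This is only an illustration under assumptions: the abstract does not give the neighbourhood size of the 1-D local binary pattern operator or the measure used to compare histograms, so the four-samples-per-side window, the L1 distance, and the synthetic "word" segments are all hypothetical choices. Voice activity detection is assumed to have already cut the recording into word segments.

```python
import math

def lbp_1d_histogram(samples, half_window=4):
    """Normalised histogram of 1-D LBP codes.

    Each sample is compared with `half_window` neighbours on each side;
    a neighbour that is >= the centre contributes one set bit to the code.
    """
    hist = [0] * (2 ** (2 * half_window))
    for c in range(half_window, len(samples) - half_window):
        centre = samples[c]
        neighbours = (samples[c - half_window:c]
                      + samples[c + 1:c + 1 + half_window])
        code = 0
        for i, value in enumerate(neighbours):
            if value >= centre:
                code |= 1 << i
        hist[code] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def histogram_distance(h1, h2):
    """L1 distance between two histograms; 0.0 means identical patterns."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Hypothetical word segments: a word, its copy-moved duplicate, another word.
word_a = [math.sin(0.3 * t) for t in range(200)]
word_a_copy = list(word_a)
word_b = [math.sin(0.9 * t) + 0.5 * math.sin(2.3 * t) for t in range(200)]

d_copy = histogram_distance(lbp_1d_histogram(word_a), lbp_1d_histogram(word_a_copy))
d_other = histogram_distance(lbp_1d_histogram(word_a), lbp_1d_histogram(word_b))
```

Because the codes depend only on the ordering of neighbouring samples, an exact copy yields an identical histogram. A real detector would need a threshold on the distance to tolerate noise and re-encoding.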

University of Essex Research Repository

Crossref

Federation ResearchOnline

Ulster University's Research Portal

An Automatic Digital Audio Authentication/Forensics System

Author: Ali Zulfiqar
Alsulaiman Mansour
Imran Muhammad
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

With the continuous rise in ingenious forgery, a wide range of digital audio authentication applications are emerging as a preventive and detective control in real-world circumstances, such as forged evidence, breach of copyright protection, and unauthorized data access. To investigate and verify, this paper presents a novel automatic authentication system that differentiates between forged and original audio. The design philosophy of the proposed system is primarily based on three psychoacoustic principles of hearing, which are implemented to simulate the human sound perception system. Moreover, the proposed system is able to distinguish between audio from different environments recorded with the same microphone. For audio authentication and environment classification, the features computed from the psychoacoustic principles of hearing are fed to a Gaussian mixture model to make automatic decisions. It is worth mentioning that the proposed system authenticates an unknown speaker irrespective of the audio content, i.e., independent of narrator and text. To evaluate the performance of the proposed system, audio in multiple environments is forged in such a way that a human cannot recognize it. Subjective evaluation by three human evaluators is performed to verify the quality of the generated forged audio. The proposed system provides a classification accuracy of 99.2% ± 2.6. Furthermore, the accuracy obtained with the proposed system for the other scenarios, such as text-dependent and text-independent audio authentication, is 100%.

University of Essex Research Repository

Crossref

Federation ResearchOnline

Ulster University's Research Portal

Audio phylogenetic analysis using geometric transforms

Author: Bestagini Paolo
Milani Simone
Tubaro Stefano
Verde Sebastiano
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Whenever multimedia content is shared on the Internet, a mutation process is carried out by multiple users who download, alter and repost modified versions of the original data, leading to the diffusion of multiple near-duplicate copies. This effect is also experienced by audio data (e.g., on audio sharing platforms) and requires the design of accurate phylogenetic analysis strategies that uncover the processing history of each copy and identify the original one. This paper proposes a new phylogenetic reconstruction strategy that converts the analyzed audio tracks into spectrogram images and compares them using alignment strategies borrowed from computer vision. With respect to strategies currently available in the literature, the proposed solution proves to be more accurate, does not require any a priori knowledge about the applied transformations, and requires significantly less computational time.

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Cryptographic Techniques for Data Privacy in Digital Forensics

Author: Ogunseyi Taiwo Blessing
Oluwasola Mary Adedayo
Publication venue: IEEE
Publication date: 15/12/2023

The acquisition and analysis of data in digital forensics raise different data privacy challenges. Many existing works on digital forensic readiness discuss what information should be stored and how to collect relevant data to facilitate investigations. However, the cost of this readiness often directly impacts the privacy of innocent third parties and suspects if the collected information is irrelevant. Approaches that have been suggested for privacy-preserving digital forensics focus on the use of policy, non-cryptography-based, and cryptography-based solutions. Cryptographic techniques have been proposed to address issues of data privacy during data analysis. As the utilization of some of these cryptographic techniques continues to increase, it is important to evaluate their applicability and challenges in relation to digital forensics processes. This study provides digital forensics investigators and researchers with a roadmap to understanding the data privacy challenges in digital forensics and examines the various privacy techniques that can be utilized to tackle these challenges. Specifically, we review the cryptographic techniques applied for privacy protection in digital forensics and categorize them within the context of whether they support trusted third parties, multiple investigators, and multi-keyword searches. We highlight some of the drawbacks of utilizing cryptography-based methods in privacy-preserving digital forensics and suggest potential solutions to the identified shortcomings. In addition, we propose a conceptual privacy-preserving digital forensics (PPDF) model that is based on the use of cryptographic techniques and analyze the model within the context of the above-mentioned factors. An evaluation of the model is provided through a consideration of identified factors that may affect an investigation. Lastly, we provide an analysis of how existing principles for preserving privacy in digital forensics are addressed in our PPDF model. 
Our evaluation shows that the model aligns with many of the existing privacy principles recommended for privacy protection in digital forensics. This work was supported by The University of Winnipeg (Grant ID: 16792)

WinnSpace Repository

Acoustic analysis of Croatian and Serbian RP pronunciation – formant analysis and fundamental frequency measurements

Author: Bašić Iva
Publication date: 01/01/2018

The main aim of this research is to determine reference formant frequencies (F1-F3) for a total of 162 native speakers of Croatian and Serbian. The study acoustically examines sex and language differences. Contrastive analysis established differences in the pronunciation of all vowels. The results also showed that different phonetic environments affect formant values even in the central part of the vowel. Sex differences among speakers of both languages were found in all analysed parameters. It was also shown that the strongest indicators of sex differences among speakers of Croatian are F1 and the Df measure, while in Serbian F1 showed the greatest distinctiveness. In both languages, greater acoustic dispersion was found in women. Formant dispersion also proved to be a good parameter of language distinctiveness. Analysis of the fundamental frequency established sex and language differences among speakers. Male speakers of both languages have a significantly lower average F0 than women, as do speakers of Croatian compared with speakers of Serbian. In both languages, women were shown to vary significantly more in F0 than men. Examining the correlations between formants and F0, and among the formants themselves, established correlations of F0 with F2 and F3 in Croatian, while in Serbian significant correlations were confirmed only in women. Overall, the largest number of correlations was recorded in women and among the formants themselves. This work sought to contribute to the phonetics of Croatian, to form a reference framework for assessing the formants of suspects in forensic cases, and to provide a framework for describing the vowel diversity of regional varieties of Croatian. The sociophonetic aspect of this research is reflected in the comparison of different formant measures and the fundamental frequency between speakers of different sex and language.
The subject of this study is on the one hand motivated by the need for the necessary acoustic description of the vowel system of Croatian, and on the other hand by the recent research methods in forensic phonetics. Formant analysis is the main component of most forensic phonetic cases. Physiological features of the speaker, their sociolinguistic background and idiosyncratic phonetic behaviour are reflected in formant frequencies. That is the reason why formant analysis is applied in different scientific fields: articulatory, acoustic and forensic phonetics, sociophonetics, sociolinguistics, etc. The primary aim of this research is to determine the reference formant frequencies (F1-F3) of the vowel system of the Croatian and Serbian languages. For the purposes of this research, 184 native speakers were recorded. After the verification process, 162 speakers were selected from the corpora. Both languages were represented by an equal number of speakers (NCRO = 81 and NSER = 81), with a somewhat larger number of male than female voices (NF = 70 and NM = 92). Ladefoged (2003) recommended that corpora in sociolinguistic and sociophonetic research should include speakers of both sexes, due to the differences in coexistent phonology, phonetics etc., which is confirmed in the majority of world languages. Speakers recorded for this dissertation were chosen according to five criteria: speech status, place of birth and an extended stay, place of birth of their parents, level of education, and birth year. The recordings were carried out in very similar conditions: in rooms with reduced noise level or in studio conditions. All speakers were recorded following the same recording schedule, and were given the same instructions. 
Vermeulen and Cambier-Langveld (2017) noted that the same speech style (reading, spontaneous speech, etc.) is optimal for speaker comparison in forensic phonetics. For the purposes of this research, speakers were instructed to read a list of 50 shorter sentences in which target words were placed in the final positions. Each vowel was represented through 10 two-syllable words with a different phonetic environment. Formant frequencies (F1-F3) were estimated from the central stable part of the accented vowel of the target word, with the help of the Praat program (Boersma & Weenik, 2015). In some vowels ([u], [o] and [i]), and more often within female voices, formants tended to overlap. This spectral integration was noticed in both languages, and in these samples all results were subsequently acoustically and perceptively checked and corrected. The fundamental frequency was also evaluated in the central part of the accented vowel. Very low frequencies, which were the result of the glottal fry, have been excluded from the results. Glottalization was more often recorded within men voices, which is in sociophonetics interpreted as a possible social marker of highlighting their own masculinity. The programs MATLAB (MathWorks Inc., 2015) and JASP (JASP Team, 2018) were used for the statistical analysis of the collected data. Descriptive statistics consisted of determining average, median, minimum, and maximum values of formants and F0. For the purposes of this research, the frequency ranges of F0 were also calculated. They were determined by specifying the average minimum and maximum values of the fundamental frequencies for each speaker individually. The results were then processed by factors of different sex and language. Further analyses were conducted using frequency values of formants (F1-F3) and the fundamental frequencies. 
For the purpose of testing the significant differences in average absolute deviation of these, acoustic parameters (as dependent variables) have been calculated. The average absolute deviation has been selected as a more stable measure for the dispersion of measured results (compared to the standard deviation, or variance analysis), given the increasing number of measurements. In this way, the measured dispersions of results have been compared amongst different vowels, sexes, and languages. Different parametric tests have been used for the comparison of differences between the various groups of speakers (of diverse sex and/or language). Correlation coefficients have been calculated between formant frequency values and the fundamental frequency, and also between formants themselves. Correlation coefficients have been calculated using Pearson formulas for estimating connectedness. The study examined sex and language differences among analysed speakers in several acoustic parameters (formant values, formant dispersion, and fundamental frequency). Furthermore, it questioned if there are some acoustic differences in the variability of vowel systems according to the factors of sex and language. Since it is generally known that coarticulation has the strongest impact in the trajectory areas of vowels, this study questioned whether a different phonetic environment has an impact on the formant values in the central part of the accented vocals. Given that some authors emphasize that it is more useful to interpret the relations among formants than the average values for each formant separately (Chistovich & Lublinskaya, 1979; Chistovich, 1985; Hayward, 2000; Harrington, 2013), in this research differences in formant relations between speakers of different sexes were analysed, as well as differences among speakers of different languages. 
This study presumed that the majority of male speakers would have lower frequency values of the formants, fundamental frequency, and measure of formant dispersion, compared to those of female speakers. Considering the fact that some studies showed that coarticulation influence is the strongest around vowel [a] due to its lowest articulation stability (Stevens & House, 1963), and that in sociophonetical research of Croatian (Škarić, 2009; Kišiček, 2012) the same vowel was described as the most distinctive vowel in Croatian, the investigation was also directed towards the variability of formant frequencies in different vowels. Since the pilot research of Varošanec-Škarić, Bašić and Kišiček (2016) has shown that vowel [a] was more open, vowels [i] and [e] were more front, vowels [i] and [u] more closed, and vowel [u] more back in Croatian than in Serbian, in the present study the vowel systems of the analysed languages have been acoustically described and compared. Furthermore, it was expected that overall average values of the fundamental frequency would be lower for Croatian speakers of both gender groups compared to the results in previous studies (Škarić, 1998; Jovičić, 1999; Biočina, Varošanec-Škarić & Kišiček, 2017). Also, this study determined the frequency ranges of the fundamental frequency for speakers of both languages, as well as for both gender groups. Correlations of formants and the fundamental frequency, as well as correlations between the formants themselves have been examined. Results of the formant analysis have shown that both genders in both languages have the lowest average values of the first formant while pronouncing the front vowel [i], and the highest F1 values for the central vowel [a]. The lowest average values of the second formant have been found for the back vowel [u] and the highest for [i]. The third formant had the highest average values for vowel [i], whereas [o] had the lowest. 
Considering that previous research on similar subject matter in the Croatian language and this research were conducted with substantial methodological differences (in the number of speakers, pooling results of speakers of different sexes, deficient speech material, different speaking style, etc.), average values of formants were compared descriptively, without statistical testing, which would be unjustified in that case. Results of the average reference values for Croatian were closest to the results from the study by Varošanec-Škarić and Bašić (2015), and for Serbian they were closest to the results from Marković and Bjelaković (2009), as well as from Varošanec-Škarić et al. (2016). An overview of recent studies on formant analysis in the Croatian and Serbian languages has shown that there is a greater discrepancy in the formant values reported by different authors for Serbian than for Croatian. In addition, this study includes an acoustic contrastive analysis in order to describe the vowel systems of the Croatian and Serbian languages. The significance of the differences was also statistically analysed. The results have shown that vowel [a] is placed further back in both genders for Croatian speakers, where a significant difference was found for the first and third formant. The front vowel [e] has shown itself as more front and it was observed that it is pronounced with less open lips than in Serbian (statistically significant only among women). Between female speakers of the Croatian and Serbian languages there is also a difference in the pronunciation of the vowel [e], which is somewhat more closed in Croatian (yet it has no statistical significance). 
Vowel [i] is also more closed in Croatian (statistical significance determined only among men), while in terms of the feature front/back, one can say that the results are sexually dimorphic: male speakers tend to have higher values for the second formant which indicates more front pronunciation, while female speakers, on the other hand, pronounce it more to the back when compared to Serbian speakers. According to the results from the conducted analyses, back vowels [o] and [u] are more closed (and back) in Croatian, and the vowel [o] is significantly more closed only in women. Statistical significance for vowel [u] has been observed among all formants for both sexes, which brings us to the conclusion that [u] is more closed and further back in Croatian and that it is articulated with more rounded lips. After determining the average formant values in both analysed languages, the next step was to compare the variability of different formant frequencies between all vowels within both genders and languages separately. It was observed that F1 has the lowest value variability across the vowels, while it is somewhat higher for F2, and the highest for F3. If we look at the results across different vowels, the highest variability of the first formant was observed in vowel [a] in both gender groups. This was indicated on the one hand by the results from sociophonetic studies in Croatian (Škarić, 1991; Varošanec-Škarić, 2010), according to which the vowel [a] is the most distinctive vowel of the Croatian vowel system, and on the other hand by the results from studies according to which the coarticulation influence is strongest for the vowel [a], due to its lowest articulatory stability (Stevens & House, 1963). In the Croatian language, the biggest dispersion of the second formant was determined among men during the pronunciation of vowel [o], and among women during the pronunciation of the front vowel [i]. 
The third formant varies the most in the back vowel [u] in male speakers, and in [i] for women. In the Serbian language, the greatest dispersion of F1 in women was observed for the front vowel [e], while the back vowels [o] and [u] showed the same values of dispersion among the male population. The second formant had the highest variability in both sexes of Serbian speakers during the pronunciation of the front vowel [e], while for F3 the same was observed in the pronunciation of the back vowel [o]. Based on the comparison of average formant frequencies between different sexes in one language, and between speakers of the same sexes in both languages, several conclusions have been drawn. As expected, in both analysed languages female speakers had higher values of all analysed formants (F1-F3), compared to male speakers. By means of statistical analysis it has been confirmed that Croatian speakers of both sexes differ significantly in their average values: F1 is significant for all vowels, F2 for the majority (with the exception of central [a] and back vowel [u]), F3 also for the majority (except the back vowel [u]), as well as in values of formant dispersion (Df) in all vowels. Hence, it can be said that F1 and Df are stronger acoustic parameters for sex differentiation in Croatian than parameters F2 and F3, which has already been ascertained by Torre III and Barlow (2009). The results of the analysis in the Serbian language have also confirmed this pattern of gender distinction based on the values of F1. Namely, the results show that men and women differ with a statistical significance in their average values: F1 in nearly all vowels (with the exception of the back vowel [u]), F2 for fewer vowels (in [e] and [u]), and F3 for back vowels. Unexpectedly, the parameter of formant dispersion has been a very weak indicator of gender distinction in Serbian (no statistically significant difference has been found). 
Therefore, we can conclude that in the Serbian language the strongest factor for gender distinction is the first formant, while the second and third are equally weak indicators of gender differences. The measure of formant dispersion (Df) was used to examine gender and language differences between the analysed speakers. The results show that the Df values in the Croatian language are primarily higher among women, except for vowels [a] and [o]. Female Serbian speakers had higher Df values in all vowels, compared to female speakers of Croatian. The statistical significance of variability of the analysed parameters (F1, F2, F3 and Df) was examined between male and female Croatian speakers and, subsequently, between speakers of different sex in the Serbian language. The results show that the difference in dispersion of formant values in Croatian is statistically significantly higher in women for vowels [a], [e] and [i], while in men this is the case for back vowels. Therefore, we can say that acoustic dispersion of formant frequencies is higher for female speakers of the Croatian language, which reinforces equivalent results of studies in other languages (Gordon & Heath, 1998; Hanson & Chuang, 1999). For Serbian speakers, the variability of the first formant is significantly higher among female speakers in nearly all vowels (except [u]), whereas the variability of F2 was primarily higher among male speakers (in vowels [a], [i] and [o]). Apart from examining sex differences, the aim of using the measure of formant dispersion was to analyse language differences between the speakers. The results have shown that Df values are mainly higher for speakers of the Serbian language (for vowels [a], [e] and [u]). Higher Df values have been determined among speakers of Croatian in the front vowel [i], while in the back vowel [o] their values were very close. 
Female speakers of Croatian had lower Df values in all vowels, compared to female speakers of Serbian. Since the same tendency has been confirmed in both groups of speakers, these findings suggest that differences in Df values are caused by language differences, that is, by differences in the vowel systems of the analysed languages, which are reflected in formant values as well as in Df values. In this research, the results also showed lower variability of formant frequencies (F1-F3) among speakers of both sexes in Croatian, compared to speakers of Serbian (statistically significant for vowels [e] and [o] for male speakers, and for vowels [e], [i], [o] and [u] between female speakers of Croatian and Serbian). Taking into consideration that the phonetic environment affects not only the trajectory part of the vowel, but also the formant frequencies in the stable part of the vowel, this research examined the coarticulation effects of different phonetic environments. The results showed that F1 values tend to fall in fricative and plosive phonetic environments. On the other hand, F2 values tend to rise in the same environments. An increase in F2 values is especially pronounced in the plosive environment in front vowels. The second formant showed higher values in the fricative environment in back vowels, which has been confirmed in different languages (Stevens & House, 1963). The second aim of this dissertation was to compare the different measures of the fundamental frequency, with the purpose of examining sex and language differences between the Croatian and Serbian languages. The results of the acoustic analysis and statistical data processing showed that the average F0 value for male speakers of Croatian is 118 Hz, and for female speakers 197 Hz. The highest average F0 values were calculated for the front vowel [i], and the lowest for the central vowel [a], which was confirmed for both sexes. 
In comparison to previous research, frequency values are very close, closest to the results of the most recent studies with similar methodology (Varošanec-Škarić, 2010; Kišiček, 2012; Biočina et al., 2016; Varošanec-Škarić et al., 2017). In the group of Serbian speakers, the average F0 for male speakers is 108 Hz, and 179 Hz for females. Comparing the fundamental frequency values between speakers of the same sex and different language has shown that speakers of Croatian (females and males) have significantly lower F0 values for every analysed vowel and generally at the level of all vowels (p < 0.001). Sex and language differences have also been analysed according to the range of the fundamental frequency. Descriptive statistics showed that male speakers of Croatian mainly have a wider F0 range than women. Surprisingly, in Serbian the results were opposite. Comparing the frequency ranges of F0 between the speakers of Croatian and Serbian, results showed that male speakers of the Croatian language have a wider F0 range in vowels [a], [e] and [i]. In the group of male speakers of the Serbian language, results showed a wider range for back vowels ([u] and [o]). Female speakers of Croatian and Serbian also differed in F0 range. Speakers of Serbian have shown a wider range in vowels [a], [e] and [u], while in the remaining vowels female speakers of Croatian had a wider range. Although frequency ranges are a frequently used parameter in phonetics, statistically speaking they are not a stable and reliable indicator of dispersion. Accordingly, the significance of sex and language differences in F0 was tested with complex ANOVA and multiple paired t-tests. Descriptive statistics, ANOVA, and t-tests suggest very diverse results. Namely, for all vowels it has been established that F0 is significantly more variable within the group of female speakers than within male speakers. 
These results have been confirmed for both analysed languages, and were found in numerous sociolinguistic and sociophonetic studies for different languages. The findings of this study also indicate that there are no significant differences in F0 variability (except for vowel [e]) between speakers of the analysed languages (confirmed for both sexes). Finally, this dissertation has examined the correlations between the fundamental frequency and formants, as well as the correlations between formants themselves. These correlations were analysed within speakers of different languages and different sexes. Also, the significance of the correlations themselves was analysed. In the group of Croatian speakers, results showed that there are statistically significant correlations between F0 and F2, as well as between F0 and F3. Surprisingly, correlations between F0 and the first formant have not been found. Taken together, these results suggest that there is a greater number of correlations between F0 and formants within the group of female speakers.
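The dispersion and correlation measures named in this abstract can be illustrated with a short sketch. The abstract does not define Df, so the code assumes the common definition of formant dispersion as the mean spacing between adjacent formants. All numeric values below are made up for illustration and are not results from the dissertation.

```python
import math

def mean(values):
    return sum(values) / len(values)

def average_absolute_deviation(values):
    """Mean of |x - mean(x)|, the dispersion measure named in the abstract."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def formant_dispersion(formants):
    """Assumed definition of Df: mean gap between adjacent formants (Hz)."""
    gaps = [hi - lo for lo, hi in zip(formants, formants[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical per-speaker measurements in Hz (illustrative only).
f0_values = [110.0, 118.0, 125.0, 131.0]
f2_values = [1200.0, 1250.0, 1330.0, 1370.0]

f0_spread = average_absolute_deviation(f0_values)
f0_f2_correlation = pearson_r(f0_values, f2_values)
df_example = formant_dispersion([500.0, 1500.0, 2500.0])
```

With three formants the assumed Df reduces to (F3 - F1) / 2, so it reflects the overall spread of the formant pattern rather than any single formant.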

'Repozitorij Filozofskog fakulteta u Zagrebu' at University of Zagreb

Digitalni arhiv Filozofskog fakulteta u Zagrebu

Acoustic modelling, data augmentation and feature extraction for in-pipe machine learning applications

Author: Chiantello Dario Alfredo
Publication venue
Publication date: 01/03/2023
Field of study

Gathering measurements from infrastructure, private premises, and harsh environments can be difficult and expensive. From this perspective, the development of new machine learning algorithms is strongly affected by the availability of training and test data. We focus on audio archives for in-pipe events. Although several examples of pipe-related applications can be found in the literature, datasets of audio/vibration recordings are much scarcer, and the only references found relate to leakage detection and characterisation. Therefore, this work proposes a methodology to relieve the burden of data collection for acoustic events in deployed pipes. The aim is to maximise the yield of small sets of real recordings and demonstrate how to extract effective features for machine learning. The methodology developed requires the preliminary creation of a soundbank of audio samples gathered with simple weak annotations. For practical reasons, the case study is given by a range of appliances, fittings, and fixtures connected to pipes in domestic environments. The source recordings are low-reverberated audio signals enhanced through a bespoke spectral filter and containing the desired audio fingerprints. The soundbank is then processed to create an arbitrary number of synthetic augmented observations. The data augmentation improves the quality and the quantity of the metadata and automatically creates strong and accurate annotations that are both machine and human-readable. Besides, the implemented processing chain allows precise control of properties such as signal-to-noise ratio, duration of the events, and the number of overlapping events. The inter-class variability is expanded by recombining source audio blocks and adding simulated artificial reverberation obtained through an acoustic model developed for the purpose. Finally, the dataset is synthesised to guarantee separability and balance. 
A few signal representations are optimised to maximise the classification performance, and the results are reported as a benchmark for future developments. The contribution to the existing knowledge concerns several aspects of the processing chain implemented. A novel quasi-analytic acoustic model is introduced to simulate in-pipe reverberations, adopting a three-layer architecture particularly convenient for batch processing. The first layer includes two algorithms: one for the numerical calculation of the axial wavenumbers and one for the separation of the modes. The latter, in particular, provides a workaround for a problem not explicitly treated in the literature and related to the modal non-orthogonality given by the solid-liquid interface in the analysed domain. A set of results for different waveguides is reported to compare the dispersive behaviour against different mechanical configurations. Two more novel solutions are also included in the second layer of the model and concern the integration of the acoustic sources. Specifically, the amplitudes of the non-orthogonal modal potentials are obtained using either a distance minimisation objective function or by solving an analytical decoupling problem. In both cases, results show that sources sufficiently smooth can be approximated with a limited number of modes keeping the error below 1%. The last layer proposes a bespoke approach for the integration of the acoustic model into the synthesiser as a reverberation simulator. Additional elements of novelty relate to the other blocks of the audio synthesiser. The statistical spectral filter, for instance, is a batch-processing solution for the attenuation of the background noise of the source recordings. The signal-to-noise ratio analysis for both moderate and high noise levels indicates a clear improvement of several decibels against the closest filter example in the literature. 
The recombination of the audio blocks and the system of fully tracked annotations are also novel extensions of similar approaches recently adopted in other contexts. Moreover, a bespoke synthesis strategy is proposed to guarantee separable and balanced datasets. The last contribution concerns the extraction of convenient sets of audio features. Elements of novelty are introduced for the optimisation of the filter banks of the mel-frequency cepstral coefficients and the scattering wavelet transform. In particular, compared to the respective standard definitions, the average F-score performance of the optimised features is roughly 6% higher in the first case and 2.5% higher in the second. Finally, the soundbank, the synthetic dataset, and the fundamental blocks of the software library developed are publicly available for further research.
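The abstract above mentions that the processing chain allows precise control of the signal-to-noise ratio of the synthetic observations. As a minimal sketch of how such control is commonly achieved, and not the author's actual implementation, the background noise can be rescaled so that the ratio of average powers matches a target value in decibels. All function names and test signals below are hypothetical.

```python
import math
import random

def mean_power(samples):
    """Average power of a signal: mean of the squared samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, from average powers."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it.

    Returns the mixture and the scaled noise (kept for verification).
    """
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power and cannot be scaled")
    # Required noise power is P_signal / 10^(SNR/10); gain is its square-root ratio.
    target_noise_power = mean_power(signal) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled_noise)]
    return mixture, scaled_noise

# Hypothetical test signals: a 440 Hz tone as the "event", Gaussian background noise.
sample_rate = 8000
event = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(sample_rate)]
rng = random.Random(0)  # fixed seed so the example is reproducible
background = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

mixture, scaled_noise = mix_at_snr(event, background, target_snr_db=10.0)
achieved = snr_db(event, scaled_noise)
print(f"achieved SNR: {achieved:.2f} dB")
```

Scaling the noise rather than the event keeps the level of the annotated event unchanged across augmented observations. This is one design choice among several and is an assumption here.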

E-space: Manchester Metropolitan University's Research Repository

Statistical pattern recognition for audio-forensics : empirical investigations on the application scenarios audio steganalysis and microphone forensics

Author: Krätzer Christian
Publication venue: Universitätsbibl.
Publication date
Field of study

Magdeburg, Univ., Faculty of Computer Science, dissertation, 2013, by Christian Krätzer

Digital University Library Saxony-Anhalt