11 research outputs found
Securing Audio Watermarking System using Discrete Fourier Transform for Copyright Protection
With the recent growth of computer networks, and more specifically the World Wide Web, copyright protection of digital audio is becoming more and more necessary. Digital audio watermarking has drawn extensive attention for copyright protection of audio data. Digital audio watermarking is the process of embedding watermarks into an audio signal to show authenticity and ownership. Our method is based on embedding a watermark into the audio signal and extracting the watermark sequence. We propose a new watermarking system using the discrete Fourier transform (DFT) for audio copyright protection. The watermarks are embedded into the highest prominent peak of the magnitude spectrum of each non-overlapping frame. This watermarking system provides strong robustness against several kinds of attacks such as noise addition, cropping, re-sampling, re-quantization, and MP3 compression, and achieves similarity values ranging from 13 dB to 20 dB. In addition, the proposed system aims to achieve SNR (signal-to-noise ratio) values ranging from 20 dB to 28 dB.
DOI: 10.17762/ijritcc2321-8169.15055
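The embedding step described in this abstract (one watermark bit per non-overlapping frame, placed in the strongest peak of the DFT magnitude spectrum) can be sketched roughly as follows. This is a minimal illustration and not the authors' algorithm: the parity-quantisation rule, the `step` parameter, and the function names `embed_bit` and `extract_bit` are assumptions introduced here, and a naive DFT is used so the example needs only the standard library.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) DFT; adequate for short illustrative frames."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)) / n
        for t in range(n)
    ]


def peak_bin(spectrum):
    """Index of the largest-magnitude bin, ignoring DC and the mirrored half."""
    return max(range(1, len(spectrum) // 2), key=lambda k: abs(spectrum[k]))


def embed_bit(frame, bit, step=0.5):
    """Hide one bit in a frame (assumed rule: parity of the quantised peak magnitude)."""
    spectrum = dft(frame)
    k = peak_bin(spectrum)
    # Round up so the modified peak never drops below its neighbours.
    level = math.ceil(abs(spectrum[k]) / step)
    if level % 2 != bit:
        level += 1
    new_value = cmath.rect(level * step, cmath.phase(spectrum[k]))
    spectrum[k] = new_value
    spectrum[-k] = new_value.conjugate()  # mirror bin keeps the time signal real
    return [s.real for s in idft(spectrum)]


def extract_bit(frame, step=0.5):
    """Recover the bit from the parity of the quantised peak magnitude."""
    spectrum = dft(frame)
    return round(abs(spectrum[peak_bin(spectrum)]) / step) % 2
```

A longer signal would be cut into non-overlapping frames and `embed_bit` applied to each frame with the next bit of the watermark sequence. Robustness against the attacks listed in the abstract depends on choices (frame length, quantisation step) that this sketch does not attempt to reproduce.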
Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics
Deep Audio Analyzer is an open source speech framework that aims to simplify
the research and the development process of neural speech processing pipelines,
allowing users to conceive, compare and share results in a fast and
reproducible way. This paper describes the core architecture designed to
support several tasks of common interest in the audio forensics field, showing the
possibility of creating new tasks and thus customizing the framework. By means of
Deep Audio Analyzer, forensics examiners (i.e., from Law Enforcement Agencies)
and researchers will be able to visualize audio features, easily evaluate
the performance of pretrained models, and create, export and share new audio
analysis workflows by combining deep neural network models with a few clicks. One
of the advantages of this tool is that it speeds up research and practical
experimentation in the field of audio forensics analysis, while also improving
experimental reproducibility by exporting and sharing pipelines. All features
are developed in modules accessible to the user through a Graphical User
Interface. Index Terms: Speech Processing, Deep Learning Audio, Deep Learning
Audio Pipeline creation, Audio Forensics
Blind Detection of Copy-Move Forgery in Digital Audio Forensics
Although copy-move forgery is one of the most common fabrication techniques, blind detection of such tampering in digital audio is mostly unexplored. Unlike active techniques, blind forgery detection cannot rely on a watermark or signature embedded in the audio, because none is available in most real-life scenarios. Forgery localization is therefore more challenging with blind methods. In this paper, we propose a novel method for blind detection and localization of copy-move forgery. One of the most crucial steps in the proposed method is a voice activity detection (VAD) module for investigating audio recordings to detect and localize the forgery. The VAD module is equally vital for the development of the copy-move forgery database, wherein audio samples are generated by using the recordings of various types of microphones. We employ chaos theory to copy and move text in the generated forged recordings, so that a forgery can be located anywhere in a recording. The VAD module is responsible for extracting the words in a forged audio, and these words are analyzed by applying a 1-D local binary pattern operator. This operator provides the patterns of the extracted words in the form of histograms. The forged parts (copied and moved text) have similar histograms. An accuracy of 96.59% is achieved, and the proposed method is deemed robust against noise.
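The histogram comparison at the heart of this method can be illustrated with a small sketch of a 1-D local binary pattern operator. It assumes the common formulation in which each sample is compared with its neighbours on either side and the comparison bits form a code. The neighbourhood size, the `>=` threshold rule, and the L1 histogram distance are illustrative choices and are not taken from the paper.

```python
def lbp_1d_histogram(signal, neighbours=8):
    """Histogram of 1-D LBP codes for a sequence of samples.

    Each sample is compared with neighbours // 2 samples on either side;
    a neighbour that is >= the centre sample contributes a 1 bit.
    """
    samples = list(signal)
    half = neighbours // 2
    histogram = [0] * (2 ** neighbours)
    for i in range(half, len(samples) - half):
        centre = samples[i]
        window = samples[i - half:i] + samples[i + 1:i + half + 1]
        code = 0
        for bit, value in enumerate(window):
            if value >= centre:
                code |= 1 << bit
        histogram[code] += 1
    return histogram


def histogram_distance(h1, h2):
    """L1 distance between two histograms; 0 means identical patterns."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical "words" as short sample sequences (made-up values).
word = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
copied_word = list(word)          # a copy-moved segment
different_word = sorted(word)     # a segment with a different shape

print(histogram_distance(lbp_1d_histogram(word), lbp_1d_histogram(copied_word)))
print(histogram_distance(lbp_1d_histogram(word), lbp_1d_histogram(different_word)))
```

In this toy case the copied segment yields a distance of zero while the reshaped segment does not, which is the property the abstract relies on when it states that forged parts have similar histograms.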
An Automatic Digital Audio Authentication/Forensics System
With the continuous rise in ingenious forgery, a wide range of digital audio authentication applications are emerging as preventive and detective controls in real-world circumstances, such as forged evidence, breach of copyright protection, and unauthorized data access. To investigate and verify such cases, this paper presents a novel automatic authentication system that differentiates between forged and original audio. The design philosophy of the proposed system is primarily based on three psychoacoustic principles of hearing, which are implemented to simulate the human sound perception system. Moreover, the proposed system is able to classify audio from different environments recorded with the same microphone. For audio authentication and environment classification, the features computed from the psychoacoustic principles of hearing are fed to a Gaussian mixture model to make automatic decisions. It is worth mentioning that the proposed system authenticates an unknown speaker irrespective of the audio content, i.e., independent of narrator and text. To evaluate the performance of the proposed system, audio recordings from multiple environments are forged in such a way that a human cannot recognize them. Subjective evaluation by three human evaluators is performed to verify the quality of the generated forged audio. The proposed system provides a classification accuracy of 99.2% ± 2.6. Furthermore, the accuracy obtained for the other scenarios, such as text-dependent and text-independent audio authentication, is 100%.
Audio phylogenetic analysis using geometric transforms
Whenever multimedia content is shared on the Internet, it undergoes a mutation process as multiple users download, alter, and repost modified versions of the original data, leading to the diffusion of multiple near-duplicate copies. This effect also applies to audio data (e.g., on audio sharing platforms) and requires accurate phylogenetic analysis strategies that uncover the processing history of each copy and identify the original one. This paper proposes a new phylogenetic reconstruction strategy that converts the analyzed audio tracks into spectrogram images and compares them using alignment strategies borrowed from computer vision. Compared with strategies currently available in the literature, the proposed solution proves to be more accurate, does not require any a priori knowledge of the applied transformations, and requires significantly less computation time.
Cryptographic Techniques for Data Privacy in Digital Forensics
The acquisition and analysis of data in digital forensics raise different data privacy challenges. Many existing works on digital forensic readiness discuss what information should be stored and how to collect relevant data to facilitate investigations. However, the cost of this readiness often directly impacts the privacy of innocent third parties and suspects if the collected information is irrelevant. Approaches that have been suggested for privacy-preserving digital forensics focus on the use of policy, non-cryptography-based, and cryptography-based solutions. Cryptographic techniques have been proposed to address issues of data privacy during data analysis. As the utilization of some of these cryptographic techniques continues to increase, it is important to evaluate their applicability and challenges in relation to digital forensics processes. This study provides digital forensics investigators and researchers with a roadmap to understanding the data privacy challenges in digital forensics and examines the various privacy techniques that can be utilized to tackle these challenges. Specifically, we review the cryptographic techniques applied for privacy protection in digital forensics and categorize them within the context of whether they support trusted third parties, multiple investigators, and multi-keyword searches. We highlight some of the drawbacks of utilizing cryptography-based methods in privacy-preserving digital forensics and suggest potential solutions to the identified shortcomings. In addition, we propose a conceptual privacy-preserving digital forensics (PPDF) model that is based on the use of cryptographic techniques and analyze the model within the context of the above-mentioned factors. An evaluation of the model is provided through a consideration of identified factors that may affect an investigation. Lastly, we provide an analysis of how existing principles for preserving privacy in digital forensics are addressed in our PPDF model. 
Our evaluation shows that the model aligns with many of the existing privacy principles recommended for privacy protection in digital forensics. This work was supported by The University of Winnipeg (Grant ID: 16792)
Acoustic analysis of Croatian and Serbian RP pronunciation - formant analysis and fundamental frequency measurements
The main aim of this research is to determine reference formant frequencies (F1-F3) for a
total of 162 native speakers of Croatian and Serbian. Sex and language differences were
examined acoustically. Contrastive analysis established differences in the pronunciation of all vowels.
The results also showed that different phonetic environments affect formant values
even in the central part of the vowel. Sex differences among speakers of both languages were found in
all analysed parameters. It was also shown that the strongest indicators of sex
differences among speakers of Croatian are F1 and the Df measure, while in Serbian F1 showed
the greatest distinctiveness. In both languages, greater acoustic dispersion was found in women.
Formant dispersion also proved to be a good parameter of language distinctiveness.
Analysis of the fundamental frequency established sex and language differences among speakers.
Male speakers of both languages have a significantly lower average F0 than women, as do
speakers of Croatian compared to speakers of Serbian. In both languages, women were shown
to vary significantly more in F0 than men. Examination of the correlations between formants and
F0, and among the formants themselves, established correlations of F0 with F2 and F3 in Croatian, while
in Serbian significant correlations were confirmed only in women. Overall, the largest number of
correlations was recorded in women and among the formants themselves.
This work sought to contribute to the phonetics of Croatian, and to form a reference framework for
assessing the formants of suspects in forensic cases as well as a framework for describing the vowel
diversity of regional varieties of Croatian. The sociophonetic aspect of this research
is reflected in the comparison of different formant measures and of the fundamental frequency between
speakers of different sex and language.
The subject of this study is on the one hand motivated by the need for the necessary acoustic
description of the vowel system of Croatian, and on the other hand by the recent research
methods in forensic phonetics. Formant analysis is the main component of most forensic
phonetic cases. Physiological features of the speaker, their sociolinguistic background and
idiosyncratic phonetic behaviour, are reflected in formant frequencies. That is the reason why
formant analysis is applied in different scientific fields: articulatory, acoustic and forensic
phonetics, sociophonetics, sociolinguistics, etc.
The primary aim of this research is to determine the reference formant frequencies (F1-F3) of
the vowel system of the Croatian and Serbian languages. For the purposes of this research,
184 native speakers were recorded. After the verification process, 162 speakers were selected
from the corpora. Both languages were represented by an equal number of speakers (N_CRO =
81 and N_SER = 81), with a somewhat larger number of male than female voices (N_F = 70
and N_M = 92). Ladefoged (2003) recommended that corpora in sociolinguistic and
sociophonetic research should include speakers of both sexes, due to the differences in
coexistent phonology, phonetics etc., which is confirmed in the majority of world languages.
Speakers recorded for this dissertation were chosen according to five criteria: speech status,
place of birth and an extended stay, place of birth of their parents, level of education, and
birth year.
The recordings were carried out in very similar conditions: in rooms with reduced noise level
or in studio conditions. All speakers were recorded by the same recording schedule, and were
given the same instructions. Vermeulen and Cambier-Langveld (2017) noted that the same
speech style (reading, spontaneous speech, etc.) is optimal for speaker comparison in forensic
phonetics. For the purposes of this research, speakers were instructed to read a list of 50
shorter sentences in which target words were placed in the final positions. Each vowel was
represented through 10 two-syllable words with a different phonetic environment. Formant
frequencies (F1-F3) were estimated from the central stable part of the accented vowel of the
target word, with the help of the Praat program (Boersma & Weenink, 2015).
In some vowels ([u], [o] and [i]), and more often within female voices, formants tended to
overlap. This spectral integration was noticed in both languages, and in these samples all
results were subsequently acoustically and perceptively checked and corrected. The
fundamental frequency was also evaluated in the central part of the accented vowel. Very low frequencies resulting from glottal fry were excluded from the results.
Glottalization was recorded more often in male voices, which sociophonetics
interprets as a possible social marker used to highlight masculinity.
The programs MATLAB (MathWorks Inc., 2015) and JASP (JASP Team, 2018) were used
for the statistical analysis of the collected data. Descriptive statistics consisted of determining
average, median, minimum, and maximum values of formants and F0. For the purposes of this
research, the frequency ranges of F0 were also calculated. They were determined by
specifying the average minimum and maximum values of the fundamental frequencies for
each speaker individually. The results were then processed by factors of different sex and
language.
Further analyses were conducted using frequency values of formants (F1-F3) and the
fundamental frequencies. To test for significant differences in dispersion, the average
absolute deviation of these acoustic parameters (as dependent variables) was
calculated. The average absolute deviation was selected as a more stable measure of the
dispersion of measured results (compared to the standard deviation, or variance analysis),
given the increasing number of measurements. In this way, the measured dispersions of
results have been compared amongst different vowels, sexes, and languages. Different
parametric tests have been used for the comparison of differences between the various groups
of speakers (of diverse sex and/or language). Correlation coefficients have been calculated
between formant frequency values and the fundamental frequency, and also between formants
themselves. Correlation coefficients have been calculated using Pearson formulas for
estimating connectedness.
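The two statistics named in this paragraph, the average absolute deviation and the Pearson correlation coefficient, can be written out in a few lines. This is only a sketch of the standard textbook formulas. The study itself used MATLAB and JASP, and the F0 and F2 values below are made-up numbers for illustration, not data from the dissertation.

```python
import math


def average_absolute_deviation(values):
    """Mean of the absolute distances from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-speaker measurements in Hz (illustrative only).
f0_hz = [110.0, 118.0, 125.0, 190.0, 201.0]
f2_hz = [1250.0, 1300.0, 1290.0, 1480.0, 1510.0]

print(average_absolute_deviation(f0_hz))
print(pearson_r(f0_hz, f2_hz))
```

Unlike the standard deviation, the average absolute deviation does not square the distances from the mean, so single extreme measurements weigh less heavily on it.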
The study examined sex and language differences among analysed speakers in several
acoustic parameters (formant values, formant dispersion, and fundamental frequency).
Furthermore, it questioned if there are some acoustic differences in the variability of vowel
systems according to the factors of sex and language. Since it is generally known that
coarticulation has the strongest impact in the trajectory areas of vowels, this study questioned
whether a different phonetic environment has an impact on the formant values in the central
part of the accented vowels. Given that some authors emphasize that it is more useful to
interpret the relations among formants than the average values for each formant separately
(Chistovich & Lublinskaya, 1979; Chistovich, 1985; Hayward, 2000; Harrington, 2013), in this research differences in formant relations between speakers of different sexes were
analysed, as well as differences among speakers of different languages.
This study presumed that the majority of male speakers would have lower frequency values of
the formants, fundamental frequency, and measure of formant dispersion, compared to those
of female speakers. Considering the fact that some studies showed that coarticulation
influence is the strongest around vowel [a] due to its lowest articulation stability (Stevens &
House, 1963), and that in sociophonetical research of Croatian (Škarić, 2009; Kišiček, 2012)
the same vowel was described as the most distinctive vowel in Croatian, the investigation was
also directed towards the variability of formant frequencies in different vowels. Since the pilot
research of Varošanec-Škarić, Bašić and Kišiček (2016) has shown that vowel [a] was more
open, vowels [i] and [e] were more front, vowels [i] and [u] more closed, and vowel [u] more
back in Croatian than in Serbian, in the present study the vowel systems of the analysed
languages have been acoustically described and compared.
Furthermore, it was expected that overall average values of the fundamental frequency would
be lower for Croatian speakers of both gender groups compared to the results in previous
studies (Škarić, 1998; Jovičić, 1999; Biočina, Varošanec-Škarić & Kišiček, 2017). Also, this
study determined the frequency ranges of the fundamental frequency for speakers of both
languages, as well as for both gender groups. Correlations of formants and the fundamental
frequency, as well as correlations between the formants themselves have been examined.
Results of the formant analysis have shown that both genders in both languages have the
lowest average values of the first formant while pronouncing the front vowel [i], and the
highest F1 values for the central vowel [a]. The lowest average values of the second formant
have been found for the back vowel [u] and the highest for [i]. The third formant had the
highest average values for vowel [i], whereas [o] had the lowest. Considering that previous
research on similar subject matter in the Croatian language and this research were conducted
with substantial methodological differences (in the number of speakers, pooling results of
speakers of different sexes, deficient speech material, different speaking style, etc.), average
values of formants were compared descriptively, without statistics, which in that case would
be unjustified. Results of the average reference values for Croatian were closest to the results
from the study by Varošanec-Škarić and Bašić (2015), and for Serbian they were closest to
the results from Marković and Bjelaković (2009), as well as from Varošanec-Škarić et al.
(2016). An overview of recent studies on formant analysis in the Croatian and Serbian languages has shown that there is a greater discrepancy in the formant values reported by
different authors for Serbian than for Croatian.
In addition, this study includes an acoustic contrastive analysis in order to describe the vowel
systems of the Croatian and Serbian languages. The importance of the differences was also
statistically analysed. The results have shown that vowel [a] is placed further back in both
genders for Croatian speakers, where a significant difference was found for the first and third
formant. The front vowel [e] has shown itself as more front and it was observed that it is
pronounced with less open lips than in Serbian (statistically significant only among women).
Between female speakers of the Croatian and Serbian languages there is also a difference in
the pronunciation of the vowel [e], which is somewhat more closed in Croatian (yet it has no
statistical significance). Vowel [i] is also more closed in Croatian (statistical significance
determined only among men), while in terms of the feature front/back, one can say that the
results are sexually dimorphic: male speakers tend to have higher values for the second
formant which indicates more front pronunciation, while female speakers, on the other hand,
pronounce it more to the back when compared to Serbian speakers.
According to the results from the conducted analyses, back vowels [o] and [u] are more
closed (and back) in Croatian, and the vowel [o] is significantly more closed only in women.
Statistical significance for vowel [u] has been observed among all formants for both sexes,
which brings us to the conclusion that [u] is more closed and further back in Croatian and that
it is articulated with more rounded lips.
After determining the average formant values in both analysed languages, the next step was to
compare the variability of different formant frequencies between all vowels within both
genders and languages separately. It was observed that F1 has the lowest value variability
across the vowels, while it is somewhat higher for F2, and the highest for F3. If we look at the
results across different vowels, the highest variability of the first formant was observed in
vowel [a] in both gender groups. This was indicated on the one hand by the results from
sociophonetic studies in Croatian (Škarić, 1991; Varošanec-Škarić, 2010), according to which
the vowel [a] is the most distinctive vowel of the Croatian vowel system, and on the other
hand by the results from studies according to which the coarticulation influence is strongest
for the vowel [a], due to its lowest articulatory stability (Stevens & House, 1963).
In the Croatian language, the biggest dispersion of the second formant was determined among
men during the pronunciation of vowel [o], and among women during the pronunciation of the front vowel [i]. The third formant varies the most in the back vowel [u] in male speakers,
and in [i] for women. In the Serbian language, the greatest dispersion of F1 in women was
observed for the front vowel [e], while the back vowels [o] and [u] showed the same values of
dispersion among the male population. The second formant had the highest variability in both
sexes of Serbian speakers during the pronunciation of the front vowel [e], while for F3 the
same was observed in the pronunciation of the back vowel [o].
Based on the comparison of average formant frequencies between different sexes in one
language, and between speakers of the same sexes in both languages, several conclusions
have been drawn. As expected, in both analysed languages female speakers had higher values
of all analysed formants (F1-F3), compared to male speakers. By means of statistical analysis
it has been confirmed that Croatian speakers of both sexes differ significantly in their average
values: F1 is significant for all vowels, F2 for the majority (with the exception of central [a]
and back vowel [u]), F3 also for the majority (except the back vowel [u]), as well as in values
of formant dispersion (Df) in all vowels. Hence, it can be said that F1 and Df are stronger
acoustic parameters for sex differentiation in Croatian than parameters F2 and F3, which has
already been ascertained by Torre III and Barlow (2009).
The results of the analysis in the Serbian language have also confirmed this pattern of gender
distinction based on the values of F1. Namely, the results show that men and women differ
with a statistical significance in their average values: F1 in nearly all vowels (with the
exception of the back vowel [u]), F2 for fewer vowels (in [e] and [u]), and F3 for back
vowels. Unexpectedly, the parameter of formant dispersion has been a very weak indicator of
gender distinction in Serbian (no statistically significant difference has been found).
Therefore, we can conclude that in the Serbian language the strongest factor for gender
distinction is the first formant, while the second and third are equally weak indicators of
gender differences.
The measure of formant dispersion (Df) was used to examine gender and language differences
between the analysed speakers. The results show that the Df values in the Croatian language
are primarily higher among women, except for vowels [a] and [o]. Female Serbian speakers
had higher Df values in all vowels, compared to female speakers of Croatian. The statistical
significance of variability of the analysed parameters (F1, F2, F3 and Df) was examined
between male and female Croatian speakers and, subsequently, between speakers of different
sex in the Serbian language. The results show that the difference in dispersion of formant values in Croatian is statistically significantly higher in women for vowels [a], [e] and [i],
while in men this is the case for back vowels. Therefore, we can say that acoustic dispersion
of formant frequencies is higher for female speakers of the Croatian language, which
reinforces equivalent results of studies in other languages (Gordon & Heath, 1998; Hanson &
Chuang, 1999). For Serbian speakers, the variability of the first formant is significantly higher
among female speakers in nearly all vowels (except [u]), whereas the variability of F2 was
primarily higher among male speakers (in vowels [a], [i] and [o]).
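The abstract does not define the Df measure. A common definition of formant dispersion, assumed here, is the average spacing between adjacent formants, which for F1-F3 reduces to (F3 - F1) / 2. The sketch below uses that assumed definition, and the formant values are illustrative rather than taken from the study.

```python
def formant_dispersion(formants_hz):
    """Average spacing between adjacent formants, in Hz (assumed definition of Df)."""
    if len(formants_hz) < 2:
        raise ValueError("need at least two formants")
    gaps = [upper - lower for lower, upper in zip(formants_hz, formants_hz[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical F1-F3 values in Hz for a single vowel token (illustrative only).
print(formant_dispersion([300.0, 2200.0, 2900.0]))
```

Under this definition Df depends only on the lowest and highest formant supplied, so it summarises how widely the formants are spread rather than where each one lies.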
Apart from examining sex differences, the aim of using the measure of formant dispersion
was to analyse language differences between the speakers. The results have shown that Df
values are mainly higher for speakers of the Serbian language (for vowels [a], [e]
and [u]). Higher Df values have been determined among speakers of Croatian in the front
vowel [i], while in the back vowel [o] their values were very close. Female speakers of
Croatian had lower Df values in all vowels, compared to female speakers of Serbian. Since the
same tendency has been confirmed in both groups of speakers, these findings suggest that
differences in Df values are caused by language differences, that is, by differences in the
vowel systems of the analysed languages, which were reflected in formant values, as well as
in Df values. In this research, the results also showed lower variability of formant frequencies
(F1-F3) among speakers of both sexes in Croatian, compared to speakers of Serbian
(statistically significant for vowels [e] and [o] for male speakers, and for vowels [e], [i], [o]
and [u] between female speakers of Croatian and Serbian).
Taking into consideration that the phonetic environment affects not only the trajectory part of
the vowel, but also the formant frequencies in the stable part of the vowel, this research
questioned coarticulation effects of different phonetic environments. The results showed that
F1 values tend to fall in fricative and plosive phonetic environments. On the other hand, F2
values tend to rise in the same environments. An increase in F2 values is especially
pronounced in the plosive environment in front vowels. The second formant showed higher values
in the fricative environment in back vowels, which was confirmed in different languages
(Stevens & House, 1963).
The second aim of this dissertation was to compare the different measures of the fundamental
frequency, with the purpose of questioning sex and language differences between the Croatian
and Serbian languages. The results of the acoustic analysis and statistical data processing showed that the average F0 value for male speakers of Croatian is 118 Hz, and for female
speakers 197 Hz. The highest average F0 values were calculated for the front vowel [i], and
the lowest for the central vowel [a], which was confirmed for both sexes. In comparison to
previous research, frequency values are very close, closest to the results of the most recent
studies with similar methodology (Varošanec-Škarić, 2010; Kišiček, 2012; Biočina et al.,
2016; Varošanec-Škarić et al., 2017). In the group of Serbian speakers, the average F0 for
male speakers is 108 Hz, and 179 Hz for females. Comparing the fundamental frequency
values between speakers of the same sex and different language has shown that speakers of
Croatian (females and males) have significantly lower F0 values for every analysed vowel and
overall across all vowels (p < 0.001).
Sex and language differences have also been analysed according to the range of the
fundamental frequency. Descriptive statistics showed that male speakers of Croatian mostly
have a wider F0 range than women. Surprisingly, in Serbian the results were the opposite.
Comparing the frequency ranges of F0 between the speakers of Croatian and Serbian, results
showed that male speakers of the Croatian language have a wider F0 range in vowels [a], [e]
and [i]. In the group of male speakers of the Serbian language, results showed a wider range
for back vowels ([u] and [o]). Female speakers of Croatian and Serbian also differed in F0
range. Speakers of Serbian have shown a wider range in vowels [a], [e] and [u], while in the
remaining vowels female speakers of Croatian had a wider range.
Although frequency ranges are a frequently used parameter in phonetics, statistically speaking
they are not a stable and reliable indicator of dispersion. Accordingly, significance of sex and
language differences in F0 were tested with complex ANOVA and multiple paired t-tests.
Descriptive statistics, ANOVA analysis, and t-tests suggest very diverse results. Namely, for
all vowels it has been established that F0 is significantly more variable within the group of
female speakers, than within male speakers. These results have been confirmed for both
analysed languages, and were found in numerous sociolinguistic and sociophonetic studies for
different languages. The findings of this study also indicate that there are no significant
differences in F0 variability (except for vowel [e]) between speakers of the analysed languages
(confirmed for both sexes).
Finally, this dissertation has questioned the correlations between the fundamental frequency
and formants, as well as the correlations between formants themselves. These correlations
were analysed within speakers of different languages and different sexes. Also, the significance of the correlations themselves was analysed. In the group of Croatian speakers,
results showed that there are statistically significant correlations between F0 and F2, as well as
between F0 and F3. Surprisingly, correlations between F0 and the first formant have not been
found. Taken together, these results suggest that there is a greater number of correlations
between F0 and formants within the group of female speakers.
Acoustic modelling, data augmentation and feature extraction for in-pipe machine learning applications
Gathering measurements from infrastructure, private premises, and harsh environments can be difficult and expensive. From this perspective, the development of
new machine learning algorithms is strongly affected by the availability of training
and test data. We focus on audio archives for in-pipe events. Although several
examples of pipe-related applications can be found in the literature, datasets of
audio/vibration recordings are much scarcer, and the only references found relate
to leakage detection and characterisation. Therefore, this work proposes a methodology to relieve the burden of data collection for acoustic events in deployed pipes.
The aim is to maximise the yield of small sets of real recordings and demonstrate
how to extract effective features for machine learning. The methodology developed
requires the preliminary creation of a soundbank of audio samples gathered with
simple weak annotations. For practical reasons, the case study is given by a range
of appliances, fittings, and fixtures connected to pipes in domestic environments.
The source recordings are low-reverberated audio signals enhanced through a
bespoke spectral filter and containing the desired audio fingerprints. The soundbank is then processed to create an arbitrary number of synthetic augmented
observations. The data augmentation improves the quality and the quantity of
the metadata and automatically creates strong and accurate annotations that
are both machine and human-readable. Besides, the implemented processing
chain allows precise control of properties such as signal-to-noise ratio, duration
of the events, and the number of overlapping events. The inter-class variability
is expanded by recombining source audio blocks and adding simulated artificial
reverberation obtained through an acoustic model developed for the purpose.
Finally, the dataset is synthesised to guarantee separability and balance. A few
signal representations are optimised to maximise the classification performance,
and the results are reported as a benchmark for future developments. The contribution to the existing knowledge concerns several aspects of the processing chain
implemented. A novel quasi-analytic acoustic model is introduced to simulate
in-pipe reverberations, adopting a three-layer architecture particularly convenient
for batch processing. The first layer includes two algorithms: one for the numerical
calculation of the axial wavenumbers and one for the separation of the modes. The
latter, in particular, provides a workaround for a problem not explicitly treated in the
literature and related to the modal non-orthogonality given by the solid-liquid interface in the analysed domain. A set of results for different waveguides is reported
to compare the dispersive behaviour against different mechanical configurations.
Two more novel solutions are also included in the second layer of the model and
concern the integration of the acoustic sources. Specifically, the amplitudes of the
non-orthogonal modal potentials are obtained using either a distance minimisation
objective function or by solving an analytical decoupling problem. In both cases,
results show that sources sufficiently smooth can be approximated with a limited
number of modes keeping the error below 1%. The last layer proposes a bespoke
approach for the integration of the acoustic model into the synthesiser as a reverberation simulator. Additional elements of novelty relate to the other blocks of the
audio synthesiser. The statistical spectral filter, for instance, is a batch-processing
solution for the attenuation of the background noise of the source recordings. The
signal-to-noise ratio analysis for both moderate and high noise levels indicates
a clear improvement of several decibels against the closest filter example in the
literature. The recombination of the audio blocks and the system of fully tracked
annotations are also novel extensions of similar approaches recently adopted in
other contexts. Moreover, a bespoke synthesis strategy is proposed to guarantee
separable and balanced datasets. The last contribution concerns the extraction
of convenient sets of audio features. Elements of novelty are introduced for the
optimisation of the filter banks of the mel-frequency cepstral coefficients and the
scattering wavelet transform. In particular, compared to the respective standard
definitions, the average F-score performance of the optimised features is roughly
6% higher in the first case and 2.5% higher for the latter. Finally, the soundbank,
the synthetic dataset, and the fundamental blocks of the software library developed
are publicly available for further research.
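The abstract mentions precise control of the signal-to-noise ratio when synthesising augmented observations. One simple way to obtain such control, sketched below, is to scale the background noise before adding it to the event so that the power ratio matches a target value in decibels. This is a generic illustration and not the thesis's synthesiser: the function names and sample values are assumptions made here.

```python
import math


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def mix_at_snr(event, noise, target_snr_db):
    """Scale `noise` so that event-to-noise power equals target_snr_db, then add.

    Returns the mixture and the gain applied to the noise.
    """
    wanted_noise_power = mean_power(event) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    mixture = [e + gain * n for e, n in zip(event, noise)]
    return mixture, gain


# Hypothetical short event and noise snippets (made-up sample values).
event = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]

mixture, gain = mix_at_snr(event, noise, target_snr_db=10.0)
print(snr_db(event, [gain * n for n in noise]))
```

Because the gain is derived from measured powers, the same function can produce mixtures at any requested SNR from a small set of source recordings.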
Statistical pattern recognition for audio-forensics : empirical investigations on the application scenarios audio steganalysis and microphone forensics
Magdeburg, Univ., Fak. für Informatik, Diss., 2013, by Christian Krätze