Identification of persons via voice imprint
This work deals with text-dependent speaker recognition in systems where only a limited number of training samples exist. For this purpose, a voice imprint based on different features (e.g. MFCC, PLP, ACW, etc.) is proposed. The work first describes how the speech signal is produced and mentions some speech characteristics important for speaker recognition. The next part deals with speech signal analysis, covering preprocessing as well as feature extraction methods. The following part describes the process of speaker recognition and the evaluation of the methods used: speaker identification and verification. The last theoretical part deals with classifiers suitable for text-dependent recognition: classifiers based on fractional distances, dynamic time warping, dispersion matching, and vector quantization. The work concludes with the design and realization of a system that evaluates all of the described classifiers on voice imprints built from the different features.
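As a minimal sketch of the kind of classifier the abstract names, a dynamic-time-warping matcher over feature sequences can be written in a few lines. The arrays below stand in for MFCC/PLP frame sequences, and all function names are illustrative, not the thesis's own implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping (DTW) distance between two feature
    sequences of shape (frames, dims), with Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def identify(test_features, templates):
    """Text-dependent identification: pick the enrolled speaker whose
    stored voice-imprint template is DTW-closest to the test utterance."""
    return min(templates, key=lambda spk: dtw_distance(test_features, templates[spk]))
```

With only a few enrolment samples per speaker, such template matching avoids training a statistical model, which is exactly the setting the abstract describes.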
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly
in online music streaming services. In these platforms, which tend to host tens
of thousands of unique audio advertisements (ads), providing high quality ads
ensures a better user experience and results in longer user engagement.
Therefore, the automatic assessment of these ads is an important step toward
audio ads ranking and better audio ads creation. In this paper we propose one
way to measure the quality of the audio ads using a proxy metric called Long
Click Rate (LCR), which is defined as the amount of time a user engages with
the follow-up display ad (that is shown while the audio ad is playing) divided
by the impressions. We later focus on predicting the audio ad quality using
only acoustic features such as harmony, rhythm, and timbre of the audio,
extracted from the raw waveform. We discuss how the characteristics of the
sound can be connected to concepts such as the clarity of the audio ad message,
its trustworthiness, etc. Finally, we propose a new deep learning model for
audio ad quality prediction, which outperforms the other discussed models
trained on hand-crafted features. To the best of our knowledge, this is the
first large-scale audio ad quality prediction study.

Comment: WSDM '18, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages
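Taking the abstract's definition literally, the LCR proxy is a single ratio. The sketch below assumes engagement is measured in seconds per impression; the units and aggregation are illustrative assumptions, not details from the paper:

```python
def long_click_rate(engagement_seconds, impressions):
    """LCR as described in the abstract: total time users spent engaged
    with the follow-up display ad (shown while the audio ad plays),
    divided by the number of impressions."""
    return sum(engagement_seconds) / impressions
```

For example, three impressions with 12, 0, and 18 seconds of display-ad engagement give an LCR of 10 seconds per impression.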
Tea Category Identification Using a Novel Fractional Fourier Entropy and Jaya Algorithm
This work proposes a tea-category identification (TCI) system that automatically determines the tea category from images captured by a three charge-coupled-device (3-CCD) digital camera. Three hundred tea images were acquired as the dataset. Apart from the 64 traditional color histogram features, we also introduced a relatively new feature, the fractional Fourier entropy (FRFE), and extracted 25 FRFE features from each tea image. Kernel principal component analysis (KPCA) was then harnessed to reduce the 64 + 25 = 89 features to four, which were fed into a feedforward neural network (FNN) whose optimal weights were obtained by the Jaya algorithm. A 10 × 10-fold stratified cross-validation (SCV) showed that our TCI system obtains an overall average sensitivity of 97.9%, higher than seven existing approaches, while using only four features, no more than the state-of-the-art approaches. The proposed system is therefore efficient at tea-category identification.
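The KPCA reduction step described above can be sketched with plain numpy. The RBF kernel and the gamma value are assumptions for illustration; the paper does not specify them in the abstract:

```python
import numpy as np

def kpca_reduce(X, n_components=4, gamma=0.01):
    """Kernel PCA reducing the 89-dimensional colour-histogram + FRFE
    feature vectors (rows of X) to n_components. Kernel choice and
    gamma are illustrative assumptions."""
    sq = np.sum(X ** 2, axis=1)
    # RBF kernel matrix from pairwise squared distances
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = len(X)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J           # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # keep the top components
    # projections of the training samples onto the leading components
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

The four resulting columns are what would be fed into the FNN whose weights the Jaya algorithm optimises.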
Multibiometric security in wireless communication systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010.

This thesis explores an application of multibiometrics to secured wireless communications. The media studied were Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition.
First is the enrolment phase, in which a database of fingerprints watermarked with memorable texts, along with voice features based on the same texts, is created by sending them to the server over the wireless channel.
Next is the verification stage, at which claimed users (those claiming to be genuine) are verified against the database; it consists of five steps. At the identification level, the user first presents a fingerprint and a memorable word, the former watermarked into the latter, so that the system can authenticate the fingerprint and verify its validity by retrieving the challenge for the accepted user.
The following three steps involve speaker recognition: the user responds to the challenge with text-dependent speech, the server authenticates the response, and finally the server accepts or rejects the user.
To implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, a five-step algorithm has been developed. The first three novel steps concern fingerprint image enhancement (CLAHE with 'clip limit', standard-deviation analysis, and sliding-neighborhood processing); they are followed by two further steps for embedding and extracting the watermark in the enhanced fingerprint image using the Discrete Wavelet Transform (DWT).
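The DWT-domain embed/extract pair can be illustrated with a hand-rolled one-level Haar transform and quantisation index modulation on the diagonal sub-band. This is a generic sketch of DWT watermarking under assumed parameters (Haar wavelet, HH band, quantisation step), not the thesis's exact algorithm:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar transform: approximation LL plus
    (LH, HL, HH) detail sub-bands."""
    a = (img[0::2, :] + img[1::2, :]) / 2
    d = (img[0::2, :] - img[1::2, :]) / 2
    LL, LH = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
    HL, HH = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, (LH, HL, HH)

def ihaar_dwt2(LL, bands):
    """Exact inverse of haar_dwt2."""
    LH, HL, HH = bands
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :], img[1::2, :] = a + d, a - d
    return img

def embed_bits(img, bits, step=8.0):
    """Quantise HH coefficients onto an even/odd lattice
    (quantisation index modulation) to hide the watermark bits."""
    LL, (LH, HL, HH) = haar_dwt2(img.astype(float))
    flat = HH.flatten()
    for i, b in enumerate(bits):
        flat[i] = step * np.round((flat[i] - b * step / 2) / step) + b * step / 2
    return ihaar_dwt2(LL, (LH, HL, flat.reshape(HH.shape)))

def extract_bits(img, n_bits, step=8.0):
    """Recover bits from the parity of the quantised HH coefficients."""
    _, (_, _, HH) = haar_dwt2(img.astype(float))
    return [int(round(c / (step / 2))) % 2 for c in HH.flatten()[:n_bits]]
```

In the thesis's scheme the bits would encode the memorable word, and the carrier would be the enhanced fingerprint image.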
In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reduced transmission time and reduced dependency of the data on the communication channel, together with no packet loss. Finally, the obtained results have verified these claims.
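The bandwidth argument for sending cepstral coefficients is easy to make concrete. The rates below (8 kHz 16-bit audio, 100 frames/s, 13 coefficients as 32-bit floats) are typical assumed values, not figures from the thesis:

```python
def raw_payload_bytes(seconds, sample_rate=8000, bytes_per_sample=2):
    """Bytes needed to transmit raw 16-bit speech samples."""
    return int(seconds * sample_rate) * bytes_per_sample

def feature_payload_bytes(seconds, frames_per_sec=100, n_ceps=13, bytes_per_coef=4):
    """Bytes needed to transmit per-frame cepstral coefficients instead."""
    return int(seconds * frames_per_sec) * n_ceps * bytes_per_coef
```

Under these assumptions one second of speech shrinks from 16,000 bytes of raw samples to 5,200 bytes of features, and the saving grows with lower frame rates or fewer coefficients.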
Off-line handwritten signature recognition by wavelet entropy and neural network
Handwritten signatures are widely utilized as a form of personal recognition. However, they have the unfortunate shortcoming of being easily abused by those who would forge the identity or intent of an individual, which can be very harmful. The need for an automatic signature recognition system is therefore crucial. In this paper, a signature recognition approach based on a probabilistic neural network (PNN) and wavelet-transform average framing entropy (AFE) is proposed. The system was tested with wavelet packet (WP) entropy, denoted the WP entropy neural network system (WPENN), and with discrete wavelet transform (DWT) entropy, denoted the DWT entropy neural network system (DWENN). Our investigation covered several wavelet families and different entropy types. Both identification and verification tasks were investigated for a comprehensive study of the signature system, and several other methods from the literature were considered for comparison. Two databases were used for algorithm testing. The best recognition rate, 92%, was achieved by WPENN with the threshold entropy.
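The average framing entropy idea, i.e. averaging per-frame Shannon entropies of wavelet coefficients, can be sketched as follows. The one-level Haar decomposition, the frame length, and the normalisation are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

def haar_dwt1(x):
    """One-level 1-D Haar transform: approximation and detail halves."""
    x = np.asarray(x, float)
    return (x[0::2] + x[1::2]) / 2, (x[0::2] - x[1::2]) / 2

def average_framing_entropy(signal, frame_len=64):
    """Average per-frame Shannon entropy of Haar detail coefficients,
    a sketch of an AFE-style feature for signature signals."""
    _, detail = haar_dwt1(signal)
    n_frames = len(detail) // frame_len
    ents = []
    for k in range(n_frames):
        frame = np.abs(detail[k * frame_len:(k + 1) * frame_len])
        total = frame.sum()
        # normalise to a probability distribution; fall back to uniform
        # for an all-zero (perfectly smooth) frame
        p = frame / total if total > 0 else np.full(frame_len, 1.0 / frame_len)
        p = p[p > 0]
        ents.append(-np.sum(p * np.log2(p)))
    return float(np.mean(ents))
```

A scalar like this, computed per wavelet sub-band and family, would then feed the PNN classifier the abstract describes.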
Fractal based speech recognition and synthesis
Transmitting a linguistic message is most often the primary purpose of speech communication, and it is the recognition of this message by machine that would be most useful.
This research consists of two major parts. The first part presents a novel and promising approach for estimating the degree of recognition of speech phonemes, making use of a new set of features based on fractals. The main methods of computing the fractal dimension of speech signals are reviewed, and a new speaker-independent speech recognition system developed at De Montfort University is described in detail. Finally, a least-squares method as well as a novel neural network algorithm are employed to derive the recognition performance on the speech data.
The second part of this work studies the synthesis of spoken words, based mainly on the fractal dimension, to create natural-sounding speech. The work shows that by careful use of the fractal dimension, together with the phase of the speech signal to ensure consistent intonation contours, natural-sounding speech synthesis is achievable at the word level. To extend the flexibility of this framework, we focused on filtering and compressing the phase so as to maintain and produce natural-sounding speech. A 'naturalness level' is achieved as a result of the fractal characteristic used in the synthesis process. Finally, a novel fractal-based speech synthesis system developed at De Montfort University is discussed.
Throughout our research, simulation experiments were performed on continuous speech data from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is designed to provide the speech research community with a standardised corpus for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
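One standard way to compute the fractal dimension of a speech waveform, the central feature in this thesis, is Higuchi's method; the thesis's exact choice of estimator may differ, so treat this as a generic illustration:

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Higuchi estimate of the fractal dimension of a 1-D signal:
    average curve length L(k) at stride k scales as k**(-D), so D is
    the slope of log L(k) against log(1/k)."""
    x = np.asarray(x, float)
    N = len(x)
    lks, ks = [], []
    for k in range(1, kmax + 1):
        Lm = []
        for m in range(k):                      # k decimated sub-series
            idx = np.arange(m, N, k)
            if len(idx) < 2:
                continue
            # normalised curve length of the stride-k sub-series
            length = np.sum(np.abs(np.diff(x[idx]))) * (N - 1) / (len(idx) - 1) / k
            Lm.append(length / k)
        lks.append(np.log(np.mean(Lm)))
        ks.append(np.log(1.0 / k))
    slope, _ = np.polyfit(ks, lks, 1)
    return slope
```

A smooth ramp gives a dimension near 1, while rougher, noise-like frames push the estimate toward 2, which is what makes the measure discriminative for phonemes.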