1,745 research outputs found
Identification of persons via voice imprint
This work deals with text-dependent speaker recognition in systems where only a few training samples exist. For this purpose, a voice imprint based on different features (e.g. MFCC, PLP, ACW) is proposed. At the beginning, the way the speech signal is produced is described, and some speech characteristics important for speaker recognition are mentioned. The next part of the work deals with speech signal analysis, covering preprocessing as well as feature extraction methods. The following part describes the process of speaker recognition and the evaluation of the methods used: speaker identification and verification. The last theoretical part deals with classifiers suitable for text-dependent recognition: classifiers based on fractional distances, dynamic time warping, dispersion matching and vector quantization are described. The work continues with the design and realization of a system that evaluates all described classifiers for a voice imprint based on the different features.
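Of the classifiers listed, dynamic time warping is the natural fit for text-dependent matching with few training samples, since it aligns two utterances of the same phrase frame by frame. A minimal sketch over MFCC-like frame sequences (the toy 1-D features and function name are illustrative, not the thesis's actual front end):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: arrays of shape (n_frames, n_features), e.g. MFCC frames.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Identical sequences align perfectly (distance 0); a shifted copy does not.
ref = np.array([[0.0], [1.0], [2.0], [3.0]])
print(dtw_distance(ref, ref))        # 0.0
print(dtw_distance(ref, ref + 0.5) > 0)  # True
```

For recognition, the claimed speaker's enrolment template with the smallest DTW distance to the test utterance would be accepted.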
The Conversation: Deep Audio-Visual Speech Enhancement
Our goal is to isolate individual speakers from multi-talker simultaneous
speech in videos. Existing works in this area have focussed on trying to
separate utterances from known speakers in controlled environments. In this
paper, we propose a deep audio-visual speech enhancement network that is able
to separate a speaker's voice given lip regions in the corresponding video, by
predicting both the magnitude and the phase of the target signal. The method is
applicable to speakers unheard and unseen during training, and for
unconstrained environments. We demonstrate strong quantitative and qualitative
results, isolating extremely challenging real-world examples.
Comment: To appear in Interspeech 2018. We provide supplementary material with interactive demonstrations on http://www.robots.ox.ac.uk/~vgg/demo/theconversatio
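The reconstruction step described above (predicting both the magnitude and the phase of the target signal) can be sketched as applying a magnitude mask and a phase estimate to the mixture STFT. The mask and phase arrays below stand in for hypothetical network outputs; this is not the paper's actual network:

```python
import numpy as np

def apply_mask_and_phase(mixture_stft, mag_mask, phase_pred):
    """Reconstruct a target speaker's complex spectrogram.

    mixture_stft: complex STFT of the multi-talker mixture
    mag_mask:     predicted magnitude mask in [0, 1] (hypothetical network output)
    phase_pred:   predicted phase in radians (hypothetical network output)
    """
    target_mag = np.abs(mixture_stft) * mag_mask  # scale the mixture magnitude
    return target_mag * np.exp(1j * phase_pred)   # re-attach the predicted phase

# Sanity check: a unit mask with the mixture's own phase returns the mixture.
mix = np.array([[1 + 1j, 2 - 1j]])
out = apply_mask_and_phase(mix, np.ones_like(mix, dtype=float), np.angle(mix))
print(np.allclose(out, mix))  # True
```

The inverse STFT of the result would then give the isolated speaker's waveform.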
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly
in online music streaming services. In these platforms, which tend to host tens
of thousands of unique audio advertisements (ads), providing high quality ads
ensures a better user experience and results in longer user engagement.
Therefore, the automatic assessment of these ads is an important step toward
audio ads ranking and better audio ads creation. In this paper we propose one
way to measure the quality of the audio ads using a proxy metric called Long
Click Rate (LCR), which is defined by the amount of time a user engages with
the follow-up display ad (that is shown while the audio ad is playing) divided
by the impressions. We later focus on predicting the audio ad quality using
only acoustic features such as harmony, rhythm, and timbre of the audio,
extracted from the raw waveform. We discuss how the characteristics of the
sound can be connected to concepts such as the clarity of the audio ad message,
its trustworthiness, etc. Finally, we propose a new deep learning model for
audio ad quality prediction, which outperforms the other discussed models
trained on hand-crafted features. To the best of our knowledge, this is the
first large-scale audio ad quality prediction study.
Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages
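The proxy metric can be sketched directly from its definition in the abstract: engagement time with the follow-up display ad divided by the number of impressions. The function name and toy numbers below are illustrative:

```python
def long_click_rate(engagement_seconds, impressions):
    """Long Click Rate (LCR) as described in the abstract: the time users
    spent engaging with the follow-up display ad (shown while the audio ad
    plays), divided by the number of impressions."""
    return sum(engagement_seconds) / impressions

# Three users engaged for 12 s, 0 s, and 3 s over 10 impressions.
print(long_click_rate([12.0, 0.0, 3.0], impressions=10))  # 1.5
```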
Multibiometric security in wireless communication systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010. This thesis has aimed to explore an application of multibiometrics to secured wireless communications. The media of study for this purpose included Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition.
First is the enrolment phase, in which a database of watermarked fingerprints with memorable texts, along with voice features based on the same texts, is created by sending them to the server through the wireless channel.
Later is the verification stage, at which claimed users (those who claim to be genuine) are verified against the database; it consists of five steps. At the initial identification level, one is asked to present one's fingerprint and a memorable word, the former watermarked into the latter, so that the system can authenticate the fingerprint and verify its validity by retrieving the challenge for an accepted user.
The following three steps then involve speaker recognition: the user responding to the challenge with text-dependent voice, the server authenticating the response, and finally the server accepting or rejecting the user.
In order to implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, an algorithm of five steps has been developed. The first three novel steps, which deal with fingerprint image enhancement (CLAHE with 'Clip Limit', standard deviation analysis and sliding neighborhood), are followed by a further two steps for embedding and extracting the watermark into and from the enhanced fingerprint image, utilising the Discrete Wavelet Transform (DWT).
In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reducing the transmission time and the dependency of the data on the communication channel, together with no loss of packets. Finally, the obtained results have verified the claims.
Fractal based speech recognition and synthesis
Transmitting a linguistic message is most often the primary purpose of speech communication, and the recognition of this message by machine would be most useful.
This research consists of two major parts. The first part presents a novel and promising approach for estimating the degree of recognition of speech phonemes and makes use of a new set of features based on fractals. The main methods of computing the fractal dimension of speech signals are reviewed, and a new speaker-independent speech recognition system developed at De Montfort University is described in detail. Finally, a least-squares method as well as a novel neural network algorithm is employed to derive the recognition performance on the speech data.
The second part of this work studies the synthesis of spoken words, based mainly on the fractal dimension, to create natural-sounding speech. The work shows that by careful use of the fractal dimension, together with the phase of the speech signal to ensure consistent intonation contours, natural-sounding speech synthesis is achievable at the word level. In order to extend the flexibility of this framework, we focused on the filtering and compression of the phase to maintain and produce natural-sounding speech. A 'naturalness level' is achieved as a result of the fractal characteristic used in the synthesis process. Finally, a novel speech synthesis system based on fractals, developed at De Montfort University, is discussed.
Throughout our research, simulation experiments were performed on continuous speech data available from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is designed to provide the speech research community with a standardised corpus for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
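The fractal-dimension features at the heart of this work can be illustrated with one common estimator for 1-D signals, Higuchi's method; the thesis reviews several such methods without the abstract naming one, so this choice and the `kmax` value are illustrative:

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension of a 1-D signal (e.g. a speech frame)."""
    x = np.asarray(x, float)
    N = len(x)
    Lk = []
    for k in range(1, kmax + 1):
        Lm = []
        for m in range(k):
            idx = np.arange(m, N, k)          # curve subsampled at stride k
            n = len(idx) - 1
            if n < 1:
                continue
            # normalised curve length at scale k, starting offset m
            Lm.append(np.abs(np.diff(x[idx])).sum() * (N - 1) / (n * k * k))
        Lk.append(np.mean(Lm))
    k = np.arange(1, kmax + 1)
    # FD is the slope of log L(k) against log(1/k)
    slope, _ = np.polyfit(np.log(1.0 / k), np.log(Lk), 1)
    return slope

# A smooth ramp has FD near 1; white noise is far rougher, near 2.
rng = np.random.default_rng(0)
print(round(higuchi_fd(np.linspace(0, 1, 1000)), 2))  # ≈ 1.0
print(higuchi_fd(rng.standard_normal(1000)) > 1.5)    # True
```

Rougher, more turbulent phonemes (e.g. fricatives) yield higher fractal dimensions than smooth voiced segments, which is what makes the measure usable as a recognition feature.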
Comparison between Different Methods used in MFCC for Speaker Recognition System
The idea of the Speaker Recognition Project is to implement a recognizer that can identify an individual by processing his/her voice. The essential goal of the project is to recognize and classify the speech of various persons. This classification is mainly based on extracting key features such as Mel Frequency Cepstral Coefficients (MFCC) from the speech signals of these persons using a feature extraction method. These features may encompass pitch, amplitude, frequency, etc. Using a statistical model such as the Gaussian mixture model (GMM) and the features extracted from those speech signals, we build a unique identity for each person enrolled for speaker recognition. The Expectation-Maximization algorithm, an elegant and powerful method for finding the maximum likelihood solution for a model with latent variables, is employed to test later speakers against the database of all enrolled speakers. Use of the fractional Fourier transform for feature extraction is also recommended to improve speaker recognition efficiency.
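The GMM/EM identification loop can be sketched end to end. The toy below fits a 1-D diagonal GMM per speaker via EM and identifies a test utterance by the higher log-likelihood; real systems fit mixtures over MFCC vectors, and all names and data here are illustrative:

```python
import numpy as np

def fit_gmm_em(x, n_components=2, n_iter=50, seed=0):
    """Minimal EM for a 1-D GMM (a toy stand-in for full MFCC-vector GMMs)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    mu = rng.choice(x, n_components, replace=False)   # init means from data
    var = np.full(n_components, x.var())
    w = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        p = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: maximum-likelihood parameter updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        w = nk / len(x)
    return w, mu, var

def log_likelihood(x, model):
    w, mu, var = model
    p = w * np.exp(-0.5 * (np.asarray(x)[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(p.sum(axis=1)).sum()

# Enrol two "speakers" from toy 1-D features, then identify a test utterance.
rng = np.random.default_rng(1)
spk_a = rng.normal(0.0, 1.0, 500)
spk_b = rng.normal(5.0, 1.0, 500)
models = {"A": fit_gmm_em(spk_a), "B": fit_gmm_em(spk_b)}
test = rng.normal(5.0, 1.0, 100)   # utterance actually from speaker B
print(max(models, key=lambda s: log_likelihood(test, models[s])))  # B
```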
In Car Audio
This chapter presents implementations of advanced in-car audio applications. The system is composed of three main applications regarding the in-car listening and communication experience. Starting from a high-level description of the algorithms, several implementations at different levels of hardware abstraction are presented, along with empirical results on both the design process undergone and the performance achieved.
Off-line handwritten signature recognition by wavelet entropy and neural network
Handwritten signatures are widely utilized as a form of personal recognition. However, they have the unfortunate shortcoming of being easily abused by those who would fake the identity or intent of an individual, which can be very harmful. Therefore, the need for an automatic signature recognition system is crucial. In this paper, a signature recognition approach based on a probabilistic neural network (PNN) and wavelet transform average framing entropy (AFE) is proposed. The system was tested with a wavelet packet (WP) entropy, denoted the WP entropy neural network system (WPENN), and with a discrete wavelet transform (DWT) entropy, denoted the DWT entropy neural network system (DWENN). Our investigation was conducted over several wavelet families and different entropy types. Identification tasks, as well as verification tasks, were investigated for a comprehensive signature system study. Several other methods used in the literature were considered for comparison. Two databases were used for algorithm testing. The best recognition rate was achieved by WPENN with the threshold entropy, reaching 92%.
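A wavelet-entropy feature of the kind used here can be sketched as the Shannon entropy of the relative energy in each DWT sub-band. This uses a hand-rolled 1-D Haar transform and does not reproduce the paper's average-framing (AFE) step; the level count is illustrative:

```python
import numpy as np

def haar_dwt(x):
    """One level of the 1-D Haar DWT: approximation and detail coefficients."""
    x = np.asarray(x, float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def wavelet_entropy(x, levels=3):
    """Shannon entropy (bits) of the relative energy per DWT sub-band."""
    energies = []
    approx = np.asarray(x, float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))       # final approximation band
    p = np.array(energies) / np.sum(energies)  # relative sub-band energy
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A constant signal concentrates energy in one band; a varying one spreads it.
print(abs(wavelet_entropy(np.ones(8))) < 1e-9)  # True: ≈ 0 bits
print(wavelet_entropy(np.arange(16.0)) > 0)     # True: energy in several bands
```

The entropy value then serves as a compact input feature for a classifier such as the PNN.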