
    Identification of persons via voice imprint

    This work deals with text-dependent speaker recognition in systems where only a few training samples exist. For the purpose of this recognition, a voice imprint based on different features (e.g. MFCC, PLP, ACW, etc.) is proposed. The work first describes how the speech signal is produced and mentions some speech characteristics important for speaker recognition. The next part deals with analysis of the speech signal: preprocessing and feature extraction methods are covered. The following part describes the process of speaker recognition and the evaluation of the methods used: speaker identification and verification. The last theoretical part of the work deals with classifiers suitable for text-dependent recognition: classifiers based on fractional distances, dynamic time warping, dispersion matching and vector quantization are mentioned. The work concludes with the design and realization of a system that evaluates all described classifiers for a voice imprint based on the different features.
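    As an illustration of one of the classifiers mentioned above, the sketch below implements dynamic time warping over per-frame feature vectors (such as MFCCs) and uses it for nearest-template speaker identification. Feature extraction is omitted, and the `identify` helper and template dictionary are illustrative assumptions, not names from the thesis.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: arrays of shape (n_frames, n_features), e.g. per-frame MFCC vectors.
    Returns the accumulated cost of the optimal warping path.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between all frame pairs.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match
    return acc[n, m]

def identify(utterance, templates):
    """Pick the enrolled speaker whose template warps to the utterance most cheaply."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))
```

    With only a few training samples per speaker, template matching of this kind avoids the data demands of statistical models, which is why DTW suits the few-shot setting the thesis targets.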

    The Conversation: Deep Audio-Visual Speech Enhancement

    Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos. Existing works in this area have focussed on trying to separate utterances from known speakers in controlled environments. In this paper, we propose a deep audio-visual speech enhancement network that is able to separate a speaker's voice given lip regions in the corresponding video, by predicting both the magnitude and the phase of the target signal. The method is applicable to speakers unheard and unseen during training, and to unconstrained environments. We demonstrate strong quantitative and qualitative results, isolating extremely challenging real-world examples. Comment: To appear in Interspeech 2018. We provide supplementary material with interactive demonstrations on http://www.robots.ox.ac.uk/~vgg/demo/theconversatio
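    The enhancement step described above can be reduced to a minimal sketch: a network-predicted magnitude mask and phase correction are applied to the mixture STFT per time-frequency bin. The `enhance` function and its mask inputs are illustrative assumptions; the paper's actual network architecture is not reproduced here.

```python
import numpy as np

def enhance(mixture_stft, mag_mask, phase_residual):
    """Recover a target spectrogram from a multi-talker mixture.

    mixture_stft:   complex array (freq, time), STFT of the mixed speech.
    mag_mask:       non-negative array, scales each bin's magnitude
                    (assumed to come from the audio-visual network).
    phase_residual: array of angles added to the mixture phase to rotate
                    it toward the target speaker's phase.
    """
    mag = np.abs(mixture_stft) * mag_mask
    phase = np.angle(mixture_stft) + phase_residual
    return mag * np.exp(1j * phase)
```

    Predicting the phase residual, rather than reusing the mixture phase, is what lets the complex target spectrogram be reconstructed exactly when the masks are ideal.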

    Predicting Audio Advertisement Quality

    Online audio advertising is a particular form of advertising used abundantly in online music streaming services. These platforms tend to host tens of thousands of unique audio advertisements (ads), and providing high-quality ads ensures a better user experience and results in longer user engagement. Therefore, the automatic assessment of these ads is an important step toward audio ads ranking and better audio ads creation. In this paper we propose one way to measure the quality of audio ads using a proxy metric called Long Click Rate (LCR), defined as the amount of time a user engages with the follow-up display ad (shown while the audio ad is playing) divided by the impressions. We then focus on predicting audio ad quality using only acoustic features such as harmony, rhythm, and timbre, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study. Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages
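    The LCR proxy metric as defined above can be sketched directly: total engagement time with the follow-up display ad divided by impressions. The function name and the absence of any dwell-time threshold are assumptions; the paper may normalize or threshold differently.

```python
def long_click_rate(engagement_seconds, impressions):
    """Proxy quality metric for an audio ad.

    engagement_seconds: list of per-user dwell times (in seconds) on the
                        follow-up display ad shown while the audio ad plays.
    impressions:        number of times the audio ad was served.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return sum(engagement_seconds) / impressions
```

    A higher LCR suggests listeners found the ad engaging enough to interact with its companion display ad, which is why it can stand in for a direct quality label.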

    Multibiometric security in wireless communication systems

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010. This thesis has aimed to explore an application of multibiometrics to secured wireless communications. The media of study for this purpose included Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess the performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, on the basis of user identification by fingerprint and further confirmation through text-dependent speaker recognition. First is the enrolment phase, in which a database of fingerprints watermarked with memorable texts, along with voice features based on the same texts, is created by sending them to the server through the wireless channel. Later is the verification stage, at which claimed users (those who claim to be genuine) are verified against the database; it consists of five steps. At the identification level, the user is first asked to present a fingerprint and a memorable word; the word is watermarked into the fingerprint image, so that the system can authenticate the fingerprint, verify its validity, and retrieve the challenge for an accepted user. The following three steps involve speaker recognition: the user responds to the challenge with text-dependent speech, the server authenticates the response, and finally the server accepts or rejects the user. In order to implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, an algorithm of five steps has been developed. The first three novel steps, concerned with fingerprint image enhancement (CLAHE with 'Clip Limit', standard deviation analysis and sliding neighborhood), are followed by two further steps for embedding and extracting the watermark in the enhanced fingerprint image utilising the Discrete Wavelet Transform (DWT). In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reduced transmission time and reduced dependency of the data on the communication channel, together with no loss of packets. Finally, the obtained results have verified the claims.
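    The embed/extract pair at the heart of the watermarking stage can be sketched in heavily simplified form: a single-level 1-D Haar transform with quantization-based embedding of the message bits into the detail coefficients. The thesis's actual five-step algorithm operates on CLAHE-enhanced fingerprint images; the Haar filter choice, the quantization scheme, and all names below are illustrative assumptions.

```python
import numpy as np

def haar_dwt(x):
    """Single-level 1-D Haar transform: approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(approx, detail):
    """Invert haar_dwt, interleaving the reconstructed samples."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def embed(signal, bits, step=4.0):
    """Hide bits by nudging detail coefficients to +/- step/4 around a grid point."""
    approx, detail = haar_dwt(signal)
    q = np.round(detail[:len(bits)] / step) * step
    detail[:len(bits)] = q + np.where(bits, step / 4, -step / 4)
    return haar_idwt(approx, detail)

def extract(signal, n_bits, step=4.0):
    """Recover bits from the sign of each coefficient's offset from the grid."""
    _, detail = haar_dwt(signal)
    residual = detail[:n_bits] - np.round(detail[:n_bits] / step) * step
    return residual > 0
```

    Embedding in transform-domain detail coefficients rather than raw pixels is what lets the watermark survive mild processing of the cover image.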

    Fractal based speech recognition and synthesis

    Transmitting a linguistic message is most often the primary purpose of speech communication, and it is the recognition of this message by machine that would be most useful. This research consists of two major parts. The first part presents a novel and promising approach for estimating the degree of recognition of speech phonemes and makes use of a new set of features based on fractals. The main methods of computing the fractal dimension of speech signals are reviewed, and a new speaker-independent speech recognition system developed at De Montfort University is described in detail. Finally, a least squares method as well as a novel neural network algorithm is employed to derive the recognition performance on the speech data. The second part of this work studies the synthesis of speech words, based mainly on the fractal dimension, to create natural-sounding speech. The work shows that by careful use of the fractal dimension, together with the phase of the speech signal to ensure consistent intonation contours, natural-sounding speech synthesis is achievable at the word level. In order to extend the flexibility of this framework, we focused on the filtering and compression of the phase to maintain and produce natural-sounding speech. A 'naturalness level' is achieved as a result of the fractal characteristic used in the synthesis process. Finally, a novel speech synthesis system based on fractals, developed at De Montfort University, is discussed. Throughout our research, simulation experiments were performed on continuous speech data available from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is designed to provide the speech research community with a standardised corpus for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
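    As a hedged illustration of computing the fractal dimension of a signal, the sketch below implements Higuchi's method, one common estimator for 1-D signals. The thesis reviews several such methods; this particular choice and the parameter defaults are assumptions.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D signal with Higuchi's method.

    Curve lengths are measured at coarser and coarser time scales k; the
    fractal dimension is the slope of log(length) versus log(1/k).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    lengths = []
    for k in range(1, k_max + 1):
        lk = []
        for m in range(k):                      # k sub-series with offset m
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / ((len(idx) - 1) * k)  # compensate series length
            lk.append(dist * norm / k)
        lengths.append(np.mean(lk))
    ks = np.arange(1, k_max + 1)
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lengths), 1)
    return slope
```

    A smooth curve gives a value near 1 and white noise a value near 2, so the estimate summarizes the roughness of a phoneme's waveform in a single feature.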

    Comparison between Different Methods used in MFCC for Speaker Recognition System

    The idea of the Speaker Recognition Project is to implement a recognizer that can determine an individual by processing his/her voice. The essential goal of the project is to recognize and classify the speech of various persons. This classification is mainly based on extracting several key features, such as Mel Frequency Cepstral Coefficients (MFCC), from the speech signals of these persons using a feature extraction method. These features may encompass pitch, amplitude, frequency, etc. Using a statistical model such as the Gaussian mixture model (GMM) and the features extracted from those speech signals, we build a unique identity for each person enrolled for speaker recognition. The Expectation-Maximization algorithm, an elegant and powerful method for finding the maximum likelihood solution for a model with latent variables, is employed to test later speakers against the database of all enrolled speakers. Use of the fractional Fourier transform for feature extraction is additionally recommended to improve speaker recognition efficiency.
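    The EM fitting and maximum-likelihood scoring described above can be sketched for the one-dimensional case as follows. A real system would model multi-dimensional MFCC vectors, so the 1-D restriction, the function names, and the fixed iteration count are simplifying assumptions.

```python
import numpy as np

def fit_gmm(x, k=2, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture with the Expectation-Maximization algorithm."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k, replace=False)      # initialise means from the data
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

def log_likelihood(x, model):
    """Score features against a speaker model; higher means a better match."""
    w, mu, var = model
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return np.log(dens.sum(axis=1)).sum()
```

    Identification then amounts to scoring a test utterance's features under every enrolled model and choosing the speaker with the highest log-likelihood.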

    In Car Audio

    This chapter presents implementations of advanced in-car audio applications. The system is composed of three main applications concerning the in-car listening and communication experience. Starting from a high-level description of the algorithms, several implementations at different levels of hardware abstraction are presented, along with empirical results on both the design process undergone and the performance achieved.

    Off-line handwritten signature recognition by wavelet entropy and neural network

    Handwritten signatures are widely utilized as a form of personal recognition. However, they have the unfortunate shortcoming of being easily abused by those who would fake the identification or intent of an individual, which can be very harmful. Therefore, the need for an automatic signature recognition system is crucial. In this paper, a signature recognition approach based on a probabilistic neural network (PNN) and wavelet transform average framing entropy (AFE) is proposed. The system was tested with wavelet packet (WP) entropy, denoted as the WP entropy neural network system (WPENN), and with discrete wavelet transform (DWT) entropy, denoted as the DWT entropy neural network system (DWENN). Our investigation was conducted over several wavelet families and different entropy types. Identification tasks as well as verification tasks were investigated for a comprehensive signature system study. Several other methods from the literature were considered for comparison. Two databases were used for algorithm testing. The best recognition rate, 92%, was achieved by WPENN with the threshold entropy.
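    A hedged sketch of the kind of wavelet-entropy feature the paper builds on: the Shannon entropy of the energy distribution across Haar DWT sub-bands, averaged over fixed-length frames. The paper's exact AFE definition, wavelet family, and framing may differ; frame length, level count, and names here are assumptions.

```python
import numpy as np

def haar_dwt(x):
    """Single-level 1-D Haar transform: approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def frame_entropy(frame, levels=3):
    """Shannon entropy of the energy spread across DWT sub-bands of one frame.

    frame length should be divisible by 2**levels.
    """
    energies = []
    approx = frame
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    p = np.asarray(energies) / np.sum(energies)
    p = p[p > 0]                       # ignore empty sub-bands
    return -np.sum(p * np.log2(p))

def average_framing_entropy(signal, frame_len=64, levels=3):
    """Mean sub-band entropy over non-overlapping frames of the signal."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return np.mean([frame_entropy(f, levels) for f in frames])
```

    The resulting scalar (or a per-frame vector of entropies) can then serve as the input feature for a classifier such as the PNN used in the paper.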