Identification of persons via voice imprint
This work deals with text-dependent speaker recognition in systems where only a limited number of training samples exist. For this purpose, a voice imprint based on different features (e.g. MFCC, PLP, ACW, etc.) is proposed. The work first describes how the speech signal is produced and mentions some speech characteristics important for speaker recognition. The next part deals with speech signal analysis, covering preprocessing as well as feature extraction methods. The following part describes the process of speaker recognition and the evaluation of the methods used: speaker identification and verification. The last theoretical part deals with classifiers suitable for text-dependent recognition: classifiers based on fractional distances, dynamic time warping, dispersion matching, and vector quantization. The work concludes with the design and realization of a system that evaluates all of the described classifiers on voice imprints built from the different features.
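As a minimal sketch of the kind of classifier the abstract names, a dynamic-time-warping matcher over feature sequences can be written in a few lines. The arrays below stand in for MFCC/PLP frame sequences, and all function names are illustrative, not the thesis's own implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping (DTW) distance between two feature
    sequences of shape (frames, dims), with Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def identify(test_features, templates):
    """Text-dependent identification: pick the enrolled speaker whose
    stored voice-imprint template is DTW-closest to the test utterance."""
    return min(templates, key=lambda spk: dtw_distance(test_features, templates[spk]))
```

With only a few enrolment samples per speaker, such template matching avoids training a statistical model, which is exactly the setting the abstract describes.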
Predicting Audio Advertisement Quality
Online audio advertising is a particular form of advertising used abundantly
in online music streaming services. In these platforms, which tend to host tens
of thousands of unique audio advertisements (ads), providing high quality ads
ensures a better user experience and results in longer user engagement.
Therefore, the automatic assessment of these ads is an important step toward
audio ads ranking and better audio ads creation. In this paper we propose one
way to measure the quality of the audio ads using a proxy metric called Long
Click Rate (LCR), which is defined as the amount of time a user engages with
the follow-up display ad (that is shown while the audio ad is playing) divided
by the impressions. We later focus on predicting the audio ad quality using
only acoustic features such as harmony, rhythm, and timbre of the audio,
extracted from the raw waveform. We discuss how the characteristics of the
sound can be connected to concepts such as the clarity of the audio ad message,
its trustworthiness, etc. Finally, we propose a new deep learning model for
audio ad quality prediction, which outperforms the other discussed models
trained on hand-crafted features. To the best of our knowledge, this is the
first large-scale audio ad quality prediction study.

Comment: WSDM '18, Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages
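Taking the abstract's definition literally, the LCR proxy is a single ratio. The sketch below assumes engagement is measured in seconds per impression; the units and aggregation are illustrative assumptions, not details from the paper:

```python
def long_click_rate(engagement_seconds, impressions):
    """LCR as described in the abstract: total time users spent engaged
    with the follow-up display ad (shown while the audio ad plays),
    divided by the number of impressions."""
    return sum(engagement_seconds) / impressions
```

For example, three impressions with 12, 0, and 18 seconds of display-ad engagement give an LCR of 10 seconds per impression.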
Tea Category Identification Using a Novel Fractional Fourier Entropy and Jaya Algorithm
This work proposes a tea-category identification (TCI) system that automatically determines the tea category from images captured by a three charge-coupled-device (3-CCD) digital camera. Three hundred tea images were acquired as the dataset. Apart from the 64 traditional color histogram features, we also introduced a relatively new feature, the fractional Fourier entropy (FRFE), and extracted 25 FRFE features from each tea image. Kernel principal component analysis (KPCA) was then harnessed to reduce the 64 + 25 = 89 features to four, which were fed into a feedforward neural network (FNN) whose optimal weights were obtained by the Jaya algorithm. A 10 × 10-fold stratified cross-validation (SCV) showed that our TCI system obtains an overall average sensitivity of 97.9%, higher than seven existing approaches, while using only four features, no more than the state-of-the-art approaches. The proposed system is therefore efficient at tea-category identification.
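The KPCA reduction step described above can be sketched with plain numpy. The RBF kernel and the gamma value are assumptions for illustration; the paper does not specify them in the abstract:

```python
import numpy as np

def kpca_reduce(X, n_components=4, gamma=0.01):
    """Kernel PCA reducing the 89-dimensional colour-histogram + FRFE
    feature vectors (rows of X) to n_components. Kernel choice and
    gamma are illustrative assumptions."""
    sq = np.sum(X ** 2, axis=1)
    # RBF kernel matrix from pairwise squared distances
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = len(X)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J           # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # keep the top components
    # projections of the training samples onto the leading components
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

The four resulting columns are what would be fed into the FNN whose weights the Jaya algorithm optimises.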
Multibiometric security in wireless communication systems
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010.

This thesis explores an application of multibiometrics to secured wireless communications. The media studied were Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition.
First is the enrolment phase, in which a database of fingerprints watermarked with memorable texts, along with voice features based on the same texts, is created by sending them to the server over the wireless channel.
Next is the verification stage, at which claimed users (those claiming to be genuine) are verified against the database; it consists of five steps. At the identification level, the user first presents a fingerprint and a memorable word, the former watermarked into the latter, so that the system can authenticate the fingerprint and verify its validity by retrieving the challenge for the accepted user.
The following three steps involve speaker recognition: the user responds to the challenge with text-dependent speech, the server authenticates the response, and finally the server accepts or rejects the user.
To implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, a five-step algorithm has been developed. The first three novel steps concern fingerprint image enhancement (CLAHE with 'clip limit', standard-deviation analysis, and sliding-neighborhood processing); they are followed by two further steps for embedding and extracting the watermark in the enhanced fingerprint image using the Discrete Wavelet Transform (DWT).
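The DWT-domain embed/extract pair can be illustrated with a hand-rolled one-level Haar transform and quantisation index modulation on the diagonal sub-band. This is a generic sketch of DWT watermarking under assumed parameters (Haar wavelet, HH band, quantisation step), not the thesis's exact algorithm:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar transform: approximation LL plus
    (LH, HL, HH) detail sub-bands."""
    a = (img[0::2, :] + img[1::2, :]) / 2
    d = (img[0::2, :] - img[1::2, :]) / 2
    LL, LH = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2
    HL, HH = (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, (LH, HL, HH)

def ihaar_dwt2(LL, bands):
    """Exact inverse of haar_dwt2."""
    LH, HL, HH = bands
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :], img[1::2, :] = a + d, a - d
    return img

def embed_bits(img, bits, step=8.0):
    """Quantise HH coefficients onto an even/odd lattice
    (quantisation index modulation) to hide the watermark bits."""
    LL, (LH, HL, HH) = haar_dwt2(img.astype(float))
    flat = HH.flatten()
    for i, b in enumerate(bits):
        flat[i] = step * np.round((flat[i] - b * step / 2) / step) + b * step / 2
    return ihaar_dwt2(LL, (LH, HL, flat.reshape(HH.shape)))

def extract_bits(img, n_bits, step=8.0):
    """Recover bits from the parity of the quantised HH coefficients."""
    _, (_, _, HH) = haar_dwt2(img.astype(float))
    return [int(round(c / (step / 2))) % 2 for c in HH.flatten()[:n_bits]]
```

In the thesis's scheme the bits would encode the memorable word, and the carrier would be the enhanced fingerprint image.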
In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reduced transmission time and reduced dependency of the data on the communication channel, together with no packet loss. Finally, the obtained results have verified these claims.
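The bandwidth argument for sending cepstral coefficients is easy to make concrete. The rates below (8 kHz 16-bit audio, 100 frames/s, 13 coefficients as 32-bit floats) are typical assumed values, not figures from the thesis:

```python
def raw_payload_bytes(seconds, sample_rate=8000, bytes_per_sample=2):
    """Bytes needed to transmit raw 16-bit speech samples."""
    return int(seconds * sample_rate) * bytes_per_sample

def feature_payload_bytes(seconds, frames_per_sec=100, n_ceps=13, bytes_per_coef=4):
    """Bytes needed to transmit per-frame cepstral coefficients instead."""
    return int(seconds * frames_per_sec) * n_ceps * bytes_per_coef
```

Under these assumptions one second of speech shrinks from 16,000 bytes of raw samples to 5,200 bytes of features, and the saving grows with lower frame rates or fewer coefficients.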
Off-line handwritten signature recognition by wavelet entropy and neural network
Handwritten signatures are widely utilized as a form of personal recognition. However, they have the unfortunate shortcoming of being easily abused by those who would forge the identity or intent of an individual, which can be very harmful. The need for an automatic signature recognition system is therefore crucial. In this paper, a signature recognition approach based on a probabilistic neural network (PNN) and wavelet-transform average framing entropy (AFE) is proposed. The system was tested with wavelet packet (WP) entropy, denoted the WP entropy neural network system (WPENN), and with discrete wavelet transform (DWT) entropy, denoted the DWT entropy neural network system (DWENN). Our investigation covered several wavelet families and different entropy types. Both identification and verification tasks were investigated for a comprehensive study of the signature system, and several other methods from the literature were considered for comparison. Two databases were used for algorithm testing. The best recognition rate, 92%, was achieved by WPENN with the threshold entropy.
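The average framing entropy idea, i.e. averaging per-frame Shannon entropies of wavelet coefficients, can be sketched as follows. The one-level Haar decomposition, the frame length, and the normalisation are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

def haar_dwt1(x):
    """One-level 1-D Haar transform: approximation and detail halves."""
    x = np.asarray(x, float)
    return (x[0::2] + x[1::2]) / 2, (x[0::2] - x[1::2]) / 2

def average_framing_entropy(signal, frame_len=64):
    """Average per-frame Shannon entropy of Haar detail coefficients,
    a sketch of an AFE-style feature for signature signals."""
    _, detail = haar_dwt1(signal)
    n_frames = len(detail) // frame_len
    ents = []
    for k in range(n_frames):
        frame = np.abs(detail[k * frame_len:(k + 1) * frame_len])
        total = frame.sum()
        # normalise to a probability distribution; fall back to uniform
        # for an all-zero (perfectly smooth) frame
        p = frame / total if total > 0 else np.full(frame_len, 1.0 / frame_len)
        p = p[p > 0]
        ents.append(-np.sum(p * np.log2(p)))
    return float(np.mean(ents))
```

A scalar like this, computed per wavelet sub-band and family, would then feed the PNN classifier the abstract describes.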
Fractal based speech recognition and synthesis
Transmitting a linguistic message is most often the primary purpose of speech communication, and it is the recognition of this message by machine that would be most useful.
This research consists of two major parts. The first part presents a novel and promising approach for estimating the degree of recognition of speech phonemes, making use of a new set of features based on fractals. The main methods of computing the fractal dimension of speech signals are reviewed, and a new speaker-independent speech recognition system developed at De Montfort University is described in detail. Finally, a least-squares method as well as a novel neural network algorithm are employed to derive the recognition performance on the speech data.
The second part of this work studies the synthesis of spoken words, based mainly on the fractal dimension, to create natural-sounding speech. The work shows that by careful use of the fractal dimension, together with the phase of the speech signal to ensure consistent intonation contours, natural-sounding speech synthesis is achievable at the word level. To extend the flexibility of this framework, we focused on filtering and compressing the phase so as to maintain and produce natural-sounding speech. A 'naturalness level' is achieved as a result of the fractal characteristic used in the synthesis process. Finally, a novel fractal-based speech synthesis system developed at De Montfort University is discussed.
Throughout our research, simulation experiments were performed on continuous speech data from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is designed to provide the speech research community with a standardised corpus for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
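One standard way to compute the fractal dimension of a speech waveform, the central feature in this thesis, is Higuchi's method; the thesis's exact choice of estimator may differ, so treat this as a generic illustration:

```python
import numpy as np

def higuchi_fd(x, kmax=8):
    """Higuchi estimate of the fractal dimension of a 1-D signal:
    average curve length L(k) at stride k scales as k**(-D), so D is
    the slope of log L(k) against log(1/k)."""
    x = np.asarray(x, float)
    N = len(x)
    lks, ks = [], []
    for k in range(1, kmax + 1):
        Lm = []
        for m in range(k):                      # k decimated sub-series
            idx = np.arange(m, N, k)
            if len(idx) < 2:
                continue
            # normalised curve length of the stride-k sub-series
            length = np.sum(np.abs(np.diff(x[idx]))) * (N - 1) / (len(idx) - 1) / k
            Lm.append(length / k)
        lks.append(np.log(np.mean(Lm)))
        ks.append(np.log(1.0 / k))
    slope, _ = np.polyfit(ks, lks, 1)
    return slope
```

A smooth ramp gives a dimension near 1, while rougher, noise-like frames push the estimate toward 2, which is what makes the measure discriminative for phonemes.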