1,274 research outputs found

    Comparison of Full, Blocks, and Row Mean Spectrogram Image Feature Extraction for Speaker Identification

    Abstract: In a recognition system, the choice of feature extraction method and feature size affects identification accuracy. This study compares three CBIR feature extraction methods: row mean image, full image, and blocks image. The three methods are used to identify speakers, with an emphasis on the size of the selected feature vector. Voice data were recorded with a mobile phone from 10 speakers, 5 men and 5 women. Each speaker uttered five sentences (Selamat Pagi, Selamat Siang, Selamat Sore, Selamat Malam, and Dengan Siapa), repeating each sentence eight times. Because a CBIR method is applied, each recorded signal is converted into a spectrogram image using the STFT. The Kekre transform is then applied to the spectrogram, after which its features are extracted. The Kekre transform is used to select the most promising features and to reduce the computational load. With 250 reference spectrogram images and 150 test spectrogram images, the full image feature extraction method achieved the highest identification rate, 93.3%, with a feature size of 32x32. Keywords: Speaker identification, Spectrogram, Kekre transform, Full image, Blocks image, Row mean image
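The abstract above names three feature layouts without defining them. The following is a minimal sketch of one plausible reading, not the authors' method: it assumes "row mean" means one average per image row, "full image" means all pixels flattened, and "blocks" means the mean of each non-overlapping square tile. The function names and the toy 4x4 matrix are hypothetical. The constants at the end only check that the recording counts stated in the abstract are consistent (10 x 5 x 8 = 400 = 250 + 150).

```python
from typing import List

Matrix = List[List[float]]


def row_mean_features(image: Matrix) -> List[float]:
    """Assumed 'row mean' layout: one feature per row, the mean of that row."""
    return [sum(row) / len(row) for row in image]


def full_image_features(image: Matrix) -> List[float]:
    """Assumed 'full image' layout: every value, flattened row by row."""
    return [value for row in image for value in row]


def block_mean_features(image: Matrix, block: int) -> List[float]:
    """Assumed 'blocks' layout: mean of each non-overlapping block x block tile."""
    rows, cols = len(image), len(image[0])
    if rows % block or cols % block:
        raise ValueError("image dimensions must be multiples of the block size")
    features = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = [image[i][j]
                    for i in range(r, r + block)
                    for j in range(c, c + block)]
            features.append(sum(tile) / len(tile))
    return features


# Recording counts taken from the abstract: speakers x sentences x repetitions.
SPEAKERS, SENTENCES, REPETITIONS = 10, 5, 8
TOTAL_RECORDINGS = SPEAKERS * SENTENCES * REPETITIONS
REFERENCE_IMAGES, TEST_IMAGES = 250, 150

# Hypothetical toy "spectrogram" standing in for a real 32x32 image.
toy_image = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
```

Under these assumptions a 32x32 image would yield 32 row-mean features, 1,024 full-image features, or 64 block features with 4x4 tiles, which shows why feature size is a tuning parameter in the comparison.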

    Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression

    This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, which makes it possible to discriminate between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods. Comment: 15 pages, 8 figures

    Comparison of Full, Blocks, and Row Mean Spectrogram Image Feature Extraction for Speaker Identification

    In a speaker identification system, the choice of feature extraction method and feature size affects identification accuracy. This study compares three CBIR feature extraction methods: row mean image, full image, and blocks image. The three methods are used to identify speakers, with an emphasis on the size of the selected feature vector. Voice data were recorded with a mobile phone from 10 speakers, 5 men and 5 women. Each speaker uttered five sentences (Selamat Pagi, Selamat Siang, Selamat Sore, Selamat Malam, and Dengan Siapa), repeating each sentence eight times. The recordings are first converted into spectrogram images using the STFT. The resulting spectrogram is then passed to the Kekre transform, after which its features are extracted. The Kekre transform is used to select the most promising features and to reduce the computational load. With 250 reference spectrogram images and 150 test spectrogram images, the full image feature extraction method achieved the highest identification rate, 93.3%, with a feature size of 32x32. Keywords: Speaker identification, Spectrogram, Kekre transform, Full image, Blocks image, Row mean image
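The abstract does not define the Kekre transform. As a hedged illustration, the sketch below builds the matrix as it is commonly stated in the literature (ones on and above the diagonal, the value -N + (x - 1) just below the diagonal for 1-indexed row x, zeros elsewhere) and applies it to an image as K · image · Kᵀ. Whether the paper applies it in exactly this two-sided form is an assumption, and all function names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def kekre_matrix(n: int) -> Matrix:
    """Build the n x n Kekre transform matrix (rows and columns 0-indexed here)."""
    k = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x <= y:
                k[x][y] = 1.0              # on and above the diagonal
            elif x == y + 1:
                k[x][y] = float(-n + x)    # just below the diagonal
            # entries further below the diagonal stay 0
    return k


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product; adequate for small images."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def kekre_2d(image: Matrix) -> Matrix:
    """Assumed two-sided transform of a square image: K * image * K^T."""
    k = kekre_matrix(len(image))
    return matmul(matmul(k, image), transpose(k))


# The rows of K are mutually orthogonal, so K * K^T is a diagonal matrix.
gram_4 = matmul(kekre_matrix(4), transpose(kekre_matrix(4)))
```

Because the rows are orthogonal but not unit length, coefficients from different rows are on different scales. A pipeline that compares feature vectors by distance may need to account for that.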

    VoxCeleb2: Deep Speaker Recognition

    The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media. Using a fully automated pipeline, we curate VoxCeleb2 which contains over a million utterances from over 6,000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare Convolutional Neural Network (CNN) models and training strategies that can effectively recognise identities from voice under various conditions. The models trained on the VoxCeleb2 dataset surpass the performance of previous works on a benchmark dataset by a significant margin. Comment: To appear in Interspeech 2018. The audio-visual dataset can be downloaded from http://www.robots.ox.ac.uk/~vgg/data/voxceleb2 . 1806.05622v2: minor fixes; 5 pages

    Speaker Recognition in Content-based Image Retrieval for a High Degree of Accuracy

    The purpose of this research is to measure speaker recognition accuracy in Content-Based Image Retrieval. We use two approaches in the recognition system: identification using Mamdani fuzzy inference and verification using Manhattan distance. In the test results, the best mean distance is obtained at size 32x32, and the best verification distance rate is 965. The speaker recognition system has a standard error of 5% and an accuracy of 95%. These results represent an increase in accuracy of almost 2.5%, which is attributed to combining the two approaches.
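The verification step in the abstract above relies on Manhattan distance. A minimal sketch of distance-based verification follows, assuming each speaker is represented by a single reference feature vector and a claim is accepted when the distance falls at or below a threshold. The threshold, the speaker names, and the toy vectors are hypothetical, and the Mamdani fuzzy identification stage is not modelled.

```python
from typing import Dict, List, Tuple


def manhattan_distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def verify_speaker(probe: List[float], reference: List[float],
                   threshold: float) -> bool:
    """Accept the claimed identity when the probe is close enough to the reference."""
    return manhattan_distance(probe, reference) <= threshold


def closest_speaker(probe: List[float],
                    references: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the enrolled speaker with the smallest Manhattan distance to the probe."""
    scored = ((name, manhattan_distance(probe, ref))
              for name, ref in references.items())
    return min(scored, key=lambda pair: pair[1])


# Hypothetical enrolled feature vectors and a probe to verify.
enrolled = {
    "speaker_a": [1.0, 2.0, 3.0],
    "speaker_b": [4.0, 0.0, 1.0],
}
probe_vector = [1.5, 2.0, 2.0]
```

With these toy values the probe is 1.5 away from `speaker_a` and 5.5 away from `speaker_b`, so a threshold of 2.0 would accept a claim to be `speaker_a` and reject a claim to be `speaker_b`.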