Search CORE

Automatic Speech Recognition for Indonesian using Linear Predictive Coding (LPC) and Hidden Markov Model (HMM)

Author: Adhy Satriyo
Akbar Rizky
Endah Sukmawati Nur
Sutikno S.
Publication date: 07/10/2015

Speech recognition is an influential application of signal processing in communication technology. It allows software to recognize spoken words, and automatic speech recognition is one possible solution for doing so. This application was developed using Linear Predictive Coding (LPC) for feature extraction from the speech signal and a Hidden Markov Model (HMM) to model each spoken word. The speech data used for training and testing were produced by 10 speakers (5 men and 5 women); each speaker spoke 10 words, and each word was spoken 10 times. The system was tested using 10-fold cross-validation for each pair of LPC order and number of HMM states. System performance was measured as the average testing accuracy over the men and women speakers. The test results show that the number of HMM states affects the accuracy of the system, and the best accuracy, 94.20%, was obtained with LPC order = 13 and 16 HMM states.

Diponegoro University Institutional Repository
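
The LPC front end described in the first abstract can be sketched as follows. This is a minimal illustration, assuming the common autocorrelation method with a Hamming window and the Levinson-Durbin recursion; the abstract does not say which LPC variant the authors used, and the function names (`hamming`, `autocorrelation`, `levinson_durbin`, `lpc`) are hypothetical. The default order of 13 mirrors the best-performing setting reported above. The HMM stage is not shown.

```python
import math

def hamming(n):
    # Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(frame, max_lag):
    # r[k] = sum_i frame[i] * frame[i + k], for k = 0..max_lag
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    Returns (a, err) with a[0] == 1, so the prediction error is
    e[n] = x[n] + sum_{k=1..order} a[k] * x[n - k];
    err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc(frame, order=13):
    # Window the frame, then run Levinson-Durbin on its autocorrelation.
    w = hamming(len(frame))
    windowed = [x * wi for x, wi in zip(frame, w)]
    a, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return a
```

In a full recognizer, `lpc` would be applied frame by frame (for example, to 240-sample frames) and the resulting coefficient sequences used to train one HMM per word.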

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

Author: Abdullah Hadi
Butler Kevin R. B.
Garcia Washington
Peeters Christian
Traynor Patrick
Wilson Joseph
Publication date: 01/01/2019

Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands: audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, which limits their use across different acoustic hardware platforms (and thus their practicality). In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (black-box) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API, etc.), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.

arXiv.org e-Print Archive

Crossref
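
The principle that different waveforms can map to the same feature vector can be illustrated with the DFT magnitude spectrum. In the sketch below, a circular time reversal changes the samples but, for a real-valued signal, leaves every DFT magnitude unchanged. This illustrates the general idea under that assumption and is not a reproduction of the paper's four perturbation classes; the helper names are hypothetical, and a naive O(N^2) DFT is used to keep the example dependency-free.

```python
import cmath
import math

def dft_magnitude(x):
    # |X[k]| for the naive DFT X[k] = sum_t x[t] * exp(-2j*pi*k*t/N)
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def circular_time_reverse(x):
    # y[t] = x[(-t) mod N]; for real x, Y[k] = conj(X[k]), so |Y[k]| == |X[k]|
    n = len(x)
    return [x[(-t) % n] for t in range(n)]
```

A feature extractor that keeps only magnitudes therefore cannot distinguish `x` from `circular_time_reverse(x)`, even though the two sound different.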

A Review of Chinese Academy of Sciences (CASIA) Gait Database As a Human Gait Recognition Dataset

Author: Andrie Rosa
Arai Kohei
Basuki Achmad
Publication date: 26/10/2011

Human gait has recently become a well-known biometric for recognition systems. Many researchers have focused on this subject as a basis for new recognition systems. One important advantage of gait recognition over other biometrics is that it does not require the observed subject’s attention or cooperation. Many human gait datasets have been created within the last 10 years. Some widely used databases are the University of South Florida (USF) Gait Dataset, the Chinese Academy of Sciences (CASIA) Gait Dataset, and the Southampton University (SOTON) Gait Dataset. This paper analyzes the CASIA Gait Dataset to examine its characteristics. There are two pre-processing approaches: model-based and model-free. We use the 2D Discrete Wavelet Transform (DWT), selecting Haar wavelets to reduce and extract the features.

EEPIS Repository
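
A one-level 2D Haar DWT, of the kind the CASIA abstract mentions for reducing and extracting features, might look like the sketch below. It assumes a grayscale image given as a list of equal-length rows with even dimensions and uses the orthonormal Haar filters; the abstract does not specify normalization, decomposition depth, or which sub-bands are kept, so those choices (and the function names) are assumptions.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)

def haar_1d(v):
    # One level of the orthonormal Haar transform on an even-length sequence.
    if len(v) % 2:
        raise ValueError("length must be even")
    approx = [(v[i] + v[i + 1]) * SQRT_HALF for i in range(0, len(v), 2)]
    detail = [(v[i] - v[i + 1]) * SQRT_HALF for i in range(0, len(v), 2)]
    return approx, detail

def _columns(mat):
    # Apply haar_1d down each column; return (low, high) as row-major matrices.
    lows, highs = [], []
    for col in zip(*mat):
        a, d = haar_1d(list(col))
        lows.append(a)
        highs.append(d)
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]

def haar_2d(img):
    """One-level 2D Haar DWT returning (LL, LH, HL, HH).

    The first letter is the row (horizontal) filter and the second the
    column filter; sub-band naming conventions differ between texts.
    """
    row_lo, row_hi = [], []
    for row in img:
        a, d = haar_1d(row)
        row_lo.append(a)
        row_hi.append(d)
    ll, lh = _columns(row_lo)
    hl, hh = _columns(row_hi)
    return ll, lh, hl, hh
```

The half-resolution LL band is the usual candidate for a reduced feature; applying `haar_2d` again to LL gives a further level.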

Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010

Author: Gómez Vilda Pedro
Martínez Olalla Rafael
Mazaira Fernández Luis Miguel
Muñoz Cristina
Álvarez Marquina Agustin
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2010

This paper presents a novel feature set for speaker recognition based on glottal estimate information. An iterative algorithm is used to derive the vocal tract and glottal source estimates from the speech signal. To test the importance of glottal source information in speaker characterization, the novel feature set was tested in the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal estimate parameter templates and classical cepstral information to build a model for each speaker involved in the recognition process. The open-source ALIZE [1] software was used to create the GMM models for both background and target speakers. Compared with using mel-frequency cepstral coefficients (MFCC), the misclassification rate on NIST SRE 2010 was reduced from 29.43% to 27.15% when glottal source features were used.

Archivo Digital UPM
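
GMM-based speaker systems such as the one described above typically score a test utterance by the average log-likelihood ratio between the target-speaker model and the background model. The sketch below computes that score for diagonal-covariance GMMs. It is a generic GMM-UBM scoring example under that assumption, not ALIZE's API and not the paper's glottal feature extraction; each model is a hypothetical `(weights, means, variances)` tuple, and frames are plain feature vectors (cepstral or glottal).

```python
import math

def log_gauss_diag(x, mean, var):
    # Log density of a Gaussian with diagonal covariance.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    # log sum_c w_c * N(x; mu_c, diag(var_c)), via log-sum-exp for stability.
    comps = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))

def llr_score(frames, target, ubm):
    # Average per-frame log-likelihood ratio: target model vs. background model.
    return sum(gmm_loglik(f, *target) - gmm_loglik(f, *ubm) for f in frames) / len(frames)
```

A positive score favours the target speaker; in practice the accept/reject threshold is tuned on development data.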

An agent-driven semantical identifier using radial basis neural networks and reinforcement learning

Author: Napoli Christian
Pappalardo Giuseppe
Tramontana Emiliano
Publication date: 01/01/2014

Due to the vast availability of documents in digital form, and the increased possibility of deception that comes with the nature of digital documents and the way they are spread, the authorship attribution problem has steadily grown in relevance. Nowadays, authorship attribution, for both information retrieval and analysis, has gained great importance in the context of security, trust and copyright preservation. This work proposes an innovative multi-agent-driven machine learning technique developed for authorship attribution. By preprocessing for word grouping and a time-period-related analysis of the common lexicon, we determine a bias reference level for the recurrence frequency of words within the analysed texts, and then train a Radial Basis Neural Network (RBPNN)-based classifier to identify the correct author. The main advantage of the proposed approach lies in the generality of the semantic analysis, which can be applied to different contexts and lexical domains without requiring any modification. Moreover, the proposed system can incorporate an external input meant to tune the classifier, and then self-adjust by means of continuous reinforcement learning. Comment: Published in: Proceedings of the XV Workshop "Dagli Oggetti agli Agenti" (WOA 2014), Catania, Italy, September 25-26, 201

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza
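
The authorship pipeline above (word-frequency features followed by a radial-basis classifier) can be sketched in simplified form. The example below uses plain relative word frequencies over a fixed vocabulary and a Parzen-style probabilistic radial-basis classifier that averages Gaussian kernel responses per author. It omits the paper's bias reference level, multi-agent architecture, and reinforcement step; the vocabulary, `sigma`, and the function names are illustrative assumptions.

```python
import math
from collections import Counter

def word_frequencies(text, vocabulary):
    # Relative frequency of each vocabulary word in a lowercased, whitespace-split text.
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in vocabulary]

def pnn_classify(x, training, sigma=0.1):
    """Return the label whose training vectors give the highest mean Gaussian kernel response.

    training maps each author label to a list of feature vectors.
    """
    best_label, best_score = None, -1.0
    for label, samples in training.items():
        score = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2 * sigma ** 2))
            for s in samples
        ) / len(samples)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Usage: build `training = {"A": [word_frequencies(text_a, vocab)], "B": [word_frequencies(text_b, vocab)]}` and call `pnn_classify(word_frequencies(query, vocab), training)`. `sigma` controls how sharply the kernel falls off and would need tuning on real data.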

Writer Identification Using Inexpensive Signal Processing Techniques

Author: P. Mahalanobis
R.W. Hamming
S. Mokhov
S.A. Mokhov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/12/2009

We propose using novel and classical audio and text signal-processing and other techniques for "inexpensive", fast writer identification of scanned handwritten documents "visually". The term "inexpensive" refers to the efficiency of the identification process in terms of CPU cycles while preserving decent accuracy for preliminary identification. This is a comparative study of multiple algorithm combinations in a pattern-recognition pipeline implemented in Java around the open-source Modular Audio Recognition Framework (MARF), which can do much more than audio. We present our preliminary experimental findings for this identification task. We simulate "visual" identification by "looking" at the handwritten document as a whole rather than trying to extract fine-grained features from it prior to classification. Comment: 9 pages; 1 figure; presented at CISSE'09 at http://conference.cisse2009.org/proceedings.aspx ; includes the application source code; based on MARF described in arXiv:0905.123

arXiv.org e-Print Archive

Crossref
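
The "visual", whole-document idea in the writer-identification abstract can be approximated with a very cheap pipeline: flatten the scanned page into a 1-D signal, reduce it to a few averaged bins, and pick the nearest reference by a simple distance. The sketch below is a hypothetical stand-in written in Python, not MARF's Java API; the binning scheme, the Chebyshev distance, and all function names are assumptions chosen for low CPU cost.

```python
def image_to_signal(img):
    # Row-major flattening: treat the scanned page as a 1-D "audio-like" signal.
    return [float(p) for row in img for p in row]

def coarse_features(signal, bins=8):
    # Cheap feature vector: mean amplitude in each of `bins` near-equal chunks.
    n = len(signal)
    if n < bins:
        raise ValueError("signal shorter than number of bins")
    feats = []
    for b in range(bins):
        start, end = b * n // bins, (b + 1) * n // bins
        chunk = signal[start:end]
        feats.append(sum(chunk) / len(chunk))
    return feats

def chebyshev(u, v):
    # Largest per-coordinate difference between two feature vectors.
    return max(abs(a - b) for a, b in zip(u, v))

def identify(query_img, references, bins=8):
    # Return the name of the reference image whose features are nearest to the query.
    q = coarse_features(image_to_signal(query_img), bins)
    return min(
        references,
        key=lambda name: chebyshev(q, coarse_features(image_to_signal(references[name]), bins)),
    )
```

Each step is linear in the number of pixels, which matches the abstract's emphasis on CPU cost over fine-grained accuracy; a real system would compare against several samples per writer.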