13,319 research outputs found
Writer Identification Using Inexpensive Signal Processing Techniques
We propose to use novel and classical audio and text signal-processing and
otherwise techniques for "inexpensive" fast writer identification tasks of
scanned hand-written documents "visually". The "inexpensive" refers to the
efficiency of the identification process in terms of CPU cycles while
preserving decent accuracy for preliminary identification. This is a
comparative study of multiple algorithm combinations in a pattern recognition
pipeline implemented in Java around an open-source Modular Audio Recognition
Framework (MARF) that can do a lot more beyond audio. We present our
preliminary experimental findings in such an identification task. We simulate
"visual" identification by "looking" at the hand-written document as a whole
rather than trying to extract fine-grained features out of it prior
classification.Comment: 9 pages; 1 figure; presented at CISSE'09 at
http://conference.cisse2009.org/proceedings.aspx ; includes the the
application source code; based on MARF described in arXiv:0905.123
Alcohol Language Corpus
The Alcohol Language Corpus (ALC) is the first publicly available speech corpus comprising intoxicated and sober speech of 162 female and male German speakers.
Recordings are done in the automotive environment to allow for the development of automatic alcohol detection and to ensure a consistent acoustic environment for the alcoholized and the sober recording. The recorded speech covers a variety of contents and speech styles. Breath and blood alcohol concentration measurements are provided for all speakers. A transcription according to SpeechDat/Verbmobil standards and disfluency tagging as well as an automatic phonetic segmentation are part of the corpus. An Emu version of ALC allows easy access to basic speech parameters as well as the us of R for statistical analysis of selected parts of ALC. ALC is available without restriction for scientific or commercial use at the Bavarian Archive for Speech Signals
Speaker recognition by means of restricted Boltzmann machine adaptation
Restricted Boltzmann Machines (RBMs) have shown success in speaker recognition. In this paper, RBMs are investigated in a framework comprising a universal model training and model adaptation. Taking advantage of RBM unsupervised learning algorithm, a global model is trained based on all available background data. This general speaker-independent model, referred to as URBM, is further adapted to the data of a specific speaker to build speaker-dependent model. In order to show its effectiveness, we have applied this framework to two different tasks. It has been used to discriminatively model target and impostor spectral features for classification. It has been also utilized to produce a vector-based representation for speakers. This vector-based representation, similar to i-vector, can be further used for speaker recognition using either cosine scoring or Probabilistic Linear Discriminant Analysis (PLDA). The evaluation is performed on the core test condition of the NIST SRE 2006 database.Peer ReviewedPostprint (author's final draft
Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters
Data privacy is crucial when dealing with biometric data. Accounting for the
latest European data privacy regulation and payment service directive,
biometric template protection is essential for any commercial application.
Ensuring unlinkability across biometric service operators, irreversibility of
leaked encrypted templates, and renewability of e.g., voice models following
the i-vector paradigm, biometric voice-based systems are prepared for the
latest EU data privacy legislation. Employing Paillier cryptosystems, Euclidean
and cosine comparators are known to ensure data privacy demands, without loss
of discrimination nor calibration performance. Bridging gaps from template
protection to speaker recognition, two architectures are proposed for the
two-covariance comparator, serving as a generative model in this study. The
first architecture preserves privacy of biometric data capture subjects. In the
second architecture, model parameters of the comparator are encrypted as well,
such that biometric service providers can supply the same comparison modules
employing different key pairs to multiple biometric service operators. An
experimental proof-of-concept and complexity analysis is carried out on the
data from the 2013-2014 NIST i-vector machine learning challenge
Cepstral trajectories in linguistic units for text-independent speaker recognition
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-35292-8_3Proceedings of IberSPEECH, held in Madrid (Spain) on 2012.In this paper, the contributions of different linguistic units to the speaker recognition task are explored by means of temporal trajectories of their MFCC features. Inspired by successful work in forensic speaker identification, we extend the approach based on temporal contours of formant frequencies in linguistic units to design a fully automatic system that puts together both forensic and automatic speaker recognition worlds. The combination of MFCC features and unit-dependent trajectories provides a powerful tool to extract individualizing information. At a fine-grained level, we provide a calibrated likelihood ratio per linguistic unit under analysis (extremely useful in applications such as forensics), and at a coarse-grained level, we combine the individual contributions of the different units to obtain a highly discriminative single system. This approach has been tested with NIST SRE 2006 datasets and protocols, consisting of 9,720 trials from 219 male speakers for the 1side-1side English-only task, and development data being extracted from 367 male speakers from 1,808 conversations from NIST SRE 2004 and 2005 datasetsSupported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica
- …