Bit rates in audio source coding
The goal is to introduce and solve the audio coding optimization problem. Psychoacoustic results, such as masking and excitation pattern models, are combined with results from rate distortion theory to formulate the audio coding optimization problem. Its solution is a masked error spectrum, prescribing how quantization noise must be distributed over the audio spectrum to obtain a minimal bit rate and inaudible coding errors. This result can not only be used to estimate performance bounds, but can also be applied directly in audio coding systems. Subband coding applications to magnetic recording and transmission are discussed in some detail, and performance bounds for this type of subband coding system are derived.
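As an illustration of the kind of rate estimate involved, the sketch below allocates bits per subband so that the quantization noise stays just below a given masking threshold, using the standard rule that each extra bit buys about 6 dB of signal-to-noise ratio. It is a minimal sketch with made-up band powers and thresholds, not the paper's derivation.

```python
import numpy as np

def bits_per_band(signal_power, masking_threshold):
    """Rate needed so quantization noise in each band stays below the mask.

    signal_power, masking_threshold: per-band power values (linear scale).
    Returns the (real-valued) bits per sample per band; 0 where the signal
    is already below the mask and needs no bits at all.
    """
    snr_to_mask = signal_power / masking_threshold
    return np.maximum(0.0, 0.5 * np.log2(snr_to_mask))

# Toy example: four subbands, one of which is fully masked.
sig = np.array([1.0, 0.5, 0.01, 2.0])
mask = np.array([0.05, 0.05, 0.05, 0.1])
print(bits_per_band(sig, mask))          # per-band bit demand
print(bits_per_band(sig, mask).mean())   # rough estimate of the minimal average rate
```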
Reducing Audible Spectral Discontinuities
In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is the most likely cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels, on the basis of the best performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities.
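As a hedged illustration only: the snippet below computes the simplest possible spectral mismatch measure at a diphone joint, a Euclidean distance between the feature vectors of the two frames that meet at the boundary. The paper compares several distance measures against listening results; this sketch does not reproduce any particular one, and the feature vectors (e.g. MFCCs) are assumed to be given.

```python
import numpy as np

def boundary_distance(left_frame, right_frame):
    """Euclidean spectral distance between the last analysis frame of the
    left diphone and the first frame of the right diphone; larger values
    suggest a more audible discontinuity at the joint."""
    left_frame = np.asarray(left_frame, dtype=float)
    right_frame = np.asarray(right_frame, dtype=float)
    return float(np.linalg.norm(left_frame - right_frame))

# Toy example with 12-dimensional spectral feature vectors.
rng = np.random.default_rng(0)
a, b = rng.normal(size=12), rng.normal(size=12)
print(boundary_distance(a, b))
```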
Time-scale and pitch modifications of speech signals and resynthesis from the discrete short-time Fourier transform
The modification methods described in this paper combine characteristics of PSOLA-based methods and algorithms that resynthesize speech from its short-time Fourier magnitude only. The starting point is a short-time Fourier representation of the signal. In the case of duration modification, portions of this representation, corresponding in voiced speech to pitch periods, are removed or inserted. In the case of pitch modification, pitch periods are shortened or extended in this representation, and a number of pitch periods is inserted or removed, respectively. Since it is an important tool for both duration and pitch modification, the resynthesis-from-short-time-Fourier-magnitude-only method of Griffin and Lim (1984) and Griffin et al. (1984) is reviewed and adapted. Duration and pitch modification methods and their results are presented.
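For reference, a minimal textbook sketch of the Griffin and Lim magnitude-only resynthesis that the paper reviews and adapts: iterate between inverse and forward STFTs, keeping the target magnitude and only the re-estimated phase. Window length, hop size and iteration count below are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, nperseg=512, hop=128, iters=50, seed=0):
    """Textbook Griffin-Lim: iteratively estimate a signal whose STFT
    magnitude matches `magnitude`, starting from random phase."""
    noverlap = nperseg - hop
    rng = np.random.default_rng(seed)
    spec = magnitude * np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(iters):
        _, x = istft(spec, nperseg=nperseg, noverlap=noverlap)
        _, _, reanalysis = stft(x, nperseg=nperseg, noverlap=noverlap)
        # Keep the target magnitude, take only the phase of the re-analysis
        # (align frame counts defensively in case of an edge-frame mismatch).
        n = min(reanalysis.shape[-1], magnitude.shape[-1])
        spec = magnitude[:, :n] * np.exp(1j * np.angle(reanalysis[:, :n]))
    return istft(spec, nperseg=nperseg, noverlap=noverlap)[1]

# Toy usage: analyze a 440 Hz tone, then resynthesize it from its magnitude only.
fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
_, _, Z = stft(np.sin(2 * np.pi * 440 * t), nperseg=512, noverlap=512 - 128)
y = griffin_lim(np.abs(Z))
```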
Biometric Authentication System on Mobile Personal Devices
We propose a secure, robust, and low-cost biometric authentication system on the mobile personal device for the personal network. The system consists of the following five key modules: 1) face detection; 2) face registration; 3) illumination normalization; 4) face verification; and 5) information fusion. For the complicated face authentication task on devices with limited resources, the emphasis is largely on the reliability and applicability of the system. Both theoretical and practical considerations are taken into account. The final system is able to achieve an equal error rate of 2% under challenging testing protocols. The low hardware and software cost makes the system well suited to a large range of security applications.
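The 2% figure refers to the equal error rate, the operating point at which the false accept and false reject rates coincide. The sketch below shows how such a figure is computed from genuine and impostor scores; the synthetic scores and the simple threshold sweep are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Find the operating point where false accept rate ~= false reject rate.

    genuine, impostor: similarity scores for matching / non-matching pairs.
    Returns (eer, threshold). A rough sweep, fine for a sanity check.
    """
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer, best_thr = np.inf, 1.0, thresholds[0]
    for thr in thresholds:
        far = np.mean(impostor >= thr)   # impostors wrongly accepted
        frr = np.mean(genuine < thr)     # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer, best_thr = abs(far - frr), (far + frr) / 2, thr
    return eer, best_thr

# Toy usage with synthetic score distributions.
rng = np.random.default_rng(1)
gen = rng.normal(0.7, 0.1, 1000)
imp = rng.normal(0.4, 0.1, 1000)
print(equal_error_rate(gen, imp))
```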
Binary Biometric Representation through Pairwise Adaptive Phase Quantization
Extracting binary strings from real-valued biometric templates is a fundamental step in template compression and protection systems, such as fuzzy commitment, fuzzy extractor, secure sketch, and helper data systems. Quantization and coding are the straightforward way to extract binary representations from arbitrary real-valued biometric modalities. In this paper, we propose a pairwise adaptive phase quantization (APQ) method, together with a long-short (LS) pairing strategy, which aims to maximize the overall detection rate. Experimental results on the FVC2000 fingerprint and the FRGC face database show reasonably good verification performance.
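A minimal sketch of the basic phase-quantization step, assuming pairs of real-valued features are already formed: each pair (x, y) is read as the complex number x + iy and its phase angle is quantized into equal sectors, yielding a few bits per pair. The adaptive sector allocation and the long-short pairing strategy of the paper are not reproduced here.

```python
import numpy as np

def phase_quantize_pairs(features, bits_per_pair=2):
    """Pair up real-valued features, treat each pair (x, y) as the complex
    number x + iy, and quantize its phase angle into 2**bits_per_pair equal
    sectors. Returns the concatenated binary string as an array of bits."""
    features = np.asarray(features, dtype=float)
    if features.size % 2:
        features = features[:-1]               # drop an unpaired leftover
    x, y = features[0::2], features[1::2]
    phase = np.mod(np.arctan2(y, x), 2 * np.pi)
    levels = 2 ** bits_per_pair
    sector = np.floor(phase / (2 * np.pi / levels)).astype(int)
    # Plain binary expansion of each sector index, MSB first.
    bits = (sector[:, None] >> np.arange(bits_per_pair - 1, -1, -1)) & 1
    return bits.ravel()

# Toy usage on a small synthetic template.
template = np.random.default_rng(2).normal(size=8)
print(phase_quantize_pairs(template, bits_per_pair=2))
```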
Extraction of vocal-tract system characteristics from speech signals
We propose methods to track natural variations in the characteristics of the vocal-tract system from speech signals. We are especially interested in the cases where these characteristics vary over time, as happens in dynamic sounds such as consonant-vowel transitions. We show that the selection of appropriate analysis segments is crucial in these methods, and we propose a selection based on estimated instants of significant excitation. These instants are obtained by a method based on the average group-delay property of minimum-phase signals. In voiced speech, they correspond to the instants of glottal closure. The vocal-tract system is characterized by its formant parameters, which are extracted from the analysis segments. Because the segments are always at the same relative position in each pitch period, in voiced speech the extracted formants are consistent across successive pitch periods. We demonstrate the results of the analysis for several difficult cases of speech signals
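A rough sketch of the average group-delay idea, under the assumption that a frame-wise phase slope is what is needed: the group delay is computed with the standard identity tau(w) = Re{DFT(n*x[n]) / DFT(x[n])} and averaged over frequency; sign changes of the resulting curve are the usual cue for instants of significant excitation. Frame and hop sizes below are arbitrary.

```python
import numpy as np

def average_group_delay(frame):
    """Average group delay of one analysis frame, computed with the
    standard identity tau(w) = Re{DFT(n*x) / DFT(x)} and averaged over
    frequency bins with non-negligible energy."""
    x = np.asarray(frame, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x)
    Y = np.fft.rfft(n * x)
    power = np.abs(X) ** 2
    keep = power > 1e-8 * power.max()          # avoid dividing by ~0 bins
    return float(np.mean(np.real(Y[keep] / X[keep])))

def phase_slope_function(signal, frame_len=200, hop=10):
    """Slide a window over the signal; sign changes of this curve are the
    usual cue for instants of significant excitation (glottal closures)."""
    return np.array([average_group_delay(signal[s:s + frame_len])
                     for s in range(0, len(signal) - frame_len, hop)])

# Toy usage on an impulse train passed through a decaying resonance.
fs = 8000
exc = np.zeros(fs // 4)
exc[::80] = 1.0                                 # 100 Hz "glottal" pulses
sig = np.convolve(exc, np.exp(-np.arange(100) / 20.0))[:len(exc)]
slope = phase_slope_function(sig)
```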
The effect of position sources on estimated eigenvalues in intensity modeled data
In biometrics, models are often used in which the data distributions are approximated with normal distributions. In particular, the eigenface method models facial data as a mixture of fixed-position intensity signals with a normal distribution. The model parameters, a mean value and a covariance matrix, need to be estimated from a training set. Scree plots showing the eigenvalues of the estimated covariance matrices have two very typical characteristics when facial data is used: firstly, most of the curve can be approximated by a straight line on a double logarithmic plot; secondly, if the number of samples used for the estimation is smaller than the dimensionality of these samples, using more samples for the estimation results in more intensity sources being estimated, and a larger part of the scree plot curve is accurately modeled by a straight line.
One explanation for this behaviour is that the fixed-position intensity model is an inaccurate model of facial data. This is further supported by previous experiments in which synthetic data with the same second-order statistics as facial data gives a much higher performance of biometric systems. We hypothesize that some of the sources in face data are better modeled as position sources, and that the fixed-position intensity sources model should therefore be extended with position sources. Examples of features in the face which might change position, either between images of different people or between images of the same person, are the eyes, the pupils within the eyes, and the corners of the mouth.
We show experimentally that when data containing a limited number of position sources is used in a system based on the fixed-position intensity sources model, the resulting scree plots have characteristics similar to those of the scree plots of facial data. This supports our claim that facial data contains at least some sources that are inaccurately modeled by the fixed-position intensity sources model, and that position sources might provide a better model for these sources.
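To make the scree-plot observation concrete, the sketch below estimates a covariance matrix from fewer samples than dimensions, extracts its eigenvalues, and notes that a straight line on a double-logarithmic scree plot corresponds to a power-law decay of the eigenvalues. The synthetic data is illustrative and carries none of the structure of facial images.

```python
import numpy as np

def scree_eigenvalues(samples):
    """Eigenvalues of the sample covariance, sorted in decreasing order,
    as one would plot in a scree plot (log-log in the text above)."""
    X = np.asarray(samples, dtype=float)
    X = X - X.mean(axis=0)
    cov = X.T @ X / (X.shape[0] - 1)
    vals = np.linalg.eigvalsh(cov)[::-1]
    return vals[vals > 1e-12 * vals.max()]      # drop numerically-zero values

# Toy example: fewer samples (100) than dimensions (256); only about
# min(N - 1, D) eigenvalues come out non-zero, as described in the text.
rng = np.random.default_rng(3)
data = rng.normal(size=(100, 256)) * np.linspace(1.0, 0.05, 256)
vals = scree_eigenvalues(data)
print(len(vals), vals[:5])
# A straight line on a double-logarithmic plot corresponds to a power law:
# log(eigenvalue_k) ~ a - b * log(k).
```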
Predicting Face Recognition Performance Using Image Quality
This paper proposes a data-driven model to predict the performance of a face recognition system based on image quality features. We model the relationship between image quality features (e.g. pose, illumination, etc.) and recognition performance measures using a probability density function. To address the limited nature of practical training data inherent in most data-driven models, we have developed a Bayesian approach to model the distribution of recognition performance measures in small regions of the quality space. Since the model is based solely on image quality features, it can predict performance even before the actual recognition has taken place. We evaluate the performance predictive capabilities of the proposed model for six face recognition systems (two commercial and four open source) operating on three independent data sets: MultiPIE, FRGC and CAS-PEAL. Our results show that the proposed model can accurately predict performance using an accurate and unbiased Image Quality Assessor (IQA). Furthermore, our experiments highlight the impact of the unaccounted quality space, i.e. the image quality features not considered by the IQA, in contributing to performance prediction errors.
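A minimal sketch of one way to realize the Bayesian idea described above, assuming a single scalar quality feature and per-image error labels: partition the quality axis into bins and report the Beta-posterior mean of the error rate in each bin, so that sparsely populated regions of the quality space still give sensible estimates. The binning, the prior and the synthetic data are assumptions, not the paper's model.

```python
import numpy as np

def predict_error_rate(quality, errors, bins=5, prior=(1.0, 1.0)):
    """Partition a 1-D quality feature into bins and give, per bin, a
    Beta-posterior estimate of the recognition error rate. The Beta prior
    keeps the estimate sensible in sparsely populated regions of the
    quality space (the limited-training-data problem mentioned above).

    quality : per-image quality values (e.g. a pose or illumination score)
    errors  : 1 if the recognition attempt for that image failed, else 0
    """
    quality, errors = np.asarray(quality, float), np.asarray(errors, int)
    edges = np.quantile(quality, np.linspace(0, 1, bins + 1))
    a0, b0 = prior
    estimates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (quality >= lo) & (quality <= hi)
        k, n = errors[in_bin].sum(), in_bin.sum()
        estimates.append((a0 + k) / (a0 + b0 + n))   # posterior mean
    return edges, np.array(estimates)

# Toy usage: error probability rises as the quality value drops.
rng = np.random.default_rng(4)
q = rng.uniform(0, 1, 500)
err = rng.random(500) < (0.3 * (1 - q))
print(predict_error_rate(q, err))
```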
Forensic Face Recognition: A Survey
Apart from a few papers which focus on the forensic aspects of automatic face recognition, not much has been published about it, in contrast to the literature on developing new techniques and methodologies for biometric face recognition. In this report, we review forensic facial identification, which is the forensic experts' way of manual facial comparison. Then we review notable works in the domain of forensic face recognition. Some of these papers describe general trends in forensics [1] and guidelines for manual forensic facial comparison and the training of face examiners who will be required to verify the outcome of automatic forensic face recognition systems [2]. Others propose a theoretical framework for the application of face recognition technology in forensics [3] and for automatic forensic facial comparison [4, 5]. The Bayesian framework is discussed in detail, and it is explained how it can be adapted to forensic face recognition. Several issues related to court admissibility and system reliability are also discussed.
Until now, no operational system has been available which automatically compares an image of a suspect with a mugshot database and provides a result usable in court. Biometric face recognition can in most cases be used for forensic purposes, but the issues related to integrating the technology with the legal system of the court still remain to be solved. There is a great need for multi-disciplinary research that integrates face recognition technology with existing legal systems. In this report, we present a review of the existing literature in this domain and discuss various aspects of and requirements for forensic face recognition systems, focusing in particular on the Bayesian framework.
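The Bayesian framework mentioned above is usually operationalized as a likelihood ratio attached to a comparison score. The sketch below shows a generic score-based version, with both conditional densities estimated by kernel density estimation from synthetic calibration scores; it is an illustration of the concept, not the procedure discussed in the report.

```python
import numpy as np
from scipy.stats import gaussian_kde

def likelihood_ratio(score, same_source_scores, different_source_scores):
    """Evidential value of one face-comparison score under the Bayesian
    framework: LR = p(score | same source) / p(score | different source),
    with both densities estimated here by simple kernel density estimates."""
    p_same = gaussian_kde(same_source_scores)(score)[0]
    p_diff = gaussian_kde(different_source_scores)(score)[0]
    return p_same / p_diff

# Toy usage with synthetic comparison scores.
rng = np.random.default_rng(5)
same = rng.normal(0.8, 0.1, 500)   # scores from known same-person pairs
diff = rng.normal(0.3, 0.1, 500)   # scores from known different-person pairs
print(likelihood_ratio(0.75, same, diff))   # LR > 1 supports the same-source hypothesis
```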
Non-frontal model based approach to forensic face recognition
In this paper, we propose a non-frontal, model-based approach which ensures that a face recognition system always compares images having a similar view (or pose). This requires a virtual suspect reference set that consists of non-frontal suspect images having a pose similar to the surveillance-view trace image. We apply 3D model reconstruction followed by image synthesis to the frontal-view mug shot images in the suspect reference set in order to create such a virtual suspect reference set. This strategy not only ensures a stable 3D face model reconstruction, because of the relatively good quality of the mug shot suspect images, but also provides a practical solution for forensic cases where the trace is often of very low quality. For most face recognition algorithms, the relative pose difference between the test and reference image is one of the major causes of severe degradation in recognition performance. Moreover, given appropriate training, comparing a pair of non-frontal images is no more difficult than comparing frontal-view images.
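As a small illustration of the pose-matching idea, assuming pose angles for the trace and for the synthesized reference views are available: pick the virtual reference whose pose is closest to that of the trace before handing the pair to the matcher. The angles and the view grid below are invented for the example.

```python
import numpy as np

def closest_pose_reference(trace_pose, reference_poses):
    """Pick, from a virtual reference set of synthesized views, the image
    whose (yaw, pitch) pose is closest to the pose of the trace image, so
    that the matcher always compares like with like."""
    trace_pose = np.asarray(trace_pose, float)
    reference_poses = np.asarray(reference_poses, float)
    return int(np.argmin(np.linalg.norm(reference_poses - trace_pose, axis=1)))

# Toy usage: synthesized views every 15 degrees of yaw, frontal pitch.
views = np.array([[yaw, 0.0] for yaw in range(-45, 46, 15)])
print(closest_pose_reference([-20.0, 5.0], views))   # index of the -15 degree view
```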