
    On the Distribution of Speaker Verification Scores: Generative Models for Unsupervised Calibration

    Speaker verification systems whose outputs can be interpreted as log-likelihood ratios (LLRs) allow for cost-effective decisions by comparing the system outputs to application-defined thresholds that depend only on prior information. Classifiers often produce uncalibrated scores and require additional processing to produce well-calibrated LLRs. Recently, generative score calibration models have been proposed that achieve calibration performance close to that of state-of-the-art discriminative techniques in supervised scenarios, while also allowing for unsupervised training. The effectiveness of these methods, however, strongly depends on their ability to correctly model the target and non-target score distributions. In this work we propose theoretically grounded and accurate models for characterizing the distribution of scores of speaker verification systems. Our approach is based on tied Generalized Hyperbolic distributions and overcomes many limitations of Gaussian models. Experimental results on different NIST benchmarks, using different utterance representation front-ends and different back-end classifiers, show that our method is effective not only in supervised scenarios, but also in unsupervised tasks characterized by a very low proportion of target trials.
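    As a rough sketch of the generative calibration idea (not the paper's actual model), the snippet below fits a class-conditional model to target and non-target scores and maps any raw score to an LLR; plain Gaussians stand in for the tied Generalized Hyperbolic distributions proposed here, and all names are illustrative.

```python
# Generative score calibration, minimal sketch: model each class of raw
# scores, then calibrate via log p(s | target) - log p(s | non-target).
# Gaussians are a simplifying stand-in for the paper's tied Generalized
# Hyperbolic distributions.
import numpy as np
from scipy.stats import norm

def fit_calibrator(target_scores, nontarget_scores):
    """Fit one class-conditional score model per trial class."""
    tar = norm(np.mean(target_scores), np.std(target_scores))
    non = norm(np.mean(nontarget_scores), np.std(nontarget_scores))
    return tar, non

def calibrate(scores, tar, non):
    """Calibrated LLR for each raw score."""
    return tar.logpdf(scores) - non.logpdf(scores)

# Bayes decisions then depend only on prior information: accept a trial
# when its LLR exceeds log((1 - p_target) / p_target), assuming equal
# error costs.
```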

    A Generative Model for Duration-Dependent Score Calibration


    Robust Speaker Recognition Based on Latent Variable Models

    Automatic speaker recognition in uncontrolled environments is a very challenging task due to channel distortions, additive noise and reverberation. To address these issues, this thesis studies probabilistic latent variable models of short-term spectral information that leverage large amounts of data to achieve robustness in challenging conditions. Current speaker recognition systems represent an entire speech utterance as a single point in a high-dimensional space. This representation is known as a "supervector". This thesis starts by analyzing the properties of this representation. A novel visualization procedure for supervectors is presented that provides qualitative insight into the information being captured. We then propose the use of an overcomplete dictionary to explicitly decompose a supervector into a speaker-specific component and an undesired variability component. An algorithm to learn the dictionary from a large collection of data is discussed and analyzed. A subset of the entries of the dictionary is learned to represent speaker-specific information and another subset to represent distortions. After encoding the supervector as a linear combination of the dictionary entries, the undesired variability is removed by discarding the contribution of the distortion components. This paradigm is closely related to the previously proposed paradigm of Joint Factor Analysis modeling of supervectors. We establish a connection between the two approaches and show how our proposed method provides improvements in terms of computation and recognition accuracy. An alternative way to handle undesired variability in supervector representations is to first project them into a lower-dimensional space and then to model them in the reduced subspace. This low-dimensional projection is known as an "i-vector". Unfortunately, i-vectors exhibit non-Gaussian behavior, and direct statistical modeling requires the use of heavy-tailed distributions for optimal performance. These approaches lack closed-form solutions and are therefore hard to analyze. Moreover, they do not scale well to large datasets. Instead of directly modeling i-vectors, we propose to first apply a non-linear transformation and then use a linear-Gaussian model. We present two alternative transformations and show experimentally that the transformed i-vectors can be optimally modeled by a simple linear-Gaussian model (factor analysis). We evaluate our method on a benchmark dataset with a large amount of channel variability and show that the results compare favorably with those of competing approaches. Our approach also has closed-form solutions and scales gracefully to large datasets. Finally, a multi-classifier architecture trained in a multicondition fashion is proposed to address the problem of speaker recognition in the presence of additive noise. A large number of experiments are conducted to analyze the proposed architecture and to obtain guidelines for optimal performance in noisy environments. Overall, it is shown that multicondition training of multi-classifier architectures not only produces strong robustness in the anticipated conditions, but also generalizes well to unseen conditions.
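    The transform-then-Gaussian-model idea can be sketched as below, assuming the non-linear transformation is length normalization (whitening followed by projection onto the unit sphere), a common choice for making i-vectors amenable to linear-Gaussian modeling; the data and dimensionalities are stand-ins, not the thesis's actual setup.

```python
# Length-normalize i-vectors, then fit a simple linear-Gaussian model
# (factor analysis). A sketch under assumed parameters, not the thesis code.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def length_normalize(ivectors):
    """Whiten with global statistics, then scale each vector to unit norm."""
    centered = ivectors - ivectors.mean(axis=0)
    whitened = centered / centered.std(axis=0)   # diagonal whitening
    return whitened / np.linalg.norm(whitened, axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 400))             # stand-in i-vectors
fa = FactorAnalysis(n_components=150).fit(length_normalize(X))
```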

    Alluvial Substrate Mapping by Automated Texture Segmentation of Recreational-Grade Side Scan Sonar Imagery

    Side scan sonar in low-cost ‘fishfinder’ systems has become popular in aquatic ecology and sedimentology for imaging submerged riverbed sediment at coverages and resolutions sufficient to relate bed texture to grain size. Traditional methods for mapping bed texture (i.e. physical samples) are relatively high-cost and offer low spatial coverage compared to sonar, which can continuously image several kilometers of channel in a few hours. Towards the goal of automating the classification of bed habitat features, we investigate relationships between substrates and statistical descriptors of bed textures in side scan sonar echograms of alluvial deposits. We develop a method for automated segmentation of bed textures into two to five grain-size classes. Second-order texture statistics are used in conjunction with a Gaussian Mixture Model to classify the heterogeneous bed into small homogeneous patches of sand, gravel, and boulders with average accuracies of 80%, 49%, and 61%, respectively. Reach-averaged proportions of these sediment types were within 3% of those in similar maps derived from multibeam sonar.
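    A minimal sketch of the second-order texture step, assuming a recent scikit-image: grey-level co-occurrence statistics are computed for each small echogram patch and later fed to the mixture model. The patch quantization and GLCM parameters here are illustrative, not the values used in the paper.

```python
# Second-order (GLCM) texture features for one echogram patch.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch, levels=32):
    """Quantize a patch and return co-occurrence-based texture statistics."""
    scaled = patch.astype(float) - patch.min()
    q = (scaled / (scaled.max() + 1e-9) * (levels - 1)).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "correlation")])
```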

    Quantifying Riverbed Sediment Using Recreational-Grade Side Scan Sonar

    The size and organization of bed material, bed texture, is a fundamental attribute of channels and is one component of the physical habitat of aquatic ecosystems. Multiple discipline-specific definitions of texture exist, and there is no universally accepted metric to quantify the spectrum of possible bed textures found in aquatic environments. Moreover, metrics to describe texture are strictly statistical. Recreational-grade side scan sonar systems now offer the possibility of imaging submerged riverbed sediment at resolutions potentially sufficient to identify subtle changes in bed texture with minimal cost, expertise in sonar, or logistical effort. However, inferring riverbed sediment from side scan sonar data is limited because recreational-grade systems were not designed for this purpose and methods to interpret the data have relied on manual and semi-automated routines. Visual interpretation of side scan sonar data is impractical for large volumes of data because it is labor-intensive and lacks reproducibility. This thesis addresses the current limitations of visual interpretation with two objectives: 1) objectively quantify side scan sonar imagery texture, and 2) develop an automated texture segmentation algorithm for broad-scale substrate characterization. To address objective 1), I used a time series of imagery collected along a 1.6 km reach of the Colorado River in Marble Canyon, AZ. A statistically based texture analysis was performed on georeferenced side scan sonar imagery to identify objective metrics that could be used to discriminate different sediment types. A Grey Level Co-occurrence Matrix based texture analysis was found to successfully discriminate the textures associated with different sediment types. Texture varies significantly at the scale of ≈ 9 m² on side scan sonar imagery gridded at a regular 25 cm resolution. A minimum of three and a maximum of five distinct textures could be observed directly from side scan sonar imagery. To address objective 2), a linear least squares approach and a Gaussian mixture modeling approach were developed and tested. Both sediment classification methods successfully classified heterogeneous riverbeds into homogeneous patches of sand, gravel, and boulders. Gaussian mixture models outperformed the least squares models because they classified gravel with the highest accuracy. Additionally, substrate maps derived from the Gaussian mixture modeling approach better estimated reach-averaged proportions of different sediment types when compared to similar maps derived from multibeam sonar.
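    The unsupervised segmentation step might look like the sketch below: per-patch texture features (e.g. GLCM statistics, as above) are clustered with a three-component Gaussian mixture, and reach-averaged class proportions are computed for comparison against multibeam-derived maps. The features are random stand-ins, and assigning clusters to sand/gravel/boulder would be done afterwards against reference data.

```python
# Cluster per-patch texture features into substrate classes with a GMM,
# then report reach-averaged proportions. Features are stand-in data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.standard_normal((5000, 6))    # per-patch GLCM feature vectors

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(features)           # one substrate class per patch

# Reach-averaged proportion of each class, comparable to multibeam maps.
proportions = np.bincount(labels, minlength=3) / labels.size
print(proportions)
```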