2,976 research outputs found

    Assessment of Virgin Olive Oil Adulteration by a Rapid Luminescent Method

    The adulteration of virgin olive oil with hazelnut oil is a common fraud in the food industry, making accurate methods essential to guarantee the authenticity and traceability of virgin olive oil. In this work, we demonstrate the potential of a rapid luminescent method to characterize edible oils and to detect adulteration among them. A regression model based on five luminescent frequencies related to minor oil components was designed and validated, providing excellent performance for the detection of virgin olive oil adulteration.
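
    A minimal sketch of such a calibration is shown below, assuming a feature matrix of intensities at five luminescent frequencies and known adulteration levels for training; the data, weights, and decision threshold are hypothetical stand-ins, not the authors' published model.
    ```python
    # Hypothetical sketch: regress adulteration level on intensities at five
    # luminescent frequencies (the actual frequencies/model are in the paper).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    # X: n samples x 5 luminescent intensities; y: adulteration level (synthetic)
    X = rng.normal(size=(60, 5))
    true_w = np.array([0.8, -0.3, 0.5, 0.1, -0.6])  # assumed, for the synthetic data
    y = X @ true_w + rng.normal(scale=0.1, size=60)

    model = LinearRegression().fit(X, y)
    print("R^2 (5-fold CV):", cross_val_score(model, X, y, cv=5).mean())

    # Flag a sample as adulterated if the predicted level exceeds a threshold.
    threshold = 0.5  # assumed decision threshold, application-specific
    print("adulterated:", model.predict(X[:3]) > threshold)
    ```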

    On the Distribution of Speaker Verification Scores: Generative Models for Unsupervised Calibration

    Speaker verification systems whose outputs can be interpreted as log-likelihood ratios (LLRs) allow for cost-effective decisions by comparing the system outputs to application-defined thresholds that depend only on prior information. Classifiers often produce uncalibrated scores and require additional processing to produce well-calibrated LLRs. Recently, generative score calibration models have been proposed that achieve calibration performance close to that of state-of-the-art discriminative techniques in supervised scenarios, while also allowing for unsupervised training. The effectiveness of these methods, however, depends strongly on how well they model the target and non-target score distributions. In this work we propose theoretically grounded and accurate models for characterizing the distributions of scores of speaker verification systems. Our approach is based on tied Generalized Hyperbolic distributions and overcomes many limitations of Gaussian models. Experimental results on different NIST benchmarks, using different utterance representation front-ends and different back-end classifiers, show that our method is effective not only in supervised scenarios but also in unsupervised tasks characterized by a very low proportion of target trials.
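
    The calibration idea can be illustrated with a simpler generative model: fit class-conditional densities to target and non-target scores and output the log-density ratio as the calibrated LLR. The sketch below uses Gaussians for brevity and a supervised fit; the paper instead uses tied Generalized Hyperbolic distributions and also supports unsupervised estimation, neither of which is implemented here.
    ```python
    # Sketch of generative score calibration: the calibrated LLR is the log
    # ratio of class-conditional score densities. Gaussians are used here for
    # simplicity; the paper's method uses tied Generalized Hyperbolic models.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    tar = rng.normal(loc=4.0, scale=1.5, size=1000)   # target scores (synthetic)
    non = rng.normal(loc=-2.0, scale=1.0, size=9000)  # non-target scores

    # Fit one density per class (supervised stand-in for the unsupervised fit).
    mu_t, sd_t = tar.mean(), tar.std()
    mu_n, sd_n = non.mean(), non.std()

    def calibrated_llr(s):
        return norm.logpdf(s, mu_t, sd_t) - norm.logpdf(s, mu_n, sd_n)

    # Bayes decision at an application-defined threshold set only by the prior.
    p_target = 0.01
    threshold = -np.log(p_target / (1 - p_target))
    scores = np.array([0.0, 3.0, 6.0])
    print(calibrated_llr(scores) > threshold)
    ```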

    Alluvial Substrate Mapping by Automated Texture Segmentation of Recreational-Grade Side Scan Sonar Imagery

    Side scan sonar in low-cost ‘fishfinder’ systems has become popular in aquatic ecology and sedimentology for imaging submerged riverbed sediment at coverages and resolutions sufficient to relate bed texture to grain size. Traditional methods of mapping bed texture (i.e. physical samples) are relatively high-cost and low-coverage compared to sonar, which can continuously image several kilometers of channel in a few hours. Toward the goal of automating the classification of bed habitat features, we investigate relationships between substrates and statistical descriptors of bed textures in side scan sonar echograms of alluvial deposits. We develop a method for automated segmentation of bed textures into two to five grain-size classes. Second-order texture statistics are used in conjunction with a Gaussian Mixture Model to classify the heterogeneous bed into small homogeneous patches of sand, gravel, and boulders with average accuracies of 80%, 49%, and 61%, respectively. Reach-averaged proportions of these sediment types were within 3% of those in similar maps derived from multibeam sonar.
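
    A minimal sketch of this pipeline, under assumptions: second-order (Grey Level Co-occurrence Matrix) texture statistics are computed on sliding windows of an echogram and clustered with a Gaussian Mixture Model. The synthetic image, window size, feature set, and three-component mapping to sand/gravel/boulder are illustrative choices, not the paper's exact configuration.
    ```python
    # Sketch: GLCM texture features on sliding windows of a sonar echogram,
    # clustered into substrate patches with a Gaussian Mixture Model.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    img = (rng.random((128, 128)) * 255).astype(np.uint8)  # stand-in echogram

    win = 16  # window size in pixels (assumed)
    feats, cells = [], []
    for i in range(0, img.shape[0] - win, win):
        for j in range(0, img.shape[1] - win, win):
            patch = img[i:i + win, j:j + win]
            glcm = graycomatrix(patch, distances=[1], angles=[0],
                                levels=256, symmetric=True, normed=True)
            feats.append([graycoprops(glcm, p)[0, 0]
                          for p in ("contrast", "homogeneity",
                                    "energy", "correlation")])
            cells.append((i, j))

    # Three components as a stand-in for sand / gravel / boulder patches.
    labels = GaussianMixture(n_components=3, random_state=0).fit_predict(np.array(feats))
    print(dict(zip(cells[:5], labels[:5])))
    ```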

    A Generative Model for Duration-Dependent Score Calibration


    Quantifying Riverbed Sediment Using Recreational-Grade Side Scan Sonar

    The size and organization of bed material, bed texture, is a fundamental attribute of channels and one component of the physical habitat of aquatic ecosystems. Multiple discipline-specific definitions of texture exist, and there is no universally accepted metric to quantify the spectrum of possible bed textures found in aquatic environments. Moreover, metrics to describe texture are strictly statistical. Recreational-grade side scan sonar systems now offer the possibility of imaging submerged riverbed sediment at resolutions potentially sufficient to identify subtle changes in bed texture with minimal cost, expertise in sonar, or logistical effort. However, inferring riverbed sediment from side scan sonar data is limited because recreational-grade systems were not designed for this purpose and methods to interpret the data have relied on manual and semi-automated routines. Visual interpretation of side scan sonar data is impractical for large volumes of data because it is labor intensive and lacks reproducibility. This thesis addresses the current limitations of visual interpretation with two objectives: 1) objectively quantify the texture of side scan sonar imagery, and 2) develop an automated texture segmentation algorithm for broad-scale substrate characterization. To address objective 1), I used a time series of imagery collected along a 1.6 km reach of the Colorado River in Marble Canyon, AZ. A statistically based texture analysis was performed on georeferenced side scan sonar imagery to identify objective metrics that could discriminate different sediment types. A Grey Level Co-occurrence Matrix based texture analysis successfully discriminated the textures associated with different sediment types. Texture varies significantly at the scale of ≈ 9 m² on side scan sonar imagery gridded at 25 cm. A minimum of three and a maximum of five distinct textures could be observed directly in side scan sonar imagery. To address objective 2), a linear least squares and a Gaussian mixture modeling approach were developed and tested. Both sediment classification methods successfully classified heterogeneous riverbeds into homogeneous patches of sand, gravel, and boulders. Gaussian mixture models outperformed the least squares models because they classified gravel with the highest accuracy. Additionally, substrate maps derived from the Gaussian modeling approach better estimated reach-averaged proportions of different sediment types when compared to similar maps derived from multibeam sonar.
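
    The least-squares alternative mentioned above can be sketched as an ordinary regression of a numeric substrate code on per-window texture statistics, rounded back to discrete classes; the features, class coding, and synthetic data here are illustrative assumptions, not the thesis's exact formulation.
    ```python
    # Sketch of the least-squares approach: regress a numeric substrate code
    # (0 = sand, 1 = gravel, 2 = boulder) on per-window texture statistics,
    # then round predictions back to classes. Features and coding are assumed.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 300
    X = rng.normal(size=(n, 3))           # e.g. GLCM entropy, contrast, homogeneity
    w_true = np.array([1.2, -0.4, 0.7])   # assumed, for the synthetic labels
    y = np.clip(np.round(X @ w_true + rng.normal(scale=0.3, size=n)), 0, 2)

    # Ordinary least squares with an intercept column.
    A = np.hstack([X, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    pred = np.clip(np.round(A @ coef), 0, 2)
    print("accuracy:", (pred == y).mean())
    ```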

    Estimating the Contamination Factor's Distribution in Unsupervised Anomaly Detection

    Anomaly detection methods identify examples that do not follow the expected behaviour, typically in an unsupervised fashion, by assigning real-valued anomaly scores to the examples based on various heuristics. These scores need to be transformed into actual predictions by thresholding so that the proportion of examples marked as anomalies equals the expected proportion of anomalies, called the contamination factor. Unfortunately, there are no good methods for estimating the contamination factor itself. We address this need from a Bayesian perspective, introducing a method for estimating the posterior distribution of the contamination factor of a given unlabeled dataset. We leverage the outputs of several anomaly detectors as a representation that already captures the basic notion of anomalousness and estimate the contamination using a specific mixture formulation. Empirically, on 22 datasets, we show that the estimated distribution is well calibrated and that setting the threshold using the posterior mean improves the anomaly detectors' performance over several alternative methods. All code is publicly available for full reproducibility.
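
    One way to realize this idea, sketched below under loose assumptions: stack the scores of several detectors, fit a Bayesian Gaussian mixture to the joint score vectors, treat the responsibility mass of the highest-scoring component as a contamination estimate, and threshold at the matching quantile. The paper's specific mixture formulation differs; the detector choice and component-selection rule here are illustrative, not the authors' method.
    ```python
    # Sketch: estimate the contamination factor from several detectors' scores
    # via a Bayesian mixture, then threshold at that quantile. Illustrative
    # approximation only; the paper's mixture formulation differs.
    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(size=(950, 2)),            # inliers
                   rng.normal(loc=6.0, size=(50, 2))])   # anomalies (5%)

    # Score vectors from two detectors (higher = more anomalous).
    s1 = -IsolationForest(random_state=0).fit(X).score_samples(X)
    s2 = -LocalOutlierFactor().fit(X).negative_outlier_factor_
    S = np.column_stack([s1, s2])

    gmm = BayesianGaussianMixture(n_components=5, random_state=0).fit(S)
    resp = gmm.predict_proba(S)
    anomalous = gmm.means_.sum(axis=1).argmax()          # highest-scoring component
    gamma_hat = resp[:, anomalous].mean()                # contamination estimate
    print("estimated contamination:", round(gamma_hat, 3))

    threshold = np.quantile(s1, 1 - gamma_hat)
    print("flagged:", int((s1 > threshold).sum()))
    ```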

    Calibration and Evaluation of Outlier Detection with Generated Data

    Outlier detection is an essential part of data science, an area of increasing relevance in a plethora of domains. While numerous approaches for the detection of outliers already exist, significant challenges remain. Two prominent ones are that outliers are rare and not precisely defined. Both have serious consequences, especially for the calibration and evaluation of detection methods. This thesis is concerned with one possible way of dealing with these challenges: the generation of outliers. It discusses existing techniques for generating outliers, and specifically also their use in tackling the challenges mentioned. In the literature, the topic of outlier generation has had little general structure so far, despite the many techniques already proposed. Thus, the first contribution of this thesis is a unified and crisp description of the state of the art in outlier generation and its uses. Given the variety of characteristics of generated outliers and the variety of methods designed for the detection of real outliers, it becomes apparent that comparisons of detection performance should be more distinctive than current state-of-the-art comparisons. Such a distinctive comparison is the second central contribution of this thesis: a general process for the distinctive evaluation of outlier detection methods with generated data. The process developed here uses entirely artificial data in which the inliers are realistic representations of some real-world data and the outliers are deviations from these inliers with specific characteristics. The realism of the inliers allows performance evaluations to generalize to many other data domains. The carefully designed generation techniques for outliers allow insights into the effect of outlier characteristics. So-called hidden outliers represent a special type of outlier: they depend on a set of selections of data attributes, i.e., a set of subspaces, and are detectable only in a particular set of subspaces; in the subspaces they are hidden from, they are not detectable. For outlier detection methods that make use of subspaces, hidden outliers are a blind spot: they can hide from exactly the subspaces searched for outliers. Thus, hidden outliers are interesting to study, in particular for the evaluation of detection methods that use subspaces. The third central contribution of this thesis is a technique for the generation of hidden outliers, together with an analysis of the characteristics of such instances. For this analysis, the concept of hidden outliers is first treated theoretically; the developed technique is then used to validate the theoretical findings in more realistic contexts, for example to show that hidden outliers can appear in many real-world data sets. All in all, this dissertation gives the field of outlier generation needed structure and shows its usefulness in tackling prominent challenges of the outlier detection problem.
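
    The notion of a hidden outlier can be made concrete with a generic construction: points whose one-dimensional marginals look like inliers (so each single-attribute subspace hides them) but whose combination of attributes breaks the correlation structure (so the full space reveals them). The sketch below, which shuffles the coordinates of correlated inliers independently, is one such illustration, not the thesis's generation technique.
    ```python
    # Sketch: generate "hidden" outliers that look normal in each 1-D subspace
    # but are outlying in the full 2-D space, by shuffling the coordinates of
    # correlated inliers independently (marginals kept, correlation destroyed).
    # Generic illustration, not the thesis's generation technique.
    import numpy as np

    rng = np.random.default_rng(5)
    cov = [[1.0, 0.95], [0.95, 1.0]]          # strongly correlated inliers
    inliers = rng.multivariate_normal([0, 0], cov, size=1000)

    hidden = np.column_stack([rng.permutation(inliers[:, 0])[:20],
                              rng.permutation(inliers[:, 1])[:20]])

    # In each single attribute the hidden points are unremarkable...
    print("max marginal |z|:", np.abs(hidden).max(axis=0))
    # ...but the full-space Mahalanobis distance exposes most of them.
    prec = np.linalg.inv(np.cov(inliers.T))
    d2 = np.einsum("ij,jk,ik->i", hidden, prec, hidden)
    print("share detectable in full space:", (d2 > 9.21).mean())  # chi2(2), 99%
    ```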