
    Detecting adversarial manipulation using inductive Venn-ABERS predictors

    Inductive Venn-ABERS predictors (IVAPs) are a type of probabilistic predictor with the theoretical guarantee that their predictions are perfectly calibrated. In this paper, we propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions when the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and resistant to adversarial examples. This robustness is observed on adversarial examples crafted against the underlying model as well as on adversarial examples generated by taking the IVAP into account. The method appears to offer robustness competitive with the state of the art in adversarial defense, yet it is computationally much more tractable.
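
    As a concrete illustration of the rejection rule described above, here is a minimal sketch (not the authors' implementation) using the straightforward two-isotonic-regressions construction of an IVAP; the calibration scores and labels, the merging formula, and the width threshold `max_width` are assumptions made for illustration.

```python
# Minimal IVAP-style rejection sketch: compute the Venn-ABERS probability interval
# (p0, p1) for a test score and reject the prediction when the interval is too wide.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def ivap_interval(cal_scores, cal_labels, s):
    """Fit isotonic calibration twice, appending the test score with a hypothetical
    label of 0 and then 1; return the two calibrated probabilities (p0, p1)."""
    probs = []
    for hypothetical_label in (0, 1):
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(np.append(cal_scores, s), np.append(cal_labels, hypothetical_label))
        probs.append(float(iso.predict([s])[0]))
    return probs[0], probs[1]  # p0 <= p1

def predict_or_reject(cal_scores, cal_labels, s, max_width=0.1):
    """Return a 0/1 prediction, or None if the IVAP interval is too uncertain."""
    p0, p1 = ivap_interval(cal_scores, cal_labels, s)
    if p1 - p0 > max_width:        # high uncertainty: treat as possibly adversarial
        return None
    p = p1 / (1.0 - p0 + p1)       # a common way to merge (p0, p1) into one probability
    return int(p >= 0.5)
```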

    Threshold Choice Methods: the Missing Link

    Many performance metrics have been introduced for the evaluation of classification performance, with different origins and niches of application: accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the absolute error, and the Brier score (with its decomposition into refinement and calibration). One way of understanding the relation among some of these metrics is the use of variable operating conditions (either in the form of misclassification costs or class proportions). Thus, a metric may correspond to some expected loss over a range of operating conditions. One dimension for the analysis has been precisely the distribution we take for this range of operating conditions, leading to some important connections in the area of proper scoring rules. However, we show that there is another dimension which has not received attention in the analysis of performance metrics. This new dimension is given by the decision rule, which is typically implemented as a threshold choice method when using scoring models. In this paper, we explore many old and new threshold choice methods: fixed, score-uniform, score-driven, rate-driven and optimal, among others. By calculating the loss of these methods for a uniform range of operating conditions we obtain the 0-1 loss, the absolute error, the Brier score (mean squared error), the AUC, and the refinement loss, respectively. This provides a comprehensive view of performance metrics as well as a systematic approach to loss minimisation, namely: take a model, apply several threshold choice methods consistent with the information which is (and will be) available about the operating condition, and compare their expected losses. In order to assist in this procedure we also derive several connections between the aforementioned performance metrics, and we highlight the role of calibration in choosing the threshold choice method.
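
    The procedure recommended above (take a model, apply several threshold choice methods, and compare their expected losses over a uniform range of operating conditions) can be sketched numerically. In the sketch below, the operating condition c is taken as the relative cost of a false positive, and the score-driven method sets the threshold equal to c for calibrated scores; this cost convention and the function names are illustrative assumptions, not the paper's notation.

```python
# Compare threshold choice methods by their expected cost-weighted loss over a
# uniform range of operating conditions c in [0, 1].
import numpy as np

def cost_loss(scores, labels, threshold, c):
    """Cost-weighted loss at operating condition c (c = relative cost of a false
    positive, 1 - c = relative cost of a false negative); c = 0.5 gives the error rate."""
    pred = scores >= threshold
    pi1 = labels.mean()
    pi0 = 1.0 - pi1
    fpr = np.mean(pred[labels == 0]) if pi0 > 0 else 0.0
    fnr = np.mean(~pred[labels == 1]) if pi1 > 0 else 0.0
    return 2.0 * (c * pi0 * fpr + (1.0 - c) * pi1 * fnr)

def expected_loss(scores, labels, choose_threshold, n_grid=1001):
    """Average the loss over a uniform grid of operating conditions."""
    cs = np.linspace(0.0, 1.0, n_grid)
    return float(np.mean([cost_loss(scores, labels, choose_threshold(c), c) for c in cs]))

# Two of the threshold choice methods discussed in the paper:
fixed = lambda c: 0.5         # ignore the operating condition
score_driven = lambda c: c    # use the operating condition itself as the threshold

# Usage with hypothetical calibrated scores `p_hat` and binary labels `y` (numpy arrays):
# print(expected_loss(p_hat, y, fixed), expected_loss(p_hat, y, score_driven))
```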

    Design and Performance of the Wide-Field X-Ray Monitor on Board the High-Energy Transient Explorer 2

    The Wide-field X-ray Monitor (WXM) is one of the scientific instruments carried on the High Energy Transient Explorer 2 (HETE-2) satellite launched on 2000 October 9. HETE-2 is an international mission consisting of a small satellite dedicated to providing broad-band observations and accurate localizations of gamma-ray bursts (GRBs). A unique feature of this mission is its capability to determine and transmit GRB coordinates in almost real time through the burst alert network. The WXM consists of three elements: four identical Xe-filled one-dimensional position-sensitive proportional counters, two sets of one-dimensional coded apertures, and the main electronics. The WXM counters are sensitive to X-rays between 2 keV and 25 keV within a field of view of about 1.5 sr, with a total detector area of about 350 cm^2. The in-flight triggering and localization capability can produce a real-time GRB location with an accuracy of several to 30 arcmin, at a limiting sensitivity of 10^{-7} erg cm^{-2}. In this report, the details of the mechanical structure, electronics, on-board software, ground and in-flight calibration, and in-flight performance of the WXM are discussed. Comment: 28 pages, 24 figures.
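
    For readers unfamiliar with coded-aperture localization, the sketch below is a generic illustration (not the WXM flight software) of how a one-dimensional coded mask locates a source: the count profile on the position-sensitive counter is the mask pattern shifted in proportion to the source angle, so cross-correlating the profile with the mask recovers the shift. The mask pattern and geometry are invented for the example.

```python
# Generic 1-D coded-aperture localization by cross-correlation.
import numpy as np

def locate_shift(counts, mask):
    """Return the mask shift (in detector bins) that best explains the count profile."""
    mask = mask - mask.mean()          # remove the DC level before correlating
    counts = counts - counts.mean()
    corr = [np.dot(np.roll(mask, s), counts) for s in range(len(mask))]
    return int(np.argmax(corr))

# Usage: a random binary mask, a source whose shadow is shifted by 17 bins, plus noise.
rng = np.random.default_rng(1)
mask = rng.integers(0, 2, size=128)
counts = np.roll(mask, 17) * 200 + rng.poisson(20, size=128)
print(locate_shift(counts, mask))      # expected to recover a shift near 17
```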

    Toward Open-Set Face Recognition

    Much research has been conducted on both face identification and face verification, with greater focus on the latter. Research on face identification has mostly focused on closed-set protocols, which assume that all probe images used in evaluation contain identities of subjects that are enrolled in the gallery. Real systems, however, where only a fraction of probe sample identities are enrolled in the gallery, cannot make this closed-set assumption. Instead, they must assume an open set of probe samples and be able to reject/ignore those that correspond to unknown identities. In this paper, we address the widespread misconception that thresholding verification-like scores is a good way to solve the open-set face identification problem, by formulating an open-set face identification protocol and evaluating different strategies for assessing similarity. Our open-set identification protocol is based on the canonical Labeled Faces in the Wild (LFW) dataset. In addition to the known identities, we introduce the concepts of known unknowns (known, but uninteresting persons) and unknown unknowns (people never seen before) to the biometric community. We compare three algorithms for assessing similarity in a deep feature space under an open-set protocol: thresholded verification-like scores, linear discriminant analysis (LDA) scores, and extreme value machine (EVM) probabilities. Our findings suggest that thresholding EVM probabilities, which are open-set by design, outperforms thresholding verification-like scores. Comment: Accepted for publication in the CVPR 2017 Biometrics Workshop.
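
    A minimal sketch of the open-set decision rule discussed above: compare a probe to the enrolled gallery and reject it as unknown when even the best match falls below a threshold. The feature representation, the gallery arrays, and the threshold `tau` are assumptions, and the paper's preferred EVM probabilities are not reproduced here.

```python
# Open-set identification by thresholded nearest-neighbour search in feature space.
import numpy as np

def identify_open_set(probe_feat, gallery_feats, gallery_ids, tau=0.6):
    """Return the matched identity, or None if the probe is deemed unknown."""
    # Cosine similarity between the probe and every enrolled template.
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    p = probe_feat / np.linalg.norm(probe_feat)
    sims = g @ p
    best = int(np.argmax(sims))
    if sims[best] < tau:          # open-set rejection: unknown identity
        return None
    return gallery_ids[best]
```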

    Development and Validation of Credit-Scoring Models

    Accurate credit-granting decisions are crucial to the efficiency of the decentralized capital allocation mechanisms in modern market economies. Credit bureaus and many financial institutions have developed and used credit-scoring models to standardize and automate, to the extent possible, credit decisions. We build credit-scoring models for bankcard markets using the Office of the Comptroller of the Currency, Risk Analysis Division (OCC/RAD) consumer credit database (CCDB). This unusually rich data set allows us to evaluate a number of methods in common practice. We introduce, estimate, and validate our models, using both out-of-sample contemporaneous and future validation data sets. Model performance is compared using both separation and accuracy measures. A vendor-developed generic bureau-based score is also included in the model performance comparisons. Our results indicate that current industry practices, when carefully applied, can produce models that robustly rank-order potential borrowers both at the time of development and through the near future. However, these same methodologies are likely to fail when the objective is to accurately estimate future rates of delinquency or probabilities of default for individuals or groups of borrowers.
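
    As a sketch of the kind of out-of-sample validation described above, the code below computes two standard credit-scoring checks: the Kolmogorov-Smirnov (KS) statistic as a separation measure and observed versus predicted default rates by score decile as an accuracy (calibration) measure. These are common industry metrics used here for illustration, not necessarily the exact measures in the paper.

```python
# Validate a scorecard on a hold-out sample: separation (KS) and decile-level calibration.
import numpy as np

def ks_statistic(pd_hat, default):
    """Maximum gap between the cumulative score distributions of goods and bads."""
    order = np.argsort(pd_hat)
    d = default[order].astype(float)
    cum_bad = np.cumsum(d) / d.sum()
    cum_good = np.cumsum(1 - d) / (1 - d).sum()
    return float(np.max(np.abs(cum_bad - cum_good)))

def decile_calibration(pd_hat, default, n_bins=10):
    """(mean predicted, observed) default rate within each score decile."""
    edges = np.quantile(pd_hat, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.digitize(pd_hat, edges[1:-1]), 0, n_bins - 1)
    return [(float(pd_hat[bins == b].mean()), float(default[bins == b].mean()))
            for b in range(n_bins) if np.any(bins == b)]
```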

    The Eighth Data Release of the Sloan Digital Sky Survey: First Data from SDSS-III

    The Sloan Digital Sky Survey (SDSS) started a new phase in August 2008, with new instrumentation and new surveys focused on Galactic structure and chemical evolution, measurements of the baryon oscillation feature in the clustering of galaxies and the quasar Ly alpha forest, and a radial velocity search for planets around ~8000 stars. This paper describes the first data release of SDSS-III (and the eighth counting from the beginning of the SDSS). The release includes five-band imaging of roughly 5200 deg^2 in the Southern Galactic Cap, bringing the total footprint of the SDSS imaging to 14,555 deg^2, or over a third of the Celestial Sphere. All the imaging data have been reprocessed with an improved sky-subtraction algorithm and a final, self-consistent photometric recalibration and flat-field determination. This release also includes all data from the second phase of the Sloan Extension for Galactic Understanding and Evolution (SEGUE-2), consisting of spectroscopy of approximately 118,000 stars at both high and low Galactic latitudes. All the more than half a million stellar spectra obtained with the SDSS spectrograph have been reprocessed through an improved stellar parameters pipeline, which has better determination of metallicity for high-metallicity stars. Comment: Astrophysical Journal Supplements, in press (minor updates from the submitted version).

    Financial Integration and International Risk Sharing

    In the last two decades, financial integration has increased dramatically across the world. At the same time, the fraction of countries in default has more than doubled. Contrary to theory, however, there appears to have been no substantial improvement in the degree of international risk sharing. To account for this puzzle, we construct a general equilibrium model that features a continuum of countries and default choices on state-uncontingent bonds. We model increased financial integration as a decrease in the cost of borrowing. Our main finding is that as the cost of borrowing is lowered, financial integration and sovereign default increase substantially, but the degree of risk sharing as measured by cross-section and panel regressions increases hardly at all. The explanation we propose is that international risk sharing is not sensitive to the increase in financial integration given the current magnitude of capital flows, because countries can insure themselves through accumulation of domestic assets. To achieve better risk sharing, capital flows among countries would need to be extremely large. In addition, although the ability to default on loans provides state contingency, it restricts international risk sharing in two ways: higher borrowing rates and future exclusion from international credit markets.
    Keywords: Financial Integration, Risk Sharing, Globalization, Sovereign Debt
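
    For readers unfamiliar with how risk sharing is measured by panel regressions, the sketch below estimates the standard coefficient of idiosyncratic consumption growth on idiosyncratic output growth: a value near zero indicates full risk sharing, a value near one indicates none. The exact specification used in the paper may differ.

```python
# Standard panel risk-sharing regression, sketched with plain numpy OLS.
import numpy as np

def risk_sharing_coefficient(dlog_c, dlog_y):
    """dlog_c, dlog_y: (countries x years) arrays of log consumption/output growth."""
    # Demean each year (subtract the cross-country average) to strip aggregate shocks.
    c_tilde = (dlog_c - dlog_c.mean(axis=0)).ravel()
    y_tilde = (dlog_y - dlog_y.mean(axis=0)).ravel()
    X = np.column_stack([np.ones_like(y_tilde), y_tilde])
    beta = np.linalg.lstsq(X, c_tilde, rcond=None)[0]
    return float(beta[1])   # 0 = perfect risk sharing, 1 = no risk sharing
```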