Detecting adversarial manipulation using inductive Venn-ABERS predictors
Inductive Venn-ABERS predictors (IVAPs) are probabilistic predictors with the theoretical guarantee that their predictions are perfectly calibrated. In this paper, we propose to exploit this calibration property for the detection of adversarial examples in binary classification tasks. By rejecting predictions when the uncertainty of the IVAP is too high, we obtain an algorithm that is both accurate on the original test set and resistant to adversarial examples. This robustness is observed both for adversarial examples generated against the underlying model and for those generated with the IVAP taken into account. The method appears to offer robustness competitive with the state of the art in adversarial defense, yet it is computationally much more tractable.
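The rejection rule this abstract describes can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the helper names (`pava`, `ivap_interval`, `predict_or_reject`) and the rejection threshold `tau` are assumptions. The IVAP interval [p0, p1] is obtained by refitting an isotonic regression with the test score tentatively labelled 0 and then 1, and a prediction is rejected when the interval is wider than `tau`.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # [mean, weight] pairs
    for v in y:
        blocks.append([float(v), 1.0])
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fit = []
    for m, w in blocks:
        fit.extend([m] * int(w))
    return fit

def ivap_interval(cal_scores, cal_labels, s):
    """Venn-ABERS interval (p0, p1) for a test score s: refit the isotonic
    calibrator with (s, 0) and then with (s, 1) added to the calibration set."""
    probs = []
    for hypothetical_label in (0, 1):
        pts = sorted(zip(cal_scores + [s], cal_labels + [hypothetical_label]))
        fit = pava([label for _, label in pts])
        probs.append(fit[[score for score, _ in pts].index(s)])
    return probs[0], probs[1]

def predict_or_reject(cal_scores, cal_labels, s, tau=0.25):
    """Return a calibrated probability, or None (abstain) when the interval
    is wider than tau -- the adversarial-detection rule of the abstract."""
    p0, p1 = ivap_interval(cal_scores, cal_labels, s)
    if p1 - p0 > tau:
        return None
    return p1 / (1.0 - p0 + p1)  # merge the interval into a single probability
```

Adversarial or otherwise unusual inputs tend to land where the calibration set gives little support, which widens [p0, p1] and triggers the abstention.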
Threshold Choice Methods: the Missing Link
Many performance metrics have been introduced for the evaluation of
classification performance, with different origins and niches of application:
accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the
absolute error, and the Brier score (with its decomposition into refinement and
calibration). One way of understanding the relation among some of these metrics
is the use of variable operating conditions (either in the form of
misclassification costs or class proportions). Thus, a metric may correspond to
some expected loss over a range of operating conditions. One dimension for the
analysis has been precisely the distribution we take for this range of
operating conditions, leading to some important connections in the area of
proper scoring rules. However, we show that there is another dimension which
has not received attention in the analysis of performance metrics. This new
dimension is given by the decision rule, which is typically implemented as a
threshold choice method when using scoring models. In this paper, we explore
many old and new threshold choice methods: fixed, score-uniform, score-driven,
rate-driven and optimal, among others. By calculating the loss of these methods
for a uniform range of operating conditions we get the 0-1 loss, the absolute
error, the Brier score (mean squared error), the AUC and the refinement loss
respectively. This provides a comprehensive view of performance metrics as well
as a systematic approach to loss minimisation, namely: take a model, apply
several threshold choice methods consistent with the information which is (and
will be) available about the operating condition, and compare their expected
losses. In order to assist in this procedure we also derive several connections
between the aforementioned performance metrics, and we highlight the role of
calibration in choosing the threshold choice method.
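The correspondence in the abstract can be checked numerically. The sketch below is an illustration under one concrete cost model; the scaling factor, the "predict positive iff score > t" rule, and all function names are my assumptions, not the paper's notation. With per-condition loss 2*(c*pi0*FPR(t) + (1-c)*pi1*FNR(t)) and the cost proportion c drawn uniformly from [0, 1], a fixed threshold t = 0.5 recovers the 0-1 loss, while the score-driven choice t = c recovers the Brier score.

```python
def rates(scores, labels, t):
    """False-positive and false-negative rates at threshold t
    (predict positive iff score > t)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    fpr = sum(s > t for s in neg) / len(neg) if neg else 0.0
    fnr = sum(s <= t for s in pos) / len(pos) if pos else 0.0
    return fpr, fnr

def expected_loss(scores, labels, choose_t, n_grid=10_000):
    """Average cost-weighted loss over c ~ U[0, 1] (midpoint rule)."""
    pi1 = sum(labels) / len(labels)
    pi0 = 1.0 - pi1
    total = 0.0
    for i in range(n_grid):
        c = (i + 0.5) / n_grid
        fpr, fnr = rates(scores, labels, choose_t(c))
        total += 2.0 * (c * pi0 * fpr + (1.0 - c) * pi1 * fnr)
    return total / n_grid

scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
labels = [0,   0,   1,    1,   0,    1]

fixed = expected_loss(scores, labels, lambda c: 0.5)        # ~ 0-1 loss
score_driven = expected_loss(scores, labels, lambda c: c)   # ~ Brier score
brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)
```

The `choose_t` argument makes the threshold choice method pluggable, which mirrors the comparison procedure the abstract recommends: take one model, swap in the threshold choice methods consistent with what is known about the operating condition, and compare the resulting expected losses.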
Design and Performance of the Wide-Field X-Ray Monitor on Board the High-Energy Transient Explorer 2
The Wide-field X-ray Monitor (WXM) is one of the scientific instruments
carried on the High Energy Transient Explorer 2 (HETE-2) satellite launched on
2000 October 9. HETE-2 is an international mission consisting of a small
satellite dedicated to providing broad-band observations and accurate
localizations of gamma-ray bursts (GRBs). A unique feature of this mission is
its capability to determine and transmit GRB coordinates in almost real-time
through the burst alert network. The WXM consists of three elements: four
identical Xe-filled one-dimensional position-sensitive proportional counters,
two sets of one-dimensional coded apertures, and the main electronics. The WXM
counters are sensitive to X-rays between 2 keV and 25 keV within a
field-of-view of about 1.5 sr, with a total detector area of about 350 cm^2.
The in-flight triggering and localization capability can produce a real-time
GRB location of several to 30 arcmin accuracy, with a limiting sensitivity of
erg cm^-2. In this report, the details of the mechanical
structure, electronics, on-board software, ground and in-flight calibration,
and in-flight performance of the WXM are discussed. Comment: 28 pages, 24 figures
Toward Open-Set Face Recognition
Much research has been conducted on both face identification and face
verification, with greater focus on the latter. Research on face identification
has mostly focused on using closed-set protocols, which assume that all probe
images used in evaluation contain identities of subjects that are enrolled in
the gallery. Real systems, however, where only a fraction of probe sample
identities are enrolled in the gallery, cannot make this closed-set assumption.
Instead, they must assume an open set of probe samples and be able to
reject/ignore those that correspond to unknown identities. In this paper, we
address the widespread misconception that thresholding verification-like scores
is a good way to solve the open-set face identification problem, by formulating
an open-set face identification protocol and evaluating different strategies
for assessing similarity. Our open-set identification protocol is based on the
canonical Labeled Faces in the Wild (LFW) dataset. In addition to the known
identities, we introduce the concepts of known unknowns (known, but
uninteresting persons) and unknown unknowns (people never seen before) to the
biometric community. We compare three algorithms for assessing similarity in a
deep feature space under an open-set protocol: thresholded verification-like
scores, linear discriminant analysis (LDA) scores, and extreme value machine
(EVM) probabilities. Our findings suggest that thresholding EVM probabilities,
which are open-set by design, outperforms thresholding verification-like
scores. Comment: Accepted for publication in the CVPR 2017 Biometrics Workshop.
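The first of the three strategies, thresholding verification-like scores, can be sketched in a few lines. The gallery layout, the use of cosine similarity, and the threshold value are illustrative assumptions; a real system would compare deep-network embeddings rather than short toy vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def identify(probe, gallery, threshold=0.8):
    """Open-set identification: return the best-matching gallery identity,
    or None ("unknown") when even the top similarity falls below threshold."""
    best_id, best_sim = None, -1.0
    for identity, feat in gallery.items():
        sim = cosine(probe, feat)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim >= threshold else None
```

The abstract's point is that this baseline handles unknown unknowns poorly: an impostor whose embedding happens to lie near one enrolled identity clears the threshold, whereas an open-set-by-design model such as the EVM bounds each identity's region of support.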
Development and Validation of Credit-Scoring Models
Accurate credit-granting decisions are crucial to the efficiency of the decentralized capital allocation mechanisms in modern market economies. Credit bureaus and many financial institutions have developed and used credit-scoring models to standardize and automate, to the extent possible, credit decisions. We build credit-scoring models for bankcard markets using the Office of the Comptroller of the Currency, Risk Analysis Division (OCC/RAD) consumer credit database (CCDB). This unusually rich data set allows us to evaluate a number of methods in common practice. We introduce, estimate, and validate our models, using both out-of-sample contemporaneous and future validation data sets. Model performance is compared using both separation and accuracy measures. A vendor-developed generic bureau-based score is also included in the model performance comparisons. Our results indicate that current industry practices, when carefully applied, can produce models that robustly rank-order potential borrowers both at the time of development and through the near future. However, these same methodologies are likely to fail when the objective is to accurately estimate future rates of delinquency or probabilities of default for individual borrowers or groups of borrowers.
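As an illustration of the kind of separation measure such comparisons rely on, the Kolmogorov-Smirnov (KS) statistic, a standard scorecard separation metric, can be computed as below. The function name and toy data are assumptions for illustration, not taken from the paper.

```python
def ks_statistic(scores, defaulted):
    """Kolmogorov-Smirnov separation: maximum gap between the cumulative
    score distributions of defaulting and non-defaulting borrowers."""
    bads = sorted(s for s, d in zip(scores, defaulted) if d)
    goods = sorted(s for s, d in zip(scores, defaulted) if not d)
    ks = 0.0
    for t in sorted(set(scores)):
        f_bad = sum(s <= t for s in bads) / len(bads)
        f_good = sum(s <= t for s in goods) / len(goods)
        ks = max(ks, abs(f_bad - f_good))
    return ks
```

A KS of 1.0 means the score perfectly rank-orders good and bad borrowers; note that, as the abstract warns, a model can score well on such separation measures while still misestimating the absolute level of default probabilities.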
The Eighth Data Release of the Sloan Digital Sky Survey: First Data from SDSS-III
The Sloan Digital Sky Survey (SDSS) started a new phase in August 2008, with
new instrumentation and new surveys focused on Galactic structure and chemical
evolution, measurements of the baryon oscillation feature in the clustering of
galaxies and the quasar Ly alpha forest, and a radial velocity search for
planets around ~8000 stars. This paper describes the first data release of
SDSS-III (and the eighth counting from the beginning of the SDSS). The release
includes five-band imaging of roughly 5200 deg^2 in the Southern Galactic Cap,
bringing the total footprint of the SDSS imaging to 14,555 deg^2, or over a
third of the Celestial Sphere. All the imaging data have been reprocessed with
an improved sky-subtraction algorithm and a final, self-consistent photometric
recalibration and flat-field determination. This release also includes all data
from the second phase of the Sloan Extension for Galactic Understanding and
Evolution (SEGUE-2), consisting of spectroscopy of approximately 118,000 stars
at both high and low Galactic latitudes. All the more than half a million
stellar spectra obtained with the SDSS spectrograph have been reprocessed
through an improved stellar parameters pipeline, which has better determination
of metallicity for high-metallicity stars. Comment: Astrophysical Journal Supplements, in press (minor updates from submitted version).
Financial Integration and International Risk Sharing
In the last two decades, financial integration has increased dramatically across the world. At the same time, the fraction of countries in default has more than doubled. Contrary to theory,
however, there appears to have been no substantial improvement in the degree of international risk sharing. To account for this puzzle, we construct a general equilibrium model that features a continuum of countries and default choices on state-uncontingent bonds. We model increased
financial integration as a decrease in the cost of borrowing.
Our main finding is that as the cost of borrowing is lowered, financial integration and sovereign default increase substantially, but the degree of risk sharing as measured by cross-section and panel regressions hardly increases at all. The explanation we propose is that international risk sharing is not sensitive to the increase in financial integration given the current magnitude of capital flows, because countries can insure themselves through the accumulation of domestic assets.
To get better risk sharing, capital flows among countries need to be extremely large. In addition, although the ability to default on loans provides state contingency, it restricts international risk
sharing in two ways: higher borrowing rates and future exclusion from international credit markets. Keywords: Financial Integration, Risk Sharing, Globalization, Sovereign Debt