    Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

    Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered an extension of BBQ, a recently proposed calibration method, as well as of the commonly used calibration method based on isotonic regression. ENIR is designed to address the key limitation of isotonic regression, which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be combined with many existing classification models. We demonstrate the performance of ENIR on synthetic and real datasets for the commonly used binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular, on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in O(N log N) time, where N is the number of samples.

    Detection of radioactive material entering national ports: A Bayesian approach to radiation portal data

    Given the potential for illicit nuclear material being used for terrorism, most ports now inspect a large number of goods entering national borders for radioactive cargo. The U.S. Department of Homeland Security is moving toward one hundred percent inspection of all containers entering the U.S. at various ports of entry for nuclear material. We propose a Bayesian classification approach for the real-time data collected by the inline Polyvinyl Toluene radiation portal monitors. We study the computational and asymptotic properties of the proposed method and demonstrate its efficacy in simulations. Given data available to the authorities, it should be feasible to implement this approach in practice. Comment: Published at http://dx.doi.org/10.1214/10-AOAS334 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org