An experimental investigation of calibration techniques for imbalanced data
Calibration is a technique used to obtain accurate probability estimates for classification problems in real applications. Class imbalance can pose considerable challenges to calibration methods in obtaining accurate probabilities, yet previous research has paid little attention to this issue. In this paper, we present an experimental investigation of several prevailing calibration methods under different imbalance scenarios. Several performance metrics are used to evaluate different aspects of calibration performance. The experimental results show that the performance of different calibration techniques depends on the metric and on the degree of imbalance. Isotonic Regression has better overall performance on imbalanced datasets than parametric and other, more complex non-parametric methods; however, its performance is unstable in highly imbalanced scenarios. This study provides insights into calibration methods on imbalanced datasets and can serve as a reference for the future development of calibration methods in class-imbalance scenarios.
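The comparison the abstract describes can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic dataset with roughly 10% positives; the imbalance ratio, base model, and data are illustrative assumptions, not the paper's setup:

```python
# Post-hoc calibration on an imbalanced binary problem: isotonic regression
# vs. Platt scaling ("sigmoid"), compared via the Brier score.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Imbalanced data: ~90% negatives, ~10% positives (an assumed ratio).
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

base = LogisticRegression(max_iter=1000)
iso = CalibratedClassifierCV(base, method="isotonic", cv=5).fit(X_tr, y_tr)
sig = CalibratedClassifierCV(base, method="sigmoid", cv=5).fit(X_tr, y_tr)

# Lower Brier score = better-calibrated probabilities.
for name, model in [("isotonic", iso), ("sigmoid", sig)]:
    p = model.predict_proba(X_te)[:, 1]
    print(f"{name}: Brier = {brier_score_loss(y_te, p):.4f}")
```

The Brier score is only one of the metrics such a study would consider; which method wins can vary with the metric and the imbalance ratio, as the abstract notes.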
Classifier Calibration: A survey on how to assess and improve predicted class probabilities
This paper provides both an introduction to and a detailed overview of the
principles and practice of classifier calibration. A well-calibrated classifier
correctly quantifies the level of uncertainty or confidence associated with its
instance-wise predictions. This is essential for critical applications, optimal
decision making, cost-sensitive classification, and for some types of context
change. Calibration research has a rich history which predates the birth of
machine learning as an academic field by decades. However, a recent increase in
interest in calibration has led to new methods and the extension from
binary to the multiclass setting. The space of options and issues to consider
is large, and navigating it requires the right set of concepts and tools. We
provide both introductory material and up-to-date technical details of the main
concepts and methods, including proper scoring rules and other evaluation
metrics, visualisation approaches, a comprehensive account of post-hoc
calibration methods for binary and multiclass classification, and several
advanced topics.
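Two of the evaluation tools such a survey covers, a proper scoring rule and a binned calibration-error estimate, can be sketched as follows; the bin count and the synthetic data are illustrative assumptions:

```python
# Brier score (a proper scoring rule) and a binned expected calibration
# error (ECE) for binary probability predictions.
import numpy as np


def brier_score(probs, labels):
    # Mean squared difference between predicted probability and outcome.
    return float(np.mean((probs - labels) ** 2))


def expected_calibration_error(probs, labels, n_bins=10):
    # Partition [0, 1] into equal-width confidence bins, then take the
    # bin-weighted average gap between mean confidence and empirical accuracy.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return float(ece)


rng = np.random.default_rng(0)
probs = rng.uniform(size=1000)
# Labels drawn with probability equal to the prediction: calibrated by construction.
labels = (rng.uniform(size=1000) < probs).astype(float)
print(brier_score(probs, labels), expected_calibration_error(probs, labels))
```

Because the labels are sampled to match the predicted probabilities, the ECE here stays close to zero; a miscalibrated model would push it up.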
Bootstrap Confidence Regions for Optimal Operating Conditions in Response Surface Methodology
This article concerns the application of bootstrap methodology to construct a likelihood-based confidence region for operating conditions associated with the maximum of a response surface constrained to a specified region. Unlike classical methods based on the stationary point, proper interpretation of this confidence region does not depend on unknown model parameters. In addition, the methodology does not require the assumption of normally distributed errors. The approach is demonstrated for concave-down and saddle system cases in two dimensions. Simulation studies were performed to assess the coverage probability of these regions.
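The general bootstrap idea behind the article can be sketched in one dimension: resample residuals, refit a quadratic response surface, and collect the constrained maximiser. This uses a simple percentile interval as a stand-in for the article's likelihood-based region, and the model, noise level, and constraint region are all illustrative assumptions:

```python
# Residual bootstrap for the location of the constrained maximum of a
# fitted quadratic response surface (concave-down, 1-D case).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 60)
y = 3.0 - (x - 0.5) ** 2 + rng.normal(scale=0.3, size=x.size)  # true optimum at 0.5


def fit_optimum(xv, yv, lo=-2.0, hi=2.0):
    # np.polyfit returns coefficients highest degree first: [c2, c1, c0].
    c2, c1, _ = np.polyfit(xv, yv, 2)
    return float(np.clip(-c1 / (2.0 * c2), lo, hi))  # vertex, clipped to region


coef = np.polyfit(x, y, 2)
fitted = np.polyval(coef, x)
resid = y - fitted

# Resample residuals, refit, and record the optimiser each time.
opts = [
    fit_optimum(x, fitted + rng.choice(resid, size=resid.size, replace=True))
    for _ in range(1000)
]
lo_ci, hi_ci = np.percentile(opts, [2.5, 97.5])
print(f"95% percentile bootstrap interval for the optimum: [{lo_ci:.2f}, {hi_ci:.2f}]")
```

The interval is read directly from the bootstrap distribution of the optimiser, so its interpretation does not hinge on unknown model parameters, mirroring the motivation stated above.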
AMS 2000 subj Classification: 62F25, 62F40, 62F30, 62J05.
Key words: Stationary point; Kernel density estimator; Boundary kernel
On The Rise of Health Spending and Longevity
We use a calibrated stochastic life-cycle model of endogenous health spending, asset accumulation and retirement to investigate the causes behind the increase in health spending and life expectancy over the period 1965-2005. We estimate that technological change, along with the increase in the generosity of health insurance, may independently explain 53% of the rise in health spending (insurance 29% and technology 24%), while income explains less than 10%. Because these changes occurred simultaneously over this period, they may have led to a "synergy" or interaction effect which helps explain an additional 37% increase in health spending. We estimate that technological change, taking the form of increased productivity at an annual rate of 1.8%, explains 59% of the rise in life expectancy at age 50 over this period, while insurance and income explain less than 10%.
Keywords: demand for health, health spending, insurance, technological change, longevity
The SuperCOSMOS Sky Survey. Paper II: Image detection, parameterisation, classification and photometry
In this, the second in a series of three papers concerning the SuperCOSMOS
Sky Survey, we describe the methods for image detection, parameterisation,
classification and photometry. We demonstrate the internal and external
accuracy of our object parameters. Using examples from the first release of
data, the South Galactic Cap survey, we show that our image detection
completeness is close to 100% to within 1.5 mag of the nominal plate limits. We
show that for the Bj survey data, the image classification is externally > 99%
reliable to Bj = 19.5. Internally, the image classification is reliable at a
level of > 90% to Bj=21, R=19. The photometric accuracy of our data is
typically 0.3 mag with respect to external data for m > 15. Internally, the
relative photometric accuracy in restricted position and magnitude ranges can
be as accurate as 5% for well exposed stellar images. Colours (B-R or R-I) are
externally accurate to 0.07 mag at Bj = 16.5, rising to 0.16 mag at Bj = 20.
Comment: 22 pages, 16 figures; accepted for publication in MNRAS
Improving acoustic vehicle classification by information fusion
We present an information fusion approach for ground vehicle classification based on the emitted acoustic signal. Many acoustic factors can contribute to the classification accuracy of working ground vehicles. Classification relying on a single feature set may lose useful information if its underlying sound production model is not comprehensive. To improve classification accuracy, we consider an information fusion diagram, in which various aspects of an acoustic signature are taken into account and emphasized separately by two different feature extraction methods. The first set of features aims to represent internal sound production, and a number of harmonic components are extracted to characterize the factors related to the vehicle's resonance. The second set of features is extracted based on a computationally effective discriminatory analysis, and a group of key frequency components are selected by mutual information, accounting for the sound production from the vehicle's exterior parts. In correspondence with this structure, we further put forward a modified Bayesian fusion algorithm, which takes advantage of matching each specific feature set with its favored classifier. To assess the proposed approach, experiments are carried out on a data set containing acoustic signals from different types of vehicles. Results indicate that the fusion approach can effectively increase classification accuracy compared to that achieved using each individual feature set alone. The Bayesian-based decision-level fusion is found to improve on a feature-level fusion approach.
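The decision-level fusion step can be sketched with the standard product rule, a common Bayesian fusion baseline; the paper's modified algorithm additionally matches each feature set with its favored classifier, which is omitted here, and the class priors and posteriors below are invented for illustration:

```python
# Decision-level fusion of two classifiers' posteriors by the product rule,
# which assumes the two feature sets are conditionally independent given the class.
import numpy as np


def product_rule_fusion(post_a, post_b, priors):
    # post_a, post_b: per-classifier posteriors P(class | features), shape (n_classes,)
    # Fused posterior is proportional to P(c|x_a) * P(c|x_b) / P(c).
    fused = post_a * post_b / priors
    return fused / fused.sum()


priors = np.array([0.5, 0.3, 0.2])          # three vehicle classes (illustrative)
harmonic_post = np.array([0.6, 0.3, 0.1])   # classifier on harmonic features
spectral_post = np.array([0.5, 0.1, 0.4])   # classifier on key-frequency features

fused = product_rule_fusion(harmonic_post, spectral_post, priors)
print(fused, int(fused.argmax()))  # class 0 wins: both classifiers favor it
```

Here the two classifiers agree on class 0, so fusion sharpens that decision; when they disagree, the division by the prior keeps a rare class from being unfairly penalized twice.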