Doubly Optimized Calibrated Support Vector Machine (DOC-SVM): an algorithm for joint optimization of discrimination and calibration.

Author: Jiang Xiaoqian
Kim Jihoon
Menon Aditya
Ohno-Machado Lucila
Wang Shuang
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Historically, probabilistic models for decision support have focused on discrimination, e.g., minimizing the ranking error of predicted outcomes. Unfortunately, these models ignore another important aspect, calibration, which indicates the magnitude of correctness of model predictions. Using discrimination and calibration simultaneously can be helpful for many clinical decisions. We investigated tradeoffs between these goals and developed a unified maximum-margin method to handle them jointly. Our approach, called Doubly Optimized Calibrated Support Vector Machine (DOC-SVM), concurrently optimizes two loss functions: the ridge regression loss and the hinge loss. Experiments using three breast cancer gene-expression datasets (i.e., GSE2034, GSE2990, and Chanrion's dataset) showed that our model generated better-calibrated outputs than other state-of-the-art models such as Support Vector Machine (p=0.03, p=0.13, and p<0.001) and Logistic Regression (p=0.006, p=0.008, and p<0.001). DOC-SVM also demonstrated better discrimination (i.e., higher AUCs) than Support Vector Machine (p=0.38, p=0.29, and p=0.047) and Logistic Regression (p=0.38, p=0.04, and p<0.0001). DOC-SVM produced a model that was better calibrated without sacrificing discrimination, and hence may be helpful in clinical decision making.
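
The idea of optimizing a hinge loss and a ridge (squared-error) loss together can be illustrated with a minimal sketch. This is not the published DOC-SVM formulation: the objective below, the weighting parameter `alpha`, the penalty `lam`, the plain subgradient-descent solver, and the synthetic data are all assumptions chosen only to show how the two terms can share one linear model. Labels are coded as -1 and +1.

```python
import random

def score(w, b, x):
    """Linear decision value f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def joint_loss(w, b, X, y, alpha=1.0, lam=0.01):
    """Mean hinge loss + alpha * mean squared error + L2 penalty (illustrative)."""
    total = 0.0
    for xi, yi in zip(X, y):
        f = score(w, b, xi)
        total += max(0.0, 1.0 - yi * f) + alpha * (f - yi) ** 2
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)

def fit(X, y, alpha=1.0, lam=0.01, lr=0.05, epochs=200):
    """Full-batch subgradient descent on the joint objective above."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            f = score(w, b, xi)
            # Hinge subgradient is -y inside the margin, 0 outside;
            # squared-error gradient is 2 * alpha * (f - y).
            g = (-yi if yi * f < 1.0 else 0.0) + 2.0 * alpha * (f - yi)
            for j in range(d):
                gw[j] += g * xi[j] / n
            gb += g / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def make_toy_data(n=100, seed=0):
    """Two Gaussian classes separated along the first feature (synthetic)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

X, y = make_toy_data()
w, b = fit(X, y)
```

In this sketch the hinge term pushes examples to the correct side of the margin (discrimination), while the squared-error term pulls each score toward its label value, which is one simple way to discourage overconfident outputs. Setting `alpha=0.0` reduces the objective to a plain linear SVM with an L2 penalty.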


Learning salience amoung [sic] features through contingency in the CEL framework

Author: Granger R. H., Jr.
Schlimmer Jeffrey C.
Publication venue: eScholarship, University of California
Publication date: 01/01/1985

Determining which features in an environment are salient given a task (salience assignment) is a central problem in Machine Learning. A related phenomenon, contingency (the conditions under which relative salience among environmental features is acquired), is central to learning and memory in animal psychology. This paper presents an analysis of a set of empirical data on contingency and an algorithm for the salience assignment problem. The algorithm is implemented in a working computer program that interacts with a simulated environment to produce contingent associative learning corresponding to relevant behavioral data. The model also makes specific empirical predictions that can be experimentally tested.


Multivariate Bayesian semiparametric models for authentication of food and beverages

Author: Gutiérrez Luis
Quintana Fernando A.
Publication venue: Institute of Mathematical Statistics
Publication date: 27/02/2012

Food and beverage authentication is the process by which foods or beverages are verified as complying with their label description, for example, verifying whether the denomination of origin of an olive oil bottle is correct or whether the variety of a certain bottle of wine matches its label description. The common way to deal with an authentication process is to measure a number of attributes on samples of food and then use these as input for a classification problem. Our motivation stems from data consisting of measurements of nine chemical compounds known as anthocyanins, obtained from samples of Chilean red wines of grape varieties Cabernet Sauvignon, Merlot and Carménère. We consider a model-based approach to authentication through a semiparametric multivariate hierarchical linear mixed model for the mean responses, and covariance matrices that are specific to the classification categories. Specifically, we propose a model of the ANOVA-DDP type, which takes advantage of the fact that the available covariates are discrete in nature. The results suggest that the model performs well compared to other parametric alternatives. This is also corroborated by application to simulated data.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS492 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
