
    Surrogate regret bounds for generalized classification performance metrics

    We consider optimization of generalized performance metrics for binary classification by means of surrogate losses. We focus on a class of metrics that are linear-fractional functions of the false positive and false negative rates (examples include the $F_\beta$-measure, the Jaccard similarity coefficient, the AM measure, and many others). Our analysis concerns the following two-step procedure. First, a real-valued function $f$ is learned by minimizing a surrogate loss for binary classification on the training sample. It is assumed that the surrogate loss is a strongly proper composite loss function (examples include logistic loss, squared-error loss, exponential loss, etc.). Then, given $f$, a threshold $\widehat{\theta}$ is tuned on a separate validation sample by direct optimization of the target performance metric. We show that the regret of the resulting classifier (obtained by thresholding $f$ at $\widehat{\theta}$), measured with respect to the target metric, is upper-bounded by the regret of $f$ measured with respect to the surrogate loss. We also extend our results to cover multilabel classification and provide regret bounds for micro- and macro-averaging measures. Our findings are further analyzed in a computational study on both synthetic and real data sets. Comment: 22 pages.
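    A minimal sketch of the two-step procedure described in the abstract, assuming scikit-learn and NumPy, with logistic loss standing in for the strongly proper composite surrogate and the $F_\beta$-measure as the target metric. The function name, the data arrays (X_train, y_train, X_val, y_val), and the exhaustive threshold search are illustrative placeholders, not the authors' implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import fbeta_score

    def train_and_tune_threshold(X_train, y_train, X_val, y_val, beta=1.0):
        # Step 1: learn a real-valued scorer f by minimizing a surrogate loss
        # (here: logistic loss, a strongly proper composite loss).
        model = LogisticRegression().fit(X_train, y_train)
        scores = model.predict_proba(X_val)[:, 1]  # f evaluated on the validation sample

        # Step 2: tune the threshold on the separate validation sample by
        # directly optimizing the target metric (here: the F_beta-measure).
        best_theta, best_metric = 0.5, -np.inf
        for theta in np.unique(scores):
            y_pred = (scores >= theta).astype(int)
            metric = fbeta_score(y_val, y_pred, beta=beta)
            if metric > best_metric:
                best_theta, best_metric = theta, metric
        return model, best_theta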

    Sparse Learning over Infinite Subgraph Features

    We present a supervised learning algorithm for graph data (a set of graphs) that handles arbitrary twice-differentiable loss functions and sparse linear models over all possible subgraph features. To date, it has been shown that several types of sparse learning, such as AdaBoost, LPBoost, LARS/LASSO, and sparse PLS regression, can be performed over all possible subgraph features. Particular emphasis is placed on simultaneous learning of relevant features from an infinite set of candidates. We first generalize techniques used in all these preceding studies to derive a unifying bounding technique for arbitrary separable functions. We then carefully use this bound to make block coordinate gradient descent feasible over infinite subgraph features, resulting in a fast-converging algorithm that can solve a wider class of sparse learning problems over graph data. We also empirically study how our method differs from existing approaches in convergence behavior, selected subgraph features, and search-space size. We further discuss several previously unnoticed issues in sparse learning over all possible subgraph features. Comment: 42 pages, 24 figures, 4 tables.
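    An illustrative sketch only, not the paper's algorithm: L1-regularized coordinate descent for a sparse linear model over a toy, pre-materialized matrix of binary subgraph-indicator features, with squared-error loss standing in for an arbitrary twice-differentiable loss. In the paper the feature set is infinite, and the coordinate-selection step below (an explicit argmax over gradients) is instead made feasible by pruning the subgraph-pattern search tree with the bounding technique for separable functions; the function name and data layout here are assumptions for illustration.

    import numpy as np

    def sparse_subgraph_coordinate_descent(Phi, y, lam=0.1, lr=0.1, n_iter=200):
        # Phi: (n_graphs, n_features) 0/1 matrix; Phi[i, j] = 1 iff subgraph feature j
        # occurs in graph i. y: real-valued targets. Returns a sparse weight vector.
        n, d = Phi.shape
        w = np.zeros(d)
        for _ in range(n_iter):
            grad = Phi.T @ (Phi @ w - y) / n   # gradient of the squared-error loss
            j = int(np.argmax(np.abs(grad)))   # coordinate selection (pruned pattern search in the paper)
            w_j = w[j] - lr * grad[j]          # gradient step on the chosen coordinate
            w[j] = np.sign(w_j) * max(abs(w_j) - lr * lam, 0.0)  # soft-thresholding keeps w sparse
        return w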

    Sample Complexity Bounds on Differentially Private Learning via Communication Complexity

    In this work we analyze the sample complexity of classification by differentially private algorithms. Differential privacy is a strong and well-studied notion of privacy, introduced by Dwork et al. (2006), that ensures that the output of an algorithm leaks little information about the data point provided by any of the participating individuals. The sample complexity of private PAC and agnostic learning was studied in a number of prior works starting with (Kasiviswanathan et al., 2008), but a number of basic questions remain open, most notably whether learning with privacy requires more samples than learning without privacy. We show that the sample complexity of learning with (pure) differential privacy can be arbitrarily higher than the sample complexity of learning without the privacy constraint or the sample complexity of learning with approximate differential privacy. Our second contribution, and the main tool, is an equivalence between the sample complexity of (pure) differentially private learning of a concept class $C$ (denoted $SCDP(C)$) and the randomized one-way communication complexity of the evaluation problem for concepts from $C$. Using this equivalence we prove the following bounds: 1. $SCDP(C) = \Omega(LDim(C))$, where $LDim(C)$ is Littlestone's (1987) dimension, which characterizes the number of mistakes in the online mistake-bound learning model. Known bounds on $LDim(C)$ then imply that $SCDP(C)$ can be much higher than the VC dimension of $C$. 2. For any $t$, there exists a class $C$ such that $LDim(C)=2$ but $SCDP(C) \geq t$. 3. For any $t$, there exists a class $C$ such that the sample complexity of (pure) $\alpha$-differentially private PAC learning is $\Omega(t/\alpha)$ but the sample complexity of the relaxed $(\alpha,\beta)$-differentially private PAC learning is $O(\log(1/\beta)/\alpha)$. This resolves an open problem of Beimel et al. (2013b). Comment: Extended abstract appears in the Conference on Learning Theory (COLT) 2014.
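    As a concrete illustration of the first bound (a standard example, stated here for context and not taken from the abstract): the class of threshold functions over a linearly ordered domain of size $2^t$ has constant VC dimension but Littlestone dimension $t$, so pure differentially private learning of it already requires $\Omega(t)$ samples even though non-private learning needs only a constant number.

    \[
      C = \{\, c_a : a \in \{1,\dots,2^t\} \,\}, \qquad c_a(x) = \mathbb{1}[x \ge a],
    \]
    \[
      \mathrm{VC}(C) = 1, \qquad LDim(C) = t
      \;\Longrightarrow\; SCDP(C) = \Omega(LDim(C)) = \Omega(t).
    \]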