We study the supervised learning paradigm called Learning Using Privileged
Information, first suggested by Vapnik and Vashist (2009). In this paradigm, in
addition to the examples and labels, additional (privileged) information is
provided only for training examples. The goal is to use this information to
improve the classification accuracy of the resulting classifier, where this
classifier can only use the non-privileged information of new instances
to predict their labels. We study the theory of privileged learning with the
zero-one loss under the natural Privileged ERM algorithm proposed in Pechyony
and Vapnik (2010a). We provide a counterexample to a claim made in that work
regarding the VC dimension of the loss class induced by this problem; we
conclude that the claim is incorrect. We then provide a correct VC dimension
analysis which gives both lower and upper bounds on the capacity of the
Privileged ERM loss class. We further show, via a generalization analysis, that
worst-case guarantees for Privileged ERM cannot improve over standard
non-privileged ERM, unless the capacity of the privileged information is
similar to or smaller than that of the non-privileged information. This result
points to an important limitation of the Privileged ERM approach. In our
closing discussion, we suggest another way in which Privileged ERM might still
be helpful, even when the capacity of the privileged information is large.

Comment: AISTATS 202