2 research outputs found

    The Teaching Dimension of Linear Learners

    Teaching dimension is a learning-theoretic quantity that specifies the minimum training set size needed to teach a target model to a learner. Previous studies of teaching dimension focused on version-space learners, which maintain all hypotheses consistent with the training data; those results do not apply to modern machine learners, which select a specific hypothesis via optimization. This paper presents the first known teaching dimensions for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners.
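    To make the flavor of this result concrete, here is a minimal sketch of teaching a homogeneous ridge regression learner with a single example. It is not taken from the paper; the objective normalization, the scaling constant c, and the specific target w_star below are assumptions chosen for illustration.

```python
import numpy as np

# Sketch (assumed setup, not the paper's exact construction): the learner solves
#   min_w  (w . x - y)^2 + lam * ||w||^2
# for a single example (x, y), whose optimum satisfies (x x^T + lam I) w = y x.
# Choosing x = c * w_star and y = c * ||w_star||^2 + lam / c makes w_star the
# unique minimizer, so one example suffices to teach the target.

lam = 0.1                              # ridge regularization strength (assumed)
w_star = np.array([2.0, -1.0, 0.5])    # target model to teach (assumed)
c = 1.0                                # free, nonzero scaling constant

x = c * w_star
y = c * np.dot(w_star, w_star) + lam / c

# Learner: closed-form ridge solution for the single example.
A = np.outer(x, x) + lam * np.eye(len(w_star))
w_learned = np.linalg.solve(A, y * x)

print(w_learned)                       # matches w_star up to numerical error
assert np.allclose(w_learned, w_star)
```

    The design choice here is that the teacher exploits the learner's first-order optimality condition directly: any nonzero c works, so the teaching set of size one is far from unique.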

    Teaching and compressing for low VC-dimension

    In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching, namely the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let $C$ be a binary concept class of size $m$ and VC-dimension $d$. Prior to this work, the best known upper bounds for both parameters were $\log(m)$, while the best lower bounds are linear in $d$. We present significantly better upper bounds on both, as follows. Set $k = O(d\, 2^d \log \log |C|)$. We show that there always exists a concept $c$ in $C$ with a teaching set (i.e. a list of $c$-labeled examples uniquely identifying $c$ in $C$) of size $k$. This problem was studied by Kuhlmann (1999). Our construction implies that the recursive teaching (RT) dimension of $C$ is at most $k$ as well. The RT-dimension was suggested by Zilles et al. and Doliwa et al. (2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (2013). An upper bound on this parameter that depends only on $d$ is known just for the very simple case $d = 1$, and is open even for $d = 2$. We also make small progress towards this seemingly modest goal. We further construct sample compression schemes of size $k$ for $C$, with additional information of $k \log(k)$ bits. Roughly speaking, given any list of $C$-labelled examples of arbitrary length, we can retain only $k$ labeled examples in a way that allows us to recover the labels of all other examples in the list, using $k \log(k)$ additional information bits. This problem was first suggested by Littlestone and Warmuth (1986).
    Comment: The final version is due to be published in the collection of papers "A Journey through Discrete Mathematics. A Tribute to Jiri Matousek", edited by Martin Loebl, Jaroslav Nesetril and Robin Thomas, published by Springer.
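    As an illustration of the teaching-set definition used above, the following brute-force sketch computes minimal teaching sets for a toy concept class. It is purely definitional: the exhaustive search is exponential and does not implement the paper's $O(d\,2^d \log\log|C|)$ construction, and the threshold class used is my own example.

```python
from itertools import combinations

# A teaching set for concept c in class C is a list of c-labeled examples
# that uniquely identifies c within C. This exhaustive search makes the
# definition concrete on a toy class; it is not the paper's algorithm.

def teaching_set(c, C, domain):
    """Smallest set of examples whose c-labels identify c uniquely in C."""
    for size in range(len(domain) + 1):
        for S in combinations(domain, size):
            # Concepts agreeing with c on every example in S.
            consistent = [h for h in C if all(h[x] == c[x] for x in S)]
            if consistent == [c]:
                return [(x, c[x]) for x in S]
    return None  # unreachable when c is in C and concepts are distinct

# Toy class over domain {0, 1, 2}: thresholds "x >= t" for t = 0..3.
domain = list(range(3))
C = [tuple(int(x >= t) for x in domain) for t in range(4)]

for c in C:
    print(c, "->", teaching_set(c, C, domain))
```

    On this class every concept has a teaching set of size at most two, consistent with its VC-dimension of one.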