    On the Sample Complexity of Adversarial Multi-Source PAC Learning

    We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability; that is, even in the limit of infinitely much training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, as well as corresponding lower bounds. Besides establishing PAC-learnability, our results also show that in a cooperative learning setting, sharing data with other parties has provable benefits, even if some participants are malicious. Comment: International Conference on Machine Learning (ICML) 2020: Camera-ready. Strengthened the definition of adversarial PAC-learnability, added explicit bounds on sample complexity

    Providing behaviour awareness in collaborative project courses

    Several studies show that awareness mechanisms can help enhance the collaboration process among students and the learning experience during collaborative project courses. However, it is not clear what awareness information should be provided to whom, when it should be provided, and how to obtain and represent such information in an accurate and understandable way. Regardless of the research efforts made in this area, the problem remains open. By recognizing the diversity of work scenarios (contexts) where the collaboration may occur, this research proposes a behaviour awareness mechanism to support collaborative work in undergraduate project courses. Based on the authors' previous experiences and the literature in the area, the proposed mechanism considers personal and social awareness components, which represent metrics in a visual way, helping students realize their performance and lecturers intervene when needed. The trustworthiness of the mechanisms for determining the metrics was verified using empirical data, and the usability and usefulness of these metrics were evaluated with undergraduate students. Experimental results show that this awareness mechanism is useful, understandable and representative of the observed scenarios. Peer reviewed. Postprint (published version).

    A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

    We formulate weighted graph clustering as a prediction problem: given a subset of edge weights, we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering a more accurate way to deal with finite sample issues. We derive a bound minimization algorithm and show that it provides good results in real-life problems and that the derived PAC-Bayesian bound is reasonably tight.

    Collaborative Development of Open Educational Resources for Open and Distance Learning

    Open and distance learning (ODL) is mostly characterised by the up-front development of self-study educational resources that have to be paid for over time through use with larger student cohorts (typically in the hundreds per annum) than for conventional face-to-face classes. This different level of up-front investment in educational resources, together with increasing pressures to utilise more expensive formats such as rich media, means that collaborative development is necessary, firstly to make use of diverse professional skills and secondly to defray these costs across institutions. The Open University (OU) has over 40 years of experience of using multi-professional course teams to develop courses; of working with a wide range of other institutions to develop educational resources; and of licensing use of its educational resources to other HEIs. Many of these arrangements require formal contracts to work properly and to clearly identify IPR and partner responsibilities. With the emergence of open educational resources (OER) through the use of open licences, the OU and other institutions have now been able to experiment with new ways of collaborating on the development of educational resources that are not so dependent on tight legal contracts, because each partner is effectively granting rights to the others to use the educational resources they supply through the open licensing (Lane, 2011; Van Dorp and Lane, 2011). This set of case studies examines the many different collaborative models used for developing and using educational resources and explains how open licensing is making it easier to share the effort involved in developing educational resources between institutions, as well as how it may enable new institutions to start up open and distance learning programmes more easily and at less initial cost. It looks at three initiatives involving people from the OU (namely TESSA, LECH-e, openED2.0) and contrasts these with the Peer-2-Peer University and the OER University as exemplars of how OER may change some of the fundamental features of open and distance learning in a Web 2.0 world. It concludes that while there may be multiple reasons and models for collaborating on the development of educational resources, the very openness provided by the open licensing aligns both with general academic values and practice and with well-established principles of open innovation in businesses.