On the Sample Complexity of Adversarial Multi-Source PAC Learning
We study the problem of learning from multiple untrusted data sources, a
scenario of increasing practical relevance given the recent emergence of
crowdsourcing and collaborative learning paradigms. Specifically, we analyze
the situation in which a learning system obtains datasets from multiple
sources, some of which might be biased or even adversarially perturbed. It is
known that in the single-source case, an adversary with the power to corrupt a
fixed fraction of the training data can prevent PAC-learnability, that is, even
in the limit of infinitely much training data, no learning system can approach
the optimal test error. In this work we show that, surprisingly, the same is
not true in the multi-source setting, where the adversary can arbitrarily
corrupt a fixed fraction of the data sources. Our main results are a
generalization bound that provides finite-sample guarantees for this learning
setting, as well as corresponding lower bounds. Besides establishing
PAC-learnability, our results also show that in a cooperative learning setting,
sharing data with other parties has provable benefits, even if some
participants are malicious.
Comment: International Conference on Machine Learning (ICML) 2020:
Camera-ready. Strengthened the definition of adversarial PAC-learnability,
added explicit bounds on sample complexity
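The intuition behind source-level corruption can be sketched on a much simpler problem than the one in the abstract. The example below is an illustrative assumption, not the paper's algorithm or its bounds: it estimates a scalar mean from several sources, where a minority of sources is fully replaced by an adversary, and compares naive pooling with a median over per-source means. The function names and the toy data are hypothetical.

```python
import statistics

def pooled_mean(sources):
    """Mean over all samples, ignoring which source each sample came from."""
    values = [x for src in sources for x in src]
    return sum(values) / len(values)

def median_of_source_means(sources):
    """Median of the per-source means; a minority of bad sources cannot move it far."""
    return statistics.median(sum(src) / len(src) for src in sources)

# Toy data (assumed for illustration): five sources, true mean 0.5.
clean = [
    [0.4, 0.6, 0.5, 0.5],
    [0.5, 0.5, 0.6, 0.4],
    [0.45, 0.55, 0.5, 0.5],
]
# Two of the five sources are arbitrarily corrupted by the adversary.
corrupted = [[100.0] * 4, [100.0] * 4]
sources = clean + corrupted

naive = pooled_mean(sources)              # dragged far away from 0.5
robust = median_of_source_means(sources)  # stays at the clean value
```

Knowing which source each sample came from is what makes the robust estimate possible here; the pooled estimate discards that structure.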
Providing behaviour awareness in collaborative project courses
Several studies show that awareness mechanisms can help enhance the collaboration process among students and their learning experiences during collaborative project courses. However, it is not clear what awareness information should be provided to whom, when it should be provided, and how to obtain and represent such information in an accurate and understandable way. Despite the research efforts in this area, the problem remains open. Recognizing the diversity of work scenarios (contexts) in which collaboration may occur, this research proposes a behaviour awareness mechanism to support collaborative work in undergraduate project courses. Based on the authors' previous experiences and the literature in the area, the proposed mechanism considers personal and social awareness components, which represent metrics visually, helping students assess their performance and lecturers intervene when needed. The trustworthiness of the mechanisms for determining the metrics was verified using empirical data, and the usability and usefulness of these metrics were evaluated with undergraduate students. Experimental results show that this awareness mechanism is useful, understandable and representative of the observed scenarios. Peer Reviewed. Postprint (published version)
A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering
We formulate weighted graph clustering as a prediction problem: given a
subset of edge weights we analyze the ability of graph clustering to predict
the remaining edge weights. This formulation enables practical and theoretical
comparison of different approaches to graph clustering as well as comparison of
graph clustering with other possible ways to model the graph. We adapt the
PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009)
to derive a PAC-Bayesian generalization bound for graph clustering. The bound
shows that graph clustering should optimize a trade-off between empirical data
fit and the mutual information that clusters preserve on the graph nodes. A
similar trade-off derived from information-theoretic considerations was already
shown to produce state-of-the-art results in practice (Slonim et al., 2005;
Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by
providing a better theoretical foundation, suggesting formal generalization
guarantees, and offering a more accurate way to deal with finite sample issues.
We derive a bound minimization algorithm and show that it provides good results
in real-life problems and that the derived PAC-Bayesian bound is reasonably
tight.
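The prediction view of graph clustering described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' bound-minimization algorithm: it fits one mean weight per pair of clusters on the observed edges and scores a clustering by its squared error on held-out edges. It omits the mutual-information term of the trade-off entirely, and all function names and the toy graph are hypothetical.

```python
from collections import defaultdict

def fit_block_means(observed, assignment):
    """Average observed edge weight for each unordered pair of clusters."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (i, j), w in observed.items():
        key = tuple(sorted((assignment[i], assignment[j])))
        sums[key] += w
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def predict(edge, assignment, block_means, default=0.0):
    """Predict an edge weight from the clusters of its two endpoints."""
    i, j = edge
    key = tuple(sorted((assignment[i], assignment[j])))
    # Cluster pairs never seen in the observed edges fall back to `default`.
    return block_means.get(key, default)

def held_out_mse(held_out, assignment, block_means):
    """Mean squared prediction error on edges that were not used for fitting."""
    errors = [(predict(e, assignment, block_means) - w) ** 2
              for e, w in held_out.items()]
    return sum(errors) / len(errors)

# Toy graph (assumed): nodes 0-2 are tightly linked, as are nodes 3-5.
observed = {(0, 1): 1.0, (1, 2): 1.0, (3, 4): 1.0, (0, 3): 0.0, (2, 5): 0.0}
held_out = {(0, 2): 1.0, (4, 5): 1.0, (1, 4): 0.0}

good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # matches the two groups
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}   # mixes the two groups

mse_good = held_out_mse(held_out, good, fit_block_means(observed, good))
mse_bad = held_out_mse(held_out, bad, fit_block_means(observed, bad))
```

Scoring on held-out edges is what allows different clusterings, or different ways of modelling the graph, to be compared on the same footing.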
Collaborative Development of Open Educational Resources for Open and Distance Learning
Open and distance learning (ODL) is mostly characterised by the up-front development of self-study educational resources that have to be paid for over time through use with larger student cohorts (typically in the hundreds per annum) than for conventional face-to-face classes. This different level of up-front investment in educational resources, and increasing pressures to utilise more expensive formats such as rich media, mean that collaborative development is necessary, first to make use of diverse professional skills and second to defray these costs across institutions. The Open University (OU) has over 40 years of experience of using multi-professional course teams to develop courses; of working with a wide range of other institutions to develop educational resources; and of licensing use of its educational resources to other HEIs. Many of these arrangements require formal contracts to work properly and to clearly identify IPR and partner responsibilities. With the emergence of open educational resources (OER) through the use of open licences, the OU and other institutions have now been able to experiment with new ways of collaborating on the development of educational resources that are not so dependent on tight legal contracts, because each partner is effectively granting rights to the others to use the educational resources they supply through the open licensing (Lane, 2011; Van Dorp and Lane, 2011). This set of case studies examines the many different collaborative models used for developing and using educational resources and explains how open licensing is making it easier to share the effort involved in developing educational resources between institutions, as well as how it may enable new institutions to start up open and distance learning programmes more easily and at less initial cost.
Thus it looks at three initiatives involving people from the OU (namely TESSA, LECH-e, openED2.0) and contrasts these with the Peer-2-Peer University and the OER University as exemplars of how OER may change some of the fundamental features of open and distance learning in a Web 2.0 world. It concludes that, while there may be multiple reasons and models for collaborating on the development of educational resources, the very openness provided by the open licensing aligns not only with general academic values and practice but also with well-established principles of open innovation in businesses.