    Solving Multiclass Learning Problems via Error-Correcting Output Codes

    Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form [x_i, f(x_i)]. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems. Comment: See http://www.jair.org/ for any accompanying file.
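    To make the idea of an error-correcting distributed output representation concrete, here is a minimal sketch. It assumes each class is assigned a binary codeword, one binary learner predicts each bit, and the predicted bit string is decoded to the class with the nearest codeword in Hamming distance. The abstract does not give these details; the 4-class, 7-bit code matrix and the names `CODE_MATRIX`, `hamming`, `decode`, and `min_distance` are illustrative assumptions, and the binary learners themselves are left out.

```python
# Hypothetical 7-bit codewords for k = 4 classes (an illustrative code matrix,
# not taken from the abstract). Each column would be learned by one binary learner.
CODE_MATRIX = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODE_MATRIX, key=lambda label: hamming(CODE_MATRIX[label], bits))


def min_distance():
    """Smallest Hamming distance between any two distinct codewords."""
    labels = list(CODE_MATRIX)
    return min(
        hamming(CODE_MATRIX[a], CODE_MATRIX[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    )


if __name__ == "__main__":
    # Suppose the seven binary learners output C's codeword with the last bit wrong.
    noisy = (0, 0, 1, 1, 0, 0, 0)
    print(decode(noisy), min_distance())
```

    With a minimum distance of d between codewords, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong bits, which is why a single mistaken binary learner need not change the predicted class in this sketch.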

    Weblogs in Higher Education - Why Do Students (Not) Blog?

    Positive impacts on learning through blogging, such as active knowledge construction and reflective writing, have been reported. However, not many students use weblogs in informal contexts, even when appropriate facilities are offered by their universities. While motivations for blogging have been subject to empirical studies, little research has addressed the issue of why students choose not to blog. This paper presents an empirical study undertaken to gain insights into the decision-making process of students when deciding whether to keep a blog or not. A better understanding of students' motivations for (not) blogging may help decision makers at universities in the process of selecting, introducing, and maintaining similar services. As informal learning gains increased recognition, results of this study can help to advance appropriate designs of informal learning contexts in Higher Education. The method of ethnographic decision tree modelling was applied in an empirical study conducted at the Vienna University of Technology, Austria. Since 2004, the university has been offering free weblog accounts for all students and staff members upon entering school, not bound to any course or exam. Qualitative, open interviews were held with 3 active bloggers, 3 former bloggers, and 3 non-bloggers to elicit their decision criteria. Decision tree models were developed out of the interviews. It turned out that the modelling worked best when splitting the decision process into two parts: one model representing decisions on whether to start a weblog at all, and a second model representing criteria on whether to continue with a weblog once it was set up. The models were tested for their validity through questionnaires developed out of the decision tree models. Thirty questionnaires were distributed to bloggers, former bloggers, and non-bloggers. Results show that the main reasons for students not to keep a weblog include a preference for direct (online) communication and concerns about the loss of privacy through blogging. Furthermore, the results indicate that intrinsic motivation factors keep students blogging, whereas stopping a weblog is mostly attributable to external factors.

    Secure Two-Party Protocol for Privacy-Preserving Classification via Differential Privacy

    Privacy-preserving distributed data mining is the study of mining on distributed data, owned by multiple data owners, in a non-secure environment, where the mining protocol does not reveal any sensitive information to the data owners, individual privacy is preserved, and the output mining model is practically useful. In this thesis, we propose a secure two-party protocol for building a privacy-preserving decision tree classifier over distributed data using differential privacy. We utilize secure multiparty computation to ensure that the protocol is privacy-preserving. Our algorithm also utilizes parallel and sequential compositions, and applies a distributed exponential mechanism to ensure that the output is differentially private. We implemented our protocol in a distributed environment on real-life data, and the experimental results show that the protocol produces decision tree classifiers with high utility while being reasonably efficient and scalable.
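    As a rough illustration of the exponential mechanism mentioned above, the sketch below shows the standard single-party form: a candidate is sampled with probability proportional to exp(epsilon * score / (2 * sensitivity)). It is not the thesis's distributed two-party protocol and omits secure multiparty computation entirely. The function names, the candidate split attributes, and their scores are hypothetical.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity=1.0):
    """Selection probabilities proportional to exp(epsilon * score / (2 * sensitivity)).

    `scores` maps each candidate to its utility score. The maximum score is
    subtracted before exponentiating for numerical stability; this does not
    change the normalized probabilities.
    """
    top = max(scores.values())
    weights = {
        cand: math.exp(epsilon * (score - top) / (2.0 * sensitivity))
        for cand, score in scores.items()
    }
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=random):
    """Sample one candidate according to the exponential-mechanism probabilities."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


if __name__ == "__main__":
    # Hypothetical utility scores for candidate split attributes of a tree node.
    split_scores = {"age": 12.0, "income": 9.0, "zip_code": 3.0}
    print(exponential_mechanism_probs(split_scores, epsilon=1.0))
    print(exponential_mechanism(split_scores, epsilon=1.0, rng=random.Random(0)))
```

    A smaller epsilon flattens the distribution toward uniform (stronger privacy, lower utility), while a larger epsilon concentrates probability on the highest-scoring candidate.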