2,564 research outputs found
Training Set Debugging Using Trusted Items
Training set bugs are flaws in the data that adversely affect machine
learning. The training set is usually too large for manual inspection, but
one may have the resources to verify a few trusted items. The set of trusted
items may not by itself be adequate for learning, so we propose an algorithm
that uses these items to identify bugs in the training set and thus improves
learning. Specifically, our approach seeks the smallest set of changes to the
training set labels such that the model learned from this corrected training
set predicts labels of the trusted items correctly. We flag the items whose
labels are changed as potential bugs, whose labels can be checked for veracity
by human experts. To find the bugs in this way is a challenging combinatorial
bilevel optimization problem, but it can be relaxed into a continuous
optimization problem. Experiments on toy and real data demonstrate that our
approach can identify training set bugs effectively and suggest appropriate
changes to the labels. Our algorithm is a step toward trustworthy machine
learning.
Comment: AAAI 201
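The combinatorial problem described in this abstract (find the fewest label changes so that the retrained model agrees with the trusted items) can be illustrated with a tiny brute-force sketch. This is not the paper's relaxed continuous method: the 1-nearest-neighbour learner, the function names `nearest_label` and `smallest_label_fix`, and the data are all assumptions chosen only to make the idea concrete.

```python
from itertools import combinations

def nearest_label(x, points, labels):
    """Predict with 1-nearest-neighbour on 1-D points (a stand-in learner)."""
    best = min(range(len(points)), key=lambda i: abs(points[i] - x))
    return labels[best]

def smallest_label_fix(points, labels, trusted):
    """Return the smallest set of indices whose binary labels must be flipped
    so that the learner agrees with every trusted (x, y) pair."""
    n = len(points)
    for k in range(n + 1):                      # try 0 flips, then 1, then 2, ...
        for flips in combinations(range(n), k):
            fixed = [1 - y if i in flips else y for i, y in enumerate(labels)]
            if all(nearest_label(x, points, fixed) == y for x, y in trusted):
                return set(flips)               # these items are flagged as bugs
    return None                                 # no relabelling satisfies the trusted items

# Hypothetical toy data: the label at index 2 is assumed to be wrong.
points = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 1, 1, 1, 1]
trusted = [(2.1, 0), (3.9, 1)]                  # hypothetical verified items
suspects = smallest_label_fix(points, labels, trusted)
print(suspects)
```

The exhaustive search grows exponentially with the training set size, which is the reason the abstract gives for relaxing the problem into a continuous optimization.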
Open-TEE - An Open Virtual Trusted Execution Environment
Hardware-based Trusted Execution Environments (TEEs) are widely deployed in
mobile devices. Yet their use has been limited primarily to applications
developed by the device vendors. Recent standardization of TEE interfaces by
GlobalPlatform (GP) promises to partially address this problem by enabling
GP-compliant trusted applications to run on TEEs from different vendors.
Nevertheless, ordinary developers wishing to develop trusted applications face
significant challenges. Access to hardware TEE interfaces is difficult to
obtain without support from vendors. Tools and software needed to develop and
debug trusted applications may be expensive or non-existent.
In this paper, we describe Open-TEE, a virtual, hardware-independent TEE
implemented in software. Open-TEE conforms to GP specifications. It allows
developers to develop and debug trusted applications with the same tools they
use for developing software in general. Once a trusted application is fully
debugged, it can be compiled for any actual hardware TEE. Through performance
measurements and a user study we demonstrate that Open-TEE is efficient and
easy to use. We have made Open-TEE freely available as open source.
Comment: Author's version of article to appear in 14th IEEE International
Conference on Trust, Security and Privacy in Computing and Communications,
TrustCom 2015, Helsinki, Finland, August 20-22, 2015
Interactive correction of mislabeled training data
In this paper, we develop a visual analysis method for interactively improving the quality of labeled data, which is essential to the success of supervised and semi-supervised learning. The quality improvement is achieved through the use of user-selected trusted items. We employ a bi-level optimization model to accurately match the labels of the trusted items and to minimize the training loss. Based on this model, a scalable data correction algorithm is developed to handle tens of thousands of labeled data items efficiently. The selection of the trusted items is facilitated by an incremental t-SNE with improved computational efficiency and layout stability to ensure a smooth transition between different levels. We evaluated our method on real-world datasets through quantitative evaluation and case studies, and the results were generally favorable.
Manipulating Predictions over Discrete Inputs in Machine Teaching
Machine teaching often involves the creation of an optimal (typically
minimal) dataset to help a model (referred to as the 'student') achieve
specific goals given by a teacher. While abundant in the continuous domain, the
studies on the effectiveness of machine teaching in the discrete domain are
relatively limited. This paper focuses on machine teaching in the discrete
domain, specifically on manipulating student models' predictions based on the
goals of teachers via changing the training data efficiently. We formulate this
task as a combinatorial optimization problem and solve it by proposing an
iterative search algorithm. Our algorithm demonstrates significant numerical
merit in scenarios where a teacher attempts to correct erroneous
predictions to improve the student model, or to maliciously manipulate the
model into misclassifying specific samples as a target class that serves the
teacher's own interests. Experimental results show that our proposed algorithm
manipulates the model's predictions effectively and efficiently, surpassing
conventional baselines.
Comment: 8 pages, 2 figures
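An iterative search over discrete training-set changes, as mentioned in this abstract, might look like the greedy sketch below. It is an assumption-laden illustration rather than the paper's algorithm: the student is a 1-nearest-neighbour classifier on 1-D points, the only allowed change is flipping a binary label, and the names `predict`, `goals_met`, and `greedy_teach` are hypothetical.

```python
def predict(x, points, labels):
    """1-nearest-neighbour student on 1-D points (a stand-in model)."""
    best = min(range(len(points)), key=lambda i: abs(points[i] - x))
    return labels[best]

def goals_met(points, labels, goals):
    """Count how many teacher goals (x, desired_label) the student satisfies."""
    return sum(predict(x, points, labels) == y for x, y in goals)

def greedy_teach(points, labels, goals, max_steps=10):
    """Flip one training label per step, always taking the single flip that
    satisfies the most goals; stop when all goals hold or nothing improves."""
    labels = list(labels)
    changed = []
    for _ in range(max_steps):
        current = goals_met(points, labels, goals)
        if current == len(goals):
            break
        best_i, best_score = None, current
        for i in range(len(labels)):
            labels[i] = 1 - labels[i]           # tentatively flip label i
            score = goals_met(points, labels, goals)
            labels[i] = 1 - labels[i]           # undo the tentative flip
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:
            break                               # greedy search is stuck
        labels[best_i] = 1 - labels[best_i]     # commit the best flip
        changed.append(best_i)
    return changed, labels

# Hypothetical toy data and teacher goals.
points = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]
goals = [(0.9, 1), (4.1, 0)]
changed, new_labels = greedy_teach(points, labels, goals)
print(changed, new_labels)
```

The same loop covers both scenarios in the abstract, since a goal can encode either a corrected prediction or an attacker's target class. A greedy search of this kind is not guaranteed to find a minimal set of changes.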
Requirements for Provenance on the Web
From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems. There are major technical, social, and economic impediments that stand in the way of using provenance effectively. This paper synthesizes requirements for provenance on the Web for a number of dimensions focusing on three key aspects of provenance: the content of provenance, the management of provenance records, and the uses of provenance information. To illustrate these requirements, we use three synthesized scenarios that encompass provenance problems faced by Web users today.