2,564 research outputs found

    Training Set Debugging Using Trusted Items

    Full text link
    Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for man- ual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus im- proves learning. Specifically, our approach seeks the smallest set of changes to the training set labels such that the model learned from this corrected training set predicts labels of the trusted items correctly. We flag the items whose labels are changed as potential bugs, whose labels can be checked for veracity by human experts. To find the bugs in this way is a challenging combinatorial bilevel optimization problem, but it can be relaxed into a continuous optimization problem. Ex- periments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appro- priate changes to the labels. Our algorithm is a step toward trustworthy machine learning.Comment: AAAI 201

    Open-TEE - An Open Virtual Trusted Execution Environment

    Full text link
    Hardware-based Trusted Execution Environments (TEEs) are widely deployed in mobile devices. Yet their use has been limited primarily to applications developed by the device vendors. Recent standardization of TEE interfaces by GlobalPlatform (GP) promises to partially address this problem by enabling GP-compliant trusted applications to run on TEEs from different vendors. Nevertheless ordinary developers wishing to develop trusted applications face significant challenges. Access to hardware TEE interfaces are difficult to obtain without support from vendors. Tools and software needed to develop and debug trusted applications may be expensive or non-existent. In this paper, we describe Open-TEE, a virtual, hardware-independent TEE implemented in software. Open-TEE conforms to GP specifications. It allows developers to develop and debug trusted applications with the same tools they use for developing software in general. Once a trusted application is fully debugged, it can be compiled for any actual hardware TEE. Through performance measurements and a user study we demonstrate that Open-TEE is efficient and easy to use. We have made Open- TEE freely available as open source.Comment: Author's version of article to appear in 14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2015, Helsinki, Finland, August 20-22, 201

    Interactive correction of mislabeled training data

    Get PDF
    In this paper, we develop a visual analysis method for interactively improving the quality of labeled data, which is essential to the success of supervised and semi-supervised learning. The quality improvement is achieved through the use of user-selected trusted items. We employ a bi-level optimization model to accurately match the labels of the trusted items and to minimize the training loss. Based on this model, a scalable data correction algorithm is developed to handle tens of thousands of labeled data efficiently. The selection of the trusted items is facilitated by an incremental tSNE with improved computational efficiency and layout stability to ensure a smooth transition between different levels. We evaluated our method on real-world datasets through quantitative evaluation and case studies, and the results were generally favorable

    Interactive correction of mislabeled training data

    Get PDF
    In this paper, we develop a visual analysis method for interactively improving the quality of labeled data, which is essential to the success of supervised and semi-supervised learning. The quality improvement is achieved through the use of user-selected trusted items. We employ a bi-level optimization model to accurately match the labels of the trusted items and to minimize the training loss. Based on this model, a scalable data correction algorithm is developed to handle tens of thousands of labeled data efficiently. The selection of the trusted items is facilitated by an incremental tSNE with improved computational efficiency and layout stability to ensure a smooth transition between different levels. We evaluated our method on real-world datasets through quantitative evaluation and case studies, and the results were generally favorable

    Manipulating Predictions over Discrete Inputs in Machine Teaching

    Full text link
    Machine teaching often involves the creation of an optimal (typically minimal) dataset to help a model (referred to as the `student') achieve specific goals given by a teacher. While abundant in the continuous domain, the studies on the effectiveness of machine teaching in the discrete domain are relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on manipulating student models' predictions based on the goals of teachers via changing the training data efficiently. We formulate this task as a combinatorial optimization problem and solve it by proposing an iterative searching algorithm. Our algorithm demonstrates significant numerical merit in the scenarios where a teacher attempts at correcting erroneous predictions to improve the student's models, or maliciously manipulating the model to misclassify some specific samples to the target class aligned with his personal profits. Experimental results show that our proposed algorithm can have superior performance in effectively and efficiently manipulating the predictions of the model, surpassing conventional baselines.Comment: 8 pages, 2 figure

    Requirements for Provenance on the Web

    Get PDF
    From where did this tweet originate? Was this quote from the New York Times modified? Daily, we rely on data from the Web but often it is difficult or impossible to determine where it came from or how it was produced. This lack of provenance is particularly evident when people and systems deal with Web information or with any environment where information comes from sources of varying quality. Provenance is not captured pervasively in information systems. There are major technical, social, and economic impediments that stand in the way of using provenance effectively. This paper synthesizes requirements for provenance on the Web for a number of dimensions focusing on three key aspects of provenance: the content of provenance, the management of provenance records, and the uses of provenance information. To illustrate these requirements, we use three synthesized scenarios that encompass provenance problems faced by Web users toda
    • …
    corecore