292 research outputs found

    Tools of Trade of the Next Blue-Collar Job? Antecedents, Design Features, and Outcomes of Interactive Labeling Systems

    Supervised machine learning is becoming increasingly popular - and so is the need for annotated training data. Such data often needs to be manually labeled by human workers, which may well negatively affect the workforce involved. To alleviate this issue, a new information systems class has emerged - interactive labeling systems. However, this young but rapidly growing field lacks guidance and structure regarding the design of such systems. Against this backdrop, this paper describes antecedents, design features, and outcomes of interactive labeling systems. We perform a systematic literature review, identifying 188 relevant articles. Our results are presented as a morphological box with 14 dimensions, which we evaluate using card sorting. By additionally offering this box as a web-based artifact, we provide actionable guidance for interactive labeling system development for scholars and practitioners. Lastly, we discuss imbalances in the article distribution of our morphological box and suggest future work directions.

    ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems

    Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples selected by the machine, this problem is an instance of active learning. When the teacher can provide additional information to the machine (e.g., suggestions on what examples or predictive features should be used) as the learning task progresses, then the problem becomes one of interactive learning. To accommodate the two-way communication channel needed for efficient interactive learning, the teacher and the machine need an environment that supports an interaction language. The machine can access, process, and summarize more examples than the teacher can see in a lifetime. Based on the machine's output, the teacher can revise the definition of the task or make it more precise. Both the teacher and the machine continuously learn and benefit from the interaction. We have built a platform to (1) produce valuable and deployable models and (2) support research on both the machine learning and user interface challenges of the interactive learning problem. The platform relies on a dedicated, low-latency, distributed, in-memory architecture that allows us to construct web-scale learning machines with quick interaction speed. The purpose of this paper is to describe this architecture and demonstrate how it supports our research efforts. Preliminary results are presented as illustrations of the architecture but are not the primary focus of the paper.

    A face annotation framework with partial clustering and interactive labeling

    Face annotation technology is important for a photo management system. In this paper, we propose a novel interactive face annotation framework combining unsupervised and interactive learning. There are two main contributions in our framework. In the unsupervised stage, a partial clustering algorithm is proposed to find the most evident clusters instead of grouping all instances into clusters, which leads to a good initial labeling for later user interaction. In the interactive stage, an efficient labeling procedure based on minimization of both global system uncertainty and estimated number of user operations is proposed to reduce user interaction as much as possible. Experimental results show that the proposed annotation framework can significantly reduce the face annotation workload and is superior to existing solutions in the literature.

    VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]

    We introduce VOCALExplore, a system designed to support users in building domain-specific models over video datasets. VOCALExplore supports interactive labeling sessions and trains models using user-supplied labels. VOCALExplore maximizes model quality by automatically deciding how to select samples based on observed skew in the collected labels. It also selects the optimal video representations to use when training models by casting feature selection as a rising bandit problem. Finally, VOCALExplore implements optimizations to achieve low latency without sacrificing model performance. We demonstrate that VOCALExplore achieves close to the best possible model quality given candidate acquisition functions and feature extractors, and it does so with low visible latency (~1 second per iteration) and no expensive preprocessing.

    Visual Interactive Labeling of Large Multimedia News Corpora

    The semantic annotation of large multimedia corpora is essential for numerous tasks. Be it for the training of classification algorithms, efficient content retrieval, or analytical reasoning, appropriate labels are often the first necessity before automatic processing becomes efficient. However, manual labeling of large datasets is time-consuming and tedious. Hence, we present a new visual approach for labeling and retrieval of reports in multimedia news corpora. It combines automatic classifier training based on caption text from news reports with human interpretation to ease the annotation process. In our approach, users can initialize labels with keyword queries and iteratively annotate examples to train a classifier. The proposed visualization displays representative results in an overview that allows users to follow different annotation strategies (e.g., active learning) and assess the quality of the classifier. Based on a usage scenario, we demonstrate the successful application of our approach. Therein, users label several topics that interest them and retrieve related documents with high confidence from three years of news reports.