    Learning to Predict the Wisdom of Crowds

    The problem of "approximating the crowd" is that of estimating the crowd's majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, "CrowdSense," that works in an online fashion to dynamically sample subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers' votes that approximates the crowd's opinion.Comment: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991

    Wisely Using a Budget for Crowdsourcing

    The problem of “approximating the crowd” is that of estimating the crowd’s majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, “CrowdSense,” that works in an online fashion, where examples arrive one at a time. CrowdSense dynamically samples subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers’ votes that approximates the crowd’s opinion. We then introduce two variations of CrowdSense that make different distributional assumptions to handle distinct crowd characteristics. In particular, the first algorithm makes a statistical independence assumption on the labelers’ probabilities, suited to large crowds, whereas the second algorithm finds a lower bound on how often the current sub-crowd agrees with the crowd majority vote. Our experiments on CrowdSense and several baselines demonstrate that we can reliably approximate the entire crowd’s vote by collecting opinions from a representative subset of the crowd.
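    To make the sampling idea concrete, the following is a minimal Python sketch of an online crowd-approximation loop in the spirit of CrowdSense. The top-k selection, the epsilon-style exploration step, and the agreement-based weight update are illustrative assumptions, not the paper's exact criterion.

        import random

        class CrowdSenseSketch:
            """Illustrative online crowd approximation (assumed update rules)."""

            def __init__(self, n_labelers, k=3, eps=0.1):
                self.agree = [1.0] * n_labelers  # smoothed agreement counts
                self.seen = [2.0] * n_labelers   # smoothed query counts
                self.k = k                       # size of the exploited sub-crowd
                self.eps = eps                   # exploration probability

            def weight(self, i):
                # quality estimate: how often labeler i matched the sub-crowd vote
                return self.agree[i] / self.seen[i]

            def label(self, get_vote):
                """get_vote(i) returns labeler i's vote (+1 or -1) on the current example."""
                n = len(self.seen)
                # exploit: query the k labelers with the highest estimated quality
                chosen = sorted(range(n), key=self.weight, reverse=True)[:self.k]
                # explore: occasionally query one additional random labeler
                if random.random() < self.eps:
                    rest = [i for i in range(n) if i not in chosen]
                    if rest:
                        chosen.append(random.choice(rest))
                votes = {i: get_vote(i) for i in chosen}
                # weighted combination of the queried sub-crowd's votes
                score = sum(self.weight(i) * v for i, v in votes.items())
                estimate = 1 if score >= 0 else -1
                # update each queried labeler against the sub-crowd's own vote
                for i, v in votes.items():
                    self.seen[i] += 1
                    self.agree[i] += (v == estimate)
                return estimate

    Each example then costs only a handful of labeler queries rather than the whole crowd, which is where the budget savings come from.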

    Two frameworks for integrating knowledge in induction

    The use of knowledge in inductive learning is critical for improving the quality of the concept definitions generated, reducing the number of examples required to learn effective concept definitions, and reducing the computation needed to find good concept definitions. Relevant knowledge may come in many forms (such as examples, descriptions, advice, and constraints) and from many sources (such as books, teachers, databases, and scientific instruments). How to extract the relevant knowledge from this plethora of possibilities, and then integrate it so that it appropriately informs the induction process, is perhaps the key issue at this point in inductive learning. Here the focus is on the integration part of this problem; that is, how induction algorithms can, and do, utilize a range of extracted knowledge. Preliminary work on a transformational framework for defining knowledge-intensive inductive algorithms out of relatively knowledge-free algorithms is described, as is a more tentative problem-space framework that attempts to cover all induction algorithms within a single general approach. These frameworks help to organize what is known about current knowledge-intensive induction algorithms, and to point towards new algorithms.

    The Computational Complexity of the Candidate-Elimination Algorithm

    Mitchell's original work on version spaces (Mitchell, 1982) presented an analysis of the computational complexity of version spaces. However, this analysis proved somewhat coarse, as it was parameterized by s and g, the maximum sizes that the S and G sets reach during learning. As Haussler (1988) has pointed out, g can be exponential in the number of examples processed. This paper presents a more fine-grained analysis of the computational complexity of version spaces, demonstrates its equivalence to Mitchell's analysis, and instantiates it for two commonly used conjunctive concept description languages. The problem of inductive concept learning, forming general rules from data, has been well studied in machine learning and artificial intelligence. The problem can be stated as follows. Given: training data (positive and negative examples of a concept to be identified) and a concept description language (a language in which the final concept definition must be expressed)…
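    The role of g is easy to see in code. Below is a toy sketch of candidate elimination for a conjunctive language over discrete attributes (hypotheses are tuples, with '?' as a wildcard); the representation and helper names are illustrative, not the paper's formulation. Each negative example can replace a member of G with several minimal specializations, which is the source of the potential exponential growth that Haussler noted.

        def matches(h, x):
            # h covers x: every constrained attribute of h agrees with x
            return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

        def generalize(s, x):
            # minimal generalization of specific hypothesis s to cover positive x
            return tuple(sv if sv == xv else '?' for sv, xv in zip(s, x))

        def specialize(g, x, values):
            # minimal specializations of g that exclude negative example x
            out = []
            for i, gv in enumerate(g):
                if gv == '?':
                    for v in values[i]:
                        if v != x[i]:
                            out.append(g[:i] + (v,) + g[i + 1:])
            return out

        def candidate_elimination(examples, values):
            S = None                              # single most specific hypothesis
            G = [tuple('?' for _ in values)]      # most general boundary
            for x, positive in examples:
                if positive:
                    G = [g for g in G if matches(g, x)]
                    S = x if S is None else generalize(S, x)
                else:
                    G = [h for g in G
                         for h in ([g] if not matches(g, x) else specialize(g, x, values))]
                    # keep only members that still cover S; a full implementation
                    # would also prune non-maximal members of G
                    if S is not None:
                        G = [g for g in G if matches(g, S)]
            return S, G

        # usage (hypothetical toy data):
        # values = [('sunny', 'rainy'), ('warm', 'cold')]
        # S, G = candidate_elimination([(('sunny', 'warm'), True),
        #                               (('rainy', 'cold'), False)], values)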

    Improving short text classification using unlabeled background knowledge to assess document similarity

    We describe a method for improving the classification of short text strings using a combination of labeled training data plus a secondary corpus of unlabeled but related longer documents. We show that such unlabeled background knowledge can greatly decrease error rates, particularly if the number of examples or the size of the strings in the training set is small. This is particularly useful when labeling text is a labor-intensive job and when there is a large amount of information available about a particular problem on the World Wide Web. Our approach views the task as one of information integration using WHIRL, a tool that combines database functionalities with techniques from the information-retrieval literature.
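    WHIRL itself is not reproduced here, but the underlying intuition, using longer background documents as a bridge when comparing short strings, can be sketched under assumed tooling (scikit-learn, with k-nearest neighbors standing in for WHIRL's similarity joins). Each short string is represented by its TF-IDF similarity to every background document, so two strings that share no words can still look alike if they match the same background pages.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.neighbors import KNeighborsClassifier

        def fit_with_background(train_texts, train_labels, background_docs):
            # build the vocabulary and term weights from the longer background corpus
            vec = TfidfVectorizer().fit(background_docs)
            bg = vec.transform(background_docs)        # (n_background, vocab)

            def profile(texts):
                # each row: similarity of one text to every background document
                return vec.transform(texts) @ bg.T     # (n_texts, n_background)

            clf = KNeighborsClassifier(n_neighbors=5, metric='cosine')
            clf.fit(profile(train_texts), train_labels)
            return clf, profile

        # usage (hypothetical corpora):
        # clf, profile = fit_with_background(short_strings, labels, web_pages)
        # predictions = clf.predict(profile(["new short string"]))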

    Using LSI for Text Classification in the Presence of Background Text

    This paper presents work that uses Latent Semantic Indexing (LSI) for text classification. However, in addition to relying on labeled training data, we improve classification accuracy by also using unlabeled data and other forms of available "background" text in the classification process. Rather than performing LSI's singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-document matrix that includes both the labeled data as well as any available and relevant background text. We report the performance of this approach on data sets both with and without the inclusion of the background text, and compare our work to other efforts that can incorporate unlabeled data and other background text in the classification process.
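    A minimal sketch of the expanded-matrix idea follows, under assumed tooling: scikit-learn's TruncatedSVD stands in for LSI's SVD step, and a nearest-neighbor classifier in the latent space stands in for whatever classifier is applied downstream. The point mirrored from the abstract is that the decomposition is fit on the labeled documents and the background text together.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.neighbors import KNeighborsClassifier

        def lsi_with_background(train_texts, train_labels, background_docs, k=100):
            vec = TfidfVectorizer()
            # the term-by-document matrix includes labeled AND background documents
            X_all = vec.fit_transform(list(train_texts) + list(background_docs))
            svd = TruncatedSVD(n_components=k).fit(X_all)

            # classify in the k-dimensional latent space
            clf = KNeighborsClassifier(n_neighbors=5, metric='cosine')
            clf.fit(svd.transform(vec.transform(train_texts)), train_labels)

            def predict(texts):
                return clf.predict(svd.transform(vec.transform(texts)))
            return predict

        # usage (hypothetical data):
        # predict = lsi_with_background(docs, labels, background_corpus, k=100)
        # predict(["a new document to classify"])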