    Learning to Predict the Wisdom of Crowds

    The problem of "approximating the crowd" is that of estimating the crowd's majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, "CrowdSense," that works in an online fashion to dynamically sample subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers' votes that approximates the crowd's opinion.Comment: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991

    Wisely Using a Budget for Crowdsourcing

    The problem of “approximating the crowd” is that of estimating the crowd’s majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, “CrowdSense,” that works in an online fashion, where examples arrive one at a time. CrowdSense dynamically samples subsets of labelers based on an exploration/exploitation criterion. The algorithm produces a weighted combination of a subset of the labelers’ votes that approximates the crowd’s opinion. We then introduce two variations of CrowdSense that make different distributional assumptions to handle distinct crowd characteristics. In particular, the first algorithm makes a statistical independence assumption on the labelers’ probabilities, suited to large crowds, whereas the second algorithm finds a lower bound on how often the current sub-crowd agrees with the crowd majority vote. Our experiments on CrowdSense and several baselines demonstrate that we can reliably approximate the entire crowd’s vote by collecting opinions from a representative subset of the crowd.
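    To make the sampling idea concrete, the following is a minimal Python sketch of an online crowd-approximation loop in the spirit of CrowdSense. The top-k selection, the epsilon-style exploration step, and the agreement-based weight update are illustrative assumptions, not the paper's exact criterion.

        import random

        class CrowdSenseSketch:
            """Illustrative online crowd approximation (assumed update rules)."""

            def __init__(self, n_labelers, k=3, eps=0.1):
                self.agree = [1.0] * n_labelers  # smoothed agreement counts
                self.seen = [2.0] * n_labelers   # smoothed query counts
                self.k = k                       # size of the exploited sub-crowd
                self.eps = eps                   # exploration probability

            def weight(self, i):
                # quality estimate: how often labeler i matched the sub-crowd vote
                return self.agree[i] / self.seen[i]

            def label(self, get_vote):
                """get_vote(i) returns labeler i's vote (+1 or -1) on the current example."""
                n = len(self.seen)
                # exploit: query the k labelers with the highest estimated quality
                chosen = sorted(range(n), key=self.weight, reverse=True)[:self.k]
                # explore: occasionally query one additional random labeler
                if random.random() < self.eps:
                    rest = [i for i in range(n) if i not in chosen]
                    if rest:
                        chosen.append(random.choice(rest))
                votes = {i: get_vote(i) for i in chosen}
                # weighted combination of the queried sub-crowd's votes
                score = sum(self.weight(i) * v for i, v in votes.items())
                estimate = 1 if score >= 0 else -1
                # update each queried labeler against the sub-crowd's own vote
                for i, v in votes.items():
                    self.seen[i] += 1
                    self.agree[i] += (v == estimate)
                return estimate

    Each example then costs only a handful of labeler queries rather than the whole crowd, which is where the budget savings come from.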

    Two frameworks for integrating knowledge in induction

    The use of knowledge in inductive learning is critical for improving the quality of the concept definitions generated, reducing the number of examples required to learn effective concept definitions, and reducing the computation needed to find good concept definitions. Relevant knowledge may come in many forms (such as examples, descriptions, advice, and constraints) and from many sources (such as books, teachers, databases, and scientific instruments). How to extract the relevant knowledge from this plethora of possibilities, and then integrate it so that it appropriately informs the induction process, is perhaps the key issue at this point in inductive learning. Here the focus is on the integration part of this problem; that is, how induction algorithms can, and do, utilize a range of extracted knowledge. Preliminary work on a transformational framework for defining knowledge-intensive inductive algorithms out of relatively knowledge-free algorithms is described, as is a more tentative problem-space framework that attempts to cover all induction algorithms within a single general approach. These frameworks help to organize what is known about current knowledge-intensive induction algorithms, and to point towards new algorithms.

    The Computational Complexity of the Candidate-Elimination Algorithm

    Mitchell's original work on version spaces (Mitchell, 1982) presented an analysis of the computational complexity of version spaces. However, this analysis proved somewhat coarse, as it was parameterized by s and g, the maximum sizes that the S and G sets reach during learning. As Haussler (1988) has pointed out, g can be exponential in the number of examples processed. This paper presents a more fine-grained analysis of the computational complexity of version spaces, demonstrates its equivalence to Mitchell's analysis, and instantiates it for two commonly used conjunctive concept description languages. The problem of inductive concept learning, forming general rules from data, has been well studied in machine learning and artificial intelligence. The problem can be stated as follows. Given: training data (positive and negative examples of a concept to be identified) and a concept description language (a language in which the final concept definition must be expressed)…
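    The role of g is easy to see in code. Below is a toy sketch of candidate elimination for a conjunctive language over discrete attributes (hypotheses are tuples, with '?' as a wildcard); the representation and helper names are illustrative, not the paper's formulation. Each negative example can replace a member of G with several minimal specializations, which is the source of the potential exponential growth that Haussler noted.

        def matches(h, x):
            # h covers x: every constrained attribute of h agrees with x
            return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

        def generalize(s, x):
            # minimal generalization of specific hypothesis s to cover positive x
            return tuple(sv if sv == xv else '?' for sv, xv in zip(s, x))

        def specialize(g, x, values):
            # minimal specializations of g that exclude negative example x
            out = []
            for i, gv in enumerate(g):
                if gv == '?':
                    for v in values[i]:
                        if v != x[i]:
                            out.append(g[:i] + (v,) + g[i + 1:])
            return out

        def candidate_elimination(examples, values):
            S = None                              # single most specific hypothesis
            G = [tuple('?' for _ in values)]      # most general boundary
            for x, positive in examples:
                if positive:
                    G = [g for g in G if matches(g, x)]
                    S = x if S is None else generalize(S, x)
                else:
                    G = [h for g in G
                         for h in ([g] if not matches(g, x) else specialize(g, x, values))]
                    # keep only members that still cover S; a full implementation
                    # would also prune non-maximal members of G
                    if S is not None:
                        G = [g for g in G if matches(g, S)]
            return S, G

        # usage (hypothetical toy data):
        # values = [('sunny', 'rainy'), ('warm', 'cold')]
        # S, G = candidate_elimination([(('sunny', 'warm'), True),
        #                               (('rainy', 'cold'), False)], values)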

    Improving short text classification using unlabeled background knowledge to assess document similarity

    We describe a method for improving the classification of short text strings using a combination of labeled training data plus a secondary corpus of unlabeled but related longer documents. We show that such unlabeled background knowledge can greatly decrease error rates, particularly if the number of examples or the size of the strings in the training set is small. This is particularly useful when labeling text is a labor-intensive job and when there is a large amount of information available about a particular problem on the World Wide Web. Our approach views the task as one of information integration using WHIRL, a tool that combines database functionalities with techniques from the information-retrieval literature.
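    WHIRL itself is not reproduced here, but the underlying intuition, using longer background documents as a bridge when comparing short strings, can be sketched under assumed tooling (scikit-learn, with k-nearest neighbors standing in for WHIRL's similarity joins). Each short string is represented by its TF-IDF similarity to every background document, so two strings that share no words can still look alike if they match the same background pages.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.neighbors import KNeighborsClassifier

        def fit_with_background(train_texts, train_labels, background_docs):
            # build the vocabulary and term weights from the longer background corpus
            vec = TfidfVectorizer().fit(background_docs)
            bg = vec.transform(background_docs)        # (n_background, vocab)

            def profile(texts):
                # each row: similarity of one text to every background document
                return vec.transform(texts) @ bg.T     # (n_texts, n_background)

            clf = KNeighborsClassifier(n_neighbors=5, metric='cosine')
            clf.fit(profile(train_texts), train_labels)
            return clf, profile

        # usage (hypothetical corpora):
        # clf, profile = fit_with_background(short_strings, labels, web_pages)
        # predictions = clf.predict(profile(["new short string"]))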

    Using LSI for Text Classification in the Presence of Background Text

    This paper presents work that uses Latent Semantic Indexing (LSI) for text classification. However, in addition to relying on labeled training data, we improve classification accuracy by also using unlabeled data and other forms of available "background" text in the classification process. Rather than performing LSI's singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-document matrix that includes both the labeled data as well as any available and relevant background text. We report the performance of this approach on data sets both with and without the inclusion of the background text, and compare our work to other efforts that can incorporate unlabeled data and other background text in the classification process.
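    A minimal sketch of the expanded-matrix idea follows, under assumed tooling: scikit-learn's TruncatedSVD stands in for LSI's SVD step, and a nearest-neighbor classifier in the latent space stands in for whatever classifier is applied downstream. The point mirrored from the abstract is that the decomposition is fit on the labeled documents and the background text together.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.neighbors import KNeighborsClassifier

        def lsi_with_background(train_texts, train_labels, background_docs, k=100):
            vec = TfidfVectorizer()
            # the term-by-document matrix includes labeled AND background documents
            X_all = vec.fit_transform(list(train_texts) + list(background_docs))
            svd = TruncatedSVD(n_components=k).fit(X_all)

            # classify in the k-dimensional latent space
            clf = KNeighborsClassifier(n_neighbors=5, metric='cosine')
            clf.fit(svd.transform(vec.transform(train_texts)), train_labels)

            def predict(texts):
                return clf.predict(svd.transform(vec.transform(texts)))
            return predict

        # usage (hypothetical data):
        # predict = lsi_with_background(docs, labels, background_corpus, k=100)
        # predict(["a new document to classify"])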