Analyzing the Amazon Mechanical Turk Marketplace
Since the concept of crowdsourcing is relatively new, many potential
participants have questions about the AMT marketplace. For example,
questions that commonly come up in an 'introduction to crowdsourcing
and AMT' session include the following: What types of tasks can be
completed in the marketplace? How much does it cost? How fast can I get
results back? How big is the AMT marketplace? The answers to these
questions remain largely anecdotal, based on personal observations and
experience. To better understand what types of tasks are being
completed today using crowdsourcing techniques, we started collecting
data about the AMT marketplace. We present a preliminary analysis of the
dataset and provide directions for interesting future research.
Classification-Aware Hidden-Web Text Database Selection
Many valuable text databases on the web have noncrawlable contents that are “hidden” behind
search interfaces. Metasearchers are helpful tools for searching over multiple such “hidden-web”
text databases at once through a unified query interface. An important step in the metasearching
process is database selection, or determining which databases are the most relevant for a given
user query. The state-of-the-art database selection techniques rely on statistical summaries of the
database contents, generally including the database vocabulary and associated word frequencies.
Unfortunately, hidden-web text databases typically do not export such summaries, so previous research
has developed algorithms for constructing approximate content summaries from document
samples extracted from the databases via querying. We present a novel “focused-probing” sampling
algorithm that detects the topics covered in a database and adaptively extracts documents that
are representative of the topic coverage of the database. Our algorithm is the first to construct
content summaries that include the frequencies of the words in the database. Unfortunately, Zipf’s
law practically guarantees that for any relatively large database, content summaries built from
moderately sized document samples will fail to cover many low-frequency words; in turn, incomplete
content summaries might negatively affect the database selection process, especially for short
queries with infrequent words. To enhance the sparse document samples and improve the database
selection decisions, we exploit the fact that topically similar databases tend to have similar
vocabularies, so samples extracted from databases with a similar topical focus can complement
each other. We have developed two database selection algorithms that exploit this observation.
The first algorithm proceeds hierarchically and selects the best categories for a query, and then
sends the query to the appropriate databases in the chosen categories. The second algorithm uses “shrinkage,” a statistical technique for improving parameter estimation in the face of sparse data,
to enhance the database content summaries with category-specific words. We describe how to modify
existing database selection algorithms to adaptively decide (at runtime) whether shrinkage is
beneficial for a query. A thorough evaluation over a variety of databases, including 315 real web databases
as well as TREC data, suggests that the proposed sampling methods generate high-quality
content summaries and that the database selection algorithms produce significantly more relevant
database selection decisions and overall search results than existing algorithms.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
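The shrinkage step described above can be sketched in a few lines of Python (a toy illustration with invented databases, category statistics, and mixing weight, not the paper's full algorithm): a sparse sample's word probabilities are mixed with the aggregate probabilities of the database's topic category, so low-frequency words missing from the sample still receive non-zero weight.

```python
def shrink(db_probs, category_probs, lam=0.7):
    """Return shrunk P(word) = lam*P_db(word) + (1-lam)*P_category(word)."""
    vocab = set(db_probs) | set(category_probs)
    return {w: lam * db_probs.get(w, 0.0) + (1 - lam) * category_probs.get(w, 0.0)
            for w in vocab}

# Hypothetical numbers: the sparse sample missed the word "oncology",
# but the Health category statistics restore it with non-zero probability.
db = {"cancer": 0.6, "therapy": 0.4}
cat = {"cancer": 0.5, "therapy": 0.3, "oncology": 0.2}
summary = shrink(db, cat)
```

After shrinkage, a short query containing "oncology" no longer scores this database at zero, which is exactly the failure mode that Zipf's law induces for unsmoothed samples.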
A Framework for Quality Assurance in Crowdsourcing
The emergence of online paid micro-crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), allows on-demand, at-scale distribution of tasks to human workers around the world. In such settings, online workers come and complete small tasks posted by a company, working for as long or as little as they wish. Such temporary employer-employee relationships give rise to adverse selection, moral hazard, and many other challenges. How can we ensure that the submitted work is accurate, especially when the verification cost is comparable to the cost of performing the task? How can we estimate the exhibited quality of the workers? What pricing strategies should be used to induce the effort of workers with varying ability levels? We develop a comprehensive framework for managing quality in such micro-crowdsourcing settings: First, we describe an algorithm for estimating the error rates of the participating workers, and show how to separate systematic worker biases from unrecoverable errors and generate an unbiased “worker quality” measurement. Next, we present a selective repeated-labeling algorithm that acquires labels so that quality requirements can be met at minimum cost. Then, we propose a quality-adjusted pricing scheme that adjusts the payment level according to the value contributed by each worker. We test our compensation scheme in a principal-agent setting in which workers respond to incentives by varying their effort. Our simulation results demonstrate that the proposed pricing scheme is able to induce workers to exert higher levels of effort and yields larger profits for employers compared to the commonly adopted uniform pricing schemes. We also describe strategies that build on our quality control and pricing framework to tackle crowdsourced tasks of increasingly higher complexity, while still maintaining tight quality control of the process.
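The bias-separation idea can be illustrated with a minimal sketch (the confusion matrices below are toy examples, not the paper's estimation algorithm): once a worker's confusion matrix is known, Bayes' rule inverts any systematic bias, so a consistent label-flipper is as informative as a perfect worker, while a random labeler contributes nothing.

```python
def posterior(prior, confusion, reported):
    """P(true label | reported label) for binary labels.
    confusion[t][r] = P(worker reports r | true label is t)."""
    joint = [prior[t] * confusion[t][reported] for t in (0, 1)]
    z = sum(joint)
    return [p / z for p in joint]

prior = [0.5, 0.5]
flipper = [[0.0, 1.0], [1.0, 0.0]]   # systematic bias: always reports the opposite
random_w = [[0.5, 0.5], [0.5, 0.5]]  # unrecoverable error: answers at random

print(posterior(prior, flipper, 1))   # → [1.0, 0.0]: a report of 1 means true label 0
print(posterior(prior, random_w, 1))  # → [0.5, 0.5]: the report carries no information
```

This is why a raw agreement rate is a misleading quality measure: the flipper agrees with the truth 0% of the time yet is perfectly recoverable, while the coin-flipper agrees 50% of the time and is worthless.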
Modeling Dependency in Prediction Markets
In the last decade, prediction markets have become popular forecasting
tools in areas ranging from election results to movie revenues and Oscar
nominations. One of the features that make prediction markets
particularly attractive for decision support applications is that they
can be used to answer what-if questions and estimate probabilities of
complex events. The traditional approach to answering such questions
involves running a combinatorial prediction market, which is not always
possible. In this paper, we present an alternative, statistical approach
to pricing complex claims, based on analyzing co-movements of
prediction market prices for basis events. Experimental evaluation of
our technique on a collection of 51 InTrade contracts, each representing
the Democratic Party nominee winning the Electoral College votes of a
particular state, shows that the approach outperforms traditional
forecasting methods such as price and return regressions and can be used
to extract meaningful business intelligence from raw price data.
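A minimal sketch of the co-movement idea (a simplifying assumption of this sketch, not necessarily the paper's estimator: the correlation of the two event indicators is proxied by the correlation of the contracts' daily price changes; for Bernoulli events, P(A and B) = pA·pB + ρ·√(pA(1−pA)pB(1−pB))):

```python
from statistics import mean, stdev

def corr(x, y):
    """Sample Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def joint_price(prices_a, prices_b):
    """Price the complex claim 'A and B' from the two contracts' price histories."""
    changes = lambda p: [b - a for a, b in zip(p, p[1:])]
    rho = corr(changes(prices_a), changes(prices_b))
    pa, pb = prices_a[-1], prices_b[-1]          # current contract prices
    raw = pa * pb + rho * (pa * (1 - pa) * pb * (1 - pb)) ** 0.5
    # Clamp to the Fréchet bounds, which any valid joint probability obeys.
    return min(max(raw, max(0.0, pa + pb - 1.0)), min(pa, pb))

a = [0.40, 0.45, 0.42, 0.50, 0.55]  # hypothetical price histories
b = [0.50, 0.56, 0.52, 0.60, 0.66]
p_joint = joint_price(a, b)
```

Because the two toy series move together, the estimated joint probability exceeds the independence baseline pA·pB, which is the qualitative behavior a combinatorial market would also exhibit.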
Summarizing and Searching Hidden-Web Databases Hierarchically Using Focused Probes
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web-accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from "uncooperative" databases by using "focused query probes," which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. The content summaries that result from this algorithm are efficient to derive and more accurate than those from previously proposed probing techniques for content-summary extraction. We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to produce accurate results even for imperfect content summaries. Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases.
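A one-level sketch of focused probing (the categories, probe queries, and documents are toy placeholders; the actual algorithm probes a full topic hierarchy and also extracts the matching documents to build the content summary):

```python
# Probe queries associated with each topic category (invented examples).
PROBES = {"Health": ["cancer", "diabetes"], "Sports": ["soccer", "baseball"]}

def classify(search):
    """Send each category's probes through the database's search interface;
    search(query) -> number of documents matching the query.
    Return the category with the highest probe coverage."""
    coverage = {cat: sum(search(q) for q in queries)
                for cat, queries in PROBES.items()}
    return max(coverage, key=coverage.get)

# A toy "hidden" database reachable only through keyword search.
docs = ["cancer therapy trial", "diabetes diet tips", "cancer screening study"]
search = lambda q: sum(q in d for d in docs)
print(classify(search))  # → Health
```

In the full algorithm this step recurses: having chosen "Health", the prober would next issue the probes of Health's subcategories, zooming in on the database's topical focus without ever crawling it.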
Modeling Volatility in Prediction Markets
Nowadays, there is significant experimental evidence of excellent ex-post predictive accuracy in certain types of prediction markets, such as markets for elections. This evidence shows that prediction markets are efficient mechanisms for aggregating information and are more accurate in forecasting events than traditional forecasting methods, such as polls. The interpretation of prediction market prices as probabilities has been extensively studied in the literature; however, little attention has so far been given to understanding the volatility of prediction market prices. In this paper, we present a model of a prediction market with a binary payoff on a competitive event involving two parties. In our model, each party has some underlying ``ability'' process that describes its ability to win and evolves as an Ito diffusion. We show that if the prediction market for this event is efficient and accurate, the price of the corresponding contract will also follow a diffusion, and its instantaneous volatility is a particular function of the current claim price and its time to expiration. We generalize our results to competitive events involving more than two parties and show that the volatilities of prediction market contracts for such events are again functions of the current claim prices and the time to expiration, as well as of several additional parameters (ternary correlations of the underlying Brownian motions). In the experimental section, we validate our model on a set of InTrade prediction markets and show that it is consistent with observed volatilities of contract returns and outperforms the well-known GARCH model in predicting future contract volatility from historical price data. To demonstrate the practical value of our model, we apply it to pricing options on prediction market contracts, such as those recently introduced by InTrade. Other potential applications of this model include the detection of significant market moves and improving forecast standard errors.
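Under one simple instantiation consistent with this description (an assumption of this sketch, not necessarily the paper's exact model: the gap between the two parties' abilities is a standard Brownian motion X_t and the contract pays 1 if the gap is positive at expiration), the efficient price is p = Φ(X_t/√τ) with τ the time to expiration, and Ito's lemma gives an instantaneous volatility that depends only on p and τ:

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: N.pdf = phi, N.inv_cdf = Phi^{-1}

def instantaneous_vol(p, tau):
    """Instantaneous price volatility sigma(p, tau) = phi(Phi^{-1}(p)) / sqrt(tau),
    for current price p in (0, 1) and time to expiration tau > 0."""
    return N.pdf(N.inv_cdf(p)) / tau ** 0.5

# Volatility peaks for contracts near p = 0.5 and grows as expiration nears.
print(instantaneous_vol(0.5, 1.0))  # highest at even odds
print(instantaneous_vol(0.9, 1.0))  # lower for near-certain contracts
```

This qualitative shape matches the abstract's claim: volatility is a deterministic function of the current claim price and the time to expiration, with no free parameters to fit.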
Estimating the Socio-Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics
With the rapid growth of the Internet, the ability of users to create and publish content has created active electronic communities that provide a wealth of product information. However, the high volume of reviews that are typically published for a single product makes it harder for individuals as well as manufacturers to locate the best reviews and understand the true underlying quality of a product. In this paper, we re-examine the impact of reviews on economic outcomes like product sales and see how different factors affect social outcomes like the extent of their perceived usefulness. Our approach explores multiple aspects of review text, such as the lexical, grammatical, semantic, and stylistic levels, to identify important text-based features. In addition, we also examine multiple reviewer-level features, such as the average usefulness of past reviews and the self-disclosed identity measures of reviewers that are displayed next to a review. Our econometric analysis reveals that the extent of subjectivity, informativeness, readability, and linguistic correctness in reviews matters in influencing sales and perceived usefulness. Reviews that have a mixture of objective and highly subjective sentences have a negative effect on product sales, compared to reviews that tend to include only subjective or only objective information. However, such reviews are considered more informative (or helpful) by the users. By using Random Forest based classifiers, we show that we can accurately predict the impact of reviews on sales and their perceived usefulness. Products that have received widely fluctuating reviews also have reviews of widely fluctuating helpfulness. In particular, we find that highly detailed and readable reviews can have low helpfulness votes in cases when users tend to vote negatively not because they disapprove of the review quality but rather to convey their disapproval of the review polarity.
We examine the relative importance of the three broad feature categories: `reviewer-related' features, `review subjectivity' features, and `review readability' features, and find that using any one of the three feature sets results in performance statistically equivalent to using all available features. This paper is the first study that integrates econometric, text mining, and predictive modeling techniques toward a more complete analysis of the information captured by user-generated online reviews in order to estimate their socio-economic impact. Our results can have implications for the judicious design of opinion forums.
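The kind of text-level features involved can be sketched as follows (the lexicon and feature definitions are crude toy proxies, not the paper's actual readability and subjectivity measures):

```python
# Toy subjectivity lexicon; real systems use large trained lexicons.
SUBJECTIVE = {"great", "terrible", "love", "hate", "amazing", "awful"}

def features(review):
    """Compute crude readability and subjectivity proxies for one review."""
    sentences = [s for s in review.replace("!", ".").split(".") if s.strip()]
    words = review.lower().split()
    return {
        "avg_sentence_len": len(words) / len(sentences),   # readability proxy
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "subjectivity": sum(w.strip(".,!") in SUBJECTIVE for w in words) / len(words),
    }

f = features("I love this camera. The lens is amazing. Battery life is great.")
```

Feature vectors like this one, combined with reviewer-level attributes, are what a Random Forest classifier would consume to predict helpfulness votes or sales impact.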
The Dimensions of Reputation in Electronic Markets
We analyze how different dimensions of a seller's reputation affect pricing power in electronic markets. We do
so by using text mining techniques to identify and structure dimensions of importance from feedback posted
on reputation systems, by aggregating and scoring these dimensions based on the sentiment they contain,
and by using them to estimate a series of econometric models associating reputation with price premiums. We
find that different dimensions do indeed affect pricing power differentially, and that a negative reputation
hurts more than a positive one helps on some dimensions but not on others. We provide the first evidence
that sellers of identical products in electronic markets differentiate themselves based on a distinguishing
dimension of strength, and that buyers vary in the relative importance they place on different fulfilment
characteristics. We highlight the importance of textual reputation feedback further by demonstrating that it
substantially improves the performance of a classifier we have trained to predict future sales. This paper is
the first study that integrates econometric, text mining, and predictive modeling techniques toward a more
complete analysis of the information captured by reputation systems, and it presents new evidence of the
importance of their effective and judicious design.
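The dimension-scoring step can be sketched as follows (the dimensions and sentiment lexicon below are invented placeholders; the paper mines the dimensions and their sentiment from the feedback text itself):

```python
# Hypothetical fulfilment dimensions, each with a tiny sentiment lexicon.
DIMENSIONS = {
    "delivery": {"fast": +1, "slow": -1, "late": -1},
    "packaging": {"secure": +1, "damaged": -1},
}

def score(feedback_texts):
    """Aggregate per-dimension sentiment across all feedback postings."""
    scores = {dim: 0 for dim in DIMENSIONS}
    for text in feedback_texts:
        words = text.lower().split()
        for dim, lexicon in DIMENSIONS.items():
            scores[dim] += sum(lexicon.get(w, 0) for w in words)
    return scores

s = score(["fast delivery", "delivery was late", "secure packaging"])
```

The resulting per-dimension scores are the reputation covariates that would enter the econometric models relating textual feedback to price premiums.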
Get Another Label? Improving Data Quality and Data Mining
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity and show several main results: (i) Repeated labeling can improve label and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give a considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
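The core trade-off of repeated labeling can be seen in a short calculation: with independent labelers who are each correct with probability p, majority voting over n labels improves quality only when p exceeds 0.5 (a simplified setting; the paper also studies selective and uncertainty-based strategies).

```python
from math import comb

def majority_quality(p, n):
    """P(majority of n independent labels is correct), for odd n and
    per-label accuracy p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_quality(0.7, 1))  # → 0.7: a single label
print(majority_quality(0.7, 5))  # ≈ 0.837: repeated labeling improves quality
print(majority_quality(0.5, 5))  # → 0.5: no gain from chance-level labelers
```

This is result (i) in miniature: repeated labeling helps, but not always. Whether the gain from 0.7 to 0.837 is worth four extra labels depends on the label-cost versus data-processing-cost regime discussed above.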
Demographics of Mechanical Turk
We present the results of a survey that collected information about the
demographics of participants on Amazon Mechanical Turk, together with
information about their level of activity and motivation for working on
Amazon Mechanical Turk. We find that approximately 50% of the workers
come from the United States and 40% come from India. Country of origin
tends to change the motivating reasons for workers to participate in the
marketplace. Significantly more workers from India participate on
Mechanical Turk because the online marketplace is a primary source of
income, while in the US most workers consider Mechanical Turk a
secondary source of income. While money is a primary motivating reason
for workers to participate in the marketplace, workers also cite a
variety of other motivating reasons, including entertainment and education.