
    MTurk 101: An Introduction to Amazon Mechanical Turk for Extension Professionals

    Amazon Mechanical Turk (MTurk) is an online labor-recruitment marketplace that has become a popular platform for data collection. In particular, MTurk can be a valuable tool for Extension professionals: for example, MTurk workers can provide feedback, write reviews, or give input on a website design. In this article we discuss the many uses of MTurk for Extension professionals and provide best practices for its use.

    The (Statistical) Power of Mechanical Turk

    In this paper, I argue for the use of Amazon Mechanical Turk (AMT) in language research. AMT is an online marketplace of paid workers who may be recruited as subjects, which can greatly increase the statistical power of studies quickly and with minimal funding. I will show that, despite some obvious limitations of using distant subjects, properly designed experiments completed on AMT are trustworthy, cheap, and much faster than traditional face-to-face data collection. Moreover, AMT workers may help with data analysis, which can greatly increase the scope of research that a single researcher may carry out. This paper first presents several reasons for using online subjects, then briefly outlines how to build a survey-type experiment using AMT, and finally reviews several best practices for ensuring reliable data.
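The power claim above can be made concrete with a back-of-the-envelope calculation. The sketch below is not from the paper; it is a standard normal-approximation to the power of a two-sided two-sample test, using only the Python standard library (function name and defaults are illustrative):

```python
from statistics import NormalDist

def power_two_sample(d: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test with
    per-group size n and standardized effect size d (Cohen's d),
    using the normal approximation to the t distribution."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # two-sided critical value
    ncp = d * (n / 2) ** 0.5              # noncentrality parameter
    return nd.cdf(ncp - z_alpha)          # the other tail is negligible

# A medium effect (d = 0.5) with 64 subjects per group yields
# roughly 0.80 power; doubling n pushes power to about 0.98.
```

Recruiting 128 subjects face-to-face is a major undertaking; on AMT it can be a matter of hours, which is the practical force of the statistical-power argument.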

    Introduction to the special issue on annotated corpora

    Annotated corpora are increasingly important for linguistic scholarship, science, and technology. This special issue briefly surveys the development of the field and points to challenges within the current framework of annotation using analytical categories, as well as challenges to the framework itself. It presents three articles: one concerning the evaluation of annotation quality, and two concerning French treebanks, the first dealing with the oldest treebank project for French, the French Treebank, and the second concerning the conversion of French corpora into the cross-lingual framework of Universal Dependencies, thus offering an illustration of the history of treebank development worldwide.

    Crowdsourcing Emotions in Music Domain

    An important source of intelligence for music emotion recognition today comes from user-provided community tags about songs or artists. Recent crowdsourcing approaches, such as harvesting social tags, designing collaborative games and web services, or using Mechanical Turk, are becoming popular in the literature. They provide a cheap, quick, and efficient method, in contrast to professional labeling of songs, which is expensive and does not scale to large datasets. In this paper we discuss the viability of various crowdsourcing instruments, providing examples from research works. We also share our own experience, illustrating the steps we followed using tags collected from Last.fm for the creation of two music mood datasets, which have been made public. While processing affect tags from Last.fm, we observed that they tend to be biased towards positive emotions; the resulting dataset thus contains more positive songs than negative ones.

    Descartes: Generating Short Descriptions of Wikipedia Articles

    Wikipedia is one of the richest knowledge sources on the Web today. In order to facilitate navigating, searching, and maintaining its content, Wikipedia's guidelines state that all articles should be annotated with a so-called short description indicating the article's topic (e.g., the short description of beer is "Alcoholic drink made from fermented cereal grains"). Nonetheless, a large fraction of articles (ranging from 10.2% in Dutch to 99.7% in Kazakh) have no short description yet, with detrimental effects for millions of Wikipedia users. Motivated by this problem, we introduce the novel task of automatically generating short descriptions for Wikipedia articles and propose Descartes, a multilingual model for tackling it. Descartes integrates three sources of information to generate an article description in a target language: the text of the article in all its language versions, the already-existing descriptions (if any) of the article in other languages, and semantic type information obtained from a knowledge graph. We evaluate a Descartes model trained for handling 25 languages simultaneously, showing that it beats baselines (including a strong translation-based baseline) and performs on par with monolingual models tailored for specific languages. A human evaluation on three languages further shows that the quality of Descartes's descriptions is largely indistinguishable from that of human-written descriptions; e.g., 91.3% of our English descriptions (vs. 92.1% of human-written descriptions) pass the bar for inclusion in Wikipedia, suggesting that Descartes is ready for production, with the potential to support human editors in filling a major gap in today's Wikipedia across languages.

    Five sources of bias in natural language processing

    Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptualize our research). We explore each of these bias sources in detail in this article, including examples and links to related work, as well as potential counter-measures.

    Characterizing the Global Crowd Workforce: A Cross-Country Comparison of Crowdworker Demographics

    Micro-task crowdsourcing is an international phenomenon that has emerged during the past decade. This paper sets out to explore the characteristics of the international crowd workforce and provides a cross-national comparison of the crowd workforce in ten countries. We provide an analysis and comparison of demographic characteristics and shed light on the significance of micro-task income for workers in different countries. This study is the first large-scale country-level analysis of the characteristics of workers on the platform Figure Eight (formerly CrowdFlower), one of the two platforms dominating the micro-task market. We find large differences between the crowd workforces of different countries, both in demography and in the importance of micro-task income for workers. Furthermore, we find that the composition of the workforce in the ten countries was largely stable across samples taken at different points in time.

    Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk

    Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to enhance the state of the art in L2 research, it is unclear whether crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations for improving the task for future data collection.
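The intraclass correlation coefficient mentioned above can be computed from a simple two-way ANOVA decomposition of the rating matrix. The sketch below implements one common variant, ICC(2,k) in the Shrout and Fleiss taxonomy (reliability of the mean of k raters); the abstract does not specify which form the study used, so this is illustrative only:

```python
def icc2k(ratings: list[list[float]]) -> float:
    """ICC(2,k), two-way random effects, average of k raters.
    ratings: one row per rated sample, one column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    # sums of squares for rows (targets), columns (raters), residual
    ssr = k * sum((m - grand) ** 2 for m in row_means)
    ssc = n * sum((m - grand) ** 2 for m in col_means)
    sst = sum((x - grand) ** 2 for row in ratings for x in row)
    sse = sst - ssr - ssc
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)

# Two raters who differ by a constant offset still order the three
# samples identically, so consistency is high but not perfect:
# icc2k([[1, 2], [2, 3], [3, 4]]) gives 0.8.
```

Values near 1.0 correspond to the "excellent reliability" the abstract reports for comprehensibility and fluency; the slightly lower accentedness indices would show up here as a smaller coefficient.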