6 research outputs found

    On the adoption of crowdsourcing for theory testing

    This paper examines the possibilities of using crowdsourcing as a strategy for theory testing. We first analyse the relationships between theory building and theory testing activities. Then, based on a systematic review of 248 papers published in MISQ, we characterise the intents and pattern systems of activities that have been used for theory testing. Finally, we ascertain which activities can or cannot be crowdsourced and pinpoint a set of pathways supporting partial and total crowdsourcing. The results show that a large number of activities related to data gathering can be crowdsourced, and that a number of intents have viable pathways supporting partial crowdsourcing.

    Web-based linguistic research: possibilities and limitations in dealing with mass data (Webbasierte linguistische Forschung: Möglichkeiten und Begrenzungen beim Umgang mit Massendaten)

    Over the past ten to fifteen years, web-based methods of sociological research have emerged alongside classical methods such as interviews, observations and experiments, and linguistic research increasingly relies on them as well. This paper provides an overview of three web-based approaches: online surveys, crowdsourcing and web-based corpus analyses. Examples from specific projects are used to reflect on these methods, address their potential and limitations, and appraise them critically. Internet-based empirical research produces vast and highly diverse quantities of (speaker-based or textual) data, presenting linguistic research with new opportunities and challenges. New procedures are required to make effective use of these resources.

    Collaborative dataflow computations: relational models and algorithms (Коллективные потоковые вычисления: реляционные модели и алгоритмы)

    Recently, microtask crowdsourcing has become a popular approach to various data mining problems, in particular the analysis of unstructured data. Crowdsourcing workflows for such problems consist of several data processing stages, which require a consistent representation to make the work reproducible. This paper addresses the reproducibility and formalization of the microtask crowdsourcing process. We propose a computational model for microtask crowdsourcing based on an extended relational model and a dataflow model of computation. The proposed collaborative dataflow computational model processes input data represented as relations by executing microtask annotation stages and automatic synchronization stages in parallel. Data processing stages and the connections between them are expressed as collaborative computation workflows, represented as weakly connected directed acyclic graphs. A synchronous algorithm for executing such workflows is described. The model is evaluated on two tasks from computational linguistics: refining concept lexicalizations in electronic thesauri and establishing hierarchical relations between such concepts. The “Add–Remove–Confirm” procedure adds missing lexemes to concepts and removes extraneous ones. The “Genus–Species–Match” procedure establishes “is-a” (hyponym–hypernym) relations between concepts based on the corresponding genus–species word pairs. Experiments on data from an open electronic thesaurus of Russian, involving both volunteers from popular online social networks and workers from crowdsourcing marketplaces paid via micropayments, confirm the applicability of these procedures for developing lexical resources.
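    The abstract describes processing stages connected in a directed acyclic graph and executed by a synchronous algorithm. As a rough illustration only, the following Python sketch runs such a graph in synchronous rounds (a layered variant of Kahn's topological sort): each round executes every stage whose predecessors have all finished. Representing relations as plain lists of tuples, the function `run_synchronously`, and the toy `annotate`/`aggregate` stages with majority voting are hypothetical choices made for this example, not the authors' model or algorithm.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Assumption: a relation is modelled simply as a list of tuples.
Relation = List[tuple]
Stage = Callable[[List[Relation]], Relation]


def run_synchronously(
    stages: Dict[str, Stage],
    edges: List[Tuple[str, str]],
    sources: Dict[str, Relation],
) -> Dict[str, Relation]:
    """Run a DAG of stages in synchronous rounds (hypothetical helper).

    In each round, every stage whose predecessors have all finished is
    executed; stages in the same round do not depend on each other.
    """
    preds: Dict[str, List[str]] = defaultdict(list)
    succs: Dict[str, List[str]] = defaultdict(list)
    indegree = {name: 0 for name in stages}
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
        indegree[dst] += 1

    results: Dict[str, Relation] = {}
    ready = [name for name, deg in indegree.items() if deg == 0]
    while ready:
        # Execute the current round; stages without predecessors read a source relation.
        for name in ready:
            if preds[name]:
                inputs = [results[p] for p in preds[name]]
            else:
                inputs = [sources.get(name, [])]
            results[name] = stages[name](inputs)
        # Release successors whose inputs are now all available.
        next_ready = []
        for name in ready:
            for nxt in succs[name]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    next_ready.append(nxt)
        ready = next_ready

    if len(results) != len(stages):
        raise ValueError("the stage graph contains a cycle")
    return results


if __name__ == "__main__":
    def annotate(inputs: List[Relation]) -> Relation:
        # Hypothetical stand-in for crowd workers: three votes per (lexeme, concept) pair.
        (pairs,) = inputs
        return [(lex, con, vote) for (lex, con) in pairs for vote in (True, True, False)]

    def aggregate(inputs: List[Relation]) -> Relation:
        # Automatic synchronization stage: keep pairs confirmed by a strict majority.
        (votes,) = inputs
        yes = Counter((lex, con) for lex, con, v in votes if v)
        total = Counter((lex, con) for lex, con, _ in votes)
        return [pair for pair in total if yes[pair] * 2 > total[pair]]

    out = run_synchronously(
        {"annotate": annotate, "aggregate": aggregate},
        [("annotate", "aggregate")],
        {"annotate": [("car", "vehicle")]},
    )
    print(out["aggregate"])  # prints [('car', 'vehicle')]
```

    Running stages in rounds keeps the execution deterministic, which matches the reproducibility concern raised in the abstract; a real system would dispatch the crowdsourced stages of a round to workers concurrently.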

    Annotation as a method of digital discourse linguistics (Annotation als Methode der digitalen Diskurslinguistik)

    Annotation has evolved from an analog cultural technique, whose history can be traced back to antiquity, into a digital practice that can be situated within various fields of tension. On the one hand, it is performed as an everyday form of reception and of communication (and is thus a potential object of linguistic investigation, e.g. hashtag analyses); on the other hand, it is applied as a scientific method. Methodologically, it forms an interface between hermeneutic and automated, quantifying analysis. From a discourse-linguistic perspective, annotating combines scholarly metadiscursivity with discursive involvement. This paper discusses these positions and illustrates them with concrete annotation projects. The focus is on collaborative-discursive annotation as a scholarly method in digital (discourse) linguistics.

    Composing Measures for Computing Text Similarity

    We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well defined in the natural language processing community. We thus define the notion of text similarity and distinguish it from related tasks such as textual entailment and near-duplicate detection. We then identify multiple text dimensions, i.e. characteristics inherent to texts that can be used to judge text similarity, for which we provide empirical evidence. We discuss state-of-the-art text similarity measures previously proposed in the literature, before turning to a thorough discussion of common evaluation metrics and datasets. Based on this analysis, we devise an architecture that combines text similarity measures in a unified classification framework. We apply our system in two evaluation settings, in both of which it consistently outperforms prior work and competing systems: (a) an intrinsic evaluation in the context of the Semantic Textual Similarity Task, part of the Semantic Evaluation (SemEval) exercises, and (b) an extrinsic evaluation for the detection of text reuse. As a basis for future work, we introduce DKPro Similarity, an open source software package that streamlines the development of text similarity measures and complete experimental setups.
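    The abstract describes combining several text similarity measures in one classification framework. The following Python sketch illustrates that general idea under stated assumptions: three simple measures (word-set Jaccard, character-trigram Jaccard, word-length ratio), chosen here purely for illustration, become the features of a logistic regression trained with plain batch gradient descent. It does not reproduce the authors' architecture or the DKPro Similarity API; all function names and the toy training pairs are hypothetical.

```python
import math
from typing import List, Tuple


def _jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def word_jaccard(s: str, t: str) -> float:
    return _jaccard(set(s.lower().split()), set(t.lower().split()))


def char_trigram_jaccard(s: str, t: str) -> float:
    def grams(x: str) -> set:
        x = x.lower()
        # Strings shorter than three characters yield themselves as a single gram.
        return {x[i:i + 3] for i in range(max(len(x) - 2, 1))}
    return _jaccard(grams(s), grams(t))


def length_ratio(s: str, t: str) -> float:
    ls, lt = len(s.split()), len(t.split())
    return min(ls, lt) / max(ls, lt) if max(ls, lt) else 1.0


# Each measure becomes one feature of the combined classifier.
MEASURES = [word_jaccard, char_trigram_jaccard, length_ratio]


def features(s: str, t: str) -> List[float]:
    return [measure(s, t) for measure in MEASURES]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs: List[Tuple[str, str]], labels: List[int],
          lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights over the measure features (batch gradient descent)."""
    X = [features(s, t) for s, t in pairs]
    w, b, n = [0.0] * len(MEASURES), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(X, labels):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def similarity_probability(w: List[float], b: float, s: str, t: str) -> float:
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, features(s, t))) + b)


if __name__ == "__main__":
    pairs = [
        ("the cat sat on the mat", "the cat sat on the mat"),
        ("a dog runs in the park", "a dog runs in a park"),
        ("the cat sat on the mat", "stock prices fell sharply today"),
        ("a dog runs in the park", "quantum physics is hard"),
    ]
    w, b = train(pairs, [1, 1, 0, 0])
    print(similarity_probability(w, b, "a cat sat on a mat", "the cat sat on the mat"))
```

    Treating each measure as a feature lets the classifier learn how much weight each text dimension deserves, rather than fixing the combination by hand; adding a new measure only means appending a function to `MEASURES`.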

    Vocabulary-global lexical substitution in a vector space model (Vokabular-globale lexikalische Substitution in einem Vektorraummodell)
