6 research outputs found
On the adoption of crowdsourcing for theory testing
This paper examines the possibilities of using the crowdsourcing strategy for theory testing. We first analyse the relationships between theory building and theory testing activities. Then, based on a systematic review of 248 papers published in MISQ, we characterise the intents and pattern systems of activities that have been used for theory testing. Finally, we ascertain which activities can be crowdsourced or not and pinpoint a set of pathways supporting partial and total crowdsourcing. The obtained results show that a large number of activities related to data gathering can be crowdsourced, and that a number of intents have viable pathways supporting partial crowdsourcing.
Webbasierte linguistische Forschung: MΓΆglichkeiten und Begrenzungen beim Umgang mit Massendaten [Web-Based Linguistic Research: Possibilities and Limitations in Dealing with Mass Data]
Over the past ten to fifteen years, web-based methods of sociological research have emerged alongside classical methods such as interviews, observations and experiments, and linguistic research is increasingly relying upon them as well. This paper provides an overview of three web-based approaches, i.e. online surveys, crowdsourcing and web-based corpus analyses. Examples from specific projects serve to reflect upon these methods, address their potential and limitations, and make a critical appraisal. Internet-based empirical research produces vast and highly diverse quantities of (speaker-based or textual) data, presenting linguistic research with new opportunities and challenges. New procedures are required to make effective use of these resources.
ΠΠΎΠ»Π»Π΅ΠΊΡΠΈΠ²Π½ΡΠ΅ ΠΏΠΎΡΠΎΠΊΠΎΠ²ΡΠ΅ Π²ΡΡΠΈΡΠ»Π΅Π½ΠΈΡ: ΡΠ΅Π»ΡΡΠΈΠΎΠ½Π½ΡΠ΅ ΠΌΠΎΠ΄Π΅Π»ΠΈ ΠΈ Π°Π»Π³ΠΎΡΠΈΡΠΌΡ [Collaborative Dataflow Computations: Relational Models and Algorithms]
Recently, microtask crowdsourcing has become a popular approach for addressing various data mining problems. Crowdsourcing workflows for approaching such problems are composed of several data processing stages which require consistent representation for making the work reproducible. This paper is devoted to the problem of reproducibility and formalization of the microtask crowdsourcing process. A computational model for microtask crowdsourcing based on an extended relational model and a dataflow computational model has been proposed. The proposed collaborative dataflow computational model is designed for processing the input data sources by executing annotation stages and automatic synchronization stages simultaneously. Data processing stages and connections between them are expressed by using collaborative computation workflows represented as loosely connected directed acyclic graphs. A synchronous algorithm for executing such workflows has been described. The computational model has been evaluated by applying it to two tasks from the computational linguistics field: concept lexicalization refining in electronic thesauri and establishing hierarchical relations between such concepts. The "Add-Remove-Confirm" procedure is designed for adding the missing lexemes to the concepts while removing the odd ones. The "Genus-Species-Match" procedure is designed for establishing "is-a" relations between the concepts provided with the corresponding word pairs. The experiments involving both volunteers from popular online social networks and paid workers from crowdsourcing marketplaces confirm the applicability of these procedures for enhancing lexical resources.
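The execution scheme this abstract describes, a loosely connected directed acyclic graph of annotation and synchronization stages run by a synchronous algorithm, can be sketched with a plain topological-order executor. This is an illustrative sketch, not the paper's implementation; the stage names and stage functions are hypothetical.

```python
from collections import deque

def run_workflow(stages, edges, source):
    """stages: {name: fn(data) -> data}; edges: list of (from, to) pairs."""
    indegree = {name: 0 for name in stages}
    succ = {name: [] for name in stages}
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: a stage runs only once all of its inputs are ready,
    # which gives the synchronous, stage-by-stage execution order.
    ready = deque(n for n, d in indegree.items() if d == 0)
    results = {}
    while ready:
        stage = ready.popleft()
        # Predecessor outputs feed this stage; roots consume the source data.
        inputs = [results[u] for u, v in edges if v == stage] or [source]
        results[stage] = stages[stage](inputs[0] if len(inputs) == 1 else inputs)
        for v in succ[stage]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return results

# Hypothetical two-stage workflow: crowd annotation, then automatic sync.
stages = {
    "annotate": lambda rows: [(r, "label") for r in rows],  # workers label items
    "synchronize": lambda rows: sorted(set(rows)),          # merge and deduplicate
}
out = run_workflow(stages, [("annotate", "synchronize")], ["b", "a", "a"])
```

In the real model the stages operate on relational data and annotation stages run concurrently with synchronization stages; the sketch keeps only the DAG ordering, which is the part the synchronous execution algorithm depends on.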
Annotation als Methode der digitalen Diskurslinguistik [Annotation as a Method of Digital Discourse Linguistics]
Annotation has evolved from an analog cultural technique, whose history can be traced back to antiquity, into a digital practice that can be situated within several fields of tension. On the one hand, it is performed as an everyday form of reception and also of communication (and is thus a potential object of linguistic study, e.g. hashtag analyses); on the other hand, it is also applied as a scholarly method. Methodologically, it forms an interface between hermeneutic and automated, quantifying analysis. From a discourse-linguistic perspective, annotating combines scholarly metadiscursivity with discursive involvement. This paper discusses these positionings and illustrates them with concrete annotation projects. The focus is on collaborative-discursive annotation as a scholarly method in digital (discourse) linguistics.
Composing Measures for Computing Text Similarity
We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. We thus define the notion of text similarity and distinguish it from related tasks such as textual entailment and near-duplicate detection. We then identify multiple text dimensions, i.e. characteristics inherent to texts that can be used to judge text similarity, for which we provide empirical evidence.
We discuss state-of-the-art text similarity measures previously proposed in the literature, before continuing with a thorough discussion of common evaluation metrics and datasets. Based on the analysis, we devise an architecture which combines text similarity measures in a unified classification framework. We apply our system in two evaluation settings, for which it consistently outperforms prior work and competing systems: (a) an intrinsic evaluation in the context of the Semantic Textual Similarity Task as part of the Semantic Evaluation (SemEval) exercises, and (b) an extrinsic evaluation for the detection of text reuse.
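The core architectural idea, treating each text similarity measure as one feature of a unified classifier rather than using any single measure alone, can be sketched minimally. This is an illustrative sketch of the general approach, not the DKPro Similarity API; the three measures chosen here are assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity (a content dimension)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def length_ratio(a, b):
    """Relative length similarity (a structural dimension)."""
    la, lb = len(a), len(b)
    return min(la, lb) / max(la, lb) if max(la, lb) else 1.0

def char_ngram_overlap(a, b, n=3):
    """Character n-gram overlap (robust to small spelling variation)."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

def combined_features(a, b):
    """One feature vector per text pair; a classifier or regressor
    trained on such vectors combines the measures into one score."""
    return [jaccard(a, b), length_ratio(a, b), char_ngram_overlap(a, b)]
```

A standard learner (e.g. logistic regression over these vectors) then weights the measures against gold similarity judgments, which is what lets the combined system outperform any individual measure.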
As a basis for future work, we introduce DKPro Similarity, an open source software package which streamlines the development of text similarity measures and complete experimental setups.