6 research outputs found
On the adoption of crowdsourcing for theory testing
This paper examines the possibilities of using the crowdsourcing strategy for theory testing. We first analyse the relationships between theory building and theory testing activities. Then, based on a systematic review of 248 papers published in MISQ, we characterise the intents and pattern systems of activities that have been used for theory testing. Finally, we ascertain which activities can be crowdsourced or not and pinpoint a set of pathways supporting partial and total crowdsourcing. The obtained results show that a large number of activities related to data gathering can be crowdsourced, and that a number of intents have viable pathways supporting partial crowdsourcing.
Webbasierte linguistische Forschung: MΓΆglichkeiten und Begrenzungen beim Umgang mit Massendaten [Web-Based Linguistic Research: Possibilities and Limitations in Dealing with Mass Data]
Over the past ten to fifteen years, web-based methods of sociological research have emerged alongside classical methods such as interviews, observations and experiments, and linguistic research is increasingly relying upon them as well. This paper provides an overview of three web-based approaches, i.e. online surveys, crowdsourcing and web-based corpus analyses. Examples from specific projects serve to reflect upon these methods, address their potential and limitations, and make a critical appraisal. Internet-based empirical research produces vast and highly diverse quantities of (speaker-based or textual) data, presenting linguistic research with new opportunities and challenges. New procedures are required to make effective use of these resources.
ΠΠΎΠ»Π»Π΅ΠΊΡΠΈΠ²Π½ΡΠ΅ ΠΏΠΎΡΠΎΠΊΠΎΠ²ΡΠ΅ Π²ΡΡΠΈΡΠ»Π΅Π½ΠΈΡ: ΡΠ΅Π»ΡΡΠΈΠΎΠ½Π½ΡΠ΅ ΠΌΠΎΠ΄Π΅Π»ΠΈ ΠΈ Π°Π»Π³ΠΎΡΠΈΡΠΌΡ [Collaborative Dataflow Computations: Relational Models and Algorithms]
Recently, microtask crowdsourcing has become a popular approach for addressing various data mining problems. Crowdsourcing workflows for approaching such problems are composed of several data processing stages which require consistent representation for making the work reproducible. This paper is devoted to the problem of reproducibility and formalization of the microtask crowdsourcing process. A computational model for microtask crowdsourcing based on an extended relational model and a dataflow computational model has been proposed. The proposed collaborative dataflow computational model is designed for processing the input data sources by executing annotation stages and automatic synchronization stages simultaneously. Data processing stages and connections between them are expressed by using collaborative computation workflows represented as loosely connected directed acyclic graphs. A synchronous algorithm for executing such workflows has been described. The computational model has been evaluated by applying it to two tasks from the computational linguistics field: concept lexicalization refining in electronic thesauri and establishing hierarchical relations between such concepts. The "Add-Remove-Confirm" procedure is designed for adding the missing lexemes to the concepts while removing the odd ones. The "Genus-Species-Match" procedure is designed for establishing "is-a" relations between the concepts provided with the corresponding word pairs. The experiments involving both volunteers from popular online social networks and paid workers from crowdsourcing marketplaces confirm the applicability of these procedures for enhancing lexical resources.
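The execution scheme this abstract describes, a loosely connected directed acyclic graph of annotation and synchronization stages run by a synchronous algorithm, can be sketched with a plain topological-order executor. This is an illustrative sketch, not the paper's implementation; the stage names and stage functions are hypothetical.

```python
from collections import deque

def run_workflow(stages, edges, source):
    """stages: {name: fn(data) -> data}; edges: list of (from, to) pairs."""
    indegree = {name: 0 for name in stages}
    succ = {name: [] for name in stages}
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: a stage runs only once all of its inputs are ready,
    # which gives the synchronous, stage-by-stage execution order.
    ready = deque(n for n, d in indegree.items() if d == 0)
    results = {}
    while ready:
        stage = ready.popleft()
        # Predecessor outputs feed this stage; roots consume the source data.
        inputs = [results[u] for u, v in edges if v == stage] or [source]
        results[stage] = stages[stage](inputs[0] if len(inputs) == 1 else inputs)
        for v in succ[stage]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return results

# Hypothetical two-stage workflow: crowd annotation, then automatic sync.
stages = {
    "annotate": lambda rows: [(r, "label") for r in rows],  # workers label items
    "synchronize": lambda rows: sorted(set(rows)),          # merge and deduplicate
}
out = run_workflow(stages, [("annotate", "synchronize")], ["b", "a", "a"])
```

In the real model the stages operate on relational data and annotation stages run concurrently with synchronization stages; the sketch keeps only the DAG ordering, which is the part the synchronous execution algorithm depends on.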
Annotation als Methode der digitalen Diskurslinguistik [Annotation as a Method of Digital Discourse Linguistics]
Annotation has evolved from an analog cultural technique, whose history can be traced back to antiquity, into a digital practice that can be situated within several fields of tension. On the one hand, it is performed as an everyday form of reception and also of communication (and is thus a potential object of linguistic study, e.g. hashtag analyses); on the other hand, it is also applied as a scholarly method. Methodologically, it forms an interface between hermeneutic and automated, quantifying analysis. From a discourse-linguistic perspective, annotating combines scholarly metadiscursivity with discursive involvement. This paper discusses these positionings and illustrates them with concrete annotation projects. The focus is on collaborative-discursive annotation as a scholarly method in digital (discourse) linguistics.
Composing Measures for Computing Text Similarity
We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. We thus define the notion of text similarity and distinguish it from related tasks such as textual entailment and near-duplicate detection. We then identify multiple text dimensions, i.e. characteristics inherent to texts that can be used to judge text similarity, for which we provide empirical evidence.
We discuss state-of-the-art text similarity measures previously proposed in the literature, before continuing with a thorough discussion of common evaluation metrics and datasets. Based on the analysis, we devise an architecture which combines text similarity measures in a unified classification framework. We apply our system in two evaluation settings, for which it consistently outperforms prior work and competing systems: (a) an intrinsic evaluation in the context of the Semantic Textual Similarity Task as part of the Semantic Evaluation (SemEval) exercises, and (b) an extrinsic evaluation for the detection of text reuse.
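The core architectural idea, treating each text similarity measure as one feature of a unified classifier rather than using any single measure alone, can be sketched minimally. This is an illustrative sketch of the general approach, not the DKPro Similarity API; the three measures chosen here are assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity (a content dimension)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def length_ratio(a, b):
    """Relative length similarity (a structural dimension)."""
    la, lb = len(a), len(b)
    return min(la, lb) / max(la, lb) if max(la, lb) else 1.0

def char_ngram_overlap(a, b, n=3):
    """Character n-gram overlap (robust to small spelling variation)."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

def combined_features(a, b):
    """One feature vector per text pair; a classifier or regressor
    trained on such vectors combines the measures into one score."""
    return [jaccard(a, b), length_ratio(a, b), char_ngram_overlap(a, b)]
```

A standard learner (e.g. logistic regression over these vectors) then weights the measures against gold similarity judgments, which is what lets the combined system outperform any individual measure.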
As a basis for future work, we introduce DKPro Similarity, an open source software package which streamlines the development of text similarity measures and complete experimental setups.