
    Webbasierte linguistische Forschung: Möglichkeiten und Begrenzungen beim Umgang mit Massendaten [Web-Based Linguistic Research: Possibilities and Limitations in Handling Mass Data]

    Over the past ten to fifteen years, web-based methods of sociological research have emerged alongside classical methods such as interviews, observations, and experiments, and linguistic research is increasingly relying on them as well. This paper provides an overview of three web-based approaches: online surveys, crowdsourcing, and web-based corpus analyses. Examples from specific projects serve to reflect on these methods, address their potential and limitations, and offer a critical appraisal. Internet-based empirical research produces vast and highly diverse quantities of (speaker-based or textual) data, presenting linguistic research with new opportunities and challenges. New procedures are required to make effective use of these resources.

    Error-Related Negativities During Spelling Judgments Expose Orthographic Knowledge

    Understanding the role of phonological awareness in reading has been the focus of much psycholinguistic research, but less attention has been paid to understanding knowledge of the spellings that activate phonology. We carried out two experiments using event-related potentials (ERPs) to expose linguistic processes related to orthographic knowledge during judgments about the spellings of English words. In the first experiment, we confirmed that the error-related negativity (ERN) can be elicited during spelling decisions and that its magnitude was correlated with behavioral measures of spelling knowledge. In the second experiment, we manipulated the phonology of misspelled stimuli and observed that ERN magnitudes were larger when misspelled words altered the phonology of their correctly spelled counterparts than when they preserved it. This finding has implications for the influence of internal phonological and orthographic representations on error monitoring during reading. In both experiments, ERN effect sizes were correlated with performance on a number of reading-related assessments, including offline spelling ability and vocabulary knowledge, affirming the interdependent nature of reading processes and suggesting the usefulness of ERNs for indexing knowledge of a wide range of reading-related skills.

    Human-in-the-Loop Learning From Crowdsourcing and Social Media

    Computational social studies using public social media data have become more and more popular because of the large amount of user-generated data available. The richness of social media data, coupled with its noise and subjectivity, raises significant challenges for computationally studying social issues in a feasible and scalable manner. Machine learning problems are, as a result, often subjective or ambiguous when humans are involved: humans solving the same problems might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. When building supervised learning models, particularly when using crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This inevitably hides a rich source of diversity and subjectivity of opinions about the labels. Label distribution learning associates with each data item a probability distribution over the labels for that item, so it can preserve the diversity of opinions, beliefs, etc. that conventional learning hides or ignores. We propose a human-in-the-loop learning framework to model and study large volumes of unlabeled subjective social media data with less human effort. We study various annotation tasks given to crowdsourced annotators and methods for aggregating their contributions in a manner that preserves subjectivity and disagreement. We introduce a strategy for learning label distributions with only five to ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. We conduct experiments using our learning framework on data related to two subjective social issues (work and employment, and suicide prevention) that touch many people worldwide. Our methods can be applied to a broad variety of problems, particularly social problems. Our experimental results suggest that specific label aggregation methods can help provide reliable representative semantics at the population level.
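    The core aggregation idea in the abstract above — pooling a handful of crowd labels across semantically related items and keeping the resulting distribution rather than collapsing it to a majority label — can be sketched as follows. The function name, label set, and five-labels-per-item data are illustrative assumptions, not the authors' implementation.

    ```python
    from collections import Counter

    def label_distribution(label_lists):
        """Pool crowd labels from a group of semantically related items
        and normalize the counts into a probability distribution,
        preserving annotator disagreement instead of taking a majority vote."""
        counts = Counter(label for labels in label_lists for label in labels)
        total = sum(counts.values())
        return {label: n / total for label, n in counts.items()}

    # Three related items, each with five crowd labels (illustrative data).
    related_items = [
        ["pos", "pos", "neg", "pos", "neutral"],
        ["pos", "neg", "neg", "pos", "pos"],
        ["neutral", "pos", "pos", "neg", "pos"],
    ]
    dist = label_distribution(related_items)
    # dist assigns probability mass to every observed label,
    # e.g. {"pos": 0.6, "neg": ..., "neutral": ...}
    ```

    A conventional pipeline would reduce each item to its majority label ("pos") and discard the minority views; the distribution keeps them available for downstream learning.
    
    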

    Statistical Analysis and Design of Crowdsourcing Applications

    This thesis develops methods for the analysis and design of crowdsourced experiments and crowdsourced labeling tasks. Much of this document focuses on applications including running natural field experiments, estimating the number of objects in images, and collecting labels for word sense disambiguation. Observed shortcomings of the crowdsourced experiments inspired the development of methodology for running more powerful experiments via matching on-the-fly. Using the label data to estimate response functions inspired work on non-parametric function estimation using Bayesian Additive Regression Trees (BART). This work then inspired extensions to BART, such as the incorporation of missing data, as well as a user-friendly R package.