
    Webbasierte linguistische Forschung: Möglichkeiten und Begrenzungen beim Umgang mit Massendaten [Web-Based Linguistic Research: Possibilities and Limitations in Handling Mass Data]

    Over the past ten to fifteen years, web-based methods of sociological research have emerged alongside classical methods such as interviews, observations, and experiments, and linguistic research is increasingly relying on them as well. This paper provides an overview of three web-based approaches: online surveys, crowdsourcing, and web-based corpus analyses. Examples from specific projects serve to reflect on these methods, address their potential and limitations, and offer a critical appraisal. Internet-based empirical research produces vast and highly diverse quantities of (speaker-based or textual) data, presenting linguistic research with new opportunities and challenges. New procedures are required to make effective use of these resources.

    Error-Related Negativities During Spelling Judgments Expose Orthographic Knowledge

    Understanding the role of phonological awareness in reading has been the focus of much psycholinguistic research, but less attention has been paid to understanding knowledge of the spellings that activate phonology. We carried out two experiments using event-related potentials (ERPs) to expose linguistic processes related to orthographic knowledge during judgments about the spellings of English words. In the first experiment, we confirmed that the error-related negativity (ERN) can be elicited during spelling decisions and that its magnitude was correlated with behavioral measures of spelling knowledge. In the second experiment, we manipulated the phonology of misspelled stimuli and observed that ERN magnitudes were larger when misspelled words altered the phonology of their correctly spelled counterparts than when they preserved it. This finding has implications for the influence of internal phonological and orthographic representations on error monitoring during reading. In both experiments, ERN effect sizes were correlated with performance on a number of reading-related assessments, including offline spelling ability and vocabulary knowledge, affirming the interdependent nature of reading processes and suggesting the usefulness of ERNs for indexing knowledge of a wide range of reading-related skills.

    Human-in-the-Loop Learning From Crowdsourcing and Social Media

    Computational social studies using public social media data have become more and more popular because of the large amount of user-generated data available. The richness of social media data, coupled with its noise and subjectivity, raises significant challenges for computationally studying social issues in a feasible and scalable manner. Machine learning problems are, as a result, often subjective or ambiguous when humans are involved: humans solving the same problems might come to legitimate but completely different conclusions, based on their personal experiences and beliefs. When building supervised learning models, particularly when using crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This inevitably hides a rich source of diversity and subjectivity of opinions about the labels. Label distribution learning associates with each data item a probability distribution over the labels for that item, so it can preserve the diversity of opinions, beliefs, etc. that conventional learning hides or ignores. We propose a human-in-the-loop learning framework to model and study large volumes of unlabeled subjective social media data with less human effort. We study various annotation tasks given to crowdsourced annotators and methods for aggregating their contributions in a manner that preserves subjectivity and disagreement. We introduce a strategy for learning label distributions with only five to ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. We conduct experiments using our learning framework on data related to two subjective social issues (work and employment, and suicide prevention) that touch many people worldwide. Our methods can be applied to a broad variety of problems, particularly social problems. Our experimental results suggest that specific label aggregation methods can help provide reliable representative semantics at the population level.
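    The core aggregation idea in the abstract above — pooling a handful of crowd labels across semantically related items and keeping the resulting distribution rather than collapsing it to a majority label — can be sketched as follows. The function name, label set, and five-labels-per-item data are illustrative assumptions, not the authors' implementation.

    ```python
    from collections import Counter

    def label_distribution(label_lists):
        """Pool crowd labels from a group of semantically related items
        and normalize the counts into a probability distribution,
        preserving annotator disagreement instead of taking a majority vote."""
        counts = Counter(label for labels in label_lists for label in labels)
        total = sum(counts.values())
        return {label: n / total for label, n in counts.items()}

    # Three related items, each with five crowd labels (illustrative data).
    related_items = [
        ["pos", "pos", "neg", "pos", "neutral"],
        ["pos", "neg", "neg", "pos", "pos"],
        ["neutral", "pos", "pos", "neg", "pos"],
    ]
    dist = label_distribution(related_items)
    # dist assigns probability mass to every observed label,
    # e.g. {"pos": 0.6, "neg": ..., "neutral": ...}
    ```

    A conventional pipeline would reduce each item to its majority label ("pos") and discard the minority views; the distribution keeps them available for downstream learning.
    
    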

    Statistical Analysis and Design of Crowdsourcing Applications

    This thesis develops methods for the analysis and design of crowdsourced experiments and crowdsourced labeling tasks. Much of this document focuses on applications including running natural field experiments, estimating the number of objects in images, and collecting labels for word sense disambiguation. Observed shortcomings of the crowdsourced experiments inspired the development of methodology for running more powerful experiments via matching on-the-fly. Using the label data to estimate response functions inspired work on non-parametric function estimation using Bayesian Additive Regression Trees (BART). This work then inspired extensions to BART, such as the incorporation of missing data, as well as a user-friendly R package.