
    An introduction to crowdsourcing for language and multimedia technology research

    Language and multimedia technology research often relies on large, manually constructed datasets for training or evaluating algorithms and systems. Constructing these datasets is often expensive and poses significant challenges in recruiting personnel to carry out the work. Crowdsourcing, which draws on scalable pools of workers available on demand, offers a flexible means of constructing many of these datasets rapidly and at low cost, both to support existing research requirements and potentially to enable new research initiatives that would otherwise not be possible.

    Developing and validating a methodology for crowdsourcing L2 speech ratings in Amazon Mechanical Turk

    Researchers have increasingly turned to Amazon Mechanical Turk (AMT) to crowdsource speech data, predominantly in English. Although AMT and similar platforms are well positioned to advance the state of the art in L2 research, it is unclear whether crowdsourced L2 speech ratings are reliable, particularly in languages other than English. The present study describes the development and deployment of an AMT task to crowdsource comprehensibility, fluency, and accentedness ratings for L2 Spanish speech samples. Fifty-four AMT workers who were native Spanish speakers from 11 countries participated in the ratings. Intraclass correlation coefficients were used to estimate group-level interrater reliability, and Rasch analyses were undertaken to examine individual differences in rater severity and fit. Excellent reliability was observed for the comprehensibility and fluency ratings, but indices were slightly lower for accentedness, leading to recommendations for improving the task in future data collection.
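The abstract does not say which form of intraclass correlation was used. As a minimal sketch, assuming a complete ratings matrix (every rater scores every sample) and the common two-way random-effects, absolute-agreement, average-measures form ICC(2,k), group-level reliability can be computed from a two-way ANOVA decomposition. The function name `icc2k` is hypothetical.

```python
from statistics import mean


def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    ratings: n rows (rated speech samples), each a list of k scores (one per rater).
    Assumes a complete matrix with no missing ratings.
    """
    n = len(ratings)      # number of rated samples (targets)
    k = len(ratings[0])   # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: samples, raters, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


if __name__ == "__main__":
    # Textbook example: 6 samples rated by 4 raters
    data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
            [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc2k(data), 2))  # 0.62
```

With crowdsourced ratings, where workers typically score different subsets of samples, the matrix is rarely complete, so a real analysis would need a model that handles missing cells.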

    To What Do I Owe This Visit? The Drawbacks and Benefits of In-Role and Non-Role Intrusions

    Workplace intrusions (unexpected encounters initiated by another person that disrupt an individual's work) are generally characterized as negative experiences that deplete resources, increase role and information overload, and promote strain. In contrast, our research argues that intrusions may also benefit the employees who are intruded upon. Taking a multistudy approach, we investigate how intrusions affect the extent to which employees engage in their own work (work engagement) and the extent to which they engage in work with others (collaboration). We also investigate the indirect effects of intrusions on employees' task-focused and person-focused citizenship behavior through these mechanisms. We tested our predictions with a within-person experimental critical incident study (Study 1), an experiment (Study 2), and an experience-sampling methodology study with a sample of scientists involved in research and development (Study 3). Our research investigates the dynamics of various types of workplace intrusions, with results suggesting that intrusions may lead to beneficial employee outcomes in addition to the adverse outcomes previously demonstrated in the literature. Given the ubiquity of intrusions in organizations, our findings have both theoretical and practical significance.

    Using Amazon Mechanical Turk for Transcription of Non-Native Speech

    This study investigates the use of Amazon Mechanical Turk for the transcription of non-native speech. Multiple transcriptions were obtained from several distinct MTurk workers and combined to produce merged transcriptions that agreed more closely with a gold-standard transcription than the individual transcriptions did. Three different methods for merging transcriptions were compared across two types of responses (spontaneous and read-aloud). The results show that the merged MTurk transcriptions are as accurate as an individual expert transcriber for the read-aloud responses, and only slightly less accurate for the spontaneous responses.
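The abstract does not name the three merging methods. As an illustration only, the sketch below shows one simple combination strategy (not necessarily any of the methods compared in the study): pick the worker transcription with the smallest total word-level edit distance to all the others, then score it against the gold standard with word error rate (WER). All function names and example sentences are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # substitute or match
        prev = cur
    return prev[-1]


def select_consensus(transcriptions):
    """Return the transcription closest, in total word edit distance, to all the others."""
    tokenized = [t.lower().split() for t in transcriptions]
    totals = [sum(word_edit_distance(t, u) for u in tokenized) for t in tokenized]
    return transcriptions[totals.index(min(totals))]


def wer(hypothesis, reference):
    """Word error rate of a hypothesis against a reference transcription."""
    ref = reference.lower().split()
    return word_edit_distance(hypothesis.lower().split(), ref) / len(ref)


if __name__ == "__main__":
    workers = ["the cat sat on the mat", "the cat sat on a mat",
               "a cat sat on the mat", "the cat sat in the mat"]
    best = select_consensus(workers)
    print(best)                                  # the cat sat on the mat
    print(wer(best, "the cat sat on the mat"))   # 0.0
```

Selecting a single candidate cannot fix errors that every worker makes differently; methods that align all transcriptions and vote word by word can, at the cost of a multiple-alignment step.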

    CLUB Working Papers in Linguistics Volume 6

    This sixth volume of the series "CLUB Working Papers in Linguistics" collects some of the contributions presented at events organized by the Circolo Linguistico dell'Università di Bologna in the 2020-2021 academic year. The first three essays, by Elisa Corino (Università di Torino), Marina Benedetti (Università per Stranieri di Siena) and Andrea Sansò (Università dell'Insubria) respectively, belong to the official programme. The next three contributions were originally presented at the Circolo's regular seminars: the papers by Silvia Brambilla and Idea Basile (Università di Bologna and Università Roma "La Sapienza"), Marta Maffia and Massimo Pettorino (Università di Napoli "L'Orientale"), and Anna Dall'Acqua (Università di Bologna and Injenia S.r.L.). The volume closes with an article by Ottavia Cepraga, winner of the 2021 'Una tesi in linguistica' (A Thesis in Linguistics) prize.

    Automated mood boards - Ontology-based semantic image retrieval

    The main goal of this research is to support concept designers' search for inspirational and meaningful images when developing mood boards. Finding the right images has become a well-known challenge as the number of images stored and shared on the Internet and elsewhere keeps increasing steadily and rapidly. Image retrieval technologies, which collect, store and pre-process image information to return relevant images instantly in response to users' needs, have made great progress in the last decade. However, the keyword-based content description and query processing techniques currently used for Image Retrieval (IR) have their limitations. Most of these techniques are adapted from Information Retrieval research and therefore offer limited capabilities to grasp and exploit conceptualisations, because they cannot handle ambiguity, synonymy, and semantic constraints. Conceptual search (i.e. searching by meaning rather than by literal strings) aims to overcome the limitations of keyword-based models. Starting from this point, this thesis investigates existing IR models that exploit domain knowledge to support semantic search, with a focus on using lexical ontologies to improve the semantic perspective. It introduces a technique for extracting semantic DNA (SDNA) from textual image annotations and constructing semantic image signatures. These signatures, called semantic chromosomes, contain semantic information related to the images. Central to constructing them is the concept disambiguation technique developed in this work, which identifies the most relevant SDNA by measuring the semantic importance of each word or phrase in the image annotation. In addition, a conceptual model of an ontology-based system for generating visual mood boards is proposed.
The proposed model, which is adapted from the Vector Space Model, exploits semantic chromosomes for semantic indexing and for assessing the semantic similarity of images within a collection.
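In a Vector Space Model, each item is a vector of term (here, concept) weights, and similarity is usually the cosine of the angle between vectors. The sketch below assumes each semantic chromosome can be represented as a sparse mapping from concept identifier to weight; the thesis's actual SDNA weighting scheme is not described here, so the concept names, weights, and function names are hypothetical.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine similarity between two sparse concept-weight vectors (dict: concept -> weight)."""
    shared = set(sig_a) & set(sig_b)
    dot = sum(sig_a[c] * sig_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in sig_a.values()))
    norm_b = math.sqrt(sum(w * w for w in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_sig, collection):
    """Return image ids sorted by descending similarity to the query signature."""
    return sorted(collection,
                  key=lambda img: cosine_similarity(query_sig, collection[img]),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical signatures: concept -> semantic importance weight
    collection = {
        "img1": {"sea": 0.9, "sunset": 0.5},
        "img2": {"forest": 0.8, "fog": 0.4},
        "img3": {"sea": 0.3, "boat": 0.7},
    }
    query = {"sea": 1.0, "sunset": 0.5}
    print(rank_images(query, collection))  # ['img1', 'img3', 'img2']
```

Representing signatures as sparse dictionaries keeps the computation proportional to the number of concepts an image actually carries, rather than to the size of the whole ontology.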

    Constitution d'un corpus oral de FLE : enjeux théoriques et méthodologiques (Building a spoken corpus of French as a Foreign Language: theoretical and methodological issues)

    The need to design linguistic corpora to support research in linguistics has prompted numerous studies exploring approaches, methodologies and good practices for building written corpora. Fewer studies address spoken data, and those concerning learners' interlanguage are rarer still. The CIL project (Corpus Inter Langue), nearing completion at the University of Rennes 2 under the supervision of a research team specialising in linguistics and pedagogy (LIDILE, EA 3874), aims to build a large corpus of written and spoken productions in EFL and FFL. This PhD dissertation focuses mainly on the FFL (French as a Foreign Language) corpus (CIL-FLE). Starting from the observation that linguists' interest in spoken language has consistently lagged behind their interest in written language, the first chapter studies orality as a linguistic object from both a historical and an epistemological perspective. The second chapter addresses corpus linguistics in general and the notion of the corpus as a linguistic object in particular. Regarding corpus linguistics, we review the methods linguists use to obtain data for their research: introspection, elicitation, or consultation of authentic data. The concept of corpus is then analysed according to a series of defining criteria, which we examine closely in order to propose a definition of the linguistic corpus. The third and final chapter applies these theoretical findings to the design of the CIL-FLE corpus, describing in detail its components and its collection, transcription and archiving protocols. We focus in particular on the transcription protocol, emphasising the difficulties of transcribing learners' data.
Finally, the CIL-FLE corpus, which contains approximately 105,000 words and is the outcome of this PhD, is described.