176,263 research outputs found

    Community post-editing of machine-translated user-generated content

    With the constant growth of user-generated content (UGC) online, the demand for quick translations of large volumes of text increases. This demand is often met with a combination of machine translation (MT) and post-editing (PE). Despite extensive research on post-editing with professional translators or translation students, there are few PE studies with lay post-editors, such as domain experts. This thesis explores lay post-editing as a feasible solution for UGC in a technology support forum, machine translated from English into German. This context of lay post-editing in an online community prompts a redefinition of quality. We adopt a mixed-methods approach, investigating PE quality quantitatively with an error annotation, a domain specialist evaluation and an end-user evaluation. We further explore post-editing behaviour, i.e. the specific edits performed, from a qualitative perspective. With the involvement of community members, the need for a PE competence model becomes even more pressing. We investigate whether Göpferich's translation competence (TC) model (2009) may serve as a basis for lay post-editing. Our quantitative data show, with statistical significance, that lay post-editing is a feasible concept, although it produces variable output. On a qualitative level, post-editing is successful for short segments requiring ~35% post-editing effort. No post-editing patterns were detected for segments requiring more PE effort. Lastly, our data suggest that PE quality is largely independent of the profile characteristics measured. This thesis constitutes an important advance in lay post-editing and in benchmarking the evaluation of its output, uncovering difficulties in pinpointing reasons for variance in the resulting quality.
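    A minimal sketch of how post-editing effort can be quantified as a normalised word-level edit distance between the raw MT output and the post-edited segment, in the spirit of TER; this is illustrative only, not the metric used in the thesis, and the example sentences are invented.

```python
# Illustrative only: word-level edit distance between MT output and its
# post-edited version, normalised by MT segment length, as a rough proxy
# for post-editing effort (e.g. the ~35% figure mentioned above).

def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    dp = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tok_b in enumerate(b, 1):
            cur = min(dp[j] + 1,                  # delete tok_a
                      dp[j - 1] + 1,              # insert tok_b
                      prev + (tok_a != tok_b))    # substitute or keep
            prev, dp[j] = dp[j], cur
    return dp[-1]

def pe_effort(mt_segment, post_edited):
    """Fraction of the MT segment that had to be edited (0.0 = untouched)."""
    mt, pe = mt_segment.split(), post_edited.split()
    return edit_distance(mt, pe) / max(len(mt), 1)

# Invented example: two edits over five MT tokens -> 0.4 effort.
print(pe_effort("das ist ein Beispiel Satz", "das ist ein Beispielsatz"))
```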

    Community-based post-editing of machine-translated content: monolingual vs. bilingual

    We carried out a machine translation post-editing pilot study with users of an IT support forum community. For each of the two language pairs (English to German, English to French), four native speakers were recruited. They performed monolingual and bilingual post-editing tasks on machine-translated forum content. The post-edited content was assessed using human evaluation (fluency, comprehensibility, fidelity). We found that monolingual post-editing can lead to fluency and comprehensibility scores similar to those achieved through bilingual post-editing, while fidelity improved considerably more in the bilingual set-up. Furthermore, performance varied greatly across post-editors: some post-editors were able to produce better quality in a monolingual set-up than others.
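    As a purely illustrative companion to the study design above (the ratings below are invented, not the study's data), the following sketch shows how per-setup means of fluency, comprehensibility and fidelity scores could be aggregated and compared.

```python
# Hypothetical 1-5 human-evaluation ratings; compare monolingual vs bilingual
# post-editing set-ups on fluency, comprehensibility and fidelity.
from collections import defaultdict
from statistics import mean

ratings = [  # (setup, fluency, comprehensibility, fidelity)
    ("monolingual", 4, 4, 3),
    ("monolingual", 3, 4, 2),
    ("bilingual",   4, 4, 4),
    ("bilingual",   4, 5, 4),
]

by_setup = defaultdict(list)
for setup, *scores in ratings:
    by_setup[setup].append(scores)

for setup, rows in by_setup.items():
    flu, comp, fid = (mean(col) for col in zip(*rows))
    print(f"{setup:11s} fluency={flu:.2f} comprehensibility={comp:.2f} fidelity={fid:.2f}")
```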

    Impact of automatic segmentation on the quality, productivity and self-reported post-editing effort of intralingual subtitles

    This paper describes the evaluation methodology followed to measure the impact of using a machine learning algorithm to automatically segment intralingual subtitles. The segmentation quality, productivity and self-reported post-editing effort achieved with this approach are shown to improve on those obtained with the character-counting technique currently most widely employed for automatic subtitle segmentation. The corpus used to train and test the proposed automated segmentation method is also described and shared with the community, in order to foster further research in this area.
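    For context, the character-counting baseline referred to above can be approximated along the following lines; this is a generic sketch under an assumed 42-character line limit, not the segmenter evaluated in the paper.

```python
# Generic character-count segmentation: greedily pack words into subtitle
# lines that stay under a maximum character limit (42 is assumed here).
MAX_CHARS = 42

def segment_by_chars(text, max_chars=MAX_CHARS):
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

for line in segment_by_chars(
    "this paper describes the evaluation methodology followed to measure "
    "the impact of a machine learning segmenter on intralingual subtitles"
):
    print(line)
```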

    Traduction automatisée d'une oeuvre littéraire: une étude pilote (Automated translation of a literary work: a pilot study)

    Current machine translation (MT) techniques are continuously improving. In specific domains, post-editing (PE) makes it possible to obtain high-quality translations relatively quickly. But is such a pipeline (MT+PE) usable to translate a literary work (fiction, short story)? This paper offers a preliminary answer to this question. A short story by the American writer Richard Powers, not yet available in French, is automatically translated and post-edited and then revised by non-professional translators. The LIG post-editing platform allows the short story to be read and edited continuously, suggesting (for the future) a community of reader-editors who keep improving the translations of their favourite author. In addition to presenting experimental evaluation results for the MT+PE pipeline (MT system used, automatic evaluation), we also discuss the quality of the translation output from the perspective of a panel of readers (who read the translated short story in French and then answered a survey). Finally, some remarks by the official French translator of R. Powers, consulted on this occasion, are given at the end of this article.

    Building Open Educational Resources from the Ground Up: South Africa's Free High School Science Texts

    This paper presents a case study of the development of the South African project Free High School Science Texts (FHSST), an initiative to develop a free high school science text for all teachers and learners in South Africa. The goals of the case study were two-fold: to examine and analyze the practices associated with the successes and challenges encountered by FHSST; and to encourage a participatory, analytical process that will assist other open education projects in thinking about and sharing their practices, processes and strategies. Beyond its implications for South African education, the FHSST project can serve as a model for peer production of open content, offering insights into planning and decision making around 1) recruiting volunteers; 2) sustaining their participation; 3) using technology to create effective workflow; 4) conducting hackathons; and 5) facilitating teacher trials. Findings from this study offer insights into overall approaches and goals that may prove instrumental across open education projects, serving as a reference for the development of assessment tools and resources that may assist open education projects in tracking, sharing and advancing their learnings and success.

    Integrating N-best SMT outputs into a TM system

    In this paper, we propose a novel framework to enrich Translation Memory (TM) systems with Statistical Machine Translation (SMT) outputs using ranking. In order to offer human translators multiple choices, instead of using only the top SMT output and the top TM hit, we merge the N-best output from the SMT system and the k-best hits with the highest fuzzy match scores from the TM system. The merged list is then ranked according to the prospective post-editing effort and provided to the translators to aid their work. Experiments show that our ranked output achieves a precision of 0.8747 at top 1 and 0.8134 at top 5. Our framework facilitates a tight integration between SMT and TM, taking full advantage of the TM while exploiting high-quality SMT output to improve the productivity of human translators.
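    A minimal sketch of the merge-and-rank idea described above; the effort model here is a placeholder (candidate length), not the paper's predictor, and the example candidates are invented.

```python
# Merge N-best SMT hypotheses with k-best TM fuzzy matches and sort the
# combined list by an estimated post-editing effort (lower effort first).

def merge_and_rank(smt_nbest, tm_kbest, predict_effort):
    """smt_nbest / tm_kbest: lists of candidate translations (strings).
    predict_effort: callable returning an estimated PE effort per candidate."""
    candidates = [("SMT", c) for c in smt_nbest] + [("TM", c) for c in tm_kbest]
    return sorted(candidates, key=lambda pair: predict_effort(pair[1]))

# Placeholder effort model: pretend shorter candidates need less post-editing.
toy_effort = lambda candidate: len(candidate)

ranked = merge_and_rank(
    smt_nbest=["click the Start button", "click on Start button now"],
    tm_kbest=["press the Start button to begin"],
    predict_effort=toy_effort,
)
for source, candidate in ranked:
    print(source, "->", candidate)
```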

    Creating, Doing, and Sustaining OER: Lessons from Six Open Educational Resource Projects

    The development of free-to-use open educational resources (OER) has generated a dynamic field of widespread interest and study regarding methods for creating and sustaining OER. To help foster a thriving OER movement with potential for knowledge-sharing across program, organizational and national boundaries, the Institute for the Study of Knowledge Management in Education (ISKME) developed and conducted case study research programs in collaboration with six OER projects from around the world. Embodying a range of challenges and opportunities among a diverse set of OER projects, the case studies were intended to track, analyze and share key developments in the creation, use and reuse of OER. The specific cases include: CurriculumNet, Curriki, Free High School Science Texts (FHSST), Training Commons, the Stanford Encyclopedia of Philosophy (SEP), and Teachers' Domain.

    Towards predicting post-editing productivity

    Machine translation (MT) quality is generally measured via automatic metrics, producing scores that have no meaning for translators who are required to post-edit MT output or for project managers who have to plan and budget for translation projects. This paper investigates correlations between two such automatic metrics (general text matcher and translation edit rate) and post-editing productivity. For the purposes of this paper, productivity is measured via processing speed and cognitive measures of effort, using eye tracking as a tool. Processing speed, average fixation time and fixation count are found to correlate well with the scores for groups of segments. Segments with high GTM and TER scores require substantially less time and cognitive effort than medium- or low-scoring segments. Future research involving score thresholds and confidence estimation is suggested.
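    The kind of correlation analysis described above can be sketched as follows; the metric scores and processing speeds are invented numbers, and Pearson's r is used simply as one plausible choice of correlation coefficient.

```python
# Correlate an automatic metric score per segment group with post-editing
# speed (invented numbers; Pearson's r computed from scratch).
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

gtm_scores       = [0.85, 0.72, 0.60, 0.45, 0.30]   # per segment group
words_per_minute = [22.0, 18.5, 14.0, 11.5,  9.0]   # post-editing speed

print(f"r = {pearson(gtm_scores, words_per_minute):.3f}")
```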

    Identifying Unclear Questions in Community Question Answering Websites

    Thousands of complex natural language questions are submitted to community question answering websites on a daily basis, making them one of the most important information sources available today. However, submitted questions are often unclear and cannot be answered without further clarification questions from expert community members. This study is the first to investigate the complex task of classifying a question as clear or unclear, i.e., whether it requires further clarification. We construct a novel dataset and propose a classification approach based on the notion of similar questions. This approach is compared to state-of-the-art text classification baselines. Our main finding is that the similar-questions approach is a viable alternative that can be used as a stepping stone towards the development of supportive user interfaces for question formulation. Published in the Proceedings of the 41st European Conference on Information Retrieval (ECIR '19), 2019.
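    A hedged sketch of a similar-questions classifier in the spirit of the approach above (this is not the paper's model): a new question is labelled clear or unclear by majority vote over its nearest neighbours in TF-IDF space, using invented training examples.

```python
# Nearest-neighbour classification over TF-IDF vectors; training data invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

questions = [
    "How do I merge two branches in git and resolve the conflicts?",
    "Why does my Python script raise a KeyError when the key exists?",
    "it doesn't work, please help",
    "error, what should I do?",
]
labels = ["clear", "clear", "unclear", "unclear"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("knn", KNeighborsClassifier(n_neighbors=3, metric="cosine", algorithm="brute")),
])
clf.fit(questions, labels)

print(clf.predict(["please help, it throws an error"]))
```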

    FEMwiki: crowdsourcing semantic taxonomy and wiki input to domain experts while keeping editorial control: Mission Possible!

    Highly specialized professional communities of practice (CoP) inevitably need to operate across geographically dispersed areas: members frequently need to interact and share professional content. Crowdsourcing on a wiki platform provides a novel way for a professional community to share ideas and collaborate on content creation, curation, maintenance and sharing. This is the aim of the Field Epidemiological Manual wiki (FEMwiki) project, which enables online collaborative content sharing and interaction for field epidemiologists around a growing training wiki resource. However, while user contributions are the driving force for content creation, any medical information resource needs to retain editorial control and quality assurance. This requirement is typically in conflict with community-driven Web 2.0 content creation. To maximize the opportunities for the network of epidemiologists actively editing the wiki content while keeping quality and editorial control, a novel structure was developed to encourage crowdsourcing: support for dual versioning of each wiki page, enabling expert-reviewed pages to be maintained in parallel with user-updated versions, with clear navigation between the related versions. Secondly, the training wiki content needs to be organized in a semantically enhanced taxonomical navigation structure that enables domain experts to find information easily on a growing site. This also provides an ideal opportunity for crowdsourcing. We developed a user-editable collaborative interface that crowdsources live maintenance of the taxonomy to the community of field epidemiologists by embedding the taxonomy in the training wiki platform and generating the semantic navigation hierarchy on the fly. Launched in 2010, FEMwiki is a real-world service supporting field epidemiologists in Europe and worldwide. The success of crowdsourcing was evaluated by assessing the number and type of changes made by the professional network of epidemiologists over several months; the evaluation demonstrated that crowdsourcing encourages users to edit existing content and create new content, and also leads to expansion of the domain taxonomy.
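    The dual-versioning idea described above could be modelled roughly as follows; this is a hypothetical sketch, not the FEMwiki implementation, and all names are made up.

```python
# Hypothetical page record: an expert-reviewed version kept alongside the
# community-edited draft, so editorial control survives crowdsourced edits.
from dataclasses import dataclass, field

@dataclass
class WikiPage:
    title: str
    reviewed: str = ""                      # last expert-approved text
    draft: str = ""                         # current community-edited text
    history: list = field(default_factory=list)

    def community_edit(self, new_text, author):
        """Record a crowdsourced edit; the reviewed version is untouched."""
        self.history.append((author, self.draft))
        self.draft = new_text

    def approve(self):
        """An editor promotes the current draft to the reviewed version."""
        self.reviewed = self.draft

page = WikiPage(title="Outbreak investigation")
page.community_edit("Step 1: confirm the existence of an outbreak ...", author="field_epi_42")
page.approve()
print(page.reviewed)
```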