
    Toward Crowdsourcing Translation Post-editing: A Thematic Systematic Review

    Crowdsourcing Translation as a Post-Editing Method (CTPE) has emerged as a rapid and inexpensive approach to translation and has drawn significant attention in recent years. This qualitative study analyzes and synthesizes the approaches and aspects underpinning CTPE research and identifies its as-yet-unexplored potential. Through a systematic literature review focused on empirical papers, we examined the limited literature thematically and identified recurring central themes. Our review reveals that CTPE requires further attention and that its potential benefits are yet to be fully explored. We discuss the eight core concepts that emerged during our analysis, including the purpose of CTPE, its areas of application, ongoing CTPE processes, platform and crowd characteristics, motivation, CTPE domains, and future perspectives. By highlighting the strengths of CTPE, we conclude that it has the potential to be a highly effective translation method in various domains.

    eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing

    Training models for the automatic correction of machine-translated text usually relies on data consisting of (source, MT, human post-edit) triplets providing, for each source sentence, examples of translation errors together with the corresponding corrections made by a human post-editor. Ideally, a large amount of data of this kind should allow the model to learn reliable correction patterns and apply them effectively at test time on unseen (source, MT) pairs. In practice, however, the limited availability of such data calls for solutions that also integrate other sources of knowledge into the training process. Along this direction, state-of-the-art results have recently been achieved by systems that, in addition to a limited amount of available training data, exploit artificial corpora that approximate elements of the "gold" training instances with automatic translations. Following this idea, we present eSCAPE, the largest freely available Synthetic Corpus for Automatic Post-Editing released so far. eSCAPE consists of millions of entries in which the MT element of the training triplets has been obtained by translating the source side of publicly available parallel corpora and using the target side as an artificial human post-edit. Translations are obtained with both phrase-based and neural models. For each MT paradigm, eSCAPE contains 7.2 million triplets for English-German and 3.3 million for English-Italian, for a total of 14.4 and 6.6 million instances respectively. The usefulness of eSCAPE is demonstrated through experiments in a general-domain scenario, the most challenging one for automatic post-editing. For both language directions, models trained on our artificial data always improve MT quality, with statistically significant gains. The current version of eSCAPE can be freely downloaded from http://hltshare.fbk.eu/QT21/eSCAPE.html. Comment: Accepted at LREC 201
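    The triplet-construction idea the abstract describes can be sketched in a few lines. This is a minimal illustration, not the eSCAPE pipeline itself: `translate` stands in for any MT system (phrase-based or neural), and the function name is hypothetical.

```python
def build_synthetic_triplets(parallel_corpus, translate):
    """Turn (source, target) pairs into (source, MT, pseudo post-edit) triplets.

    The MT element is produced by translating the source side; the human
    reference target stands in as an artificial human post-edit.
    """
    triplets = []
    for source, target in parallel_corpus:
        mt_output = translate(source)                 # machine translation of the source
        triplets.append((source, mt_output, target))  # target acts as the "post-edit"
    return triplets

# Toy usage with a stand-in "MT system" that merely lowercases the source:
corpus = [("Das Haus ist alt.", "The house is old.")]
print(build_synthetic_triplets(corpus, str.lower))
```

    In the real corpus the translate step would be a full MT engine, run once per paradigm, which is why the per-language totals are exactly double the per-paradigm counts.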

    Findings of the 2017 Conference on Machine Translation

    This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time estimation of MT quality), an automatic post-editing task, a neural MT training task, and a bandit learning task.

    Applying crowdsourced translation in auto-translate feature of @people Instagram account

    Over the past few years, the crowdsourcing method has been increasingly applied in various fields due to its convenience and low cost. One application of this method is crowdsourced translation, a new approach in which translations are generated through the simultaneous efforts of a large number of translators. One example is the automatic translation feature on social media platforms such as Facebook, Twitter, and Instagram. Despite its advantages, the translations produced by this method are considered less accurate. In this study, the researcher analyzes the results of automatic translation on Instagram using the crowdsourced translation method. The selected data are 20 posts from the @people account, a major American magazine that reports current events every day. The study uses a descriptive qualitative method in the form of text analysis, drawing on words, phrases, clauses, and sentences taken from the captions of the 20 Instagram posts. The researcher applies the translation-error taxonomy of David Vilar et al. (2006) and the framework of Peter Newmark (1988), which holds that four levels must be considered when translating a text: the textual, referential, cohesive, and natural levels. Each caption was translated into Indonesian, the researcher's mother tongue. The analysis found a total of 119 errors across the 20 posts: 26 missing-word errors, 10 word-order errors, 81 incorrect-word errors, 2 unknown-word errors, and 0 punctuation errors. Based on Newmark's theory, of these 119 errors, 73 arose from difficulties at the textual level, 14 at the referential level, 10 at the cohesive level, and 22 at the level of naturalness. The results of this study can help prevent social media users from consuming fake news and hoaxes, which can cause conflict in cyberspace. (The abstract is also provided in Indonesian and Arabic; both are verbatim translations of the English text.)
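    The error accounting in the study reduces to tallying annotations under Vilar et al.'s (2006) five categories; both breakdowns (by error type and by Newmark level) must sum to the same 119 errors. A small sketch, with hypothetical function and label names, shows the bookkeeping:

```python
from collections import Counter

# Vilar et al. (2006) error categories used in the study.
VILAR_CATEGORIES = ("missing word", "word order", "incorrect word",
                    "unknown word", "punctuation")

def tally_errors(annotations):
    """Count per-category errors from a flat list of error labels."""
    counts = Counter(a for a in annotations if a in VILAR_CATEGORIES)
    return {cat: counts.get(cat, 0) for cat in VILAR_CATEGORIES}

# Totals reported in the study; both taxonomies account for the same 119 errors.
by_type = {"missing word": 26, "word order": 10, "incorrect word": 81,
           "unknown word": 2, "punctuation": 0}
by_level = {"textual": 73, "referential": 14, "cohesive": 10, "naturalness": 22}
assert sum(by_type.values()) == sum(by_level.values()) == 119
```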

    Worker-Job Recommendation for Mixed Crowdsourcing Systems: Algorithms, Models, Metrics and Service-Oriented Architecture

    Crowdsourcing is used as a model to distribute work over the Internet via an open call to anonymous human workers, who opt to take up work offerings, sometimes for small compensation. Increasingly, crowdsourcing systems are integrated into workflows to provide human-computation capabilities. These workflows consist of machine-based workers that work in concert with their human counterparts on different phases of a task. This body of work addresses workflows where machines and human workers both have the capacity to fulfill the requirements of the same tasks. To maximize performance through the delegation of work to the most competent worker, this work outlines a collaborative-filtering-based approach with a bottom-up evaluation based on workers' performance history and their inferred skill sets. Within the model, there are several algorithms, formulae, and evaluative metrics. The work also introduces the notion of an Open Push-Pull model: a paradigm that builds on the services and strengths of the open-call model while seeking to address its weaknesses, such as platform lock-in, which affects access to jobs and the availability of the worker pool. The work outlines the model in terms of a service-oriented architecture (SOA). It provides a supporting conceptual model for the architecture and an operational model that accommodates both human and machine workers. It also defines evaluative metrics for understanding the true capabilities of the worker pool. Techniques presented in this work can be used to expand the potential worker pool competing for tasks through the incorporation of machine-oriented workers via virtualization and other electronic services, and of human workers via existing crowds. Results in this work demonstrate the flexibility of our approach to support both human and machine workers within a competitive model while supporting tasks spanning multiple domains and problem spaces.
    It addresses the inefficiencies of current top-down approaches to worker-job recommendation through a bottom-up approach that adapts to dynamic and rapidly changing data. The work contrasts the shortcomings of top-down approaches' dependence on professed profiles, which can be under-represented, over-represented, or otherwise falsified, with evaluative metrics that can be used for the individual and collective assessment of workers within a labor pool. Ph.D., Computer Science -- Drexel University, 201
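    To make the worker-job matching idea concrete, here is a minimal sketch, not the dissertation's actual algorithm: workers (human or machine) are represented by skill vectors inferred from performance history, and candidates are ranked by cosine similarity to a job's skill profile. All names and vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length skill vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_workers(job_skills, worker_skills):
    """Return worker ids sorted by similarity to the job's skill profile."""
    scores = {wid: cosine(job_skills, vec) for wid, vec in worker_skills.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Human and machine workers compete in the same pool, as in the mixed model:
workers = {
    "human_1":   [0.9, 0.1, 0.4],   # skill dims, e.g. translation, OCR, labeling
    "machine_1": [0.2, 0.95, 0.1],
}
print(rank_workers([1.0, 0.0, 0.5], workers))  # a translation-heavy job
```

    A bottom-up system would update these vectors continuously from observed task outcomes rather than from professed profiles, which is the adaptivity the abstract contrasts with top-down approaches.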

    Understanding and improving subjective measures in human-computer interaction

    In Human-Computer Interaction (HCI), research has shifted from a focus on usability and performance towards the holistic notion of User Experience (UX). Research into UX places special emphasis on concepts from psychology, such as emotion, trust, and motivation. Under this paradigm, elaborate methods are needed to capture the richness and diversity of subjective experiences. Although psychology offers a long-standing tradition of developing self-reported scales, it is currently undergoing radical changes in research and reporting practice. Hence, UX research faces several challenges, such as the widespread use of ad-hoc questionnaires with unknown or unsatisfactory psychometric properties, and a lack of replication and transparency. This thesis therefore addresses several gaps in the research by developing and validating self-reported scales in the domains of user motivation (manuscript 1), perceived user interface language quality (manuscript 2), and user trust (manuscript 3). Furthermore, issues of online research and practical considerations for ensuring data quality are empirically examined (manuscript 4). Overall, this thesis provides well-documented templates for scale development and may help improve scientific rigor in HCI.
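    Scale validation of the kind described rests on standard psychometric statistics. As an illustrative sketch (the thesis's actual analyses are not specified here), Cronbach's alpha estimates a scale's internal consistency from per-item scores; the data below are invented.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k per-item score lists, each of length n (one score per
    respondent). Uses population variances throughout.
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_variances = sum(var(item) for item in items)
    totals = [sum(item[i] for item in items) for i in range(n)]  # per-respondent sums
    return k / (k - 1) * (1 - item_variances / var(totals))

# Three 5-point Likert items answered by four respondents (hypothetical data):
scores = [[4, 5, 3, 4], [4, 4, 3, 5], [5, 5, 2, 4]]
print(round(cronbach_alpha(scores), 2))  # -> 0.82
```

    Values above roughly 0.7 are conventionally read as acceptable internal consistency, which is one of the psychometric properties ad-hoc questionnaires typically leave unverified.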