221 research outputs found
Toxicity in Multilingual Machine Translation at Scale
Machine Translation systems can produce different types of errors, some of
which are characterized as critical or catastrophic due to the specific
negative impact that they can have on users. In this paper we focus on one type
of critical error: added toxicity. We evaluate and analyze added toxicity when
translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences,
covering 13 demographic axes) from English into 164 languages. An automatic
toxicity evaluation shows that added toxicity across languages varies from 0%
to 5%. The output languages with the most added toxicity tend to be
low-resource ones, and the demographic axes with the most added toxicity
include sexual orientation, gender and sex, and ability. We also perform human
evaluation on a subset of 8 translation directions, confirming the prevalence
of true added toxicity. We use a measurement of the amount of source
contribution to the translation, where a low source contribution implies
hallucination, to interpret what causes toxicity. Making use of the input
attributions allows us to explain toxicity, because the source contributions
significantly correlate with toxicity for 84% of languages studied. Given our
findings, our recommendations to reduce added toxicity are to curate training
data to avoid mistranslations, mitigate hallucination and check unstable
translations.
Preventing Violence Against Women: Emerging Practices of Canadian Activism through Social Media
The Canadian women’s movement has seen a recent surge in attention and participation. In a rising cycle of contention, broad collective action campaigns can appear as a single social movement. This research uses a comparative case study to examine three cases of varying scale through the causal mechanisms of signaling, innovation, and campaigns/coalitions to examine how social media contribute to the emerging repertoire of contention. The three cases under investigation are a localized case: Safe Stampede, a national case: Missing and Murdered Indigenous Women, and a transnational case: Women’s March on Washington. Results show that social media are not only integral to collective action but also influence the nature of many emerging practices. Organizers utilize personalized participation and localization, favoring tactics that improve visual imagery for social media posts. In outreach efforts, they rely on the scale of social media to connect with influencers, traditional media, and conscience constituents through affordances such as hashtags and addressivity markers. The affordances of social media encourage tactics designed to generate viral content and to leverage shame as a motivator for change. A sense of duty spurs organizers and participants to greater action beyond what might be termed clicktivism. Whether a campaign targets only the local community or a global one, organizers seek to localize their message for regional supporters. In all cases, ideological differences must be resolved in order to maintain solidarity and prevent damaging divides. As social movements progress, they tend to follow predictable patterns toward institutionalization, especially as a cycle of contention begins to recede.
Translingual Pedagogy in the First-Year Composition Classroom: an Examination of Divergent Student Uptake
117 pages
What impact can translingual pedagogy have in introductory composition classes? This thesis describes how first-year composition (FYC) students at a Primarily White Institution (PWI) participated in complicated, divergent uptake as they learned about terms such as translingualism, translation, and Global Englishes in the Fall of 2022. The research for this project was guided by the following questions.
1. In what ways were students understanding, resisting, or engaging with ideas such as Global Englishes, translingualism, and translation? When did moments of “discursive turbulence” occur in student uptake, and what was the result of that turbulence? (Ware and Zilles)
2. How did students understand translation and all communication as fluid, culturally and rhetorically situated, and beyond alphanumeric text, including semiotic and multimodal resources? How did students understand the concept of translingualism, and themselves as translingual communicators in the world?
3. How were students of different linguistic backgrounds developing linguistic empathy for the speakers of marginalized languages that do not fit into the monolingual “norm” in the U.S. academic system?
4. How can FYC instructors make translingual pedagogy accessible and comprehensible for traditionally considered “monolingual” English-speaking students? What impact does translingual pedagogy have on our “multilingual” students?
Chapter 1 of this thesis reviews literature that discusses the importance of translingual pedagogy in language arts classrooms. Then, this chapter talks about translation research and especially focuses on Laura Gonzales’ concept of “A Revised Rhetoric of Translation” in her book, Sites of Translation. Chapter 2 considers the historical relationship between the linguistics, TESOL, and composition fields. Then, the chapter dives into the research questions that guided this project and the different methodologies used to conduct this project.
More specifically, I describe a narrative theory and activity theory approach that I adopted as I collected student data. The chapter also gives an overview of the class I taught, and the components students had to complete in each unit. Chapter 3 gives an overview of my process as I collected data from student writing. In this chapter, I give my overall impressions of how students learned important class terms, such as translingualism, translation, and Global Englishes. Then, I chose 5 specific students to research whose uptakes were diverging and unique as they processed class terms in their writing. For each of the five students, I create a P-CHAT map and a collage to visually represent the students’ writing. I provide an analysis of each image and a description of the insights I gained about student uptake as I created and processed these visual representations of the data. Chapter 4 offers my interpretation of the data I collected. This chapter explores Ware and Zilles’s concept of “discursive turbulence” and how turbulence frames the stops and starts students experience in their learning. Then, I draw conclusions from the data and offer pedagogical recommendations for instructors seeking to incorporate translingual pedagogy in their FYC classroom. I also discuss the limitations of the project and suggestions for future research regarding divergent uptake and discursive turbulence.
KEYWORDS: Pedagogical Cultural-Historical Activity theory; discursive turbulence; narrative theory; translingualism; pedagogy; translation; Global Englishes; A Revised Rhetoric of Translation
From High School to Post-Secondary Life--Exploring the College Transition Experiences of Bilingual Latinx Youth
The current neoliberal education system often positions bilingual youth as deficient or lacking in skills. The discourse from some academic research paradigms tends to also take up this deficit orientation, focusing on the issues and needs of Latinx bilingual students, or the pedagogical strategies to “close achievement gaps.”
The NYC Department of Education has attempted to address gaps in achievement by offering increased access to college and career readiness programs, positioning access as synonymous to equity. However, access alone does not lead to equity when the systems and norms that prioritize assimilation to the dominant white culture are not being challenged; moreover, increased access will not lead to equity if the voices and experiences of marginalized youth experiencing the transition to college are not amplified.
This project will add to the growing body of scholarly work that aims to subvert deficit discourse around bilingual students by inviting them to author their own stories about their experiences in the transition to college. These narratives bring up various aspects of the transition to college: how first-generation Latinx bilingual youth navigate cultural and linguistic expectations in college, how they navigate the white, western, and patriarchal institutional norms of the college going process, sources of support in their educational journeys, what factors influenced their college choices, and how they have experienced college in the context of a global pandemic.
This research recognizes bilingual students’ experiences and knowledges as truths, positioning them as knowledge creators. The purpose of this study is to document and explore how first-generation Latinx/bilingual students experience the transition from high school to college, and how they navigate and question spaces in high school and college fraught with linguistic and cultural erasure. Employing Chicana Feminist epistemologies and post-positive realist perspectives of identity, this study will use pláticas to better understand the experiences of Latinx students as they transition to college, what educators can do to support their transition, and to think about how educators can work alongside Latinx students to fight erasure.
Scholarly Communication Librarianship and Open Knowledge
The intersection of scholarly communication librarianship and open education offers a unique opportunity to expand knowledge of scholarly communication topics in both education and practice. Open resources can address the gap in teaching timely and critical scholarly communication topics (copyright in teaching and research environments, academic publishing, emerging modes of scholarship, impact measurement) while increasing access to resources and equitable participation in education and scholarly communication.
Scholarly Communication Librarianship and Open Knowledge is an open textbook and practitioner’s guide that collects theory, practice, and case studies from nearly 80 experts in scholarly communication and open education. It is divided into three parts:
*What is Scholarly Communication?
*Scholarly Communication and Open Culture
*Voices from the Field: Perspectives, Intersections, and Case Studies
The book delves into the economic, social, policy, and legal aspects of scholarly communication as well as open access, open data, open education, and open science and infrastructure. Practitioners provide insight into the relationship between university presses and academic libraries, defining collection development as operational scholarly communication, and promotion and tenure and the challenge for open access.
Scholarly Communication Librarianship and Open Knowledge is a thorough guide meant to increase instruction on scholarly communication and open education issues and practices so library workers can continue to meet the changing needs of students and faculty. It is also a political statement about the future to which we aspire and a challenge to the industrial, commercial, capitalistic tendencies encroaching on higher education. Students, readers, educators, and adaptors of this resource can find and embrace these themes throughout the text and embody them in their work.
Towards Reliable and Inclusive Natural Language Generation
Natural language generation (NLG) is an important subfield of natural language processing (NLP) that produces natural language output. Despite notable advancements made by large-scale pre-trained language models in NLG, there remain several unresolved challenges. This thesis aims to enhance NLG from two significant aspects: reliability and inclusiveness. For reliability, on the one hand, we introduce novel training objectives that improve the alignment of language generation models with desired model behaviors. To improve the answerability of model-generated questions, we use a question answering model to provide additional rewards to a question generation model, encouraging the production of more answerable questions. In addition, we propose to train language models with a mixture of forward and reverse cross-entropies, demonstrating that the resulting models yield better generated text without complex decoding strategies. On the other hand, we propose novel evaluation methods to assess the performance of NLG models accurately and comprehensively. By combining human and automatic evaluations, we strike a balance between reliability and reproducibility. We delve into the unexplored issue of unfaithfulness in extractive summaries and conclude that extractive summarization does not guarantee faithfulness. For inclusiveness, we extend the coverage of NLG techniques to low-resource or endangered languages. We develop the first machine translation system for supporting translation between Cherokee, an endangered Native American language, and English, and we propose a roadmap for utilizing NLP to support language revitalization efforts. Additionally, we investigate the underrepresentation of low-resource languages during multilingual tokenization, a crucial data preprocessing step in training multilingual NLG models, and we present best practices for training multilingual tokenizers. 
Overall, this thesis works towards enhancing the trustworthiness of NLG models in practice and facilitating support for a more diverse range of languages worldwide.
Doctor of Philosophy
Hegemony of BBC, CNN and Al Jazeera’s Framing of Protests in China: The Cases of Wukan and Hong Kong
This study focuses on how the global news media report on protests in China. It contributes an original analysis of the global news media coverage of protests in China from both the theoretical and empirical perspectives. The research is based on the purposive sampling of the BBC, CNN and Al Jazeera English, in order to discuss how international news media outlets report on protests in mainland China and Hong Kong, especially given that they are non-Western contexts. Samples from Wukan and Hong Kong are evaluated by using both quantitative and qualitative methods, including qualitative analysis software (NVivo), framing analysis and critical discourse analysis to determine the ways in which they are represented by the selected news outlets. The main findings have revealed hegemony in the news representations of protests in China, which includes biases, domestication, and geopoliticised news angles. The analysis in the Wukan case showed that the reports offered a limited voice to the Chinese side, while carrying frames of bias from Western journalists. The analysis of the selected global news reports unmasked ideological presuppositions about Chinese political reform, including the perception that the Chinese regime was monolithic, and that most Chinese protesters craved Western democracy. On the contrary, the evidence from the Al Jazeera documentary analysed in this study illustrated a Chinese government that is loosely structured, and that the protesters were more concerned about the land issue than they were about political ideology. As for the Hong Kong case, the results indicated that there were traces of domestication and the geo-politics of news regarding HK protests in both CNN and the BBC in relation to several topics, whereas Al Jazeera had a slightly different approach to reporting the protests: The BBC and CNN tended to relate protests with domestic politics and topics, while AJE balanced pro-Britain and pro-America discourse among the protesters. 
The study also discussed Orientalism, which is still highly relevant to Hong Kongers’ identity issues, and how Western media report on China today.
The research findings add to work by other scholars in media and journalism that has questioned the partiality of leading international or global (Western) media, particularly when it comes to reporting on non-Western and less developed countries. The research adds original evidence and insights to debates on the hegemony of international news coverage of protests, in the context of the Global South. It should be noted that leading media from the dominant Global North, excluding Al Jazeera in this case, project the interests of the developed countries while voices from the Global South are less heard.
Unsupervised structure induction and multimodal grounding
Structured representations build upon symbolic abstraction (e.g., words in natural language and visual concepts in natural images), offer a principled way of encoding our perceptions about the physical world, and enable the human-like generalization of machine learning systems. The predominant paradigm for learning structured representations of the observed data has been supervised learning, but it is limited in several respects. First, supervised learning is challenging given the scarcity of labeled data. Second, conventional approaches to structured prediction have been relying on a single modality (e.g., either images or text), ignoring the learning cues that may have been specified in and can be readily obtained from other modalities of data. In this thesis, we investigate unsupervised approaches to structure induction in a multimodal setting.
Unsupervised learning is inherently difficult in general, let alone inducing complex and discrete structures from data without direct supervision. By considering the multimodal setting, we leverage the alignments between different data modalities (e.g., text, audio, and images) to facilitate the learning of structure-induction models, e.g., knowing that the individual words in “a white pigeon” always appear with the same visual object, a language parser is likely to treat them as a whole (i.e., phrase). The multimodal learning setting is practically viable because multimodal alignments are generally abundant. For example, they can be found in online posts such as news and tweets that usually contain images and associated text, and in (YouTube) videos, where audio, scripts, and scenes are synchronized and grounded in each other.
We develop structure-induction models, which are capable of exploiting bimodal image-text alignments, for two modalities: (1) for natural language, we consider unsupervised syntactic parsing with phrase-structure grammars and regularize the parser by using visual image groundings; and (2) for visual images, we induce scene graph representations by mapping arguments and predicates in the text to their visual counterparts (i.e., visual objects and relations among them) in an unsupervised manner. While useful, crossmodal alignments are not always abundantly available on the web, e.g., the alignments between non-speech audio and text. We tackle the challenge by sharing the visual modality between image-text alignment and image-audio alignment; images function as a pivot and connect audio and text. The contributions of this thesis span from model development to data collection. We demonstrated the feasibility of applying multimodal learning techniques to unsupervised structure induction and multimodal alignment collection. Our work opens up new avenues for multimodal and unsupervised structured representation learning.