84 research outputs found

    The Ordinariness of Code-meshing in the Indonesian Linguistic Landscapes

    Code-meshing as a strategic linguistic practice has been considered a rarity in high-stakes writing (e.g. academic writing). Studies in composition scholarship have demonstrated that such a practice requires arduous intellectual endeavor and extra rhetorical effort. That is, code-meshing requires exceptionally high linguistic adeptness, language awareness, and rhetorical sensitivity in order to be performed effectively. As such, the products of code-meshing in scholarly writing are often seen as a marked form of textual realization. This article shows that while strenuous struggle is needed to practice code-meshing in academic writing (i.e. high-stakes translingual practice), such a practice can be performed as a mundane, ordinary, unremarkable, and relaxed activity (i.e. low-stakes translingual practice) in linguistic landscapes, that is, signage displayed in public places. Illustrations of code-meshed texts in the latter case will be provided and then examined to account for their ordinariness. In light of this vibrant low-stakes translingual practice, I develop the notion of grassroots performativity to capture the everydayness of language practices enacted by multilingual language users in their own communities.

    Metalinguistic tactics in the Hong Kong protest movement

    This paper explores the metalinguistic tactics used by Hong Kong protesters in 2014 and 2019 and how they reflected and exploited a range of dominant ideologies about language in the city. These tactics are considered both in terms of their rhetorical utility in the “message war” between protesters and authorities, and their significance in the broader sociolinguistic context of Hong Kong. The analysis reveals how such tactics entailed both opportunities and risks. They allowed protesters to create shareable discursive artifacts that spread quickly over social media and to promote in-group solidarity and distrust of their political opponents. They also limited protesters' ability to broaden the appeal of their messages to certain segments of the population and implicated them in upholding language ideologies that promote exclusion and marginalization.

    Cross-language Plagiarism Detection over Continuous-space- and Knowledge Graph-based Representations of Language

    This is the author’s version of a work that was accepted for publication in Knowledge-Based Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Knowledge-Based Systems 111 (2016) 87–99. DOI 10.1016/j.knosys.2016.08.004.
    Cross-language (CL) plagiarism detection aims at detecting plagiarised fragments of text among documents in different languages. The main research question of this work is whether knowledge graph representations and continuous space representations can complement each other and improve on the state-of-the-art performance of CL plagiarism detection methods. To this end, we propose and evaluate hybrid models to assess the semantic similarity of two segments of text in different languages. The proposed hybrid models combine knowledge graph representations with continuous space representations, aiming to exploit their complementarity in capturing different aspects of cross-lingual similarity. We also present continuous word alignment-based similarity analysis, a new model to estimate similarity between text fragments. We compare these approaches with several state-of-the-art models on the task of CL plagiarism detection and study their performance in detecting plagiarism cases of different lengths and obfuscation types. We conduct experiments over Spanish-English and German-English datasets. Experimental results show that continuous representations allow the continuous word alignment-based similarity analysis model to obtain competitive results and the knowledge-based document similarity model to outperform the state of the art in CL plagiarism detection. © 2016 Elsevier B.V. All rights reserved.
    This research has been carried out within the framework of the FPI-UPV pre-doctoral grant (registration no. 3505) awarded to Parth Gupta and within the framework of the national projects DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01), and SomEMBED: SOcial Media language understanding - EMBEDing contexts (TIN2015-71147-C2-1-P). We would like to thank Martin Potthast, Daniel Ortiz-Martinez, and Luis A. Leiva for their support and comments during this research.
    Franco-Salvador, M.; Gupta, PA.; Rosso, P.; Banchs, R. (2016). Cross-language Plagiarism Detection over Continuous-space- and Knowledge Graph-based Representations of Language. Knowledge-Based Systems. 111:87-99. https://doi.org/10.1016/j.knosys.2016.08.004

    Toward a translingual composition: ancient rhetorics and language difference

    The purpose of this dissertation is to outline a pedagogy that promotes language difference in college composition classrooms. Scholarship on language difference has strived for decades to transform teaching practices in mainstream, developmental, and second-language writing instruction. Despite compelling arguments in support of linguistic diversity, a majority of secondary and postsecondary writing teachers in the U.S. still privilege Standard English. However, non-native speakers of English now outnumber native speakers worldwide, a fact which promises to redefine what "standard" means from a translingual perspective. It is becoming clearer that multilingual writers, versed in flexible hermeneutic strategies and able to draw on a variety of Englishes and languages to make meaning, have significant advantages over monolingual students. My dissertation anticipates the pedagogical and programmatic changes necessitated by this global language shift. To this end, I join a number of scholars in arguing for a revival of classical style and the progymnasmata, albeit with the unique agenda of strengthening pedagogies of language difference. Adapting classical rhetorics to promote translingual practices such as code-meshing may at first seem to contradict the spirit of language difference, given the dominant perception of Greco-Roman culture as imperialistic and intolerant of diversity. Nevertheless, I reread neglected rhetoricians such as Quintilian in order to recover their latent multilingual potential.

    Pedagogy : reconsiderations and reorientations.

    This dissertation is a critical intervention into the question of student agency. An interdisciplinary project that draws upon philosophy and linguistics, it reviews four major tendencies that have animated composition pedagogy over the last several decades (process theory, social constructivism, procedural rhetoric, and translingual pedagogies) and identifies some of the key tensions that both motivate and problematize these approaches. First, it examines the debate between Peter Elbow and David Bartholomae, and the interplay between teachers’ authority and student agency. Second, it explores the imbrications between representation and materiality in social constructivism. Third, it uses Alain Badiou’s Being and Event to analyze the tensions between (nominally) formulaic composition strategies and the elusiveness of kairos. Fourth, it investigates non-standard English dialects, Suresh Canagarajah’s concept of “code meshing,” and the competing conceptualizations of language as a static system and as a dynamic, emergent process of sedimentation. Rather than attempting to resolve these tensions, my dissertation dramatizes them, painting a fuller, clearer picture of the contradictions that every classroom inhabits. In doing so, I do not privilege any single approach over the others. Instead, I call for a particular pedagogical disposition that can productively inform all of them: a resistance to closure, an openness to critical puzzlement, a negative capability that invites the rupture of rigid structures and schemas. With regard to composition studies more broadly, my dissertation dissects the key terms and assumptions of the debates surrounding these pedagogical tendencies, forwarding a more nuanced theoretical platform on which they can transpire. Ultimately, my dissertation aims to inform pedagogical practice and curriculum development more generally, and to lead to an enriched understanding of how student agency can vitalize the classroom.

    Cross-view Embeddings for Information Retrieval

    In this dissertation, we deal with cross-view tasks related to information retrieval using embedding methods. We study existing methodologies and propose new methods to overcome their limitations. We formally introduce the concept of mixed-script IR, which deals with the challenges faced by an IR system when a language is written in different scripts because of various technological and sociological factors. Mixed-script terms are represented by a small and finite feature space composed of character n-grams. We propose the cross-view autoencoder (CAE) to model such terms in an abstract space, and CAE provides state-of-the-art performance. We study a wide variety of models for cross-language information retrieval (CLIR) and propose a model based on compositional neural networks (XCNN), which overcomes the limitations of the existing methods and achieves the best results for many CLIR tasks such as ad-hoc retrieval, parallel sentence retrieval and cross-language plagiarism detection. We empirically test the proposed models for these tasks on publicly available datasets and present the results with analyses. In this dissertation, we also explore an effective method to incorporate contextual similarity for lexical selection in machine translation. Concretely, we investigate a feature based on the context available in the source sentence, calculated using deep autoencoders. The proposed feature exhibits statistically significant improvements over strong baselines for English-to-Spanish and English-to-Hindi translation tasks. Finally, we explore methods to evaluate the quality of autoencoder-generated representations of text data and analyse their architectural properties. For this, we propose two metrics based on the reconstruction capabilities of the autoencoders: structure preservation index (SPI) and similarity accumulation index (SAI).
    We also introduce the concept of critical bottleneck dimensionality (CBD), below which structural information is lost, and present analyses linking CBD and language perplexity.
    Gupta, PA. (2017). Cross-view Embeddings for Information Retrieval [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/78457
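The abstract above says that mixed-script terms are represented in a small, finite feature space of character n-grams. A minimal sketch of what such a feature space might look like follows. It is an illustration only and not the thesis's implementation: the bigram size, the `^`/`$` boundary markers, the cosine comparison, and the example spellings are all assumptions made here, and the cross-view autoencoder that the thesis trains on top of such features is not shown.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n=2):
    """Return a bag of character n-grams for a term.

    Assumption for illustration: the term is lowercased and wrapped in
    '^' and '$' so that word-initial and word-final n-grams are distinct.
    """
    padded = f"^{term.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical example: two Roman-script spellings of one word,
# plus an unrelated term for contrast.
variant_a = char_ngrams("dhanyavad")
variant_b = char_ngrams("dhanyawaad")
unrelated = char_ngrams("retrieval")

print(cosine(variant_a, variant_b))  # spelling variants share most bigrams
print(cosine(variant_a, unrelated))  # unrelated terms share few
```

Because the n-gram vocabulary is bounded by the alphabet size, the feature space stays small and finite however many spelling variants appear, which is the property the abstract highlights. Comparing terms written in different scripts would additionally need a learned shared space, which this sketch does not attempt.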

    Reporter fired for plagiarism : a forensic linguistic analysis of news plagiarism

    Plagiarism has traditionally been classified as an immoral act that violates ethical norms rather than as an illegal action (Garner 2009; Goldstein 2003), and journalistic plagiarism is no exception. As Coulthard & Johnson (2007) note, the reuse of text by journalists, without attribution or with inadequate attribution of authorship, is not normally considered plagiarism. In addition, the conventions governing the reuse of news agency material are not universal. However, the serious consequences of journalistic malpractice (such as the case of Jayson Blair of The New York Times) show that the implications are not limited to the sphere of ethics but, on the contrary, have legal impact, including dismissal proceedings. One of the problems, however, lies in proving that a given instance of textual reuse is plagiarism. This study presents the results of a forensic linguistic analysis that can be used to prove cases of suspected plagiarism or to open an investigation into unsuspected texts. With the aim of identifying which mechanisms journalists use, and how, to compose their own texts from news agency reports, this work compares news published in the World section of Portuguese newspapers of record with possible sources published in English. The results of the analysis show that: (a) attribution of authorship is frequently inadequate, even when newspapers of record cite their sources (usually well-known international agencies); (b) there is not always a direct correspondence between the plagiarising version and a single plagiarised source (indicating reuse of text from different international media and websites); and (c) news items are plagiarised from texts published in other languages, constituting translingual plagiarism. It is concluded that forensic linguistic analysis has evidential and investigative potential in cases of plagiarism and copyright infringement, both monolingual and translingual.

    Detecting plagiarism in the forensic linguistics turn

    This study investigates plagiarism detection, with an application in forensic contexts. Two types of data were collected for the purposes of this study. Data in the form of written texts were obtained from two Portuguese universities and from a Portuguese newspaper. These data are analysed linguistically to identify instances of verbatim, morpho-syntactical, lexical and discursive overlap. Survey data were obtained from two higher education institutions in Portugal and another two in the United Kingdom. These data are analysed using a 2 by 2 between-groups Univariate Analysis of Variance (ANOVA) to reveal cross-cultural divergences in the perceptions of plagiarism. The study discusses the legal and social circumstances that may contribute to adopting a punitive approach to plagiarism or, conversely, to rejecting punishment. The research adopts a critical approach to plagiarism detection. On the one hand, it describes the linguistic strategies adopted by plagiarists when borrowing from other sources, and, on the other hand, it discusses the relationship between these instances of plagiarism and the context in which they appear. A focus of this study is whether plagiarism involves an intention to deceive, and, if so, whether forensic linguistic evidence can provide clues to this intentionality. It also evaluates current computational approaches to plagiarism detection and identifies strategies that these systems fail to detect. Specifically, a method is proposed to detect translingual plagiarism. The findings indicate that, although cross-cultural aspects influence the different perceptions of plagiarism, a distinction needs to be made between intentional and unintentional plagiarism. The linguistic analysis demonstrates that linguistic elements can contribute to finding clues to the plagiarist’s intentionality. Furthermore, the findings show that translingual plagiarism can be detected using the proposed method, and that plagiarism detection software can be improved using existing computer tools.