77 research outputs found

    Papers in Southeast Asian Linguistics No. 9: Language policy, language planning and sociolinguistics in South-East Asia

    Towards Reliable and Inclusive Natural Language Generation

    Natural language generation (NLG) is an important subfield of natural language processing (NLP) that produces natural language output. Despite notable advancements made by large-scale pre-trained language models in NLG, there remain several unresolved challenges. This thesis aims to enhance NLG from two significant aspects: reliability and inclusiveness. For reliability, on the one hand, we introduce novel training objectives that improve the alignment of language generation models with desired model behaviors. To improve the answerability of model-generated questions, we use a question answering model to provide additional rewards to a question generation model, encouraging the production of more answerable questions. In addition, we propose to train language models with a mixture of forward and reverse cross-entropies, demonstrating that the resulting models yield better generated text without complex decoding strategies. On the other hand, we propose novel evaluation methods to assess the performance of NLG models accurately and comprehensively. By combining human and automatic evaluations, we strike a balance between reliability and reproducibility. We delve into the unexplored issue of unfaithfulness in extractive summaries and conclude that extractive summarization does not guarantee faithfulness. For inclusiveness, we extend the coverage of NLG techniques to low-resource or endangered languages. We develop the first machine translation system for supporting translation between Cherokee, an endangered Native American language, and English, and we propose a roadmap for utilizing NLP to support language revitalization efforts. Additionally, we investigate the underrepresentation of low-resource languages during multilingual tokenization, a crucial data preprocessing step in training multilingual NLG models, and we present best practices for training multilingual tokenizers. 
    Overall, this thesis works towards enhancing the trustworthiness of NLG models in practice and facilitating support for a more diverse range of languages worldwide. Doctor of Philosophy

    Cappadocian kinship

    Cappadocian kinship systems are of particular interest from a sociolinguistic and anthropological perspective because they mix inherited Greek and borrowed Turkish kinship terms. Because the number of Turkish kinship terms differs from one variety to another, it is necessary to speak of Cappadocian kinship systems in the plural rather than of the Cappadocian kinship system in the singular. Although reference will be made to other Cappadocian varieties, this paper will focus on the kinship systems of Mišotika and Aksenitika, the two Central Cappadocian dialects still spoken today in several communities in Greece. Particular attention will be given to the use of borrowed Turkish kinship terms, which sometimes seem to coexist with their inherited Greek counterparts, e.g. mána vs. néne ‘mother’, ailfó/aelfó vs. γardáš ‘brother’, etc. In the final part of the paper, some kinship terms with obscure or hitherto unknown etymology will be discussed, e.g. káka ‘grandmother’, ižá ‘aunt’, lúva ‘uncle (father’s brother)’, etc.

    Proceeding 2nd International Seminar on Linguistics

    DiverCity - Global Cities as a Literary Phenomenon: Toronto, New York, and Los Angeles in a Globalizing Age

    Based on the structured analysis of selected North American novels, this work examines global cities as a literary phenomenon ("DiverCity"). By analyzing Dionne Brand's Toronto, "What We All Long For" (2005), Chang-rae Lee's New York, "Native Speaker" (1995), and Karen Tei Yamashita's Los Angeles, "Tropic of Orange" (1997), the author provides the connecting link for exploring the triad of globalization and its effects, global cities as cultural nodal points, and cultural diversity in a globalizing age as a literary phenomenon. Thus, she contributes to a global, interdisciplinary, and multi-perspectival understanding of literature, culture, and society.

    College of Arts and Sciences

    Cornell University Courses of Study Vol. 89 1997/9

    Why you do not adore you in Hungarian

    This paper provides an overview of the pronominal coding of local coreference relations in Hungarian. In Hungarian, unlike in English, personal pronouns do not normally take local antecedents even if favourable pragmatic conditions are available. The paper argues that complex forms of the reflexive anaphor are used for the coding of local coreference, and they outcompete, as it were, personal pronouns in this function.