
    A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

    Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor in its quality and efficiency. Despite the vast amount of research published to date, the community's interest in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be driven mostly by empirical trials. To orient the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature on advanced reordering modeling. We then ask why some approaches are more successful than others in particular language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful for anticipating the reordering characteristics of a language pair and for selecting the SMT framework that best suits them. (Comment: 44 pages, to appear in Computational Linguistics.)
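    The distinction the survey draws between the amount and the kinds of reordering can be made concrete with a standard quantity: the Kendall's tau distance between source and target word order under a word alignment. The sketch below is illustrative only, a permutation-based simplification assuming one-to-one alignments, not the survey's own measure:

```python
from itertools import combinations

def kendall_reordering(alignment):
    """Quantify reordering as the fraction of discordant pairs
    (Kendall's tau distance) among aligned source positions read
    off in target order. 0.0 = monotone, 1.0 = fully inverted."""
    # alignment: source position of each target word, e.g. [0, 2, 1, 3]
    pairs = list(combinations(range(len(alignment)), 2))
    if not pairs:
        return 0.0
    discordant = sum(1 for i, j in pairs if alignment[i] > alignment[j])
    return discordant / len(pairs)

# Monotone translation: no reordering
print(kendall_reordering([0, 1, 2, 3]))  # 0.0
# Local swap (e.g. adjective-noun inversion)
print(kendall_reordering([0, 2, 1, 3]))  # ~0.17
# Full inversion (e.g. clause-final verb moved to the front)
print(kendall_reordering([3, 2, 1, 0]))  # 1.0
```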

    Loan Phonology

    For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by a great interest among phonologists in the question of how the nativization of loanwords occurs. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system, as well as for studying L1 phonological processes in action, and thus offers access to the true synchronic phonology of the L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language's sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena.

    Layer-wise Probing of BERT's Linguistic Knowledge Using the Affinity Prober

    Thesis (Master's) -- Seoul National University Graduate School: Department of Linguistics, College of Humanities, August 2023. Advisor: Hyopil Shin.
    Since the introduction of the Transformer (Vaswani et al., 2017), a variety of pre-trained language models built on the self-attention mechanism have been proposed, and their fine-tuned versions have achieved strong performance across a wide range of natural language processing tasks. In linguistics, the intrinsic linguistic knowledge of such models has been actively investigated using theoretical and experimental approaches drawn from syntax, semantics, and language acquisition. This thesis aims to extend the range of application of the Affinity Prober, the linguistic-knowledge probing methodology proposed by Jang et al. (2022). To that end, it proposes the ADTRAS algorithm (An Algorithm for Decrypting Token Relationships within Attention Scores), which interprets relationships between tokens while preserving the attention scores of the self-attention mechanism. In the first experiment, the ADTRAS algorithm is used to analyze the layer-wise patterns of BERT models fine-tuned on six GLUE benchmark tasks that demand syntactic-semantic competence. The results show that BERT captures meaningful changes in token relationships, and the changes in its attention provide empirical evidence that BERT learns part-of-speech information on its own by exploiting lexical categories; the clearest linguistic characteristics of BERT's layers are then generalized in terms of those categories. In the second experiment, the Affinity Prober is used to analyze how BERT processes minimal-pair sentences instantiating syntactic phenomena, tracing layer by layer how fifteen such phenomena are handled inside the model. Four patterns emerge, and the thesis argues that each pattern groups together similar phenomena: the first consists mainly of phenomena related to passives and N-bar ellipsis; the second, island effects; the third, syntactic constraints on movement; and the fourth, verbal predicate types and argument structure. That these layer-wise patterns agree with the ADTRAS results corroborates the findings. In sum, the thesis proposes the ADTRAS algorithm, extends the Affinity Prober of Jang et al. (2022), and successfully extracts and explains the layer-wise BERT patterns of syntactic phenomena.
    This paper presents a comprehensive investigation into the linguistic knowledge embedded within BERT, a pre-trained language model based on the Transformer architecture. We reinforce and expand upon the methodology proposed by Jang et al. (2022) by introducing the ADTRAS algorithm (An Algorithm for Decrypting Token Relationships within Attention Scores), which decrypts token relationships within BERT's attention scores to analyze patterns at each layer. Our experiments using the ADTRAS algorithm demonstrate that BERT autonomously learns part-of-speech information by leveraging lexical categories. We also provide insights into the general tendencies of BERT's layers when processing content words and function words. Additionally, we introduce the Classification of Sentence Sequencing (CSS) as a fine-tuning strategy, enabling indirect learning from minimal pairs, and leverage the Affinity Prober to examine syntactic phenomena using the BLiMP dataset. By tracing patterns and clustering similar phenomena, we enhance our understanding of BERT's interpretation of linguistic structures. Furthermore, we establish in detail the attributes of BERT layers related to lexical categories by connecting the general layer tendencies derived with the ADTRAS algorithm to the results obtained through the Affinity Prober. Our study makes several contributions. First, we introduce the ADTRAS algorithm, which enables a comprehensive analysis of BERT's linguistic knowledge. Second, we provide experimental evidence demonstrating BERT's ability to learn part-of-speech information. Third, we offer insights into the tendencies observed in different layers of BERT. Fourth, we propose the CSS fine-tuning strategy, which allows for indirect learning from minimal pairs. Fifth, we successfully cluster syntactic phenomena using the Affinity Prober. Finally, we uncover BERT's general attention tendency toward lexical categories.
    Table of Contents:
    Chapter 1. Introduction
    Chapter 2. Related Work
      2.1. Unveiling Linguistic Insights: The Probing Classifier Framework
      2.2. The Interplay of Syntactic Trees and Neural Networks
      2.3. BERT and Linguistics
    Chapter 3. Generalization of Layer-Wise Attention Using the ADTRAS Algorithm
      3.1. Binary Categorization of Part-of-Speech in Sentences: Content Words and Function Words
      3.2. ADTRAS Algorithm
      3.3. The General Language Understanding Evaluation (GLUE) Benchmark
        3.3.1. The Corpus of Linguistic Acceptability (CoLA)
        3.3.2. The Microsoft Research Paraphrase Corpus (MRPC)
        3.3.3. The Stanford Sentiment Treebank 2.0 (SST-2)
        3.3.4. The Quora Question Pairs (QQP)
        3.3.5. The Multi-Genre Natural Language Inference (MNLI)
        3.3.6. The Words in Context (WiC)
        3.3.7. Summary
      3.4. Evaluating Attention Variations in Lexical Categories on NLU Tasks
        3.4.1. Intrinsic Learning of Lexical Categories in BERT for Downstream Tasks
        3.4.2. Generalization of Layer-Wise Attention in Fine-Tuned BERT Models
    Chapter 4. Probing Intrinsic Linguistic Knowledge of a Deep Learning-Based Language Model Using the Affinity Prober
      4.1. Jang et al. (2022)
      4.2. Affinity Prober
        4.2.1. Multi-Head Attention in the Transformer Architecture
        4.2.2. Affinity Relationship
        4.2.3. Probabilistic Distribution of Categorized Affinity Relationships
        4.2.4. The Algorithm of the Affinity Prober on BERT
    Chapter 5. The Benchmark of Linguistic Minimal Pairs (BLiMP)
      5.1. Adjunct Island
      5.2. Animate Subject
        5.2.1. Animate Subject Passive
        5.2.2. Animate Subject Trans
      5.3. Causative
      5.4. Complex NP Island
      5.5. Coordinate Structure Constraint
        5.5.1. Left Branch
        5.5.2. Object Extraction
      5.6. Drop Argument
      5.7. Ellipsis N-bar
      5.8. Inchoative
      5.9. Intransitive
      5.10. Transitive
      5.11. Left Branch Island
        5.11.1. Echo Question
        5.11.2. Simple Question
      5.12. Passive
      5.13. Sentential Subject Island
      5.14. Wh Island
      5.15. Wh Questions
        5.15.1. Object Gap
        5.15.2. Subject Gap
    Chapter 6. Experiment
      6.1. Finetuning Strategy
      6.2. Result: Clustering of Similar Linguistic Phenomena
        6.2.1. Group I: Passive and Ellipsis N-bar
        6.2.2. Group II: Island Effects
        6.2.3. Group III: Syntactic Constraints on Movement
        6.2.4. Group IV: Verbal Predicate Types and Argument Structures
      6.3. Summary
    Chapter 7. Conclusion
    References
    Appendix
    Abstract (In Korean)
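    The abstract describes ADTRAS and the Affinity Prober only at the level of per-layer attention scores, so the following is a minimal sketch of the kind of layer-wise measurement involved: comparing the attention mass received by content words versus function words across BERT's layers, via the Hugging Face transformers API. The hard-coded function-word list and the aggregation are illustrative assumptions, not the thesis's algorithm.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Crude function-word list for illustration only; the thesis works with
# proper lexical categories (POS), which this sketch merely approximates.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and",
                  "or", "is", "are", "was", "were", "that", "it"}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased",
                                  output_attentions=True)
model.eval()

sentence = "The cat that the dog chased climbed the old tree."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

with torch.no_grad():
    outputs = model(**inputs)

# Masks over token positions; special tokens are excluded from the
# "content" class (punctuation is lumped with content for brevity).
special = {tokenizer.cls_token, tokenizer.sep_token}
func_mask = torch.tensor([t in FUNCTION_WORDS for t in tokens])
cont_mask = torch.tensor([t not in FUNCTION_WORDS and t not in special
                          for t in tokens])

# outputs.attentions: one tensor per layer, shape (batch, heads, seq, seq)
for layer, att in enumerate(outputs.attentions, start=1):
    # attention each token receives: average over heads, sum over queries
    received = att[0].mean(dim=0).sum(dim=0)
    fw = received[func_mask].mean().item()
    cw = received[cont_mask].mean().item()
    print(f"layer {layer:2d}: function={fw:.3f} content={cw:.3f}")
```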

    Biased learning of phonological alternations

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Linguistics and Philosophy, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references (pages 198-203). What is the initial state of the grammar when children begin to figure out patterns of phonological alternations? This thesis documents the developmental stages of children acquiring Korean verb and noun paradigms, and provides a unified account using an initial output-output correspondence (OO-CORR) bias. It has been claimed that OO-CORR constraints are undominated in the initial state, but the empirical evidence for this claim is primarily anecdotal; I provide experimental support for it. In the acquisition literature, children's innovative forms are normally attributed to imperfect or incomplete mastery of the adult grammar. The evidence presented in this thesis suggests that children in some intermediate stages are able to produce adult forms when the syntactic context demands it, but otherwise avoid doing so if the forms involve phonological alternations. The specific way that children satisfy OO-CORR constraints is to inflect forms so that the output is faithful to one specific slot of the paradigm, the base form. I show that children select the base form by considering both phonological informativeness and the availability of the surface forms, and try to achieve paradigm uniformity on the basis of the base form in their production. Assuming an initial bias toward output-output correspondence constraints, I model the attested learning trajectories. The observed intermediate stages are correctly predicted, showing that the learning of certain alternations is facilitated when they share structural changes with other alternations. Artificial grammar learning experiments confirm that featural overlap between alternations facilitates the learning of the patterns. I propose a learning model that incorporates the attested learning biases. By Young Ah Do. Ph.D.
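    The abstract does not spell out the learning model, but the general idea of an initially undominated OO-CORR constraint can be illustrated with error-driven constraint re-ranking in the spirit of Boersma's Gradual Learning Algorithm: OO-CORR starts high and is gradually demoted as the learner encounters alternating adult forms. The constraint set, ranking values, and update rule below are illustrative assumptions, not the model proposed in the thesis.

```python
# Toy error-driven constraint re-ranking with an initial OO-CORR bias.
# OO-CORR starts undominated, so the learner's early outputs remain
# faithful to the paradigm's base form. Whenever the learner's output
# mismatches the adult form, constraints preferring the adult winner
# are promoted and those preferring the learner's output are demoted.

ranking = {"OO-CORR": 100.0, "MARKEDNESS": 50.0, "IO-FAITH": 50.0}
RATE = 1.0  # plasticity of each re-ranking step

def update(winner_prefs, loser_prefs):
    # winner_prefs / loser_prefs: constraints violated less by that
    # candidate, i.e. the constraints that "prefer" it
    for c in winner_prefs:
        ranking[c] += RATE
    for c in loser_prefs:
        ranking[c] -= RATE

# The adult alternating form repairs a marked structure (MARKEDNESS
# prefers it); the child's paradigm-uniform form satisfies OO-CORR.
# Repeated mismatches demote OO-CORR below MARKEDNESS, at which point
# the alternation starts being produced and the errors stop.
for step in range(60):
    if ranking["OO-CORR"] >= ranking["MARKEDNESS"]:  # still erring
        update(winner_prefs=["MARKEDNESS"], loser_prefs=["OO-CORR"])

# Final state: MARKEDNESS (76.0) now outranks OO-CORR (74.0)
print(sorted(ranking.items(), key=lambda kv: -kv[1]))
```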

    The Development Of Glide Deletion In Seoul Korean: A Corpus And Articulatory Study

    This dissertation investigates the pathways and causes of the development of glide deletion in Seoul Korean. Seoul provides fertile ground for studies of linguistic innovation in an urban setting, since it has seen rapid historical, social, and demographic changes in the twentieth century. The phenomenon under investigation is the variable deletion of the labiovelar glide /w/, found to be on the rise in Seoul Korean (Silva, 1991; Kang, 1997). I present two studies addressing variation and change at two different levels: a corpus study tracking the development of /w/-deletion at the phonological level, and an articulatory study examining the phonetic aspect of this change. The corpus data are drawn from sociolinguistic interviews with 48 native Seoul Koreans conducted between 2015 and 2017. A trend comparison with the data from an earlier study of /w/-deletion (Kang, 1997) reveals that /w/-deletion in postconsonantal position has begun to retreat, while non-postconsonantal /w/-deletion has been rising vigorously. More importantly, the effect of the preceding segment, which used to be the strongest constraint on /w/-deletion, has weakened over time. I conclude that /w/-deletion in Seoul Korean is being reanalyzed, with the structural details being diluted over time. I analyze this weakening of the original pattern as the result of linguistic diffusion induced by a great influx of migrants into Seoul after the Korean War (1950-1953). In the articulatory study, ultrasound data of tongue movements and video data of lip rounding during the production of /w/ from three native Seoul Koreans in their 20s, 30s, and 50s were analyzed using Optical Flow Analysis. I find that /w/ in Seoul Korean is subject to both gradient reduction and categorical deletion, and that younger speakers exhibit significantly larger articulatory gestures for /w/ after a bilabial than the older generation, which is consistent with the pattern of phonological change found in the corpus study. This dissertation demonstrates the importance of using both corpus and articulatory data in the investigation of a change, finding the coexistence of gradient and categorical effects in segmental deletion processes. Finally, it advances our understanding of the outcome of migration-induced dialect contact in contemporary urban settings.
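    As an illustration of the corpus-side computation (not the dissertation's actual analysis), deletion rates can be tabulated by preceding-segment class within speaker cohorts; a weakening preceding-segment effect would surface as rates converging across segment classes in the younger cohorts. The column names and toy data below are hypothetical:

```python
import pandas as pd

# Hypothetical coding: one row per /w/ token from the interviews;
# deleted = 1 if /w/ was deleted, 0 if realized.
df = pd.DataFrame({
    "deleted":   [1, 0, 1, 1, 0, 0, 1, 0],
    "preceding": ["coronal", "labial", "coronal", "velar",
                  "vowel", "labial", "coronal", "vowel"],
    "cohort":    ["1950s", "1950s", "1980s", "1980s",
                  "1980s", "1990s", "1990s", "1990s"],
})

# Deletion rate by preceding segment within each birth cohort;
# converging rates across segment classes in younger cohorts would
# indicate a weakening preceding-segment constraint.
rates = (df.groupby(["cohort", "preceding"])["deleted"]
           .agg(rate="mean", n="count"))
print(rates)
```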

    Getting Past the Language Gap: Innovations in Machine Translation

    In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful, complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods were introduced by Chinese researchers, allowing syntactic information to be introduced and used in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, gave MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board; this remains a major challenge, though many advanced research groups are currently pursuing ways to meet it head-on. The next generation of MT will consist of a collection of hybrid systems. The outlook is also good for the mobile environment, as increasingly capable speech recognition and speech synthesis technologies enable speech-to-speech machine translation on hand-held devices. We review all of these developments and, in the final section, point out some of the most promising research avenues for the future of MT.
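    The hierarchical methods referred to here translate with synchronous context-free grammar rules whose linked nonterminals may appear in a different order on the source and target sides, which is how syntax-driven reordering enters the model (Chiang's hierarchical phrase-based approach is the best-known instance). A toy sketch, with the single rule and the vocabulary as illustrative assumptions:

```python
# Toy synchronous-CFG rule in the style of hierarchical phrase-based
# MT: each rule pairs a source pattern with a target pattern, and the
# shared nonterminals (X1, X2) may be reordered on the target side.
RULES = {
    # Chinese-style "X1 de X2" -> "X2 of X1" possessive reordering
    ("X1", "de", "X2"): ("X2", "of", "X1"),
}

def apply_rule(src_pattern, fillers):
    """Translate by substituting already-translated subphrases
    (fillers) into the target side of a matching rule."""
    tgt_pattern = RULES[src_pattern]
    return " ".join(fillers.get(sym, sym) for sym in tgt_pattern)

# X1 translated as "Australia", X2 as "the capital"
print(apply_rule(("X1", "de", "X2"),
                 {"X1": "Australia", "X2": "the capital"}))
# -> "the capital of Australia"
```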

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students, and postdocs in the Computer Science and Linguistics Departments and the Institute for Research in Cognitive Science. It includes prototypical natural language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing, and the syntax-semantics interface, but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it's easier than ever to do so: this document is accessible on the "information superhighway". Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors' abstracts in the web version of this report. The abstracts describe the researchers' many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.