6,211 research outputs found

    Linguistic-based evaluation criteria to identify statistical machine translation errors

    Machine translation evaluation methods are necessary for analyzing the performance of translation systems. To date, the most traditional methods have been automatic measures such as BLEU and quality judgements made by native-speaker human evaluators. To complement these traditional procedures, this paper presents a new human evaluation based on expert knowledge of the errors encountered at several linguistic levels: orthographic, morphological, lexical, semantic, and syntactic. The results of these experiments show that some linguistic errors could have more influence than others on a perceptual evaluation. Postprint (published version).

    Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques

    Two of the most popular Machine Translation (MT) paradigms are rule-based (RBMT) and corpus-based, the latter of which includes statistical systems (SMT). When only scarce parallel corpora are available, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair. This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system that takes advantage of available resources such as parallel corpora, which are used to extract dictionaries and lexical and structural transfer rules. The final system is open source and freely available online. Although performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT's coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. This RBMT system is available to the general public, it can be further enhanced, and it opens up the possibility of creating future hybrid MT systems. Peer reviewed. Postprint (author's final draft).

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp. 75-170. 118 pages, 8 figures, 1 table.

    SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation

    We present SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness, so that pairs of entities that are associated but not actually similar [Freud, psychology] have a low rating. We show that, via this focus on similarity, SimLex-999 incentivizes the development of models with a different, and arguably wider, range of applications than those which reflect conceptual association. Second, SimLex-999 contains a range of concrete and abstract adjective, noun and verb pairs, together with an independent rating of concreteness and (free) association strength for each pair. This diversity enables fine-grained analyses of the performance of models on concepts of different types, and consequently greater insight into how architectures can be improved. Further, unlike existing gold standard evaluations, for which automatic approaches have reached or surpassed the inter-annotator agreement ceiling, state-of-the-art models perform well below this ceiling on SimLex-999. There is therefore plenty of scope for SimLex-999 to quantify future improvements to distributional semantic models, guiding the development of the next generation of representation-learning architectures.
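    As an illustration of how a model might be scored against a similarity gold standard of this kind, the sketch below ranks model scores against human ratings with Spearman's rank correlation. The choice of Spearman's rho is an assumption here (the abstract does not name the metric), and the word pairs, ratings, and model scores are invented for the example; they are not SimLex-999 values.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data, in the order:
# (coast, shore), (clothes, closet), (Freud, psychology), (happy, cheerful)
human_similarity = [9.0, 2.0, 2.5, 9.5]        # invented ratings
association_model = [0.80, 0.70, 0.75, 0.60]   # rewards mere relatedness
similarity_model = [0.90, 0.20, 0.30, 0.85]    # rewards genuine similarity

rho_association = spearman(human_similarity, association_model)
rho_similarity = spearman(human_similarity, similarity_model)
```

    In this toy setting the model that scores associated-but-dissimilar pairs highly correlates worse with the human ratings, which is the behaviour a similarity-focused benchmark is meant to expose.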

    Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop

    The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from intermediate representations acquired by neural networks, proposing modifications to neural network architectures to make their knowledge state or generated output more explainable, and examining the performance of networks on simplified or formal languages. Here we review a number of representative studies in each category.

    Towards human linguistic machine translation evaluation

    When evaluating machine translation outputs, linguistics is usually taken into account implicitly. Annotators have to decide whether one sentence is better than another, using, for example, adequacy and fluency criteria or, as recently proposed, editing the translation output so that it is understandable and has the same meaning as a reference translation. Therefore, the important linguistic fields of meaning (semantics) and grammar (syntax) are considered only indirectly. In this study, we propose to go one step further towards a linguistic human evaluation. The idea is to introduce linguistics implicitly by formulating precise guidelines. These guidelines strictly mark the difference between sub-fields of linguistics such as morphology, syntax, semantics, and orthography. We show that our guidelines have high inter-annotator agreement and wide error coverage. Additionally, we examine how the linguistic human evaluation data correlate across different types of machine translation systems (rule-based and statistical) and with adequacy and fluency. Peer reviewed. Postprint (published version).
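    One common way to quantify agreement between two annotators who assign error categories is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is only an illustration: the abstract does not say which agreement measure the authors used, and the annotations are invented.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical error-type annotations for six translation errors.
annotator_1 = ["morphology", "syntax", "semantics",
               "orthography", "syntax", "morphology"]
annotator_2 = ["morphology", "syntax", "semantics",
               "orthography", "semantics", "morphology"]

kappa = cohen_kappa(annotator_1, annotator_2)
```

    Here the annotators agree on five of six items (observed agreement 5/6) against a chance agreement of 0.25, giving a kappa of 7/9.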

    Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT

    Doctoral thesis, Seoul National University Graduate School, College of Humanities, Interdisciplinary Program in Cognitive Science, February 2020. Advisor: Minhwa Chung.

    Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, existing computer-assisted pronunciation training (CAPT) systems for Korean do not utilize the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which may make it difficult to incorporate such knowledge into an automatic system. Moreover, most existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods are limited because they necessarily depend on finding the right features for the task and on the accuracy of the extraction. This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates of accentedness are analyzed and combined with a deep neural network approach, so that feature engineering effort is minimized while the linguistically important factors for the corrective feedback generation task are retained. Investigations of non-native Korean speech characteristics in contrast with those of native speakers, and of their correlation with accentedness judgements, show that both segmental and prosodic variations are important factors in a Korean CAPT system. The thesis argues that the feedback generation task can be interpreted as a style transfer problem, and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 mother tongue backgrounds. The features are learnt automatically and without supervision in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. In order to inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech in the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method with a relative improvement of 16.67%.

    Table of contents:
    Chapter 1. Introduction
        1.1. Motivation
            1.1.1. An Overview of CAPT Systems
            1.1.2. Survey of Existing Korean CAPT Systems
        1.2. Problem Statement
        1.3. Thesis Structure
    Chapter 2. Pronunciation Analysis of Korean Produced by Chinese
        2.1. Comparison between Korean and Chinese
            2.1.1. Phonetic and Syllable Structure Comparisons
            2.1.2. Phonological Comparisons
        2.2. Related Works
        2.3. Proposed Analysis Method
            2.3.1. Corpus
            2.3.2. Transcribers and Agreement Rates
        2.4. Salient Pronunciation Variations
            2.4.1. Segmental Variation Patterns
                2.4.1.1. Discussions
            2.4.2. Phonological Variation Patterns
                2.4.1.2. Discussions
        2.5. Summary
    Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation
        3.1. Related Works
            3.1.1. Criteria Used in L2 Speech
            3.1.2. Criteria Used in L2 Korean Speech
        3.2. Proposed Human Evaluation Method
            3.2.1. Reading Prompt Design
            3.2.2. Evaluation Criteria Design
            3.2.3. Raters and Agreement Rates
        3.3. Linguistic Factors Affecting L2 Korean Accentedness
            3.3.1. Pearson's Correlation Analysis
            3.3.2. Discussions
            3.3.3. Implications for Automatic Feedback Generation
        3.4. Summary
    Chapter 4. Corrective Feedback Generation for CAPT
        4.1. Related Works
            4.1.1. Prosody Transplantation
            4.1.2. Recent Speech Conversion Methods
            4.1.3. Evaluation of Corrective Feedback
        4.2. Proposed Method: Corrective Feedback as a Style Transfer
            4.2.1. Speech Analysis at Spectral Domain
            4.2.2. Self-imitative Learning
            4.2.3. An Analogy: CAPT System and GAN Architecture
        4.3. Generative Adversarial Networks
            4.3.1. Conditional GAN
            4.3.2. CycleGAN
        4.4. Experiment
            4.4.1. Corpus
            4.4.2. Baseline Implementation
            4.4.3. Adversarial Training Implementation
            4.4.4. Spectrogram-to-Spectrogram Training
        4.5. Results and Evaluation
            4.5.1. Spectrogram Generation Results
            4.5.2. Perceptual Evaluation
            4.5.3. Discussions
        4.6. Summary
    Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation
        5.1. Linguistic Class Selection
        5.2. Auxiliary Classifier CycleGAN Design
        5.3. Experiment and Results
            5.3.1. Corpus
            5.3.2. Feature Annotations
            5.3.3. Experiment Setup
            5.3.4. Results
        5.4. Summary
    Chapter 6. Conclusion
        6.1. Thesis Results
        6.2. Thesis Contributions
        6.3. Recommendations for Future Work
    Bibliography
    Appendix
    Abstract in Korean
    Acknowledgments
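
    The CycleGAN setting mentioned in the thesis abstract relies on a cycle-consistency term: mapping accented speech to the native domain and back should recover the input, and likewise in the other direction. The sketch below shows only that term, as a mean absolute (L1) reconstruction error. It is a toy under stated assumptions and not the thesis's implementation: the "generators" are hand-written functions rather than trained networks, plain lists of floats stand in for spectrogram frames, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Mapping = Callable[[Sequence[float]], Vector]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    to_native: Mapping,
    to_accented: Mapping,
    accented_batch: Sequence[Sequence[float]],
    native_batch: Sequence[Sequence[float]],
) -> float:
    """Sum of the forward and backward L1 reconstruction errors."""
    # accented -> native -> accented should reproduce the accented input.
    forward = sum(
        l1_distance(to_accented(to_native(x)), x) for x in accented_batch
    ) / len(accented_batch)
    # native -> accented -> native should reproduce the native input.
    backward = sum(
        l1_distance(to_native(to_accented(y)), y) for y in native_batch
    ) / len(native_batch)
    return forward + backward


# Stand-in "generators": simple shifts instead of neural networks.
def shift_up(v: Sequence[float]) -> Vector:
    return [x + 1.0 for x in v]


def exact_inverse(v: Sequence[float]) -> Vector:
    return [x - 1.0 for x in v]


def poor_inverse(v: Sequence[float]) -> Vector:
    return [x - 0.5 for x in v]


accented = [[0.0, 2.0], [1.0, 3.0]]   # invented feature vectors
native = [[1.0, 3.0]]

loss_consistent = cycle_consistency_loss(shift_up, exact_inverse, accented, native)
loss_inconsistent = cycle_consistency_loss(shift_up, poor_inverse, accented, native)
```

    When the two mappings invert each other the term is zero; when they do not, it grows with the reconstruction error. In a full CycleGAN this term is added to the adversarial losses during training.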