193 research outputs found

    Semantic Analysis of English Loanwords in Japanese and Korean Using Word Embeddings

    Doctoral dissertation, Seoul National University Graduate School, College of Humanities, Department of Linguistics, February 2021. Advisor: Hyopil Shin. Through cultural exchange, many foreign words enter other languages along with the cultures they come from. These loanwords are now widespread in languages all over the world. Historical linguistics has studied loanwords actively because they can trigger linguistic change within the recipient language. Loanwords affect existing words and grammar: native words become obsolete, foreign suffixes and words combine with native words to coin new words and phrases, and foreign prepositions come into use in the recipient language. 
Loanwords themselves also undergo morphological, phonological, and semantic changes as they are integrated and adapted, because of the linguistic constraints of the recipient language. Several fields of linguistics (morphology, phonology, and semantics) have studied the changes caused by the influx of loanwords. Loanwords mainly introduce into the recipient language a completely new foreign product or concept that cannot be expressed with its existing words. However, people also often use loanwords to convey a prestigious, luxurious, or academic image. These sociolinguistic roles of loanwords have recently received particular attention in sociolinguistics and pragmatics. Most previous work on loanwords has gathered many examples and summarized the patterns of linguistic change. Recently, corpus-based quantitative studies have begun to identify statistically the linguistic factors, such as word length, that influence the successful integration and adaptation of loanwords in the recipient language. However, such frequency-based research has difficulty quantifying complex semantic information, so the quantitative analysis of loanword semantic phenomena has remained undeveloped. This research addresses the quantitative analysis of the semantic phenomena of loanwords using word embeddings. Word embedding can effectively convert the semantic and contextual information of words into vector values using deep learning methods and large-scale language data. This study proposes several quantitative methods for analyzing the semantic phenomena related to loanwords. The dissertation focuses on three topics: lexical competition, semantic adaptation, and social semantic function and cultural trend change. The first study focuses on the lexical competition between a loanword and its native synonym. 
Frequency alone cannot distinguish the two types of lexical competition: word replacement and semantic differentiation. Judging the type of lexical competition requires knowing how far the loanword and its native synonym share contexts. We apply a geometrical concept to model this context-sharing condition. The resulting geometrical, word embedding-based model quantitatively judges which kind of lexical competition is taking place between loanwords and their native synonyms. The second study focuses on the semantic adaptation of English loanwords in Japanese and Korean. English loanwords undergo semantic change (semantic adaptation) as they are integrated and adapted in the recipient language. This study applies the transformation matrix method to compare the meanings of the loanwords with those of the original English words, and extends the method to a contrastive study of the semantic adaptation of English loanwords in Japanese and Korean. It also statistically analyzes how the polysemy of the English words affects semantic adaptation. The third study focuses on the social semantic role of loanwords in reflecting current cultural trends in Japan and Korea. Japanese and Korean society frequently uses loanwords when new trends or issues emerge, so loanwords seem to act as signals of cultural trends. We therefore hypothesize that loanwords serve as indicators of cultural trend change. To test this hypothesis, the study proposes a method that tracks the contextual change of loanwords over time with a pre-trained contextual embedding model (BERT). The method can detect cultural trend change through the contextual change of loanwords. Throughout these studies, we applied our methods to Japanese and Korean data, which shows the potential of computational multilingual contrastive linguistic research. 
These word embedding-based semantic analysis methods will contribute a lot to the development of computational semantics and computational sociolinguistics in various languages.
    Contents: Abstract; Contents; List of Tables; List of Figures
    1 Introduction: 1.1 Overview of Loanword Study; 1.2 Research Topics in this Dissertation (1.2.1 Lexical Competition between Loanword and Native Synonym; 1.2.2 Semantic Adaptation of Loanwords; 1.2.3 Social Semantic Function and the Cultural Trend Change); 1.3 Methodological Background (1.3.1 The Vector Space Model; 1.3.2 The Bag of Words Model; 1.3.3 Neural Network and Neural Probabilistic Language Model; 1.3.4 Distributional Model and Word2vec; 1.3.5 The Contextual Word Embedding and BERT); 1.4 Summary of this Chapter
    2 Word Embeddings for Lexical Changes Caused by Lexical Competition between Loanwords and Native Words: 2.1 Overview; 2.2 Related Works (2.2.1 Lexical Competition in Loanword; 2.2.2 Word Embedding Model and Semantic Change); 2.3 Selection of Loanword and Korean Synonym Pairs (2.3.1 Viable Loanwords; 2.3.2 Previous Approach: The Relative Frequency; 2.3.3 New Approach: The Proportion Test; 2.3.4 Technical Challenges for Performing the Proportion Test; 2.3.5 Filtering Procedures; 2.3.6 Handling Errors; 2.3.7 Proportion Test and Questionnaire Survey); 2.4 Analysis of Lexical Competition (2.4.1 The Geometrical Model for Analyzing the Lexical Competition; 2.4.2 Word Embedding Model for Analyzing Lexical Competition; 2.4.3 Result and Discussion); 2.5 Conclusion and Future Work
    3 Applying Word Embeddings to Measure the Semantic Adaptation of English Loanwords in Japanese and Korean: 3.1 Overview; 3.2 Methodology; 3.3 Data and Experiment; 3.4 Result and Discussion (3.4.1 Japanese; 3.4.2 Korean; 3.4.3 Comparison of Cosine Similarities of English Loanwords in Japanese and Korean; 3.4.4 The Relationship Between the Number of Meanings and Cosine Similarities); 3.5 Conclusion and Future Works
    4 Detection of the Contextual Change of Loanwords and the Cultural Trend Change in Japanese and Korean through Pre-trained BERT Language Models: 4.1 Overview; 4.2 Related Work (4.2.1 Loanwords and Cultural Trend Change; 4.2.2 Word Embeddings and Semantic Change; 4.2.3 Contextualized Embedding and Diachronic Semantic Representation); 4.3 The Framework (4.3.1 Sense Representation; 4.3.2 Tracking the Contextual Changes; 4.3.3 Evaluation of Frame Work; 4.3.4 Discussion for Framework); 4.4 The Cultural Trend Change Analysis through Loanword Contextual Change Detection (4.4.1 Methodology; 4.4.2 Result and Discussion); 4.5 Conclusion and Future Work
    5 Conclusion and Future Works: 5.1 Summary; 5.2 Future Works (5.2.1 Revealing Statistical Law; 5.2.2 Computational Contrastive Linguistic Study; 5.2.3 Application to Other Semantics Tasks)
    A List of Loanword Having One Synset and One Definition in Korean CoreNet in Chapter 2; Abstract (In Korean)
    Docto
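The transformation matrix comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a linear map is fitted from an English embedding space to a recipient-language space using a few seed pairs, and that an English word is then mapped across and compared with the loanword's own vector by cosine similarity. The 2-dimensional toy vectors, the direction of the mapping, the gradient-descent fitting, and the names `fit_transformation`, `transform`, and `cosine` are all hypothetical; the abstract does not specify how the dissertation trains or applies its matrix.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def transform(W, x):
    """Apply the matrix W (a list of rows) to the vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def fit_transformation(pairs, dim_in, dim_out, lr=0.1, epochs=2000):
    """Fit W to reduce sum ||W x - z||^2 over seed pairs (x, z) by gradient descent."""
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, z in pairs:
            pred = transform(W, x)
            for i in range(dim_out):
                err = pred[i] - z[i]
                for j in range(dim_in):
                    W[i][j] -= lr * err * x[j]
    return W


# Hypothetical seed pairs: (English vector, recipient-language vector).
# In this toy setup the recipient space is the English space rotated by 90 degrees.
seed_pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.0, 1.0], [-1.0, 0.0]),
]
W = fit_transformation(seed_pairs, dim_in=2, dim_out=2)

# Map a hypothetical English source word into the recipient space.
english_word = [1.0, 0.2]
mapped = transform(W, english_word)

# Two hypothetical loanword vectors in the recipient space.
loan_kept = [-0.2, 1.0]    # lies where the mapped English word lands
loan_shifted = [1.0, 0.2]  # lies elsewhere in the recipient space

sim_kept = cosine(mapped, loan_kept)
sim_shifted = cosine(mapped, loan_shifted)
print(round(sim_kept, 3), round(sim_shifted, 3))
```

Under these assumptions, a similarity near 1 would suggest the loanword kept the meaning of its English source, while a low similarity would suggest semantic adaptation. Real embeddings have hundreds of dimensions and many more seed pairs, so a closed-form least-squares fit would normally replace the toy gradient descent.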

    CONSONANT CLUSTERS IN INDONESIAN LOANWORDS

    This paper investigates two types of loanwords in Indonesian, drawn from a list published by NUSA in 1997 and the online version of Kamus Besar Bahasa Indonesia (KBBI, 2019): those of Sanskrit origin and those of European origin. When languages borrow words from one another, they may employ various strategies for dealing with unfamiliar sounds and/or sound combinations. The study uses a descriptive qualitative method with a focus on corpus research. Specifically, it is concerned with the handling of syllable-initial consonant clusters that are not present in native Indonesian words. Two patterns emerge: 1) a tendency to preserve consonant clusters in European loanwords; and 2) a tendency to insert a vowel to break up consonant clusters in Sanskrit loanwords. The difference is due to the differing time frames and scope of Sanskrit and European language influence in Indonesia. The results show that onset consonant clusters have become a definite marker of loanwords in Indonesia

    Pitch Accent in Korean

    Typologically, pitch-accent languages stand between stress languages like Spanish and tone languages like Shona, and share properties of both. In a stress language typically just one syllable per word is accented and bears the major stress (cf. Spanish sábana ‘sheet’, sabána ‘plain’, Panamá). In a tone language the number of distinctions grows geometrically with the size of the word. So in Shona, which contrasts high vs. low tone, trisyllabic words have eight possible pitch patterns. In a canonical pitch-accent language such as Japanese, just one syllable (or mora) per word is singled out as distinctive, as in Spanish. But each syllable in the word is assigned a high or low tone (as in Shona); however, this assignment is predictable based on the location of the accented syllable. Keywords: tonal accent, diachrony, phonetic realization, compounds, phonological phrases, loanwords, frequency, reconstructio
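The arithmetic behind the contrast drawn in the abstract above can be sketched in a few lines: with two tones, a word of n syllables allows 2^n tone patterns, whereas singling out exactly one accented syllable allows only n patterns. The function names and the `H`/`L`/`S`/`s` labels are illustrative assumptions, not notation from the paper.

```python
from itertools import product


def tone_patterns(n_syllables, tones=("H", "L")):
    """All tone assignments when every syllable freely takes one of the tones."""
    return ["".join(p) for p in product(tones, repeat=n_syllables)]


def accent_patterns(n_syllables):
    """All patterns when exactly one syllable per word is accented (S) and the rest are not (s)."""
    return [
        "".join("S" if i == k else "s" for i in range(n_syllables))
        for k in range(n_syllables)
    ]


trisyllabic_tone = tone_patterns(3)
trisyllabic_accent = accent_patterns(3)

print(len(trisyllabic_tone), trisyllabic_tone)      # 8 patterns, HHH through LLL
print(len(trisyllabic_accent), trisyllabic_accent)  # 3 patterns: Sss, sSs, ssS
```

The eight trisyllabic tone patterns match the Shona figure cited in the abstract, while the three accent patterns correspond to the three stress positions illustrated by the Spanish examples.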

    Loan Phonology

    For many different reasons, speakers borrow words from other languages to fill gaps in their own lexical inventory. The past ten years have been characterized by a great interest among phonologists in the issue of how the nativization of loanwords occurs. The general feeling is that loanword nativization provides a direct window for observing how acoustic cues are categorized in terms of the distinctive features relevant to the L1 phonological system as well as for studying L1 phonological processes in action and thus to the true synchronic phonology of L1. The collection of essays presented in this volume provides an overview of the complex issues phonologists face when investigating this phenomenon and, more generally, the ways in which unfamiliar sounds and sound sequences are adapted to converge with the native language’s sound pattern. This book is of interest to theoretical phonologists as well as to linguists interested in language contact phenomena

    Information-theoretic causal inference of lexical flow

    This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. 
This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible for the framework to be enhanced e.g. by confidence values for each directionality decision

    The Impact of Ideology on Lexical Borrowing in Arabic: A Synergy of Corpus Linguistics and CDA

    Lexical borrowing is a natural outcome of language contact and one source of neologisms. The traditional view explains lexical borrowing as motivated mainly by lexical need or prestige, where loans in the recipient language have more or less similar if not identical meanings with the borrowing language. Linguistic adaptation has often been seen as grammatically based, with grammarians or linguists assuming the major task of nativizing foreign terms. This is typical of many studies on linguistic borrowing in Arabic, while only secondary attention is given to semantic, sociolinguistic, and educational perspectives. The present study approached lexical borrowing as more of a language users’ task, emphasizing their role in meaning construction. Three English loanwords in Arabic (agenda, liberal, lobby) were studied in naturally occurring language to see whether their meanings and co-occurrence patterns correspond to those of their equivalents in English and thus agree with the notion of lexical need in linguistic borrowing. Some of the meanings of the loans fall under the domain of sociopolitics, a fertile site believed to show ideological impact. Using the analytical frameworks of Sinclair (2005, 1998) and Van Dijk (2014, 2016b, 2016a), the three loanwords were investigated from corpus linguistics and CDA angles. The findings revealed different co-occurrence patterns in Arabic, characterized by more negative associations than in English. These negative associations were motivated by religious, political, and linguistic ideological stances, often implied in the connotations and attitudinal meanings of real language use. Ideological influence was also reproduced in Arabic dictionaries, where some loanwords or their meanings are absent or excluded though used in formal settings. The connection between dictionary making and learning as influenced by dominant ideology was also explored

    Phonology modulates the illusory vowels in perceptual illusions: Evidence from Mandarin and English

    Native speakers perceive illusory vowels when presented with sound sequences that do not respect the phonotactic constraints of their language (Dupoux, Kakehi, Hirose, Pallier, & Mehler, 1999; Kabak & Idsardi, 2007). There is, however, less work on the quality of the illusory vowel. Recently, it has been claimed that the quality of the illusory vowel is also modulated by the phonology of the language, and that the phenomenon of illusory vowels can be understood as a result of the listener reverse inferring the best parse of the underlying representation given their native language phonology and the acoustics of the input stream (Durvasula & Kahng, 2015). The view predicts that listeners are likely to hear different illusory vowels in different phonological contexts. In support of this prediction, we show through two perceptual experiments that Mandarin Chinese speakers (but not American English speakers) perceive different illusory vowels in different phonotactic contexts. Specifically, when presented with phonotactically illegal alveopalatal coda consonants, Mandarin speakers perceived an illusory /i/, but in illegal alveolar stop coda contexts, they perceived a /?/

    The laws of "LOL": Computational approaches to sociolinguistic variation in online discussions

    When speaking or writing, a person often chooses one form of language over another based on social constraints, including expectations in a conversation, participation in a global change, or expression of underlying attitudes. Sociolinguistic variation (e.g. choosing "going" versus "goin'") can reveal consistent social differences such as dialects and consistent social motivations such as audience design. While traditional sociolinguistics studies variation in spoken communication, computational sociolinguistics investigates written communication on social media. The structured nature of online discussions and the diversity of language patterns allow computational sociolinguists to test highly specific hypotheses about communication, such as different configurations of listener "audience." Studying communication choices in online discussions sheds light on long-standing sociolinguistic questions that are hard to tackle, and helps social media platforms anticipate their members' complicated patterns of participation in conversations. To that end, this thesis explores open questions in sociolinguistic research by quantifying language variation patterns in online discussions. I leverage the "bird's-eye" view of social media to focus on three major questions in sociolinguistics research relating to authors' participation in online discussions. First, I test the role of conversation expectations in the context of content bans and crisis events, and I show that authors vary their language to adjust to audience expectations in line with community standards and shared knowledge. Next, I investigate language change in online discussions and show that language structure, more than social context, explains word adoption. Lastly, I investigate the expression of social attitudes among multilingual speakers, and I find that such attitudes can explain language choice when the attitudes have a clear social meaning based on the discussion context. 
This thesis demonstrates the rich opportunities that social media provides for addressing sociolinguistic questions and provides insight into how people adapt to the communication affordances in online platforms. Ph.D

    Variation in the use of innovative katakana in a Japanese corpus

    This study assesses writers’ use of innovative katakana forms in a corpus of written Japanese and examines the effects of linguistic and social factors on that use. Focusing specifically on innovative katakana that represent sequences including the phones /w/ and /v/, the study first investigates whether the presence of /w/ in the native phonological system encourages the use of innovative /w/ forms in sequences that do not appear in native lexical items. This contrasts with forms containing /v/, which lack native counterparts in any context in the Japanese phonological system. A second objective is to investigate whether the likelihood of using innovative katakana is affected by position within a word. The research also applies the framework of variationist sociolinguistics to identify which social factors significantly affect innovative writing behavior. To answer these questions, the study uses data collected from the Chunagon database for descriptive and multivariate analysis. The Chunagon corpus is a written corpus compiled by the National Institute for Japanese Language and Linguistics; it contains approximately 100 million words and includes texts published between the 1970s and the 2000s. The results show that the presence of native phones triggers higher usage of innovative katakana in loanword forms containing /w/ than in /v/ forms. The findings also indicate that innovative /w/ and /v/ forms occur more often in word-initial position than in medial and final positions. Concerning the social factors, the multivariate results show no effect of age or gender for the /w/ variable, but an effect of gender for the /v/ variable. 
Only the /v/ variable showed the expected innovative preference in informal genres (web and books), whereas the /w/ variable showed an unanticipated higher innovative usage in a formal register comprising governmental and legal genres. In sum, this study presents significant findings on the linguistic factors (the presence of a native phone and the positional effect) that promote the use of innovative katakana in written texts