714 research outputs found

    Semantic Analysis of English Loanwords in Japanese and Korean Using Word Embeddings

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Humanities, Department of Linguistics, 2021. 2. Hyopil Shin.
    Through cultural exchange, many foreign words enter other languages along with the culture they come from. These loanwords are now widespread in languages all over the world. Historical linguistics has studied loanwords actively because they can trigger linguistic change within the recipient language. Loanwords affect existing words and grammar: native words become obsolete, foreign suffixes and words combine with native words to coin new words and phrases, and foreign prepositions come into use in the recipient language.
Loanwords themselves also undergo morphological, phonological, and semantic changes as they are integrated and adapted, because of the linguistic constraints of the recipient language. Several fields of linguistics (morphology, phonology, and semantics) have studied the changes caused by the influx of loanwords. Loanwords mainly introduce to the recipient language a completely new foreign product or concept that its existing words cannot express. However, people also often use loanwords to convey a prestigious, luxurious, or academic image. These sociolinguistic roles of loanwords have recently received particular attention in sociolinguistics and pragmatics. Most previous work on loanwords has gathered many examples and summarized the patterns of linguistic change. Recently, corpus-based quantitative studies have started to show statistically how linguistic factors such as word length influence the successful integration and adaptation of loanwords in the recipient language. However, these frequency-based studies have difficulty quantifying complex semantic information, so the quantitative analysis of loanword semantic phenomena has remained undeveloped. This research addresses the quantitative analysis of the semantic phenomena of loanwords using word embeddings. Word embedding can effectively convert the semantic contextual information of words into vectors using deep learning methods and large language data. This study proposes several quantitative methods for analyzing semantic phenomena related to loanwords. The dissertation focuses on three topics: lexical competition, semantic adaptation, and social semantic function and cultural trend change. The first study focuses on the lexical competition between a loanword and its native synonym.
Frequency alone cannot distinguish the two types of lexical competition, word replacement and semantic differentiation. Judging the type of lexical competition requires knowing how far a loanword and its native synonym share contexts. We apply a geometrical concept to model this context-sharing condition. The resulting geometrical word embedding-based model quantitatively judges which type of lexical competition occurs between loanwords and their native synonyms. The second study focuses on the semantic adaptation of English loanwords in Japanese and Korean. English loanwords undergo semantic change (semantic adaptation) as they are integrated and adapted in the recipient language. This study applies the transformation matrix method to compare the semantic difference between the loanwords and the original English words, and extends the method to a contrastive study of the semantic adaptation of English loanwords in Japanese and Korean. It also analyzes statistically how the polysemy of the English word affects semantic adaptation. The third study focuses on the social semantic role of loanwords in reflecting current cultural trends in Japan and Korea. Japanese and Korean society frequently uses loanwords when new trends or issues emerge, so loanwords seem to act as signals of cultural trends. Thus, we propose the hypothesis that loanwords serve as an indicator of cultural trend change. To verify this hypothesis, the study proposes a method that tracks the contextual change of loanwords over time with a pre-trained contextual embedding model (BERT). This word embedding-based method can detect cultural trend change through the contextual change of loanwords. Throughout these studies, we applied our methods to Japanese and Korean data, which shows the potential for computational multilingual contrastive linguistic study.
These word embedding-based semantic analysis methods are expected to contribute substantially to the development of computational semantics and computational sociolinguistics in various languages.

    Contents:
    Abstract; Contents; List of Tables; List of Figures
    1 Introduction
        1.1 Overview of Loanword Study
        1.2 Research Topics in this Dissertation
            1.2.1 Lexical Competition between Loanword and Native Synonym
            1.2.2 Semantic Adaptation of Loanwords
            1.2.3 Social Semantic Function and the Cultural Trend Change
        1.3 Methodological Background
            1.3.1 The Vector Space Model
            1.3.2 The Bag of Words Model
            1.3.3 Neural Network and Neural Probabilistic Language Model
            1.3.4 Distributional Model and Word2vec
            1.3.5 The Contextual Word Embedding and BERT
        1.4 Summary of this Chapter
    2 Word Embeddings for Lexical Changes Caused by Lexical Competition between Loanwords and Native Words
        2.1 Overview
        2.2 Related Works
            2.2.1 Lexical Competition in Loanword
            2.2.2 Word Embedding Model and Semantic Change
        2.3 Selection of Loanword and Korean Synonym Pairs
            2.3.1 Viable Loanwords
            2.3.2 Previous Approach: The Relative Frequency
            2.3.3 New Approach: The Proportion Test
            2.3.4 Technical Challenges for Performing the Proportion Test
            2.3.5 Filtering Procedures
            2.3.6 Handling Errors
            2.3.7 Proportion Test and Questionnaire Survey
        2.4 Analysis of Lexical Competition
            2.4.1 The Geometrical Model for Analyzing the Lexical Competition
            2.4.2 Word Embedding Model for Analyzing Lexical Competition
            2.4.3 Result and Discussion
        2.5 Conclusion and Future Work
    3 Applying Word Embeddings to Measure the Semantic Adaptation of English Loanwords in Japanese and Korean
        3.1 Overview
        3.2 Methodology
        3.3 Data and Experiment
        3.4 Result and Discussion
            3.4.1 Japanese
            3.4.2 Korean
            3.4.3 Comparison of Cosine Similarities of English Loanwords in Japanese and Korean
            3.4.4 The Relationship Between the Number of Meanings and Cosine Similarities
        3.5 Conclusion and Future Works
    4 Detection of the Contextual Change of Loanwords and the Cultural Trend Change in Japanese and Korean through Pre-trained BERT Language Models
        4.1 Overview
        4.2 Related Work
            4.2.1 Loanwords and Cultural Trend Change
            4.2.2 Word Embeddings and Semantic Change
            4.2.3 Contextualized Embedding and Diachronic Semantic Representation
        4.3 The Framework
            4.3.1 Sense Representation
            4.3.2 Tracking the Contextual Changes
            4.3.3 Evaluation of Frame Work
            4.3.4 Discussion for Framework
        4.4 The Cultural Trend Change Analysis through Loanword Contextual Change Detection
            4.4.1 Methodology
            4.4.2 Result and Discussion
        4.5 Conclusion and Future Work
    5 Conclusion and Future Works
        5.1 Summary
        5.2 Future Works
            5.2.1 Revealing Statistical Law
            5.2.2 Computational Contrastive Linguistic Study
            5.2.3 Application to Other Semantics Tasks
    A List of Loanword Having One Synset and One Definition in Korean CoreNet in Chapter 2
    Abstract (In Korean)
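
    The transformation matrix method named in the abstract of this entry can be illustrated with a minimal sketch. It is not the dissertation's implementation: the least-squares fit, the mapping direction (English space to recipient-language space), the toy random vectors, and every function and variable name below are assumptions made for illustration. The idea shown is to learn a linear map between two embedding spaces from anchor word pairs, project an English word vector through it, and use cosine similarity with the loanword vector as a rough measure of how far the loanword's meaning has moved.

```python
import numpy as np


def fit_transformation_matrix(source_vecs, target_vecs):
    """Least-squares W such that source_vecs @ W approximates target_vecs."""
    W, *_ = np.linalg.lstsq(source_vecs, target_vecs, rcond=None)
    return W


def cosine_similarity(u, v):
    """Cosine of the angle between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy data standing in for real embeddings (hypothetical, not from the thesis).
rng = np.random.default_rng(0)
dim = 5
english_anchors = rng.normal(size=(50, dim))    # anchor words, English space
true_map = rng.normal(size=(dim, dim))          # hidden relation between spaces
recipient_anchors = english_anchors @ true_map  # same anchors, recipient space

W = fit_transformation_matrix(english_anchors, recipient_anchors)

# One English word and two hypothetical loanword vectors in the recipient space:
# one that kept the original meaning and one whose meaning has drifted.
english_word = rng.normal(size=dim)
faithful_loanword = english_word @ true_map
shifted_loanword = faithful_loanword + 2.0 * rng.normal(size=dim)

projected = english_word @ W
sim_faithful = cosine_similarity(projected, faithful_loanword)
sim_shifted = cosine_similarity(projected, shifted_loanword)
print(f"faithful: {sim_faithful:.3f}, shifted: {sim_shifted:.3f}")
```

    In this toy setup a lower cosine similarity stands for stronger semantic adaptation; with real embeddings the anchor pairs would come from a bilingual dictionary and the fit would only be approximate.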

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data in cognitive and neural systems, and on their potential or real applications in different domains.

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics, and the depth and breadth of computational semantic processing research can be greatly improved with new technologies. In this survey, we analyze five semantic processing tasks: word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study the relevant theoretical research, advanced methods, and downstream applications in these fields. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing from the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

    Advancing natural language processing in political science


    Developing natural language processing instruments to study sociotechnical systems

    Identifying temporal linguistic patterns and tracing social amplification across communities has always been vital to understanding modern sociotechnical systems. Now, well into the age of information technology, the growing digitization of text archives powered by machine learning systems has enabled an enormous number of interdisciplinary studies to examine the coevolution of language and culture. However, most research in that domain investigates formal textual records, such as books and newspapers. In this work, I argue that the study of conversational text derived from social media is just as important. I present four case studies to identify and investigate societal developments in longitudinal social media streams with high temporal resolution spanning over 100 languages. These case studies show how everyday conversations on social media encode a unique perspective that is often complementary to observations derived from more formal texts. This unique perspective improves our understanding of modern sociotechnical systems and enables future research in computational linguistics, social science, and behavioral science

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    The Democratization of News - Analysis and Behavior Modeling of Users in the Context of Online News Consumption

    The invention of the Internet paved the way for the democratization of information. The fact that news became more accessible to the general public held important political promise, such as reaching previously uninformed and therefore often inactive citizens. Thanks to the Internet, they could now keep up to date with political events and become politically engaged themselves. While many politicians and journalists were content with this development for a decade, the situation changed with the rise of online social networks (OSNs). These OSNs are now almost ubiquitous: 67% of Americans get at least part of their news from social media. This trend has further lowered the cost of publishing content. At first this looked like a positive development, but it has since become a serious problem for democracies. Instead of an almost endless amount of easily accessible information making us wiser, the volume of content is becoming a burden. A balanced selection of news has to give way to a flood of posts and topics filtered through the user's digital social environment. This fosters political polarization and ideological segregation. Moreover, more than half of OSN users no longer trust the news they read (54% are worried about fake news). It fits this picture that studies report that OSN users are more exposed to the populism of far-left and far-right political actors than people without access to social media. To mitigate the negative effects of this development, my work contributes to the understanding of the problem and addresses basic research in the field of behavior modeling.
Finally, we address the danger of Internet users being influenced by social bots and present a solution based on behavior modeling. To better understand the news consumption of German-speaking users in OSNs, we analyzed their behavior on Twitter and compared reactions to controversial content (some of it anti-constitutional) and non-controversial content. In addition, we investigated the existence of echo chambers and similar phenomena. With regard to user behavior, we focused on networks that allow more complex user behavior. We developed probabilistic behavior modeling solutions for clustering and segmenting time series. Besides the contributions to understanding the problem, we developed solutions for detecting automated accounts. These bots play an important role in the early phase of the spread of fake news. Our expert model, based on current deep learning solutions, identifies automated accounts by their behavior, for example. My work raises awareness of this negative development and addresses basic research in the field of behavior modeling. It also discusses the danger of influence by social bots and presents a solution based on behavior modeling.

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility of mobile communications allows customers to switch quickly from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this represents a great deal of customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face several issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn, and how to use social media mining to measure customer satisfaction and predict customer churn. It includes a systematic review of the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and its lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset that I collected to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field.
The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Other fields, such as education, have different features, which makes applying the proposed model to them an interesting direction, because the model is based on text mining.