24 research outputs found

    Multi-label Patent Classification with Attention Mechanism

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, February 2019. 조성준.

    Patent applications are growing worldwide at an unprecedented rate. Patent examination, however, still depends on manual work by a small number of examiners with specialized knowledge, which slows the overall registration process. Research on automatically classifying the large volume of patent documents by technical field has therefore been active.

    In this study, we approach multi-label patent classification with deep learning, which has recently become widely used in natural language processing following its adoption in computer vision. Specifically, we propose a model that predicts the International Patent Classification (IPC) codes of a patent document using a GRU-based encoder and an attention mechanism. To train and evaluate the model we use USPTO-2M, a data set of about 2 million US patent documents that was also used in prior work, and we report precision, recall, and F score on the multi-label classification task. By visualizing the attention scores, we compare the contribution of each word to the classification result and select the words with the highest weights as keywords; these keywords determine the context of each document and its IPC codes at the subclass level. The results are expected to be useful for future patent analysis and keyword search.

    Contents:
    Abstract (Korean)
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives and Problem Definition
        1.3 Organization of the Thesis
    Chapter 2. Related Work
        2.1 Characteristics of Patent Documents
        2.2 Word Embedding
        2.3 Document Classification
        2.4 Patent Classification
    Chapter 3. Proposed Method
        3.1 Attention-based Patent Document Classification Model
            3.1.1 GRU-based Word Sequence Encoder
            3.1.2 Attention-based Patent Document Encoder
            3.1.3 Patent Document Classification
    Chapter 4. Experimental Results and Analysis
        4.1 Data Set
        4.2 Evaluation Method
        4.3 Model Configuration and Training
        4.4 Analysis of Experimental Results
        4.5 Keyword Exploration Using the Attention Mechanism
    Chapter 5. Conclusion
    References
    Abstract (English)
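The abstract names three ingredients: attention over encoder states, multi-label prediction of IPC codes, and evaluation by precision, recall, and F score. The following is a minimal sketch of how these pieces can fit together, not the thesis's implementation. Several things here are assumptions for illustration: the hidden states are hand-written stand-ins for GRU outputs, the attention score is a plain dot product with a context vector, each label gets an independent sigmoid with a 0.5 threshold, the metrics are micro-averaged, and all function names, weights, and example labels are hypothetical.

```python
import math


def attention_pool(hidden_states, context):
    """Softmax-weighted sum of per-word hidden states.

    hidden_states: list of T vectors (stand-ins for GRU outputs; assumption).
    context: query vector of the same size (learned in a real model).
    Returns (document_vector, attention_weights).
    """
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    doc = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return doc, weights


def predict_labels(doc, label_weights, threshold=0.5):
    """Multi-label decision: one independent sigmoid per label (assumption)."""
    predicted = set()
    for label, w in label_weights.items():
        logit = sum(w_i * d_i for w_i, d_i in zip(w, doc))
        if 1.0 / (1.0 + math.exp(-logit)) >= threshold:
            predicted.add(label)
    return predicted


def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall, and F1 over documents (assumption)."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy document: three words with made-up 2-dimensional hidden states.
words = ["a", "battery", "electrode"]
hidden = [[0.1, 0.0], [0.9, 0.2], [0.8, 0.1]]
context = [2.0, 0.0]  # hypothetical learned context vector

doc_vector, attn = attention_pool(hidden, context)
keyword = words[attn.index(max(attn))]  # word with the largest attention weight

# Hypothetical per-label weight vectors keyed by IPC subclass code.
label_weights = {"H01M": [3.0, 0.0], "G06F": [-3.0, 0.0]}
predicted = predict_labels(doc_vector, label_weights)

# Toy evaluation over two documents.
true_sets = [{"H01M", "H01G"}, {"G06F"}]
pred_sets = [{"H01M"}, {"G06F", "G06N"}]
precision, recall, f1 = micro_prf(true_sets, pred_sets)

print(keyword, sorted(predicted), precision, recall, f1)
```

The attention weights sum to one, so ranking words by weight gives a simple per-document keyword list of the kind the abstract describes visualizing. A trained model would replace the hand-written hidden states with the outputs of a GRU over word embeddings.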