
    User Profiling with Installed Applications on Smartphone

    Thesis (Master's) -- Seoul National University Graduate School: Department of Industrial Engineering, February 2017. Park Jonghun.
Demand for personalized services is growing as the smartphone, a personal device, has come into general use. Demographic information is useful for such services, so inferring user demographics from various data with statistical learning has been actively researched. This study infers gender, age, relationship status, residence type, cohabitation status, income level, expenditure level, height, weight, religion, semesters completed, and college from the list of applications installed on a user's smartphone. The list reflects the user's interests and lifestyle, can be collected without explicit permission, and can be captured in full as a single snapshot, which minimizes collection cost. Four feature vectors representing the user are built from the installed-application list and from metadata available in the application store, namely category and description. In particular, one feature vector is generated by applying Doc2Vec, a neural-network-based text embedding method, to the application descriptions. An ensemble method combines the results inferred from each of the four feature vectors so that their information is used together. An application selection method is also used, so that inference relies on a selected subset of applications and can perform better than inference using every application on the list.
Finally, for each demographic target, the best performance over all combinations of feature vector and application selection method is taken as the final performance, and the effect of inference from installed applications is compared across targets. The results show that, among single feature vectors, the one generated by applying Doc2Vec to application descriptions performed well overall, and that the ensemble method improved performance for gender, relationship status, height, and weight. In addition, the application selection criterion that improves performance differs by demographic target, and inference from installed applications is more effective for gender, college, relationship status, and income than for the other targets.

1. Introduction
1.1. Research Background
1.2. Research Objectives
1.3. Research Content
2. Background Theory and Related Work
2.1. Word2Vec/Doc2Vec
2.2. Demographic Information Inference
3. Experimental Design
3.1. Experimental Data
3.1.1. Installed Application List
3.1.2. Demographic Information
3.1.3. Application Metadata
3.2. Feature Vectors
3.2.1. Simple Application List
3.2.2. Category Ratio
3.2.3. Description Word TF-IDF
3.2.4. Description Doc2Vec
3.3. Ensemble
3.4. Application Selection
3.4.1. Selection by Installation Ratio
3.4.2. Selection by Discriminative Power
3.5. Experimental Environment and Evaluation Metrics
3.5.1. Experimental Environment
3.5.2. Evaluation Metrics
4. Experimental Results
4.1. Training and Dimension Optimization of the Description Doc2Vec Feature Vector
4.1.1. Training Results
4.1.2. Dimension Optimization
4.2. Comparison of Inference Performance by Feature Vector
4.3. Ensemble Results
4.4. Performance Change by Application Selection
4.4.1. Selection by Installation Ratio
4.4.2. Selection by Discriminative Power
4.5. Comparison of Inference Effect from Installed Application Lists Across Demographic Targets
4.5.1. Final Performance
4.5.2. Comparison of Performance Improvement Ratio over Baseline
5. Conclusion
5.1. Summary and Significance of the Study
5.2. Future Directions
References
Abstract
Maste
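
The abstract names several building blocks: a feature vector from the plain application list, one from category ratios, an ensemble over per-vector results, and selection of applications before inference. The following is a minimal Python sketch of how such pieces might look, not the thesis's actual implementation. All function names, the toy data, the thresholds, and the choice of probability averaging as the ensemble rule are assumptions made for illustration; the TF-IDF and Doc2Vec vectors are omitted because they would need an external library.

```python
from collections import Counter


def app_list_vector(installed, vocabulary):
    """Binary indicator per vocabulary app: 1 if installed, else 0."""
    installed = set(installed)
    return [1 if app in installed else 0 for app in vocabulary]


def category_ratio_vector(installed, app_category, categories):
    """Fraction of the user's known apps that belong to each category."""
    counts = Counter(app_category[a] for a in set(installed) if a in app_category)
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in categories]


def select_by_install_ratio(users, low, high):
    """Apps whose share of installing users lies within [low, high] (assumed rule)."""
    freq = Counter(app for apps in users.values() for app in set(apps))
    return sorted(a for a, k in freq.items() if low <= k / len(users) <= high)


def soft_vote(prob_rows):
    """Average the class probabilities produced from each feature vector (assumed rule)."""
    return [sum(col) / len(prob_rows) for col in zip(*prob_rows)]


# Toy data, invented purely for illustration.
users = {"u1": ["chat", "bank", "game"], "u2": ["chat", "game"], "u3": ["chat"]}
app_category = {"chat": "communication", "bank": "finance", "game": "games"}

vocabulary = select_by_install_ratio(users, low=0.5, high=0.9)
print(vocabulary)
print(app_list_vector(users["u2"], vocabulary))
print(category_ratio_vector(users["u2"], app_category, ["communication", "finance", "games"]))
print(soft_vote([[0.75, 0.25], [0.25, 0.75]]))
```

Filtering by installation ratio drops applications that nearly everyone or almost no one installs, which is one plausible reason a selected subset could outperform the full list; the abstract reports that the best selection criterion differs by demographic target.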