
    Extracting Scales of Measurement Automatically from Biomedical Text with Special Emphasis on Comparative and Superlative Scales

    Abstract: This thesis focuses on "Extracting Scales of Measurement Automatically from Biomedical Text with Special Emphasis on Comparative and Superlative Scales." Comparison sentences are a critical part of scales of measurement and play a significant role in gathering information from large numbers of biomedical research papers. A comparison sentence is defined as any sentence that contains two or more entities being compared. The thesis discusses several types of comparison sentences, such as gradable and non-gradable comparisons. The main goal is to extract comparison sentences automatically from the full text of biomedical articles. To this end, the thesis presents a Java program that analyzes biomedical text and identifies comparison sentences by matching them against 37 syntactic and semantic features, which are intended to support the extraction of comparative sentences from any biomedical text. Two machine learning techniques are applied with these 37 features to assess the curated dataset, and the results are compared with those of earlier studies.
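To make the idea of matching sentences against features concrete, here is a minimal rule-based sketch in Python. It assumes a handful of hypothetical surface cues (comparative "than" constructions, superlatives, equatives, and comparison verbs); these are illustrative stand-ins, not the thesis's 37 features, and the thesis feeds its features to machine learning classifiers rather than using a simple OR rule as done here.

```python
import re

# Hypothetical cue patterns for comparison sentences (illustrative only;
# these are NOT the 37 features used in the thesis).
CUES = {
    # "more effective than", "higher than", "fewer adverse events than"
    "comparative": re.compile(
        r"\b(?:more|less|fewer|higher|lower|greater|smaller|larger|better|worse)"
        r"(?:\s+\w+){0,2}\s+than\b",
        re.IGNORECASE,
    ),
    # "most effective", "the highest response rate"
    "superlative": re.compile(
        r"\b(?:most|least|best|worst|highest|lowest|largest|smallest)\b",
        re.IGNORECASE,
    ),
    # "as potent as"
    "equative": re.compile(r"\bas\s+\w+\s+as\b", re.IGNORECASE),
    # verbs and phrases that introduce a comparison
    "comparison_verb": re.compile(
        r"\b(?:outperform\w*|exceed\w*|superior\s+to|inferior\s+to|compared\s+(?:with|to))\b",
        re.IGNORECASE,
    ),
}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def comparison_features(sentence: str) -> dict[str, bool]:
    """Return which cue patterns fire for a sentence."""
    return {name: bool(p.search(sentence)) for name, p in CUES.items()}


def is_comparison(sentence: str) -> bool:
    """Flag a sentence if any cue fires (a simple OR rule, not a trained classifier)."""
    return any(comparison_features(sentence).values())


if __name__ == "__main__":
    text = ("Drug A was more effective than placebo. "
            "Patients were enrolled in 2010. "
            "Gefitinib showed the highest response rate.")
    for s in split_sentences(text):
        print(is_comparison(s), s)
```

In a learned setup like the one the abstract describes, the boolean dictionary returned by `comparison_features` would become a feature vector for a classifier instead of being collapsed with `any`.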

    The Business Impact of Social Media - Sentiment Analysis Approach -

    The purpose of this study is to verify whether seven sentiment domains extracted from social media are reliable enough to serve as data for sentiment analysis experiments on predicting automotive market share, and to examine how customer opinions affect corporate performance. The study was carried out in three phases. The first phase built a sentiment lexicon: a total of 45,447 Voice of the Customer (VOC) posts concerning 26 automobile manufacturers in the United States, dated January 1, 2013 to December 31, 2015, were crawled from automotive communities. The posts went through a tagging process that extracts POS (part-of-speech) information, the frequencies of negative and positive sentiment were measured to build the sentiment lexicon, and the polarity of its entries was measured to create seven sentiment domains. The second phase was a reliability analysis of the data, in which auto-correlation analysis and Principal Component Analysis (PCA) were used to verify that the data were suitable for the experiment. In the third phase, four automobile manufacturers operating in the United States (GM, Ford, FCA, and Volkswagen) were selected, and two linear regression models were used to test how the seven sentiment domains affect their performance, namely their automotive market share. As a result, 4,815 negative and 2,021 positive sentiment words were extracted to build the sentiment lexicon. Based on this lexicon, the extracted and classified negative and positive words were combined with words related to the automotive industry, and the characteristics of the sentiments were examined through auto-correlation analysis and PCA.
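The lexicon-building step can be sketched as follows, assuming each crawled post already carries POS tags and a positive/negative label (for example, one derived from a rating). The tag filter, the `min_count` and `threshold` parameters, and the polarity formula (p - n) / (p + n) are hypothetical choices for illustration; the abstract does not say how frequencies were turned into polarity or how words were grouped into the seven domains, so that grouping is not shown.

```python
from collections import Counter

# Assumption: Penn Treebank tags; adjectives, adverbs and verbs are kept as
# candidate sentiment words, while nouns and function words are dropped.
SENTIMENT_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS",
                  "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def build_lexicon(reviews, min_count=2, threshold=0.5):
    """Build a word -> polarity lexicon from POS-tagged, labelled reviews.

    reviews: iterable of (tagged_tokens, label), where tagged_tokens is a list
    of (word, tag) pairs and label is +1 (positive) or -1 (negative).
    Polarity is (p - n) / (p + n), where p and n count the positive and
    negative reviews containing the word; it ranges from -1 to +1.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in reviews:
        # Count each word at most once per review (document frequency).
        words = {w.lower() for w, tag in tokens if tag in SENTIMENT_TAGS}
        (pos if label > 0 else neg).update(words)

    lexicon = {}
    for word in pos.keys() | neg.keys():
        p, n = pos[word], neg[word]
        if p + n < min_count:            # too rare to judge
            continue
        polarity = (p - n) / (p + n)
        if abs(polarity) >= threshold:   # keep clearly polarised words only
            lexicon[word] = polarity
    return lexicon


if __name__ == "__main__":
    # Hypothetical toy posts, not data from the study.
    reviews = [
        ([("the", "DT"), ("ride", "NN"), ("is", "VBZ"), ("smooth", "JJ")], 1),
        ([("smooth", "JJ"), ("and", "CC"), ("reliable", "JJ")], 1),
        ([("the", "DT"), ("engine", "NN"), ("failed", "VBD")], -1),
        ([("transmission", "NN"), ("failed", "VBD"), ("again", "RB")], -1),
        ([("reliable", "JJ"), ("but", "CC"), ("noisy", "JJ")], -1),
    ]
    print(build_lexicon(reviews))
```

On the toy posts, "smooth" ends up at +1.0 and "failed" at -1.0, while "reliable" (one positive and one negative post) gets polarity 0 and is filtered out by the threshold.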
    According to the experimental results, the auto-correlation analysis revealed consistent patterns in the sentiment data: the sentiment in each sentiment domain is auto-correlated, and the sentiments also exhibit time-series behavior. The PCA results confirmed that the seven sentiment domains are linked through negativity, positivity, and neutrality as principal components. Building on the reliability of the VOC sentiment data established through auto-correlation analysis and PCA, two linear regression models were constructed and tested. The first model used the negative sentiments Sadness, Anger, and Fear and the positive sentiment domains Delight and Satisfaction, as identified by the principal component analysis, as independent variables, with market share as the dependent variable. The second model added Shame and Frustration, whose principal component turned out to be neutrality, to the independent variables of the first model, to test whether sentiments with a neutral character have a significant effect on market share. The analysis found that each company has sentiments that significantly affect its market share, and that the influence of the sentiments differs between Model 1 and Model 2. The study indicates that sentiments carrying the information present in the data can, based on their past values, accompany changes in the automotive market. Furthermore, if automotive market information and the auto-correlation of sentiment can be exploited effectively when applying available market data, this approach is expected not only to contribute substantially to research on sentiment analysis but also to contribute in various ways to business performance in real markets.

    Contents:
    List of Tables; List of Figures; Abstract
    1. Introduction: 1.1 Background; 1.2 Necessity of Study; 1.3 Purpose & Questions; 1.4 Structure
    2. Literature Reviews of VOC Analysis: 2.1 Importance of VOC; 2.2 Data Mining (2.2.1 Concept & Functionalities; 2.2.2 Methodologies of Data Mining); 2.3 Text Mining; 2.4 Sentiment Analysis; 2.5 Research Trend in Korea
    3. Methodology: 3.1 Research Flow; 3.2 Proposed Methodologies (3.2.1 Sentiment Analysis; 3.2.2 Auto-correlation Analysis; 3.2.3 Principal Component Analysis (PCA); 3.2.4 Linear Regression)
    4. Experiment & Analysis: 4.1 Phase I: Constructing Sentiment Lexicon & 7 Sentiment Domains (4.1.1 The Subject of Analysis & Crawling Data; 4.1.2 Extracting POS Information; 4.1.3 Review Extracting POS Information); 4.2 Phase II: Reliability Analysis (4.2.1 Auto-correlation Analysis of Sentiment; 4.2.2 Principal Component Analysis of Sentiment); 4.3 Phase III: Influence on Automotive Market Share (4.3.1 Linear Regression Model; 4.3.2 Definition of Variables; 4.3.3 The Result of Linear Regression Analysis)
    5. Conclusion: 5.1 Summary of Study; 5.2 Managerial Implication and Limitation; 5.3 Future Study
    References
    Docto
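The reliability and regression phases of the sentiment/market-share thesis rest on two standard computations: the sample autocorrelation of a sentiment time series and an ordinary least squares fit of market share on sentiment variables. A minimal NumPy sketch of both follows; the function names are hypothetical, and the abstract does not state which estimators or software the study used.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for lags k = 1..max_lag.

    r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    if denom == 0:
        raise ValueError("a constant series has undefined autocorrelation")
    return np.array([np.dot(d[:-k], d[k:]) / denom for k in range(1, max_lag + 1)])


def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    X: (n_samples, n_features) sentiment variables; y: (n_samples,) market share.
    Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta[0], beta[1:]


if __name__ == "__main__":
    # An alternating series is negatively autocorrelated at lag 1.
    print(autocorrelation([1, -1, 1, -1], 2))  # lag-1 = -0.75, lag-2 = 0.5
    # Toy data generated exactly from y = 2 + 3*x1 - x2.
    print(fit_ols([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], [2, 5, 1, 4, 5]))
```

Under this sketch, Model 1 would correspond to calling `fit_ols` with five sentiment columns (Sadness, Anger, Fear, Delight, Satisfaction) and Model 2 to the same call with Shame and Frustration added; the significance tests the study reports for each coefficient are not shown.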

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest: this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. At first it may be hard to see the threads that bind together the work presented here, which comes from faculty, graduate students, and postdocs in the Computer Science and Linguistics Departments and the Institute for Research in Cognitive Science. It includes prototypical natural language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing, and the syntax-semantics interface, but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it's easier than ever to do so: this document is accessible on the "information superhighway". Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html. In addition, many of the papers referenced in the CLiFF Notes can be found on the net; most can be obtained by following links from the authors' abstracts in the web version of this report. The abstracts describe the researchers' many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.

    Building an Interactive Multilingual Tool for Information Retrieval

    Growing use of the Internet has led users to access information expressed in languages other than their own, which gave rise to Cross-Lingual Information Retrieval (CLIR), now established as a major topic in Information Retrieval (IR). One approach to CLIR uses various translation methods, including translating queries and translating documents and indexes into other languages. Queries submitted to search engines suffer from untranslatable query keys (i.e., words missing from the dictionary) and from translation ambiguity, that is, difficulty in choosing among alternative translations. In this thesis we build and develop MORTAJA-IR-TOOL, a new software tool for information retrieval written in Java with JDK 1.6. Among its features, the tool provides a multilingual methodological system that serves as the basis for translation in CLIR, and it stems the words entered in a query as a stage preceding translation. In a user-centered experiment, the proposed query translation was compared with a baseline translation that uses a machine-readable dictionary, yielding an improvement of 8.96%. Evaluating the effect of stemming the query words on the quality of retrieving matching data in the other language yielded an improvement of 4.14%. Finally, combining the proposed stemming and translation methods (MORTAJA-IR-TOOL) improved the retrieval rate by 15.86%.
    Keywords: Cross-lingual information retrieval, CLIR, Information Retrieval, IR, Translation, stemming.
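A minimal sketch of the stem-then-translate query pipeline described in the MORTAJA-IR-TOOL abstract follows. It assumes a toy English-to-Spanish dictionary and a crude suffix-stripping stemmer, both hypothetical: the abstract specifies neither this language pair nor its stemming rules. Untranslatable keys pass through unchanged, and ambiguous keys keep every alternative joined by OR, which is one common way to defer translation ambiguity to the retrieval engine.

```python
# Hypothetical suffix rules: a crude English stemmer for illustration only;
# the thesis's own stemmer is not described in the abstract.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]


def stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)] + replacement
    return w


def translate_query(query: str, dictionary: dict[str, list[str]]) -> list[list[str]]:
    """Stem each query term, then look the stem up in a bilingual dictionary.

    Untranslatable keys (stems missing from the dictionary) are passed through
    unchanged; ambiguous keys keep all candidate translations.
    """
    return [dictionary.get(stem(tok), [tok]) for tok in query.split()]


def to_boolean_query(alternatives: list[list[str]]) -> str:
    """Join alternatives with OR so the retrieval engine resolves the ambiguity."""
    parts = []
    for alts in alternatives:
        parts.append(alts[0] if len(alts) == 1 else "(" + " OR ".join(alts) + ")")
    return " ".join(parts)


if __name__ == "__main__":
    toy_dictionary = {"library": ["biblioteca"],
                      "search": ["búsqueda", "buscar"],
                      "book": ["libro"]}
    print(to_boolean_query(translate_query("Searching old libraries", toy_dictionary)))
    # (búsqueda OR buscar) old biblioteca
```

Stemming before lookup is what lets "Searching" and "libraries" hit the dictionary entries for "search" and "library"; without it, both would fall through as untranslatable keys.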

    Automatic Personality Prediction; an Enhanced Method Using Ensemble Modeling

    Human personality is significantly reflected in the words a person uses in speech or writing. With the spread of information infrastructures (specifically the Internet and social media), human communication has shifted notably away from face-to-face interaction. Automatic Personality Prediction (or Perception) (APP) is the automated forecasting of personality from different types of human-generated or exchanged content (such as text, speech, images, and video). The main objective of this study is to improve the accuracy of APP from text. To this end, we propose five new APP methods: term frequency vector-based, ontology-based, enriched ontology-based, latent semantic analysis (LSA)-based, and deep learning-based (BiLSTM) methods. These serve as base methods whose outputs are combined to improve APP accuracy through ensemble modeling (stacking), with a hierarchical attention network (HAN) as the meta-model. The results show that ensemble modeling improves the accuracy of APP.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves extracting the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation, together with text mining applications for linking chemistry with biological information, are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

    A.V. and M.K. acknowledge funding from the European Community's Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain).
This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
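The chemistry Review centers on recognizing chemical entities in text before mapping them to structures. As a deliberately simple illustration, and not the method of any particular CHEMDNER system, here is a dictionary (gazetteer) lookup with longest-match resolution; the tokenizer pattern, the function name, and the toy lexicon are hypothetical.

```python
import re

# Tokens: a word character followed by word characters or hyphens,
# so names such as "2-hydroxybenzoic" stay in one token.
TOKEN = re.compile(r"\w[\w\-]*")


def find_chemicals(text, lexicon):
    """Longest-match dictionary lookup of chemical names.

    Returns a list of (start, end, surface_text) tuples in text order.
    Lexicon names are tokenized with the same pattern as the text, and
    matching is case-insensitive and token-based.
    """
    entries = {tuple(t.lower() for t in TOKEN.findall(name)) for name in lexicon}
    entries.discard(())  # ignore names with no word characters
    if not entries:
        return []
    max_len = max(len(e) for e in entries)
    tokens = [(m.start(), m.end(), m.group().lower()) for m in TOKEN.finditer(text)]

    mentions, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok[2] for tok in tokens[i:i + n])
            if key in entries:
                start, end = tokens[i][0], tokens[i + n - 1][1]
                mentions.append((start, end, text[start:end]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return mentions


if __name__ == "__main__":
    lexicon = ["sodium chloride", "sodium", "aspirin", "acetylsalicylic acid"]
    text = "Aspirin (acetylsalicylic acid) was dissolved in sodium chloride solution."
    for start, end, surface in find_chemicals(text, lexicon):
        print(start, end, surface)
```

Longest match is what makes "sodium chloride" win over the shorter entry "sodium". Pure dictionary lookup cannot recognize names absent from the lexicon, which is one reason learned recognizers are central to the community challenges the Review discusses.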