
    ๊ณ ์œ ๋ช…์‚ฌ ์ •๊ทœํ™” ๊ธฐ๋ฒ•์„ ์ด์šฉํ•œ ์ง€์‹ ๊ทธ๋ž˜ํ”„ ๊ตฌ์ถ•

    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์‚ฐ์—…๊ณตํ•™๊ณผ, 2023. 2. ์กฐ์„ฑ์ค€.Text mining aims to extract the information from documents to derive valuable insights. The knowledge graph provides richer information from various documents. Past literature responded for such needs by building technology trees or concept network from the bibliographic information of the documents, or by relying on text mining techniques in order to extract keywords and/or phrases. In this paper, we propose a framework for building a knowledge graph using named entities. The knowledge graph construction framework in this paper satisfies the following conditions: (1) extracting the named entity in the completed form, (2) Building datasets that can be trained and be evaluated by the named entity normalization models in various domains such as finance and technical documents in addition to bio-informatics, where existing NEN research has been active, (3) creating the better performing named entity normalization model, and (4) constructing the knowledge graph by grouping named entities with the same meaning that appear in various forms.ํ…์ŠคํŠธ ๋งˆ์ด๋‹์€ ๋‹ค์–‘ํ•œ ์ธ์‚ฌ์ดํŠธ๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด ๋ฌธ์„œ์—์„œ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค. ๋ฌธ์„œ์˜ ์ •๋ณด๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ์‹ ์ค‘ ํ•˜๋‚˜์ธ ์ง€์‹ ๊ทธ๋ž˜ํ”„๋Š” ๋‹ค์–‘ํ•œ ๋ฌธ์„œ์—์„œ ๋”์šฑ ํ’๋ถ€ํ•œ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ๊ธฐ์กด ์—ฐ๊ตฌ๋“ค์€ ํ…์ŠคํŠธ ๋งˆ์ด๋‹ ๊ธฐ๋ฒ•์„ ์ด์šฉํ•˜์—ฌ ๋ฌธ์„œ์˜ ์ •๋ณด๋“ค๋กœ ๊ธฐ์ˆ  ํŠธ๋ฆฌ ๋˜๋Š” ๊ฐœ๋… ๋„คํŠธ์›Œํฌ๋ฅผ ๊ตฌ์ถ•ํ•˜๊ฑฐ๋‚˜ ํ‚ค์›Œ๋“œ ๋ฐ ๊ตฌ๋ฌธ์„ ์ถ”์ถœํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์— ์„œ๋Š” ๊ณ ์œ ๋ช…์‚ฌ๋ฅผ ์ด์šฉํ•˜์—ฌ ์ง€์‹ ๊ทธ๋ž˜ํ”„๋ฅผ ๊ตฌ์ถ•ํ•˜๊ธฐ ์œ„ํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์˜ ์ง€์‹ ๊ทธ๋ž˜ํ”„ ๊ตฌ์ถ• ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•œ๋‹ค. (1) ๊ณ ์œ ๋ช…์‚ฌ๋ฅผ ์‚ฌ๋žŒ์ด ์ดํ•ดํ•˜๊ธฐ ์‰ฌ์šด ํ˜•ํƒœ๋กœ ์ถ”์ถœํ•œ๋‹ค. 
(2) ๊ธฐ์กด ๊ณ ์œ ๋ช…์‚ฌ ์ •๊ทœํ™” ์—ฐ๊ตฌ๊ฐ€ ํ™œ๋ฐœํ–ˆ๋˜ ์ƒ๋ฌผ์ •๋ณดํ•™ ์™ธ์— ๊ธˆ์œต ๋ฌธ์„œ, ๋ฐ˜๋„์ฒด ๊ด€๋ จ ํŠนํ—ˆ ๋ฌธ์„œ์—์„œ ์ถ”์ถœํ•œ ๊ณ ์œ ๋ช…์‚ฌ๋กœ ๊ณ ์œ ๋ช…์‚ฌ ์ •๊ทœํ™” ๋ฐ์ดํ„ฐ์…‹์„ ๊ตฌ์ถ•ํ•œ๋‹ค. (3) ๋” ๋‚˜์€ ์„ฑ๋Šฅ์˜ ๊ณ ์œ ๋ช…์‚ฌ ์ •๊ทœํ™” ๋ชจ๋ธ์„ ๊ตฌ์ถ•ํ•œ๋‹ค. (4) ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ ๋™์ผํ•œ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„ ๊ณ ์œ ๋ช…์‚ฌ๋ฅผ ๊ทธ๋ฃนํ™”ํ•˜์—ฌ ์ง€์‹ ๊ทธ๋ž˜ํ”„๋ฅผ ๊ตฌ์ถ•ํ•œ๋‹ค.Chapter 1 Introduction 1 Chapter 2 Literature review 5 2.1 Named entity normalization dataset 5 2.2 Named entity normalization 6 2.3 Knowledge graph construction 9 Chapter 3 Dictionary construction for named entity normalization 11 3.1 Background 11 3.2 Dictionary construction methods 12 3.2.1 Finance named entity normalization dataset 12 3.2.2 Patent named entity normalization dataset 18 3.3 Chapter summary 24 Chapter 4 Named entity normalization model using edge weight updating neural network 26 4.1 Background 26 4.2 Proposed model 28 4.2.1 Ground truth entity graph construction 31 4.2.2 Similarity-based entity graph construction 32 4.2.3 Edge weight updating neural network training 35 4.2.4 Edge weight updating neural network inferencing 38 4.3 Experiment results 39 4.3.1 Datasets 39 4.3.2 Experiment settings: named entity normalization in bioinformatics 40 4.3.3 Experiment Settings: Named Entity Normalization in Finance 42 4.4 Results 44 4.4.1 Quantitative Analysis: Bioinformatics 45 4.4.2 QuantitativeAnalysis:Finance 46 4.4.3 QualitativeAnalysis 47 4.5 Chapter summary 51 Chapter 5 Building knowledge graph using named entity recognition and normalization models 53 5.1 Background 53 5.2 Proposed model 55 5.2.1 Named entity normalization 56 5.2.2 Construction of the semiconductor-related patent knowledge graph 61 5.3 Experiment results 62 5.3.1 Comparison models 62 5.3.2 Parameters ettings 64 5.4 Results 64 5.4.1 Quantitative evaluations 64 5.4.2 Qualitative evaluations 70 5.4.3 Knowledge graph visualization and exemplary investigation 71 5.5 Chapter summary 75 Chapter 6 
Conclusion 77 6.1 Contributions 77 6.2 Future work 78 Bibliography 79 ๊ตญ๋ฌธ์ดˆ๋ก 92 ๊ฐ์‚ฌ์˜ ๊ธ€ 93๋ฐ•
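    To make condition (4) of the abstract concrete, the sketch below shows how grouping "named entities with the same meaning that appear in various forms" turns surface mentions into shared knowledge-graph nodes. It is a minimal sketch under loud assumptions. A hypothetical dictionary lookup (`NORMALIZATION_DICT`) stands in for the learned NEN model of Chapter 4. Edges are plain document-level co-occurrence counts, which is not claimed to be the dissertation's method. The function names `clean`, `normalize`, and `build_graph` and all dictionary entries are illustrative only.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical normalization dictionary (surface form -> canonical name).
# The entries are illustrative only; they are not taken from the
# dissertation's finance or patent datasets.
NORMALIZATION_DICT = {
    "samsung electronics": "Samsung Electronics",
    "samsung electronics co ltd": "Samsung Electronics",
    "samsung elec": "Samsung Electronics",
    "dram": "DRAM",
    "dynamic random access memory": "DRAM",
}


def clean(surface: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", surface.lower())
    return " ".join(text.split())


def normalize(surface: str) -> str:
    """Map a mention to its canonical name.

    A dictionary lookup stands in for a learned NEN model (assumption);
    mentions not in the dictionary fall back to their cleaned surface form.
    """
    key = clean(surface)
    return NORMALIZATION_DICT.get(key, key)


def build_graph(docs: list[list[str]]) -> dict[tuple[str, str], int]:
    """Build an entity co-occurrence graph over canonical names.

    Each document is a list of extracted named-entity mentions. Mentions that
    normalize to the same canonical name collapse into one node; an edge's
    weight counts the documents in which its two endpoints co-occur.
    """
    edges = defaultdict(int)
    for mentions in docs:
        nodes = sorted({normalize(m) for m in mentions})
        for a, b in combinations(nodes, 2):
            edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    docs = [
        ["Samsung Electronics Co., Ltd.", "DRAM"],
        ["Samsung Elec.", "Dynamic Random-Access Memory"],
    ]
    print(build_graph(docs))  # {('DRAM', 'Samsung Electronics'): 2}
```

    Without normalization, the two documents would yield four unrelated nodes and two edges of weight 1. After grouping, both documents reinforce a single edge. This is the effect the framework relies on when it merges variant forms before building the graph.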

    Using a Human Drug Network for generating novel hypotheses about drugs

    Analyzing drugs for various purposes is an important problem in computational biology. We categorize previous computational studies into Individual and Network approaches. The Individual approach focuses on one specific drug without considering its relationships with other drugs, whereas the Network approach also takes the relationships among drugs into account. In this paper, we apply to drug data a Network approach previously used to discover relationships among diseases. We construct a Human Drug Network (HDN) for 200 different drugs based on functional and structural information available in the protein-protein interaction (PPI) network. To evaluate the proposed HDN, we first analyzed the literature to show that the HDN is biologically meaningful. Second, we used the HDN to augment the initial prior knowledge about different drugs. As an example of prior knowledge, we considered the initial seed proteins of each drug, that is, the set of proteins previously known to be its targets. We clustered the HDN nodes using the Markov Clustering algorithm (MCL) and then augmented the seed proteins of each drug based on the cluster to which the drug belongs. We conclude that the proposed HDN makes it possible to generate novel hypotheses (in terms of potential drug target proteins) and produces results complementary to those of existing methods.
    Status: published
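    The clustering-and-augmentation step described above can be sketched in pure Python. The `mcl` function is a textbook Markov Clustering loop: it adds self-loops, alternates expansion (matrix power) with inflation (elementwise power followed by column renormalization), and reads clusters off the nonzero rows. Several pieces are assumptions, since the abstract does not specify them: the toy adjacency matrix, drug names, and seed sets, and the rule that a drug's new candidate targets are the seed proteins of its cluster-mates. The helper names (`augment_seeds`, `normalize_columns`, `matmul`) are hypothetical, and building the HDN from the PPI network is not modeled.

```python
def normalize_columns(m: list[list[float]]) -> None:
    """Rescale each column in place so it sums to 1 (column-stochastic)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two n x n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Markov Clustering on a symmetric adjacency matrix; returns node-index sets."""
    n = len(adj)
    # Add self-loops (common MCL practice) and make the matrix column-stochastic.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: raise the matrix to the power `expansion`.
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: elementwise power, then renormalize columns.
        inflated = [[v ** inflation for v in row] for row in expanded]
        normalize_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each nonzero row (an attractor) lists the members of one cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > tol)
        if members and members not in seen:
            seen.add(members)
            clusters.append(set(members))
    return clusters


def augment_seeds(drugs, adj, seeds, **mcl_kwargs):
    """Propose extra candidate targets for each drug from its MCL cluster-mates.

    Hypothetical rule: a drug's candidates are the seed proteins of every
    drug in its cluster, minus the drug's own seeds.
    """
    clusters = mcl(adj, **mcl_kwargs)
    augmented = {}
    for idx, drug in enumerate(drugs):
        candidates = set()
        for cluster in clusters:
            if idx in cluster:
                for other in cluster:
                    candidates |= seeds[drugs[other]]
        augmented[drug] = candidates - seeds[drug]
    return augmented


if __name__ == "__main__":
    drugs = ["A", "B", "C", "D", "E", "F"]
    # Toy graph: two disconnected triangles, {A, B, C} and {D, E, F}.
    adj = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    seeds = {"A": {"P1"}, "B": {"P2"}, "C": set(),
             "D": {"P3"}, "E": {"P3", "P4"}, "F": set()}
    print(augment_seeds(drugs, adj, seeds))
```

    In the toy run, drug C has no seed proteins of its own but inherits P1 and P2 as candidate targets from A and B. That is the kind of cluster-derived hypothesis the abstract describes. The inflation parameter controls cluster granularity: higher values generally yield smaller, tighter clusters.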