
438 research outputs found

A Study on Extracting Clinical Information from Unstructured Text for Pharmacovigilance

Author: 김시언
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2023
Field of study

Doctoral dissertation, Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Applied Bioengineering, February 2023. Advisor: 이형기.

Pharmacovigilance is a scientific activity to detect, evaluate, and understand the occurrence of adverse drug events or other problems related to drug safety. However, concerns have been raised over the quality of the drug safety information used for pharmacovigilance, and there is also a need to secure new data sources from which to acquire drug safety information. Meanwhile, the rise of pre-trained language models based on the transformer architecture has accelerated the application of natural language processing (NLP) techniques in diverse domains. In this context, I defined two pharmacovigilance problems as NLP tasks and provided baseline models for them: 1) extracting comprehensive drug safety information from adverse drug event narratives reported through a spontaneous reporting system (SRS), and 2) extracting drug-food interaction information from abstracts of biomedical articles. I developed annotation guidelines and performed manual annotation, demonstrating that strong NLP models can be trained to extract clinical information from unstructured free text by fine-tuning transformer-based language models on a high-quality annotated corpus. Finally, I discuss issues to consider when developing annotation guidelines for extracting clinical information related to pharmacovigilance. The annotated corpora and the NLP models in this dissertation can streamline pharmacovigilance activities by enhancing the data quality of reported drug safety information and expanding the available data sources.

Contents
Chapter 1
1.1 Contributions of this dissertation
1.2 Overview of this dissertation
1.3 Other works
Chapter 2
2.1 Pharmacovigilance
2.2 Biomedical NLP for pharmacovigilance
2.2.1 Pre-trained language models
2.2.2 Corpora to extract clinical information for pharmacovigilance
Chapter 3
3.1 Motivation
3.2 Proposed Methods
3.2.1 Data source and text corpus
3.2.2 Annotation of ADE narratives
3.2.3 Quality control of annotation
3.2.4 Pretraining KAERS-BERT
3.2.6 Named entity recognition
3.2.7 Entity label classification and sentence extraction
3.2.8 Relation extraction
3.2.9 Model evaluation
3.2.10 Ablation experiment
3.3 Results
3.3.1 Annotated ICSRs
3.3.2 Corpus statistics
3.3.3 Performance of NLP models to extract drug safety information
3.3.4 Ablation experiment
3.4 Discussion
3.5 Conclusion
Chapter 4
4.1 Motivation
4.2 Proposed Methods
4.2.1 Data source
4.2.2 Annotation
4.2.3 Quality control of annotation
4.2.4 Baseline model development
4.3 Results
4.3.1 Corpus statistics
4.3.2 Annotation Quality
4.3.3 Performance of baseline models
4.3.4 Qualitative error analysis
4.4 Discussion
4.5 Conclusion
Chapter 5
5.1 Issues around defining a word entity
5.2 Issues around defining a relation between word entities
5.3 Issues around defining entity labels
5.4 Issues around selecting and preprocessing annotated documents
Chapter 6
6.1 Dissertation summary
6.2 Limitation and future works
6.2.1 Development of end-to-end information extraction models from free-texts to database based on existing structured information
6.2.2 Application of in-context learning framework in clinical information extraction
Chapter 7
7.1 Annotation Guideline for "Extraction of Comprehensive Drug Safety Information from Adverse Event Narratives Reported through Spontaneous Reporting System"
7.2 Annotation Guideline for "Extraction of Drug-Food Interactions from the Abstracts of Biomedical Articles"

SNU Open Repository and Archive

Visual Question Answering: A SURVEY

Author: El-Naggar Gehad Assem
Publication venue: Arab Journals Platform
Publication date: 18/07/2023
Field of study

Visual Question Answering (VQA) has been an emerging field in computer vision and natural language processing that aims to enable machines to understand the content of images and answer natural language questions about them. Recently, there has been increasing interest in integrating Semantic Web technologies into VQA systems to enhance their performance and scalability. In this context, knowledge graphs, which represent structured knowledge in the form of entities and their relationships, have shown great potential in providing rich semantic information for VQA. This paper provides an overview of the state-of-the-art research on VQA using Semantic Web technologies, including knowledge-graph-based VQA, medical VQA with semantic segmentation, and multimodal fusion with recurrent neural networks. The paper also highlights the challenges and future directions in this area, such as improving the accuracy of knowledge-graph-based VQA, addressing the semantic gap between image content and natural language, and designing more effective multimodal fusion strategies. Overall, this paper emphasizes the importance and potential of using Semantic Web technologies in VQA and encourages further research in this area.

Arab Journals Platform

Methods for Capturing Important Tokens and Designing Sequence Encoders for Token-Level Classification Models

Author: 강태관
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2022
Field of study

Doctoral dissertation, Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2022. Advisor: 정교민.

With the development of the internet, a great volume of data has accumulated over time. Therefore, dealing with long sequential data has become a core problem in web services. For example, streaming services such as YouTube, Netflix, and TikTok use a user's viewing-history sequence to recommend videos that the user may like. Such systems treat each video the user has viewed as an item or token in order to predict which item or token will be viewed next. These tasks are defined as Token-Level Classification (TLC) tasks: given a sequence of tokens, TLC identifies the labels of the tokens in the required portion of that sequence. As mentioned above, TLC can be applied to various recommendation systems. In addition, most Natural Language Processing (NLP) tasks can also be formulated as TLC problems. For example, a sentence can be expressed as a token-level sequence in which each word is a token. In particular, information extraction can be recast as a TLC task that determines whether a specific word span in a sentence is the information being sought. TLC datasets are characteristically very sparse and long, so extracting only the important information from the sequences and encoding it properly is a very important problem. In this thesis, we propose methods to address two research questions of TLC in recommendation systems and information extraction: 1) how to capture important tokens from a token sequence, and 2) how to encode a token sequence into a model. As deep neural networks (DNNs) have shown outstanding performance in various web application tasks, we design RNN- and Transformer-based models for recommendation systems and information extraction. In this dissertation, we propose novel models that can extract important tokens for recommendation systems and information extraction systems. For recommendation, we design a BART-based system that captures the important portions of a token sequence through self-attention mechanisms and considers both bidirectional and left-to-right information. For information extraction, we present relation-network-based models that focus on important parts such as the opinion target and its neighboring words.

Contents
1. Introduction
2. Token-level Classification in Recommendation Systems
2.1 Overview
2.2 Hierarchical RNN-based Recommendation Systems
2.3 Entangled Bidirectional Encoder to Auto-regressive Decoder for Sequential Recommendation
3. Token-level Classification in Information Extraction
3.1 Overview
3.2 RABERT: Relation-Aware BERT for Target-Oriented Opinion Words Extraction
3.3 Gated Relational Target-aware Encoder and Local Context-aware Decoder for Target-oriented Opinion Words Extraction
4. Conclusion
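As a minimal sketch of the recommendation-side framing described in this abstract (each viewed video becomes a token, and the model predicts what is viewed next), the snippet below turns a viewing history into one token-level classification example in which every input position is labeled with the item that follows it. The item IDs, the reserved padding index, and the shift-by-one labeling are illustrative assumptions, not the dissertation's actual data pipeline.

```python
from typing import Dict, List, Tuple


def build_vocab(histories: List[List[str]]) -> Dict[str, int]:
    """Map every item (video) ID to an integer class index; 0 is reserved for padding."""
    vocab: Dict[str, int] = {"<pad>": 0}
    for history in histories:
        for item in history:
            if item not in vocab:
                vocab[item] = len(vocab)
    return vocab


def to_token_level_example(
    history: List[str], vocab: Dict[str, int]
) -> Tuple[List[int], List[int]]:
    """Turn one viewing history into (inputs, labels).

    Input position i holds the i-th viewed item; its label is the item viewed
    next (position i + 1), so every input token carries its own class label
    over the item vocabulary.
    """
    ids = [vocab[item] for item in history]
    return ids[:-1], ids[1:]


# Hypothetical viewing histories for two users.
histories = [["v12", "v7", "v31"], ["v7", "v31", "v12", "v5"]]
vocab = build_vocab(histories)
inputs, labels = to_token_level_example(histories[1], vocab)
print(inputs, labels)  # [2, 3, 1] [3, 1, 4]
```

Labeling every position, rather than only the last one, lets a single sequence supply many training targets at once, which is one common way to train sequential recommenders.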

SNU Open Repository and Archive

Modeling Changing Stock Relations for Stock Price Movement Prediction Based on Web Search Volumes

Author: 박재민
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2023
Field of study

Master's thesis, Seoul National University Graduate School: College of Engineering, Interdisciplinary Program in Artificial Intelligence, February 2023. Advisor: 강유.

Given historical stock prices and web search volumes of selected keywords, how can we accurately predict stock price movements? Stock price movement prediction is an attractive task because of its applicability to real-world investment: even a slight improvement in performance can lead to enormous profit. However, the task is extremely challenging due to the inherently volatile and random nature of the stock market. To overcome such difficulties, many researchers have tried to use relationships between stocks to make predictions. Despite this effort, previous works have failed to incorporate the dynamic character of stock relationships, as they relied heavily on predefined concepts to find stock correlations. However, correlations between stocks change over time and do not depend on a single criterion. In this paper, we propose GFS (Graph-based Framework using changing relations for Stock price movement prediction), a novel framework for stock price movement prediction that uses web search volumes to capture the changing relations between stocks. GFS combines relationship information from stationary connections based on predefined concepts with variable connections derived from the correlations between each stock's web search volumes, collected using the stock tickers. In addition, because stock prices are affected by global trends, we collect web search volumes of 5 keywords that best represent a common denominator of the target stocks. Experimental results on a 1-year dataset of semiconductor stocks listed in the U.S. stock market show that our model achieves higher accuracy than its baselines.

Abstract in Korean (translated): Given historical stock prices and the web search volumes of related keywords, how can we accurately predict stock price movements? Stock price prediction has attracted wide attention and is a highly appealing topic, since even a small performance improvement can yield large profits in real investment. Although predicting stock price movements may look simple, it is very difficult because of the inherent volatility of stock prices. To overcome this, many methods have tried to exploit correlation information between stocks. However, previous studies failed to exploit the continually changing relationships between stocks because they used only fixed relationships based on predefined information or only historical prices. This thesis proposes GFS, a method that predicts stock price movements using the dynamic changes in stock relationships. Alongside a graph built from predefined information, GFS computes correlations between stocks from web search volumes and generates a new graph at every step. In addition, GFS extracts keywords representing global industry trends from news, effectively extracts features from the web search volumes of those keywords, and generates a global industry trend vector. The two graphs and the global industry trend vector each contribute substantially to GFS's accurate prediction of stock price movements, and experimental results confirm that GFS achieves top-level accuracy in stock price movement prediction.

Contents
I. Introduction
II. Related Work
2.1 Individual Stock Price Prediction
2.2 Correlated Stock Price Prediction
III. Proposed Method
3.1 Overview
3.2 Attentive Feature Extraction
3.3 Utilization of Stationary and Trend Graphs
3.4 Keyword-based global trend extraction
3.5 Stock Price Movement Prediction
IV. Experiment
4.1 Experiment Settings
4.2 Classification Performance
4.3 Ablation Study
V. Conclusion
References
Abstract in Korean
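The idea of building a fresh graph from web-search-volume correlations at each step can be sketched as follows. This is a hypothetical illustration, assuming Pearson correlation over a sliding window and a fixed correlation threshold; the tickers, volumes, window length, and threshold are invented for the example, and the thesis may compute its trend graph differently. The predefined (stationary) graph would be kept as a separate input rather than merged here.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def trend_graph(
    volumes: Dict[str, List[float]], end: int, window: int, threshold: float
) -> Set[Tuple[str, str]]:
    """Undirected edges between tickers whose search volumes over the window
    [end - window, end) have a Pearson correlation of at least `threshold`.
    (Assumption: negative correlations never create an edge.)"""
    tickers = sorted(volumes)
    edges: Set[Tuple[str, str]] = set()
    for i, a in enumerate(tickers):
        for b in tickers[i + 1:]:
            r = pearson(volumes[a][end - window:end], volumes[b][end - window:end])
            if r >= threshold:
                edges.add((a, b))
    return edges


# Hypothetical daily search volumes for three tickers.
volumes = {
    "AAA": [10, 12, 15, 14, 20, 25],
    "BBB": [5, 6, 8, 7, 10, 13],
    "CCC": [30, 28, 25, 27, 20, 18],
}
print(trend_graph(volumes, end=6, window=4, threshold=0.8))  # {('AAA', 'BBB')}
```

Calling `trend_graph` with a moving `end` yields a sequence of graphs that changes over time, which is the property the abstract emphasizes; whether to keep strongly negative correlations as edges is a separate design choice.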

SNU Open Repository and Archive

Event Extraction: A Survey

Author: Lai Viet Dac
Publication venue
Publication date: 10/10/2022
Field of study

Extracting the reported events from text is one of the key research themes in natural language processing. This process includes several tasks, such as event detection, argument extraction, and role labeling. As one of the most important topics in natural language processing and natural language understanding, event extraction has applications spanning a wide range of domains, such as newswire, the biomedical domain, history and the humanities, and cyber security. This report presents a comprehensive survey of event detection from textual documents. In it, we provide the task definition, the evaluation method, the benchmark datasets, and a taxonomy of methodologies for event extraction. We also present our vision of future research directions in event detection.
Comment: 20 pages

arXiv.org e-Print Archive

Natural Language Processing: Emerging Neural Approaches and Applications

Author
Publication venue: MDPI AG
Publication date: 06/05/2022
Field of study

This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, whether interactive or autonomous from data, in cognitive and neural systems, and on their potential or real applications in different domains.

Directory of Open Access Books (DOAB)

Named Entity Recognition in Electronic Health Records: A Methodological Review

Author: Andrés Orozco-Duque
Ever A. Torres-Silva
María C. Durango
Publication venue: The Korean Society of Medical Informatics
Publication date: 01/10/2023
Field of study

Objectives: A substantial portion of the data contained in Electronic Health Records (EHRs) is unstructured, often appearing as free text. This format restricts its potential utility in clinical decision-making. Named entity recognition (NER) methods address the challenge of extracting pertinent information from unstructured text. The aim of this study was to outline current NER methods and trace their evolution from 2011 to 2022.
Methods: We conducted a methodological literature review of NER methods, focusing on the classification models, the types of tagging systems, and the languages employed in various corpora.
Results: Several methods have been documented for automatically extracting relevant information from EHRs using natural language processing techniques such as NER and relation extraction (RE). These methods can automatically extract concepts, events, attributes, and other data, as well as the relationships between them. Most NER studies conducted thus far have used corpora in English or Chinese. Additionally, Bidirectional Encoder Representations from Transformers (BERT) combined with the BIO tagging system is the most frequently reported classification architecture. We found only a limited number of papers implementing NER or RE tasks on EHRs within a specific clinical domain.
Conclusions: EHRs play a pivotal role in gathering clinical information and could serve as the primary source for automated clinical decision support systems. However, creating new corpora from EHRs in specific clinical domains is essential to facilitate the rapid development of NER and RE models applied to EHRs for use in clinical practice.
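To make the BIO tagging scheme concrete: B-X marks the first token of an entity of type X, I-X marks a continuation of that entity, and O marks tokens outside any entity. The sketch below converts entity spans to BIO tags and back. The clinical sentence and the `Drug` and `Dose` labels are hypothetical and are not drawn from any corpus in the review.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, entity label)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans to BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from BIO tags.

    An I-X tag that does not continue an open X entity is treated as the
    start of a new X entity.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((start, i, label))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


tokens = ["Patient", "took", "metformin", "500", "mg", "daily"]
spans = [(2, 3, "Drug"), (3, 5, "Dose")]
tags = spans_to_bio(len(tokens), spans)
print(list(zip(tokens, tags)))
assert bio_to_spans(tags) == spans  # spans survive the round trip
```

The B- prefix is what separates two adjacent entities (here `metformin` and `500 mg`); with only I- and O tags they would merge whenever they share a label.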

Directory of Open Access Journals

A Semantic Information Management Approach for Improving Bridge Maintenance based on Advanced Constraint Management

Author: Wu Chengke
Publication venue: Curtin University
Publication date: 01/01/2021
Field of study

Bridge rehabilitation projects are important for transportation infrastructure. This research proposes a novel information management approach based on state-of-the-art deep learning models and ontologies. The approach can automatically extract, integrate, complete, and search for project knowledge buried in unstructured text documents. On the one hand, the approach facilitates the implementation of modern management approaches, i.e., advanced work packaging, to deliver successful bridge rehabilitation projects; on the other hand, it improves information management practices in the construction industry.

espace@Curtin