Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.
Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-179.
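The abstract above rests on "an extensive annotation study" producing a gold-standard corpus. Such studies are typically validated with a chance-corrected inter-annotator agreement measure; a minimal sketch of Cohen's kappa follows (the example labels are illustrative, not taken from the paper's annotation scheme):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators.

    labels_a, labels_b: parallel lists of category labels assigned
    by annotators A and B to the same items.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    # according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    if p_e == 1.0:          # degenerate case: one label dominates entirely
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For argument-component annotation one would apply this per token or per segment; values above roughly 0.6-0.8 are conventionally read as substantial agreement.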
A Study on Extracting Safety Information from Unstructured Texts for Pharmacovigilance
Thesis (Ph.D.) -- Graduate School of Convergence Science and Technology, Seoul National University, Department of Applied Bioengineering, February 2023.
Pharmacovigilance is a scientific activity to detect, evaluate, and understand the occurrence of adverse drug events and other problems related to drug safety. However, concerns have been raised over the quality of drug safety information available for pharmacovigilance, and there is also a need to secure new data sources for acquiring such information. Meanwhile, the rise of pre-trained language models based on the transformer architecture has accelerated the application of natural language processing (NLP) techniques in diverse domains. In this context, I define two pharmacovigilance problems as NLP tasks and provide baseline models for them: 1) extracting comprehensive drug safety information from adverse drug event narratives reported through a spontaneous reporting system (SRS), and 2) extracting drug-food interaction information from abstracts of biomedical articles. I developed annotation guidelines and performed manual annotation, demonstrating that strong NLP models can be trained to extract clinical information from unstructured free texts by fine-tuning transformer-based language models on a high-quality annotated corpus. Finally, I discuss issues to consider when developing annotation guidelines for extracting clinical information related to pharmacovigilance. The annotated corpora and NLP models in this dissertation can streamline pharmacovigilance activities by enhancing the quality of reported drug safety information and expanding its data sources.
Chapter 1
1.1 Contributions of this dissertation
1.2 Overview of this dissertation
1.3 Other works
Chapter 2
2.1 Pharmacovigilance
2.2 Biomedical NLP for pharmacovigilance
2.2.1 Pre-trained language models
2.2.2 Corpora to extract clinical information for pharmacovigilance
Chapter 3
3.1 Motivation
3.2 Proposed Methods
3.2.1 Data source and text corpus
3.2.2 Annotation of ADE narratives
3.2.3 Quality control of annotation
3.2.4 Pretraining KAERS-BERT
3.2.6 Named entity recognition
3.2.7 Entity label classification and sentence extraction
3.2.8 Relation extraction
3.2.9 Model evaluation
3.2.10 Ablation experiment
3.3 Results
3.3.1 Annotated ICSRs
3.3.2 Corpus statistics
3.3.3 Performance of NLP models to extract drug safety information
3.3.4 Ablation experiment
3.4 Discussion
3.5 Conclusion
Chapter 4
4.1 Motivation
4.2 Proposed Methods
4.2.1 Data source
4.2.2 Annotation
4.2.3 Quality control of annotation
4.2.4 Baseline model development
4.3 Results
4.3.1 Corpus statistics
4.3.2 Annotation Quality
4.3.3 Performance of baseline models
4.3.4 Qualitative error analysis
4.4 Discussion
4.5 Conclusion
Chapter 5
5.1 Issues around defining a word entity
5.2 Issues around defining a relation between word entities
5.3 Issues around defining entity labels
5.4 Issues around selecting and preprocessing annotated documents
Chapter 6
6.1 Dissertation summary
6.2 Limitation and future works
6.2.1 Development of end-to-end information extraction models from free-texts to database based on existing structured information
6.2.2 Application of in-context learning framework in clinical information extraction
Chapter 7
7.1 Annotation Guideline for "Extraction of Comprehensive Drug Safety Information from Adverse Event Narratives Reported through Spontaneous Reporting System"
7.2 Annotation Guideline for "Extraction of Drug-Food Interactions from the Abstracts of Biomedical Articles"
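The dissertation frames drug-safety extraction as named entity recognition and relation extraction over free text. A standard post-processing step in such pipelines is decoding per-token BIO tags into entity spans; a minimal sketch follows (the `Drug`/`ADE` labels are illustrative stand-ins, not the dissertation's actual annotation schema):

```python
def bio_to_spans(tokens, tags):
    """Convert per-token BIO tags into (label, start, end) entity spans.

    tokens: list of token strings (used for length checking).
    tags:   parallel list of tags such as "B-Drug", "I-Drug", "O".
    Returns a list of (label, start_index, end_index_exclusive) tuples.
    """
    assert len(tokens) == len(tags)
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:            # close the previous entity
                spans.append((label, start, i))
            start, label = i, tag[2:]        # open a new entity
        elif tag.startswith("I-") and start is not None and label == tag[2:]:
            continue                         # current entity continues
        else:
            # "O", or a stray/inconsistent I- tag, closes any open entity.
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:                    # entity running to the end
        spans.append((label, start, len(tags)))
    return spans
```

A fine-tuned token-classification model emits the `tags` sequence; this decoding yields the structured fields (drug mentions, adverse events, and so on) that feed downstream relation extraction.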
Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Resources MLR2004
In an ever-expanding information society, most information systems now face the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, wordnets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 1990s, most efforts to scale up these resources have remained the responsibility of local authorities, usually with very low funding (if any) and few opportunities for academic recognition of this work. It is therefore not surprising that many resource holders and developers have become reluctant to give free access to the latest versions of their resources, so their actual status is currently rather unclear. The goal of this workshop is to study the problems involved in the development, management, and reuse of lexical resources in a multilingual context. The workshop also provides a forum for reviewing the present state of language resources, bringing the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop, and to other workshops and conferences dedicated to similar topics, shows that multilingual linguistic resources have become a central concern in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on multilingual language resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. They also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and give them the opportunity to discuss, exchange, and compare their approaches and to strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the programme committee, who managed to provide accurate reviews on time on a rather tight schedule. We would also like to thank the COLING 2004 organising committee, which made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants.
Stabilizing knowledge through standards - A perspective for the humanities
Standards are usually considered to generate mixed feelings among scientists. They are often seen as failing to reflect the state of the art in a given domain and as a hindrance to scientific creativity. Still, scientists should in theory be best placed to bring their expertise into standards development, being more neutral on issues that typically relate to competing industrial interests. Even if developing standards in the humanities might be thought more complex still, we show how it can be made feasible, drawing on experience gained both within the Text Encoding Initiative consortium and the International Organisation for Standardisation. Taking the specific case of lexical resources, we try to show how this work brings about new ideas for designing future research infrastructures in the human and social sciences.