20 research outputs found

    Korean Language Resources for Everyone

    Get PDF

    Extrinsic Factors Affecting the Accuracy of Biomedical NER

    Full text link
    Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text, which is often replete with complex technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can be used to improve downstream applications, including healthcare systems. However, NER in the biomedical domain is challenging due to limited data availability, as annotating such data requires a high level of expertise, time, and expense. In this paper, using this limited data, we explore various extrinsic factors, including the corpus annotation scheme, data augmentation techniques, semi-supervised learning, and Brill transformation, to improve the performance of an NER model on a clinical text dataset (i2b2 2012; Sun et al., 2013). Our experiments demonstrate that these approaches can significantly improve the model's F1 score from the original 73.74 to 77.55. Our findings suggest that considering different extrinsic factors and combining these techniques is a promising approach to improving NER performance in the biomedical domain, where the amount of data is limited.
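One of the extrinsic factors mentioned above is the corpus annotation scheme. A common scheme change in NER is converting BIO tags to the finer-grained BIOES scheme; the sketch below illustrates that conversion (an illustrative example only, not the paper's actual preprocessing code):

```python
def bio_to_bioes(tags):
    """Convert a BIO-tagged sequence to BIOES.

    B- becomes S- (singleton) when the entity does not continue;
    I- becomes E- (end) when the entity does not continue.
    """
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            out.append(tag)
        elif tag.startswith("B-"):
            # entity continues only if the next tag is I- of the same type
            out.append(tag if nxt == "I-" + tag[2:] else "S-" + tag[2:])
        elif tag.startswith("I-"):
            out.append(tag if nxt == "I-" + tag[2:] else "E-" + tag[2:])
    return out
```

For example, `["B-PER", "I-PER", "O", "B-LOC"]` maps to `["B-PER", "E-PER", "O", "S-LOC"]`.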

    Recommending the Meanings of Newly Coined Words

    Get PDF
    In this paper, we investigate how to recommend the meanings of newly coined words, such as newly coined named entities and Internet jargon. Our approach automatically chooses a document explaining a given newly coined word among candidate documents from multiple web references using Probabilistic Latent Semantic Analysis [1]. Briefly, it involves finding the topic of a document containing the newly coined word and computing the conditional probability of that topic given each candidate document. We validate our methodology on two real datasets from MySpace forums and Twitter by referencing three web services, Google, Urbandictionary, and Wikipedia, and we show that we properly recommend the meanings of a set of given newly coined words with 69.5% and 80.5% accuracy, respectively, based on our three recommendations. Moreover, we compare our approach against three baselines, each referencing the result from one web service, and our approach outperforms all of them.
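The selection step described above reduces to choosing the candidate document that maximizes the conditional probability of the target topic. A minimal sketch, with toy topic distributions in place of actual PLSA output (the document ids and numbers here are assumptions for illustration):

```python
def recommend(topic, candidates):
    """Return the candidate document id whose topic distribution assigns
    the highest conditional probability P(topic | document).

    `candidates` maps a document id to a dict {topic: probability}.
    """
    return max(candidates, key=lambda d: candidates[d].get(topic, 0.0))

# Toy example: two candidate documents explaining a newly coined word.
candidates = {
    "urbandictionary-entry": {"slang": 0.7, "music": 0.1},
    "wikipedia-article": {"slang": 0.2, "music": 0.6},
}
```

With the target topic `"slang"`, `recommend("slang", candidates)` returns `"urbandictionary-entry"`, since that document gives the topic the higher conditional probability.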

    Yet Another Format of Universal Dependencies for Korean

    Full text link
    In this study, we propose a morpheme-based scheme for Korean dependency parsing and apply the proposed scheme to Universal Dependencies. We present the linguistic rationale that illustrates the motivation for and the necessity of adopting the morpheme-based format, and we develop scripts that automatically convert between the original format used by Universal Dependencies and the proposed morpheme-based format. The effectiveness of the proposed format for Korean dependency parsing is then verified with both statistical and neural models, including UDPipe and Stanza, using our carefully constructed morpheme-based word embeddings for Korean. morphUD outperforms parsing results for all Korean UD treebanks, and we also present detailed error analyses. Comment: COLING 2022, Poster.
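The core of a morpheme-based format is splitting each Korean word (eojeol) into morpheme-level tokens. A toy sketch of that split, assuming a Sejong-style '+'-separated morpheme/POS analysis string (this is not the paper's actual conversion script):

```python
def split_eojeol(analysis):
    """Split one eojeol's morpheme analysis into (morpheme, POS) tokens.

    `analysis` is a string such as '나/NP+는/JX', where morphemes are
    joined by '+' and each carries a '/'-separated POS tag.
    """
    tokens = []
    for part in analysis.split("+"):
        morph, _, pos = part.partition("/")
        tokens.append((morph, pos))
    return tokens
```

Each resulting (morpheme, POS) pair would become its own token row in a morpheme-based treebank, in place of the single word-level row.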

    Trance parser model for Korean: Sejong treebank

    No full text
    Trance parsing model + embedding vector.

    See https://github.com/tarowatanabe/trance for the parser and its usage. We also provide the parsing and learning scripts for the Trance parser that we used for the paper:

    1. parsing model: ptb_train.txt.model-d100.tar.gz
    2. embedding vector: embedding-d100.vec.gz
    3. Trance parser parsing script: trance-parsing.sh
    4. Trance parser (batch) learning script: trance-training-batch.sh
    5. test.txt (gold file) and test.txt.leaf are for the parser input.

    Jungyeul Park, A Note on Constituent Parsing for Korean Using the Sejong Treebank (submitted to TALLIP). October 2017.

    See https://github.com/jungyeul/tallip-sjtree-parsing for more detail.

    Universal Dependencies for Korean: Hani (ver1.0)

    No full text

    MaltParser model for Korean: Sejong treebank

    No full text
    Jungyeul Park, Jeen-Pyo Hong, and Jeong-Won Cha (2016). Korean Language Resources for Everyone. In Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation (PACLIC 30). Seoul, Korea. [pdf]

    @inproceedings{park-hong-cha:2016:PACLIC,
      address = {Seoul, Korea},
      author = {Park, Jungyeul and Hong, Jeen-Pyo and Cha, Jeong-Won},
      booktitle = {Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation (PACLIC 30)},
      pages = {49--58},
      title = {{Korean Language Resources for Everyone}},
      year = {2016}
    }

    It requires Espresso's POS tagging results for input. Espresso is available at https://zenodo.org/record/884606