Extrinsic Factors Affecting the Accuracy of Biomedical NER
Biomedical named entity recognition (NER) is a critical task that aims to
identify structured information in clinical text, which is often replete with
complex, technical terms and a high degree of variability. Accurate and
reliable NER can facilitate the extraction and analysis of important biomedical
information for downstream applications, including those in healthcare.
However, NER in the biomedical domain is challenging because data availability
is limited: annotating such data requires considerable expertise, time, and
expense. In this paper, working with this limited data, we explore various
extrinsic factors, including the corpus annotation scheme, data augmentation
techniques, semi-supervised learning, and Brill transformation, to improve the
performance of a NER model on a clinical text dataset (i2b2 2012,
\citet{sun-rumshisky-uzuner:2013}). Our experiments demonstrate that these
approaches can significantly improve the model's F1 score from 73.74 to 77.55.
Our findings suggest that considering different extrinsic factors and
combining these techniques is a promising approach for improving NER
performance in the biomedical domain, where data size is limited.
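The abstract lists data augmentation among the extrinsic factors but does not describe a concrete procedure. The following minimal Python sketch illustrates one common choice for NER, mention-replacement augmentation over BIO-tagged sentences; the function names and the replacement rate are hypothetical, and this is not the paper's implementation.

```python
# A minimal sketch (not the paper's implementation) of mention-replacement
# data augmentation for NER: entity mentions in BIO-tagged training sentences
# are swapped with other mentions of the same type to create new examples.
import random

def collect_mentions(sentences):
    """Gather surface forms per entity type from (tokens, tags) pairs."""
    mentions = {}
    for tokens, tags in sentences:
        i = 0
        while i < len(tags):
            if tags[i].startswith("B-"):
                etype, j = tags[i][2:], i + 1
                while j < len(tags) and tags[j] == f"I-{etype}":
                    j += 1
                mentions.setdefault(etype, []).append(tokens[i:j])
                i = j
            else:
                i += 1
    return mentions

def augment(sentences, mentions, rate=0.3, seed=0):
    """Return new sentences with some mentions replaced by same-type ones."""
    rng = random.Random(seed)
    augmented = []
    for tokens, tags in sentences:
        new_tokens, new_tags, i = [], [], 0
        while i < len(tags):
            if tags[i].startswith("B-") and rng.random() < rate:
                etype, j = tags[i][2:], i + 1
                while j < len(tags) and tags[j] == f"I-{etype}":
                    j += 1
                repl = rng.choice(mentions[etype])
                new_tokens.extend(repl)
                new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(repl) - 1))
                i = j
            else:
                new_tokens.append(tokens[i])
                new_tags.append(tags[i])
                i += 1
        augmented.append((new_tokens, new_tags))
    return augmented
```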
Recommending the Meanings of Newly Coined Words
In this paper, we investigate how to recommend the meanings of newly coined words, such as newly coined named entities and Internet jargon. Our approach automatically chooses a document explaining a given newly coined word among candidate documents from multiple web references using Probabilistic Latent Semantic Analysis [1]. Briefly, it involves finding the topic of a document containing the newly coined word and computing the conditional probability of that topic given each candidate document. We validate our methodology with two real datasets, from MySpace forums and Twitter, by referencing three web services, Google, Urbandictionary, and Wikipedia, and we show that we correctly recommend the meanings of a set of given newly coined words with 69.5% and 80.5% accuracy, respectively, based on our three recommendations. Moreover, we compare our approach against three baselines, each of which references the result from a single web service, and our approach outperforms them.
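As a rough illustration of the ranking step described above, the sketch below assumes PLSA has already produced topic posteriors P(topic | document) and ranks candidate explanation documents by the probability of the context document's dominant topic. The function names and toy numbers are hypothetical, not the paper's code.

```python
# A minimal sketch of the ranking step: pick the dominant topic of a context
# document that uses the newly coined word, then rank candidate explanation
# documents by the conditional probability of that topic.
def dominant_topic(p_topic_given_doc):
    """Index of the most probable topic for one document."""
    return max(range(len(p_topic_given_doc)), key=lambda z: p_topic_given_doc[z])

def recommend(context_posterior, candidate_posteriors, top_k=3):
    """Rank candidates by P(z* | candidate) for the context's dominant topic z*."""
    z_star = dominant_topic(context_posterior)
    ranked = sorted(
        range(len(candidate_posteriors)),
        key=lambda d: candidate_posteriors[d][z_star],
        reverse=True,
    )
    return ranked[:top_k]

# Toy usage: three topics, one context posterior, three candidate documents.
context = [0.1, 0.7, 0.2]
candidates = [[0.6, 0.1, 0.3], [0.2, 0.5, 0.3], [0.1, 0.8, 0.1]]
print(recommend(context, candidates))  # -> [2, 1, 0]
```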
Yet Another Format of Universal Dependencies for Korean
In this study, we propose a morpheme-based scheme for Korean dependency
parsing and apply the proposed scheme to Universal Dependencies. We present the
linguistic rationale that illustrates the motivation and the necessity of
adopting the morpheme-based format, and develop scripts that automatically
convert between the original format used by Universal Dependencies and the
proposed morpheme-based format. The effectiveness of the proposed format
for Korean dependency parsing is then verified by both statistical and neural
models, including UDPipe and Stanza, with our carefully constructed
morpheme-based word embeddings for Korean. morphUD outperforms parsing results
for all Korean UD treebanks, and we also present detailed error analyses. Comment: COLING 2022, Poster
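To make the conversion idea concrete, here is a heavily simplified, hypothetical sketch of splitting one eojeol-level CoNLL-U token into morpheme-level tokens. It assumes the LEMMA column holds "+"-joined morphemes and XPOS holds "+"-joined tags, and it uses a placeholder relation for non-initial morphemes; the authors' actual conversion scripts and relation labels may differ.

```python
# An illustrative sketch (not the authors' scripts) of the kind of conversion
# the abstract describes: one eojeol-level token becomes several
# morpheme-level rows. Re-indexing of IDs across the sentence is omitted.
def split_eojeol(token_id, form, lemma, xpos, head, deprel):
    """Yield morpheme-level (form, xpos, head, deprel) rows for one eojeol."""
    morphs = lemma.split("+")
    tags = xpos.split("+")
    rows = []
    for i, (m, t) in enumerate(zip(morphs, tags)):
        if i == 0:
            # The first morpheme keeps the original head and relation.
            rows.append((m, t, head, deprel))
        else:
            # Later morphemes attach to the eojeol's first morpheme; "morph"
            # is a placeholder label, not the scheme's actual relation.
            rows.append((m, t, token_id, "morph"))
    return rows

print(split_eojeol(3, "프랑스의", "프랑스+의", "ncn+jcm", 5, "nmod"))
```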
Neural Automated Writing Evaluation with Corrective Feedback
The utilization of technology in second language learning and teaching has
become ubiquitous. For the assessment of writing specifically, automated
writing evaluation (AWE) and grammatical error correction (GEC) have become
immensely popular and effective methods for enhancing writing proficiency and
delivering instant, individualized feedback to learners. By leveraging the
power of natural language processing (NLP) and machine learning algorithms, AWE
and GEC systems have been developed separately to provide language learners
with automated corrective feedback and scoring that is more accurate and less
biased than assessment that would otherwise depend on individual examiners. In
this paper, we propose an integrated system for automated writing evaluation
with corrective feedback as a means of bridging the gap between AWE and GEC
results for second language learners. This system enables language learners to
simulate essay writing tests: a student writes and submits an essay, and the
system returns an assessment of the writing along with suggested grammatical
error corrections. Given that automated scoring and grammatical correction are
more efficient and cost-effective than human grading, this integrated system
would also alleviate the burden of manually correcting large numbers of essays. Comment: Supported by the SoTL Seed Program at UB
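Below is a minimal sketch of the integrated workflow, under the assumption that an AWE scorer and a GEC corrector are available as separate components; `score_essay` and `correct_grammar` are hypothetical stand-ins, not the system's actual API.

```python
# A minimal sketch of the integrated workflow described above: a submitted
# essay is scored by an AWE component and corrected by a GEC component, and
# both results are returned together as one piece of feedback.
from dataclasses import dataclass

@dataclass
class Feedback:
    score: float            # holistic essay score from the AWE component
    corrected_text: str     # essay after grammatical error correction
    edits: list             # individual (original, correction) pairs

def evaluate_submission(essay: str, score_essay, correct_grammar) -> Feedback:
    """Run AWE and GEC on one essay and package the combined feedback."""
    score = score_essay(essay)
    corrected, edits = correct_grammar(essay)
    return Feedback(score=score, corrected_text=corrected, edits=edits)

# Toy usage with dummy components standing in for real models.
demo = evaluate_submission(
    "She go to school yesterday.",
    score_essay=lambda text: 3.5,
    correct_grammar=lambda text: ("She went to school yesterday.",
                                  [("go", "went")]),
)
print(demo.score, demo.corrected_text)
```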
Trance parser model for Korean: Sejong treebank
Trance parsing model and embedding vector. See https://github.com/tarowatanabe/trance for the parser and its usage. We also provide the parsing and learning scripts for the Trance parser that we used for the paper:
1/ parsing model: ptb_train.txt.model-d100.tar.gz
2/ embedding vector: embedding-d100.vec.gz
3/ trance parser parsing script: trance-parsing.sh
4/ trance parser (batch) learning script: trance-training-batch.sh
5/ test.txt (gold file) and test.txt.leaf serve as the parser input.
Jungyeul Park, A Note on Constituent Parsing for Korean Using the Sejong Treebank (submitted to TALLIP). October 2017. See https://github.com/jungyeul/tallip-sjtree-parsing for more detail.
Universal Dependencies for Korean: Hani (ver1.0)