Search CORE

6 research outputs found

Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations

Author: Kang Jaewoo
Kim Hyunjae
Yoo Jaehyo
Yoon Seunghyun
Publication date: 01/06/2023

Most weakly supervised named entity recognition (NER) models rely on domain-specific dictionaries provided by experts. This approach is infeasible in many domains where dictionaries do not exist. While a phrase retrieval model was used to construct pseudo-dictionaries with entities retrieved from Wikipedia automatically in a recent study, these dictionaries often have limited coverage because the retriever is likely to retrieve popular entities rather than rare ones. In this study, we present a novel framework, HighGEN, that generates NER datasets with high-coverage pseudo-dictionaries. Specifically, we create entity-rich dictionaries with a novel search method, called phrase embedding search, which encourages the retriever to search a space densely populated with various entities. In addition, we use a new verification process based on the embedding distance between candidate entity mentions and entity types to reduce the false-positive noise in weak labels generated by high-coverage dictionaries. We demonstrate that HighGEN outperforms the previous best model by an average F1 score of 4.7 across five NER benchmark datasets.

Comment: ACL 202

arXiv.org e-Print Archive

Simple Questions Generate Named Entity Recognition Datasets

Author: Kang Jaewoo
Kim Hyunjae
Lee Jinhyuk
Yoo Jaehyo
Yoon Seunghyun
Publication date: 24/05/2022

Recent named entity recognition (NER) models often rely on human-annotated datasets requiring the vast engagement of professional knowledge on the target domain and entities. This work introduces an ask-to-generate approach, which automatically generates NER datasets by asking simple natural language questions to an open-domain question answering system (e.g., "Which disease?"). Despite using fewer training resources, our models solely trained on the generated datasets largely outperform strong low-resource models by 20.8 F1 score on average across six popular NER benchmarks. Our models also show competitive performance with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 score on three benchmarks and achieve new state-of-the-art performance.

Comment: Code available at https://github.com/dmis-lab/GeNE

arXiv.org e-Print Archive

Synthesis of Carbon-Coated TiO2

Author: Hyunjae Park
Kangil Kim
Seungryul Yoo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Risk management algorithm for rear-side collision avoidance using a combined steering torque overlay and differential braking

Author: Bongchul Ko
Gahinet P
Hyokjin Chong
Hyunjae Yoo
Junyung Lee
Kwon WH
Kyongsu Yi
Rajamani R.
Publication venue: 'Informa UK Limited'

Crossref

Ion-to-ion amplification through an open-junction ionic diode

Author: Asplund
Chun
Fattahi
Gabrielsson
Hae-Ryung Lee
Hammock
Han
Han
Hyunjae Yoo
Jeong-Yun Sun
Jiang
Kim
Lee
Lin
Liu
Min-Ah Oh
Ono
Pang
Peppas
Seok Hee Han
Seung-Min Lim
Simon
Sjostrom
Taek Dong Chung
Takahashi
Tybrandt
Tybrandt
Young-Chang Joo
Zhang
Publication venue: 'Proceedings of the National Academy of Sciences'

Crossref