Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie
Accurate neural models are much less efficient than non-neural models and are
useless for processing billions of social media posts or handling user queries
in real time with a limited budget. This study revisits the fastest
pattern-based NLP methods to make them as accurate as possible, thus yielding a
strikingly simple yet surprisingly accurate morphological analyzer for
Japanese. The proposed method induces reliable patterns from a morphological
dictionary and annotated data. Experimental results on two standard datasets
confirm that the method exhibits comparable accuracy to learning-based
baselines, while boasting a remarkable throughput of over 1,000,000 sentences
per second on a single modern CPU. The source code is available at
https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jagger/
Comment: 9 pages, 1 figure, 10 tables. Accepted by ACL 2023 (main conference).
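The abstract describes inducing patterns and matching them at high speed; a minimal sketch of that idea is a longest-match trie lookup over the input, where each matched pattern says how many characters to consume and which tag to emit. The sketch below is illustrative only: the paper's feature-sequence trie also conditions on features of the preceding token, and all example patterns and tags here are made up.

```python
# Toy pattern-based morphological analyzer: longest-match lookup in a
# character trie, emitting (surface, POS) tokens. Patterns are assumed,
# not taken from the paper's induced pattern set.

class TrieNode:
    __slots__ = ("children", "action")
    def __init__(self):
        self.children = {}
        self.action = None  # (chars_to_consume, pos_tag) if a pattern ends here

class PatternAnalyzer:
    def __init__(self):
        self.root = TrieNode()

    def add_pattern(self, surface, consume, pos):
        """Register a pattern: on matching `surface`, emit a token of
        `consume` characters tagged `pos`."""
        node = self.root
        for ch in surface:
            node = node.children.setdefault(ch, TrieNode())
        node.action = (consume, pos)

    def analyze(self, text):
        tokens, i = [], 0
        while i < len(text):
            node, action, j = self.root, None, i
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.action is not None:
                    action = node.action  # keep the longest match seen so far
            if action is None:
                tokens.append((text[i], "UNK"))  # fall back to one character
                i += 1
            else:
                consume, pos = action
                tokens.append((text[i:i + consume], pos))
                i += consume
        return tokens

analyzer = PatternAnalyzer()
analyzer.add_pattern("東京", 2, "NOUN")    # illustrative entries
analyzer.add_pattern("に", 1, "PARTICLE")
analyzer.add_pattern("行く", 2, "VERB")
print(analyzer.analyze("東京に行く"))
# [('東京', 'NOUN'), ('に', 'PARTICLE'), ('行く', 'VERB')]
```

Because each step is a handful of dictionary lookups with no neural inference, this style of analysis is what makes million-sentences-per-second throughput plausible.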
Summarization-based Data Augmentation for Document Classification
Despite the prevalence of pretrained language models in natural language
understanding tasks, understanding lengthy text such as documents is still
challenging due to data sparseness. Inspired by the fact that humans develop
their ability to understand lengthy text by reading shorter text, we
propose a simple yet effective summarization-based data augmentation, SUMMaug,
for document classification. We first obtain easy-to-learn examples for the
target document classification task by summarizing the input of the original
training examples, while optionally merging the original labels to conform to
the summarized input. We then use the generated pseudo examples to perform
curriculum learning. Experimental results on two datasets confirmed the
advantage of our method compared to existing baseline methods in terms of
robustness and accuracy. We release our code and data at
https://github.com/etsurin/summaug.
Comment: The 4th New Frontiers in Summarization (with LLMs) Workshop.
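The core recipe is easy to sketch: summarize each training document to create a shorter pseudo example with the same label, then order training so the easier summaries come first. The sketch below assumes a Hugging Face summarization pipeline; the model choice, length limits, and curriculum schedule are illustrative, not the paper's exact configuration.

```python
# Sketch of summarization-based data augmentation for document
# classification with a simple curriculum ordering.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def build_curriculum(train_set):
    """train_set: list of (document, label) pairs. Returns pseudo
    (summary) examples followed by the originals, so training sees the
    shorter, easier inputs first (curriculum learning)."""
    pseudo = []
    for doc, label in train_set:
        summary = summarizer(doc[:3000], max_length=128, min_length=16,
                             truncation=True)[0]["summary_text"]
        # Label is kept as-is here; the paper optionally merges labels
        # to conform to the coarser summarized input.
        pseudo.append((summary, label))
    return pseudo + train_set
```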
Self-Adaptive Named Entity Recognition by Retrieving Unstructured Knowledge
Although named entity recognition (NER) helps us to extract domain-specific
entities from text (e.g., artists in the music domain), it is costly to create
a large amount of training data or a structured knowledge base to perform
accurate NER in the target domain. Here, we propose self-adaptive NER, which
retrieves external knowledge from unstructured text to learn the usages of
entities that have not been learned well. To retrieve useful knowledge for NER,
we design an effective two-stage model that retrieves unstructured knowledge
using uncertain entities as queries. Our model predicts the entities in the
input and then finds those for which the prediction is not confident. It then
retrieves knowledge by using these uncertain entities as queries and
concatenates the retrieved text to the original input to revise the prediction.
Experiments on CrossNER datasets demonstrated that our model outperforms strong
baselines by 2.35 points in F1 score.
Comment: EACL 2023 (long paper).
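The two-stage loop the abstract describes (predict, find low-confidence entities, retrieve, revise) can be sketched as below. Here `ner_model` and `search` are stand-ins, not the paper's components: `ner_model` is assumed to return (span, label, confidence) triples, `search` to return passages for a query, and the 0.7 threshold is illustrative.

```python
# Sketch of self-adaptive NER: retrieve unstructured knowledge for
# uncertain entities and re-predict on the knowledge-augmented input.

CONF_THRESHOLD = 0.7  # illustrative uncertainty cutoff

def self_adaptive_ner(text, ner_model, search, k=3):
    # Stage 1: initial prediction and uncertainty detection.
    preds = ner_model(text)  # [(entity_span, label, confidence), ...]
    uncertain = [span for span, _, conf in preds if conf < CONF_THRESHOLD]
    if not uncertain:
        return preds
    # Stage 2: retrieve knowledge using uncertain entities as queries,
    # concatenate it to the input, and revise the prediction.
    passages = []
    for span in uncertain:
        passages.extend(search(span)[:k])
    augmented = text + " [SEP] " + " ".join(passages)
    return ner_model(augmented)
```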
PerPLM: Personalized Fine-tuning of Pretrained Language Models via Writer-specific Intermediate Learning and Prompts
The meanings of words and phrases depend not only on where they are used
(contexts) but also on who uses them (writers). Pretrained language models
(PLMs) are powerful tools for capturing context, but they are typically
pretrained and fine-tuned for universal use across different writers. This
study aims to improve the accuracy of text understanding tasks by personalizing
the fine-tuning of PLMs for specific writers. We focus on a general setting
where only plain text from the target writers is available for
personalization. To avoid the cost of fine-tuning and storing multiple copies
of PLMs for different users, we exhaustively explore using writer-specific
prompts to personalize a unified PLM. Since the design and evaluation of such
prompts remain underdeveloped, we introduce and compare different types of
prompts that are possible in our setting. To maximize the potential of
prompt-based personalized fine-tuning, we propose a personalized intermediate
learning based on masked language modeling to extract task-independent traits
of writers' text. Our experiments, using multiple tasks, datasets, and PLMs,
reveal the nature of different prompts and the effectiveness of our
intermediate learning approach.
Comment: 11 pages.
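A minimal sketch of the setting: one shared PLM serves all writers, only a short writer-specific prompt differs, and an intermediate masked-language-modeling step on the writer's plain text lets the prompt absorb writer traits before task fine-tuning. The prompt wording, model name, and masking below are assumptions for illustration, not the paper's exact recipe.

```python
# Sketch of writer-specific prompting with a single shared PLM, plus an
# intermediate MLM step on the target writer's plain text.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def with_writer_prompt(writer_id, text):
    """Prepend a discrete writer-specific prompt. Learned soft prompts
    are an alternative in the same setting."""
    return f"[Writer {writer_id}] {text}"

# Intermediate learning: MLM on the writer's own text, prompt included.
inputs = tokenizer(with_writer_prompt(42, "some text the target writer wrote"),
                   return_tensors="pt")
labels = inputs["input_ids"].clone()
mask_pos = 5  # toy: mask one fixed position
inputs["input_ids"][0, mask_pos] = tokenizer.mask_token_id
masked_labels = torch.full_like(labels, -100)       # -100 = ignored in loss
masked_labels[0, mask_pos] = labels[0, mask_pos]    # predict only the mask
loss = model(**inputs, labels=masked_labels).loss
loss.backward()  # in practice, update only the prompt parameters
```

Keeping the PLM weights shared and updating only the prompt is what avoids fine-tuning and storing a separate model copy per writer.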
An Adolescent Patient with Scabies Mimicking Gottron Papules
Atypical features of scabies occur in infants, children, and patients with prolonged corticosteroid use or immunosuppression. We report the case of a non-immunosuppressed 15-year-old female with scabies showing scaly reddish papules over the proximal interphalangeal joints, mimicking Gottron papules in classic dermatomyositis. Periungual erythema was also seen. Topical corticosteroids had been used for four months at previous clinics. Dermoscopic findings were consistent with the typical picture of scabies. Scraping of hand crusts demonstrated scabies mites and ova. The patient's skin lesions were cured with oral ivermectin and topical 10% crotamiton. This case suggests that a lesion resembling Gottron papules may be added to the panel of unusual presentations of scabies.