Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie
Accurate neural models are much less efficient than non-neural models and are
useless for processing billions of social media posts or handling user queries
in real time with a limited budget. This study revisits the fastest
pattern-based NLP methods to make them as accurate as possible, thus yielding a
strikingly simple yet surprisingly accurate morphological analyzer for
Japanese. The proposed method induces reliable patterns from a morphological
dictionary and annotated data. Experimental results on two standard datasets
confirm that the method exhibits comparable accuracy to learning-based
baselines, while boasting a remarkable throughput of over 1,000,000 sentences
per second on a single modern CPU. The source code is available at
https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jagger/
Comment: 9 pages, 1 figure, 10 tables. Accepted by ACL 2023 (main conference).
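The abstract describes inducing patterns and matching them at high speed; a minimal sketch of that idea is a longest-match trie lookup over the input, where each matched pattern says how many characters to consume and which tag to emit. The sketch below is illustrative only: the paper's feature-sequence trie also conditions on features of the preceding token, and all example patterns and tags here are made up.

```python
# Toy pattern-based morphological analyzer: longest-match lookup in a
# character trie, emitting (surface, POS) tokens. Patterns are assumed,
# not taken from the paper's induced pattern set.

class TrieNode:
    __slots__ = ("children", "action")
    def __init__(self):
        self.children = {}
        self.action = None  # (chars_to_consume, pos_tag) if a pattern ends here

class PatternAnalyzer:
    def __init__(self):
        self.root = TrieNode()

    def add_pattern(self, surface, consume, pos):
        """Register a pattern: on matching `surface`, emit a token of
        `consume` characters tagged `pos`."""
        node = self.root
        for ch in surface:
            node = node.children.setdefault(ch, TrieNode())
        node.action = (consume, pos)

    def analyze(self, text):
        tokens, i = [], 0
        while i < len(text):
            node, action, j = self.root, None, i
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.action is not None:
                    action = node.action  # keep the longest match seen so far
            if action is None:
                tokens.append((text[i], "UNK"))  # fall back to one character
                i += 1
            else:
                consume, pos = action
                tokens.append((text[i:i + consume], pos))
                i += consume
        return tokens

analyzer = PatternAnalyzer()
analyzer.add_pattern("東京", 2, "NOUN")    # illustrative entries
analyzer.add_pattern("に", 1, "PARTICLE")
analyzer.add_pattern("行く", 2, "VERB")
print(analyzer.analyze("東京に行く"))
# [('東京', 'NOUN'), ('に', 'PARTICLE'), ('行く', 'VERB')]
```

Because each step is a handful of dictionary lookups with no neural inference, this style of analysis is what makes million-sentences-per-second throughput plausible.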
Summarization-based Data Augmentation for Document Classification
Despite the prevalence of pretrained language models in natural language
understanding tasks, understanding lengthy text such as documents is still
challenging due to data sparseness. Inspired by the fact that humans develop
their ability to understand lengthy text by reading shorter text, we
propose a simple yet effective summarization-based data augmentation, SUMMaug,
for document classification. We first obtain easy-to-learn examples for the
target document classification task by summarizing the input of the original
training examples, while optionally merging the original labels to conform to
the summarized input. We then use the generated pseudo examples to perform
curriculum learning. Experimental results on two datasets confirmed the
advantage of our method compared to existing baseline methods in terms of
robustness and accuracy. We release our code and data at
https://github.com/etsurin/summaug.
Comment: The 4th New Frontiers in Summarization (with LLMs) Workshop.
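The core recipe is easy to sketch: summarize each training document to create a shorter pseudo example with the same label, then order training so the easier summaries come first. The sketch below assumes a Hugging Face summarization pipeline; the model choice, length limits, and curriculum schedule are illustrative, not the paper's exact configuration.

```python
# Sketch of summarization-based data augmentation for document
# classification with a simple curriculum ordering.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def build_curriculum(train_set):
    """train_set: list of (document, label) pairs. Returns pseudo
    (summary) examples followed by the originals, so training sees the
    shorter, easier inputs first (curriculum learning)."""
    pseudo = []
    for doc, label in train_set:
        summary = summarizer(doc[:3000], max_length=128, min_length=16,
                             truncation=True)[0]["summary_text"]
        # Label is kept as-is here; the paper optionally merges labels
        # to conform to the coarser summarized input.
        pseudo.append((summary, label))
    return pseudo + train_set
```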
Self-Adaptive Named Entity Recognition by Retrieving Unstructured Knowledge
Although named entity recognition (NER) helps us to extract domain-specific
entities from text (e.g., artists in the music domain), it is costly to create
a large amount of training data or a structured knowledge base to perform
accurate NER in the target domain. Here, we propose self-adaptive NER, which
retrieves external knowledge from unstructured text to learn the usages of
entities that have not been learned well. To retrieve useful knowledge for NER,
we design an effective two-stage model that retrieves unstructured knowledge
using uncertain entities as queries. Our model predicts the entities in the
input and then finds those for which the prediction is not confident. It then
retrieves knowledge by using these uncertain entities as queries and
concatenates the retrieved text to the original input to revise the prediction.
Experiments on CrossNER datasets demonstrated that our model outperforms strong
baselines by 2.35 points in F1 score.
Comment: EACL 2023 (long paper).
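The two-stage loop the abstract describes (predict, find low-confidence entities, retrieve, revise) can be sketched as below. Here `ner_model` and `search` are stand-ins, not the paper's components: `ner_model` is assumed to return (span, label, confidence) triples, `search` to return passages for a query, and the 0.7 threshold is illustrative.

```python
# Sketch of self-adaptive NER: retrieve unstructured knowledge for
# uncertain entities and re-predict on the knowledge-augmented input.

CONF_THRESHOLD = 0.7  # illustrative uncertainty cutoff

def self_adaptive_ner(text, ner_model, search, k=3):
    # Stage 1: initial prediction and uncertainty detection.
    preds = ner_model(text)  # [(entity_span, label, confidence), ...]
    uncertain = [span for span, _, conf in preds if conf < CONF_THRESHOLD]
    if not uncertain:
        return preds
    # Stage 2: retrieve knowledge using uncertain entities as queries,
    # concatenate it to the input, and revise the prediction.
    passages = []
    for span in uncertain:
        passages.extend(search(span)[:k])
    augmented = text + " [SEP] " + " ".join(passages)
    return ner_model(augmented)
```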
PerPLM: Personalized Fine-tuning of Pretrained Language Models via Writer-specific Intermediate Learning and Prompts
The meanings of words and phrases depend not only on where they are used
(contexts) but also on who uses them (writers). Pretrained language models
(PLMs) are powerful tools for capturing context, but they are typically
pretrained and fine-tuned for universal use across different writers. This
study aims to improve the accuracy of text understanding tasks by personalizing
the fine-tuning of PLMs for specific writers. We focus on a general setting
where only plain text from the target writers is available for
personalization. To avoid the cost of fine-tuning and storing multiple copies
of PLMs for different users, we exhaustively explore using writer-specific
prompts to personalize a unified PLM. Since the design and evaluation of such
prompts remain underdeveloped, we introduce and compare different types of
prompts that are possible in our setting. To maximize the potential of
prompt-based personalized fine-tuning, we propose a personalized intermediate
learning based on masked language modeling to extract task-independent traits
of writers' text. Our experiments, using multiple tasks, datasets, and PLMs,
reveal the nature of different prompts and the effectiveness of our
intermediate learning approach.
Comment: 11 pages.
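A minimal sketch of the setting: one shared PLM serves all writers, only a short writer-specific prompt differs, and an intermediate masked-language-modeling step on the writer's plain text lets the prompt absorb writer traits before task fine-tuning. The prompt wording, model name, and masking below are assumptions for illustration, not the paper's exact recipe.

```python
# Sketch of writer-specific prompting with a single shared PLM, plus an
# intermediate MLM step on the target writer's plain text.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def with_writer_prompt(writer_id, text):
    """Prepend a discrete writer-specific prompt. Learned soft prompts
    are an alternative in the same setting."""
    return f"[Writer {writer_id}] {text}"

# Intermediate learning: MLM on the writer's own text, prompt included.
inputs = tokenizer(with_writer_prompt(42, "some text the target writer wrote"),
                   return_tensors="pt")
labels = inputs["input_ids"].clone()
mask_pos = 5  # toy: mask one fixed position
inputs["input_ids"][0, mask_pos] = tokenizer.mask_token_id
masked_labels = torch.full_like(labels, -100)       # -100 = ignored in loss
masked_labels[0, mask_pos] = labels[0, mask_pos]    # predict only the mask
loss = model(**inputs, labels=masked_labels).loss
loss.backward()  # in practice, update only the prompt parameters
```

Keeping the PLM weights shared and updating only the prompt is what avoids fine-tuning and storing a separate model copy per writer.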
An Adolescent Patient with Scabies Mimicking Gottron Papules
Atypical features of scabies occur in infants, children, and patients with prolonged corticosteroid use or immunosuppression. We report the case of a non-immunosuppressed 15-year-old female with scabies showing scaly reddish papules over the proximal interphalangeal joints, mimicking Gottron papules in classic dermatomyositis. Periungual erythema was also seen. Topical corticosteroids had been used for four months at previous clinics. Dermoscopic findings were consistent with the typical picture of scabies. Scraping of hand crusts demonstrated scabies mites and ova. The patient's skin lesions were cured with oral ivermectin and topical 10% crotamiton. This case suggests that a lesion resembling Gottron papules may be added to the panel of unusual presentations of scabies.