
4 research outputs found

A comparison of cooking recipe named entities between Japanese and English

Author: Carroll John
Mori Shinsuke
Yamakata Yoko
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

In this paper, we analyze the structural differences between the instructional text in Japanese and English cooking recipes. First, we constructed an English recipe corpus of 100 recipes, designed to be comparable to an existing Japanese recipe corpus. We annotated recipe named entities (r-NEs) in the English corpus according to guidelines previously defined for Japanese. We trained a state-of-the-art NE recognizer, PWNER, on the English r-NEs and achieved accuracy and coverage very similar to previous results for the Japanese corpus, thus demonstrating the quality and consistency of the annotations. Second, we compared the r-NEs annotated in the Japanese and English corpora and uncovered lexical, semantic, and underlying structural differences between Japanese and English recipes. We discuss reasons for these differences, which have significant implications for cross-language retrieval and automatic translation of recipes.

Crossref

Sussex Research Online

A comparison of cooking recipe named entities between Japanese and English

Author: Yamakata Yoko
Carroll John
Mori Shinsuke
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Biblioteca Virtual del Patrimonio Bibliográfico (Virtual Library of Bibliographical Heritage)

Crossref

Sussex Research Online

A comparison of annotation methods for named entity extraction (固有表現抽出におけるアノテーション手法の比較)

Author: Hiroyuki Shinnou
Kanako Komiya
Masaya Suzuki
Minoru Sasaki
Tomoya Iwakura
Publication venue: National Institute for Japanese Language and Linguistics
Publication date: 01/01/2017

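Conference: Language Resources Workshop 2016 (言語資源活用ワークショップ2016). Venue: National Institute for Japanese Language and Linguistics. Dates: March 7-8, 2017. Organizer: Center for Corpus Development, National Institute for Japanese Language and Linguistics.

In this paper, we compare two methods of annotation by non-experts, using named entity extraction as the task. The first method manually corrects the output of an existing named entity extractor; the second annotates manually from scratch. The experiments used the Balanced Corpus of Contemporary Written Japanese (BCCWJ), with two non-experts assigned to each text for each method. For evaluation, we used the time required for annotation, the agreement rate, accuracy measured against a gold standard, and the accuracy obtained when the corpus produced by each method was used as training data. We computed these per genre and as micro- and macro-averages over all genres. The results show that, on the micro- and macro-averages over all genres, the method that starts from existing annotation results performs better, whereas on genres far from the training data of the existing named entity extractor, annotating manually from scratch performs better.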

Academic Repository of the National Institute for Japanese Language and Linguistics / 国立国語研究所学術情報リポジトリ

<Full text> Proceedings of the Language Resources Workshop 2016 (言語資源活用ワークショップ2016発表論文集)

Author: Center for Corpus Development, National Institute for Japanese Language and Linguistics
Publication venue: National Institute for Japanese Language and Linguistics
Publication date: 01/01/2017

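Conference: Language Resources Workshop 2016 (言語資源活用ワークショップ2016). Venue: National Institute for Japanese Language and Linguistics. Dates: March 7-8, 2017. Organizer: Center for Corpus Development, National Institute for Japanese Language and Linguistics.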

Institutional Repositories DataBase (IRDB)

Academic Repository of the National Institute for Japanese Language and Linguistics / 国立国語研究所学術情報リポジトリ