14 research outputs found

Parallel Distributed Grammar Engineering for Practical Applications

Author: Bender Emily M.
Callmeier Uli
Flickinger Dan
Oepen Stephan
Siegel Melanie
Publication date: 21/12/2011

Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration.

Hochschulschriftenserver - Universität Frankfurt am Main

Efficient deep processing of Japanese

Author: Bender Emily M.
Siegel Melanie
Publication date: 01/01/2002

We present a broad-coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real-world applications, such that robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.

arXiv.org e-Print Archive

Crossref

Publications at Bielefeld University

Hochschulschriftenserver - Universität Frankfurt am Main

Corpora and evaluation tools for multilingual named entity grammar development

Author: Bering Christian
Droźdźyński Witold
Erbach Gregor
Guasch Clara
Homola Petr
Krieger Hans-Ulrich
Lehmann Sabine
Li Hong
Piskorski Jakub
Schäfer Ulrich
Shimada Atsuko
Siegel Melanie
Xu Feiyu
Ziegler-Eisele Dorothee
Publication date: 14/12/2011

We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.

Hochschulschriftenserver - Universität Frankfurt am Main

Support for Internet-Based Commonsense Processing – Causal Knowledge Discovery Using Japanese “If” Forms

Author: J. Weizenbaum
P. Singh
R. Rzepka
T. Inui
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

This paper introduces our method for causal knowledge retrieval from Internet resources, its results, and an evaluation of its use in the utterance creation process. Our system automatically retrieves commonsensical knowledge from Web resources by using simple web-mining and information extraction techniques. For retrieving causal knowledge the system uses three of several specific Japanese “if” forms. From the results we can conclude that Japanese web pages indexed by a common search engine's spiders are enough to discover common causal relationships, and that this knowledge can be used to make Human-Computer Interfaces sound more natural and interesting than classic methods do.

CiteSeerX

Crossref

Pronunciation Ambiguities in Japanese Kanji

Author: Zhang Wen
Publication venue: CUNY Academic Works
Publication date: 01/02/2023

Japanese writing is a complex system, and a large part of the complexity resides in the use of kanji. A single kanji character in modern Japanese may have multiple pronunciations, either as native vocabulary or as words borrowed from Chinese. This causes a problem for text-to-speech synthesis (TTS) because the system has to predict which pronunciation of each kanji character is appropriate in the context. The problem is called homograph disambiguation. In Japanese TTS technology, the trick in any case is to know which is the right reading, which makes reading Japanese text a challenge. To solve the problem, this research provides a new annotated Japanese single kanji character pronunciation data set and describes an experiment using a logistic regression (LR) classifier. A baseline is computed to compare with the LR classifier accuracy. The LR classifier improves the modeling performance by 16%. This experiment provides the first experimental research in Japanese single kanji homograph disambiguation. The annotated Japanese data is freely released to the public to support further work.

City University of New York

JACY - a grammar for annotating syntax, semantics and pragmatics of written and spoken Japanese for NLP application purposes

Author: Siegel Melanie
Publication date: 01/01/2006

In this text, we describe the development of a broad-coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, while embedded in the project DeepThought (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).

Hochschulschriftenserver - Universität Frankfurt am Main

Designing Service-Oriented Chatbot Systems Using a Construction Grammar-Driven Natural Language Generation System

Author: Jenkins Marie-Claire
Publication date: 30/04/2011

Service-oriented chatbot systems are used to inform users in a conversational manner about a particular service or product on a website. Our research shows that current systems are time-consuming to build and not very accurate or satisfying to users. We find that natural language understanding and natural language generation methods are central to creating an efficient and useful system. In this thesis we investigate current and past methods in this research area and place particular emphasis on Construction Grammar and its computational implementation. Our research shows that users have strong emotive reactions to how these systems behave, so we also investigate the human-computer interaction component. We present three systems (KIA, John and KIA2), and carry out extensive user tests on all of them, as well as comparative tests. KIA is built using existing methods, John is built with the user in mind and KIA2 is built using the construction grammar method. We found that the construction grammar approach performs well in service-oriented chatbot systems, and that users preferred it over other systems.

University of East Anglia digital repository