Search CORE

56 research outputs found

JACY - a grammar for annotating syntax, semantics and pragmatics of written and spoken Japanese for NLP application purposes

Author: Siegel, Melanie
Publication date: 01/01/2006

In this text, we describe the development of a broad-coverage grammar for Japanese that has been built for and used in different application contexts. The grammar is based on work done in the Verbmobil project (Siegel 2000) on machine translation of spoken dialogues in the domain of travel planning. The second application for JACY was the automatic email response task. Grammar development was described in Oepen et al. (2002a). Third, it was applied to the task of understanding material on mobile phones available on the internet, as part of the DeepThought project (Callmeier et al. 2004, Uszkoreit et al. 2004). Currently, it is being used for treebanking and ontology extraction from dictionary definition sentences by the Japanese company NTT (Bond et al. 2004).

Hochschulschriftenserver - Universität Frankfurt am Main

JTEC panel report on machine translation in Japan

Authors: Carbonell, Jaime; Johnson, David; Rich, Elaine; Tomita, Masaru; Vasconcellos, Muriel; Wilks, Yorick

The goal of this report is to provide an overview of the state of the art of machine translation (MT) in Japan and to compare Japanese and Western technology in this area. The term 'machine translation', as used here, includes both the science and the technology required for automating the translation of text from one human language to another. Machine translation is viewed in Japan as an important strategic technology that is expected to play a key role in Japan's increasing participation in the world economy. MT is seen in Japan as important both for assimilating information into Japanese and for disseminating Japanese information throughout the world. Most of the MT systems now available in Japan are transfer-based systems. The majority of them exploit a case-frame representation of the source text as the basis of the transfer process. There is a gradual movement toward the use of deeper semantic representations, and some groups are beginning to look at interlingua-based systems.

NASA Technical Reports Server

Placing the Displaced: Translating the North Korean Characters in Sister Mok-rahn for an American Audience

Author: Lee, Dayoung
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017

No two languages are more different from each other than American English and the Korean spoken in the Democratic People’s Republic of Korea (North Korea). The same goes for their speakers. Average theatergoers in New York in 2017 share little with average North Koreans or with North Korean refugees scattered around the world. So how should a translator present North Korean characters to an American audience? This paper discusses dramaturgical issues surrounding the translation of the North Korean characters in "Sister Mok-rahn" for an American audience.

Columbia University Academic Commons

Hybrid discourse modeling and summarization for a speech-to-speech translation system

Author: Alexandersson, Jan
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2003

The thesis discusses two parts of VerbMobil, a speech-to-speech translation system for spontaneous spoken language: the dialogue model and one of its applications, multilingual summary generation. In connection with the dialogue model, two topics are of special interest: (a) the use of a default unification operation called overlay, formalized in the thesis, as the fundamental operation for dialogue management; and (b) an intentional model that describes intentions in dialogue on five levels in a language-independent way. In addition to the generation algorithm developed for summarization, we present a comprehensive evaluation of the summarization functionality. Besides precision and recall, a new measure, confabulation, is defined that provides a more precise characterization of the performance of complex natural language processing systems.

Universaar

A survey of studies in systemic functional language description and typology

Authors: Mwinlaaru, I.N.; Xuan, W.W.H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/05/2017

Version of Record; Published

The Hong Kong Polytechnic University Pao Yue-kong Library

PolyU Institutional Repository

Structured Named Entities

Author: Ringland, Nicola
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2016

The names of people, locations, and organisations play a central role in language, and named entity recognition (NER) has been widely studied and successfully incorporated into natural language processing (NLP) applications. The most common variant of NER involves identifying and classifying proper noun mentions of these and miscellaneous entities as linear spans in text. Unfortunately, this version of NER is no closer to a detailed treatment of named entities than chunking is to a full syntactic analysis. NER, so construed, reflects neither the syntactic nor the semantic structure of NE mentions, and provides insufficient categorical distinctions to represent that structure. Representing the nested structure of mentions, where a mention may contain mentions of other entities, is critical for applications such as coreference resolution. The lack of this structure creates spurious ambiguity in the linear approximation. Research in NER has been shaped by the size and detail of the available annotated corpora. The existing structured named entity corpora are either small, in specialist domains, or in languages other than English. This thesis presents our Nested Named Entity (NNE) corpus of named entities and numerical and temporal expressions, taken from the WSJ portion of the Penn Treebank (PTB, Marcus et al., 1993). We use the BBN Pronoun Coreference and Entity Type Corpus (Weischedel and Brunstein, 2005a) as our basis, manually annotating it with a principled, fine-grained, nested annotation scheme and detailed annotation guidelines. The corpus comprises over 279,000 entities in 49,211 sentences (1,173,000 words), including 118,495 top-level entities. Our annotations were designed using twelve high-level principles that guided the development of the annotation scheme and the annotators' difficult decisions.
We also monitored the semantic grammar that was being induced during annotation, seeking to identify and reinforce common patterns to maintain consistent, parsimonious annotations. The result is a scheme of 118 hierarchical fine-grained entity types and nesting rules, covering all capitalised mentions of entities, and numerical and temporal expressions. Unlike many corpora, ours comes with detailed guidelines, including extensive discussion of edge cases, developed in an ongoing dialogue with our annotators; such guidelines are critical for consistency and reproducibility. We annotated independently of the PTB bracketing, allowing annotators to choose spans that were inconsistent with PTB conventions and errors, and to refer back to it only to resolve genuine ambiguity consistently. We merged our NNE annotations with the PTB, which required some systematic and some one-off changes to both annotations. This allows the NNE corpus to complement other PTB resources, such as PropBank, and to inform PTB-derived corpora for other formalisms, such as CCG and HPSG. We compare this corpus against BBN. We consider several approaches to integrating the PTB and NNE annotations, which affect the sparsity of grammar rules and the visibility of syntactic and NE structure. We explore their impact on parsing the NNE and merged variants using the Berkeley parser (Petrov et al., 2006), which performs surprisingly well without specialised NER features. We experiment with flattening the NNE annotations into linear NER variants with stacked categories, and explore the ability of a maximum entropy and a CRF NER system to reproduce them. The CRF performs substantially better, but is infeasible to train on the enormous stacked category sets. The flattened output of the Berkeley parser is almost competitive with the CRF. Our results demonstrate that the NNE corpus is feasible for statistical models to reproduce. We invite researchers to explore new, richer models of (joint) parsing and NER on this complex and challenging task.
Our nested named entity corpus will improve a wide range of NLP tasks, such as coreference resolution and question answering, allowing automated systems to understand and exploit the true structure of named entities.

Sydney eScholarship

Head-Driven Phrase Structure Grammar

Publication venue: Language Science Press
Publication date: 27/01/2022

Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature-value pairs, structure sharing, and relational constraints. In syntax, it assumes that expressions have a single, relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism).

Directory of Open Access Books (DOAB)

Language, Nation, Race

Author: Ueda, Atsuko
Publication venue: 'University of California Press'

A free open-access ebook is available upon publication. Learn more at www.luminosoa.org. Language, Nation, Race explores the various language reforms at the onset of Japanese modernity, a time when a “national language” (kokugo) was produced to standardize Japanese. Faced with the threat of Western colonialism, Meiji intellectuals proposed various reforms to standardize the Japanese language in order to quickly educate the illiterate masses. This book liberates these language reforms from the predetermined category of the “nation,” for such a notion had yet to exist as a clear telos to which the reforms aspired. Atsuko Ueda draws on, while critically intervening in, the vast scholarship of language reform that engaged with numerous works of postcolonial and cultural studies. She examines the first two decades of the Meiji period, with specific focus on the issue of race, contending that no analysis of imperialism or nationalism is possible without it.

OAPEN Library
