Search CORE


Chinese text chunking using lexicalized HMMs

Author: Fu GH
Lu Q
Luke KK
Xu RF
Publication venue: IEEE.
Publication date: 01/01/2005

This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger that assigns each known word an appropriate hybrid tag encoding four types of information: word boundary, POS, chunk boundary and chunk type. Compared with most previous approaches, our approach can integrate different features, such as part-of-speech information, chunk-internal cues and contextual information, for text chunking within the HMM framework. As a result, the performance of the system can be improved without losing efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that lexicalization can substantially improve the performance of an HMM-based chunking system. © 2005 IEEE.

The Hong Kong Polytechnic University Pao Yue-kong Library

HKU Scholars Hub
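The lexicalized-HMM chunking abstract above describes tagging each known word with a hybrid tag and decoding under an HMM. As a rough illustration only, the Python sketch below runs first-order Viterbi decoding over hybrid tags. Several assumptions are ours, not the paper's: the hypothetical tag format `POS-boundary-type` (e.g. `n-B-NP`) omits the word-boundary component for brevity, lexicalization is approximated by a hypothetical table `lex_trans` keyed on the previous word with back-off to the plain tag bigram, and the function name `viterbi` and any probabilities passed to it are illustrative. The paper's lattice-based tagger and its uniformly lexicalized model are not reproduced here.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical hybrid tag format "POS-boundary-type", e.g. "n-B-NP" = noun that
# begins an NP chunk. (The paper's tags also carry word-boundary information.)
Tag = str


def viterbi(
    words: List[str],
    tags: List[Tag],
    start: Dict[Tag, float],
    trans: Dict[Tuple[Tag, Tag], float],
    emit: Dict[Tuple[Tag, str], float],
    lex_trans: Optional[Dict[Tuple[Tag, str, Tag], float]] = None,
    floor: float = 1e-8,
) -> List[Tag]:
    """Most probable hybrid-tag sequence for `words` under a first-order HMM."""
    if not words:
        return []
    lex_trans = lex_trans or {}

    def lp(p: float) -> float:
        # Log-probability with a floor so unseen events are unlikely, not impossible.
        return math.log(max(p, floor))

    def trans_lp(prev: Tag, prev_word: str, tag: Tag) -> float:
        # Word-conditioned estimate P(tag | prev, prev_word) if one exists,
        # otherwise back off to the plain tag bigram P(tag | prev).
        if (prev, prev_word, tag) in lex_trans:
            return lp(lex_trans[(prev, prev_word, tag)])
        return lp(trans.get((prev, tag), 0.0))

    # best[i][t]: best log-score of any path ending in tag t at position i.
    best = [{t: lp(start.get(t, 0.0)) + lp(emit.get((t, words[0]), 0.0)) for t in tags}]
    back: List[Dict[Tag, Tag]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + trans_lp(p, words[i - 1], t))
            score = best[i - 1][prev] + trans_lp(prev, words[i - 1], t)
            best[i][t] = score + lp(emit.get((t, words[i]), 0.0))
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

On a two-word toy input `经济 发展` with made-up probabilities, a single lexicalized entry favoring NP continuation after `经济` yields `n-B-NP n-I-NP`, while the same call without `lex_trans` yields `n-B-NP v-B-VP`. This is the kind of decision that word-conditioned estimates can change.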

TCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models

Author: Li Wan Yin
Liu James
Lu Qin
Publication venue: Tsinghua University Press
Publication date: 01/10/2006

PACLIC 20 / Wuhan, China / 1-3 November, 200

Waseda University Repository

A Hybrid Extraction Model for Chinese Noun/Verb Synonymous bi-gram Collocations

Author: Li Wanyin
Lu Qin
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

MATREX: DCU machine translation system for IWSLT 2006

Author: Stroppa Nicolas
Way Andy
Publication date: 01/01/2006

In this paper, we describe the machine translation system developed at DCU for our first participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2006). The system combines two types of approaches. First, we use an EBMT approach to collect aligned chunks in two steps: deterministic chunking of both sides and chunk alignment. We use several chunking and alignment strategies. We also extract SMT-style aligned phrases, and the two types of resources are combined. We participated in the Open Data Track for two translation directions, Arabic-English and Italian-English, translating both the single-best ASR hypotheses and the text input. We report the system's results on the provided evaluation sets.

Irish Universities

DCU Online Research Access Service

Syntactic Nuclei in Dependency Parsing -- A Multilingual Exploration

Author: Basirat Ali
Nivre Joakim
Publication date: 01/01/2021

Standard models for syntactic dependency parsing take words to be the elementary units that enter into dependency relations. In this paper, we investigate whether there are any benefits from enriching these models with the more abstract notion of nucleus proposed by Tesnière. We do this by showing how the concept of nucleus can be defined in the framework of Universal Dependencies and how we can use composition functions to make a transition-based dependency parser aware of this concept. Experiments on 12 languages show that nucleus composition gives small but significant improvements in parsing accuracy. Further analysis reveals that the improvement mainly concerns a small number of dependency relations, including nominal modifiers, relations of coordination, main predicates, and direct objects. Comment: Accepted at EACL-202

arXiv.org e-Print Archive

Publikationer från Uppsala Universitet (Publications from Uppsala University)

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Identifying Prepositional Phrases in Chinese Patent Texts with Rule-based and CRF Methods

Author: Jin Yaohong
Li Hongzheng
Publication date: 01/01/2015

Waseda University Repository

A hybrid extraction model for Chinese noun/verb synonym bi-gram

Author: Li W
Lu Q
Publication venue: Institute for Digital Enhancement of Cognitive Development, Waseda University
Publication date: 11/12/2014

2011-2012 > Academic research: refereed > Refereed conference paper. Version of Record. Publishe

CiteSeerX

PolyU Institutional Repository

An Arabic CCG approach for determining constituent types from Arabic Treebank

Author: Abo Bakr Hitahm M.
El-taher Ahmed I.
Shaalan Khaled
Zidan Ibrahim
Publication venue: King Saud University. Production and hosting by Elsevier B.V.
Publication date: 01/12/2014

Converting a treebank into a CCGbank opens the respective language to the sophisticated tools developed for Combinatory Categorial Grammar (CCG) and enriches cross-linguistic development. The conversion is primarily a three-step process: determining constituents’ types, binarization, and category conversion. Usually, this process involves a preprocessing step on the chosen treebank that corrects brackets, normalizes tags to account for changes introduced during manual annotation, and extracts the morpho-syntactic information needed to determine constituents’ types. In this article, we describe the required preprocessing step on the Arabic Treebank, as well as how to determine Arabic constituents’ types. We conducted an experiment on parts 1 and 2 of the Penn Arabic Treebank (PATB) aimed at converting the PATB into an Arabic CCGbank. When applied to ATB1v2.0 and ATB2v2.0, our algorithm achieved 99% identification of head nodes and 100% coverage of the Treebank data.

Elsevier - Publisher Connector

Directory of Open Access Journals
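The Arabic CCG abstract above names binarization as the conversion step that follows determining constituent types. The abstract does not specify the binarization scheme, so the Python sketch below uses one common choice, head-outward binarization, purely as an illustration: the head child first combines with each right sibling, then with each left sibling. The `Node` class and the `binarize` and `to_str` helpers are hypothetical, the head index is assumed to come from the constituent-type step, and the example tokens are illustrative Buckwalter-style transliterations rather than data from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    label: str
    children: List["Tree"] = field(default_factory=list)
    head: int = 0  # index of the head child, assumed already identified


Tree = Union[Node, str]  # a plain str is a leaf (word)


def binarize(t: Tree) -> Tree:
    """Head-outward binarization: the head first absorbs its right siblings,
    then its left siblings, one at a time. Intermediate nodes reuse the label."""
    if isinstance(t, str):
        return t
    kids = [binarize(c) for c in t.children]
    if len(kids) <= 2:
        return Node(t.label, kids, t.head)
    cur = kids[t.head]
    for right in kids[t.head + 1:]:
        cur = Node(t.label, [cur, right], head=0)
    for left in reversed(kids[: t.head]):
        cur = Node(t.label, [left, cur], head=1)
    return cur


def to_str(t: Tree) -> str:
    """Render a tree in bracketed (Penn-style) notation."""
    if isinstance(t, str):
        return t
    return "(" + t.label + " " + " ".join(to_str(c) for c in t.children) + ")"
```

For a verb-initial clause `(VP (VBD qrA) (NP-SBJ Alwld) (NP-OBJ AlktAb))` with the verb as head, this yields `(VP (VP (VBD qrA) (NP-SBJ Alwld)) (NP-OBJ AlktAb))`. Absorbing right siblings before left ones is a design choice; other orders produce different but equally binary trees.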