Chinese named entity recognition using lexicalized HMMs
This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied.
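The tagging step described in this abstract can be pictured as Viterbi decoding over hybrid tags. The sketch below is a minimal illustration, not the authors' implementation: a plain (non-lexicalized) HMM decoder with made-up toy tags and log-probability tables. In the lexicalized variant, the `trans` and `emit` tables would additionally be conditioned on the surrounding words.

```python
import math

def viterbi(words, tags, trans, emit):
    """Assign each word a tag by Viterbi decoding.
    trans[(prev_tag, tag)] and emit[(tag, word)] hold log-probabilities;
    missing entries are treated as log(0) = -inf."""
    V = [{t: emit.get((t, words[0]), -math.inf) for t in tags}]
    back = [{}]
    for w in words[1:]:
        V.append({})
        back.append({})
        for t in tags:
            # Best predecessor tag for t at this position.
            best_prev = max(tags, key=lambda p: V[-2][p] + trans.get((p, t), -math.inf))
            V[-1][t] = (V[-2][best_prev]
                        + trans.get((best_prev, t), -math.inf)
                        + emit.get((t, w), -math.inf))
            back[-1][t] = best_prev
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy example: hypothetical tags marking a person-name entity; the
# log-probability value 0.0 (= log 1) is a placeholder, not trained data.
toy_tags = ["B-PER", "I-PER", "O"]
emit = {("B-PER", "张"): 0.0, ("I-PER", "三"): 0.0, ("O", "说"): 0.0}
trans = {("B-PER", "I-PER"): 0.0, ("I-PER", "O"): 0.0}
path = viterbi(["张", "三", "说"], toy_tags, trans, emit)
print(path)  # ['B-PER', 'I-PER', 'O']
```

In the paper's setting, the tag inventory would be the hybrid tags combining word-formation pattern and entity category rather than this toy BIO set.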
Chinese unknown word identification as known word tagging
This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is represented as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech. Based on the lexicalized HMMs, a statistical tagger is further developed to assign each known word an appropriate tag that indicates its pattern in forming a word and the part-of-speech of the formed word. The experimental results on the Peking University corpus indicate that the use of the lexicalization technique and the introduction of part-of-speech are helpful to unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance.
Integrated approaches to prosodic word prediction for Chinese TTS
We focus on integrated prosodic word prediction for Chinese TTS. To avoid the problem of inconsistency between lexical words and prosodic words in Chinese, lexical word segmentation and prosodic word prediction are taken as one process instead of two independent tasks. Furthermore, two word-based approaches are proposed to drive this integrated prosodic word prediction: the first one follows the notion of lexicalized hidden Markov models, and the second one is borrowed from unknown word identification for Chinese. The results of our primary experiment show that these integrated approaches are effective.
Chinese text chunking using lexicalized HMMS
This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ the uniformly lexicalized HMMs and develop a lattice-based tagger to assign each known word a proper hybrid tag, which involves four types of information: word boundary, POS, chunk boundary and chunk type. In comparison with most previous approaches, our approach is able to integrate different features such as part-of-speech information, chunk-internal cues and contextual information for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that the use of the lexicalization technique can substantially improve the performance of an HMM-based chunking system. © 2005 IEEE.
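The four-way hybrid tags mentioned above can be made concrete with a small sketch. The inventory below is hypothetical (toy boundary, POS, and chunk labels invented for illustration), not the tag set used in the paper; the point is only that one composite tag per word lets a single tagging pass recover all four layers at once.

```python
from itertools import product

# Hypothetical hybrid-tag inventory: each tag packs word boundary,
# POS, chunk boundary, and chunk type into one symbol.
word_bounds  = ["B", "I"]        # begin / inside a lexical word
pos_tags     = ["N", "V"]        # toy POS set
chunk_bounds = ["B", "I", "O"]   # begin / inside / outside a chunk
chunk_types  = ["NP", "VP"]      # toy chunk types

hybrid_tags = [
    f"{wb}-{pos}-{cb}-{ct}"
    for wb, pos, cb, ct in product(word_bounds, pos_tags, chunk_bounds, chunk_types)
]

def decode(tag):
    """Unpack one hybrid tag back into its four component labels."""
    wb, pos, cb, ct = tag.split("-")
    return {"word_boundary": wb, "pos": pos,
            "chunk_boundary": cb, "chunk_type": ct}

print(len(hybrid_tags))        # 24 tags in this toy inventory
print(decode("B-N-B-NP"))
```

A real inventory would prune impossible combinations (e.g. a chunk type paired with the outside-chunk label), which keeps the HMM state space manageable.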
Template-Based Static Posterior Inference for Bayesian Probabilistic Programming
In Bayesian probabilistic programming, a central problem is to estimate the normalised posterior distribution (NPD) of a probabilistic program with conditioning. Prominent approximate approaches to this problem include Markov chain Monte Carlo and variational inference, but neither can generate guaranteed outcomes within limited time. Moreover, most existing formal approaches that perform exact inference for NPD are restricted to programs with closed-form solutions or bounded loops/recursion. A recent work (Beutner et al., PLDI 2022) derived guaranteed bounds for NPD over programs with unbounded recursion. However, as this approach requires recursion unrolling, it suffers from the path explosion problem. Furthermore, previous approaches do not consider score-recursive probabilistic programs that allow score statements inside loops, which is non-trivial and requires careful treatment to ensure the integrability of the normalising constant in NPD.
In this work, we propose a novel automated approach to derive bounds for NPD via polynomial templates. Our approach can handle probabilistic programs with unbounded while loops and continuous distributions with infinite supports. The novelties in our approach are three-fold: first, we use polynomial templates to circumvent the path explosion problem from recursion unrolling; second, we derive a novel multiplicative variant of the Optional Stopping Theorem that addresses the integrability issue in score-recursive programs; third, to increase the accuracy of the derived bounds via polynomial templates, we propose a novel truncation technique that truncates a program into a bounded range of program values. Experiments over a wide range of benchmarks demonstrate that our approach is time-efficient and can derive bounds for NPD that are comparable with (or tighter than) those of the recursion-unrolling approach (Beutner et al., PLDI 2022).
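To make "score statements inside loops" concrete, here is a hedged Python sketch; all names and numbers are invented for illustration and this is not the paper's method. It shows a probabilistic program whose while loop is unbounded and whose body multiplies in a score factor on every iteration, estimated by naive weighted Monte Carlo, which gives only a point estimate with none of the guaranteed bounds the template-based approach derives.

```python
import random

def score_recursive_program(p=0.5, bound=10):
    """Hypothetical score-recursive program: an unbounded random walk
    with a score statement executed inside the loop body."""
    x, weight = 0, 1.0
    while -bound <= x < bound:
        x += 1 if random.random() < p else -1
        weight *= 0.9  # the score statement inside the loop
    return x, weight

# Naive weighted Monte Carlo estimate of the posterior mean of x.
# Unlike the polynomial-template approach, this carries no guarantee
# on how far the estimate is from the true NPD mean.
random.seed(0)
samples = [score_recursive_program() for _ in range(2000)]
total_weight = sum(w for _, w in samples)
posterior_mean = sum(x * w for x, w in samples) / total_weight
```

Because the loop runs for an unbounded, random number of iterations, the accumulated score factor is unbounded in the exponent; this is exactly where the integrability of the normalising constant needs the careful treatment the abstract mentions.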
Smoking behaviour, involuntary smoking, attitudes towards smoke-free legislations, and tobacco control activities in the European Union
The six most important cost-effective policies on tobacco control can be measured by the Tobacco Control Scale (TCS). The objective of our study was to describe the correlation between the TCS and smoking prevalence, self-reported exposure to secondhand smoke (SHS), and attitudes towards smoking restrictions in the 27 countries of the European Union (EU27).