Chinese named entity recognition using lexicalized HMMs
This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied.
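The tagging step described in this abstract can be pictured as Viterbi decoding over hybrid tags. The sketch below is a minimal illustration, not the authors' implementation: a plain (non-lexicalized) HMM decoder with made-up toy tags and log-probability tables. In the lexicalized variant, the `trans` and `emit` tables would additionally be conditioned on the surrounding words.

```python
import math

def viterbi(words, tags, trans, emit):
    """Assign each word a tag by Viterbi decoding.
    trans[(prev_tag, tag)] and emit[(tag, word)] hold log-probabilities;
    missing entries are treated as log(0) = -inf."""
    V = [{t: emit.get((t, words[0]), -math.inf) for t in tags}]
    back = [{}]
    for w in words[1:]:
        V.append({})
        back.append({})
        for t in tags:
            # Best predecessor tag for t at this position.
            best_prev = max(tags, key=lambda p: V[-2][p] + trans.get((p, t), -math.inf))
            V[-1][t] = (V[-2][best_prev]
                        + trans.get((best_prev, t), -math.inf)
                        + emit.get((t, w), -math.inf))
            back[-1][t] = best_prev
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy example: hypothetical tags marking a person-name entity; the
# log-probability value 0.0 (= log 1) is a placeholder, not trained data.
toy_tags = ["B-PER", "I-PER", "O"]
emit = {("B-PER", "张"): 0.0, ("I-PER", "三"): 0.0, ("O", "说"): 0.0}
trans = {("B-PER", "I-PER"): 0.0, ("I-PER", "O"): 0.0}
path = viterbi(["张", "三", "说"], toy_tags, trans, emit)
print(path)  # ['B-PER', 'I-PER', 'O']
```

In the paper's setting, the tag inventory would be the hybrid tags combining word-formation pattern and entity category rather than this toy BIO set.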
Chinese unknown word identification as known word tagging
This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is represented as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech. Based on the lexicalized HMMs, a statistical tagger is further developed to assign each known word an appropriate tag that indicates its pattern in forming a word and the part-of-speech of the formed word. The experimental results on the Peking University corpus indicate that the use of the lexicalization technique and the introduction of part-of-speech are helpful to unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance.
Integrated approaches to prosodic word prediction for Chinese TTS
We focus on integrated prosodic word prediction for Chinese TTS. To avoid the problem of inconsistency between lexical words and prosodic words in Chinese, lexical word segmentation and prosodic word prediction are taken as one process instead of two independent tasks. Furthermore, two word-based approaches are proposed to drive this integrated prosodic word prediction: the first one follows the notion of lexicalized hidden Markov models, and the second one is borrowed from unknown word identification for Chinese. The results of our primary experiment show that these integrated approaches are effective.
Chinese text chunking using lexicalized HMMS
This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ the uniformly lexicalized HMMs and develop a lattice-based tagger to assign each known word a proper hybrid tag, which involves four types of information: word boundary, POS, chunk boundary and chunk type. In comparison with most previous approaches, our approach is able to integrate different features such as part-of-speech information, chunk-internal cues and contextual information for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that the use of the lexicalization technique can substantially improve the performance of an HMM-based chunking system. © 2005 IEEE.
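The four-way hybrid tags mentioned above can be made concrete with a small sketch. The inventory below is hypothetical (toy boundary, POS, and chunk labels invented for illustration), not the tag set used in the paper; the point is only that one composite tag per word lets a single tagging pass recover all four layers at once.

```python
from itertools import product

# Hypothetical hybrid-tag inventory: each tag packs word boundary,
# POS, chunk boundary, and chunk type into one symbol.
word_bounds  = ["B", "I"]        # begin / inside a lexical word
pos_tags     = ["N", "V"]        # toy POS set
chunk_bounds = ["B", "I", "O"]   # begin / inside / outside a chunk
chunk_types  = ["NP", "VP"]      # toy chunk types

hybrid_tags = [
    f"{wb}-{pos}-{cb}-{ct}"
    for wb, pos, cb, ct in product(word_bounds, pos_tags, chunk_bounds, chunk_types)
]

def decode(tag):
    """Unpack one hybrid tag back into its four component labels."""
    wb, pos, cb, ct = tag.split("-")
    return {"word_boundary": wb, "pos": pos,
            "chunk_boundary": cb, "chunk_type": ct}

print(len(hybrid_tags))        # 24 tags in this toy inventory
print(decode("B-N-B-NP"))
```

A real inventory would prune impossible combinations (e.g. a chunk type paired with the outside-chunk label), which keeps the HMM state space manageable.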
Template-Based Static Posterior Inference for Bayesian Probabilistic Programming
In Bayesian probabilistic programming, a central problem is to estimate the normalised posterior distribution (NPD) of a probabilistic program with conditioning. Prominent approximate approaches to this problem include Markov chain Monte Carlo and variational inference, but neither can generate guaranteed outcomes within limited time. Moreover, most existing formal approaches that perform exact inference for NPD are restricted to programs with closed-form solutions or bounded loops/recursion. A recent work (Beutner et al., PLDI 2022) derived guaranteed bounds for NPD over programs with unbounded recursion. However, as this approach requires recursion unrolling, it suffers from the path explosion problem. Furthermore, previous approaches do not consider score-recursive probabilistic programs that allow score statements inside loops, which is non-trivial and requires careful treatment to ensure the integrability of the normalising constant in NPD.
In this work, we propose a novel automated approach to derive bounds for NPD via polynomial templates. Our approach can handle probabilistic programs with unbounded while loops and continuous distributions with infinite supports. The novelties in our approach are three-fold: first, we use polynomial templates to circumvent the path explosion problem from recursion unrolling; second, we derive a novel multiplicative variant of the Optional Stopping Theorem that addresses the integrability issue in score-recursive programs; third, to increase the accuracy of the derived bounds via polynomial templates, we propose a novel truncation technique that truncates a program into a bounded range of program values. Experiments over a wide range of benchmarks demonstrate that our approach is time-efficient and can derive bounds for NPD that are comparable with (or tighter than) those of the recursion-unrolling approach (Beutner et al., PLDI 2022).
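To make "score statements inside loops" concrete, here is a hedged Python sketch; all names and numbers are invented for illustration and this is not the paper's method. It shows a probabilistic program whose while loop is unbounded and whose body multiplies in a score factor on every iteration, estimated by naive weighted Monte Carlo, which gives only a point estimate with none of the guaranteed bounds the template-based approach derives.

```python
import random

def score_recursive_program(p=0.5, bound=10):
    """Hypothetical score-recursive program: an unbounded random walk
    with a score statement executed inside the loop body."""
    x, weight = 0, 1.0
    while -bound <= x < bound:
        x += 1 if random.random() < p else -1
        weight *= 0.9  # the score statement inside the loop
    return x, weight

# Naive weighted Monte Carlo estimate of the posterior mean of x.
# Unlike the polynomial-template approach, this carries no guarantee
# on how far the estimate is from the true NPD mean.
random.seed(0)
samples = [score_recursive_program() for _ in range(2000)]
total_weight = sum(w for _, w in samples)
posterior_mean = sum(x * w for x, w in samples) / total_weight
```

Because the loop runs for an unbounded, random number of iterations, the accumulated score factor is unbounded in the exponent; this is exactly where the integrability of the normalising constant needs the careful treatment the abstract mentions.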
Smoking behaviour, involuntary smoking, attitudes towards smoke-free legislations, and tobacco control activities in the European Union
The six most important cost-effective policies on tobacco control can be measured by the Tobacco Control Scale (TCS). The objective of our study was to describe the correlation between the TCS and smoking prevalence, self-reported exposure to secondhand smoke (SHS), and attitudes towards smoking restrictions in the 27 countries of the European Union (EU27).