944 research outputs found

    Chinese named entity recognition using lexicalized HMMs

    This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied.
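
    The hybrid-tag idea in this abstract can be illustrated with a minimal sketch. It only shows how a tagged sequence of known words might be merged back into entities; it is not the paper's tagger. The tag format `PATTERN-CATEGORY`, the pattern symbols `B`/`M`/`E`/`S` (begin, middle, end, single), the `O` tag for non-entity words, the categories `PER`/`LOC`, and the function name `decode_hybrid_tags` are all assumptions made for illustration, and the sketch assumes a well-formed tag sequence.

```python
from typing import List, Tuple


def decode_hybrid_tags(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge known words into entities using hypothetical hybrid tags.

    Each tag is either "O" (not part of an entity) or "PATTERN-CATEGORY",
    where PATTERN is B, M, E or S and CATEGORY is an entity type.
    """
    result: List[Tuple[str, str]] = []
    buffer: List[str] = []  # known words collected for the entity being built
    for word, tag in zip(words, tags):
        if tag == "O":
            buffer = []
            result.append((word, "O"))
            continue
        pattern, category = tag.split("-", 1)
        if pattern == "S":      # a single known word forms the whole entity
            buffer = []
            result.append((word, category))
        elif pattern == "B":    # first known word of a multi-word entity
            buffer = [word]
        elif pattern == "M":    # middle known word of an entity
            buffer.append(word)
        elif pattern == "E":    # last known word: emit the merged entity
            buffer.append(word)
            result.append(("".join(buffer), category))
            buffer = []
    return result


if __name__ == "__main__":
    words = ["张", "小", "明", "在", "北京", "工作"]
    tags = ["B-PER", "M-PER", "E-PER", "O", "S-LOC", "O"]
    print(decode_hybrid_tags(words, tags))
```

    In this sketch the unknown personal name is never in the lexicon as a whole; it is recovered from three known single-character words whose tags carry both the formation pattern and the entity category.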

    Chinese unknown word identification as known word tagging

    This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is represented as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech. Based on the lexicalized HMMs, a statistical tagger is further developed to assign each known word an appropriate tag that indicates its pattern in forming a word and the part-of-speech of the formed word. The experimental results on the Peking University corpus indicate that the use of the lexicalization technique and the introduction of part-of-speech are helpful to unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance.

    Integrated approaches to prosodic word prediction for Chinese TTS

    We focus on integrated prosodic word prediction for Chinese TTS. To avoid the problem of inconsistency between lexical words and prosodic words in Chinese, lexical word segmentation and prosodic word prediction are taken as one process instead of two independent tasks. Furthermore, two word-based approaches are proposed to drive this integrated prosodic word prediction: the first one follows the notion of lexicalized hidden Markov models, and the second one is borrowed from unknown word identification for Chinese. The results of our primary experiment show these integrated approaches are effective.

    An integrated approach for Chinese word segmentation


    Chinese text chunking using lexicalized HMMs

    This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ the uniformly lexicalized HMMs and develop a lattice-based tagger to assign each known word a proper hybrid tag, which involves four types of information: word boundary, POS, chunk boundary and chunk type. In comparison with most previous approaches, our approach is able to integrate different features such as part-of-speech information, chunk-internal cues and contextual information for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that the use of the lexicalization technique can substantially improve the performance of an HMM-based chunking system. © 2005 IEEE.

    Template-Based Static Posterior Inference for Bayesian Probabilistic Programming

    In Bayesian probabilistic programming, a central problem is to estimate the normalised posterior distribution (NPD) of a probabilistic program with conditioning. Prominent approximate approaches to address this problem include Markov chain Monte Carlo and variational inference, but neither can generate guaranteed outcomes within limited time. Moreover, most existing formal approaches that perform exact inference for NPD are restricted to programs with closed-form solutions or bounded loops/recursion. A recent work (Beutner et al., PLDI 2022) derived guaranteed bounds for NPD over programs with unbounded recursion. However, as this approach requires recursion unrolling, it suffers from the path explosion problem. Furthermore, previous approaches do not consider score-recursive probabilistic programs that allow score statements inside loops, which is non-trivial and requires careful treatment to ensure the integrability of the normalising constant in NPD. In this work, we propose a novel automated approach to derive bounds for NPD via polynomial templates. Our approach can handle probabilistic programs with unbounded while loops and continuous distributions with infinite supports. The novelties in our approach are three-fold: first, we use polynomial templates to circumvent the path explosion problem from recursion unrolling; second, we derive a novel multiplicative variant of the Optional Stopping Theorem that addresses the integrability issue in score-recursive programs; third, to increase the accuracy of the derived bounds via polynomial templates, we propose a novel technique of truncation that truncates a program into a bounded range of program values. Experiments over a wide range of benchmarks demonstrate that our approach is time-efficient and can derive bounds for NPD that are comparable with (or tighter than) the recursion-unrolling approach (Beutner et al., PLDI 2022).

    Smoking behaviour, involuntary smoking, attitudes towards smoke-free legislations, and tobacco control activities in the European Union

    The six most important cost-effective policies on tobacco control can be measured by the Tobacco Control Scale (TCS). The objective of our study was to describe the correlation between the TCS and smoking prevalence, self-reported exposure to secondhand smoke (SHS) and attitudes towards smoking restrictions in the 27 countries of the European Union (EU27).