2,405 research outputs found

    Chinese unknown word identification as known word tagging

    This paper presents a tagging approach to Chinese unknown word identification based on lexicalized hidden Markov models (LHMMs). In this work, Chinese unknown word identification is represented as a tagging task on a sequence of known words by introducing word-formation patterns and part-of-speech. Based on the lexicalized HMMs, a statistical tagger is further developed to assign each known word an appropriate tag that indicates its pattern in forming a word and the part-of-speech of the formed word. The experimental results on the Peking University corpus indicate that lexicalization and the introduction of part-of-speech are helpful to unknown word identification. The experiment on the SIGHAN-PK open test data also shows that our system can achieve state-of-the-art performance.
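    As a rough illustration of the tagging formulation in this abstract, the sketch below merges a tagged sequence of known words back into output words. The pattern labels (`S`, `B`, `M`, `E`), the `pattern-POS` tag format, and the function name `merge_tagged` are assumptions made for this example, not the paper's notation. The lexicalized HMM tagger that would produce the tags is not shown.

```python
from typing import List, Tuple


def merge_tagged(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge known-word segments into words using hypothetical pattern-POS tags.

    Each tag has the form "<pattern>-<pos>", where pattern is one of:
      S = the segment is a word by itself
      B = the segment begins a longer word
      M = the segment is in the middle of a longer word
      E = the segment ends a longer word
    The POS part is the part-of-speech of the word being formed.
    """
    words: List[Tuple[str, str]] = []
    buffer = ""  # text of the multi-segment word currently being built
    for segment, tag in tagged:
        pattern, pos = tag.split("-", 1)
        if pattern == "S":
            words.append((segment, pos))
            buffer = ""
        elif pattern == "B":
            buffer = segment
        elif pattern == "M":
            buffer += segment
        elif pattern == "E":
            words.append((buffer + segment, pos))
            buffer = ""
        else:
            raise ValueError(f"unknown pattern label: {pattern!r}")
    return words


# Illustrative input: three known one-character segments form one personal name.
example = [("我", "S-r"), ("喜欢", "S-v"), ("张", "B-nr"), ("小", "M-nr"), ("明", "E-nr")]
merged = merge_tagged(example)
```

    Here the last three segments are joined into a single word carrying the POS of the formed word, which is how a tag sequence over known words can surface a word missing from the lexicon.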

    Chinese text chunking using lexicalized HMMs

    This paper presents a lexicalized HMM-based approach to Chinese text chunking. To tackle the problem of unknown words, we formalize Chinese text chunking as a tagging task on a sequence of known words. To do this, we employ uniformly lexicalized HMMs and develop a lattice-based tagger to assign each known word a proper hybrid tag, which involves four types of information: word boundary, POS, chunk boundary and chunk type. In comparison with most previous approaches, our approach is able to integrate different features such as part-of-speech information, chunk-internal cues and contextual information for text chunking under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. Our preliminary experiments on the PolyU Shallow Treebank show that lexicalization can substantially improve the performance of an HMM-based chunking system. © 2005 IEEE.

    Crystal Structures of the structure-selective nuclease Mus81-Eme1 bound to flap DNA substrates

    The Mus81-Eme1 complex is a structure-selective endonuclease with a critical role in the resolution of recombination intermediates during DNA repair after interstrand cross-links, replication fork collapse, or double-strand breaks. To explain the molecular basis of 3' flap substrate recognition and cleavage mechanism by Mus81-Eme1, we determined crystal structures of human Mus81-Eme1 bound to various flap DNA substrates. Mus81-Eme1 undergoes gross substrate-induced conformational changes that reveal two key features: (i) a hydrophobic wedge of Mus81 that separates pre- and post-nick duplex DNA and (ii) a 5' end binding pocket that hosts the 5' nicked end of post-nick DNA. These features are crucial for comprehensive protein-DNA interaction, sharp bending of the 3' flap DNA substrate, and incision strand placement at the active site. While Mus81-Eme1 unexpectedly shares several common features with members of the 5' flap nuclease family, the combined structural, biochemical, and biophysical analyses explain why Mus81-Eme1 preferentially cleaves 3' flap DNA substrates with 5' nicked ends.

    Video-Based Real Time Analysis of Plankton Particle Size Spectrum

    Plankton is one of the most basic components of the marine ecosystem. The community structure and population changes of plankton provide important ecological information that reflects environmental conditions. As a fundamental parameter of plankton community structure, the size spectrum is very useful for evaluating the marine ecosystem. In this paper, we propose a real-time, adaptive algorithm to calculate the size spectrum from underwater plankton video captured by a high-resolution, high-speed optical camera. First, the algorithm screens the high-resolution plankton images to ensure that each plankton is counted once, using its clearest frame. Second, edge detection and morphological methods are applied to extract the plankton areas. Furthermore, we simplify the computation by treating each particle as an ellipse when calculating its volume, from which the size spectrum is obtained. Moreover, to help biologists study plankton in greater depth, we record a clear image region containing each plankton to build a plankton database.
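    The ellipse simplification mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes each detected particle is summarized by the major and minor axis lengths of a fitted ellipse, that the volume is that of a prolate spheroid obtained by rotating the ellipse about its major axis, and that the size spectrum is a count of particles per equivalent-spherical-diameter bin. The function names and bin edges are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def spheroid_volume(major: float, minor: float) -> float:
    """Volume of a prolate spheroid from full ellipse axis lengths (assumption)."""
    a = major / 2.0  # semi-major axis
    b = minor / 2.0  # semi-minor axis
    return 4.0 / 3.0 * math.pi * a * b * b


def equivalent_spherical_diameter(volume: float) -> float:
    """Diameter of the sphere that has the given volume."""
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)


def size_spectrum(
    particles: Sequence[Tuple[float, float]],
    bin_edges: Sequence[float],
) -> List[int]:
    """Count particles per diameter bin; bin i covers [edges[i], edges[i + 1])."""
    counts = [0] * (len(bin_edges) - 1)
    for major, minor in particles:
        esd = equivalent_spherical_diameter(spheroid_volume(major, minor))
        for i in range(len(counts)):
            if bin_edges[i] <= esd < bin_edges[i + 1]:
                counts[i] += 1
                break  # particles outside all bins are ignored
    return counts


# Illustrative (major, minor) axis lengths in arbitrary length units.
particles = [(2.0, 2.0), (4.0, 2.0), (10.0, 10.0)]
spectrum = size_spectrum(particles, bin_edges=[1.0, 3.0, 20.0])
```

    In practice the axis lengths would come from the segmented plankton regions and would need converting from pixels to physical units before binning.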