Search CORE

6,058 research outputs found

Fast and Accurate Neural Word Segmentation for Chinese

Authors: Cai Deng, Huang Feiyue, Wu Yongjian, Xin Yuan, Zhang Zhisong, Zhao Hai
Publication date: 01/01/2017

Neural models with minimal feature engineering have achieved competitive performance against traditional methods for the task of Chinese word segmentation. However, both the training and inference procedures of current neural models are computationally inefficient. This paper presents a greedy neural word segmenter with balanced word and character embedding inputs to alleviate these drawbacks. Our segmenter is truly end-to-end and performs segmentation much faster, and even more accurately, than state-of-the-art neural models on Chinese benchmark datasets.
Comment: To appear in ACL201
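The abstract does not spell out the decoding procedure. As a rough illustration of what greedy (beam size 1) segmentation means, the Python sketch below commits, at each position, to the single highest-scoring candidate word of up to `max_word_len` characters, with no backtracking. The scorer `toy_score` and its lexicon `LEXICON` are hypothetical stand-ins for the paper's neural model, which scores words from word and character embeddings; only the greedy control flow is meant to carry over.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer signature: (previously chosen words, candidate word) -> score.
Scorer = Callable[[Sequence[str], str], float]


def greedy_segment(sentence: str, score_word: Scorer, max_word_len: int = 4) -> List[str]:
    """Segment left to right, committing at each step to the single best next word."""
    words: List[str] = []
    i = 0
    while i < len(sentence):
        # A single character is always a candidate, so segmentation always advances.
        best_word = sentence[i]
        best_score = score_word(words, best_word)
        for length in range(2, min(max_word_len, len(sentence) - i) + 1):
            candidate = sentence[i:i + length]
            score = score_word(words, candidate)
            if score > best_score:
                best_word, best_score = candidate, score
        words.append(best_word)  # greedy: no beam, no backtracking
        i += len(best_word)
    return words


# Hypothetical lexicon-based scorer standing in for a neural model; it ignores the history.
LEXICON = {"我", "爱", "北京", "天安门"}


def toy_score(history: Sequence[str], word: str) -> float:
    if word in LEXICON:
        return float(len(word))  # prefer longer known words
    return 0.0 if len(word) == 1 else -1.0  # unknown single characters are neutral


print(greedy_segment("我爱北京天安门", toy_score))  # ['我', '爱', '北京', '天安门']
```

Because the scorer receives the words chosen so far, a learned model could condition on that history; greedy decoding keeps the cost linear in sentence length times `max_word_len`.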

arXiv.org e-Print Archive

Crossref

Estimating the granularity coefficient of a Potts-Markov random field within an MCMC algorithm

Authors: Batatia Hadj, Dobigeon Nicolas, Pereyra Marcelo, Tourneret Jean-Yves
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/07/2012

This paper addresses the problem of estimating the Potts parameter β jointly with the unknown parameters of a Bayesian model within a Markov chain Monte Carlo (MCMC) algorithm. Standard MCMC methods cannot be applied to this problem because performing inference on β requires computing the intractable normalizing constant of the Potts model. In the proposed MCMC method, β is estimated using a likelihood-free Metropolis-Hastings algorithm. Experimental results on synthetic data show that estimating β jointly with the other unknown parameters yields results as good as those obtained with the true value of β. Conversely, assuming that β is known can significantly degrade estimation performance if the assumed value is incorrect. To illustrate the usefulness of the method, the proposed algorithm is successfully applied to real two-dimensional SAR and three-dimensional ultrasound images.
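A likelihood-free Metropolis-Hastings step sidesteps the Potts normalizing constant by simulating an auxiliary label field at the proposed β and accepting the proposal only if that field's summary statistic is close to the observed one. The Python sketch below is a generic ABC-MCMC version of this idea, not the authors' algorithm, and it rests on assumptions the abstract does not state: a uniform prior on [0, `beta_max`], a Gaussian random-walk proposal with scale `step`, a 4-neighbour K-label Potts field, the number of equal-label neighbour pairs as the summary statistic, a tolerance `eps`, and a few Gibbs sweeps as an approximate sampler. It also estimates β alone from observed labels rather than jointly with other model parameters.

```python
import math
import random
from typing import List

Grid = List[List[int]]
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def homogeneous_pairs(z: Grid) -> int:
    """Summary statistic eta(z): number of 4-neighbour pairs with equal labels."""
    rows, cols = len(z), len(z[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows and z[r][c] == z[r + 1][c]:
                count += 1
            if c + 1 < cols and z[r][c] == z[r][c + 1]:
                count += 1
    return count


def gibbs_sample_potts(beta: float, rows: int, cols: int, k: int,
                       sweeps: int, rng: random.Random) -> Grid:
    """Approximate draw from a K-label Potts field using a few Gibbs sweeps."""
    z = [[rng.randrange(k) for _ in range(cols)] for _ in range(rows)]
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                # Full conditional: p(label) is proportional to exp(beta * #equal neighbours).
                weights = []
                for label in range(k):
                    same = sum(
                        1 for dr, dc in NEIGHBOURS
                        if 0 <= r + dr < rows and 0 <= c + dc < cols
                        and z[r + dr][c + dc] == label
                    )
                    weights.append(math.exp(beta * same))
                z[r][c] = rng.choices(range(k), weights=weights)[0]
    return z


def abc_mh_beta(z_obs: Grid, k: int, n_iter: int, beta_max: float = 2.0,
                step: float = 0.1, eps: int = 5, sweeps: int = 10,
                seed: int = 0) -> List[float]:
    """ABC-MCMC chain for beta under a uniform prior and symmetric proposal."""
    rng = random.Random(seed)
    rows, cols = len(z_obs), len(z_obs[0])
    eta_obs = homogeneous_pairs(z_obs)
    beta = beta_max / 2
    chain: List[float] = []
    for _ in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        # Proposals outside the prior support have zero prior mass and are rejected.
        if 0.0 <= proposal <= beta_max:
            w = gibbs_sample_potts(proposal, rows, cols, k, sweeps, rng)
            # Uniform prior + symmetric proposal: accept iff the statistics are eps-close.
            if abs(homogeneous_pairs(w) - eta_obs) <= eps:
                beta = proposal
        chain.append(beta)
    return chain
```

For example, `abc_mh_beta(z_obs, k=2, n_iter=100)` returns 100 β values for an observed 2-label grid `z_obs`. A smaller `eps` or more `sweeps` brings each step closer to the exact posterior at the cost of a lower acceptance rate and more computation.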

arXiv.org e-Print Archive

CiteSeerX

Crossref

Heriot Watt Pure

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

Explore Bristol Research

Filtered Semi-Markov CRF

Authors: Charnois Thierry, Holat Pierre, Khbir Niama El, Tomeh Nadi, Zaratiana Urchade
Publication date: 29/11/2023

Semi-Markov CRF has been proposed as an alternative to the traditional linear-chain CRF for text segmentation tasks such as Named Entity Recognition (NER). Unlike CRF, which treats text segmentation as token-level prediction, Semi-CRF considers segments as the basic unit, making it more expressive. However, Semi-CRF suffers from two major drawbacks: (1) quadratic complexity in the sequence length, as it operates on every span of the input sequence, and (2) inferior performance compared to CRF on sequence labeling tasks like NER. In this paper, we introduce Filtered Semi-Markov CRF, a variant of Semi-CRF that addresses these issues by incorporating a filtering step that eliminates irrelevant segments, reducing both complexity and the search space. Our approach is evaluated on several NER benchmarks, where it outperforms both CRF and Semi-CRF while being significantly faster. The implementation of our method is available on GitHub (https://github.com/urchade/Filtered-Semi-Markov-CRF).
Comment: EMNLP 2023 (Findings)
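The effect of a filtering step on decoding can be sketched as a semi-Markov Viterbi pass that only considers spans surviving a filter, while every other token is treated as a single-token non-entity ("O") segment. The filter scores, `threshold`, span scores and `o_score` values below are hypothetical inputs rather than the paper's model, and labels are omitted for brevity; the point is that the dynamic program touches each kept span once instead of every span up to some maximum length.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def filter_spans(filter_scores: Dict[Span, float], threshold: float) -> List[Span]:
    """Keep only spans that a (hypothetical) filter scores at or above threshold."""
    return [span for span, score in filter_scores.items() if score >= threshold]


def filtered_semi_markov_decode(n_tokens: int, kept: List[Span],
                                span_score: Dict[Span, float],
                                o_score: List[float]) -> List[Span]:
    """Best segmentation using only kept entity spans plus single-token 'O' segments."""
    by_end: Dict[int, List[Span]] = {}
    for span in kept:
        by_end.setdefault(span[1], []).append(span)
    best = [float("-inf")] * (n_tokens + 1)
    best[0] = 0.0
    back: List[Optional[Span]] = [None] * (n_tokens + 1)  # None means token i-1 is 'O'
    for i in range(1, n_tokens + 1):
        # Option 1: token i-1 forms a single-token 'O' segment.
        best[i] = best[i - 1] + o_score[i - 1]
        # Option 2: a kept entity span ends exactly at position i.
        for start, end in by_end.get(i, []):
            candidate = best[start] + span_score[(start, end)]
            if candidate > best[i]:
                best[i], back[i] = candidate, (start, end)
    # Trace back the entity spans of the highest-scoring segmentation.
    entities: List[Span] = []
    i = n_tokens
    while i > 0:
        span = back[i]
        if span is None:
            i -= 1
        else:
            entities.append(span)
            i = span[0]
    return entities[::-1]


# Toy example: "Barack Obama visited Paris" (4 tokens).
filter_scores = {(0, 2): 0.9, (0, 1): 0.4, (1, 2): 0.3, (3, 4): 0.8, (2, 4): 0.05}
kept = filter_spans(filter_scores, threshold=0.5)  # [(0, 2), (3, 4)]
entities = filtered_semi_markov_decode(
    4, kept, {(0, 2): 3.0, (3, 4): 2.0}, [0.5, 0.5, 1.0, 0.5])
print(entities)  # [(0, 2), (3, 4)]
```

Treating non-entity tokens as fixed single-token segments keeps every position reachable, so decoding stays valid however aggressively the filter prunes.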

arXiv.org e-Print Archive

Which Is Essential for Chinese Word Segmentation: Character versus Word

Authors: Huang Chang-Ning, Zhao Hai
Publication venue: Tsinghua University Press
Publication date: 01/10/2006

PACLIC 20 / Wuhan, China / 1-3 November, 200

Waseda University Repository

Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition

Authors: Jin Lianwen, Lyons Terry, Ni Hao, Sun Zenghui, Xie Zecheng
Publication date: 01/01/2017

Online handwritten Chinese text recognition (OHCTR) is a challenging problem because it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of the path signature to translate online pen-tip trajectories into informative signature feature maps using a sliding-window method, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MCFCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a given language into the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.10% and 97.15%, respectively, significantly better than the best results previously reported in the literature.
Comment: 14 pages, 9 figures
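The path signature of a pen trajectory can be computed incrementally with Chen's identity: for a piecewise-linear path, appending a segment with increment Δ updates the level-1 term as S¹ ← S¹ + Δ and the level-2 term as S² ← S² + S¹ ⊗ Δ + Δ ⊗ Δ / 2, using S¹ from before the update. The Python sketch below computes signature levels 0 to 2 of a 2-D trajectory over a window centred on each point. The truncation level, the window size and the 7-value per-point layout are assumptions not given in the abstract, and the paper additionally renders such signatures as feature maps for the MCFCRN, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signature_level2(points: Sequence[Point]) -> Tuple[List[float], List[List[float]]]:
    """Level-1 and level-2 signature terms of the piecewise-linear path through points."""
    s1 = [0.0, 0.0]
    s2 = [[0.0, 0.0], [0.0, 0.0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = (x1 - x0, y1 - y0)
        # Chen's identity for one linear segment: S2 += S1 (x) d + (d (x) d) / 2.
        for i in range(2):
            for j in range(2):
                s2[i][j] += s1[i] * d[j] + d[i] * d[j] / 2.0
        # Update the level-1 term only after S2 has used the previous S1.
        for i in range(2):
            s1[i] += d[i]
    return s1, s2


def sliding_window_features(points: Sequence[Point], window: int) -> List[List[float]]:
    """Per-point vector [1, S1, S2 flattened] over a window centred on each point."""
    half = window // 2
    features = []
    for t in range(len(points)):
        segment = points[max(0, t - half): t + half + 1]
        s1, s2 = signature_level2(segment)
        features.append([1.0] + s1 + [s2[0][0], s2[0][1], s2[1][0], s2[1][1]])
    return features


# An L-shaped stroke: right, then up.
s1, s2 = signature_level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
levy_area = (s2[0][1] - s2[1][0]) / 2  # signed area between the path and its chord
print(s1, s2, levy_area)  # [1.0, 1.0] [[0.5, 1.0], [0.0, 0.5]] 0.5
```

The antisymmetric part of S² (the Lévy area) is what distinguishes strokes with the same endpoints but different curvature, which is one reason signature features carry geometric information that raw displacements do not.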

arXiv.org e-Print Archive

UCL Discovery

Oxford University Research Archive