Radical Recognition in Off-Line Handwritten Chinese Characters Using Non-Negative Matrix Factorization

Author: Shuai Xiangying
Publication venue: Bard Digital Commons
Publication date: 01/01/2016

In the past decade, handwritten Chinese character recognition has received renewed interest with the emergence of touch screen devices. Other popular applications include on-line Chinese character dictionary look-up and visual translation in mobile phone applications. Because Chinese characters have a complex structure, this classification task is difficult and draws on knowledge from mathematics, computer science, and linguistics. Given a large image database of handwritten character data, the goal of my senior project is to use Non-Negative Matrix Factorization (NMF), a recent method for finding a parts-based representation of image data, to detect specific sub-components in Chinese characters. NMF has previously been applied only to typed (printed) Chinese characters in different fonts; this project focuses specifically on how well NMF works on handwritten characters. In addition, research in Chinese character classification has mainly used holistic approaches, which treat each character as an inseparable unit. By using NMF, this project takes a different approach and focuses on a more specific problem in Chinese character classification: radical (sub-component) detection. Finally, a possible application of radical detection is proposed: an interactive application that could help Chinese language learners better recognize characters by their radicals.
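
As a rough illustration of the parts-based representation mentioned in this abstract, the sketch below factorizes a small non-negative matrix V (one flattened image per column) into W (parts) and H (per-image weights) using the standard multiplicative update rules for the squared-error objective. It is not taken from the project: the function names, the toy data, the rank, and the iteration count are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate V (n x m) as W (n x rank) times H (rank x m), all non-negative.

    Hypothetical helper for illustration; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy data (assumed): four 4-pixel "images" built from two parts,
    # [1, 1, 0, 0] and [0, 0, 1, 1]. Rows are pixels, columns are images.
    toy_images = [
        [1, 0, 1, 2],
        [1, 0, 1, 2],
        [0, 1, 1, 1],
        [0, 1, 1, 1],
    ]
    w, h = nmf(toy_images, rank=2)
    print("reconstruction error:", frobenius_error(toy_images, w, h))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what allows the columns of W to be read as additive parts. Real character images would need far larger matrices and a numerical library.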

Bard College

Boosting Named Entity Recognition with Neural Character Embeddings

Authors: Guimarães Victor; Santos Cicero Nogueira dos
Publication date: 01/01/2015

Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level representations (embeddings) to perform sequential classification. We perform an extensive number of experiments using two annotated corpora in two different languages: the HAREM I corpus, which contains texts in Portuguese; and the SPA CoNLL-2002 corpus, which contains texts in Spanish. Our experimental results shed light on the contribution of neural character embeddings for NER. Moreover, we demonstrate that the same neural network which has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independent NER, using the same hyperparameters, and without any handcrafted features. For the HAREM I corpus, CharWNN outperforms the state-of-the-art system by 7.9 points in the F1-score for the total scenario (ten NE classes), and by 7.2 points in the F1-score for the selective scenario (five NE classes). Comment: 9 pages
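
For readers unfamiliar with the F1-score in which these gains are reported, here is a minimal sketch of an entity-level precision, recall, and F1 computation under an exact-match assumption. It is a generic illustration, not the official HAREM or CoNLL-2002 scorer; the `(start, end, label)` span encoding and the function name `entity_f1` are assumptions made for the example.

```python
def entity_f1(gold, predicted):
    """Return (precision, recall, f1) for two collections of entity spans.

    Each span is a hypothetical (start, end, label) tuple; a prediction
    counts as correct only if it matches a gold span exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")]
    predicted = [(0, 2, "PER"), (5, 6, "ORG")]  # second span has the wrong label
    print(entity_f1(gold, predicted))
```

A gain of "7.9 points" corresponds to a difference of 0.079 on this 0-to-1 scale, assuming scores are reported as percentages.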

arXiv.org e-Print Archive

Crossref