195 research outputs found

    An Open Source Testing Tool for Evaluating Handwriting Input Methods

    This paper presents an open source tool for testing the recognition accuracy of Chinese handwriting input methods. The tool consists of two modules: a PC client and an Android mobile client. The PC client reads handwritten samples stored on the computer and transfers them one at a time to the Android client over a socket communication protocol. After the Android client receives the data, it simulates the handwriting on the screen of the client device and triggers the corresponding handwriting recognition method. The recognition accuracy is recorded by the Android client. We present the design principles and describe the implementation of the test platform. We construct several test datasets for evaluating different handwriting recognition systems, and conduct an objective and comprehensive test using six Chinese handwriting input methods with five datasets. The test results for the recognition accuracy are then compared and analyzed. Comment: 5 pages, 3 figures, 11 tables. Accepted to appear at ICDAR 201

    Open Set Chinese Character Recognition using Multi-typed Attributes

    Recognition of off-line Chinese characters is still a challenging problem, especially in historical documents: not only is the number of classes extremely large in comparison to contemporary image retrieval methods, but new unseen classes can also be expected under open learning conditions (even for CNN). Chinese character recognition with zero or a few training samples is a difficult problem and has not been studied yet. In this paper, we propose a new Chinese character recognition method based on multi-type attributes, which are derived from the pronunciation, structure and radicals of Chinese characters, and apply it to character recognition in historical books. This intermediate attribute code has a strong advantage over the common 'one-hot' class representation because it allows for understanding complex and unseen patterns symbolically using attributes. First, each character is represented by four groups of attribute types to cover a wide range of character possibilities: Pinyin label, layout structure, number of strokes, three different input methods such as Cangjie, Zhengma and Wubi, as well as a four-corner encoding method. A convolutional neural network (CNN) is trained to learn these attributes. Subsequently, characters can be easily recognized by these attributes using a distance metric and a complete lexicon that is encoded in attribute space. We evaluate the proposed method on two open data sets (printed Chinese characters for zero-shot learning and historical characters for few-shot learning) and one closed set (handwritten Chinese characters). Experimental results show a good general classification of seen classes but also a very promising generalization ability to unseen characters. Comment: 29 pages, submitted to Pattern Recognition

    The Challenges of Recognizing Offline Handwritten Chinese: A Technical Review

    Offline handwritten Chinese recognition is an important research area of pattern recognition, including offline handwritten Chinese character recognition (offline HCCR) and offline handwritten Chinese text recognition (offline HCTR), which are closely related to daily life. With new deep learning techniques and their combination with other domain knowledge, offline handwritten Chinese recognition has gained breakthroughs in methods and performance in recent years. However, there have yet to be articles that provide a technical review of this field since 2016. In light of this, this paper reviews the research progress and challenges of offline handwritten Chinese recognition based on traditional techniques, deep learning methods, methods combining deep learning with traditional techniques, and knowledge from other areas from 2016 to 2022. Firstly, it introduces the research background and status of handwritten Chinese recognition, standard datasets, and evaluation metrics. Secondly, a comprehensive summary and analysis of offline HCCR and offline HCTR approaches during the last seven years is provided, along with an explanation of their concepts, specifics, and performances. Finally, the main research problems in this field over the past few years are presented. The challenges that still exist in offline handwritten Chinese recognition are discussed, aiming to inspire future research work.

    Character-level Convolutional Networks for Text Classification

    This article offers an empirical exploration of the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks. Comment: An early version of this work entitled "Text Understanding from Scratch" was posted in Feb 2015 as arXiv:1502.01710. The present paper has considerably more experimental results and a rewritten introduction, Advances in Neural Information Processing Systems 28 (NIPS 2015)

    InkSight: Offline-to-Online Handwriting Conversion by Learning to Read and Write

    Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in the vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice still favored by a vast majority. Our work, InkSight, aims to bridge the gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as Derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond their training domains. Our approach combines reading and writing priors, allowing training a model in the absence of large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively derenders handwritten text in arbitrary photos with diverse visual characteristics and backgrounds. Furthermore, it generalizes beyond its training domain into simple sketches. Our human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered as a valid tracing of the input image and 67% look like a pen trajectory traced by a human. Interactive visualizations of 100 word-level model outputs for each of the three public datasets are available in our Hugging Face space: https://huggingface.co/spaces/Derendering/Model-Output-Playground. Model release is in progress

    A Longitudinal Analysis of the Development of Mandarin Chinese in Fourth Grade Chinese Immersion

    Many studies have confirmed the benefits of dual language immersion programs. Research into reading and writing development in these programs, and particularly in Chinese immersion, is less common. In this dissertation, an attempt is made to address this gap in research by exploring the literacy development of fourth grade Chinese immersion students. Participants were 70 students, the entire fourth grade of an urban Chinese immersion school in the northeastern U.S. The school had recently made several curricular changes. They were adopting a practice of freewriting, or independent writing. In freewriting, students are encouraged to write as much as they can on a topic using all of their linguistic and meaning-making resources without regard for accuracy. They learn to write for self-expression and for readers (as opposed to writing for feedback). The school, in addition, adopted the Level Chinese reading system as part of an effort to systematize reading instruction and assessment. Lastly, they were actively considering ways to support student writing development through digital technologies. The school also administered annual year-end STAMP 4Se standardized tests of Chinese. The current studies aimed to understand effects of and relations between these curricular approaches. The first study in this dissertation aimed to understand how digital writing using Pinyin input might support development of literacy skills in Chinese immersion. In this study, the effects of a digital text messaging curriculum on freewriting were investigated. It was hypothesized that use of digital Pinyin input would facilitate connections between oral and written language by allowing learners to access vocabulary they could not yet write by hand but could type using Pinyin on an alphabetic keyboard. Students in two classes engaged in text messaging in small groups using digital Pinyin input in online chatrooms for 20 minutes, 3 times per week over an 8-week period. 
A matched group of students in other classes taught by the same teachers completed regular pencil-and-paper word work that focused on analysis of characters during the same time period. Texting with classmates using Pinyin input, when replacing multi-component word work, was negatively associated with freewriting output; that is, students who completed word work did better in freewriting after the texting intervention. Within texting groups, however, children who were successful at texting showed greater gains in freewriting abilities as compared to children with lesser success at texting. Given the importance of digital writing and online learning, the findings indicate that texting should supplement, but not replace, multi-component word work. The second study reported in this dissertation built on the first study by investigating the development of writing, reading, and proficiency in L2 Chinese across the entire school year through a focus on freewriting. Our aim was to better understand how students use Chinese and all of their meaning-making resources in writing, and the relationship between student writing, reading and proficiency. First, student freewrites, collected at 3 time points over the school year, were examined to understand how students deployed their linguistic and meaning-making resources in writing. Students used a combination of correct characters and words written in Pinyin, homophones, English and pictures to fulfill their meaning-making needs in the moment. Proportions of words written in correct Chinese characters increased from 63% to 81% over successive freewrites. Writing ability grew over time, as assessed by diversity of vocabulary in freewrites. Reading ability as assessed by teachers using the Level Chinese system also grew. 
Lastly, we examined relations between classroom measures of writing and reading, participation in the texting curriculum, and language proficiency as measured by end-of-year 4Se standardized assessments of Chinese in the domains of reading, writing, listening and speaking. Classroom measures of reading predicted proficiency across the four domains of reading, writing, listening and speaking, while freewriting also predicted reading and writing proficiency. Students in the texting classes had higher proficiency in speaking, suggesting that digital interaction with peers supported oral communication. Pedagogical implications of the findings will be shared and discussed

    Researching the use of WebCT in Chinese language teaching and learning

    This dissertation aims to fill the void in the literature on Chinese language pedagogy, research of technology use in real language classrooms, and research on heritage learners. The major goal of the research was to investigate diverse learners', especially the heritage learners', needs in order to identify well-grounded pedagogical innovations and desired learning outcomes in technology-integrated Chinese language classrooms. Adopting multiple theoretical frameworks and research methods, the three research articles included in this dissertation were designed to ensure that the learners, tools, and learning tasks were investigated in order to present a comprehensive and systematic look at the design, implementation, and evaluation of the use of Chinese WebCT in college-level Chinese language classes mixed with heritage and non-heritage learners in a U.S. Midwest university. Chapter 2, The Teaching of Chinese to Chinese Americans: A Critical Multicultural Approach, reports on an ethnographic study of six American-born Chinese (ABC) heritage learners, presenting a portrait of these learners and their learning experiences in this White-dominated university. This study also indicates the complexities of the education of heritage learners in such a context. The WebCT online learning environment is seen to provide a useful venue to research heritage learners' learning needs, process, knowledge-sharing and construction, and identity negotiation and development. Chapter 3, Students' Attention to Form in Different Dimensions of Interaction in Chinese WebCT, addresses a learner-centered issue, focus on form (FonF), and its impact on the pedagogical practices in blended Chinese language classes. This study sheds light on Chinese classroom instruction because focus on form itself challenges the traditional mindset of Chinese teachers who tend to assume an authoritative role in the classroom and depend on pre-planned syllabi and textbooks. 
The pedagogical implications of the study illustrate the value of learner-centered learning environments and language pedagogy with technology integration. Chapter 4, Essay Writing in a Chinese WebCT Discussion Board, describes and evaluates a specific learning task: essay writing. This article presents the pedagogical design and evaluation of the appropriateness and effectiveness of an essay-writing task in the Chinese WebCT discussion board for diverse learners. It illustrates the power of a well-designed language learning task using the asynchronous CMC tool in WebCT. Overall, the research findings suggest that (1) with theoretically and pedagogically sound design of the learning tasks and environment, technology is powerful in terms of scaffolding students' language learning, creating an online learning community, and providing effective and innovative classroom instruction; (2) there is value in a mixture of heritage learners and non-heritage learners in classroom teaching and research as long as the learners have many opportunities for interaction; (3) researching my own classroom and teaching practices led to better understanding of my students, the learning process, my pedagogical beliefs, and to improvement of pedagogical practices, which reveals the promise and power of action research.