Search CORE

Multi-Character Field Recognition for Arabic and Chinese Handwriting

Author: Lopresti Daniel
Nagy George
Seth Sharad C.
Zhang Xiaoli
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 27/07/2017

Two methods, Symbolic Indirect Correlation (SIC) and Style Constrained Classification (SCC), are proposed for recognizing handwritten Arabic and Chinese words and phrases. SIC reassembles variable-length segments of an unknown query that match similar segments of labeled reference words. Recognition is based on the correspondence between the order of the feature vectors and that of the lexical transcript in both the query and the references. SIC implicitly incorporates language context in the form of letter n-grams. SCC is based on the notion that the style (distortion or noise) of a character is a good predictor of the distortions arising in other characters, even of a different class, from the same source. It is adaptive in the sense that, given a sufficiently long field, its accuracy converges to that of a style-specific classifier trained on the writer of the unknown query. Neither SIC nor SCC requires the query words to appear among the references.
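The SCC principle can be illustrated with a small sketch. It assumes a hypothetical setting that the abstract does not describe: a discrete set of writing styles with known priors and one-dimensional Gaussian class models for each style. Under that assumption, a style-constrained decision scores each whole label sequence as a sum over styles of the product of per-character likelihoods, so every character in the field must share one style, whereas a singlet classifier marginalizes over style separately for each character. This is a toy illustration of the idea, not the authors' implementation; `singlet_classify`, `field_classify` and the toy models are hypothetical.

```python
import itertools
import math


def gaussian(x, mean, var=1.0):
    """1-D Gaussian density (assumption: one shared variance for all classes and styles)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def classes_of(means):
    # All styles are assumed to model the same set of classes.
    return sorted(next(iter(means.values())))


def singlet_classify(field, means, prior):
    """Label each character on its own, marginalizing over style per character."""
    classes = classes_of(means)
    return [
        max(classes, key=lambda c: sum(prior[s] * gaussian(x, means[s][c]) for s in means))
        for x in field
    ]


def field_classify(field, means, prior):
    """Label the whole field jointly, forcing all characters to share one style."""
    classes = classes_of(means)

    def score(labels):
        # Sum over styles of the product over characters: the shared style couples the labels.
        return sum(
            prior[s] * math.prod(gaussian(x, means[s][c]) for x, c in zip(field, labels))
            for s in means
        )

    # Exhaustive search over label sequences; only practical for short fields.
    return max(itertools.product(classes, repeat=len(field)), key=score)


# Hypothetical toy models: two styles, two classes, one feature per character.
MEANS = {"A": {"0": 0.0, "1": 2.0}, "B": {"0": 2.0, "1": 4.0}}
PRIOR = {"A": 0.5, "B": 0.5}

if __name__ == "__main__":
    sample = [1.9, 0.2]
    print(singlet_classify(sample, MEANS, PRIOR))  # ['0', '0']
    print(field_classify(sample, MEANS, PRIOR))    # ('1', '0')
```

In this toy case the second sample (0.2) points strongly to style A, and under style A the first sample (1.9) reads as class '1', so the joint decision differs from the per-character one.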

DigitalCommons@University of Nebraska

Combining diverse systems for handwritten text line recognition

Author: Bunke Horst
Knerr Stefan
Liwicki Marcus
Pittman James
Publication date: 18/06/2018

In this paper, we present a recognition system for on-line handwritten texts acquired from a whiteboard. The system is based on the combination of several individual classifiers of diverse nature. Recognizers based on different architectures (hidden Markov models and bidirectional long short-term memory networks) and on different sets of features (extracted from on-line and off-line data) are used in the combination. To increase the diversity of the underlying classifiers and fully exploit the current state of the art in cursive handwriting recognition, commercial recognition systems have been included in the combined system, leading to a final word-level accuracy of 86.16%. This value is significantly higher than the performance of the best individual classifier (81.26%).
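Combining recognizers at the word level is commonly done by aligning their outputs and voting, as in ROVER-style schemes. The abstract does not state which combination rule was used, so the sketch below is only an illustration under a simplifying assumption: the hypotheses are already aligned word by word to the same length, which a real system must first achieve through string alignment. `combine_by_voting` is a hypothetical name, and the optional weights stand in for per-recognizer confidence.

```python
from collections import Counter


def combine_by_voting(hypotheses, weights=None):
    """Weighted word-level majority vote over pre-aligned recognizer outputs."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    if weights is None:
        weights = [1.0] * len(hypotheses)
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("hypotheses must already be aligned to the same length")
    combined = []
    for position in range(length):
        votes = Counter()
        for hypothesis, weight in zip(hypotheses, weights):
            votes[hypothesis[position]] += weight
        # most_common orders equal counts by first insertion, so ties go to
        # the earliest-listed recognizer.
        combined.append(votes.most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    outputs = [
        ["the", "quick", "brown", "fox"],
        ["the", "quack", "brown", "fix"],
        ["tho", "quick", "brawn", "fox"],
    ]
    print(combine_by_voting(outputs))  # ['the', 'quick', 'brown', 'fox']
```

Voting lets the combination correct errors that individual recognizers make in different places, which is why classifier diversity matters.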

RERO DOC Digital Library

Examining the effects of sub-word processing units on the time-course of typewriting

Author: Vernon ML
Publication date: 01/08/2019

Contrary to models of speech production and handwriting, models of typewriting lack an account of the processing of sub-word units (i.e., processing that occurs after the writer/speaker has started to output the word). This thesis examines factors that affect the time-course of production of sub-word letter strings. The first series of experiments examined letter-chunking in typewriting. Participants repeatedly typed short letter strings, manipulated for trigram and bigram frequency. Onset latency was shorter for high-frequency bigrams and trigrams relative to low-frequency controls. Latencies were also shorter for the second keystroke in higher-frequency bigrams. These findings can be interpreted as providing strong evidence that (1) higher levels of processing are not limited to preparing individual letters when familiar words are not available, and (2) stored motor plans are available for frequently used bigrams. The second series of experiments addressed whether phonology affects the within-word time-course of typewriting. Participants typed letter strings designed to elicit resyllabification, the adjustment of syllable structure across a word boundary to aid speech articulation (see Levelt, Roelofs, & Meyer, 1999). For example, "bent inwards" is articulated with /tin/ as the second syllable. Participants typed word pairs in which consonant-vowel structure was manipulated across the word boundary such that, if the words were articulated (including internally, as inner speech), resyllabification would or would not occur. The latency of the consonant immediately before the word boundary was shorter in the resyllabification condition than in the control condition. Conversely, keystroke latencies after the word boundary were longer in the resyllabification condition. This is evidence of inner speech influencing the timing of motor production.
In sum, the time-course of typewriting is influenced by sub-word processing units (production is facilitated for high-frequency letter combinations), and motor processing after word output is not, contrary to some current theory, informationally encapsulated, but is instead affected by concurrent, non-motor processing.

Nottingham Trent Institutional Repository (IRep)

Multiple Contributions to Interactive Transcription and Translation of Old Text Documents

Author: Serrano Martínez-Santos Nicolás
Publication venue: Universitat Politecnica de Valencia
Publication date: 22/07/2011

There are huge historical document collections residing in libraries, museums and archives that are currently being digitized for preservation purposes and to make them available worldwide through large online digital libraries. The main objective, however, is not simply to provide access to raw images of digitized documents, but to annotate them with their real informative content, in particular with text transcriptions and, where appropriate, text translations. This work aims to contribute to the development of advanced techniques and interfaces for the analysis, transcription and translation of images of old archive documents, following an interactive-predictive approach.

Serrano Martínez-Santos, N. (2009). Multiple Contributions to Interactive Transcription and Translation of Old Text Documents. http://hdl.handle.net/10251/11272

RiuNet

Sentiment Analysis of Microblogs Using Multilayer Feed-Forward Artificial Neural Networks

Author: Despotovic Vladimir
Tanikic Dejan
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 19/12/2017

Sentiment analysis aims to extract public opinion on a particular topic, and microblogs, especially Twitter as the most influential platform, represent a significant source of information. Applying sentiment analysis to microblogs requires coping with difficulties that do not appear in conventional text documents, such as informal language with abbreviations, Internet jargon, emoticons and hashtags. This paper proposes a sentiment analysis technique for microblogs based on a feed-forward artificial neural network (ANN) with a sigmoid activation function and compares it with other machine learning approaches, namely Multinomial Naive Bayes, Support Vector Machines and Maximum Entropy. Experiments were performed on the Stanford Twitter Sentiment corpus, a balanced dataset whose noisy training labels were weakly annotated using emoticons as sentiment indicators, and on the SemEval-2014 Task 9 corpus, an unbalanced dataset with manually annotated training examples. The results show that the ANN produces results superior, or at least comparable, to those of state-of-the-art machine learning techniques.
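As a rough illustration of the kind of model described, the sketch below trains a one-hidden-layer feed-forward network with sigmoid activations on binary bag-of-words features, using stochastic gradient descent on binary cross-entropy. The abstract does not give the feature representation, network size or training procedure, so the vocabulary, hidden-layer size, learning rate, epoch count and the class name `FeedForwardSentiment` are all hypothetical choices for a toy example using only the Python standard library.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardSentiment:
    """Toy one-hidden-layer feed-forward network with sigmoid activations (hypothetical)."""

    def __init__(self, vocab, hidden=3, seed=0):
        rng = random.Random(seed)
        self.vocab = {word: i for i, word in enumerate(vocab)}
        n_in = len(vocab)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def featurize(self, text):
        # Binary bag-of-words over a fixed vocabulary; unknown tokens are ignored.
        x = [0.0] * len(self.vocab)
        for token in text.lower().split():
            if token in self.vocab:
                x[self.vocab[token]] = 1.0
        return x

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def predict_proba(self, text):
        """Probability that the text is positive."""
        return self.forward(self.featurize(text))[1]

    def train(self, data, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for text, label in data:
                x = self.featurize(text)
                h, p = self.forward(x)
                # Gradient of binary cross-entropy w.r.t. the output pre-activation.
                delta_out = p - label
                # Backpropagate through the hidden sigmoids (derivative h * (1 - h)),
                # using the output weights before they are updated.
                delta_hidden = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= lr * delta_out * hj
                self.b2 -= lr * delta_out
                for j, dj in enumerate(delta_hidden):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj


if __name__ == "__main__":
    data = [("good", 1), ("great", 1), ("good great", 1),
            ("bad", 0), ("awful", 0), ("bad awful", 0)]
    model = FeedForwardSentiment(["good", "great", "bad", "awful"])
    model.train(data)
    for text, _ in data:
        print(text, round(model.predict_proba(text), 3))
```

A real system for tweets would need a much larger vocabulary and preprocessing for the abbreviations, emoticons and hashtags mentioned above; the toy data here only exercises the forward and backward passes.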

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)