Large vocabulary recognition for online Turkish handwriting with sublexical units

Authors: Bilgin Taşdemir Esma Fatıma; Yanıkoğlu Berrin
Publication venue: 'The Scientific and Technological Research Council of Turkey'
Publication date: 28/09/2018

We present a system for large vocabulary recognition of online Turkish handwriting, using hidden Markov models. While using a traditional approach for the recognizer, we have identified and developed solutions for the main problems specific to Turkish handwriting recognition. First, since large amounts of Turkish handwriting samples are not available, the system is trained and optimized using the large UNIPEN dataset of English handwriting, before extending it to Turkish using a small Turkish dataset. The delayed strokes, which pose a significant source of variation in writing order due to the large number of diacritical marks in Turkish, are removed during preprocessing. Finally, as a solution to the high out-of-vocabulary rates encountered when using a fixed size lexicon in general purpose recognition, a lexicon is constructed from sublexical units (stems and endings) learned from a large Turkish corpus. A statistical bigram language model learned from the same corpus is also applied during the decoding process. The system obtains a 91.7% word recognition rate when tested on a small Turkish handwritten word dataset using a medium sized (1950 words) lexicon corresponding to the vocabulary of the test set and 63.8% using a large, general purpose lexicon (130,000 words). However, with the proposed stem+ending lexicon (12,500 words) and bigram language model with lattice expansion, a 67.9% word recognition accuracy is obtained, surpassing the results obtained with the general purpose lexicon while using a much smaller one

Sabanci University Research Database
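The stem+ending lexicon idea in the abstract above can be illustrated with a toy sketch. It is not the authors' system: the stems and endings below are hand-picked for illustration (the paper learns its units from a large corpus), and the function names `covered` and `oov_rate` are hypothetical. The sketch only shows how a small set of sublexical units can cover many surface forms and so lower the out-of-vocabulary rate.

```python
def covered(word, stems, endings):
    """True if the word is a bare stem or a stem followed by exactly one ending."""
    if word in stems:
        return True
    return any(word[:i] in stems and word[i:] in endings
               for i in range(1, len(word)))


def oov_rate(words, stems, endings):
    """Fraction of words that the stem+ending lexicon cannot produce."""
    missing = [w for w in words if not covered(w, stems, endings)]
    return len(missing) / len(words)


# Hand-picked toy units (assumption, not taken from the paper).
stems = {"ev", "kitap"}                  # "house", "book"
endings = {"ler", "lar", "de", "lerde"}  # plural and locative endings

test_words = ["ev", "evler", "evlerde", "kitaplar", "okul"]
rate = oov_rate(test_words, stems, endings)  # "okul" has no matching stem
```

Six units cover four of the five test words here. A whole-word lexicon would need one entry per inflected form, which is why agglutinative languages such as Turkish push fixed-size lexicons toward high out-of-vocabulary rates.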

Pattern Recognition for Command and Control Data Systems

Author: Schwier Jason
Publication venue: Clemson University Libraries
Publication date: 01/08/2009

To analyze real-world events, researchers collect observation data from an underlying process and construct models to represent the observed situation. In this work, we consider issues that affect the construction and usage of a specific type of model. Markov models are commonly used because their combination of discrete states and stochastic transitions is suited to applications with both deterministic and stochastic components. Hidden Markov Models (HMMs) are a class of Markov model commonly used in pattern recognition. We first demonstrate how to construct HMMs using only the observation data, and no a priori information, by extending a previously developed approach from J.P. Crutchfield and C.R. Shalizi. We also show how to determine with a level of statistical confidence whether or not the model fully encapsulates the underlying process. Once models are constructed from observation data, the models are used to identify other types of observations. Traditional approaches consider the maximum likelihood that the model matches the observation, solving a classification problem. We present a new method using confidence intervals and receiver operating characteristic curves. Our method solves a detection problem by determining if observation data matches zero, one, or more than one model. To detect the occurrence of a behavior in observation data, one must consider the amount of data required. We consider behaviors to be 'serial Markovian' when the behavior can change from one model to another at any time. When analyzing observation data, considering too much data induces high delay and could lead to confusion in the system if multiple behaviors are observed in the data stream. If too little data is used, the system has a high false positive rate and is unable to correctly detect behaviors. We demonstrate the effectiveness of all methods using illustrative examples and consumer behavior data.

Clemson University: TigerPrints
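The difference between classification (pick the single most likely model) and detection (report zero, one, or several matching models) described in the abstract above can be sketched as follows. This is a simplified illustration, not the thesis's method: the thesis sets its decision using confidence intervals and ROC curves, whereas this sketch substitutes a fixed threshold on the per-symbol log-likelihood. The model parameters, the threshold value, and the names `forward_loglik` and `detect` are all assumptions.

```python
import math


def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of a non-empty symbol sequence under a discrete HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik


def detect(obs, models, threshold):
    """Names of every model whose per-symbol log-likelihood exceeds the
    threshold; the result may be empty or contain several names."""
    return [name for name, (start, trans, emit) in models.items()
            if forward_loglik(obs, start, trans, emit) / len(obs) > threshold]


# Toy two-state models over the symbols {0, 1} (illustrative values only).
uniform = [[0.5, 0.5], [0.5, 0.5]]
models = {
    "mostly0": ([0.5, 0.5], uniform, [[0.9, 0.1], [0.8, 0.2]]),
    "mostly1": ([0.5, 0.5], uniform, [[0.1, 0.9], [0.2, 0.8]]),
}
zeros = [0] * 10
alternating = [0, 1] * 5

one_match = detect(zeros, models, threshold=-0.8)
no_match = detect(alternating, models, threshold=-0.8)
```

A maximum-likelihood classifier would always return exactly one of the two model names, even for the alternating sequence that neither model explains well. The thresholded detector returns an empty list in that case.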

Reconnaissance de l'écriture manuscrite en-ligne par approche combinant systèmes à vastes marges et modèles de Markov cachés

Author: Ahmad Abdul Rahim
Publication venue: HAL CCSD
Publication date: 29/12/2008

Handwriting recognition is one of the leading applications of pattern recognition and machine learning. Despite some limitations, handwriting recognition systems have been used as an input method for many electronic devices and help automate many manual tasks that require processing of handwriting images. In general, a handwriting recognition system comprises three functional components: preprocessing, recognition and post-processing. There have been improvements made within each component in the system. However, to further open the avenues of expanding its applications, specific improvements need to be made in the recognition capability of the system. The Hidden Markov Model (HMM) has been the dominant recognition method in both offline and online handwriting recognition systems. However, the use of Gaussian observation densities in HMMs and of a representational model for word modeling often does not lead to good classification. Hybrids of Neural Networks (NN) and HMMs later improved word recognition by taking advantage of the NN's discriminative property and the HMM's representational capability. However, the use of NNs does not optimize recognition capability, as the Empirical Risk Minimization (ERM) principle used in their training leads to poor generalization. In this thesis, we focus on improving the recognition capability of a cursive online handwritten word recognition system by using an emerging method in machine learning, the support vector machine (SVM). We first evaluated SVM in an isolated character recognition environment using the IRONOFF and UNIPEN character databases. SVM, through its use of the principle of structural risk minimization (SRM), allowed simultaneous optimization of the representational and discriminative capability of the character recognizer. We finally demonstrate the various practical issues in using SVM within a hybrid setting with HMM. In addition, we tested the hybrid system on the IRONOFF word database and obtained favourable results.

Our work concerns handwriting recognition, one of the favoured domains for pattern recognition and machine learning algorithms. In online handwriting, the applications cover all input devices that let a user communicate transparently with information systems. In this context, our work contributes a new architecture for recognizing handwritten words without style constraints. It belongs to the family of hybrid local/global approaches, in which the segmentation/recognition paradigm is resolved through the complementarity of a discriminative recognition system acting at the character level and a model-based system supervising the global level. We chose Support Vector Machines (SVM) for the character classifier and dynamic programming algorithms derived from Hidden Markov Model (HMM) modelling. This SVM/HMM combination is unique in the field of handwriting recognition. Experiments were carried out first on isolated character recognition and then on the IRONOFF database of cursive words. They showed the superiority of the SVM approaches over the convolutional neural network solutions (Time Delay Neural Network) that we had developed previously, and their good behaviour in word recognition.

Data clustering using the Bees Algorithm and the Kd-tree structure

Author: Al-Jabbouli Hasan
Publication date: 01/01/2009

Data clustering has been studied intensively during the past decade. The K-means and C-means algorithms are the most popular of clustering techniques. The former algorithm is suitable for 'crisp' clustering and the latter, for 'fuzzy' clustering. Clustering using the K-means or C-means algorithms generally is fast and produces good results. Although these algorithms have been successfully implemented in several areas, they still have a number of limitations. The main aim of this work is to develop flexible data management strategies to address some of those limitations and improve the performance of the algorithms. The first part of the thesis introduces improvements to the K-means algorithm. A flexible data structure was applied to help the algorithm to find stable results and to decrease the number of nearest neighbour queries needed to assign data points to clusters. The method has overcome most of the deficiencies of the K-means algorithm. The second and third parts of the thesis present two new clustering algorithms that are capable of locating near optimal solutions efficiently. The proposed algorithms combine the simplicity of the K-means algorithm and the C-means algorithm with the capability of a new optimisation method called the Bees Algorithm to avoid local optima in crisp and fuzzy clustering, respectively. Experimental results for different data sets have demonstrated that the new clustering algorithms produce better performances than those of other algorithms based upon combining an evolutionary optimisation tool and the K-means and C-means clustering methods. The fourth part of this thesis presents an improvement to the basic Bees Algorithm by applying the concept of recursion to reduce the randomness of its local search procedure. The improved Bees Algorithm was applied to crisp and fuzzy data clustering of several data sets. 
The results obtained confirm the superior performance of the new algorithm.

EThOS - Electronic Theses Online Service
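To make the combination of crisp clustering and the Bees Algorithm more concrete, here is a minimal sketch under stated assumptions. It follows the commonly described basic Bees Algorithm (scouts, best sites, recruited neighbourhood search, shrinking patch size) applied to the K-means objective on 1-D data. It is not the thesis's algorithm: it omits the Kd-tree structure and the recursive local search, and the parameter values, data, and function names (`sse`, `bees_cluster`) are all hypothetical.

```python
import random


def sse(centres, data):
    """K-means objective: sum of squared distances to the nearest centre."""
    return sum(min((x - c) ** 2 for c in centres) for x in data)


def bees_cluster(data, k, n_scouts=20, n_best=5, n_recruits=10,
                 radius=1.0, shrink=0.9, iterations=50, seed=0):
    """Search for k cluster centres with a basic Bees Algorithm."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)

    def random_solution():
        return [rng.uniform(lo, hi) for _ in range(k)]

    def neighbour(solution, r):
        # Perturb every centre within +/- r, clipped to the data range.
        return [min(hi, max(lo, c + rng.uniform(-r, r))) for c in solution]

    population = [random_solution() for _ in range(n_scouts)]
    for _ in range(iterations):
        population.sort(key=lambda s: sse(s, data))
        next_population = []
        for site in population[:n_best]:
            # Recruited bees search around each selected site; keeping the
            # site itself as a candidate means a site never gets worse.
            candidates = [site] + [neighbour(site, radius) for _ in range(n_recruits)]
            next_population.append(min(candidates, key=lambda s: sse(s, data)))
        # Remaining bees scout at random, which helps escape local optima.
        next_population += [random_solution() for _ in range(n_scouts - n_best)]
        population = next_population
        radius *= shrink  # shrink the neighbourhood for finer local search
    return min(population, key=lambda s: sse(s, data))


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centres = sorted(bees_cluster(data, k=2))
```

For this toy data the best possible objective value is 4, reached with centres at 2 and 11. The random scouts play the role that repeated re-initialisation plays in plain K-means, while the recruited bees refine the promising sites.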

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

OpenGrey Repository