    How to separate between Machine-Printed/Handwritten and Arabic/Latin Words?

    This paper gathers several contributions to identifying a script and its nature (handwritten or machine-printed). Different sets of features have been employed successfully to discriminate between handwritten and machine-printed Arabic and Latin scripts. They include well-established features previously used in the literature and new structural features that are intrinsic to Arabic and Latin scripts. The performance of these features is studied throughout this paper. We also compare the performance of five classifiers used to identify the script at word level: Bayes (AODEsr), k-Nearest Neighbor (k-NN), Decision Tree (J48), Support Vector Machine (SVM) and Multilayer Perceptron (MLP). These classifiers were chosen to be sufficiently different from one another to test the contribution of the features. Experiments have been conducted with handwritten and machine-printed words covering a wide range of fonts. Experimental results show the capability of the proposed features to capture differences between scripts and the effectiveness of the three classifiers. Average identification precision and recall rates of 98.72% were achieved using a set of 58 features and the AODEsr classifier, which is slightly better than the rates reported in similar works.
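
The abstract reports averaged precision and recall without stating how the average is taken. As a minimal sketch, assuming a macro-average over the word classes (an assumption, not confirmed by the source), the two rates could be computed as follows. The function name and the class labels in the usage note are hypothetical.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Return (macro precision, macro recall) over the classes in y_true.

    Assumption: each class counts equally in the average (macro-averaging).
    """
    classes = sorted(set(y_true))
    true_count = Counter(y_true)   # number of samples per true class
    pred_count = Counter(y_pred)   # number of samples per predicted class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Precision: correct predictions of a class / all predictions of that class.
    precisions = [hits[c] / pred_count[c] if pred_count[c] else 0.0
                  for c in classes]
    # Recall: correct predictions of a class / all true samples of that class.
    recalls = [hits[c] / true_count[c] for c in classes]

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n
```

For example, with hypothetical labels such as `"AH"` (Arabic handwritten) or `"LP"` (Latin printed), `macro_precision_recall(true_labels, predicted_labels)` returns the two averaged rates as fractions between 0 and 1.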

    Arabic/Latin and Machine-printed/Handwritten Word Discrimination using HOG-based Shape Descriptor

    In this paper, we present an approach for identifying Arabic and Latin scripts and their type (machine-printed or handwritten) based on Histogram of Oriented Gradients (HOG) descriptors. HOGs are first applied at the word level, based on an analysis of writing orientation. They are then extended to partitions of the word image to capture fine, discriminative details. Pyramid HOGs are also used to study their effect at different observation levels of the image. Finally, co-occurrence matrices of HOG are computed to account for spatial information between pairs of pixels, which basic HOG does not take into account. A genetic algorithm is applied to select the potentially informative feature combinations that maximize classification accuracy. The output is a relatively short descriptor that provides an effective input to a Bayes-based classifier. Experimental results on a set of words extracted from standard databases show that our identification system is robust and provides good word script and type identification: 99.07% of words are correctly classified.
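
The first two steps described above (HOG on the whole word image, then on partitions of it) can be illustrated with a simplified sketch. It assumes a grayscale image given as a list of rows, unsigned orientations binned over 0 to 180 degrees, central-difference gradients, and equal-width vertical strips as partitions. None of these choices is taken from the paper, and the pyramid, co-occurrence and genetic-selection stages are not covered. Both function names are hypothetical.

```python
import math


def hog_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations for one image.

    Simplification: the whole image is treated as a single cell.
    """
    rows, cols = len(image), len(image[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the horizontal/vertical gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, 180).
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    # L2 normalisation makes the descriptor insensitive to overall contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def partitioned_hog(image, n_parts=2, n_bins=9):
    """Concatenate the HOG histograms of n_parts vertical strips."""
    cols = len(image[0])
    step = cols // n_parts
    descriptor = []
    for i in range(n_parts):
        start = i * step
        end = cols if i == n_parts - 1 else start + step
        strip = [row[start:end] for row in image]
        descriptor.extend(hog_histogram(strip, n_bins))
    return descriptor
```

Concatenating per-strip histograms keeps coarse positional information that a single word-level histogram discards, which is one plausible reading of why partitions capture finer details.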

    Visual Attention in Dynamic Environments and its Application to Playing Online Games

    In this thesis we present a prototype of Cognitive Programs (CPs), an executive controller built on top of the Selective Tuning (ST) model of attention. CPs enable top-down control of the visual system and interaction between low-level vision and higher-level task demands. We implement a subset of CPs for playing online video games in real time using only visual input. Two commercial closed-source games, Canabalt and Robot Unicorn Attack, are used for evaluation. Their simple gameplay and minimal controls put the emphasis on reaction speed and attention over planning. Our implementation of Cognitive Programs plays both games at human expert level, which experimentally proves the validity of the concept. Additionally, we resolved multiple theoretical and engineering issues, e.g. extending the CPs to dynamic environments, finding suitable data structures for describing the task and the information flow within the network, and determining the correct timing for each process.

    Pattern detection and recognition using over-complete and sparse representations

    Recent research in harmonic analysis and mammalian vision systems has revealed that over-complete and sparse representations play an important role in visual information processing. Applying such representations to pattern recognition and detection problems has become an interesting field of study. The main contribution of this thesis is to propose two feature extraction strategies, the global strategy and the local strategy, that make use of these representations. In the global strategy, over-complete and sparse transformations are applied to the input pattern as a whole and features are extracted in the transformed domain. This strategy has been applied to the problems of rotation-invariant texture classification and script identification, using the Ridgelet transform. Experimental results show better performance than the Gabor multi-channel filtering method and wavelet-based methods. The local strategy is divided into two stages. The first stage analyzes the local over-complete and sparse structure: the input 2-D patterns are divided into patches, and the local over-complete and sparse structure is learned from these patches using sparse approximation techniques. The second stage concerns the application of the local over-complete and sparse structure. For an object detection problem, we propose a sparsity testing technique, where a local over-complete and sparse structure is built to give sparse representations to the text patterns and non-sparse representations to other patterns. Object detection is achieved by identifying patterns that can be sparsely represented by the learned structure. This technique has been applied to detect text in scene images with a recall rate of 75.23% (about 6% improvement compared with other works) and a precision rate of 67.64% (about 12% improvement).
For applications like character or shape recognition, the learned over-complete and sparse structure is combined with a Convolutional Neural Network (CNN). A second text detection method based on this combination is proposed to further improve the accuracy of text detection in scene images (about 11% higher than our first method based on sparsity testing). Finally, this method has been applied to handwritten Farsi numeral recognition, where it obtained a 99.22% recognition rate on the CENPARMI Database and a 99.5% recognition rate on the HODA Database. Meanwhile, an SVM with gradient features achieves recognition rates of 98.98% and 99.22% on these databases, respectively.
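
The sparsity testing idea described above can be sketched as follows: a pattern is accepted if a few dictionary atoms explain most of its energy. This is a minimal illustration that assumes plain matching pursuit as the sparse approximation technique and a relative residual-energy threshold as the test. The thesis may use a different algorithm and criterion, and its dictionary is learned from patches rather than supplied by hand. All names and default values are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_residual_ratio(signal, dictionary, n_atoms):
    """Fraction of signal energy left after n_atoms matching-pursuit steps.

    dictionary: list of unit-norm atoms with the same length as signal.
    """
    energy = dot(signal, signal)
    if energy == 0:
        return 0.0
    residual = list(signal)
    for _ in range(n_atoms):
        # Greedy step: pick the atom most correlated with the residual.
        best = max(dictionary, key=lambda atom: abs(dot(residual, atom)))
        coeff = dot(residual, best)
        # Remove that atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, best)]
    return dot(residual, residual) / energy


def is_sparsely_representable(signal, dictionary, n_atoms=2, max_ratio=0.1):
    """Sparsity test: accept if few atoms leave little unexplained energy."""
    return sparse_residual_ratio(signal, dictionary, n_atoms) <= max_ratio
```

With a toy over-complete dictionary in two dimensions (the two axis vectors plus the unit diagonal), a pattern lying along the diagonal passes the test with a single atom, while a pattern orthogonal to the diagonal does not. In a detector, patches that pass would be labeled as the target class (text) and the rest as background.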

    Efficient Machine Learning Methods for Document Image Analysis

    With the exponential growth in the volume of multimedia content on the internet, there has been increasing interest in developing more efficient and scalable algorithms that learn directly from data without excessive restrictions on the nature of the content. In the context of document images, many large-scale digitization projects have called for reliable and scalable triage methods for enhancement, segmentation, grouping and categorization of captured images. Current approaches, however, are typically limited to a specific class of documents, such as scanned books, newspapers, journal articles or forms, and the analysis and processing of more unconstrained, noisy and heterogeneous document collections has not been as widely addressed. Additionally, existing machine-learning-based approaches to document processing need to be applied carefully to handle the challenges associated with large and imbalanced training data. In this thesis, we address these challenges in three primary applications of document image analysis: low-level document enhancement, mid-level handwritten line segmentation, and high-level classification and retrieval. We first present a data selection method for training Support Vector Machines (SVM) on large-scale data sets. We apply the proposed approach to pixel-level document image enhancement and show promising results with a relatively small number of training samples. Second, we present a graph-based method for segmenting handwritten document images into text lines that is more efficient and adaptive than previous approaches. Our approach demonstrates that combining results from local and global methods enhances the final performance of text-line segmentation. Third, we present an approach to compute structural similarities between images for classification and retrieval. Results on real-world data sets show that the approach is more effective than earlier approaches when the labeled data is limited.
We extend our classification approach to a completely unsupervised setting, where both the number of classes and the representative samples from each class are assumed to be unknown. We present a method for computing similarities based on structural patterns and correlations learned from the given data. Experiments with four different data sets show that our approach can estimate the number of classes in large document collections and group structurally similar images with high accuracy.

    Migrating characters: effective user guidance in instrumented environments

    The work at hand deals with the conceptual design and the realization of virtual characters which, unlike previous work in this research area, are not limited to use in virtual worlds. On the contrary, the presented Migrating Character approach allows virtual characters to act in and interact with the physical world. Different technical solutions that allow a Migrating Character to move through physical space, either completely autonomously or in conjunction with a user, are introduced and discussed, as are the resulting implications for the character's behavior. While traditional virtual characters act in a well-defined virtual world, Migrating Characters need to adapt flexibly to changing environmental setups. A Migrating Character must be capable of detecting these environmental changes by means of sensors. Furthermore, based on this data, the character's behavior has to be adapted adequately. Apart from a theoretical discussion of the enhancements a virtual character needs when taking the step from virtual to real worlds, different exemplary Migrating Character implementations are introduced in the course of the work.