1,858 research outputs found

    Clock drawing test digit recognition using static and dynamic features

    Get PDF
    The clock drawing test (CDT) is a standard neurological test for detection of cognitive impairment. A computerised version of the test promises to improve the accessibility of the test in addition to obtaining more detailed data about the subject's performance. Automatic handwriting recognition is one of the first stages in the analysis of the computerised test, which produces a set of recognized digits and symbols together with their positions on the clock face. Subsequently, these are used in the test scoring. This is a challenging problem because the average CDT taker has a high likelihood of cognitive impairment, and writing is one of the first functional activities to be affected. Current handwritten digit recognition system perform less well on this kind of data due to its unintelligibility. In this paper, a new system for numeral handwriting recognition in the CDT is proposed. The system is based on two complementary sources of data, namely static and dynamic features extracted from handwritten data. The main novelty of this paper is the new handwriting digit recognition system, which combines two classifiers—fuzzy k-nearest neighbour for dynamic stroke-based features and convolutional neural network for static image- based features, which can take advantage of both static and dynamic data. The proposed digit recognition system is tested on two sets of data: first, Pendigits online handwriting digits; and second, digits from the actual CDTs. The latter data set came from 65 drawings made by healthy people and 100 drawings reproduced from the drawings by dementia patients. The test on both data sets shows that the proposed combination system can outperform each classifier individually in terms of recognition accuracy, especially when assessing the handwriting of people with dementi

    SAFE: Scale Aware Feature Encoder for Scene Text Recognition

    Full text link
    In this paper, we address the problem of having characters with different scales in scene text recognition. We propose a novel scale aware feature encoder (SAFE) that is designed specifically for encoding characters with different scales. SAFE is composed of a multi-scale convolutional encoder and a scale attention network. The multi-scale convolutional encoder targets at extracting character features under multiple scales, and the scale attention network is responsible for selecting features from the most relevant scale(s). SAFE has two main advantages over the traditional single-CNN encoder used in current state-of-the-art text recognizers. First, it explicitly tackles the scale problem by extracting scale-invariant features from the characters. This allows the recognizer to put more effort in handling other challenges in scene text recognition, like those caused by view distortion and poor image quality. Second, it can transfer the learning of feature encoding across different character scales. This is particularly important when the training set has a very unbalanced distribution of character scales, as training with such a dataset will make the encoder biased towards extracting features from the predominant scale. To evaluate the effectiveness of SAFE, we design a simple text recognizer named scale-spatial attention network (S-SAN) that employs SAFE as its feature encoder, and carry out experiments on six public benchmarks. Experimental results demonstrate that S-SAN can achieve state-of-the-art (or, in some cases, extremely competitive) performance without any post-processing.Comment: ACCV201

    DTW-Radon-based Shape Descriptor for Pattern Recognition

    Get PDF
    International audienceIn this paper, we present a pattern recognition method that uses dynamic programming (DP) for the alignment of Radon features. The key characteristic of the method is to use dynamic time warping (DTW) to match corresponding pairs of the Radon features for all possible projections. Thanks to DTW, we avoid compressing the feature matrix into a single vector which would otherwise miss information. To reduce the possible number of matchings, we rely on a initial normalisation based on the pattern orientation. A comprehensive study is made using major state-of-the-art shape descriptors over several public datasets of shapes such as graphical symbols (both printed and hand-drawn), handwritten characters and footwear prints. In all tests, the method proves its generic behaviour by providing better recognition performance. Overall, we validate that our method is robust to deformed shape due to distortion, degradation and occlusion

    Parking lot monitoring system using an autonomous quadrotor UAV

    Get PDF
    The main goal of this thesis is to develop a drone-based parking lot monitoring system using low-cost hardware and open-source software. Similar to wall-mounted surveillance cameras, a drone-based system can monitor parking lots without affecting the flow of traffic while also offering the mobility of patrol vehicles. The Parrot AR Drone 2.0 is the quadrotor drone used in this work due to its modularity and cost efficiency. Video and navigation data (including GPS) are communicated to a host computer using a Wi-Fi connection. The host computer analyzes navigation data using a custom flight control loop to determine control commands to be sent to the drone. A new license plate recognition pipeline is used to identify license plates of vehicles from video received from the drone

    A New Approach to Synthetic Image Evaluation

    Get PDF
    This study is dedicated to enhancing the effectiveness of Optical Character Recognition (OCR) systems, with a special emphasis on Arabic handwritten digit recognition. The choice to focus on Arabic handwritten digits is twofold: first, there has been relatively less research conducted in this area compared to its English counterparts; second, the recognition of Arabic handwritten digits presents more challenges due to the inherent similarities between different Arabic digits.OCR systems, engineered to decipher both printed and handwritten text, often face difficulties in accurately identifying low-quality or distorted handwritten text. The quality of the input image and the complexity of the text significantly influence their performance. However, data augmentation strategies can notably improve these systems\u27 performance. These strategies generate new images that closely resemble the original ones, albeit with minor variations, thereby enriching the model\u27s learning and enhancing its adaptability. The research found Conditional Variational Autoencoders (C-VAE) and Conditional Generative Adversarial Networks (C-GAN) to be particularly effective in this context. These two generative models stand out due to their superior image generation and feature extraction capabilities. A significant contribution of the study has been the formulation of the Synthetic Image Evaluation Procedure, a systematic approach designed to evaluate and amplify the generative models\u27 image generation abilities. This procedure facilitates the extraction of meaningful features, computation of the Fréchet Inception Distance (LFID) score, and supports hyper-parameter optimization and model modifications

    Fuzzy ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps

    Full text link
    A new neural network architecture is introduced for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, resonance, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units", to met accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (Λ) and the MAX operator (v) of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights correspond to increasing sizes of category "boxes". Smaller vigilance values lead to larger category boxes. Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Four classes of simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; and (iv) a letter recognition database. The Fuzzy ARTMAP system is also compared to Salzberg's NGE system and to Simpson's FMMC system.British Petroleum (89-A-1204); Defense Advanced Research Projects Agency (90-0083); National Science Foundation (IRI 90-00530); Office of Naval Research (N00014-91-J-4100); Air Force Office of Scientific Research (90-0175

    Pattern Recognition

    Get PDF
    A wealth of advanced pattern recognition algorithms are emerging from the interdiscipline between technologies of effective visual features and the human-brain cognition process. Effective visual features are made possible through the rapid developments in appropriate sensor equipments, novel filter designs, and viable information processing architectures. While the understanding of human-brain cognition process broadens the way in which the computer can perform pattern recognition tasks. The present book is intended to collect representative researches around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters coved in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition

    An Analysis on Adversarial Machine Learning: Methods and Applications

    Get PDF
    Deep learning has witnessed astonishing advancement in the last decade and revolutionized many fields ranging from computer vision to natural language processing. A prominent field of research that enabled such achievements is adversarial learning, investigating the behavior and functionality of a learning model in presence of an adversary. Adversarial learning consists of two major trends. The first trend analyzes the susceptibility of machine learning models to manipulation in the decision-making process and aims to improve the robustness to such manipulations. The second trend exploits adversarial games between components of the model to enhance the learning process. This dissertation aims to provide an analysis on these two sides of adversarial learning and harness their potential for improving the robustness and generalization of deep models. In the first part of the dissertation, we study the adversarial susceptibility of deep learning models. We provide an empirical analysis on the extent of vulnerability by proposing two adversarial attacks that explore the geometric and frequency-domain characteristics of inputs to manipulate deep decisions. Afterward, we formalize the susceptibility of deep networks using the first-order approximation of the predictions and extend the theory to the ensemble classification scheme. Inspired by theoretical findings, we formalize a reliable and practical defense against adversarial examples to robustify ensembles. We extend this part by investigating the shortcomings of \gls{at} and highlight that the popular momentum stochastic gradient descent, developed essentially for natural training, is not proper for optimization in adversarial training since it is not designed to be robust against the chaotic behavior of gradients in this setup. Motivated by these observations, we develop an optimization method that is more suitable for adversarial training. In the second part of the dissertation, we harness adversarial learning to enhance the generalization and performance of deep networks in discriminative and generative tasks. We develop several models for biometric identification including fingerprint distortion rectification and latent fingerprint reconstruction. In particular, we develop a ridge reconstruction model based on generative adversarial networks that estimates the missing ridge information in latent fingerprints. We introduce a novel modification that enables the generator network to preserve the ID information during the reconstruction process. To address the scarcity of data, {\it e.g.}, in latent fingerprint analysis, we develop a supervised augmentation technique that combines input examples based on their salient regions. Our findings advocate that adversarial learning improves the performance and reliability of deep networks in a wide range of applications
    • …
    corecore