2,663 research outputs found

    Radical Recognition in Off-Line Handwritten Chinese Characters Using Non-Negative Matrix Factorization

    Get PDF
    In the past decade, handwritten Chinese character recognition has received renewed interest with the emergence of touch screen devices. Other popular applications include on-line Chinese character dictionary look-up and visual translation in mobile phone applications. Due to the complex structure of Chinese characters, this classification task is not exactly an easy one, as it involves knowledge from mathematics, computer science, and linguistics. Given a large image database of handwritten character data, the goal of my senior project is to use Non-Negative Matrix Factorization (NMF), a recent method for finding a suitable representation (parts-based representation) of image data, to detect specific sub-components in Chinese characters. NMF has only been applied to typed (printed) Chinese characters in different fonts. This project focuses specifically on how well NMF works on handwritten characters. In addition, research in Chinese character classification has mainly been done using holistic approaches - treating each character as an inseparable unit. By using NMF, this project takes a different approach by focusing on a more specific problem in Chinese character classification: radical (sub-component) detection. Finally, a possible application of radical detection will be proposed. This interactive application can potentially help Chinese language learners better recognize characters by radicals

    Instantiating deformable models with a neural net

    Get PDF
    Deformable models are an attractive approach to recognizing objects which have considerable within-class variability such as handwritten characters. However, there are severe search problems associated with fitting the models to data which could be reduced if a better starting point for the search were available. We show that by training a neural network to predict how a deformable model should be instantiated from an input image, such improved starting points can be obtained. This method has been implemented for a system that recognizes handwritten digits using deformable models, and the results show that the search time can be significantly reduced without compromising recognition performance. © 1997 Academic Press

    Character Recognition

    Get PDF
    Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field

    Using generative models for handwritten digit recognition

    Get PDF
    We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian ``ink generators'' spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. (1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style. (2) During the process of explaining the image, generative models can perform recognition driven segmentation. (3) The method involves a relatively small number of parameters and hence training is relatively easy and fast. (4) Unlike many other recognition schemes it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is it requires much more computation than more standard OCR techniques

    A New Approach to Synthetic Image Evaluation

    Get PDF
    This study is dedicated to enhancing the effectiveness of Optical Character Recognition (OCR) systems, with a special emphasis on Arabic handwritten digit recognition. The choice to focus on Arabic handwritten digits is twofold: first, there has been relatively less research conducted in this area compared to its English counterparts; second, the recognition of Arabic handwritten digits presents more challenges due to the inherent similarities between different Arabic digits.OCR systems, engineered to decipher both printed and handwritten text, often face difficulties in accurately identifying low-quality or distorted handwritten text. The quality of the input image and the complexity of the text significantly influence their performance. However, data augmentation strategies can notably improve these systems\u27 performance. These strategies generate new images that closely resemble the original ones, albeit with minor variations, thereby enriching the model\u27s learning and enhancing its adaptability. The research found Conditional Variational Autoencoders (C-VAE) and Conditional Generative Adversarial Networks (C-GAN) to be particularly effective in this context. These two generative models stand out due to their superior image generation and feature extraction capabilities. A significant contribution of the study has been the formulation of the Synthetic Image Evaluation Procedure, a systematic approach designed to evaluate and amplify the generative models\u27 image generation abilities. This procedure facilitates the extraction of meaningful features, computation of the Fréchet Inception Distance (LFID) score, and supports hyper-parameter optimization and model modifications

    Knowledge Elicitation in Deep Learning Models

    Get PDF
    Embora o aprendizado profundo (mais conhecido como deep learning) tenha se tornado uma ferramenta popular na solução de problemas modernos em vários domínios, ele apresenta um desafio significativo - a interpretabilidade. Esta tese percorre um cenário de elicitação de conhecimento em modelos de deep learning, lançando luz sobre a visualização de características, mapas de saliência e técnicas de destilação. Estas técnicas foram aplicadas a duas arquiteturas: redes neurais convolucionais (CNNs) e um modelo de pacote (Google Vision). A nossa investigação forneceu informações valiosas sobre a sua eficácia na elicitação e interpretação do conhecimento codificado. Embora tenham demonstrado potencial, também foram observadas limitações, sugerindo espaço para mais desenvolvimento neste campo. Este trabalho não só realça a necessidade de modelos de deep learning mais transparentes e explicáveis, como também impulsiona o desenvolvimento de técnicas para extrair conhecimento. Trata-se de garantir uma implementação responsável e enfatizar a importância da transparência e compreensão no aprendizado de máquina. Além de avaliar os métodos existentes, esta tese explora também o potencial de combinar múltiplas técnicas para melhorar a interpretabilidade dos modelos de deep learning. Uma mistura de visualização de características, mapas de saliência e técnicas de destilação de modelos foi usada de uma maneira complementar para extrair e interpretar o conhecimento das arquiteturas escolhidas. Os resultados experimentais destacam a utilidade desta abordagem combinada, revelando uma compreensão mais abrangente dos processos de tomada de decisão dos modelos. Além disso, propomos um novo modelo para a elicitação sistemática de conhecimento em deep learning, que integra de forma coesa estes métodos. Este quadro demonstra o valor de uma abordagem holística para a interpretabilidade do modelo, em vez de se basear num único método. Por fim, discutimos as implicações éticas do nosso trabalho. À medida que os modelos de deep learning continuam a permear vários setores, desde a saúde até às finanças, garantir que as suas decisões são explicáveis e justificadas torna-se cada vez mais crucial. A nossa investigação sublinha esta importância, preparando o terreno para a criação de sistemas de inteligência artificial mais transparentes e responsáveis no futuro.Though a buzzword in modern problem-solving across various domains, deep learning presents a significant challenge - interpretability. This thesis journeys through a landscape of knowledge elicitation in deep learning models, shedding light on feature visualization, saliency maps, and model distillation techniques. These techniques were applied to two deep learning architectures: convolutional neural networks (CNNs) and a black box package model (Google Vision). Our investigation provided valuable insights into their effectiveness in eliciting and interpreting the encoded knowledge. While they demonstrated potential, limitations were also observed, suggesting room for further development in this field. This work does not just highlight the need for more transparent, more explainable deep learning models, it gives a gentle nudge to developing innovative techniques to extract knowledge. It is all about ensuring responsible deployment and emphasizing the importance of transparency and comprehension in machine learning. In addition to evaluating existing methods, this thesis also explores the potential for combining multiple techniques to enhance the interpretability of deep learning models. A blend of feature visualization, saliency maps, and model distillation techniques was used in a complementary manner to extract and interpret the knowledge from our chosen architectures. Experimental results highlight the utility of this combined approach, revealing a more comprehensive understanding of the models' decision-making processes. Furthermore, we propose a novel framework for systematic knowledge elicitation in deep learning, which cohesively integrates these methods. This framework showcases the value of a holistic approach toward model interpretability rather than relying on a single method. Lastly, we discuss the ethical implications of our work. As deep learning models continue to permeate various sectors, from healthcare to finance, ensuring their decisions are explainable and justified becomes increasingly crucial. Our research underscores this importance, laying the groundwork for creating more transparent, accountable AI systems in the future
    corecore