    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled, or not well defined. This situation constrains the learning of efficient classifiers, because the class boundary must be defined using only knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general OCC problem through a taxonomy of study based on the availability of training data, the algorithms used, and the application domains. We further examine each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques, and methodologies, focusing on their significance, limitations, and applications. We conclude by discussing some open research problems in OCC and presenting our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
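To illustrate what "defining the class boundary using only the positive class" means in practice, here is a minimal sketch of one of the simplest one-class methods: a centroid plus a distance radius fitted to positive samples only. This is an assumed, illustrative technique, not one taken from the survey; the function names `fit_centroid_occ` and `predict` and the `quantile` parameter are hypothetical.

```python
import math


def fit_centroid_occ(positives, quantile=0.95):
    """Fit a hypothetical centroid-based one-class model from positive samples only.

    The boundary is a hypersphere around the mean of the positives whose radius
    covers the requested fraction of training points; no negatives are used.
    """
    if not positives:
        raise ValueError("need at least one positive sample")
    n = len(positives)
    dim = len(positives[0])
    # Centroid: per-dimension mean of the positive samples.
    centroid = [sum(x[i] for x in positives) / n for i in range(dim)]
    # Sorted distances of the training positives to the centroid.
    dists = sorted(math.dist(x, centroid) for x in positives)
    # Index of the distance that covers `quantile` of the training points,
    # clamped to a valid position in the list.
    idx = min(n - 1, max(0, math.ceil(quantile * n) - 1))
    return centroid, dists[idx]


def predict(model, x):
    """Return True if x falls inside the learned positive-class boundary."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


model = fit_centroid_occ([(0, 0), (1, 0), (0, 1), (1, 1)])
print(predict(model, (0.5, 0.5)))  # inside the boundary
print(predict(model, (5, 5)))      # treated as an outlier
```

Setting `quantile` below 1.0 deliberately leaves a few training positives outside the boundary, a common way to keep a single noisy sample from inflating the radius.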

    Automatic segmentation and recognition system for handwritten dates on cheques

    This thesis presents the first automatic date processing system developed on a Canadian real-life standard cheque database. The system can process unconstrained handwritten dates written in English or French, and it can also be applied to recognizing handwritten dates with a similar format on many other kinds of documents. A knowledge-based module is proposed for date segmentation, and a new cursive month word recognition system is implemented based on a combination of classifiers. The segmentation and recognition stages interact through a multi-hypothesis generation and evaluation module. In addition, a two-level verification module in the postprocessing stage corrects some errors and rejects invalid results, further improving the reliability of the system. Segmentation of the date zone can be performed in the knowledge-based segmentation module, the multi-hypothesis generation and evaluation module, or the verification module. In the knowledge extraction stage, an effective neural network ensemble system is proposed to differentiate handwritten alphabetic words from numeric strings (A/N). We investigate the use of effective features extensively and propose several new methods for designing neural networks, creating neural network ensembles, and combining the ensembles created. For date recognition, the new cursive month word recognizer is implemented by combining a Hidden Markov Model (HMM) classifier with two Multi-Layer Perceptron (MLP) classifiers.

    Towards robust real-world historical handwriting recognition

    In this thesis, we bridge the past and the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie, which explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital text, historical manuscripts are available only as an extremely diverse collection of (pixel) images. Despite their strong results, current deep learning (DL) methods are very data-hungry and time-consuming, depend heavily on experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse labeled data. We present several approaches to these problems, aiming to improve the robustness of current methods and the autonomy of training. We applied our novel word and line text recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluating each training epoch without the need for labeled data.
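The abstract credits ensemble voting of five networks for its accuracy but does not state the exact voting rule. The sketch below shows plain plurality voting over the word transcriptions produced by several recognizers, assuming each network outputs one string per word image and assuming ties are broken in favor of the earliest-listed network. The function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine one transcription per network into a single ensemble output.

    Hypothetical plurality vote: the most frequent label wins; on a tie, the
    label that appears first in `predictions` (earliest network) is chosen.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in network order so ties resolve to the earliest network's label.
    for label in predictions:
        if counts[label] == best:
            return label


# Five networks reading the same word image (illustrative strings only).
print(majority_vote(["kapal", "kapal", "kapel", "kopal", "kapal"]))
```

An odd ensemble size such as five avoids two-way ties between two labels, although ties can still occur when the votes split across three or more labels, which is why an explicit tie-breaking rule is included.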