109,823 research outputs found

    Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition

    Get PDF
    Online handwritten Chinese text recognition (OHCTR) is a challenging problem as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of path signature to translate online pen-tip trajectories into informative signature feature maps using a sliding window-based method, successfully capturing the analytic and geometric properties of pen strokes with strong local invariance and robustness. A multi-spatial-context fully convolutional recurrent network (MCFCRN) is proposed to exploit the multiple spatial contexts from the signature feature maps and generate a prediction sequence while completely avoiding the difficult segmentation problem. Furthermore, an implicit language model is developed to make predictions based on semantic context within a predicting feature sequence, providing a new perspective for incorporating lexicon constraints and prior knowledge about a certain language in the recognition procedure. Experiments on two standard benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with correct rates of 97.10% and 97.15%, respectively, which are significantly better than the best result reported thus far in the literature.Comment: 14 pages, 9 figure

    Print-Scan Resilient Text Image Watermarking Based on Stroke Direction Modulation for Chinese Document Authentication

    Get PDF
    Print-scan resilient watermarking has emerged as an attractive way for document security. This paper proposes an stroke direction modulation technique for watermarking in Chinese text images. The watermark produced by the idea offers robustness to print-photocopy-scan, yet provides relatively high embedding capacity without losing the transparency. During the embedding phase, the angle of rotatable strokes are quantized to embed the bits. This requires several stages of preprocessing, including stroke generation, junction searching, rotatable stroke decision and character partition. Moreover, shuffling is applied to equalize the uneven embedding capacity. For the data detection, denoising and deskewing mechanisms are used to compensate for the distortions induced by hardcopy. Experimental results show that our technique attains high detection accuracy against distortions resulting from print-scan operations, good quality photocopies and benign attacks in accord with the future goal of soft authentication

    Learning Temporal Alignment Uncertainty for Efficient Event Detection

    Full text link
    In this paper we tackle the problem of efficient video event detection. We argue that linear detection functions should be preferred in this regard due to their scalability and efficiency during estimation and evaluation. A popular approach in this regard is to represent a sequence using a bag of words (BOW) representation due to its: (i) fixed dimensionality irrespective of the sequence length, and (ii) its ability to compactly model the statistics in the sequence. A drawback to the BOW representation, however, is the intrinsic destruction of the temporal ordering information. In this paper we propose a new representation that leverages the uncertainty in relative temporal alignments between pairs of sequences while not destroying temporal ordering. Our representation, like BOW, is of a fixed dimensionality making it easily integrated with a linear detection function. Extensive experiments on CK+, 6DMG, and UvA-NEMO databases show significant performance improvements across both isolated and continuous event detection tasks.Comment: Appeared in DICTA 2015, 8 page

    A fast 3-D object recognition algorithm for the vision system of a special-purpose dexterous manipulator

    Get PDF
    A fast 3-D object recognition algorithm that can be used as a quick-look subsystem to the vision system for the Special-Purpose Dexterous Manipulator (SPDM) is described. Global features that can be easily computed from range data are used to characterize the images of a viewer-centered model of an object. This algorithm will speed up the processing by eliminating the low level processing whenever possible. It may identify the object, reject a set of bad data in the early stage, or create a better environment for a more powerful algorithm to carry the work further

    MASCOT: a mechanism for attention-based scale-invariant object recognition in images

    Get PDF
    The efficient management of large multimedia databases requires the development of new techniques to process, characterize, and search for multimedia objects. Especially in the case of image data, the rapidly growing amount of documents prohibits a manual description of the images’ content. Instead, the automated characterization is highly desirable to support annotation and retrieval of digital images. However, this is a very complex and still unsolved task. To contribute to a solution of this problem, we have developed a mechanism for recognizing objects in images based on the query by example paradigm. Therefore, the most salient image features of an example image representing the searched object are extracted to obtain a scale-invariant object model. The use of this model provides an efficient and robust strategy for recognizing objects in images independently of their size. Further applications of the mechanism are classical recognition tasks such as scene decomposition or object tracking in video sequences
    corecore