
    Unconstrained Scene Text and Video Text Recognition for Arabic Script

    Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantities of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation builds on the model introduced in [37], which has proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence-to-sequence transcription approach: the network transcribes a sequence of convolutional features from the input image into a sequence of target labels. This does away with the need to segment the input image into constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.
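    The segmentation-free pipeline described above is the CRNN pattern: a CNN turns the image into a feature sequence along its width, a bidirectional RNN models context, and a CTC-style loss aligns outputs with unsegmented labels. Below is a minimal PyTorch sketch of that idea; layer sizes, the class count, and all names are illustrative assumptions, not the paper's exact architecture.

        import torch
        import torch.nn as nn

        class CRNN(nn.Module):
            def __init__(self, num_classes: int, img_height: int = 32):
                super().__init__()
                # CNN: shrinks height while keeping width as the time axis.
                self.cnn = nn.Sequential(
                    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d((2, 1), (2, 1)),  # halve height only
                )
                feat_h = img_height // 8
                # Bidirectional RNN models contextual dependencies along the width.
                self.rnn = nn.LSTM(256 * feat_h, 256, bidirectional=True,
                                   batch_first=True)
                self.fc = nn.Linear(512, num_classes)  # per-timestep label scores

            def forward(self, x):  # x: (B, 1, H, W) grayscale text-line images
                f = self.cnn(x)                                  # (B, C, H', W')
                b, c, h, w = f.shape
                f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # width -> time
                out, _ = self.rnn(f)
                return self.fc(out)          # (B, T, num_classes), fed to CTCLoss

        model = CRNN(num_classes=100)               # e.g. Arabic glyphs + blank
        logits = model(torch.randn(2, 1, 32, 128))  # -> (2, 32, 100)

    Because a CTC-style loss marginalizes over alignments, the training targets are plain label sequences with no per-character bounding boxes, which is exactly what removes the segmentation step.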

    Connectionist models and figurative speech

    This paper opens with an introduction to connectionist models and then focuses on the question of how novel figurative usages of descriptive adjectives may be interpreted in a structured connectionist model of conceptual combination. The suggestion is that inferences drawn from an adjective's use in familiar contexts form the basis for all possible interpretations of the adjective in a novel context. The more plausible of the possibilities, it is speculated, are reinforced by some form of one-shot learning, rendering the interpretative process obsolete after only one (memorable) encounter with a novel figure of speech.
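    As a rough illustration of the one-shot reinforcement idea, the toy Python sketch below scores candidate interpretations of an adjective (carried over from familiar contexts) against a novel context and then boosts the winner so the interpretive search is skipped on the next encounter. The adjective, the inference labels, and all numbers are hypothetical, not taken from the paper's model.

        # Toy model: inference -> association strength learned from familiar
        # contexts of the adjective "sharp" (all values hypothetical).
        interpretations = {
            "causes cuts": 0.9,     # from "sharp knife"
            "high pitched": 0.6,    # from "sharp note"
            "mentally quick": 0.7,  # from "sharp student"
        }

        def interpret(weights, context_plausibility, boost=0.5):
            """Score each carried-over inference by its plausibility in the
            novel context, then reinforce the winner (one-shot learning)."""
            scores = {k: w * context_plausibility.get(k, 0.1)
                      for k, w in weights.items()}
            best = max(scores, key=scores.get)
            weights[best] = min(1.0, weights[best] + boost)  # one-shot boost
            return best

        # Novel figure of speech: "a sharp reply" in a conversational context.
        print(interpret(interpretations, {"mentally quick": 0.8,
                                          "causes cuts": 0.3}))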

    A Binary Neural Shape Matcher using Johnson Counters and Chain Codes

    In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson counter codes coupled with chain codes. Shape matching is a fundamental requirement in content-based image retrieval systems. Chain codes describe shapes as sequences of numbers and are simple and flexible. We couple this representation with the efficiency and flexibility of a binary associative-memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network, and demonstrate how the binary associative-memory neural network can index and match chain codes whose elements are represented by Johnson codes.
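    To make the encoding concrete, the sketch below maps each Freeman chain-code direction (0-7) to a 4-bit Johnson-counter state, concatenates the codes into one binary vector per shape, and matches by binary overlap. The toy dot-product memory stands in for the paper's binary associative-memory network; the shape names and chains are made up.

        import numpy as np

        def johnson_code(direction: int, n: int = 4) -> np.ndarray:
            """d-th state of an n-bit Johnson counter (2n = 8 states for n=4).
            Neighbouring directions differ in a single bit."""
            state = np.zeros(n, dtype=np.uint8)
            for _ in range(direction):
                # Johnson counter step: shift right, feed back inverted LSB.
                state = np.concatenate(([1 - state[-1]], state[:-1]))
            return state

        def encode_shape(chain):
            """Concatenate per-direction Johnson codes into one binary vector."""
            return np.concatenate([johnson_code(d) for d in chain])

        # Store two toy 8-step chain codes in the "memory".
        shapes = {"square": [0, 0, 2, 2, 4, 4, 6, 6],
                  "diamond": [1, 1, 3, 3, 5, 5, 7, 7]}
        memory = {name: encode_shape(c) for name, c in shapes.items()}

        def match(chain):
            """Recall the stored shape with the largest binary overlap."""
            q = encode_shape(chain)
            return max(memory, key=lambda name: int(memory[name] @ q))

        print(match([0, 0, 2, 2, 4, 4, 6, 7]))  # noisy square -> "square"

    The single-bit spacing of adjacent Johnson states is what gives the matcher its tolerance: a small perturbation of one direction changes only one bit of the query vector.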

    Hypothesis-driven Online Video Stream Learning with Augmented Memory

    The ability to continuously acquire new knowledge without forgetting previous tasks remains a challenging problem for computer vision systems. Standard continual learning benchmarks focus on learning from static, i.i.d. images in an offline setting. Here, we examine a more challenging and realistic online continual learning problem called online stream learning. Like humans, AI agents in this setting have to learn incrementally from a continuous temporal stream of non-repeating data. We propose a novel model, the Hypotheses-driven Augmented Memory Network (HAMN), which efficiently consolidates previous knowledge using an augmented memory matrix of "hypotheses" and replays reconstructed image features to avoid catastrophic forgetting. Compared with pixel-level and generative replay approaches, the advantages of HAMN are twofold. First, hypothesis-based knowledge consolidation avoids redundant information in the image pixel space and makes memory usage far more efficient. Second, hypotheses in the augmented memory can be reused for learning new tasks, improving generalization and transfer learning ability. Given the lack of online incremental class learning datasets on video streams, we introduce and adapt two additional video datasets, Toybox and iLab, for online stream learning. We also evaluate our method on the CORe50 and online CIFAR100 datasets. Our method performs significantly better than all state-of-the-art methods while offering much more efficient memory usage. All source code and data are publicly available at https://github.com/kreimanlab/AugMe
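    The memory-saving step can be pictured as coding features against a shared hypothesis basis instead of storing pixels. The NumPy sketch below is a loose, hypothetical illustration of that idea (a random basis and least-squares coding stand in for HAMN's learned hypotheses), not the paper's method.

        import numpy as np

        rng = np.random.default_rng(0)
        feat_dim, n_hypotheses = 64, 16
        H = rng.standard_normal((feat_dim, n_hypotheses))  # shared "hypotheses"

        stored_codes, stored_labels = [], []

        def memorize(feature, label):
            """Keep only the 16 hypothesis coefficients, not the 64-d feature."""
            code, *_ = np.linalg.lstsq(H, feature, rcond=None)
            stored_codes.append(code)
            stored_labels.append(label)

        def replay():
            """Reconstruct old-task features from stored codes for rehearsal."""
            feats = np.stack([H @ c for c in stored_codes])
            return feats, np.array(stored_labels)

        # Memorize a few old-task features, then replay their reconstructions
        # alongside the new task's stream to mitigate catastrophic forgetting.
        for y in range(3):
            memorize(rng.standard_normal(feat_dim), y)
        old_feats, old_labels = replay()  # (3, 64) reconstructions + labels

    Storing 16 coefficients per example instead of a 64-dimensional feature (let alone raw pixels) is where the memory saving comes from; per the abstract, HAMN additionally reuses the hypotheses themselves when learning new tasks.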