Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform previous state-of-the-art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantities of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation builds on the model introduced in [37], which has proven quite
effective for English scene text recognition. The model follows a
segmentation-free, sequence to sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results. Comment: 5 pages
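The segmentation-free, sequence-to-sequence transcription described above is commonly realized with CTC-style decoding: the network emits one label (or a blank) per convolutional feature frame, and the decoder collapses repeated labels and removes blanks to obtain the target sequence without ever segmenting the image into glyphs. A minimal sketch of that collapse step (the function name and blank convention are illustrative, not taken from the paper):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, then drop blanks.

    frame_labels: per-frame argmax label indices from the network output.
    blank: index reserved for the CTC blank symbol (assumed 0 here).
    """
    decoded = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Example: frames [blank, a, a, blank, b, b, b, blank, a] -> [a, b, a]
print(ctc_greedy_decode([0, 1, 1, 0, 2, 2, 2, 0, 1]))  # [1, 2, 1]
```

The blank between the two occurrences of label 1 is what lets the decoder output the same character twice in a row, which matters for doubled letters.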
Connectionist models and figurative speech
This paper contains an introduction to connectionist models. Then we focus on the question of how novel figurative usages of descriptive adjectives may be interpreted in a structured connectionist model of conceptual combination. The suggestion is that inferences drawn from an adjective's use in familiar contexts form the basis for all possible interpretations of the adjective in a novel context. The more plausible of the possibilities, it is speculated, are reinforced by some form of one-shot learning, rendering the interpretative process obsolete after only one (memorable) encounter with a novel figure of speech.
A Binary Neural Shape Matcher using Johnson Counters and Chain Codes
In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson counter codes coupled with chain codes. Shape matching is a fundamental requirement in content-based image retrieval systems. Chain codes describe shapes using sequences of numbers; they are simple and flexible. We couple this simplicity with the efficiency and flexibility of a binary associative-memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network. We demonstrate how the binary associative-memory neural network can index and match chain codes where the chain code elements are represented by Johnson codes.
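The two encodings can be illustrated with a small sketch. Assuming the common 8-direction Freeman chain code and a 4-bit Johnson (twisted-ring) counter, whose eight states cover the eight directions while adjacent states differ in exactly one bit (the helper names and the particular state ordering are illustrative, not the paper's implementation):

```python
# 8-neighbour direction -> Freeman chain code digit (a common convention).
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(points):
    """Chain code of a boundary given as consecutive 8-connected pixels."""
    return [DIRS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(points, points[1:])]


def johnson_code(value, n_bits=4):
    """State `value` of an n-bit Johnson counter as a bit list.

    The counter cycles through 2*n_bits states; ones fill in from the
    left and then drain out, so neighbouring states differ in one bit.
    """
    assert 0 <= value < 2 * n_bits
    if value <= n_bits:
        return [1] * value + [0] * (n_bits - value)
    return [0] * (value - n_bits) + [1] * (2 * n_bits - value)


# Encode each chain code digit as a 4-bit Johnson code for the
# binary associative memory.
boundary = [(0, 0), (1, 0), (1, 1), (0, 1)]
encoded = [johnson_code(d) for d in chain_code(boundary)]
```

The single-bit distance between neighbouring Johnson states is what makes the encoding attractive for a binary associative memory: similar directions map to similar binary patterns.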
Hypothesis-driven Online Video Stream Learning with Augmented Memory
The ability to continuously acquire new knowledge without forgetting previous
tasks remains a challenging problem for computer vision systems. Standard
continual learning benchmarks focus on learning from static iid images in an
offline setting. Here, we examine a more challenging and realistic online
continual learning problem called online stream learning. Like humans, some AI
agents have to learn incrementally from a continuous temporal stream of
non-repeating data. We propose a novel model, Hypotheses-driven Augmented
Memory Network (HAMN), which efficiently consolidates previous knowledge using
an augmented memory matrix of "hypotheses" and replays reconstructed image
features to avoid catastrophic forgetting. Compared with pixel-level and
generative replay approaches, the advantages of HAMN are two-fold. First,
hypothesis-based knowledge consolidation avoids redundant information in the
image pixel space and makes memory usage far more efficient. Second, hypotheses
in the augmented memory can be re-used for learning new tasks, improving
generalization and transfer learning ability. Given a lack of online
incremental class learning datasets on video streams, we introduce and adapt
two additional video datasets, Toybox and iLab, for online stream learning. We
also evaluate our method on the CORe50 and online CIFAR100 datasets. Our method
performs significantly better than all state-of-the-art methods, while offering
much more efficient memory usage. All source code and data are publicly
available at https://github.com/kreimanlab/AugMe
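The hypothesis-based consolidation can be caricatured as storing, per sample, only the index of its nearest shared "hypothesis" prototype and replaying reconstructed features from those prototypes. This toy sketch (class, method, and variable names are our own, not the HAMN implementation) shows why such a memory is compact: each stored sample costs one integer rather than a full image or feature map:

```python
def nearest(prototypes, vec):
    """Index of the prototype with smallest squared distance to vec."""
    return min(range(len(prototypes)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(prototypes[i], vec)))


class HypothesisMemory:
    """Toy augmented memory: samples are consolidated as indices into a
    shared bank of hypothesis prototypes; replay yields reconstructed
    (approximate) features instead of raw pixels."""

    def __init__(self, prototypes):
        self.prototypes = prototypes  # shared "hypotheses"
        self.indices = []             # one integer per stored sample

    def store(self, feature_vec):
        self.indices.append(nearest(self.prototypes, feature_vec))

    def replay(self):
        # Reconstruct features for rehearsal against forgetting.
        return [self.prototypes[i] for i in self.indices]


memory = HypothesisMemory([[0.0, 0.0], [1.0, 1.0]])
memory.store([0.1, 0.2])   # consolidated as prototype 0
memory.store([0.9, 1.1])   # consolidated as prototype 1
replayed = memory.replay()
```

Because the prototype bank is shared across tasks, the same hypotheses can be re-used when new tasks arrive, which is the intuition behind the transfer benefit claimed above.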