Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform the previous state of the art on two publicly available video text
datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of a large
quantity of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced in [37], which has
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence-to-sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting the input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.
Comment: 5 pages
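The segmentation-free transcription described in this abstract can be illustrated with a minimal sketch. It assumes the network emits one score vector per feature column and that repeated labels and a blank symbol are collapsed afterwards (a CTC-style greedy decode). The abstract does not state the decoding rule, so this is an assumption, and the names `greedy_decode`, `BLANK` and the toy alphabet are hypothetical.

```python
BLANK = 0  # assumed index of the "no character" label


def greedy_decode(frame_scores, alphabet):
    """Collapse per-frame best labels into a character string.

    frame_scores: list of score lists, one per feature column (frame).
    alphabet: maps label index (>= 1) to a character; index 0 is the blank.
    """
    # Pick the highest-scoring label for every frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    out = []
    prev = None
    for label in best:
        # Emit a character only when the label changes and is not the blank.
        if label != prev and label != BLANK:
            out.append(alphabet[label])
        prev = label
    return "".join(out)
```

No character boundaries are ever computed: the output is read directly off the frame sequence, which is the property the abstract highlights for cursive scripts.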
New Method for Optimization of License Plate Recognition system with Use of Edge Detection and Connected Component
License plate recognition plays an important role in traffic monitoring
and parking management systems. In this paper, a fast, real-time method is
proposed that is well suited to locating tilted and poor-quality
plates. In the proposed method, the image is first
converted to binary using an adaptive threshold. Then, using edge
detection and morphological operations, the plate number location is determined.
Finally, if the plate is tilted, the tilt is corrected. The method was
tested on a data set from another paper containing images with different
backgrounds, distances, and angles of view, and the correct plate extraction
rate reached 98.66%.
Comment: 3rd IEEE International Conference on Computer and Knowledge
Engineering (ICCKE 2013), October 31 & November 1, 2013, Ferdowsi University
Mashhad
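Two of the steps named in this abstract, adaptive thresholding and connected-component analysis, can be sketched as follows. The abstract gives neither the thresholding rule nor the connectivity, so the local-mean threshold, the 4-connectivity, and the function names are assumptions; edge detection, morphology and tilt correction are omitted.

```python
from collections import deque


def adaptive_threshold(img, radius=1, offset=0):
    """Binarize a grayscale image: 1 if a pixel exceeds its local mean minus offset."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] > mean - offset else 0
    return out


def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

In a pipeline of this kind, the bounding boxes would typically be filtered by size and aspect ratio to keep plate-like candidates; that filter is not described in the abstract and is left out here.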
Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for
well-segmented sequences that have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot be easily applied to the noisy sequences
expected in real-world applications. In this paper, we present the Temporal
Attention-Gated Model (TAGM) which integrates ideas from attention models and
gated recurrent networks to better deal with noisy or unsegmented sequences.
Specifically, we extend the concept of an attention model to measure the relevance
of each observation (time step) of a sequence. We then use a novel gated
recurrent network to learn the hidden representation for the final prediction.
An important advantage of our approach is interpretability since the temporal
attention weights provide a meaningful value for the salience of each time step
in the sequence. We demonstrate the merits of our TAGM approach, both for
prediction accuracy and interpretability, on three different tasks: spoken
digit recognition, text-based sentiment analysis and visual event recognition.
Comment: Accepted by CVPR 201
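One plausible reading of the attention-gated recurrence described in this abstract is sketched below. It assumes each time step has a scalar attention weight in [0, 1] that interpolates between the previous hidden state and a candidate state; the abstract does not give the update equations, so the form of the update and the names `gated_recurrence` and `candidate` are assumptions.

```python
def gated_recurrence(inputs, attention, candidate, h0):
    """Run a recurrence whose update is gated by per-step attention weights.

    inputs: list of input vectors, one per time step.
    attention: list of floats in [0, 1], one per time step (salience).
    candidate: function (h_prev, x) -> proposed new hidden state (a list).
    h0: initial hidden state (a list of floats).
    """
    h = list(h0)
    for x, a in zip(inputs, attention):
        cand = candidate(h, x)
        # Low attention keeps the old state; high attention adopts the candidate.
        h = [(1.0 - a) * hp + a * c for hp, c in zip(h, cand)]
    return h
```

Under this reading, a step with attention 0 leaves the hidden state untouched, which is how irrelevant or noisy observations would be skipped, and the weights themselves can be inspected as per-step salience.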
Language Identification Using Visual Features
Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to identify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments), or impossible (no audio signal is available). Research in this field is also beneficial in the related field of automatic lip-reading. This paper introduces several methods for visual language identification (VLID). They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discriminating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin-tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we can obtain an error rate of < 10% in discriminating between Arabic and English on 19 speakers and using about 30s of visual speech.
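The phonotactic idea mentioned in this abstract, that languages differ in which unit sequences are likely, can be sketched with one bigram model per language. The abstract does not specify the models used, so the add-one smoothed bigram scoring, the toy unit alphabet, and the function names here are assumptions for illustration only.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab):
    """Count unit bigrams in training sequences of one language."""
    pair, ctx = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair[(a, b)] += 1
            ctx[a] += 1
    return pair, ctx, len(vocab)


def log_likelihood(seq, model):
    """Add-one smoothed bigram log-likelihood of a unit sequence."""
    pair, ctx, v = model
    return sum(
        math.log((pair[(a, b)] + 1) / (ctx[a] + v))
        for a, b in zip(seq, seq[1:])
    )


def identify(seq, models):
    """Return the language whose bigram model scores the sequence highest."""
    return max(models, key=lambda lang: log_likelihood(seq, models[lang]))
```

In a visual setting the units would come from a lip-reading recognizer rather than an audio phone recognizer, so recognition errors feed directly into these scores.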
Adversarial Reprogramming of Text Classification Neural Networks
Adversarial Reprogramming has demonstrated success in utilizing pre-trained
neural network classifiers for alternative classification tasks without
modification to the original network. An adversary in such an attack scenario
trains an additive contribution to the inputs to repurpose the neural network
for the new classification task. While this reprogramming approach works for
neural networks with a continuous input space such as that of images, it is not
directly applicable to neural networks trained for tasks such as text
classification, where the input space is discrete. Repurposing such
classification networks would require the attacker to learn an adversarial
program that maps inputs from one discrete space to the other. In this work, we
introduce a context-based vocabulary remapping model to reprogram neural
networks trained on a specific sequence classification task, for a new sequence
classification task desired by the adversary. We propose training procedures
for this adversarial program in both white-box and black-box settings. We
demonstrate the application of our model by adversarially repurposing various
text-classification models including LSTM, bi-directional LSTM and CNN for
alternate classification tasks.
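A context-based vocabulary remapping of the kind this abstract describes can be sketched as a function that chooses each output token from the input token and its neighbours. The abstract does not give the parameterization, so the additive per-offset score tables `theta`, the padding token, and the argmax choice are assumptions; training the tables (in either the white-box or black-box setting) is not shown.

```python
def remap_sequence(tokens, theta, pad=0):
    """Map source-vocabulary tokens to target-vocabulary tokens using context.

    theta: list of 2k+1 tables, one per context offset -k..k; theta[o][s] is a
    list of scores over the target vocabulary for source token s at offset o.
    pad: source token id used to pad the sequence at both ends.
    """
    k = len(theta) // 2
    padded = [pad] * k + list(tokens) + [pad] * k
    out = []
    for i in range(len(tokens)):
        window = padded[i:i + 2 * k + 1]
        n_target = len(theta[0][window[0]])
        # Sum the scores contributed by every token in the context window.
        scores = [
            sum(theta[o][s][t] for o, s in enumerate(window))
            for t in range(n_target)
        ]
        out.append(max(range(n_target), key=scores.__getitem__))
    return out
```

The remapped sequence would then be fed to the unmodified victim classifier, with its output labels reinterpreted as labels of the adversary's task.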
Automatic Detection of Online Jihadist Hate Speech
We have developed a system that automatically detects online jihadist hate
speech with over 80% accuracy, by using techniques from Natural Language
Processing and Machine Learning. The system is trained on a corpus of 45,000
subversive Twitter messages collected from October 2014 to December 2016. We
present a qualitative and quantitative analysis of the jihadist rhetoric in the
corpus, examine the network of Twitter users, outline the technical procedure
used to train the system, and discuss examples of use.
Comment: 31 pages