14,526 research outputs found
Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform previous state-of-the-art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantity of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced here [37] which is
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence to sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.Comment: 5 page
A deep learning approach to 3D segmentation of brain vasculature
The segmentation of blood-vessels is an important preprocessing step for the quantitative analysis of brain vasculature. We approach the segmentation task for two-photon brain angiograms using a fully convolutional 3D deep neural network.Published versio
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
The alignment of heterogeneous sequential data (video to text) is an
important and challenging problem. Standard techniques for this task, including
Dynamic Time Warping (DTW) and Conditional Random Fields (CRFs), suffer from
inherent drawbacks. Mainly, the Markov assumption implies that, given the
immediate past, future alignment decisions are independent of further history.
The separation between similarity computation and alignment decision also
prevents end-to-end training. In this paper, we propose an end-to-end neural
architecture where alignment actions are implemented as moving data between
stacks of Long Short-term Memory (LSTM) blocks. This flexible architecture
supports a large variety of alignment tasks, including one-to-one, one-to-many,
skipping unmatched elements, and (with extensions) non-monotonic alignment.
Extensive experiments on semi-synthetic and real datasets show that our
algorithm outperforms state-of-the-art baselines.Comment: Accepted at CVPR 2018 (Spotlight). arXiv file includes the paper and
the supplemental materia
Deep Divergence-Based Approach to Clustering
A promising direction in deep learning research consists in learning
representations and simultaneously discovering cluster structure in unlabeled
data by optimizing a discriminative loss function. As opposed to supervised
deep learning, this line of research is in its infancy, and how to design and
optimize suitable loss functions to train deep neural networks for clustering
is still an open question. Our contribution to this emerging field is a new
deep clustering network that leverages the discriminative power of
information-theoretic divergence measures, which have been shown to be
effective in traditional clustering. We propose a novel loss function that
incorporates geometric regularization constraints, thus avoiding degenerate
structures of the resulting clustering partition. Experiments on synthetic
benchmarks and real datasets show that the proposed network achieves
competitive performance with respect to other state-of-the-art methods, scales
well to large datasets, and does not require pre-training steps
- …