3,991 research outputs found
Handwriting Recognition of Historical Documents with few labeled data
Historical documents present many challenges for offline handwriting
recognition systems, among them, the segmentation and labeling steps. Carefully
annotated textlines are needed to train an HTR system. In some scenarios,
transcripts are only available at the paragraph level with no text-line
information. In this work, we demonstrate how to train an HTR system with few
labeled data. Specifically, we train a deep convolutional recurrent neural
network (CRNN) system on only 10% of manually labeled text-line data from a
dataset and propose an incremental training procedure that covers the rest of
the data. Performance is further increased by augmenting the training set with
specially crafted multiscale data. We also propose a model-based normalization
scheme which considers the variability in the writing scale at the recognition
phase. We apply this approach to the publicly available READ dataset. Our
system achieved the second best result during the ICDAR2017 competition
A Very Brief Introduction to Machine Learning With Applications to Communication Systems
Given the unprecedented availability of data and computing resources, there
is widespread renewed interest in applying data-driven machine learning methods
to problems for which the development of conventional engineering solutions is
challenged by modelling or algorithmic deficiencies. This tutorial-style paper
starts by addressing the questions of why and when such techniques can be
useful. It then provides a high-level introduction to the basics of supervised
and unsupervised learning. For both supervised and unsupervised learning,
exemplifying applications to communication networks are discussed by
distinguishing tasks carried out at the edge and at the cloud segments of the
network at different layers of the protocol stack
The OS* Algorithm: a Joint Approach to Exact Optimization and Sampling
Most current sampling algorithms for high-dimensional distributions are based
on MCMC techniques and are approximate in the sense that they are valid only
asymptotically. Rejection sampling, on the other hand, produces valid samples,
but is unrealistically slow in high-dimension spaces. The OS* algorithm that we
propose is a unified approach to exact optimization and sampling, based on
incremental refinements of a functional upper bound, which combines ideas of
adaptive rejection sampling and of A* optimization search. We show that the
choice of the refinement can be done in a way that ensures tractability in
high-dimension spaces, and we present first experiments in two different
settings: inference in high-order HMMs and in large discrete graphical models.Comment: 21 page
- …