MIDV-2019: Challenges of the modern mobile-based document OCR
Recognition of identity documents using mobile devices has become the subject
of a wide range of computer vision research. The portfolio of methods and
algorithms for solving such tasks as face detection, document detection and
rectification, text field recognition, and others is growing, and the scarcity
of datasets has become an important issue. One of the openly accessible
datasets for evaluating such methods is MIDV-500, containing video clips of 50
identity document types in various conditions. However, the variability of
capturing conditions in MIDV-500 did not address some of the key issues, mainly
significant projective distortions and different lighting conditions. In this
paper we present the MIDV-2019 dataset, containing video clips shot with modern
high-resolution mobile cameras, with strong projective distortions and low
lighting conditions. We describe the added data and provide experimental
baselines for text field recognition under different conditions. The
dataset is available for download at
ftp://smartengines.com/midv-500/extra/midv-2019/.
Comment: 6 pages, 3 figures, 3 tables, 18 references, submitted and accepted
to the 12th International Conference on Machine Vision (ICMV 2019).
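As a rough illustration of the rectification task this dataset targets (not the authors' pipeline), the sketch below uses OpenCV to warp a projectively distorted document crop back to a canonical rectangle from four corner points; the file names, corner coordinates, and target size are hypothetical placeholders.

    # Minimal sketch: undo projective distortion of an ID-document crop
    # given its four corner points (hypothetical values, not from the dataset).
    import cv2
    import numpy as np

    frame = cv2.imread("frame.png")            # hypothetical input frame
    corners = np.float32([[412, 208],          # top-left
                          [1288, 230],         # top-right
                          [1312, 792],         # bottom-right
                          [388, 770]])         # bottom-left

    W, H = 856, 540                            # assumed target size in pixels
    target = np.float32([[0, 0], [W, 0], [W, H], [0, H]])

    # Estimate the homography from the distorted corners to the canonical
    # rectangle and warp the frame, removing the projective distortion.
    M = cv2.getPerspectiveTransform(corners, target)
    rectified = cv2.warpPerspective(frame, M, (W, H))
    cv2.imwrite("rectified.png", rectified)

In practice the corner points would come from a document detection step rather than being hard-coded as above.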
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
With the rise of artificial intelligence in recent years, Deep Neural
Networks (DNNs) have been widely used in many domains. To achieve high
performance and energy efficiency, hardware acceleration of DNNs (especially of
inference) is intensively studied in both academia and industry. However, we still
face two challenges: large DNN models and datasets, which incur frequent
off-chip memory accesses; and the training of DNNs, which is not well-explored
in recent accelerator designs. To provide truly high-throughput and
energy-efficient acceleration for the training of deep and large models, we
inevitably need multiple accelerators that exploit coarse-grain parallelism,
beyond the fine-grain parallelism within a layer considered in most existing
architectures. This poses the key research question of finding the best
organization of computation and dataflow among the accelerators. In this paper,
we propose HyPar, a solution for determining layer-wise parallelism for deep neural
network training with an array of DNN accelerators. HyPar partitions the
feature map tensors (input and output), the kernel tensors, the gradient
tensors, and the error tensors for the DNN accelerators. A partition
constitutes the choice of parallelism for weighted layers. The optimization
target is to find a partition that minimizes the total communication during
the training of a complete DNN. To solve this problem, we propose a communication
model that explains the sources and amounts of communication. Then, we use a
hierarchical layer-wise dynamic programming method to search for the partition
for each layer.
Comment: To appear in the 2019 25th International Symposium on
High-Performance Computer Architecture (HPCA 2019).
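A minimal sketch of the kind of layer-wise dynamic-programming search described above, assuming each weighted layer picks one of two parallelism choices and that per-layer and per-boundary communication costs are already given; the function name, cost tables, and numbers are hypothetical placeholders, and HyPar's actual communication model derived from the tensor partitions is more detailed.

    # Minimal sketch: layer-wise dynamic programming over two parallelism
    # choices per layer, minimizing total (intra-layer + boundary) communication.
    # Cost tables are hypothetical placeholders, not HyPar's model.
    CHOICES = ("data_parallel", "model_parallel")

    def search_partition(layer_cost, transition_cost):
        """layer_cost[i][c]: communication of layer i under choice c.
        transition_cost[i][p][c]: communication at the boundary between
        layer i-1 (choice p) and layer i (choice c); index 0 is unused."""
        n = len(layer_cost)
        best = {c: layer_cost[0][c] for c in CHOICES}  # cheapest prefix ending in c
        back = [{c: None for c in CHOICES}]            # back-pointers for plan recovery
        for i in range(1, n):
            new_best, new_back = {}, {}
            for c in CHOICES:
                prev = min(CHOICES, key=lambda p: best[p] + transition_cost[i][p][c])
                new_best[c] = best[prev] + transition_cost[i][prev][c] + layer_cost[i][c]
                new_back[c] = prev
            best = new_best
            back.append(new_back)
        end = min(CHOICES, key=best.get)
        total = best[end]
        plan = [end]
        for i in range(n - 1, 0, -1):   # walk back-pointers to recover per-layer choices
            end = back[i][end]
            plan.append(end)
        return total, list(reversed(plan))

    # Hypothetical 3-layer cost tables (numbers are placeholders).
    layer = [{"data_parallel": 4, "model_parallel": 9},
             {"data_parallel": 8, "model_parallel": 3},
             {"data_parallel": 5, "model_parallel": 6}]
    trans = [None,
             {"data_parallel": {"data_parallel": 0, "model_parallel": 2},
              "model_parallel": {"data_parallel": 2, "model_parallel": 0}},
             {"data_parallel": {"data_parallel": 0, "model_parallel": 2},
              "model_parallel": {"data_parallel": 2, "model_parallel": 0}}]
    print(search_partition(layer, trans))
    # -> (15, ['data_parallel', 'model_parallel', 'model_parallel'])

In the paper the per-layer and boundary costs come from the communication model over the partitioned feature map, kernel, gradient, and error tensors, and the recursion is applied hierarchically across the accelerator array rather than over a flat pair of choices as in this toy version.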