AON: Towards Arbitrarily-Oriented Text Recognition
Recognizing text from natural images is a hot research topic in computer
vision due to its various applications. Despite several decades of research on
optical character recognition (OCR), recognizing text in
natural images is still a challenging task. This is because scene texts are
often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted)
arrangements, which have not yet been well addressed in the literature.
Existing methods on text recognition mainly work with regular (horizontal and
frontal) texts and cannot be trivially generalized to handle irregular texts.
In this paper, we develop the arbitrary orientation network (AON) to directly
capture the deep features of irregular text, which are then fed into an
attention-based decoder to generate character sequences. The whole network can
be trained end-to-end using only images and word-level annotations.
Extensive experiments on various benchmarks, including the CUTE80,
SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed
AON-based method achieves state-of-the-art performance on irregular
datasets and is comparable to major existing methods on regular datasets.
Comment: Accepted by CVPR 2018
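As a concrete (and deliberately generic) illustration of the attention-based decoding the abstract describes, here is a minimal PyTorch sketch: attention weights over a sequence of image features produce a glimpse that drives a recurrent cell emitting one character per step. The module and its dimensions are our own assumptions, not AON's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionDecoder(nn.Module):
    """Generic additive-attention decoder over a sequence of image features."""
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=37):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)  # additive attention scorer
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats, steps=25):
        # feats: (B, T, feat_dim) sequence of features extracted from the image
        B, T, _ = feats.shape
        h = feats.new_zeros(B, self.rnn.hidden_size)
        logits = []
        for _ in range(steps):
            # score every feature vector against the current decoder state
            e = self.score(torch.cat([feats, h.unsqueeze(1).expand(-1, T, -1)], dim=-1))
            a = F.softmax(e, dim=1)                   # (B, T, 1) attention weights
            glimpse = (a * feats).sum(dim=1)          # (B, feat_dim) attended feature
            h = self.rnn(glimpse, h)
            logits.append(self.classifier(h))         # one character per step
        return torch.stack(logits, dim=1)             # (B, steps, num_classes)
```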
Learned Quality Enhancement via Multi-Frame Priors for HEVC Compliant Low-Delay Applications
Networked video applications, e.g., video conferencing, often suffer from
poor visual quality due to unexpected network fluctuation and limited
bandwidth. In this paper, we develop a Quality Enhancement Network
(QENet) to reduce video compression artifacts, leveraging spatial priors
generated by multi-scale convolutions and temporal priors generated by
recurrently warped temporal predictions. We integrate this QENet as a
stand-alone post-processing subsystem for the High
Efficiency Video Coding (HEVC) compliant decoder. Experimental results show
that QENet outperforms both the default in-loop filters in HEVC and other
deep learning based methods, with noticeable objective gains in Peak
Signal-to-Noise Ratio (PSNR) and clear subjective gains in visual quality.
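The pairing of multi-scale spatial convolutions with recurrently warped temporal predictions can be sketched as follows. This is a minimal illustration under our own assumptions (single-channel frames, a dense flow field supplied from outside), not the authors' QENet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(prev, flow):
    """Backward-warp the previous enhanced frame with a dense (dx, dy) flow field."""
    B, _, H, W = prev.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(prev.device)   # (H, W, 2)
    grid = grid.unsqueeze(0) + flow.permute(0, 2, 3, 1)            # add per-pixel offsets
    # normalise pixel coordinates to [-1, 1] as grid_sample expects
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return F.grid_sample(prev, grid, mode="bilinear", align_corners=True)

class EnhanceBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # parallel dilated convolutions stand in for the multi-scale spatial priors
        self.scales = nn.ModuleList(
            nn.Conv2d(2, ch, 3, padding=d, dilation=d) for d in (1, 2, 4))
        self.fuse = nn.Conv2d(3 * ch, 1, 3, padding=1)

    def forward(self, decoded, prev_enhanced, flow):
        temporal = warp(prev_enhanced, flow)           # temporal prior (recurrent input)
        x = torch.cat([decoded, temporal], dim=1)      # (B, 2, H, W)
        feats = torch.cat([F.relu(s(x)) for s in self.scales], dim=1)
        return decoded + self.fuse(feats)              # residual enhancement
```

Feeding each enhanced output back in as `prev_enhanced` for the next frame gives the recurrent temporal behavior the abstract describes.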
Spin gap and magnetic resonance in superconducting BaFeNiAs
We use neutron spectroscopy to determine the nature of the magnetic
excitations in superconducting BaFeNiAs ($T_c$ = K).
Above $T_c$ the excitations are gapless and centered at the commensurate
antiferromagnetic wave vector of the parent compound, while the intensity
exhibits a sinusoidal modulation along the c-axis. As the superconducting state
is entered, a spin gap gradually opens, whose magnitude tracks the
$T$-dependence of the superconducting gap observed by angle-resolved
photoemission. Both the spin gap and magnetic resonance energies are
temperature and wave vector dependent, but their ratio is the same
within uncertainties. These results suggest that the spin resonance is a
singlet-triplet excitation related to electron pairing and superconductivity.
Comment: 4 pages, 4 figures
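The central quantitative observation can be stated compactly; the symbols below are our notation ($E_{\mathrm{res}}$ for the resonance energy, $\Delta_{\mathrm{spin}}$ for the spin gap), not necessarily the paper's:

```latex
% Both quantities vary with temperature T and wave vector Q,
% yet their ratio stays constant within experimental uncertainty:
\[
  \frac{E_{\mathrm{res}}(T,\mathbf{Q})}{\Delta_{\mathrm{spin}}(T,\mathbf{Q})}
  \approx \mathrm{const.}
\]
```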
E2-AEN: End-to-End Incremental Learning with Adaptively Expandable Network
Expandable networks have demonstrated their advantages in dealing with
catastrophic forgetting problem in incremental learning. Considering that
different tasks may need different structures, recent methods design dynamic
structures adapted to different tasks via sophisticated techniques. Their
routine is to first search for expandable structures and then train on the new
tasks, which, however, splits learning into multiple training stages, leading
to suboptimal performance or excessive computational cost. In this paper, we
propose an
end-to-end trainable adaptively expandable network named E2-AEN, which
dynamically generates lightweight structures for new tasks without any accuracy
drop on previous tasks. Specifically, the network contains a series of powerful
feature adapters that augment the previously learned representations for new
tasks while avoiding task interference. These adapters are controlled via an
adaptive gate-based pruning strategy which decides whether the expanded
structures can be pruned, making the network structure dynamically changeable
according to the complexity of the new tasks. Moreover, we introduce a novel
sparsity-activation regularization to encourage the model to learn
discriminative features with limited parameters. E2-AEN reduces cost and can be
built upon any feed-forward architectures in an end-to-end manner. Extensive
experiments on both classification (i.e., CIFAR and VDD) and detection (i.e.,
COCO, VOC and ICCV2021 SSLAD challenge) benchmarks demonstrate the
effectiveness of the proposed method, which achieves remarkable new results.
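To make the adapter-plus-gate idea concrete, here is a small PyTorch sketch (names and sizes hypothetical, not the authors' code): a lightweight residual adapter whose learned gate can collapse toward zero, at which point the expansion is prunable with no effect on earlier tasks.

```python
import torch
import torch.nn as nn

class GatedAdapter(nn.Module):
    """Lightweight bottleneck adapter on a frozen backbone feature map."""
    def __init__(self, channels, bottleneck=8):
        super().__init__()
        self.down = nn.Conv2d(channels, bottleneck, 1)
        self.up = nn.Conv2d(bottleneck, channels, 1)
        self.gate = nn.Parameter(torch.zeros(1))  # sigmoid(gate) ~ keep strength

    def forward(self, x):
        g = torch.sigmoid(self.gate)
        # the identity path leaves previously learned behavior untouched
        return x + g * self.up(torch.relu(self.down(x)))

    def prunable(self, threshold=0.05):
        # if the gate has collapsed, the whole branch can be removed
        return torch.sigmoid(self.gate).item() < threshold
```

An L1-style penalty on the gate activations, e.g. `loss = task_loss + lam * torch.sigmoid(adapter.gate)`, would play the role of the sparsity-activation regularization mentioned above, pushing unneeded adapters toward the prunable regime.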
MANGO: A Mask Attention Guided One-Stage Scene Text Spotter
Recently end-to-end scene text spotting has become a popular research topic
due to its advantages of global optimization and high maintainability in real
applications. Most methods attempt to develop various region of interest (RoI)
operations to concatenate the detection part and the sequence recognition part
into a two-stage text spotting framework. However, in such a framework, the
recognition part is highly sensitive to the detection results (e.g., the
compactness of text contours). To address this problem, in this paper we
propose a novel Mask AttentioN Guided One-stage text spotting framework named
MANGO, in which character sequences can be directly recognized without RoI
operation. Concretely, a position-aware mask attention module is developed to
generate attention weights on each text instance and its characters. It allows
different text instances in an image to be allocated on different feature map
channels which are further grouped as a batch of instance features. Finally, a
lightweight sequence decoder is applied to generate the character sequences. It
is worth noting that MANGO inherently adapts to arbitrary-shaped text spotting
and can be trained end-to-end with only coarse position information (e.g., a
rectangular bounding box) and text annotations. Experimental results show that
the proposed method achieves competitive and even new state-of-the-art
performance on both regular and irregular text spotting benchmarks, i.e., ICDAR
2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.
Comment: Accepted to AAAI 2021. Code is available at
https://davar-lab.github.io/publication.html or
https://github.com/hikopensource/DAVAR-Lab-OCR
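Reading between the lines, the position-aware mask attention can be sketched as predicting one attention map per (instance, character-position) slot and pooling the shared feature map into a batch of per-instance sequence features. Everything below (module name, slot counts) is our illustrative assumption, not the released DAVAR-Lab code.

```python
import torch
import torch.nn as nn

class MaskAttentionPool(nn.Module):
    """Pool a shared feature map into per-instance character-sequence features."""
    def __init__(self, channels=256, max_insts=4, max_chars=25):
        super().__init__()
        self.max_insts, self.max_chars = max_insts, max_chars
        self.slots = max_insts * max_chars
        # one attention map per (instance, character-position) slot
        self.att = nn.Conv2d(channels, self.slots, 1)

    def forward(self, fmap):
        # fmap: (B, C, H, W) shared feature map for the whole image
        B, C, H, W = fmap.shape
        a = self.att(fmap).flatten(2).softmax(dim=-1)        # (B, slots, H*W)
        feats = torch.einsum("bsn,bcn->bsc", a, fmap.flatten(2))
        # regroup so each text instance carries its own character sequence,
        # ready for a lightweight sequence decoder
        return feats.view(B, self.max_insts, self.max_chars, C)
```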
Deep Active Learning for Computer Vision: Past and Future
As an important data selection scheme, active learning is an essential
component when iterating on an Artificial Intelligence (AI) model. It becomes
even more critical given the dominance in applications of deep neural network
based models, which have large numbers of parameters and are data hungry.
Despite its indispensable role in developing AI models, research on active
learning is not as intensive as that on other research directions. In this
paper, we present a review of active learning through deep active learning
approaches from the following perspectives: 1) technical advancements in active
learning, 2) applications of active learning in computer vision, 3) industrial
systems leveraging or with potential to leverage active learning for data
iteration, 4) current limitations and future research directions. We expect
this paper to clarify the significance of active learning in a modern AI model
manufacturing process and to bring additional research attention to active
learning. By addressing data automation challenges and integrating with
automated machine learning systems, active learning will help democratize AI
technologies by boosting model production at scale.
Comment: Accepted by APSIPA Transactions on Signal and Information Processing
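For readers new to the area, the pool-based loop that most surveyed methods instantiate looks like the sketch below; `train` and `predict_proba` are stand-in callables for any model API, and entropy-based uncertainty sampling is just one of the query strategies such a review covers.

```python
import numpy as np

def active_learning_loop(X_pool, y_pool, train, predict_proba,
                         seed_size=100, query_size=50, rounds=10):
    """Pool-based active learning with entropy (uncertainty) sampling."""
    rng = np.random.default_rng(0)
    labeled = list(rng.choice(len(X_pool), seed_size, replace=False))
    unlabeled = [i for i in range(len(X_pool)) if i not in set(labeled)]
    model = None
    for _ in range(rounds):
        model = train(X_pool[labeled], y_pool[labeled])
        probs = predict_proba(model, X_pool[unlabeled])        # (n, n_classes)
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        picked = np.argsort(-entropy)[:query_size]             # most uncertain first
        newly = [unlabeled[i] for i in picked]
        labeled += newly                                       # oracle labels them
        unlabeled = [i for i in unlabeled if i not in set(newly)]
    return model, labeled
```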