Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition
Handwritten mathematical expression recognition is a challenging problem due
to the complicated two-dimensional structures, ambiguous handwriting, and
varying scales of handwritten math symbols. To address this problem, we
utilize an attention-based encoder-decoder model that converts mathematical
expression images from two-dimensional layouts into one-dimensional LaTeX
strings. We improve the encoder by employing densely connected convolutional
networks, as they strengthen feature extraction and facilitate gradient
propagation, especially on a small training set. We also present a novel
multi-scale attention model that handles the recognition of math symbols at
different scales and preserves the fine-grained details that would otherwise
be lost to pooling operations. Validated on the CROHME competition task, the
proposed method significantly outperforms the state-of-the-art methods, with
an expression recognition accuracy of 52.8% on CROHME 2014 and 50.1% on
CROHME 2016, using only the official training dataset.
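Below is a minimal sketch (not the authors' released code) of the
multi-scale attention idea: the decoder attends over two encoder feature
maps at different resolutions and concatenates the resulting context
vectors, so details lost to pooling in the coarse map can be recovered from
the fine one. Module and parameter names are illustrative; PyTorch is
assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttention(nn.Module):
    """Additive attention over one flattened encoder feature map."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim)
        self.w_hidden = nn.Linear(hidden_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, L, feat_dim) flattened grid; hidden: (B, hidden_dim)
        scores = self.v(torch.tanh(
            self.w_feat(feats) + self.w_hidden(hidden).unsqueeze(1)))
        alpha = F.softmax(scores, dim=1)   # (B, L, 1) attention weights
        return (alpha * feats).sum(dim=1)  # context vector (B, feat_dim)

class MultiScaleAttention(nn.Module):
    """Attend over a low-res and a high-res feature map in parallel."""
    def __init__(self, low_dim, high_dim, hidden_dim, attn_dim):
        super().__init__()
        self.low = ScaleAttention(low_dim, hidden_dim, attn_dim)
        self.high = ScaleAttention(high_dim, hidden_dim, attn_dim)

    def forward(self, low_feats, high_feats, hidden):
        # Contexts from both scales feed each decoder step, so detail
        # pooled away at the coarse scale survives in the fine branch.
        return torch.cat([self.low(low_feats, hidden),
                          self.high(high_feats, hidden)], dim=-1)
```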
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
To address the challenging task of instance-aware human part parsing, a new
bottom-up regime is proposed to learn category-level human semantic
segmentation as well as multi-person pose estimation in a joint and end-to-end
manner. It is a compact, efficient and powerful framework that exploits
structural information over different human granularities and eases the
difficulty of person partitioning. Specifically, a dense-to-sparse projection
field, which allows explicitly associating dense human semantics with sparse
keypoints, is learnt and progressively improved over the network feature
pyramid for robustness. Then, the difficult pixel grouping problem is cast as
an easier, multi-person joint assembling task. By formulating joint association
as maximum-weight bipartite matching, a differentiable solution is developed to
exploit projected gradient descent and Dykstra's cyclic projection algorithm.
This makes our method end-to-end trainable and allows back-propagating the
grouping error to directly supervise multi-granularity human representation
learning. This is distinguished from current bottom-up human parsers or pose
estimators which require sophisticated post-processing or heuristic greedy
algorithms. Experiments on three instance-aware human parsing datasets show
that our model outperforms other bottom-up alternatives with much more
efficient inference.
Comment: CVPR 2021 (Oral). Code: https://github.com/tfzhou/MG-HumanParsin
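As a rough illustration of the matching step described above, the sketch
below (illustrative, not the authors' code) relaxes maximum-weight bipartite
matching to a doubly-stochastic matrix and optimizes the linear objective
with projected gradient ascent, using Dykstra's cyclic projections to
enforce the row, column, and nonnegativity constraints.

```python
import numpy as np

def project_rows(x):    # affine set: each row sums to 1
    return x - (x.sum(axis=1, keepdims=True) - 1.0) / x.shape[1]

def project_cols(x):    # affine set: each column sums to 1
    return x - (x.sum(axis=0, keepdims=True) - 1.0) / x.shape[0]

def project_nonneg(x):  # convex set: entrywise nonnegative
    return np.maximum(x, 0.0)

def dykstra(x, n_iters=100):
    """Approximately project x onto the doubly-stochastic matrices."""
    corrections = [np.zeros_like(x) for _ in range(3)]
    projections = [project_rows, project_cols, project_nonneg]
    for _ in range(n_iters):
        for i, proj in enumerate(projections):
            y = proj(x + corrections[i])
            corrections[i] = x + corrections[i] - y
            x = y
    return x

def soft_matching(weights, lr=0.5, steps=50):
    """Projected gradient ascent on <W, X> over the relaxed polytope."""
    x = np.full_like(weights, 1.0 / weights.shape[1])
    for _ in range(steps):
        x = dykstra(x + lr * weights)  # gradient of <W, X> w.r.t. X is W
    return x

W = np.array([[3.0, 1.0], [1.0, 2.0]])
print(soft_matching(W).round(2))  # approaches the identity permutation
```

Because every step is differentiable (the projections compose linear maps
and clipping), a grouping error can be back-propagated through the solver,
which is the property the paper exploits for end-to-end training.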
CLCI-Net: Cross-Level fusion and Context Inference Networks for Lesion Segmentation of Chronic Stroke
Segmenting stroke lesions from T1-weighted MR images is of great value for
large-scale stroke rehabilitation neuroimaging analyses. Nevertheless, this
task poses great challenges, such as the large range of lesion scales and
the intensity similarity between lesions and normal tissue. The well-known
encoder-decoder convolutional neural networks, despite their great
achievements in medical image segmentation, may fail to address these
challenges due to insufficient use of multi-scale features and context
information. To
address these challenges, this paper proposes a Cross-Level fusion and Context
Inference Network (CLCI-Net) for the chronic stroke lesion segmentation from
T1-weighted MR images. Specifically, a Cross-Level feature Fusion (CLF)
strategy was developed to make full use of features at different scales
across different levels; extending Atrous Spatial Pyramid Pooling (ASPP)
with CLF, we enrich multi-scale features to handle different lesion sizes;
in addition, convolutional long short-term memory (ConvLSTM) is employed to
infer context information and thus capture fine structures to address the
intensity similarity issue. The proposed approach was evaluated on an
open-source dataset, the Anatomical Tracings of Lesions After Stroke
(ATLAS), with results showing that our network outperforms five
state-of-the-art methods. We
make our code and models available at https://github.com/YH0517/CLCI_Net
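For reference, the sketch below shows a generic PyTorch version of the
ASPP block that CLCI-Net extends; the cross-level fusion and ConvLSTM
components are in the authors' repository, and the channel sizes and
dilation rates here are illustrative defaults, not values from the paper.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions see
    the same input at several receptive-field sizes, then fuse."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch,
                      kernel_size=3 if r > 1 else 1,
                      padding=r if r > 1 else 0,
                      dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch covers a different context scale, so both small
        # and large lesions fall within some branch's receptive field.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feats = torch.randn(1, 256, 32, 32)
out = ASPP(256, 64)(feats)  # -> (1, 64, 32, 32)
```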
Deep filter banks for texture recognition, description, and segmentation
Visual textures have played a key role in image understanding because they
convey important semantics of images, and because texture representations that
pool local image descriptors in an orderless manner have had a tremendous
impact in diverse applications. In this paper we make several contributions to
texture understanding. First, instead of focusing on texture instance and
material category recognition, we propose a human-interpretable vocabulary of
texture attributes to describe common texture patterns, complemented by a new
describable texture dataset for benchmarking. Second, we look at the problem of
recognizing materials and texture attributes in realistic imaging conditions,
including when textures appear in clutter, developing corresponding benchmarks
on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic
texture representations, including bag-of-visual-words and Fisher vectors,
in the context of deep learning and show that these have excellent efficiency
and generalization properties if the convolutional layers of a deep model are
used as filter banks. In this manner we obtain state-of-the-art performance
on numerous datasets well beyond textures, an efficient method for applying
deep features to image regions, and benefits in transferring features from
one domain to another.
Comment: 29 pages; 13 figures; 8 tables
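The core recipe can be sketched in a few lines: truncate a pretrained CNN
at its convolutional layers, treat each spatial position's activation
vector as a local descriptor, and pool the descriptors orderlessly. The
sketch below substitutes a simple bag-of-visual-words histogram for the
Fisher vector used in the paper; it assumes PyTorch, torchvision, and
scikit-learn, and all names are illustrative.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

# The conv part of a pretrained VGG-16 acts as the filter bank; its
# output grid is a dense field of local descriptors.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

def local_descriptors(image_batch):
    with torch.no_grad():
        fmap = vgg(image_batch)                        # (B, 512, H, W)
    b, c, h, w = fmap.shape
    return fmap.permute(0, 2, 3, 1).reshape(b, h * w, c).numpy()

def bovw_encode(desc_sets, n_words=64):
    """Fit a visual vocabulary, then encode each image as a histogram.
    Orderless: the spatial layout of descriptors is discarded."""
    kmeans = KMeans(n_clusters=n_words, n_init=4)
    kmeans.fit(np.concatenate(desc_sets))
    hists = []
    for d in desc_sets:
        counts = np.bincount(kmeans.predict(d), minlength=n_words)
        hists.append(counts / counts.sum())
    return np.stack(hists)

images = torch.randn(2, 3, 224, 224)   # stand-in inputs
codes = bovw_encode(list(local_descriptors(images)))
```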
TasselNet: Counting maize tassels in the wild via local counts regression network
Accurately counting maize tassels is important for monitoring the growth
status of maize plants. This tedious task, however, is still mainly done
manually. In the context of modern plant phenotyping, automating this
task is required to meet the need of large-scale analysis of genotype and
phenotype. In recent years, computer vision technologies have experienced a
significant breakthrough due to the emergence of large-scale datasets and
increased computational resources. Naturally, image-based approaches have
also received much attention in plant-related studies. Yet most image-based
systems for plant phenotyping are deployed in controlled laboratory
environments. When transferring the application scenario to unconstrained
in-field conditions, intrinsic and extrinsic variations in the wild pose
great challenges for accurate counting of maize tassels, a task beyond the
reach of conventional image processing techniques. This calls for more
robust computer vision approaches to address in-field variations. This
paper studies the in-field counting problem of maize tassels. To our
knowledge, this is the first time that a plant-related counting problem is
considered using computer vision technologies in an unconstrained
field-based environment.
Comment: 14 pages
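As a rough sketch of the local counts regression idea behind TasselNet, the
fully convolutional toy model below regresses a count for each
non-overlapping image patch and sums the patch counts into an image-level
count; the architecture and patch size are illustrative, not the authors'
exact design.

```python
import torch
import torch.nn as nn

class LocalCountRegressor(nn.Module):
    """Regress a nonnegative count per patch, then sum over patches."""
    def __init__(self, patch=32):
        super().__init__()
        # Two pooling stages shrink the grid 4x, so a conv with kernel
        # and stride patch//4 emits one value per patch-sized region.
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 1, kernel_size=patch // 4, stride=patch // 4),
            nn.ReLU(),  # counts are nonnegative
        )

    def forward(self, x):
        local_counts = self.net(x)                  # (B, 1, H', W')
        return local_counts.flatten(1).sum(dim=1)   # (B,) total counts

model = LocalCountRegressor()
image = torch.randn(1, 3, 256, 256)
print(model(image))  # untrained, so the value is arbitrary
```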