441 research outputs found
Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition
Handwritten mathematical expression recognition (HMER) has attracted
extensive attention recently. However, current methods cannot explicitly model
the interactions between different symbols and may therefore fail when faced
with similar symbols. To alleviate this issue, we propose a simple but
effective method to enhance semantic interaction learning (SIL). Specifically,
we first construct a semantic graph based on statistical symbol co-occurrence
probabilities. We then design a semantic-aware module (SAM) that projects the
visual and classification features into a semantic space, where the cosine
distance between projected vectors indicates the correlation between symbols.
Jointly optimizing HMER and SIL explicitly enhances the model's understanding
of symbol relationships. In addition, SAM can be easily plugged into existing
attention-based HMER models and consistently brings improvement. Extensive
experiments on public benchmark datasets demonstrate that the proposed module
effectively enhances recognition performance. Our method achieves better
recognition performance than prior arts on both the CROHME and HME100K
datasets.
Comment: 12 pages
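The two ingredients described above, a co-occurrence graph estimated from label statistics and a cosine similarity between features projected into a semantic space, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the projection matrix and feature vectors are hypothetical stand-ins.

```python
import numpy as np

def cooccurrence_graph(label_seqs, num_symbols):
    """Estimate symbol co-occurrence probabilities from label sequences."""
    counts = np.zeros((num_symbols, num_symbols))
    for seq in label_seqs:
        for i in seq:
            for j in seq:
                if i != j:
                    counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalise each row into a probability distribution (rows with no
    # co-occurrences stay all-zero).
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

# Hypothetical projection of two symbols' features into a semantic space;
# the cosine of the projected vectors serves as their correlation score.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))      # semantic projection matrix (assumed shape)
feat_a = rng.normal(size=16)      # visual/classification feature of symbol a
feat_b = rng.normal(size=16)
sim = cosine_similarity(feat_a @ W, feat_b @ W)
```

In training, the projected similarities would be supervised against the co-occurrence graph so that frequently co-occurring symbols end up close in the semantic space.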
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation
The recently rising task of markup-to-image generation poses greater
challenges than natural image generation, due to its low tolerance for errors
as well as the complex sequence and context correlations between markup and
the rendered image. This paper proposes a novel model named
"Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment"
(FSA-CDM), which introduces contrastive positive/negative samples into the
diffusion model to boost performance on markup-to-image generation.
Technically, we design a fine-grained cross-modal alignment module to exploit
the sequence similarity between the two modalities and learn robust feature
representations. To improve generalization, we propose a contrast-augmented
diffusion model that explicitly explores positive and negative samples by
maximizing a novel contrastive variational objective, which is mathematically
shown to provide a tighter bound for the model's optimization. Moreover, a
context-aware cross-attention module is developed to capture the contextual
information within the markup language during the denoising process, yielding
better noise predictions. Extensive experiments are conducted on four
benchmark datasets from different domains, and the results demonstrate the
effectiveness of the proposed components in FSA-CDM, which exceeds
state-of-the-art performance by about 2%-12% in DTW. The code will be released
at https://github.com/zgj77/FSACDM.
Comment: Accepted to ACM MM 2023
LiveSketch: Query Perturbations for Guided Sketch-based Visual Search
LiveSketch is a novel algorithm for searching large image collections using
hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch
search by creating visual suggestions that augment the query as it is drawn,
making query specification an iterative rather than one-shot process that
helps disambiguate users' search intent. Our technical contributions are: a
triplet convnet architecture that incorporates an RNN-based variational
autoencoder to search for images using vector (stroke-based) queries;
real-time clustering to identify likely search intents (and thus targets
within the search embedding); and the use of backpropagation from those
targets to perturb the input stroke sequence, suggesting alterations to the
query that guide the search. We show improvements in accuracy and time-to-task
over contemporary baselines using a 67M-image corpus.
Comment: Accepted to CVPR 201
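The last contribution, perturbing the input query by propagating gradients from a target in the embedding space back to the stroke sequence, can be illustrated with a toy version. The sketch below uses a finite-difference gradient and a hypothetical linear embedder in place of the triplet convnet; it is an assumption-laden illustration of the idea, not the paper's method.

```python
import numpy as np

def perturb_query(query, target, embed, lr=0.1, steps=50):
    """Nudge a query toward a target point in embedding space by gradient
    descent on the squared embedding distance. Uses a finite-difference
    gradient so the embedder can be treated as a black box."""
    q = query.astype(float).copy()
    eps = 1e-4
    for _ in range(steps):
        base = np.sum((embed(q) - target) ** 2)
        grad = np.zeros_like(q)
        for i in range(q.size):
            qp = q.copy()
            qp[i] += eps
            grad[i] = (np.sum((embed(qp) - target) ** 2) - base) / eps
        q -= lr * grad
    return q

# Toy linear embedder standing in for the learned sketch encoder.
A = np.array([[1.0, 0.0], [0.0, 2.0]])
embed = lambda x: A @ x
query = np.array([1.0, 1.0])                 # current (flattened) query
target = embed(np.array([0.0, 0.5]))         # embedding of an inferred intent
perturbed = perturb_query(query, target, embed)
```

In the real system the gradient flows through the network via backpropagation rather than finite differences, and the perturbed stroke sequence is rendered back to the user as a suggestion.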
BiOcularGAN: Bimodal Synthesis and Annotation of Ocular Images
Current state-of-the-art segmentation techniques for ocular images are
critically dependent on large-scale annotated datasets, which are
labor-intensive to gather and often raise privacy concerns. In this paper, we
present a novel framework, called BiOcularGAN, capable of generating synthetic
large-scale datasets of photorealistic (visible light and near-infrared) ocular
images, together with corresponding segmentation labels to address these
issues. At its core, the framework relies on a novel Dual-Branch StyleGAN2
(DB-StyleGAN2) model that facilitates bimodal image generation, and a Semantic
Mask Generator (SMG) component that produces semantic annotations by exploiting
latent features of the DB-StyleGAN2 model. We evaluate BiOcularGAN through
extensive experiments across five diverse ocular datasets and analyze the
effects of bimodal data generation on image quality and the produced
annotations. Our experimental results show that BiOcularGAN is able to produce
high-quality matching bimodal images and annotations (with minimal manual
intervention) that can be used to train highly competitive (deep) segmentation
models (in a privacy-aware manner) that perform well across multiple
real-world datasets. The source code for the BiOcularGAN framework is publicly
available at https://github.com/dariant/BiOcularGAN.
Comment: 13 pages, 14 figures
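One common way to derive segmentation labels from a generator's latent features, in the spirit of the Semantic Mask Generator, is to assign each pixel to its nearest class prototype in feature space. The following is a deliberately simplified stand-in for SMG; the prototype formulation is an assumption, not the paper's design.

```python
import numpy as np

def semantic_masks(features, prototypes):
    """Label each pixel with the index of its nearest feature prototype.

    features:   (H, W, C) per-pixel generator features
    prototypes: (K, C) one feature prototype per semantic class
    returns:    (H, W) integer label map
    """
    # Broadcast to (H, W, K, C), reduce to per-class distances (H, W, K).
    d = np.linalg.norm(features[..., None, :] - prototypes, axis=-1)
    return d.argmin(axis=-1)
```

In BiOcularGAN the annotations come from the DB-StyleGAN2 latent features with minimal manual intervention; a nearest-prototype rule like this is one simple realization of that mapping.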
DenseBAM-GI: Attention Augmented DenseNet with momentum aided GRU for HMER
The task of recognising Handwritten Mathematical Expressions (HMER) is
crucial in the fields of digital education and scholarly research. However, it
is difficult to accurately determine the length of, and the complex spatial
relationships among, symbols in handwritten mathematical expressions. In this
study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER,
in which the encoder uses a Bottleneck Attention Module (BAM) to improve
feature representation and the decoder uses a Gated Input-GRU (GI-GRU) unit
with an extra gate to ease the decoding of long and complex expressions. The
proposed model is an efficient and lightweight architecture with performance
equivalent to state-of-the-art models in terms of Expression Recognition Rate
(exprate). It also performs better in terms of top-1, top-2, and top-3 error
rates across the CROHME 2014, 2016, and 2019 datasets, and achieves the best
exprate among all models on the CROHME 2019 dataset. Importantly, these gains
are obtained with reduced computational complexity and lower GPU memory
requirements.
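The gated-input idea, an extra gate that modulates the input before the standard GRU update, can be sketched as below. This is a minimal NumPy illustration of one plausible parameterisation, not the paper's exact GI-GRU equations; all weight names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gi_gru_step(x, h, p):
    """One step of a GRU with an extra input gate g that scales x before the
    standard reset/update machinery (a sketch of the gated-input idea)."""
    g = sigmoid(p["Wg"] @ x + p["Ug"] @ h + p["bg"])      # extra input gate
    xg = g * x                                            # gated input
    z = sigmoid(p["Wz"] @ xg + p["Uz"] @ h + p["bz"])     # update gate
    r = sigmoid(p["Wr"] @ xg + p["Ur"] @ h + p["br"])     # reset gate
    h_tilde = np.tanh(p["Wh"] @ xg + p["Uh"] @ (r * h) + p["bh"])
    return (1 - z) * h + z * h_tilde                      # new hidden state

# Tiny demo with random weights (input dim 4, hidden dim 3).
rng = np.random.default_rng(0)
d, hd = 4, 3
p = {
    "Wg": rng.normal(size=(d, d)),  "Ug": rng.normal(size=(d, hd)),  "bg": np.zeros(d),
    "Wz": rng.normal(size=(hd, d)), "Uz": rng.normal(size=(hd, hd)), "bz": np.zeros(hd),
    "Wr": rng.normal(size=(hd, d)), "Ur": rng.normal(size=(hd, hd)), "br": np.zeros(hd),
    "Wh": rng.normal(size=(hd, d)), "Uh": rng.normal(size=(hd, hd)), "bh": np.zeros(hd),
}
h_new = gi_gru_step(rng.normal(size=d), np.zeros(hd), p)
```

The extra gate gives the decoder a learned way to suppress or amplify parts of the input at each step, which is the mechanism credited with easing the decoding of long expressions.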