CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
The Transformer-based encoder-decoder architecture has recently made
significant advances in recognizing handwritten mathematical expressions.
However, the Transformer model still suffers from a lack of coverage,
making its expression recognition rate (ExpRate) inferior to that of its RNN
counterpart. Coverage information, which records the alignment information of
the past steps, has proven effective in the RNN models. In this paper, we
propose CoMER, a model that adopts the coverage information in the transformer
decoder. Specifically, we propose a novel Attention Refinement Module (ARM) to
refine the attention weights with past alignment information without hurting
its parallelism. Furthermore, we take coverage information to the extreme by
proposing self-coverage and cross-coverage, which utilize the past alignment
information from the current and previous layers. Experiments show that CoMER
improves the ExpRate by 0.61%/2.09%/1.59% compared to the current
state-of-the-art model, and reaches 59.33%/59.81%/62.97% on the CROHME
2014/2016/2019 test sets. Comment: Accepted at ECCV 2022
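The coverage idea described above can be illustrated with a minimal sketch: attention logits at each decoding step are penalized by the alignment mass accumulated in earlier steps, discouraging the decoder from re-attending to already-covered positions. Note this sequential loop is only an illustration of the coverage principle; CoMER's actual ARM refines attention in parallel, and the penalty function and `lam` weight here are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def coverage_refined_attention(scores, lam=1.0):
    """Toy self-coverage: refine each step's attention logits by
    subtracting the alignment accumulated over past decoding steps.
    scores: (T, L) raw attention logits for T steps over L positions.
    `lam` is an illustrative penalty weight (not from the paper)."""
    T, L = scores.shape
    coverage = np.zeros(L)                    # accumulated past alignment
    weights = np.zeros_like(scores)
    for t in range(T):
        refined = scores[t] - lam * coverage  # penalize covered positions
        weights[t] = softmax(refined)
        coverage += weights[t]
    return weights
```

With two identical score rows, the second step's weight on the position most attended at step one is strictly smaller, which is the intended coverage effect.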
Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and
legibility of text within low-resolution (LR) images, consequently elevating
recognition accuracy in Scene Text Recognition (STR). Previous methods
predominantly employ discriminative Convolutional Neural Networks (CNNs)
augmented with diverse forms of text guidance to address this issue.
Nevertheless, they remain deficient when confronted with severely blurred
images, due to their insufficient generation capability when little structural
or semantic information can be extracted from original images. Therefore, we
introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image
Super-Resolution, which exhibits great generative diversity and fidelity even
in challenging scenarios. Moreover, we propose a Recognition-Guided Denoising
Network to guide the diffusion model toward generating LR-consistent results through
succinct semantic guidance. Experiments on the TextZoom dataset demonstrate the
superiority of RGDiffSR over prior state-of-the-art methods in both text
recognition accuracy and image fidelity.
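Recognition guidance of the kind described above can be sketched, in classifier-guidance style, as shifting the predicted noise of one DDPM reverse step by the gradient of a recognizer's log-likelihood. All names here (`guided_reverse_step`, `rec_grad`, `guidance_scale`) are illustrative assumptions, not RGDiffSR's actual API or architecture.

```python
import numpy as np

def guided_reverse_step(x_t, eps_pred, rec_grad, alpha_t, alpha_bar_t,
                        guidance_scale=1.0):
    """One toy DDPM reverse-step mean where the predicted noise is
    shifted by the gradient of a recognizer's log-likelihood
    (classifier-style guidance); a sketch, not the paper's method."""
    sigma_t = np.sqrt(1.0 - alpha_bar_t)
    eps_guided = eps_pred - guidance_scale * sigma_t * rec_grad
    # standard DDPM posterior mean with the guided noise estimate
    return (x_t - (1.0 - alpha_t) / sigma_t * eps_guided) / np.sqrt(alpha_t)
```

Setting `guidance_scale=0` recovers the unguided reverse step, so the recognizer only perturbs, never replaces, the diffusion prior.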
Iterative projection meets sparsity regularization: towards practical single-shot quantitative phase imaging with in-line holography
Holography provides access to the optical phase. The emerging compressive phase retrieval approach can achieve in-line holographic imaging beyond the information-theoretic limit or even from a single shot by exploiting signal priors. However, iterative projection methods based on physical knowledge of the wavefield suffer from poor imaging quality, whereas the regularization techniques sacrifice robustness for fidelity. In this work, we present a unified compressive phase retrieval framework for in-line holography that encapsulates the unique advantages of both physical constraints and sparsity priors. In particular, a constrained complex total variation (CCTV) regularizer is introduced that explores the well-known absorption and support constraints together with sparsity in the gradient domain, enabling practical high-quality in-line holographic imaging from a single intensity image. We developed efficient solvers based on the proximal gradient method for the non-smooth regularized inverse problem and the corresponding denoising subproblem. Theoretical analyses further guarantee the convergence of the algorithms with prespecified parameters, obviating the need for manual parameter tuning. As both simulated and optical experiments demonstrate, the proposed CCTV model can characterize complex natural scenes while utilizing physically tractable constraints for quality enhancement. This new compressive phase retrieval approach can be extended, with minor adjustments, to various imaging configurations, sparsifying operators, and physical knowledge. It may cast new light on both theoretical and empirical studies.
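The proximal-gradient structure described in the abstract can be sketched in simplified form: alternate a gradient step on the amplitude data-fit term with a proximal/projection step encoding the priors. Here a plain soft-threshold stands in for the CCTV denoising subproblem, and clipping the modulus to at most 1 stands in for the absorption constraint; the operators `A`/`AH`, step size, and threshold are all illustrative assumptions, not the paper's tuned algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding, the prox of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gradient_retrieval(y, A, AH, x0, step=0.5, tau=0.01, iters=100):
    """Simplified proximal-gradient loop for the amplitude data fit
    0.5 * || |A x| - sqrt(y) ||^2, with a soft-threshold standing in
    for the CCTV prox and |x| <= 1 for the absorption constraint."""
    x = x0.copy()
    s = np.sqrt(y)
    for _ in range(iters):
        Ax = A(x)
        # gradient of the amplitude residual, pulled back by the adjoint AH
        grad = AH((np.abs(Ax) - s) * np.exp(1j * np.angle(Ax)))
        x = x - step * grad
        # sparsity prox (toy stand-in for the CCTV denoising subproblem)
        x = soft_threshold(x.real, tau) + 1j * soft_threshold(x.imag, tau)
        # absorption constraint: modulus at most 1
        x = np.clip(np.abs(x), 0.0, 1.0) * np.exp(1j * np.angle(x))
    return x
```

With `A` the identity and a small threshold, the iterates settle near the measured amplitudes, which is the minimal sanity check for the data-fit gradient.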
Cycle Invariant Positional Encoding for Graph Representation Learning
Cycles are fundamental elements in graph-structured data and have
demonstrated their effectiveness in enhancing graph learning models. To encode
such information into a graph learning framework, prior works often extract a
summary quantity, ranging from the number of cycles to the more sophisticated
persistence diagram summaries. However, more detailed information, such as
which edges are encoded in a cycle, has not yet been used in graph neural
networks. In this paper, we make one step towards addressing this gap, and
propose a structure encoding module, called CycleNet, that encodes cycle
information via edge structure encoding in a permutation invariant manner. To
efficiently encode the space of all cycles, we start with a cycle basis (i.e.,
a minimal set of cycles generating the cycle space) which we compute via the
kernel of the 1-dimensional Hodge Laplacian of the input graph. To guarantee
the encoding is invariant w.r.t. the choice of cycle basis, we encode the cycle
information via the orthogonal projector of the cycle basis, which is inspired
by BasisNet proposed by Lim et al. We also develop a more efficient variant
which however requires that the input graph has a unique shortest cycle basis.
To demonstrate the effectiveness of the proposed module, we provide some
theoretical understandings of its expressive power. Moreover, we show via a
range of experiments that networks enhanced by our CycleNet module perform
better in various benchmarks compared to several existing SOTA models.Comment: Accepted as oral presentation in the Learning on Graphs Conference
(LoG 2023
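The basis-invariant encoding described above can be sketched concretely: for a graph with no 2-cells, the kernel of the 1-dimensional Hodge Laplacian coincides with the kernel of the node-edge incidence matrix (the cycle space), and the orthogonal projector onto that kernel is the same for every choice of cycle basis. The function names and SVD-based nullspace computation below are illustrative, not CycleNet's implementation.

```python
import numpy as np

def edge_incidence(n, edges):
    """Oriented node-edge incidence matrix B1 (n nodes x m edges)."""
    B = np.zeros((n, len(edges)))
    for j, (u, v) in enumerate(edges):
        B[u, j] = -1.0
        B[v, j] = 1.0
    return B

def cycle_space_projector(n, edges):
    """Orthogonal projector onto ker(B1), i.e. the cycle space
    (= ker of the 1-Hodge Laplacian when there are no 2-cells).
    Basis-invariant: any cycle basis yields the same projector."""
    B = edge_incidence(n, edges)
    _, s, Vt = np.linalg.svd(B)
    rank = int((s > 1e-10).sum())
    Z = Vt[rank:].T          # orthonormal columns spanning the cycle space
    return Z @ Z.T           # orthogonal projector onto that space
```

For a triangle the cycle space is one-dimensional, spanned by the all-ones edge vector, so the projector is the rank-1 averaging matrix; it is also idempotent, as any orthogonal projector must be.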