306 research outputs found
MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision
In deep learning, Multi-Layer Perceptrons (MLPs) have once again garnered
attention from researchers. This paper introduces MC-MLP, a general MLP-like
backbone for computer vision that is composed of a series of fully-connected
(FC) layers. In MC-MLP, we posit that the same semantic information can be
easier or harder to learn depending on the coordinate frame of the features.
To address this, we apply an orthogonal transform to the features, which is
equivalent to changing their coordinate frame. Through
this design, MC-MLP is equipped with multi-coordinate frame receptive fields
and the ability to learn information across different coordinate frames.
Experiments demonstrate that MC-MLP outperforms most MLP-based models on image
classification, achieving better accuracy at the same parameter count.
The code will be available at: https://github.com/ZZM11/MC-MLP
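The coordinate-frame idea can be illustrated with a toy sketch. The abstract does not say which orthogonal transform MC-MLP uses, so the DCT-II below is an assumption chosen only because it is a familiar orthogonal basis; the point is that an orthogonal matrix re-expresses the same features in a new frame without losing information:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: its rows define an orthogonal change of frame."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def matvec(m, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

features = [0.5, -1.2, 3.0, 0.7]          # toy feature vector
T = dct_matrix(len(features))
coeffs = matvec(T, features)              # same information, new coordinate frame
recovered = matvec(transpose(T), coeffs)  # T is orthogonal, so T^T inverts T
```

Because the transform is orthogonal, an FC layer applied to `coeffs` sees the same information as one applied to `features`, just expressed in a different frame, which is what gives the model multi-coordinate-frame receptive fields.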
Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models
Scene text detection techniques have garnered significant attention due to
their wide-ranging applications. However, existing methods have a high demand
for training data, and obtaining accurate human annotations is labor-intensive
and time-consuming. As a solution, researchers have widely adopted synthetic
text images as a complementary resource to real text images during
pre-training. Yet there is still room for synthetic datasets to enhance the
performance of scene text detectors. We contend that one main limitation of
existing generation methods is the insufficient integration of foreground text
with the background. To alleviate this problem, we present the Diffusion Model
based Text Generator (DiffText), a pipeline that utilizes the diffusion model
to seamlessly blend foreground text regions with the background's intrinsic
features. Additionally, we propose two strategies to generate visually coherent
text with fewer spelling errors. Even with fewer text instances, the text
images we produce consistently surpass other synthetic data in aiding text
detectors.
Extensive experiments on detecting horizontal, rotated, curved, and line-level
texts demonstrate the effectiveness of DiffText in producing realistic text
images.
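The abstract does not detail DiffText's blending mechanism, but a common way diffusion models composite a generated foreground into an intact background is inpainting-style masked composition at each denoising step (RePaint-style known-region injection). The sketch below is purely illustrative; the function name, the 1-D "image", and the mask are all hypothetical:

```python
def blend_step(generated_t, background_t, mask):
    """One masked-composition step, as in diffusion inpainting:
    keep the model's sample where the mask marks new foreground text,
    and inject the (appropriately noised) background everywhere else,
    so both regions are denoised together and stay coherent."""
    return [m * g + (1 - m) * b
            for g, b, m in zip(generated_t, background_t, mask)]

# toy 1-D "image": four background pixels, text generated in the middle two
generated = [0.9, 0.2, 0.3, 0.9]
background = [0.5, 0.5, 0.5, 0.5]
mask = [0, 1, 1, 0]   # 1 = foreground text region
blended = blend_step(generated, background, mask)
```

Repeating this composition at every denoising step, rather than pasting text onto a finished image, is what lets the foreground inherit the background's intrinsic features.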
Deciphering the microbial community structures and functions of wastewater treatment in high-altitude areas
Introduction: The proper operation of wastewater treatment plants (WWTPs) is a key factor in maintaining a stable river and lake environment. Low purification efficiency in winter is a common problem in high-altitude WWTPs, and analysis of the microbial community involved in the sewage treatment process at high altitude can provide valuable references for addressing this problem.

Methods: In this study, the bacterial communities of high- and low-altitude WWTPs were investigated using Illumina high-throughput sequencing (HTS). The interactions between the microbial community and environmental variables were explored with a co-occurrence correlation network.

Results: At the genus level, Thauera (5.2%), unclassified_Rhodocyclaceae (3.0%), Dokdonella (2.5%), and Ferribacterium (2.5%) were the dominant genera in the high-altitude group. The abundances of nitrogen- and phosphorus-removal bacteria were higher in the high-altitude group (10.2% and 1.3%, respectively) than in the low-altitude group (5.4% and 0.6%, respectively). Redundancy analysis (RDA) and co-occurrence network analysis showed that altitude, ultraviolet index (UVI), pH, dissolved oxygen (DO), and total nitrogen (TN) were the dominant environmental factors (p < 0.05) affecting microbial community assembly; these five variables explained 21.4%, 20.3%, 16.9%, 11.5%, and 8.2% of the bacterial assembly of the activated sludge (AS) communities, respectively.

Discussion: The community diversity of the high-altitude group was lower than that of the low-altitude group, and WWTPs in high-altitude areas had a unique microbial community structure. Low temperature and strong UVI are pivotal factors contributing to the reduced diversity of activated sludge microbial communities at high altitudes.
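A co-occurrence correlation network of the kind used here is typically built by correlating taxon abundances across samples and keeping only strong correlations as edges. The sketch below uses Spearman rank correlation on entirely hypothetical abundance profiles (the genus names are borrowed from the abstract, but the numbers and the 0.8 threshold are illustrative; real analyses also filter edges by p-value):

```python
import math

def rankdata(xs):
    """Ranks with ties averaged, as used by Spearman correlation."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    return pearson(rankdata(a), rankdata(b))

# hypothetical relative abundances (%) across five samples
abund = {
    "Thauera":        [5.0, 5.5, 6.1, 4.8, 5.9],
    "Dokdonella":     [2.4, 2.6, 2.9, 2.3, 2.8],   # co-varies with Thauera
    "Ferribacterium": [2.5, 1.1, 3.0, 0.4, 2.2],
}
names = list(abund)
edges = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = spearman(abund[names[i]], abund[names[j]])
        if abs(r) >= 0.8:   # keep only strong co-occurrences as network edges
            edges.append((names[i], names[j], round(r, 2)))
```

Environmental variables such as UVI or DO can be added as extra nodes in the same way, which is how the network links community assembly to the factors identified by the RDA.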
Turning a CLIP Model into a Scene Text Spotter
We exploit the potential of the large-scale Contrastive Language-Image
Pretraining (CLIP) model to enhance scene text detection and spotting tasks,
transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes
visual prompt learning and cross-attention in CLIP to extract image and
text-based prior knowledge. Using predefined and learnable prompts,
FastTCM-CR50 introduces an instance-language matching process to enhance the
synergy between image and text embeddings, thereby refining text regions. Our
Bimodal Similarity Matching (BSM) module facilitates dynamic language prompt
generation, enabling offline computations and improving performance.
FastTCM-CR50 offers several advantages: 1) It can enhance existing text
detectors and spotters, improving performance by an average of 1.7% and 1.5%,
respectively. 2) It outperforms the previous TCM-CR50 backbone, yielding an
average improvement of 0.2% and 0.56% in text detection and spotting tasks,
along with a 48.5% increase in inference speed. 3) It showcases robust few-shot
training capabilities. Utilizing only 10% of the supervised data, FastTCM-CR50
improves performance by an average of 26.5% and 5.5% for text detection and
spotting tasks, respectively. 4) It consistently enhances performance on
out-of-distribution text detection and spotting datasets, particularly the
NightTime-ArT subset from ICDAR2019-ArT and the DOTA dataset for oriented
object detection. The code is available at https://github.com/wenwenyu/TCM.
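The instance-language matching idea can be sketched in miniature: each candidate image-region embedding is scored against a text-prompt embedding by cosine similarity, and high-scoring regions are kept as text. The embeddings, names, and threshold below are hypothetical stand-ins, not FastTCM-CR50's actual values:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# hypothetical embeddings: a learned "text" prompt vs. two image regions
prompt = [0.6, 0.8, 0.0]
regions = {
    "region_a": [0.5, 0.86, 0.1],   # resembles the prompt
    "region_b": [-0.7, 0.1, 0.7],   # background-like
}
scores = {name: cosine(vec, prompt) for name, vec in regions.items()}
text_regions = [n for n, s in scores.items() if s > 0.5]  # threshold is illustrative
```

In the actual backbone this matching operates on CLIP-derived embeddings, and the BSM module's offline prompt computation means the text side of the comparison need not be recomputed per image, which is where the reported inference speedup comes from.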