306 research outputs found
MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision
In deep learning, Multi-Layer Perceptrons (MLPs) have once again garnered
attention from researchers. This paper introduces MC-MLP, a general MLP-like
backbone for computer vision that is composed of a series of fully-connected
(FC) layers. In MC-MLP, we posit that the same semantic information can be
easier or harder to learn depending on the coordinate frame of the features.
To address this, we apply an orthogonal transform to the features, which is
equivalent to changing their coordinate frame. Through
this design, MC-MLP is equipped with multi-coordinate frame receptive fields
and the ability to learn information across different coordinate frames.
Experiments demonstrate that MC-MLP outperforms most MLP-based models on image
classification, achieving better accuracy at the same parameter count.
The code will be available at: https://github.com/ZZM11/MC-MLP
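The coordinate-frame idea can be illustrated with a toy sketch. The abstract does not say which orthogonal transform MC-MLP uses, so the DCT-II below is an assumption chosen only because it is a familiar orthogonal basis; the point is that an orthogonal matrix re-expresses the same features in a new frame without losing information:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: its rows define an orthogonal change of frame."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows

def matvec(m, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

features = [0.5, -1.2, 3.0, 0.7]          # toy feature vector
T = dct_matrix(len(features))
coeffs = matvec(T, features)              # same information, new coordinate frame
recovered = matvec(transpose(T), coeffs)  # T is orthogonal, so T^T inverts T
```

Because the transform is orthogonal, an FC layer applied to `coeffs` sees the same information as one applied to `features`, just expressed in a different frame, which is what gives the model multi-coordinate-frame receptive fields.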
Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models
Scene text detection techniques have garnered significant attention due to
their wide-ranging applications. However, existing methods have a high demand
for training data, and obtaining accurate human annotations is labor-intensive
and time-consuming. As a solution, researchers have widely adopted synthetic
text images as a complementary resource to real text images during
pre-training. Yet there is still room for synthetic datasets to enhance the
performance of scene text detectors. We contend that one main limitation of
existing generation methods is the insufficient integration of foreground text
with the background. To alleviate this problem, we present the Diffusion Model
based Text Generator (DiffText), a pipeline that utilizes the diffusion model
to seamlessly blend foreground text regions with the background's intrinsic
features. Additionally, we propose two strategies to generate visually coherent
text with fewer spelling errors. Even with fewer text instances, the text
images we produce consistently surpass other synthetic data in aiding text
detectors.
Extensive experiments on detecting horizontal, rotated, curved, and line-level
texts demonstrate the effectiveness of DiffText in producing realistic text
images.
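The abstract does not detail DiffText's blending mechanism, but a common way diffusion models composite a generated foreground into an intact background is inpainting-style masked composition at each denoising step (RePaint-style known-region injection). The sketch below is purely illustrative; the function name, the 1-D "image", and the mask are all hypothetical:

```python
def blend_step(generated_t, background_t, mask):
    """One masked-composition step, as in diffusion inpainting:
    keep the model's sample where the mask marks new foreground text,
    and inject the (appropriately noised) background everywhere else,
    so both regions are denoised together and stay coherent."""
    return [m * g + (1 - m) * b
            for g, b, m in zip(generated_t, background_t, mask)]

# toy 1-D "image": four background pixels, text generated in the middle two
generated = [0.9, 0.2, 0.3, 0.9]
background = [0.5, 0.5, 0.5, 0.5]
mask = [0, 1, 1, 0]   # 1 = foreground text region
blended = blend_step(generated, background, mask)
```

Repeating this composition at every denoising step, rather than pasting text onto a finished image, is what lets the foreground inherit the background's intrinsic features.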
Deciphering the microbial community structures and functions of wastewater treatment in high-altitude areas
Introduction: The proper operation of wastewater treatment plants (WWTPs) is a key factor in maintaining a stable river and lake environment. Low purification efficiency in winter is a common problem in high-altitude WWTPs, and analysis of the microbial community involved in the sewage treatment process at high altitude can provide valuable references for addressing this problem.

Methods: In this study, the bacterial communities of high- and low-altitude WWTPs were investigated using Illumina high-throughput sequencing (HTS). The interactions between the microbial community and environmental variables were explored with a co-occurrence correlation network.

Results: At the genus level, Thauera (5.2%), unclassified_Rhodocyclaceae (3.0%), Dokdonella (2.5%), and Ferribacterium (2.5%) were the dominant genera in the high-altitude group. The abundances of nitrogen- and phosphorus-removal bacteria were higher in the high-altitude group (10.2% and 1.3%, respectively) than in the low-altitude group (5.4% and 0.6%, respectively). Redundancy analysis (RDA) and co-occurrence network analysis showed that altitude, ultraviolet index (UVI), pH, dissolved oxygen (DO), and total nitrogen (TN) were the dominant environmental factors (p < 0.05) affecting microbial community assembly; these five variables explained 21.4%, 20.3%, 16.9%, 11.5%, and 8.2% of the bacterial assembly of the activated sludge (AS) communities, respectively.

Discussion: The community diversity of the high-altitude group was lower than that of the low-altitude group, and WWTPs in high-altitude areas had a unique microbial community structure. Low temperature and strong UVI are pivotal factors contributing to the reduced diversity of activated sludge microbial communities at high altitudes.
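A co-occurrence correlation network of the kind used here is typically built by correlating taxon abundances across samples and keeping only strong correlations as edges. The sketch below uses Spearman rank correlation on entirely hypothetical abundance profiles (the genus names are borrowed from the abstract, but the numbers and the 0.8 threshold are illustrative; real analyses also filter edges by p-value):

```python
import math

def rankdata(xs):
    """Ranks with ties averaged, as used by Spearman correlation."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    return pearson(rankdata(a), rankdata(b))

# hypothetical relative abundances (%) across five samples
abund = {
    "Thauera":        [5.0, 5.5, 6.1, 4.8, 5.9],
    "Dokdonella":     [2.4, 2.6, 2.9, 2.3, 2.8],   # co-varies with Thauera
    "Ferribacterium": [2.5, 1.1, 3.0, 0.4, 2.2],
}
names = list(abund)
edges = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = spearman(abund[names[i]], abund[names[j]])
        if abs(r) >= 0.8:   # keep only strong co-occurrences as network edges
            edges.append((names[i], names[j], round(r, 2)))
```

Environmental variables such as UVI or DO can be added as extra nodes in the same way, which is how the network links community assembly to the factors identified by the RDA.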
Turning a CLIP Model into a Scene Text Spotter
We exploit the potential of the large-scale Contrastive Language-Image
Pretraining (CLIP) model to enhance scene text detection and spotting tasks,
transforming it into a robust backbone, FastTCM-CR50. This backbone utilizes
visual prompt learning and cross-attention in CLIP to extract image and
text-based prior knowledge. Using predefined and learnable prompts,
FastTCM-CR50 introduces an instance-language matching process to enhance the
synergy between image and text embeddings, thereby refining text regions. Our
Bimodal Similarity Matching (BSM) module facilitates dynamic language prompt
generation, enabling offline computations and improving performance.
FastTCM-CR50 offers several advantages: 1) It can enhance existing text
detectors and spotters, improving performance by an average of 1.7% and 1.5%,
respectively. 2) It outperforms the previous TCM-CR50 backbone, yielding an
average improvement of 0.2% and 0.56% in text detection and spotting tasks,
along with a 48.5% increase in inference speed. 3) It showcases robust few-shot
training capabilities. Utilizing only 10% of the supervised data, FastTCM-CR50
improves performance by an average of 26.5% and 5.5% for text detection and
spotting tasks, respectively. 4) It consistently enhances performance on
out-of-distribution text detection and spotting datasets, particularly the
NightTime-ArT subset from ICDAR2019-ArT and the DOTA dataset for oriented
object detection. The code is available at https://github.com/wenwenyu/TCM.
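The instance-language matching idea can be sketched in miniature: each candidate image-region embedding is scored against a text-prompt embedding by cosine similarity, and high-scoring regions are kept as text. The embeddings, names, and threshold below are hypothetical stand-ins, not FastTCM-CR50's actual values:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# hypothetical embeddings: a learned "text" prompt vs. two image regions
prompt = [0.6, 0.8, 0.0]
regions = {
    "region_a": [0.5, 0.86, 0.1],   # resembles the prompt
    "region_b": [-0.7, 0.1, 0.7],   # background-like
}
scores = {name: cosine(vec, prompt) for name, vec in regions.items()}
text_regions = [n for n, s in scores.items() if s > 0.5]  # threshold is illustrative
```

In the actual backbone this matching operates on CLIP-derived embeddings, and the BSM module's offline prompt computation means the text side of the comparison need not be recomputed per image, which is where the reported inference speedup comes from.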