Semi-supervised Semantic Segmentation via Boosting Uncertainty on Unlabeled Data
We bring a new perspective to semi-supervised semantic segmentation by
providing an analysis on the labeled and unlabeled distributions in training
datasets. We first show that the distribution gap between labeled and
unlabeled datasets cannot be ignored, even though the two datasets are sampled
from the same distribution. To address this issue, we theoretically analyze and
experimentally demonstrate that appropriately boosting uncertainty on unlabeled data
can help minimize the distribution gap, which benefits the generalization of
the model. We propose two strategies and design an uncertainty booster
algorithm specifically for semi-supervised semantic segmentation. Extensive
experiments are carried out based on these theories, and the results confirm
the efficacy of the algorithm and strategies. Our plug-and-play uncertainty
booster is small, efficient, and robust to hyperparameters, yet it can significantly
improve performance. Our approach achieves state-of-the-art performance in our
experiments compared to the current semi-supervised semantic segmentation
methods on the popular benchmarks: Cityscapes and PASCAL VOC 2012 with
different training settings.
Impact of Carriage Crowding Level on Bus Dwell Time: Modelling and Analysis
This paper develops two types of estimation models to quantify the impact of carriage crowding level on bus dwell time. The first model (model I) takes the crowding level and the numbers of alighting and boarding passengers into consideration and estimates the alighting time and the boarding time separately. The second model (model II) adopts almost the same regression method, except that the impact of crowding on dwell time is neglected. The analysis was conducted along two major bus routes in Harbin, China, by manually collecting 640 groups of dwell times under crowded conditions. Compared with model II, the mean absolute error (MAE) of model I is reduced by 137.51%, which indicates that the accuracy of bus dwell time estimation can be substantially improved by introducing carriage crowding level into the model. Meanwhile, the MAE of model I is about 3.9 seconds, which is acceptable in travel time estimation and bus schedule
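The MAE comparison between the two models can be illustrated with a minimal Python sketch. It assumes the reduction is measured relative to model II's MAE, which the abstract does not state. The function names and all dwell-time values below are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_reduction(baseline_mae, improved_mae):
    """Percentage by which improved_mae is lower than baseline_mae (assumed definition)."""
    return 100.0 * (baseline_mae - improved_mae) / baseline_mae


# Hypothetical dwell times in seconds, made up for illustration only.
observed = [10.0, 14.0, 20.0, 26.0]
model_i_pred = [11.0, 12.0, 22.0, 25.0]   # stands in for the model with a crowding term
model_ii_pred = [14.0, 8.0, 26.0, 18.0]   # stands in for the model without it

mae_i = mean_absolute_error(observed, model_i_pred)
mae_ii = mean_absolute_error(observed, model_ii_pred)
print(mae_i, mae_ii, relative_reduction(mae_ii, mae_i))
```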
FS_YOLOv8: A Deep Learning Network for Ground Fissures Instance Segmentation in UAV Images of the Coal Mining Area
Ground fissures caused by coal mining have seriously affected the ecological environment of the land. Timely and accurate identification and landfill treatment of ground fissures can avoid secondary geological disasters in coal mine areas. At present, deep-learning-based fissure identification methods perform well on surfaces such as roads and walls. Nevertheless, automatic and reliable segmentation of ground fissures in remote sensing images remains a challenge for deep learning networks because of the diverse and complex texture of mining ground fissures and their background. To overcome these challenges, we propose FS_YOLOv8, an improved YOLOv8 instance segmentation network that automatically and efficiently segments ground fissures in coal mining areas. The DSPP (Dynamic Snake convolutional Pyramid Pooling) module is incorporated into FS_YOLOv8 to establish a multi-scale dynamic snake convolution feature aggregation structure. This module replaces the conventional convolution in the SPPF module of YOLOv8 and aims to enhance the model's ability to extract features of fissures with tubular structures. Furthermore, the D-LKA (Deformable Large Kernel Attention) module is employed to autonomously collect fissure context information. To improve detection of hard samples in remote sensing images with intricate background and fissure texture, we employ a Slide Loss function. Finally, experiments are conducted on a ground fissure dataset of unmanned aerial vehicle (UAV) images of coal mine areas. The results demonstrate that FS_YOLOv8 segments ground fissures effectively within intricate and expansive mining areas.
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
The pre-training-fine-tuning paradigm based on layout-aware multimodal
pre-trained models has achieved significant progress on document image question
answering. However, domain pre-training and task fine-tuning for additional
visual, layout, and task modules prevent them from directly utilizing
off-the-shelf instruction-tuning language foundation models, which have
recently shown promising potential in zero-shot learning. Instead of aligning
language models to the domain of document image question answering, we align
document image question answering to off-the-shelf instruction-tuning language
foundation models to utilize their zero-shot capability. Specifically, we
propose a layout- and task-aware instruction prompt called LATIN-Prompt, which
consists of layout-aware document content and task-aware descriptions. The
former recovers the layout information among text segments from OCR tools by
appropriate spaces and line breaks. The latter ensures that the model generates
answers that meet the requirements, especially format requirements, through a
detailed description of the task. Experimental results on three benchmarks show
that LATIN-Prompt can improve the zero-shot performance of instruction-tuning
language foundation models on document image question answering and help them
achieve comparable levels to SOTAs based on the pre-training-fine-tuning
paradigm. Quantitative analysis and qualitative analysis demonstrate the
effectiveness of LATIN-Prompt. We provide the code in the supplementary material and will
release it to facilitate future research. Comment: Code is available at https://github.com/WenjinW/LATIN-Promp
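The idea of recovering layout with spaces and line breaks can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the `Segment` type, the `char_width` and `line_height` parameters, and the prompt wording in `build_prompt` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One OCR text segment with its top-left position in pixels (hypothetical type)."""
    text: str
    x: int
    y: int


def layout_aware_text(segments, char_width=10, line_height=20):
    """Place segments on a character grid using only spaces and line breaks."""
    # Group segments into rows by quantized vertical position.
    rows = {}
    for seg in segments:
        rows.setdefault(seg.y // line_height, []).append(seg)

    lines = []
    for row_key in sorted(rows):
        line = ""
        for seg in sorted(rows[row_key], key=lambda s: s.x):
            col = seg.x // char_width
            # Pad up to the target column; keep at least one space between segments.
            pad = max(col - len(line), 1) if line else col
            line += " " * pad + seg.text
        lines.append(line)
    return "\n".join(lines)


def build_prompt(document_text, question, task_description):
    """Combine a task description with the layout-aware document content (assumed wording)."""
    return (
        f"{task_description}\n\n"
        f"Document:\n{document_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


segments = [Segment("Total:", 0, 0), Segment("42", 100, 0), Segment("Date", 0, 20)]
print(build_prompt(
    layout_aware_text(segments),
    "What is the total?",
    "Answer the question using only text found in the document.",
))
```

Quantizing positions to a character grid is one simple way to turn coordinates into whitespace. It does not insert blank lines for empty rows, and overlapping segments are separated by a single space.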