
325 research outputs found

Semi-supervised Semantic Segmentation via Boosting Uncertainty on Unlabeled Data

Authors: Luo Yunhao, Zhang Daoan, Zhang Jianguo
Publication date: 30/11/2023

We bring a new perspective to semi-supervised semantic segmentation by analyzing the labeled and unlabeled distributions in training datasets. We first find that the distribution gap between labeled and unlabeled datasets cannot be ignored, even though the two datasets are sampled from the same distribution. To address this issue, we show theoretically and experimentally that appropriately boosting uncertainty on unlabeled data can help minimize the distribution gap, which benefits the generalization of the model. We propose two strategies and design an uncertainty booster algorithm specifically for semi-supervised semantic segmentation. Extensive experiments based on this analysis confirm the efficacy of the algorithm and strategies. Our plug-and-play uncertainty booster is tiny, efficient, and robust to hyperparameters, yet it can significantly improve performance. In our experiments, our approach achieves state-of-the-art performance compared with current semi-supervised semantic segmentation methods on the popular benchmarks Cityscapes and PASCAL VOC 2012 under different training settings.

arXiv.org e-Print Archive

Impact of Carriage Crowding Level on Bus Dwell Time: Modelling and Analysis

Authors: Bie Yiming, Wang Yunhao, Zhang Le
Publication venue: Hindawi Limited
Publication date: 01/01/2020

This paper develops two types of estimation models to quantify the impact of carriage crowding level on bus dwell time. The first model (model I) takes the crowding level and the numbers of alighting and boarding passengers into account and estimates the alighting time and boarding time, respectively. The second model (model II) adopts almost the same regression method, except that the impact of crowding on dwell time is neglected. The analysis was conducted along two major bus routes in Harbin, China, by manually collecting 640 groups of dwell times under crowded conditions. Compared with model II, the mean absolute error (MAE) of model I is reduced by 137.51%, which indicates that the accuracy of bus dwell time estimation can be substantially improved by introducing carriage crowding level into the model. Meanwhile, the MAE of model I is about 3.9 seconds, which is acceptable in travel time estimation and bus schedule
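
The abstract compares the two models by their mean absolute error (MAE). As a reminder of what that metric measures, here is a minimal Python sketch. The function name and the dwell-time values are hypothetical illustrations, not data or code from the paper.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical dwell times in seconds (illustrative only, not from the study).
observed_dwell = [20.0, 35.0, 50.0]
predicted_dwell = [22.0, 31.0, 50.0]
mae = mean_absolute_error(observed_dwell, predicted_dwell)
```

A lower MAE means the predicted dwell times sit closer to the observed ones on average, in the same unit (seconds) as the measurements.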

Directory of Open Access Journals

Chalmers Research

FS_YOLOv8: A Deep Learning Network for Ground Fissures Instance Segmentation in UAV Images of the Coal Mining Area

Authors: Lin Yunhao, Xu Zhihua, Zhang Zhenxin
Publication date: 01/05/2024

The ground fissures caused by coal mining have seriously affected the ecological environment of the land. Timely and accurate identification and landfill treatment of ground fissures can avoid secondary geological disasters in coal mine areas. At present, fissure identification methods based on deep learning perform well on surfaces such as roads and walls. Nevertheless, the automatic and reliable segmentation of ground fissures in remote sensing images remains a challenge for deep learning networks, due to the diverse and complex texture information in the mining ground fissures and the background. To overcome these challenges, we propose an improved YOLOv8 instance segmentation network, called FS_YOLOv8, to automatically and efficiently segment the ground fissures in coal mining areas. The DSPP (Dynamic Snake convolutional Pyramid Pooling) module is incorporated into the FS_YOLOv8 model to establish a multi-scale dynamic snake convolution feature aggregation structure. This module replaces the conventional convolution found in the SPPF module of YOLOv8 and aims to enhance the model's ability to extract features related to fissures with tubular structures. Furthermore, the D-LKA (Deformable Large Kernel Attention) module is employed to autonomously collect fissure context information. To improve the detection of challenging samples in remote sensing images with intricate background and fissure texture, we employ a Slide Loss function. Finally, experiments are conducted on a ground fissure dataset of unmanned aerial vehicle (UAV) images from coal mine areas. The experimental findings demonstrate that FS_YOLOv8 exhibits exceptional proficiency in segmenting ground fissures within intricate and expansive mining areas.

Directory of Open Access Journals

Copernicus Publications

Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering

Authors: Li Yunhao, Ou Yixin, Wang Wenjin, Zhang Yin
Publication date: 01/06/2023

The pre-training-fine-tuning paradigm based on layout-aware multimodal pre-trained models has achieved significant progress on document image question answering. However, domain pre-training and task fine-tuning for additional visual, layout, and task modules prevent them from directly utilizing off-the-shelf instruction-tuning language foundation models, which have recently shown promising potential in zero-shot learning. Instead of aligning language models to the domain of document image question answering, we align document image question answering to off-the-shelf instruction-tuning language foundation models to utilize their zero-shot capability. Specifically, we propose a layout- and task-aware instruction prompt called LATIN-Prompt, which consists of layout-aware document content and task-aware descriptions. The former recovers the layout information among text segments from OCR tools by appropriate spaces and line breaks. The latter ensures that the model generates answers that meet the requirements, especially format requirements, through a detailed description of the task. Experimental results on three benchmarks show that LATIN-Prompt can improve the zero-shot performance of instruction-tuning language foundation models on document image question answering and help them achieve comparable levels to SOTAs based on the pre-training-fine-tuning paradigm. Quantitative analysis and qualitative analysis demonstrate the effectiveness of LATIN-Prompt. We provide the code in supplementary and will release the code to facilitate future research.
Comment: Code is available at https://github.com/WenjinW/LATIN-Promp
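
The idea of recovering layout from OCR segments with spaces and line breaks, then attaching a task description, can be illustrated with a short Python sketch. This is not the LATIN-Prompt implementation: the segment format `(text, x_left, y_top)`, the `char_width` and `line_height` parameters, the function names, and the prompt wording are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical OCR segment: (text, x_left, y_top), coordinates in pixels.
Segment = Tuple[str, float, float]


def layout_aware_text(segments: List[Segment],
                      char_width: float = 10.0,
                      line_height: float = 20.0) -> str:
    """Approximate the page layout in plain text using spaces and line breaks."""
    # Group segments into rows by quantising their vertical position.
    rows: Dict[int, List[Tuple[float, str]]] = {}
    for text, x, y in segments:
        rows.setdefault(round(y / line_height), []).append((x, text))

    lines = []
    for row in sorted(rows):
        line = ""
        for x, text in sorted(rows[row]):
            target_col = int(x / char_width)
            # Pad to the target column; keep at least one space between segments.
            pad = max(target_col - len(line), 1) if line else target_col
            line += " " * pad + text
        lines.append(line)
    return "\n".join(lines)


def build_prompt(document: str, question: str) -> str:
    """Combine document text with a task description (wording is an assumption)."""
    return (
        f"Document:\n{document}\n\n"
        f"Question: {question}\n"
        "Answer with a short span copied directly from the document."
    )


segments = [("Total", 0.0, 0.0), ("42", 100.0, 0.0), ("Date", 0.0, 20.0)]
document = layout_aware_text(segments)
prompt = build_prompt(document, "What is the total?")
```

In this sketch, horizontal position maps to a character column and vertical position to a text row, so a table-like page keeps its rough alignment when passed to a text-only language model.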

arXiv.org e-Print Archive