Taking a look at small-scale pedestrians and occluded pedestrians
Small-scale pedestrian detection and occluded pedestrian detection are two challenging tasks. However, most state-of-the-art methods handle only one task at a time, which leads to relatively poor performance when, as is common in practice, both must be handled simultaneously. In this paper, we find that small-scale pedestrian detection and occluded pedestrian detection actually share a common problem: inaccurate localization. Addressing this problem therefore improves the performance of both tasks. To this end, we pay more attention to predicted bounding boxes with poor localization precision and extract more contextual information around objects, and we propose two modules to do so: location bootstrap and semantic transition. The location bootstrap reweights the regression loss, upweighting the loss of predicted bounding boxes far from their corresponding ground truth and downweighting the loss of those near it. The semantic transition adds more contextual information and relieves the semantic inconsistency of skip-layer fusion. Since the location bootstrap is not used at the test stage and the semantic transition is lightweight, the proposed method adds little extra computational cost during inference. Experiments on the challenging CityPersons and Caltech datasets show that the proposed method outperforms state-of-the-art methods on small-scale and occluded pedestrians (e.g., 5.20% and 4.73% improvements on Caltech).
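The location bootstrap idea above can be sketched as a simple loss reweighting. The weighting scheme below (weight = 2 - IoU, so poorly localized boxes in [1, 2] get larger weight) is an illustrative assumption for this sketch, not the paper's exact formulation:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def smooth_l1(pred, target):
    """Standard smooth-L1 regression loss, summed over coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def bootstrapped_loss(pred, target):
    """Upweight poorly localized predictions, downweight accurate ones.

    Assumed weighting: weight = 2 - IoU, so a box far from its ground
    truth (low IoU) contributes up to twice the plain regression loss.
    """
    weight = 2.0 - iou(pred, target)
    return weight * smooth_l1(pred, target)
```

Because the weight depends only on ground truth, it is dropped at test time, consistent with the abstract's claim of negligible inference cost.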
Scene Text Eraser
The character information in natural scene images can contain various kinds of personal information, such as telephone numbers and home addresses, and there is a high risk of leaking this information if the images are published. In this paper, we propose a scene text erasing method that properly hides such information via an inpainting convolutional neural network (CNN) model. The input is a scene text image, and the output is expected to be a text-erased image in which all character regions are filled with the colors of the surrounding background pixels. This is accomplished by a CNN model with a convolution-to-deconvolution architecture and interconnections between the two stages. The training samples and the corresponding inpainted images serve as teaching signals for training. To evaluate the text erasing performance, the output images are processed by a novel scene text detection method, and the same detection measurements are then used to test images from the ICDAR2013 benchmark dataset. Compared with direct text detection, the scene text erasing process yields a drastic decrease in precision, recall, and f-score, which demonstrates the effectiveness of the proposed method for erasing text in natural scene images.
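The indirect evaluation protocol described above can be sketched as follows: run a text detector on the erased image, match its detections to the original ground-truth text boxes, and compute precision, recall, and f-score; lower scores mean more text was successfully erased. The IoU >= 0.5 matching criterion is an assumption for illustration, not necessarily the exact ICDAR2013 protocol:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def detection_scores(gt_boxes, det_boxes, iou_thresh=0.5):
    """Precision, recall, and f-score of detections vs. ground truth.

    Each ground-truth box is matched to at most one detection.
    """
    matched_gt = set()
    tp = 0
    for det in det_boxes:
        for i, gt in enumerate(gt_boxes):
            if i not in matched_gt and iou(det, gt) >= iou_thresh:
                matched_gt.add(i)
                tp += 1
                break
    precision = tp / len(det_boxes) if det_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall > 0 else 0.0)
    return precision, recall, f
```

On an effectively erased image the detector should return few or no boxes over the original text regions, driving all three scores toward zero.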