368 research outputs found

    Impact of adversarial examples on deep learning models for biomedical image segmentation

    Deep learning models, which are increasingly being used in the field of medical image analysis, come with a major security risk, namely, their vulnerability to adversarial examples. Adversarial examples are carefully crafted samples that force machine learning models to make mistakes at test time. These malicious samples have been shown to be highly effective in misguiding classification tasks. However, research on the influence of adversarial examples on segmentation is significantly lacking. Given that a large portion of medical imaging problems are effectively segmentation problems, we analyze the impact of adversarial examples on deep learning-based image segmentation models. Specifically, we expose the vulnerability of these models to adversarial examples by proposing the Adaptive Segmentation Mask Attack (ASMA). This novel algorithm makes it possible to craft targeted adversarial examples that come with (1) high intersection-over-union rates between the target adversarial mask and the prediction and (2) perturbation that is, for the most part, invisible to the naked eye. We lay out experimental and visual evidence by showing results obtained for the ISIC skin lesion segmentation challenge and the problem of glaucoma optic disc segmentation. An implementation of this algorithm and additional examples can be found at https://github.com/utkuozbulak/adaptive-segmentation-mask-attack
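
The intersection-over-union measure named in the abstract above can be illustrated with a minimal sketch. This is not code from the linked ASMA repository; the function name `iou` and the nested-list mask format are assumptions made here for illustration only.

```python
def iou(mask_a, mask_b):
    """Intersection-over-union of two equally sized binary masks.

    Masks are assumed to be nested lists of 0/1 values (hypothetical format).
    """
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1  # pixel set in both masks
            if a or b:
                union += 1  # pixel set in at least one mask
    # Two empty masks are treated as a perfect match by convention.
    return intersection / union if union else 1.0


# Hypothetical target mask and model prediction.
target = [[0, 1, 1],
          [0, 1, 1]]
prediction = [[0, 1, 0],
              [0, 1, 1]]
score = iou(target, prediction)
```

In a targeted attack setting, a value close to 1 would indicate that the prediction nearly matches the adversarial target mask.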

    Recovering 6D Object Pose: A Review and Multi-modal Analysis

    A large number of studies analyse object detection and pose estimation at visual level in 2D, discussing the effects of challenges such as occlusion, clutter, texture, etc., on the performances of the methods, which work in the context of RGB modality. Interpreting the depth data, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performances of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take for improving "autonomy" in robotics while handling objects? Our findings include: (i) Reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances' 6D. (iii) Template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation. The recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input. (iv) Depending on the availability of large-scale 6D annotated depth datasets, feature representations can be learnt on these datasets, and then the learnt representations can be customized for the 6D problem

    Geostatistical merging of weather radar data with a sparse rain gauge network in Queensland

    Many parts of Australia, including much of Queensland and Northern Australia, tend to have sparse rain gauge coverage. To provide rainfall information across Australia, several gridded daily rainfall datasets such as those available through the Australian Water Availability Project and Scientific Information for Land Owners services have been developed. These daily grids are produced by interpolation of rain gauge data and therefore can provide unrealistic rainfall estimates in areas that have few rain gauges. To obtain rainfall data at a higher spatial resolution, weather radars and satellites can provide coverage over a large area although their measurements come with considerable uncertainty. Various approaches have been developed to adjust radar and satellite data and statistically merge them with rain gauge measurements in interpolation schemes, the goal being to retain the information on the spatial distribution of rainfall provided by remote sensing while also taking advantage of the greater accuracy of the rain gauges, but many of these techniques have been applied primarily on shorter time scales of an hour or less. This paper applies some existing methods for geostatistical merging of radar data with sparse rain gauge networks and evaluates the performance of the approaches using the Mt Stapylton radar in Brisbane and 15 surrounding rain gauges. Summer and winter data from 01/12/2013 to 28/02/2018 are considered. The radar data is corrected for mean field bias using quantile mapping and is used to develop the variogram models for use in Kriging. The performance of Kriging the gauge data using the radar variogram is compared with conditional merging and Kriging with radar values introduced as a drift variable. Leave-one-out cross-validation is used to evaluate the performance of the methods. 
We find some disagreement between all radar-based approaches and the validation gauge measurements, with typical daily root-mean-square errors being between 10 mm and 20 mm for all approaches. Some outliers with substantially higher RMSE are noted for some days in the unadjusted radar data as well as in the corrected and interpolated data. For winter data the bias-correction and interpolation steps increased the agreement between the radar data and the validation gauges, but this improvement was not observed in the summer data. In addition, due to the low number of gauges the performance of the interpolation is extremely sensitive to the rain gauge values, with certain combinations of rain gauge values and choice of validation gauge leading to extremely large cross-validation errors. The results indicate that while incorporating the radar data makes it possible to perform Kriging with few gauges on a single day's data, this is not an ideal approach for quantitative precipitation estimation and further steps should be taken to improve the radar-gauge correlation
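
The leave-one-out cross-validation and RMSE evaluation described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: inverse-distance weighting is swapped in as a simple stand-in for the Kriging variants studied in the paper, and the names `idw_estimate` and `loo_rmse`, the gauge tuple layout, and the sample values are all hypothetical.

```python
import math


def idw_estimate(x, y, gauges, power=2.0):
    """Inverse-distance-weighted rainfall estimate at (x, y).

    A simple stand-in interpolator, NOT the Kriging used in the paper.
    `gauges` is a list of (x, y, rainfall_mm) tuples (assumed layout).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for gx, gy, value in gauges:
        distance = math.hypot(x - gx, y - gy)
        if distance == 0.0:
            return value  # exactly at a gauge: use its reading
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def loo_rmse(gauges, estimator=idw_estimate):
    """Leave each gauge out in turn, predict it from the rest, return RMSE."""
    squared_errors = []
    for i, (gx, gy, observed) in enumerate(gauges):
        remaining = gauges[:i] + gauges[i + 1:]
        predicted = estimator(gx, gy, remaining)
        squared_errors.append((predicted - observed) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily totals (mm) at three gauge locations (km coordinates).
sample_gauges = [(0.0, 0.0, 12.0), (10.0, 0.0, 18.0), (0.0, 10.0, 15.0)]
daily_rmse = loo_rmse(sample_gauges)
```

With very few gauges, each held-out prediction depends on only a handful of neighbours, which is consistent with the sensitivity to individual gauge values that the abstract reports.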

    CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection

    Robust face detection in the wild is one of the ultimate components to support various face-related problems, e.g. unconstrained face recognition, facial periocular recognition, facial landmarking and pose estimation, facial expression recognition, 3D facial model construction, etc. Although the face detection problem has been intensely studied for decades with various commercial applications, it still meets problems in some real-world scenarios due to numerous challenges, e.g. heavy facial occlusions, extremely low resolutions, strong illumination, exceptional pose variations, image or video compression artifacts, etc. In this paper, we present a face detection approach named Contextual Multi-Scale Region-based Convolution Neural Network (CMS-RCNN) to robustly solve the problems mentioned above. Similar to the region-based CNNs, our proposed network consists of the region proposal component and the region-of-interest (RoI) detection component. However, unlike those networks, our proposed network makes two main contributions that play a significant role in achieving state-of-the-art performance in face detection. Firstly, the multi-scale information is grouped both in region proposal and RoI detection to deal with tiny face regions. Secondly, our proposed network allows explicit body contextual reasoning in the network, inspired by the intuition of the human vision system. The proposed approach is benchmarked on two recent challenging face detection databases, i.e. the WIDER FACE Dataset, which contains a high degree of variability, as well as the Face Detection Dataset and Benchmark (FDDB). The experimental results show that our proposed approach trained on WIDER FACE Dataset outperforms strong baselines on WIDER FACE Dataset by a large margin, and consistently achieves competitive results on FDDB against the recent state-of-the-art face detection methods

    ICNet for Real-Time Semantic Segmentation on High-Resolution Images

    We focus on the challenging task of real-time semantic segmentation in this paper. It finds many practical applications and yet faces the fundamental difficulty of reducing a large portion of computation for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card with decent quality results evaluated on challenging datasets like Cityscapes, CamVid and COCO-Stuff. Comment: ECCV 201

    Grid Loss: Detecting Occluded Faces

    Detection of partially occluded objects is a challenging computer vision problem. Standard Convolutional Neural Network (CNN) detectors fail if parts of the detection window are occluded, since not every sub-part of the window is discriminative on its own. To address this issue, we propose a novel loss layer for CNNs, named grid loss, which minimizes the error rate on sub-blocks of a convolution layer independently rather than over the whole feature map. This results in parts being more discriminative on their own, enabling the detector to recover if the detection window is partially occluded. By mapping our loss layer back to a regular fully connected layer, no additional computational cost is incurred at runtime compared to standard CNNs. We demonstrate our method for face detection on several public face detection benchmarks and show that our method outperforms regular CNNs, is suitable for real-time applications and achieves state-of-the-art performance. Comment: accepted to ECCV 201

    Exploring Food Detection using CNNs

    One of the most common critical factors directly related to the cause of chronic disease is an unhealthy diet. In this sense, building an automatic system for food analysis could allow a better understanding of the nutritional information of the food eaten and thus could help in taking corrective actions in order to consume a better diet. The Computer Vision community has focused its efforts on several areas involved in visual food analysis, such as food detection, food recognition, food localization, and portion estimation, among others. For food detection, the best results reported in the state of the art were obtained using Convolutional Neural Networks. However, the results of these different approaches were obtained on different datasets and therefore are not directly comparable. This article proposes an overview of the latest advances in food detection and an optimal model based on the GoogLeNet Convolutional Neural Network, principal component analysis, and a support vector machine that outperforms the state of the art on two public food/non-food datasets

    Image Co-localization by Mimicking a Good Detector's Confidence Score Distribution

    Given a set of images containing objects from the same category, the task of image co-localization is to identify and localize each instance. This paper shows that this problem can be solved by a simple but intriguing idea, that is, a common object detector can be learnt by making its detection confidence scores distributed like those of a strongly supervised detector. More specifically, we observe that given a set of object proposals extracted from an image that contains the object of interest, an accurate strongly supervised object detector should give high scores to only a small minority of proposals, and low scores to most of them. Thus, we devise an entropy-based objective function to enforce the above property when learning the common object detector. Once the detector is learnt, we resort to a segmentation approach to refine the localization. We show that despite its simplicity, our approach outperforms state-of-the-art methods. Comment: Accepted to Proc. European Conf. Computer Vision 201
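
The intuition behind an entropy-based objective can be illustrated with a small sketch: when only a few proposals receive high scores, the normalized score distribution has low entropy. This is not the paper's objective function; the softmax normalization, the function name `score_entropy`, and the example scores are assumptions chosen for illustration.

```python
import math


def score_entropy(scores):
    """Shannon entropy (in nats) of softmax-normalized proposal scores.

    Softmax normalization is an assumption made here, not the paper's choice.
    """
    largest = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical detector scores over four object proposals.
peaked = score_entropy([10.0, 0.0, 0.0, 0.0])   # one confident proposal
uniform = score_entropy([0.0, 0.0, 0.0, 0.0])   # no proposal stands out
```

Under these assumptions, `peaked` is smaller than `uniform`, so minimizing such an entropy term would push a detector toward scoring only a small minority of proposals highly.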

    Impact of Ground Truth Annotation Quality on Performance of Semantic Image Segmentation of Traffic Conditions

    Preparing high-quality datasets for urban scene understanding is a labor-intensive task, especially for datasets designed for autonomous driving applications. Using the coarse ground truth (GT) annotations of these datasets without detriment to the accuracy of semantic image segmentation (measured by the mean intersection over union, mIoU) could simplify and speed up dataset preparation and model fine-tuning before practical application. Here, the results of a comparative analysis of the semantic segmentation accuracy obtained by the PSPNet deep learning architecture are presented for finely and coarsely annotated images from the Cityscapes dataset. Two scenarios were investigated: scenario 1, the fine GT images for training and prediction, and scenario 2, the fine GT images for training and the coarse GT images for prediction. The results demonstrate that, for the most important classes, the mean accuracy values of semantic image segmentation are higher for the coarse GT annotations than for the fine GT ones, while the opposite holds for the standard deviation values. This means that for some applications some unimportant classes can be excluded, and the model can be tuned further for some classes and specific regions on the coarse GT dataset without loss of accuracy. Moreover, this opens up the prospect of using deep neural networks to prepare such coarse GT datasets. Comment: 10 pages, 6 figures, 2 tables, The Second International Conference on Computer Science, Engineering and Education Applications (ICCSEEA2019) 26-27 January 2019, Kiev, Ukrain

    ImageNet Large Scale Visual Recognition Challenge

    The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements. Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional reference