8 research outputs found

    Weakly supervised underwater fish segmentation using affinity LCFCN

    Estimating fish body measurements such as length, width, and mass has received considerable research attention due to its potential to boost productivity in marine and aquaculture applications. Some methods rely on manual collection of these measurements using tools like a ruler, which is time-consuming and labour-intensive. Others rely on fully supervised segmentation models to acquire these measurements automatically, but these require per-pixel labels that are also time-consuming to collect: it can take up to 2 minutes per fish to acquire accurate segmentation labels. To address this problem, we propose a segmentation model that can efficiently train on images labeled with point-level supervision, where each fish is annotated with a single click. This labeling scheme takes an average of only 1 second per fish. Our model uses a fully convolutional neural network with one branch that outputs per-pixel scores and another that outputs an affinity matrix. These two outputs are aggregated using a random walk to get the final, refined per-pixel output. The whole model is trained end-to-end using the localization-based counting fully convolutional neural network (LCFCN) loss, and thus we call our method Affinity-LCFCN (A-LCFCN). We conduct experiments on the DeepFish dataset, which contains several fish habitats from north-eastern Australia. The results show that A-LCFCN outperforms a fully supervised segmentation model when the annotation budget is fixed. They also show that A-LCFCN achieves better segmentation results than LCFCN and a standard baseline.
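The random-walk aggregation step described above can be sketched in a few lines. This is an illustrative simplification with hypothetical names (`random_walk_refine`, a dense pairwise affinity matrix), not the authors' implementation, which predicts affinities with a dedicated learned branch:

```python
import numpy as np

def random_walk_refine(scores, affinity, n_steps=3):
    """Refine per-pixel class scores by propagating them along a
    row-normalized affinity (transition) matrix, as in a random walk.

    scores:   (N, C) array, one score vector per pixel (N = H*W).
    affinity: (N, N) non-negative pairwise affinities.
    """
    # Row-normalize affinities into a stochastic transition matrix.
    trans = affinity / affinity.sum(axis=1, keepdims=True)
    refined = scores
    for _ in range(n_steps):
        refined = trans @ refined  # each pixel mixes in its neighbours' scores
    return refined
```

With an identity affinity the scores are left untouched; with uniform affinities every pixel converges to the mean score, so the learned affinities control how far evidence from the clicked points spreads.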

    Dilation-Erosion for Single-Frame Supervised Temporal Action Localization

    To balance annotation labor against the granularity of supervision, single-frame annotation has been introduced in temporal action localization. It provides a rough temporal location for an action but implicitly overstates the supervision from the annotated frame during training, leading to confusion between actions and backgrounds, i.e., action incompleteness and background false positives. To tackle these two challenges, we present the Snippet Classification model and the Dilation-Erosion module. The Dilation-Erosion module first expands the potential action segments with a loose criterion to alleviate the problem of action incompleteness, and then removes background from the potential action segments to alleviate the problem of background false positives. Relying on the single-frame annotation and the output of the snippet classification, the Dilation-Erosion module mines pseudo snippet-level ground truth, hard backgrounds, and evident backgrounds, which in turn further train the Snippet Classification model, forming a cyclic dependency. Furthermore, we propose a new embedding loss to aggregate the features of action instances with the same label and separate the features of actions from backgrounds. Experiments on THUMOS14 and ActivityNet 1.2 validate the effectiveness of the proposed method. Code is publicly available at https://github.com/LingJun123/single-frame-TAL. Comment: 28 pages, 8 figures.
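The dilation and erosion steps can be illustrated on a 1-D binary snippet mask (obtained, for example, by thresholding snippet classification scores). The functions below are a hypothetical sketch of the morphological idea, not the paper's module:

```python
import numpy as np

def dilate(mask, k=1):
    """Binary 1-D dilation: a snippet becomes action if any snippet within
    distance k is action (loose criterion, fights action incompleteness)."""
    out = mask.copy()
    for s in range(1, k + 1):
        out[:-s] |= mask[s:]
        out[s:] |= mask[:-s]
    return out

def erode(mask, k=1):
    """Binary 1-D erosion: a snippet stays action only if all snippets within
    distance k are action (strict criterion, trims background false positives)."""
    out = mask.copy()
    for s in range(1, k + 1):
        out[:-s] &= mask[s:]
        out[s:] &= mask[:-s]
    return out
```

Dilation followed by erosion (a morphological closing) fills small gaps inside an action while shaving off isolated background responses at its borders.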

    Learning Instance Segmentation from Sparse Supervision

    Instance segmentation is an important task in many domains of automatic image processing, such as self-driving cars, robotics, and microscopy data analysis. Recently, deep learning-based algorithms have brought image segmentation close to human performance. However, most existing models rely on dense ground-truth labels for training, which are expensive, time-consuming, and often require experienced annotators. Besides the annotation burden, training complex high-capacity neural networks depends on non-trivial expertise in the choice and tuning of hyperparameters, making the adoption of these models challenging for researchers in other fields. The aim of this work is twofold. The first is to make deep learning segmentation methods accessible to non-specialists. The second is to address the dense annotation problem by developing instance segmentation methods trainable with limited ground-truth data. In the first part of this thesis, I bring state-of-the-art instance segmentation methods closer to non-experts by developing PlantSeg: a pipeline for volumetric segmentation of light microscopy images of biological tissues into cells. PlantSeg comes with a large repository of pre-trained models and delivers highly accurate results on a variety of samples and image modalities. We exemplify its usefulness for answering biological questions in several collaborative research projects. In the second part, I tackle the dense annotation bottleneck by introducing SPOCO, an instance segmentation method that can be trained from just a few annotated objects. It demonstrates strong segmentation performance on challenging natural and biological benchmark datasets at a greatly reduced manual annotation cost and delivers state-of-the-art results on the CVPPP benchmark.
In summary, my contributions enable training of instance segmentation models with limited amounts of labeled data and make these methods more accessible to non-experts, speeding up the process of quantitative data analysis.
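An embedding-based objective of the kind such sparse-supervision methods build on can be sketched as a pull-push loss: pixel embeddings are pulled toward their instance mean, and different instance means are pushed apart. The function name and margin values below are illustrative assumptions, not SPOCO's exact formulation:

```python
import numpy as np

def pull_push_loss(emb, labels, delta_pull=0.1, delta_push=1.0):
    """Discriminative embedding loss sketch.

    emb:    (N, D) pixel embeddings.
    labels: (N,) instance ids; 0 marks unlabeled pixels, which are ignored,
            so only the few annotated objects contribute to the loss.
    """
    ids = [i for i in np.unique(labels) if i != 0]
    means = {i: emb[labels == i].mean(axis=0) for i in ids}
    # Pull term: hinged distance of each embedding to its own instance mean.
    pull = 0.0
    for i in ids:
        d = np.linalg.norm(emb[labels == i] - means[i], axis=1)
        pull += np.mean(np.maximum(0.0, d - delta_pull) ** 2)
    pull /= max(len(ids), 1)
    # Push term: hinged distance between every pair of instance means.
    push, n_pairs = 0.0, 0
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            d = np.linalg.norm(means[ids[a]] - means[ids[b]])
            push += np.maximum(0.0, delta_push - d) ** 2
            n_pairs += 1
    if n_pairs:
        push /= n_pairs
    return pull + push
```

Tight, well-separated instance clusters incur zero loss; overlapping instances are penalized by the push term.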

    ๊ฐ์ฒด ์ธ์‹์˜ ๋ ˆ์ด๋ธ” ํšจ์œจ์  ํ•™์Šต

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2023. Advisor: Sungroh Yoon.
Advances in deep neural network approaches have produced tremendous progress in object recognition tasks, but this progress has come at the cost of annotating a huge amount of training images with explicit localization cues. Using object recognition in real-life applications requires a large variety of object classes and a great deal of labeled data for each class. However, labeling pixel-level annotations of each object class is laborious and hampers the expansion of object classes. The need for such expensive annotations is sidestepped by weakly supervised learning, in which a DNN is trained on images with some form of abbreviated annotation that is cheaper than explicit localization cues. In this dissertation, we study methods of using various forms of weak supervision, i.e., image-level class labels, out-of-distribution data, and bounding box labels. We first study image-level class labels for weakly supervised semantic segmentation.
Most of the weakly supervised methods on image-level class labels depend on attribution maps from a trained classifier, but their focus tends to be restricted to a small discriminative region of the target object. We theoretically discuss the root cause of this problem and propose three novel techniques to address it. However, built on class labels only, the produced localization maps are known to suffer from confusion between foreground and background cues, i.e., spurious correlation. We address the spurious correlation problem by utilizing out-of-distribution data. Finally, methods based on class labels cannot separate different instances of the same class, which is essential for instance segmentation. Therefore, we utilize bounding box labels for weakly supervised instance segmentation, as boxes provide information about individual objects and their locations. Experimental results show that the annotation cost for learning semantic segmentation and instance segmentation can be significantly reduced: on the challenging Pascal VOC dataset, we have achieved 89% of the performance of the fully supervised equivalent by using only class labels, which reduces the label cost by 91%. In addition, we have achieved 96% of the performance of the fully supervised equivalent by using bounding box labels, which reduces the label cost by 83%.
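The attribution maps mentioned above are typically CAM-style: the final convolutional feature maps are weighted by the classifier weights of the target class, so the map highlights the regions that drove the class score. A minimal sketch with a hypothetical `class_activation_map` helper (CAM in general, not the dissertation's proposed techniques):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """CAM-style attribution for a global-average-pool classifier.

    features:   (K, H, W) feature maps from the last conv layer.
    fc_weights: (C, K) classifier weights, one row per class.
    Returns an (H, W) map normalized to [0, 1].
    """
    # Weighted sum of the K feature maps by the target class's weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)  # keep only positive (supporting) evidence
    if cam.max() > 0:
        cam /= cam.max()        # normalize to [0, 1]
    return cam
```

Thresholding such a map gives the rough localization seed that weakly supervised segmentation methods then refine, and its bias toward small discriminative regions is exactly the failure mode discussed above.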
We expect that the methods introduced in this dissertation will be helpful for applying deep learning-based object recognition in a variety of domains and scenarios.
Table of Contents:
1 Introduction
2 Background
  2.1 Object Recognition
  2.2 Weak Supervision
  2.3 Preliminary Algorithms
    2.3.1 Attribution Methods for Image Classifier
    2.3.2 Refinement Techniques of Localization Maps
3 Learning with Image-Level Class Labels
  3.1 Introduction
  3.2 Related Work
    3.2.1 FickleNet: Stochastic Inference Approach
    3.2.2 Other Recent Approaches
  3.3 Anti-Adversarially Manipulated Attribution
    3.3.1 Adversarial Attack
    3.3.2 Proposed Method
    3.3.3 Experiments
    3.3.4 Discussion
    3.3.5 Analysis of Results by Class
  3.4 Reducing Information Bottleneck
    3.4.1 Information Bottleneck
    3.4.2 Motivation
    3.4.3 Proposed Method
    3.4.4 Experiments
  3.5 Summary
4 Learning with Auxiliary Data
  4.1 Introduction
  4.2 Related Work
  4.3 Methods
    4.3.1 Collecting the Hard Out-of-Distribution Data
    4.3.2 Learning with the Hard Out-of-Distribution Data
    4.3.3 Training Segmentation Networks
  4.4 Experiments
    4.4.1 Experimental Setup
    4.4.2 Experimental Results
    4.4.3 Analysis and Discussion
  4.5 Analysis of OoD Collection Process
  4.6 Integrating Proposed Methods
  4.7 Summary
5 Learning with Bounding Box Labels
  5.1 Introduction
  5.2 Related Work
  5.3 Methods
    5.3.1 Revisiting Object Detectors
    5.3.2 Bounding Box Attribution Map
    5.3.3 Training the Segmentation Network
  5.4 Experiments
    5.4.1 Experimental Setup
    5.4.2 Weakly Supervised Instance Segmentation
    5.4.3 Weakly Supervised Semantic Segmentation
    5.4.4 Ablation Study
  5.5 Detailed Analysis of the BBAM
  5.6 Summary
6 Conclusion
  6.1 Dissertation Summary
  6.2 Limitations and Future Direction
Abstract (In Korean)

    River Ice Segmentation under a Limited Compute and Annotation Budget

    River ice segmentation, used to differentiate ice and water, can give valuable information regarding ice cover and ice distribution. These are important factors when evaluating flooding risks caused by ice jams that may harm local ecosystems and infrastructure. Furthermore, discriminating specifically between anchor ice and frazil ice is important in understanding sediment transport and release events that can affect geomorphology and cause landslide risks. Modern deep learning techniques have proven to deliver promising segmentation results; however, they can require hours of expensive manual image labelling, can show poor generalization ability, and can be inefficient when hardware and computing power are limited. As river ice images are often collected in remote locations by unmanned aerial vehicles with limited computation power, we explore the performance-latency trade-offs for river ice segmentation. We propose a novel convolution block inspired by both depthwise separable convolutions and local binary convolutions, giving additional efficiency, parameter savings, and generalization ability to river ice segmentation networks. Our novel convolution block is used in a shallow architecture that has 99.9% fewer trainable parameters, 99% fewer multiply-add operations, and 69.8% less memory usage than a UNet, while achieving virtually the same segmentation performance. We find that this network trains fast and is able to achieve high segmentation performance early in training due to an emphasis on both pixel intensity and texture. When compared to very efficient segmentation networks such as LR-ASPP with a MobileNetV3 backbone, we achieve good performance (mIoU of 64) 91% faster during training on a CPU and an overall mIoU that is 7.7% higher. We also find that our novel convolution block is able to generalize better to new domains such as snowy environments or datasets with varying illumination.
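The idea of combining depthwise separable convolutions with local binary convolutions can be sketched as a block whose depthwise filters are fixed {-1, 0, +1} kernels (contributing no trainable parameters), followed by a learned 1x1 pointwise mix. The function names and the exact block layout below are illustrative assumptions, not the paper's block:

```python
import numpy as np

def make_binary_kernels(channels, k=3, sparsity=0.5, seed=0):
    """Fixed, non-trainable {-1, 0, +1} depthwise kernels, as in local
    binary convolutions; `sparsity` is the fraction of zero weights."""
    rng = np.random.default_rng(seed)
    return rng.choice(
        [-1.0, 0.0, 1.0],
        size=(channels, k, k),
        p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2],
    )

def dw_lbc_block(x, kernels, pointwise):
    """Depthwise pass with fixed binary kernels, ReLU, then a learned 1x1
    pointwise mix (the only trainable part of this sketch).

    x: (C, H, W); kernels: (C, k, k); pointwise: (C_out, C).
    """
    c, h, w = x.shape
    k = kernels.shape[1]
    out_h, out_w = h - k + 1, w - k + 1
    dw = np.empty((c, out_h, out_w))
    for ch in range(c):  # depthwise: one fixed kernel per input channel
        for i in range(out_h):
            for j in range(out_w):
                dw[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * kernels[ch])
    dw = np.maximum(dw, 0.0)  # ReLU
    return np.tensordot(pointwise, dw, axes=1)  # (C_out, out_h, out_w)
```

Because the depthwise kernels are fixed, only the `(C_out, C)` pointwise weights need to be learned, which is where the large parameter savings over a standard convolution come from.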
Diving deeper into river ice segmentation with resource constraints, we take on a separate task of training a segmentation model when labelling time is limited. As the ice type, environment, and image quality can vary drastically between rivers of interest, training new segmentation models for new environments can be infeasible due to the laborious task of pixel-wise annotation. We explore a point labelling method leveraging object proposals and a post-processing technique that delivers a 14.6% increase in mIoU compared to a fully supervised UNet with the same labelling budget. Our point labelling method also achieves an mIoU that is only 6.3% lower than that of a fully supervised model with an annotation budget 23x larger.