181 research outputs found

    Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

    In this paper, we tackle the weakly-supervised referring expression grounding task: localizing a referent object in an image according to a query sentence, where the mapping between image regions and queries is not available during training. In traditional methods, the object region that best matches the referring expression is picked out, and the query sentence is then reconstructed from the selected region, with the reconstruction difference serving as the loss for back-propagation. The existing methods, however, conduct both the matching and the reconstruction only approximately, because they ignore the fact that the matching correctness is unknown. To overcome this limitation, a discriminative triad is designed here as the basis of the solution, through which a query can be converted into one or multiple discriminative triads in a very scalable way. Based on the discriminative triad, we further propose triad-level matching and reconstruction modules that are lightweight yet effective for weakly-supervised training, making the method three times lighter and faster than the previous state-of-the-art methods. One important merit of our work is its superior performance despite the simple and neat design. Specifically, the proposed method achieves a new state-of-the-art accuracy on the RefCOCO (39.21%), RefCOCO+ (39.18%) and RefCOCOg (43.24%) datasets, which is 4.17%, 4.08% and 7.8% higher than the previous best, respectively. Comment: TPAM

    EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones

    The superior performance of modern deep networks usually comes with a costly training procedure. This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers). Our work is inspired by the inherent learning dynamics of deep networks: we experimentally show that at an earlier training stage, the model mainly learns to recognize some 'easier-to-learn' discriminative patterns within each example, e.g., the lower-frequency components of images and the original information before data augmentation. Driven by this phenomenon, we propose a curriculum in which the model always leverages all the training data at each epoch, but starts by exposing only the 'easier-to-learn' patterns of each example and gradually introduces more difficult patterns. To implement this idea, we 1) introduce a cropping operation in the Fourier spectrum of the inputs, which enables the model to learn efficiently from only the lower-frequency components, 2) demonstrate that exposing the features of original images amounts to adopting weaker data augmentation, and 3) integrate 1) and 2) and design a curriculum learning schedule with a greedy-search algorithm. The resulting approach, EfficientTrain, is simple, general, yet surprisingly effective. As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models (e.g., ResNet, ConvNeXt, DeiT, PVT, Swin, and CSWin) by >1.5x on ImageNet-1K/22K without sacrificing accuracy. It is also effective for self-supervised learning (e.g., MAE). Code is available at https://github.com/LeapLabTHU/EfficientTrain. Comment: ICCV 202
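    The low-frequency cropping in step 1) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that); the function name `low_frequency_crop`, the single-channel input, and the mean-preserving rescaling are assumptions made here for illustration only.

```python
import numpy as np


def low_frequency_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Keep only the central size x size block of the 2-D spectrum.

    `image` is an (H, W) grayscale array. The result is a (size, size)
    image that contains only the lower-frequency components.
    Hypothetical helper, not taken from the EfficientTrain code base.
    """
    h, w = image.shape
    if not (0 < size <= min(h, w)):
        raise ValueError("size must be in 1..min(H, W)")
    # Move the zero-frequency term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    top = h // 2 - size // 2
    left = w // 2 - size // 2
    cropped = spectrum[top:top + size, left:left + size]
    # Undo the shift and invert; rescale so the mean intensity is preserved.
    small = np.fft.ifft2(np.fft.ifftshift(cropped))
    return np.real(small) * (size * size) / (h * w)


if __name__ == "__main__":
    flat = np.full((8, 8), 3.0)  # a constant image has only a DC component
    print(low_frequency_crop(flat, 4).shape)
```

    Because the crop shrinks the spatial size of the input as well as removing high frequencies, a model trained on the cropped images processes fewer pixels per example, which is where the efficiency gain in such a scheme would come from.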

    A New Deep Architecture for Image Segmentation in Photolithography Inspection Systems

    Thesis (Ph.D.) -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies (Intelligent Convergence Systems major), 2021.8. 홍성수.
    In semiconductor manufacturing, defect detection is critical to maintaining high yield. Defects on a semiconductor wafer typically arise during the manufacturing process. Most computer vision systems used for inspecting the semiconductor photolithography process still rely on image processing algorithms, which often produce inspection faults because they are sensitive to changes in the external environment. We therefore tackle this problem by combining the advantages of image processing algorithms and deep learning. In this dissertation, we propose the Image Segmentation Detector (ISD) to extract enhanced feature maps in situations where the training dataset is limited, as in a specific industry domain such as semiconductor photolithography inspection. ISD is used as a novel backbone network of the state-of-the-art Mask R-CNN framework for image segmentation. ISD consists of four dense blocks and four transition layers. In particular, each dense block in ISD has a shortcut connection and concatenates the feature maps produced by its layers with a dynamic growth rate for greater compactness. ISD is trained from scratch, without the transfer learning methods in common recent use. Additionally, ISD is trained on an image dataset pre-processed with our designed image filter so that the Convolutional Neural Network (CNN) extracts a better-enhanced feature map. One of the key design principles of ISD is compactness, which plays a critical role in addressing real-time problems and in deployment on resource-bounded devices. To demonstrate the model empirically, this dissertation uses existing images obtained from the computer vision system embedded in semiconductor manufacturing equipment currently in operation. ISD achieves consistently better results than state-of-the-art methods on standard mean average precision, the most common metric for measuring the accuracy of instance detection. Notably, ISD outperforms the baseline DenseNet while requiring only 1/4 of the parameters. We also observe that ISD achieves comparable or better performance than ResNet with only 1/268 of the parameters, using no extra data or pre-trained models. Our experimental results show that ISD can be useful to future image segmentation research in diverse fields of the semiconductor industry that require real-time operation and good performance with only a limited training dataset.
    Contents: Chapter 1. Introduction; 1.1. Background and Motivation; Chapter 2. Related Work; 2.1. Inspection Method; 2.2. Instance Segmentation; 2.3. Backbone Structure; 2.4. Enhanced Feature Map; 2.5. Detection Performance Evaluation; 2.6. Learning Network Model from Scratch; Chapter 3. Proposed Method; 3.1. ISD Architecture; 3.2. Pre-processing; 3.3. Model Training; 3.4. Training Objective; 3.5. Setting and Configurations; Chapter 4. Experimental Evaluation; 4.1. Classification Results on ISD; 4.2. Comparison with Pre-processing; 4.3. Image Segmentation Results on ISD; 4.3.1. Results on Suck-back State; 4.3.2. Results on Dispensing State; 4.4. Comparison with State-of-the-art Methods; Chapter 5. Conclusion; Bibliography; Abstract (in Korean)
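    To make the compactness argument concrete, the sketch below counts feature-map channels through a DenseNet-style dense block and a transition layer. The abstract does not define "dynamic growth rate", so modelling it as a per-layer list of growth rates is an assumption, as are the function names, the example numbers, and the 0.5 compression factor; none of these are taken from the thesis.

```python
from typing import List


def dense_block_channels(in_channels: int, growth_rates: List[int]) -> List[int]:
    """Channel count after each layer of a dense block.

    Every layer adds `k` new feature maps, which are concatenated with
    all earlier ones. A per-layer list of rates is a hypothetical reading
    of the "dynamic growth rate" mentioned in the abstract.
    """
    counts = []
    channels = in_channels
    for k in growth_rates:
        channels += k  # concatenation grows the channel dimension by k
        counts.append(channels)
    return counts


def transition_channels(channels: int, compression: float = 0.5) -> int:
    """Channels left after a transition layer with the given compression."""
    return int(channels * compression)


if __name__ == "__main__":
    fixed = dense_block_channels(64, [32, 32, 32, 32])
    shrinking = dense_block_channels(64, [32, 24, 16, 8])
    print(fixed[-1], shrinking[-1], transition_channels(shrinking[-1]))
```

    Under these assumed numbers, the block with decreasing growth rates ends with fewer channels than the fixed-rate block, which illustrates how a varying growth rate could reduce parameter count in the following layers.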

    Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

    The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches, and eventually to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving.

    Learning with delayed reinforcement in an exploratory probabilistic logic neural network

    Imperial Users onl