Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding
In this paper, we tackle the weakly supervised referring expression grounding
task, which localizes a referent object in an image according to a query
sentence when the mapping between image regions and queries is not available
during training. Traditional methods pick out the object region that best
matches the referring expression and then reconstruct the query sentence from
the selected region, using the reconstruction difference as the loss for
back-propagation. Existing methods, however, conduct both the matching and the
reconstruction only approximately, as they ignore the fact that the matching
correctness is unknown. To overcome this limitation, we design a discriminative
triad as the basis of our solution, through which a query can be converted into
one or multiple discriminative triads in a highly scalable way. Building on the
discriminative triad, we further propose triad-level matching and
reconstruction modules that are lightweight yet effective for weakly supervised
training, making our method three times lighter and faster than previous
state-of-the-art methods. One important merit of our work is its superior
performance despite the simple and neat design. Specifically, the proposed
method achieves new state-of-the-art accuracy on the RefCOCO (39.21%),
RefCOCO+ (39.18%), and RefCOCOg (43.24%) datasets, which is 4.17%, 4.08%, and
7.8% higher than the previous best, respectively.
Comment: TPAM
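As background, the generic match-then-reconstruct training signal that this line of work builds on can be sketched in a few lines of numpy. Everything below is an illustrative stand-in (the feature dimensions, the cosine scoring, and the linear decoder are invented for the sketch), not the paper's triad-level modules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 5 candidate region features and one query embedding,
# both 8-dimensional (shapes are illustrative only).
regions = rng.standard_normal((5, 8))
query = rng.standard_normal(8)

# Matching: score each region against the query (cosine similarity here),
# softened with a softmax so the selection stays differentiable.
scores = regions @ query / (np.linalg.norm(regions, axis=1) * np.linalg.norm(query))
weights = np.exp(scores) / np.exp(scores).sum()

# Reconstruction: rebuild the query from the attended region feature via a
# hypothetical linear decoder; the reconstruction error is the training loss,
# so no region-query ground-truth mapping is ever needed.
decoder = rng.standard_normal((8, 8)) * 0.1
attended = weights @ regions
reconstruction = attended @ decoder
loss = float(np.mean((reconstruction - query) ** 2))
print(loss)
```

The point of the sketch is the weak-supervision loop itself: the only learning signal is how well the selected region can regenerate the query.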
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
The superior performance of modern deep networks usually comes with a costly
training procedure. This paper presents a new curriculum learning approach for
the efficient training of visual backbones (e.g., vision Transformers). Our
work is inspired by the inherent learning dynamics of deep networks: we
experimentally show that at an earlier training stage, the model mainly learns
to recognize some 'easier-to-learn' discriminative patterns within each
example, e.g., the lower-frequency components of images and the original
information before data augmentation. Driven by this phenomenon, we propose a
curriculum in which the model always leverages all the training data at each
epoch, but starts by exposing only the 'easier-to-learn' patterns of each
example and gradually introduces more difficult patterns. To
implement this idea, we 1) introduce a cropping operation in the Fourier
spectrum of the inputs, which enables the model to learn from only the
lower-frequency components efficiently, 2) demonstrate that exposing the
features of original images amounts to adopting weaker data augmentation, and
3) integrate 1) and 2) and design a curriculum learning schedule with a
greedy-search algorithm. The resulting approach, EfficientTrain, is simple,
general, yet surprisingly effective. As an off-the-shelf method, it reduces the
wall-time training cost of a wide variety of popular models (e.g., ResNet,
ConvNeXt, DeiT, PVT, Swin, and CSWin) by >1.5x on ImageNet-1K/22K without
sacrificing accuracy. It is also effective for self-supervised learning (e.g.,
MAE). Code is available at https://github.com/LeapLabTHU/EfficientTrain.
Comment: ICCV 202
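The low-frequency cropping of step 1) can be sketched as a plain FFT round-trip: crop the centered spectrum and invert, yielding a smaller image that keeps only low-frequency content. The centered crop and the rescaling constant below are our assumptions for the sketch, not the paper's exact implementation:

```python
import numpy as np

def low_freq_crop(image: np.ndarray, bandwidth: int) -> np.ndarray:
    """Crop the centered 2-D Fourier spectrum to bandwidth x bandwidth and
    invert, returning a smaller image with only low-frequency components."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = spectrum.shape
    top = (h - bandwidth) // 2
    left = (w - bandwidth) // 2
    cropped = spectrum[top:top + bandwidth, left:left + bandwidth]
    # Rescale so pixel intensities keep a comparable range after downsizing.
    cropped = cropped * (bandwidth * bandwidth) / (h * w)
    return np.real(np.fft.ifft2(np.fft.ifftshift(cropped)))

# Example: a 32x32 image reduced to its 16x16 low-frequency core; feeding
# such smaller inputs early in training is what saves wall-time.
img = np.random.default_rng(0).standard_normal((32, 32))
small = low_freq_crop(img, 16)
print(small.shape)  # (16, 16)
```

Because the output is genuinely smaller, early epochs process fewer pixels per example, which is where the reported >1.5x wall-time saving comes from.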
Understanding of Visual Domains via the Lens of Natural Language
A joint understanding of vision and language can enable intelligent systems to perceive, act, and communicate with humans for a wide range of applications. For example, they can assist a human to navigate in an environment, edit the content of an image through natural language commands, or search through image collections using natural language queries. In this thesis, we aim to improve our understanding of visual domains through the lens of natural language. We specifically look into (1) images of categories within a fine-grained taxonomy such as species of birds or variants of aircraft, (2) images of textures that describe local color, shape, and patterns, and (3) regions in images that correspond to objects, materials, and textures.
In one line of work, we investigate ways to discover a domain-specific language by asking annotators to describe visual differences between instances within a fine-grained taxonomy. We show that a system trained to describe these differences leads to an accurate and interpretable basis for categorization. In another line of work, we investigate the effectiveness of language and vision models for describing textures, a problem that, despite the ubiquity of textures, has not been sufficiently studied in the literature. Textures are diverse, yet their local nature allows them to describe the appearance of a wide range of visual categories. This locality also allows us to systematically generate synthetic variations to investigate how disentangled visual representations are with respect to properties such as shape, color, and figure-ground segmentation. Finally, instead of modeling an image as a whole, we design a system that allows descriptions of regions within an image. A challenge is to handle the long-tail distribution of names and appearances of concepts within natural scenes. We design a modular framework that integrates object detection, semantic segmentation, and contextual reasoning with language, leading to better performance. In addition to methods and analysis, we contribute datasets and benchmarks to evaluate the performance of models in each of these domains.
The availability of large-scale pre-trained models for vision (e.g., ResNet) and language (e.g., BERT) has catalyzed improvements and novel applications in computer vision and natural language processing, but until recently similar models that could jointly reason about language and vision were not available. This has changed with the availability of models such as CLIP, which have been trained on a massive number of images with associated texts. We therefore analyze the effectiveness of CLIP-based representations for the tasks posed in our earlier work. By comparing and contrasting these with the domain-specific representations presented in the earlier chapters, we shed some light on the nature of the learned representations and the biases they encode.
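As a concrete reference point for the CLIP-based analysis above, CLIP-style zero-shot scoring reduces to cosine similarity between L2-normalized image and text embeddings, followed by a temperature-scaled softmax. The toy 4-dimensional embeddings and texture-prompt names below are invented for illustration:

```python
import numpy as np

def zero_shot_probs(image_emb: np.ndarray, text_embs: np.ndarray,
                    temperature: float = 100.0) -> np.ndarray:
    """Score one image embedding against K text-prompt embeddings the way
    CLIP-style models do: cosine similarity, scaled, then softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)
    exp = np.exp(logits - logits.max())  # subtract max for stability
    return exp / exp.sum()

# Toy embeddings for three hypothetical texture prompts.
texts = np.array([[1.0, 0.0, 0.0, 0.0],   # "striped"
                  [0.0, 1.0, 0.0, 0.0],   # "dotted"
                  [0.0, 0.0, 1.0, 0.0]])  # "woven"
image = np.array([0.9, 0.1, 0.0, 0.0])    # closest to "striped"
probs = zero_shot_probs(image, texts)
print(int(np.argmax(probs)))  # 0
```

With real CLIP embeddings, the same scoring applies unchanged; the analysis then asks how such generic representations compare with the domain-specific ones trained in the earlier chapters.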
A New Deep Architecture for Image Segmentation in a Photolithography Inspection System
Ph.D. dissertation, Graduate School of Convergence Science and Technology (Intelligent Convergence Systems major), Seoul National University, August 2021.
In semiconductor manufacturing, defect detection is critical for maintaining high yield. Typically, defects on a semiconductor wafer are generated during the manufacturing process. Most computer vision systems used in semiconductor photolithography process inspection still rely on image processing algorithms, which often produce inspection faults due to their sensitivity to changes in the external environment. We therefore aim to tackle this problem by combining the advantages of image processing algorithms and deep learning.
In this dissertation, we propose the Image Segmentation Detector (ISD) to extract enhanced feature maps in situations where the training dataset is limited to a specific industrial domain, such as semiconductor photolithography inspection. ISD serves as a novel backbone network for the state-of-the-art Mask R-CNN image segmentation framework. ISD consists of four dense blocks and four transition layers. In particular, each dense block in ISD has shortcut connections and concatenates the feature maps produced in each layer, with a dynamic growth rate for greater compactness. ISD is trained from scratch, without the recently popular transfer learning approach. Additionally, ISD is trained on an image dataset pre-processed with our custom-designed image filter so that the Convolutional Neural Network (CNN) extracts better enhanced feature maps. One of ISD's key design principles, compactness, plays a critical role in meeting real-time requirements and in deployment on resource-bounded devices.
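The channel bookkeeping implied by the dense blocks above (each layer's output concatenated onto the running feature stack) can be sketched in a few lines; the layer counts and growth rates below are invented for illustration, and the "dynamic growth rate" is modeled simply as a per-layer rate:

```python
# Sketch of DenseNet-style channel growth inside one dense block, which the
# ISD dense blocks appear to follow: every layer's output is concatenated
# with all preceding feature maps.
def dense_block_channels(in_channels: int, growth_rates: list[int]) -> list[int]:
    """Return the channel count seen at the input of each successive layer
    when every layer's output is concatenated onto the running stack."""
    seen = [in_channels]
    for k in growth_rates:
        seen.append(seen[-1] + k)
    return seen

# A fixed growth rate (as in vanilla DenseNet) vs. a shrinking, per-layer
# ("dynamic") rate, which keeps the final channel count smaller.
print(dense_block_channels(16, [12, 12, 12, 12]))  # [16, 28, 40, 52, 64]
print(dense_block_channels(16, [12, 8, 6, 4]))     # [16, 28, 36, 42, 46]
```

The shrinking-rate variant illustrates how a dynamic growth rate can trade a modest amount of width for the compactness the abstract emphasizes.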
To empirically validate the model, this dissertation uses real images obtained from the computer vision system embedded in currently operating semiconductor manufacturing equipment. ISD consistently achieves better results than state-of-the-art methods on standard mean average precision, the most common metric for measuring instance detection accuracy. Notably, ISD outperforms the baseline DenseNet while requiring only 1/4 of its parameters. We also observe that ISD achieves comparable or better performance than ResNet with only 1/268 of the parameters, using no extra data or pre-trained models. Our experimental results show that ISD can be useful to many future image segmentation research efforts across the diverse fields of the semiconductor industry, which require real-time, high-quality performance from only a limited training dataset.
Chapter 1. Introduction
1.1. Background and Motivation
Chapter 2. Related Work
2.1. Inspection Method
2.2. Instance Segmentation
2.3. Backbone Structure
2.4. Enhanced Feature Map
2.5. Detection Performance Evaluation
2.6. Learning Network Model from Scratch
Chapter 3. Proposed Method
3.1. ISD Architecture
3.2. Pre-processing
3.3. Model Training
3.4. Training Objective
3.5. Settings and Configurations
Chapter 4. Experimental Evaluation
4.1. Classification Results on ISD
4.2. Comparison with Pre-processing
4.3. Image Segmentation Results on ISD
4.3.1. Results on Suck-back State
4.3.2. Results on Dispensing State
4.4. Comparison with State-of-the-art Methods
Chapter 5. Conclusion
Bibliography
Abstract (in Korean)
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
The existence of representative datasets is a prerequisite for many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches, and eventually to increasing the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions, even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories of integration, extraction, and conformity. Special attention is given to applications in the field of autonomous driving.
Learning with delayed reinforcement in an exploratory probabilistic logic neural network