284 research outputs found

    Incremental Learning Techniques for Part-Based Semantic Segmentation

    In this work, we use an incremental learning approach to develop a model for part-based semantic segmentation. Our framework uses two networks, where the second is fed with the output of the first. The first network performs semantic segmentation with the 21 classes of Pascal-VOC, while the second performs part-based semantic segmentation with 108 classes/parts on the Pascal-Part dataset.
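    The abstract states only that the second network receives the first network's output. A minimal sketch of that data flow follows, assuming (hypothetically) that the part network sees each pixel's RGB values concatenated with a one-hot encoding of the predicted object class. `cascade`, `toy_object_net`, and `toy_part_net` are illustrative stand-ins, not the authors' models.

```python
from typing import Callable, List, Sequence

NUM_OBJECT_CLASSES = 21  # Pascal-VOC classes, as stated in the abstract
NUM_PART_CLASSES = 108   # Pascal-Part classes/parts; a real part network would predict these (unused by the toys)


def one_hot(label: int, n: int) -> List[float]:
    """Encode an integer class label as a length-n one-hot vector."""
    vec = [0.0] * n
    vec[label] = 1.0
    return vec


def cascade(image: List[List[Sequence[float]]],
            object_net: Callable,
            part_net: Callable):
    """Run the object network, then feed its per-pixel prediction to the part network.

    Assumption (hypothetical): the part network's input at each pixel is the
    RGB value concatenated with a one-hot encoding of the predicted object class.
    """
    object_map = object_net(image)  # H x W grid of ints in [0, NUM_OBJECT_CLASSES)
    features = [
        [list(px) + one_hot(obj, NUM_OBJECT_CLASSES) for px, obj in zip(img_row, obj_row)]
        for img_row, obj_row in zip(image, object_map)
    ]
    return part_net(features)


def toy_object_net(image):
    # Hypothetical stand-in: bright pixels -> class 15, everything else -> background (0).
    return [[15 if sum(px) > 384 else 0 for px in row] for row in image]


def toy_part_net(features):
    # Hypothetical stand-in: report how many input channels each pixel received.
    return [[len(f) for f in row] for row in features]


if __name__ == "__main__":
    image = [[(255, 255, 255), (0, 0, 0)]]
    print(cascade(image, toy_object_net, toy_part_net))  # [[24, 24]]
```

With this wiring, the part network receives 3 + 21 = 24 channels per pixel, so it can condition its part labels on which object the pixel was assigned to.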

    Deep Adversarial Attention Alignment for Unsupervised Domain Adaptation: The Benefit of Target Expectation Maximization

    © 2018, Springer Nature Switzerland AG. In this paper, we make two contributions to unsupervised domain adaptation (UDA) using the convolutional neural network (CNN). First, our approach transfers knowledge in all the convolutional layers through attention alignment. Most previous methods align high-level representations, e.g., activations of the fully connected (FC) layers. In these methods, however, the convolutional layers which underpin critical low-level domain knowledge cannot be updated directly towards reducing domain discrepancy. Specifically, we assume that the discriminative regions in an image are relatively invariant to image style changes. Based on this assumption, we propose an attention alignment scheme on all the target convolutional layers to uncover the knowledge shared by the source domain. Second, we estimate the posterior label distribution of the unlabeled data for target network training. Previous methods, which iteratively update the pseudo labels by the target network and refine the target network by the updated pseudo labels, are vulnerable to label estimation errors. Instead, our approach uses category distribution to calculate the cross-entropy loss for training, thereby ameliorating the error accumulation of the estimated labels. The two contributions allow our approach to outperform the state-of-the-art methods by +2.6% on the Office-31 dataset.
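    The two ideas in the abstract can be illustrated on plain Python lists. This is a sketch under stated assumptions: the attention map uses a common activation-based definition (sum of squared activations across channels, L2-normalized), which may differ from the paper's exact formulation, and the target category distribution `q` is taken as given rather than estimated with the paper's expectation-maximization procedure. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy against a single (pseudo-)label: -log p_label."""
    return -math.log(softmax(logits)[label])


def soft_cross_entropy(logits: Sequence[float], q: Sequence[float]) -> float:
    """Cross-entropy against a category distribution q: -sum_c q_c * log p_c.
    Each category's log-loss is weighted by its estimated probability q_c,
    so no single (possibly wrong) label dominates the loss."""
    probs = softmax(logits)
    return -sum(qc * math.log(pc) for qc, pc in zip(q, probs) if qc > 0)


def attention_map(features: Sequence[Sequence[float]]) -> List[float]:
    """Activation-based attention for one layer (assumed definition): sum of squared
    activations across channels at each spatial position, then L2-normalised.
    `features` is C x N (channels x flattened spatial positions)."""
    n = len(features[0])
    att = [sum(channel[i] ** 2 for channel in features) for i in range(n)]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0
    return [a / norm for a in att]


def attention_alignment_loss(source: Sequence[Sequence[float]],
                             target: Sequence[Sequence[float]]) -> float:
    """Squared L2 distance between source and target attention maps of the same layer."""
    a_s, a_t = attention_map(source), attention_map(target)
    return sum((s - t) ** 2 for s, t in zip(a_s, a_t))


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print(hard_cross_entropy(logits, 1))                # trusts one hard pseudo-label
    print(soft_cross_entropy(logits, [0.7, 0.3, 0.0]))  # spreads the target over categories
    # Same spatial pattern at a different activation scale -> aligned attention, (near-)zero loss.
    print(attention_alignment_loss([[1.0, 2.0], [3.0, 4.0]], [[2.0, 4.0], [6.0, 8.0]]))
```

A one-hot `q` reduces the soft loss to the ordinary hard-label loss, which is why the distribution-based version can be seen as a strict generalization of pseudo-labeling.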

    Does a Neural Network Really Encode Symbolic Concepts?

    Full text link
    Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and to define such interactions as concepts encoded by the DNN. Strictly speaking, however, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies verify that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.
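    The abstract does not define the interactions it examines. One standard definition used in this line of work, assumed here, is the Harsanyi interaction I(S) = sum over T subset of S of (-1)^(|S|-|T|) v(T), where v(T) is the network output when only the variables in T are kept and the rest are masked. The toy `v` below is hypothetical and stands in for a DNN; it shows how a model built from one AND-style concept plus one single-variable effect yields exactly two nonzero interactions (sparsity), and how all interactions sum to v of the full input set.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterator


def subsets(items: FrozenSet[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of `items`, including the empty set, in order of size."""
    ordered = sorted(items)
    for r in range(len(ordered) + 1):
        for combo in combinations(ordered, r):
            yield frozenset(combo)


def harsanyi_interactions(n: int, v: Callable[[FrozenSet[int]], float]) -> Dict[FrozenSet[int], float]:
    """Compute I(S) = sum_{T subset S} (-1)^(|S|-|T|) * v(T) for all S subset {0, ..., n-1}.

    v is called 3^n times in total, so this brute force is only practical
    for a handful of input variables.
    """
    everything = frozenset(range(n))
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(everything)
    }


def toy_model(kept: FrozenSet[int]) -> float:
    # Hypothetical stand-in for a DNN output with masked inputs:
    # an AND-concept on variables {0, 1} plus an independent effect of variable 2.
    return (1.0 if {0, 1} <= kept else 0.0) + (0.5 if 2 in kept else 0.0)


if __name__ == "__main__":
    interactions = harsanyi_interactions(3, toy_model)
    salient = {tuple(sorted(S)): I for S, I in interactions.items() if abs(I) > 1e-12}
    print(salient)  # {(2,): 0.5, (0, 1): 1.0}
```

Only 2 of the 8 possible subsets carry a nonzero interaction here, which is the kind of sparsity the abstract reports for well-trained DNNs.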

    ๋ณ€ํ˜•๋œ FusionNet์„ ์ด์šฉํ•œ ํšŒ์ƒ‰์กฐ ์ด๋ฏธ์ง€์˜ ์ž์—ฐ์Šค๋Ÿฌ์šด ์ฑ„์ƒ‰

    Thesis (Master's), Seoul National University Graduate School, College of Natural Sciences, Interdisciplinary Program in Computational Science, February 2021. Advisor: Myungjoo Kang. In this paper, we propose a grayscale image colorization technique. Colorization methods can be divided into three main categories: scribble-based, exemplar-based, and fully automatic methods. Our proposed method belongs to the third category. We use a deep learning model of the kind widely used in the colorization field recently: an encoder-decoder model built from convolutional neural networks. In particular, we modify FusionNet, which performs well in image segmentation, in various ways to suit automatic colorization. To obtain better results, we do not use the MSE loss function; instead, we use a loss function suited to colorization. We use a subset of the ImageNet dataset for training, validation, and testing. We take several existing fully automatic deep learning methods and compare them with our models. Our algorithm is evaluated quantitatively with PSNR (Peak Signal-to-Noise Ratio). To evaluate the results qualitatively, we also apply our model to the test dataset and compare its outputs with those of various other models. Our model performs better than the other models both quantitatively and qualitatively. Finally, we apply our model to various types of images, such as old black-and-white photographs.
    Contents: Abstract; 1 Introduction; 2 Related Works (2.1 Scribble-based method; 2.2 Exemplar-based method; 2.3 Fully automatic method); 3 Proposed Method (3.1 Method Overview; 3.2 Loss Function; 3.3 Architecture Detail: 3.3.1 Encoder, 3.3.2 Decoder, 3.3.3 Bridge); 4 Experiments (4.1 CIE Lab Color Space; 4.2 Dataset; 4.3 Qualitative Evaluation; 4.4 Quantitative Evaluation; 4.5 Legacy Old Image Colorization); 5 Conclusion; Bibliography; Abstract (in Korean).
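    PSNR, the quantitative metric named in the abstract, is 10 * log10(MAX^2 / MSE) in decibels, so higher is better. The sketch below assumes 8-bit pixel values (MAX = 255) flattened into one list; the abstract does not say whether PSNR is computed on RGB or on the CIE Lab channels, so that choice is left to the caller.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([0, 0, 0, 0], [255, 0, 0, 0]))  # about 6.02 dB: one pixel completely wrong
    print(psnr([10, 20], [11, 21]))            # about 48.13 dB: every pixel off by 1
```

Because PSNR rewards per-pixel closeness to one ground-truth coloring, a plausible but different coloring can score poorly, which is one reason the thesis also reports a qualitative comparison.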

    Deep learning for remote sensing image classification: A survey

    Remote sensing (RS) image classification plays an important role in Earth observation using RS data and has been widely exploited in both military and civil fields. However, characteristics of RS data such as high dimensionality and the relatively small number of labeled samples available make RS image classification a great scientific and practical challenge. In recent years, as new deep learning (DL) techniques have emerged, DL-based approaches to RS image classification have achieved significant breakthroughs, offering novel opportunities for research and development in this area. In this paper, a brief overview of typical DL models is presented first. This is followed by a systematic review of pixel-wise and scene-wise RS image classification approaches based on DL. A comparative analysis of the performance of typical DL-based RS methods is also provided. Finally, challenges and potential directions for further research are discussed. Publisher's version; peer reviewed.
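    The survey distinguishes pixel-wise from scene-wise classification. The difference lies in the granularity of the output, which the toy sketch below shows on a precomputed H x W x K grid of class scores. Producing those scores is the job of the DL model and is not shown; averaging scores over the image before taking the argmax (global average pooling) is one common way to obtain a scene label and is an assumption here, not a method taken from the survey.

```python
from typing import List, Sequence

ScoreMap = Sequence[Sequence[Sequence[float]]]  # H x W x K class scores


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def pixel_wise_classify(score_map: ScoreMap) -> List[List[int]]:
    """One label per pixel (e.g. land-cover mapping)."""
    return [[argmax(px) for px in row] for row in score_map]


def scene_wise_classify(score_map: ScoreMap) -> int:
    """One label per image: average class scores over all pixels, then take the argmax."""
    pixels = [px for row in score_map for px in row]
    k = len(pixels[0])
    pooled = [sum(px[c] for px in pixels) / len(pixels) for c in range(k)]
    return argmax(pooled)


if __name__ == "__main__":
    scores = [[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.6, 0.4]]]
    print(pixel_wise_classify(scores))  # [[0, 1], [0, 0]]
    print(scene_wise_classify(scores))  # 0
```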

    Short-term motion prediction of autonomous vehicles in complex environments: A Deep Learning approach

    Complex environments pose a high level of difficulty, and it is critically important that the safety systems embedded in autonomous vehicles (AVs) accurately anticipate the short-term future motion of nearby agents. The problem can be framed as generating a sequence of coordinates that describes the plausible future motion of a tracked agent. A number of recently proposed techniques with satisfactory performance exploit the learning capabilities of novel deep learning (DL) architectures for this task. Nonetheless, many challenging issues must still be resolved to further advance the capabilities of motion prediction models. This thesis explores novel deep learning techniques for short-term motion prediction of on-road participants, specifically other vehicles, from the point of view of an autonomous vehicle. First, various approaches in the literature demonstrate significant benefits from using a rasterised top-down image of the road to encode the context of the tracked vehicle's surroundings, which generally covers a large, global portion of the environment. This work, on the other hand, explores the use of local regions of the rasterised map to focus more explicitly on encoding the tracked vehicle's state. The proposed technique achieves plausible results against several baseline models and also outperforms the same model when it uses global maps instead. Next, features are typically extracted from rasterised maps with a popular vision model (e.g. ResNet-50) pre-trained on a distinct task such as image classification. Recently, however, this approach has been shown to be sub-optimal for tasks that rely strongly on precise localisation of features, where training the model from scratch directly on the task at hand can be more advantageous.
    In contrast, the subsequent part of this thesis investigates an alternative method for processing and encoding spatial data, based on capsule networks, to address several issues that standard vision models exhibit. Several experiments establish that the novel capsule-based motion predictor, trained from scratch, achieves competitive results against numerous popular vision models. Finally, the proposed model is extended with a generative framework to account for the fact that the tracked vehicle's possible movements are not limited to a single trajectory. More specifically, to capture the multi-modality of the problem, a conditional variational auto-encoder (CVAE) is employed, which makes it possible to sample an arbitrary number of diverse trajectories. The final model is evaluated against methods from the literature on a publicly available dataset and significantly outperforms them while drastically reducing the number of trainable parameters.
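    Two ideas from the abstract can be sketched in plain Python: cropping a local window of the rasterised map around the tracked vehicle instead of encoding the whole global map, and drawing several latent samples at inference time so a CVAE decoder can produce multiple distinct trajectories. The window size, the padding value, the standard-normal prior, and `toy_decoder` are all hypothetical; the thesis's trained capsule-based encoder and decoder are not reproduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]


def crop_local_region(raster: Sequence[Sequence[int]], center_row: int, center_col: int,
                      half_size: int, fill: int = 0) -> List[List[int]]:
    """Return the (2*half_size + 1)^2 window of `raster` centred on the agent's cell,
    padding with `fill` wherever the window extends past the map edge."""
    rows, cols = len(raster), len(raster[0])
    return [
        [raster[r][c] if 0 <= r < rows and 0 <= c < cols else fill
         for c in range(center_col - half_size, center_col + half_size + 1)]
        for r in range(center_row - half_size, center_row + half_size + 1)
    ]


def sample_trajectories(decode: Callable[[float, List[float]], Trajectory], context: float,
                        num_samples: int, latent_dim: int, rng: random.Random) -> List[Trajectory]:
    """CVAE-style inference, assuming a standard-normal prior: draw z ~ N(0, I) and decode
    it together with the context. Different draws of z give different plausible futures."""
    return [decode(context, [rng.gauss(0.0, 1.0) for _ in range(latent_dim)])
            for _ in range(num_samples)]


def toy_decoder(speed: float, z: List[float]) -> Trajectory:
    # Hypothetical stand-in for a trained decoder: move forward at `speed` per step,
    # drifting sideways by an amount set by the first latent dimension.
    return [(t * speed, t * z[0]) for t in range(1, 4)]


if __name__ == "__main__":
    raster = [[4 * r + c for c in range(4)] for r in range(4)]
    print(crop_local_region(raster, 0, 0, 1, fill=-1))  # [[-1, -1, -1], [-1, 0, 1], [-1, 4, 5]]
    for traj in sample_trajectories(toy_decoder, 2.0, 3, 1, random.Random(0)):
        print(traj)
```

Padding keeps the local window a fixed size even when the vehicle is near the map border, so the encoder always receives inputs of the same shape.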

    Logging Trail Segmentation via a Novel U-Net Convolutional Neural Network and High-Density Laser Scanning Data

    Logging trails are one of the main components of modern forestry. However, locating old logging trails accurately with common approaches is challenging and time-consuming. This study aimed to develop an approach, using deep-learning convolutional neural networks and high-density laser scanning data, to detect logging trails at different stages of commercial thinning in Southern Finland. We constructed a U-Net architecture consisting of encoder and decoder paths with several convolutional layers, pooling, and non-linear operations. The canopy height model (CHM), digital surface model (DSM), and digital elevation models (DEMs) were derived from the laser scanning data and used as image datasets for training the model. The labeled dataset for the logging trails was generated from various references. Three forest areas were selected to test the efficiency of the developed algorithm for detecting logging trails. We designed 21 routes, including 390 samples of logging trails and non-logging trails, covering all logging trails inside the stands. The results indicated that the U-Net trained on DSM (k = 0.846 and IoU = 0.867) outperforms the models trained on CHM (k = 0.734 and IoU = 0.782), DEMavg (k = 0.542 and IoU = 0.667), and DEMmin (k = 0.136 and IoU = 0.155) in distinguishing logging trails from non-logging trails. Although the approach performs nearly perfectly in young and mature stands that had undergone commercial thinning, it needs to be improved in old stands that have not received the second or third commercial thinning.
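    The reported scores pair k, which presumably denotes Cohen's kappa (an assumption, since the abstract does not expand it), with IoU. The sketch below computes both for binary trail/non-trail labels, treating the trail class as positive and computing IoU for that class only; the study's exact sampling and averaging protocol may differ.

```python
from typing import Sequence, Tuple


def confusion(truth: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts (tp, tn, fp, fn) for binary labels, where 1 = logging trail."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, tn, fp, fn


def iou(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Intersection over union of the positive (trail) class: tp / (tp + fp + fn)."""
    tp, _, fp, fn = confusion(truth, pred)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def cohen_kappa(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    tp, tn, fp, fn = confusion(truth, pred)
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("need at least one sample")
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from the marginal rates of "trail" and "non-trail".
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(iou(truth, pred))          # 0.5
    print(cohen_kappa(truth, pred))  # 7/15, about 0.467
```

Kappa discounts the agreement a classifier would get by chance on an imbalanced sample, which is why it can fall much lower than raw accuracy (here 0.75) when trail pixels are rare.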
