    Deep learning for large-scale fine-grained recognition of cars

    Deep learning (DL) is now widely used, with many applications in image classification and object detection. Many of these applications rely on Convolutional Neural Networks (CNNs), which, given an input (an image) and an output (a label or class), learn representations that define and make it possible to distinguish different kinds of objects. Neural networks are computationally demanding and can take hours to train. CNNs are even more demanding because their inputs are usually images, a rich data type that holds a lot of information. Rapid progress in computer vision with deep learning techniques, together with growth in computing power, has only recently made it possible to train CNNs that classify images with high precision. On car classifieds websites, images are one of the most important types of content. However, to date, little knowledge or metadata has been produced from such images. To insert an advert in the platform, the user must upload an image of the car for sale and fill in a number of fields, among them the vehicle category, the color of the car, and its make, model and version. In this dissertation, CNNs are used to recognize the make, model and version of cars, with transfer learning and fine-tuning as the two approaches for transferring the knowledge learned in one task and adapting it to another. We extend the work to also validate the efficacy of these neural networks on vehicle category recognition and car color recognition. We aim to validate how CNNs behave in these different tasks. Approaches such as background removal and data augmentation are explored to reduce overfitting.
We collected one of the largest datasets to date for the task of make, model and version recognition of cars, composed of 1.2 million images belonging to 790 labels. The results obtained in the scope of this dissertation set a new state-of-the-art performance for this type of task (accuracy of 92.7% with an ensemble method), considering the number of classes to classify and the number of images used. The results demonstrate the efficacy of recent advances in CNN architectures for fine-grained classification, where intra-class variation is small and viewpoint variation is high, when a large-scale dataset is used.

    Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank

    For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different from, but related to, the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of the informativeness of unlabeled data. This can be used to drive an algorithm for active learning, and we show that this reduces labeling effort by up to 50%. Comment: Accepted at TPAMI. (Keywords: Learning from rankings, image quality assessment, crowd counting, active learning). arXiv admin note: text overlap with arXiv:1803.0309
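The idea of training on labeled regression targets and ranked unlabeled pairs at the same time can be illustrated with a minimal sketch. This is not the paper's implementation: the pairwise hinge loss, the `margin` and `weight` parameters, and the function names are assumptions chosen for illustration. A ranked pair could come, for example, from an image and a crop of it in crowd counting, where the crop cannot contain more people than the full image.

```python
def hinge_ranking_loss(score_better, score_worse, margin=1.0):
    """Pairwise hinge loss (an assumed choice of ranking loss).

    Returns zero once the item known to rank higher outscores the
    other item by at least `margin`; otherwise grows linearly.
    """
    return max(0.0, margin - (score_better - score_worse))


def combined_loss(labeled, ranked_pairs, weight=1.0, margin=1.0):
    """Regression loss on labeled data plus ranking loss on unlabeled pairs.

    labeled:      list of (prediction, target) for labeled samples
    ranked_pairs: list of (score_better, score_worse) for unlabeled pairs
    weight:       hypothetical trade-off between the two terms
    """
    # Mean squared error on the labeled samples.
    mse = sum((p - t) ** 2 for p, t in labeled) / len(labeled)
    # Mean hinge ranking loss on the automatically ranked pairs.
    rank = sum(
        hinge_ranking_loss(a, b, margin) for a, b in ranked_pairs
    ) / len(ranked_pairs)
    return mse + weight * rank
```

In a real network both terms would be computed on model outputs and minimized jointly by gradient descent; the sketch only shows how the two signals combine into one objective.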

    Continual Learning with Pretrained Backbones by Tuning in the Input Space

    The intrinsic difficulty of adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. This issue is critical in practical supervised learning settings, such as those in which a pre-trained model computes projections toward a latent space where different task predictors are sequentially learned over time. Incrementally fine-tuning the whole model to better adapt to new tasks usually results in catastrophic forgetting: performance on past experiences decreases and valuable knowledge from the pre-training stage is lost. In this paper, we propose a novel strategy to make the fine-tuning procedure more effective. It leaves the pre-trained part of the network untouched and learns not only the usual classification head, but also a set of newly introduced learnable parameters that are responsible for transforming the input data. This process allows the network to effectively leverage the pre-training knowledge and find a good trade-off between plasticity and stability with modest computational effort, which makes it especially suitable for on-the-edge settings. Our experiments on four image classification problems in a continual learning setting confirm the quality of the proposed approach when compared to several fine-tuning procedures and to popular continual learning methods.

    Image Embeddings Extracted from CNNs Outperform Other Transfer Learning Approaches in Classification of Chest Radiographs

    To identify the best transfer learning approach for the identification of the most frequent abnormalities on chest radiographs (CXRs), we used embeddings extracted from pretrained convolutional neural networks (CNNs). An explainable AI (XAI) model was applied to interpret black-box model predictions and assess its performance. Seven CNNs were trained on CheXpert. Three transfer learning approaches were thereafter applied to a local dataset. The classification results were ensembled using simple and entropy-weighted averaging. We applied Grad-CAM (an XAI model) to produce a saliency map. Grad-CAM maps were compared to manually extracted regions of interest, and the training time was recorded. The best transfer learning model was the one that used image embeddings and a random forest with simple averaging, with an average AUC of 0.856. Grad-CAM maps showed that the models focused on specific features of each CXR. CNNs pretrained on a large public dataset of medical images can be exploited as feature extractors for tasks of interest. The extracted image embeddings contain relevant information that can be used to train an additional classifier with satisfactory performance on an independent dataset, demonstrating it to be the optimal transfer learning strategy and overcoming the need for large private datasets, extensive computational resources, and long training times.
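The two ensembling schemes named in the abstract can be contrasted with a small sketch. The abstract does not define its entropy weighting, so the rule used here (each model's weight is one minus the binary entropy of its predicted probability, so confident models count more) is an assumption, as are the function names.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy in bits of a Bernoulli prediction p, clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def simple_average(probs):
    """Unweighted mean of the per-model probabilities for one finding."""
    return sum(probs) / len(probs)


def entropy_weighted_average(probs):
    """Weighted mean where low-entropy (confident) predictions weigh more.

    Assumed weighting: w = 1 - H(p). Falls back to the simple average
    when every model is maximally uncertain (all weights are zero).
    """
    weights = [max(0.0, 1.0 - binary_entropy(p)) for p in probs]
    total = sum(weights)
    if total <= 1e-12:
        return simple_average(probs)
    return sum(w * p for w, p in zip(weights, probs)) / total
```

With per-model probabilities `[0.5, 0.9]`, the simple average treats both models equally, while the entropy-weighted version gives the undecided model (p = 0.5) zero weight under this assumed rule.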

    Multi-Domain Adaptation for Image Classification, Depth Estimation, and Semantic Segmentation

    The appearance of scenes may change for many reasons, including the viewpoint, the time of day, the weather, and the seasons. Traditionally, deep neural networks are trained and evaluated using images from the same scene and domain to avoid the domain gap. Recent advances in domain adaptation have led to a new type of method that bridges such domain gaps and learns from multiple domains. This dissertation proposes methods for multi-domain adaptation for various computer vision tasks, including image classification, depth estimation, and semantic segmentation. The first work focuses on semi-supervised domain adaptation, for which I propose dynamic feature alignment to address both inter- and intra-domain discrepancy. The second work addresses the task of monocular depth estimation in the multi-domain setting. I propose a unified approach that includes adversarial knowledge distillation and uncertainty-guided self-supervised reconstruction. The third work considers the problem of semantic segmentation for aerial imagery with diverse environments and viewing geometries. I present CrossSeg: a novel framework that learns a semantic segmentation network that can generalize well in a cross-scene setting with only a few labeled samples. I believe this line of work can be applied to many domain adaptation scenarios and aerial applications.

    Combining simulated and real images in deep learning

    Training a deep learning (DL) model that generalizes successfully to unseen cases requires considerable amounts of data. Furthermore, such data is often manually labeled, which makes annotation costly and time-consuming. We propose the use of simulated data, obtained from simulators, as a way to meet the increasing need for annotated data. Although simulated environments offer an unlimited and cost-effective supply of automatically annotated data, the information they produce is still synthetic. As such, it differs from real-world data in representation and distribution. The field that addresses the problem of merging the useful features from each of these domains is called domain adaptation (DA), a branch of transfer learning. In this field, several advances have been made, from fine-tuning existing networks to sample-reconstruction approaches. Adversarial DA methods, which make use of Generative Adversarial Networks (GANs), are state-of-the-art and the most widely used. In previous approaches, training data was sourced from existing datasets, and the use of simulators as a means to obtain new observations was an alternative not fully explored. We aim to survey possible DA techniques and apply them to this context of obtaining simulated data for the purpose of training DL models. A proof of concept will be developed, stemming from a previous project that aimed to automate quality control at the end of a vehicle's production line. Previously, a DL model that identified vehicle parts was trained using only data obtained through a simulator. By making use of DA techniques to combine simulated and real images, a new model will be trained so that it can be applied to the real world more effectively. The model's performance, using both types of data, will be compared to its performance when using exclusively one of the two types.
We believe this can be expanded to new areas where, until now, the use of DL was not feasible due to the constraints imposed by data collection.