
    Distraction-Aware Feature Learning for Human Attribute Recognition via Coarse-to-Fine Attention Mechanism

    Recently, Human Attribute Recognition (HAR) has become a hot topic due to its scientific challenges and application potential, where localizing attributes is a crucial stage that is not yet well handled. In this paper, we propose a novel deep learning approach to HAR, namely Distraction-aware HAR (Da-HAR). It enhances deep CNN feature learning by improving attribute localization through a coarse-to-fine attention mechanism. At the coarse step, a self-mask block is built to roughly discriminate and reduce distractions, while at the fine step, a masked attention branch is applied to further eliminate irrelevant regions. Thanks to this mechanism, feature learning is more accurate, especially under heavy occlusions and complex backgrounds. Extensive experiments are conducted on the WIDER-Attribute and RAP databases, and state-of-the-art results are achieved, demonstrating the effectiveness of the proposed approach.

    Comment: 8 pages, 5 figures, accepted by AAAI-20 as an oral presentation
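    The two-step design in the abstract (a coarse self-mask block followed by a masked attention branch) can be made concrete with a small sketch. The PyTorch code below is a minimal illustration of such a coarse-to-fine attention pipeline; the module names, layer sizes, and wiring are assumptions for exposition, not the authors' implementation.

    import torch
    import torch.nn as nn

    class SelfMaskBlock(nn.Module):
        """Coarse step: predict a soft spatial mask that down-weights distractions."""
        def __init__(self, channels: int):
            super().__init__()
            self.mask_head = nn.Sequential(
                nn.Conv2d(channels, channels // 4, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, 1, 1),
                nn.Sigmoid(),  # per-location weight in [0, 1]
            )

        def forward(self, feat):
            mask = self.mask_head(feat)  # (B, 1, H, W)
            return feat * mask, mask     # re-weighted features plus the mask itself

    class MaskedAttentionBranch(nn.Module):
        """Fine step: per-attribute attention restricted by the coarse mask."""
        def __init__(self, channels: int, num_attrs: int):
            super().__init__()
            self.attn = nn.Conv2d(channels, num_attrs, 1)
            self.weight = nn.Parameter(torch.randn(num_attrs, channels) * 0.01)
            self.bias = nn.Parameter(torch.zeros(num_attrs))

        def forward(self, feat, coarse_mask):
            b, _, h, w = feat.shape
            attn = torch.softmax(self.attn(feat).flatten(2), -1).view(b, -1, h, w)
            attn = attn * coarse_mask  # further suppress irrelevant regions
            attn = attn / attn.sum((2, 3), keepdim=True).clamp_min(1e-6)
            pooled = torch.einsum('bahw,bchw->bac', attn, feat)  # (B, A, C)
            return (pooled * self.weight).sum(-1) + self.bias    # one logit per attribute

    # Usage on backbone features, e.g. a 14x14 CNN feature map:
    feat = torch.randn(2, 256, 14, 14)
    masked_feat, mask = SelfMaskBlock(256)(feat)
    logits = MaskedAttentionBranch(256, num_attrs=14)(masked_feat, mask)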

    Application of support vector machines for the classification of transgenic mice using retinal imaging

    The aim of this work was to create supervised learning models based on Support Vector Machines (SVM) and Support Vector Machines with privileged information (SVM+) capable of distinguishing between healthy (C) and transgenic (D) mice through texture analysis of optical coherence tomography (OCT) images of right-eye retinas. The sample consists of 74 mice, 40 healthy and 34 transgenic. Optical coherence tomography was used to image each mouse's retina, which was then divided into 4 quadrants. From these, a 2D fundus image was obtained and 20 texture-analysis indicators were computed and used as features for the SVM model. The features with the greatest separation capacity between groups, and with pairwise correlation coefficients below 0.7, were Inertia (first, second, and fourth quadrants), INN (inverse difference normalized; third quadrant), IMC2 (information measure of correlation; third quadrant), and ClusterShade (third quadrant). Using these 6 most relevant features, the SVM and SVM+ models were built, and their parameters were tuned to obtain the best accuracy in classifying mice into the healthy and transgenic categories. 5-fold cross-validation was used to validate the results of the models. On both the test set and the full data set, the SVM model obtained 100% accuracy, while the SVM+ model achieved 93.33% on the test set (1 error in 15 cases) and 98.65% on the full data set (1 error in 74 cases)
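    As a concrete illustration of the standard-SVM half of this pipeline (correlation-filtered texture features plus 5-fold cross-validation), here is a minimal scikit-learn sketch. The data is a synthetic placeholder, the feature filter is a simple greedy rule, and SVM+ is omitted because it requires a specialized solver that scikit-learn does not provide.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(74, 20))      # 20 texture indicators per mouse (placeholder data)
    y = np.array([0] * 40 + [1] * 34)  # 40 healthy (C), 34 transgenic (D)

    # Greedily keep features whose pairwise |correlation| stays below 0.7.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < 0.7 for k in keep):
            keep.append(j)
    X_sel = X[:, keep]

    model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
    scores = cross_val_score(model, X_sel, y, cv=5)
    print(f'5-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')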

    Domain Adaptation and Privileged Information for Visual Recognition

    The automatic identification of entities such as objects, people, or their actions in visual data, such as images or video, has significantly improved and is now deployed in access control, social media, online retail, autonomous vehicles, and several other applications. This visual recognition capability leverages supervised learning techniques, which require large amounts of labeled training data from the target distribution, representative of the particular task at hand. However, collecting such training data may be expensive, too time-consuming, or even impossible. In this work, we introduce several novel approaches aimed at compensating for the lack of target training data. Rather than leveraging prior knowledge for building task-specific models, which are typically easier to train, we focus on developing general visual recognition techniques, where the notion of prior knowledge is better captured by additional information available during training. Depending on the nature of such information, the learning problem may turn into domain adaptation (DA), domain generalization (DG), learning using privileged information (LUPI), or domain adaptation with privileged information (DAPI).

    When some target data samples are available and additional information in the form of labeled data from a different source is also available, the learning problem becomes domain adaptation. Unlike previous DA work, we introduce two novel approaches for the few-shot learning scenario, which require only very few labeled target samples; even a single one can be very effective. The first method exploits a Siamese deep neural network architecture to learn an embedding where visual categories from the source and target distributions are semantically aligned and yet maximally separated. The second approach instead extends adversarial learning to simultaneously maximize the confusion between source and target domains while achieving semantic alignment.

    In the complete absence of target data, several cheaply available source datasets related to the target distribution can be leveraged as additional information for learning a task. This is the domain generalization setting. We introduce the first deep learning approach to the DG problem, extending a Siamese network architecture to learn a representation of visual categories that is invariant with respect to the sources, while imposing semantic alignment and class separation to maximize generalization performance on unseen target domains.

    There are situations in which target data for training comes equipped with additional information that can be modeled as an auxiliary view of the data but that, unfortunately, is not available during testing. This is the LUPI scenario. We introduce a novel framework based on the information bottleneck that leverages the auxiliary view to improve the performance of visual classifiers. We do so with a formulation that is general, in the sense that it can be used with any visual classifier.

    Finally, when the available target data is unlabeled and there is closely related labeled source data that is also equipped with an auxiliary view as additional information, we pose the question of how to leverage the source data views to train visual classifiers for unseen target data. This is the DAPI scenario. We extend the LUPI framework based on the information bottleneck to learn visual classifiers in DAPI settings, and show that privileged information can be leveraged to improve learning on new domains. Moreover, the novel DAPI framework is general and can be used with any visual classifier.

    Every use of auxiliary information has been validated extensively using publicly available benchmark datasets, and several new state-of-the-art accuracy results have been set. Example application domains include visual object recognition from RGB images and from depth data, handwritten digit recognition, and gesture recognition from video
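    As a rough illustration of the first few-shot DA method described above, the sketch below implements a cross-domain contrastive objective with a shared-weight (Siamese) encoder: same-class source/target pairs are pulled together, while different-class pairs are pushed apart by a margin. The encoder, margin, and exact loss form are assumptions for exposition, not the thesis' precise formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Shared-weight (Siamese) encoder applied to both domains.
    encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

    def semantic_alignment_loss(src_x, src_y, tgt_x, tgt_y, margin=1.0):
        zs, zt = encoder(src_x), encoder(tgt_x)
        d = torch.cdist(zs, zt)  # pairwise source-target distances, (Ns, Nt)
        same = (src_y.unsqueeze(1) == tgt_y.unsqueeze(0)).float()
        # Semantic alignment: pull same-class cross-domain pairs together.
        align = (same * d.pow(2)).sum() / same.sum().clamp_min(1)
        # Separation: push different-class pairs at least `margin` apart.
        separate = ((1 - same) * F.relu(margin - d).pow(2)).sum() / (1 - same).sum().clamp_min(1)
        return align + separate

    # Few-shot scenario: many labeled source samples, very few labeled target ones.
    src_x, src_y = torch.randn(32, 128), torch.randint(0, 5, (32,))
    tgt_x, tgt_y = torch.randn(5, 128), torch.arange(5)  # one labeled target sample per class
    loss = semantic_alignment_loss(src_x, src_y, tgt_x, tgt_y)
    loss.backward()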