438 research outputs found

    Artistic ideation based on computer vision methods

    Get PDF
    We explore the problem of classifying the scenic categories that form the basis of an artist's ideation and design of cultural productions. The main objective is to evaluate the performance of SIFT descriptors, the bag-of-visual-words representation, and spatial pyramid matching when these computer vision methodologies are confronted with this kind of imagery. The results are promising: on average the performance score is around 70%, with a standard deviation of approximately 5%.
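    As an illustration of the pipeline this abstract evaluates, below is a minimal bag-of-visual-words sketch in Python using OpenCV and scikit-learn. It is not the authors' exact setup: the vocabulary size, the linear SVM, and the placeholder train_paths/train_labels names are assumptions.

```python
# Minimal bag-of-visual-words sketch (illustrative, not the paper's setup):
# SIFT local descriptors -> k-means vocabulary -> histograms -> linear SVM.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def sift_descriptors(image_paths):
    sift = cv2.SIFT_create()
    per_image = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.zeros((0, 128), np.float32))
    return per_image

def bovw_histograms(per_image_desc, n_words=256):
    # Vocabulary from all descriptors; one normalised histogram per image.
    kmeans = KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(per_image_desc))
    hists = []
    for desc in per_image_desc:
        words = kmeans.predict(desc) if len(desc) else np.array([], int)
        h, _ = np.histogram(words, bins=np.arange(n_words + 1))
        hists.append(h / max(h.sum(), 1))
    return np.array(hists), kmeans

# train_paths / train_labels are placeholders for a labelled scene dataset:
# descs = sift_descriptors(train_paths)
# X, vocab = bovw_histograms(descs)
# clf = LinearSVC().fit(X, train_labels)
```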

    Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis

    Full text link
    The availability of large-scale annotated image datasets and recent advances in supervised deep learning methods enable the end-to-end derivation of representative image features that can impact a variety of image analysis problems. Such supervised approaches, however, are difficult to implement in the medical domain where large volumes of labelled data are difficult to obtain due to the complexity of manual annotation and inter- and intra-observer variability in label assignment. We propose a new convolutional sparse kernel network (CSKN), which is a hierarchical unsupervised feature learning framework that addresses the challenge of learning representative visual features in medical image analysis domains where there is a lack of annotated training data. Our framework has three contributions: (i) We extend kernel learning to identify and represent invariant features across image sub-patches in an unsupervised manner. (ii) We initialise our kernel learning with a layer-wise pre-training scheme that leverages the sparsity inherent in medical images to extract initial discriminative features. (iii) We adapt a multi-scale spatial pyramid pooling (SPP) framework to capture subtle geometric differences between learned visual features. We evaluated our framework in medical image retrieval and classification on three public datasets. Our results show that our CSKN had better accuracy when compared to other conventional unsupervised methods and comparable accuracy to methods that used state-of-the-art supervised convolutional neural networks (CNNs). Our findings indicate that our unsupervised CSKN provides an opportunity to leverage unannotated big data in medical imaging repositories.
    Comment: Accepted by Medical Image Analysis (with the new title 'Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis'). The manuscript is available at https://doi.org/10.1016/j.media.2019.06.005.
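    The spatial pyramid pooling of contribution (iii) can be illustrated compactly; the sketch below is a generic numpy max-pooling SPP over a feature map, not the CSKN's actual multi-scale formulation.

```python
# Generic spatial pyramid pooling sketch (illustrative, not the CSKN itself).
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """fmap: (C, H, W) feature map -> fixed-length vector.
    At level n the map is split into an n x n grid and max-pooled per cell."""
    C, H, W = fmap.shape
    pooled = []
    for n in levels:
        hs = np.linspace(0, H, n + 1, dtype=int)
        ws = np.linspace(0, W, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))
    return np.concatenate(pooled)  # length C * sum(n*n for n in levels)

# A random 64-channel map pooled to a 64 * (1 + 4 + 16) = 1344-dim vector.
vec = spatial_pyramid_pool(np.random.rand(64, 17, 23))
assert vec.shape == (64 * 21,)
```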

    Robust melanoma screening: in defense of enhanced mid-level descriptors

    Get PDF
    Advisors: Eduardo Alves do Valle Junior, Sandra Eliza Fontes de Avila. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Melanoma is the type of skin cancer that most often leads to death, even though it is the most curable if detected early. Since the presence of a full-time dermatologist is not economically feasible for many small cities, and especially in underserved communities, computer-aided diagnosis for melanoma screening has been a topic of active research. Much of the existing art is based on the Bag-of-Visual-Words (BoVW) model, combining color and texture descriptors. However, the BoVW model has kept improving, and today several extensions achieve better classification rates in general image classification tasks. These enhanced models had not yet been explored for melanoma screening, motivating our work. Here we present a new approach for melanoma screening based upon the state-of-the-art BossaNova descriptors, showing very promising results, reaching an AUC of up to 93.7%. This work also proposes a new spatial pooling strategy specially designed for melanoma screening. Another contribution of this research is the unprecedented use of BossaNova in melanoma classification, which opens the opportunity to explore these enhanced mid-level descriptors in other medical contexts.
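    For readers unfamiliar with BossaNova, the sketch below conveys a simplified reading of the idea: each codeword contributes a small histogram of descriptor-to-codeword distances rather than a single count. The function name, bin count, and random inputs are illustrative, not the thesis settings.

```python
# Simplified BossaNova-style encoding sketch (illustrative parameters).
import numpy as np

def bossa_encode(descriptors, codebook, n_bins=4, r_max=None):
    # (N, K) distances from each local descriptor to each codeword.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    r_max = dists.max() if r_max is None else r_max
    edges = np.linspace(0.0, r_max, n_bins + 1)
    # One small distance histogram per codeword, concatenated.
    enc = [np.histogram(dists[:, k], bins=edges)[0] for k in range(codebook.shape[0])]
    return np.concatenate(enc).astype(float)  # length K * n_bins

# Random stand-ins for one image's local descriptors and a k-means codebook.
enc = bossa_encode(np.random.rand(500, 128), np.random.rand(64, 128))
```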

    Automatic Food Intake Assessment Using Camera Phones

    Get PDF
    Obesity is becoming an epidemic phenomenon in most developed countries. The fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended. It is essential to monitor everyday food intake for obesity prevention and management. Existing dietary assessment methods usually require manual recording and recall of food types and portions. Accuracy of the results largely relies on many uncertain factors such as the user's memory, food knowledge, and portion estimation, so accuracy is often compromised. Accurate and convenient dietary assessment methods are still lacking and are needed by both the general population and the research community. In this thesis, an automatic food intake assessment method using cameras and inertial measurement units (IMUs) on smart phones was developed to help people foster a healthy life style. With this method, users use their smart phones before and after a meal to capture images or videos of the meal. The smart phone recognizes food items, calculates the volume of the food consumed, and provides the results to users. The technical objective is to explore the feasibility of image-based food recognition and image-based volume estimation. This thesis comprises five publications that address four specific goals of this work: (1) to develop a prototype system with existing methods to review the literature, find its drawbacks, and explore the feasibility of developing novel methods; (2) based on the prototype system, to investigate new food classification methods that improve recognition accuracy to a field-application level; (3) to design indexing methods for large-scale image databases to facilitate the development of new food image recognition and retrieval algorithms; (4) to develop novel, convenient, and accurate food volume estimation methods using only smart phones with cameras and IMUs. A prototype system was implemented to review existing methods. An image feature detector and descriptor were developed, and a nearest-neighbor classifier was implemented to classify food items. A credit card marker method was introduced for metric-scale 3D reconstruction and volume calculation. To increase recognition accuracy, novel multi-view food recognition algorithms were developed to recognize regular-shaped food items. To further increase accuracy and make the algorithm applicable to arbitrary food items, new food features and new classifiers were designed. The efficiency of the algorithm was increased by developing a novel image indexing method for large-scale image databases. Finally, the volume calculation was enhanced by removing the need for the marker and introducing IMUs. Sensor fusion techniques combining measurements from cameras and IMUs were explored to infer the metric scale of the 3D model as well as to reduce noise from these sensors.
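    The final volume step can be illustrated with a deliberately crude sketch: given a metric-scale 3D point cloud (scaled via the marker or the IMU-based method), a convex hull gives a first volume estimate. The function and the sphere stand-in below are hypothetical, not the thesis algorithm.

```python
# Crude volume estimate from a reconstructed, metric-scaled 3D point cloud.
import numpy as np
from scipy.spatial import ConvexHull

def hull_volume_cm3(points, scale=1.0):
    """points: (N, 3) reconstructed points; scale: metres per model unit,
    e.g. recovered from a credit-card marker of known physical size."""
    pts = np.asarray(points) * scale
    return ConvexHull(pts).volume * 1e6  # m^3 -> cm^3

# Unit-sphere point cloud as a stand-in for a reconstructed food item.
rng = np.random.default_rng(0)
pts = rng.normal(size=(2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(round(hull_volume_cm3(pts, scale=0.04), 1), "cm^3")  # ~4 cm radius sphere
```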

    Vehicle make and model recognition for intelligent transportation monitoring and surveillance.

    Get PDF
    Vehicle Make and Model Recognition (VMMR) has evolved into a significant subject of study due to its importance in numerous Intelligent Transportation Systems (ITS), such as autonomous navigation, traffic analysis, traffic surveillance and security systems. A highly accurate and real-time VMMR system significantly reduces the overhead cost of resources otherwise required. The VMMR problem is a multi-class classification task with a peculiar set of issues and challenges, like multiplicity and inter- and intra-make ambiguity among various vehicle makes and models, which need to be solved in an efficient and reliable manner to achieve a highly robust VMMR system. In this dissertation, facing the growing importance of make and model recognition of vehicles, we present a VMMR system that provides very high accuracy rates and is robust to several challenges. We demonstrate that the VMMR problem can be addressed by locating discriminative parts where the most significant appearance variations occur in each category, and learning expressive appearance descriptors. Given these insights, we consider two data-driven frameworks: a Multiple-Instance Learning-based (MIL) system using hand-crafted features and an extended application of deep neural networks using MIL. Our approach requires only image-level class labels, and the discriminative parts of each target class are selected in a fully unsupervised manner without any use of part annotations or segmentation masks, which may be costly to obtain. This advantage makes our system more intelligent, scalable, and applicable to other fine-grained recognition tasks. We constructed a dataset with 291,752 images representing 9,170 different vehicles to validate and evaluate our approach. Experimental results demonstrate that localizing parts and distinguishing their discriminative powers for categorization improve the performance of fine-grained categorization. Extensive experiments conducted using our approaches yield superior results for images that were occluded, captured under low illumination, from partial camera views, or even from non-frontal views, available in our real-world VMMR dataset. The approaches presented herewith provide a highly accurate VMMR system for real-time applications in realistic environments. We also validate our system with a significant application of VMMR to ITS that involves automated vehicular surveillance. We show that our application can provide law enforcement agencies with efficient tools to search for a specific vehicle type, make, or model, and to track the path of a given vehicle using the positions of multiple cameras.
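    The multiple-instance formulation can be sketched compactly: treat each image as a bag of candidate part features and max-pool per-instance scores, so discriminative parts emerge from image-level labels alone. The PyTorch model below is a minimal illustration with made-up dimensions, not the dissertation's architecture.

```python
# Minimal max-pooling MIL classifier sketch (illustrative dimensions).
import torch
import torch.nn as nn

class MaxPoolMIL(nn.Module):
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.score = nn.Linear(feat_dim, n_classes)  # per-instance scorer

    def forward(self, bags):                 # bags: (B, n_instances, feat_dim)
        inst_scores = self.score(bags)       # (B, n_instances, n_classes)
        return inst_scores.max(dim=1).values # bag score = most discriminative part

# 8 images, 20 candidate parts each, 128-d part features, 10 make/model classes.
model = MaxPoolMIL(128, 10)
logits = model(torch.randn(8, 20, 128))
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (8,)))
loss.backward()
```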

    Retrieval and classification methods for textured 3D models: a comparative study

    No full text
    This paper presents a comparative study of six methods for the retrieval and classification of textured 3D models, which have been selected as representative of the state of the art. To better analyse and control how methods deal with specific classes of geometric and texture deformations, we built a collection of 572 synthetic textured mesh models, in which each class includes multiple texture and geometric modifications of a small set of null models. Results show a challenging, yet lively, scenario and also reveal interesting insights into how to deal with texture information, with different approaches working in the CIELab colour space as well as in modifications of the RGB colour space.
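    Working in CIELab rather than raw RGB, as some of the compared methods do, amounts to a colour-space conversion before feature extraction. A minimal sketch with skimage follows; the joint histogram feature is an illustrative stand-in for the surveyed texture descriptors.

```python
# CIELab texture feature sketch (histogram is an illustrative stand-in).
import numpy as np
from skimage.color import rgb2lab

def lab_histogram(texture_rgb, bins=8):
    """texture_rgb: (H, W, 3) float image in [0, 1] sampled from the mesh
    texture; returns a joint L*a*b* histogram usable as a retrieval feature."""
    lab = rgb2lab(texture_rgb)
    h, _ = np.histogramdd(
        lab.reshape(-1, 3), bins=bins,
        range=[(0, 100), (-128, 127), (-128, 127)])
    return (h / h.sum()).ravel()

feat = lab_histogram(np.random.rand(64, 64, 3))
```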

    Transfer schemes for deep learning in image classification

    Get PDF
    Advisors: Eduardo Alves do Valle Junior, Sandra Eliza Fontes de Avila. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: In Computer Vision, the task of classification is complex, as it aims to identify the presence of high-level categories in images, depending critically upon learning general models from a set of training samples. Deep Learning (DL) for visual tasks usually involves seamlessly learning every step of this process, from feature extraction to label assignment. This pervasive learning improves DL generalization abilities, but brings its own challenges: a DL model has a huge number of parameters to estimate, thus requiring large amounts of annotated data and computational resources. In this context, transfer learning emerges as a promising solution, allowing one to recycle parameters learned among different models. Motivated by the growing amount of evidence for the potential of such techniques, we study transfer learning for deep architectures applied to image recognition. Our experiments are designed to explore the internal representations of DL architectures, testing their robustness, redundancy and precision, with applications to the problems of automated melanoma screening, scene recognition (MIT Indoors) and object detection (Pascal VOC). We also take transfer learning to extremes, introducing Complete Transfer Learning, which preserves most of the original model, showing that aggressive transfer schemes can reach competitive results.
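    The basic transfer scheme studied here can be sketched in a few lines: freeze a network pretrained on ImageNet and reuse its internal representation for a new task. The torchvision model, the layer cut, and the 67-class MIT Indoors head below are illustrative choices, not the thesis's exact protocol.

```python
# Feature-reuse transfer sketch (illustrative model and layer choices).
import torch
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()   # drop the original ImageNet classifier head
for p in backbone.parameters():
    p.requires_grad = False         # aggressive transfer: reuse everything
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(4, 3, 224, 224))  # (4, 512) features
head = torch.nn.Linear(512, 67)     # e.g. the 67 MIT Indoors scene classes
logits = head(feats)
```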

    MMFL-Net: Multi-scale and Multi-granularity Feature Learning for Cross-domain Fashion Retrieval

    Full text link
    Instance-level image retrieval in fashion is a challenging issue owing to its increasing importance in real-scenario visual fashion search. Cross-domain fashion retrieval aims to match unconstrained customer images as queries against photographs provided by retailers; however, it is a difficult task due to a wide range of consumer-to-shop (C2S) domain discrepancies, and also because clothing images are vulnerable to various non-rigid deformations. To this end, we propose a novel multi-scale and multi-granularity feature learning network (MMFL-Net), which can jointly learn global-local aggregation feature representations of clothing images in a unified framework, aiming to train a cross-domain model for C2S fashion visual similarity. First, a new semantic-spatial feature fusion part is designed to bridge the semantic-spatial gap by applying top-down and bottom-up bidirectional multi-scale feature fusion. Next, a multi-branch deep network architecture is introduced to capture global salient, part-informed, and local detailed information, and to extract robust and discriminative feature embeddings by integrating similarity learning of coarse-to-fine embeddings at multiple granularities. Finally, the improved trihard loss, center loss, and multi-task classification loss are adopted for our MMFL-Net, jointly optimizing intra-class and inter-class distances and thus explicitly improving intra-class compactness and inter-class discriminability of its visual representations. Furthermore, our proposed model also combines a multi-task attribute recognition and classification module with multi-label semantic attributes and product ID labels. Experimental results demonstrate that our proposed MMFL-Net achieves significant improvement over state-of-the-art methods on two datasets, DeepFashion-C2S and Street2Shop.
    Comment: 27 pages, 12 figures. Published in Multimedia Tools and Applications.
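    The metric-learning objective can be sketched as a weighted sum of a triplet margin term and a center term; the code below is a simplified illustration with random embeddings, omitting the hard-example ("trihard") mining and the paper's actual loss weights.

```python
# Simplified triplet + center loss sketch (weights and margin illustrative).
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.3)
anchor  = torch.randn(16, 256, requires_grad=True)  # consumer-photo embeddings
pos     = torch.randn(16, 256)                      # same product, shop photo
neg     = torch.randn(16, 256)                      # different product

centers = torch.randn(100, 256, requires_grad=True) # one centre per product ID
labels  = torch.randint(0, 100, (16,))
center_loss = ((anchor - centers[labels]) ** 2).sum(dim=1).mean()

loss = triplet(anchor, pos, neg) + 0.01 * center_loss
loss.backward()
```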

    Pre-trained Convolutional Networks and generative statistical models: a study in semi-supervised learning

    Get PDF
    A comparative study of the performance of Convolutional Networks using pre-trained models versus statistical generative models on image classification tasks in semi-supervised environments. The work also studies multiple ensembles of these techniques, as well as data generated from estimated probability density functions. Pre-trained ConvNets, LDA, pLSA, Fisher Vectors, sparse-coded SPMs, and TSVMs are the key models worked upon.
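    One hedged sketch of a semi-supervised setup in this space: self-training a simple classifier on ConvNet-style features, pseudo-labelling unlabelled points by confidence. scikit-learn's SelfTrainingClassifier is used as a stand-in; the features, threshold, and label split are invented for illustration and are not the study's setup.

```python
# Self-training sketch on stand-in features (settings are illustrative).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 512))   # stand-in for pre-trained ConvNet features
y = rng.integers(0, 5, size=300)
y_train = y.copy()
y_train[60:] = -1                 # only 60 labelled points; -1 marks unlabelled

clf = SelfTrainingClassifier(LogisticRegression(max_iter=500), threshold=0.9)
clf.fit(X, y_train)
print("labels propagated:", int((clf.transduction_ != -1).sum()))
```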