    Group Invariant Deep Representations for Image Instance Retrieval

    Most image instance retrieval pipelines are based on the comparison of vectors known as global image descriptors between a query image and the database images. Due to their success in large-scale image classification, representations extracted from Convolutional Neural Networks (CNNs) are quickly gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors for image instance retrieval. While CNN-based descriptors are generally noted for good retrieval performance at lower bitrates, they nevertheless present a number of drawbacks, including a lack of robustness to common object transformations such as rotations compared with their interest-point-based FV counterparts. In this paper, we propose a method for computing invariant global descriptors from CNNs. Our method implements a recently proposed mathematical theory for invariance in a sensory cortex modeled as a feedforward neural network. The resulting global descriptors can be made invariant to multiple arbitrary transformation groups while retaining good discriminativeness. Based on a thorough empirical evaluation using several publicly available datasets, we show that our method significantly and consistently improves retrieval results every time a new type of invariance is incorporated. We also show that our method, which has few parameters, is not prone to overfitting: improvements generalize well across datasets with different properties with regard to invariances. Finally, we show that our descriptors compare favourably to other state-of-the-art compact descriptors at similar bitrates, exceeding the highest retrieval results reported in the literature on some datasets. A dedicated dimensionality reduction step, such as quantization or hashing, may be able to further improve the competitiveness of the descriptors.
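
    To make the pooling idea concrete, a minimal sketch along these lines (not the authors' exact construction) is shown below: descriptors are extracted from a set of rotated copies of the image and pooled over that orbit, which yields a representation invariant to the chosen rotations. Here extract_descriptor is a hypothetical stand-in for the CNN feature extractor, and the choice of angles and pooling statistics is illustrative.

        # Sketch: rotation-invariant pooling of CNN global descriptors (assumed setup).
        import numpy as np
        from PIL import Image

        def invariant_descriptor(image, extract_descriptor, angles=(0, 90, 180, 270)):
            # Orbit of the image under a finite set of rotations.
            feats = np.stack([extract_descriptor(image.rotate(a, expand=True)) for a in angles])
            # Pooling statistics over the orbit give invariance to those rotations;
            # mean and max are two common, simple choices.
            pooled = np.concatenate([feats.mean(axis=0), feats.max(axis=0)])
            return pooled / (np.linalg.norm(pooled) + 1e-12)  # L2-normalise for retrieval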

    Feature Encoding Strategies for Multi-View Image Classification

    Machine vision systems can vary greatly in size and complexity depending on the task at hand; however, their purpose remains the same: inspection, quality, and reliability. This work sets out to bridge the gap between traditional machine vision and computer vision. By applying powerful computer vision techniques, we are able to achieve more robust solutions in manufacturing settings. This thesis presents a framework for applying image classification techniques originally developed for image retrieval within the Bag of Words (BoW) framework. In addition, an exhaustive evaluation of commonly used feature pooling approaches is conducted, with results showing that spatial augmentation can outperform mean and max descriptor pooling on an in-house dataset and the Caltech 3D dataset. The experiments detail a framework that performs classification using multiple viewpoints. The results show that the feature encoding method known as Triangulation Embedding outperforms the Vector of Locally Aggregated Descriptors (VLAD) and the standard BoW framework with an accuracy of 99.28%. This improvement is also seen on the public Caltech 3D dataset, where the improvement over VLAD and BoW was 5.64% and 12.23%, respectively. The proposed multiple-view classification system is also robust enough to handle the real-world problem of camera failure and still classify with high reliability. A missing camera input was simulated, showing that with the Triangulation Embedding method the system could still perform classification with a very minor reduction in accuracy, at 98.89%, compared to the BoW baseline at 96.60% using the same techniques. The presented solution tackles the traditional machine vision problem of object identification and also allows a machine vision system to be trained without any expert-level knowledge.
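
    As a point of reference for the encodings compared above, the sketch below shows a plain VLAD encoder (one of the baselines); it is a generic illustration, not the thesis implementation, and assumes local descriptors and a k-means codebook are already computed.

        # Sketch: VLAD encoding of local descriptors against a precomputed codebook.
        import numpy as np

        def vlad_encode(local_descs, centroids):
            k, d = centroids.shape
            # Hard-assign each local descriptor to its nearest visual word.
            assign = np.argmin(((local_descs[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
            vlad = np.zeros((k, d))
            for i in range(k):
                members = local_descs[assign == i]
                if len(members):
                    vlad[i] = (members - centroids[i]).sum(axis=0)  # aggregate residuals
            vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))            # power-law normalisation
            flat = vlad.ravel()
            return flat / (np.linalg.norm(flat) + 1e-12)            # global L2 normalisation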

    Automatic Synchronization of Multi-User Photo Galleries

    In this paper we address the issue of photo gallery synchronization, where pictures related to the same event are collected by different users. Existing solutions to the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often rely heavily on heuristics, limiting their applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task compared to the available literature. The method is characterized by three stages: first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; finally, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of the proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the synchronization quality. Comment: accepted to IEEE Transactions on Multimedia.
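
    A minimal sketch of the offset-propagation stage, under simplifying assumptions (median aggregation of pairwise time differences instead of the paper's probabilistic model; names such as pair_matches are illustrative):

        # Sketch: propagate per-gallery time corrections along a minimum spanning tree.
        import networkx as nx
        from statistics import median

        def synchronise(pair_matches):
            # pair_matches: {(gal_a, gal_b): [(t_a, t_b), ...]} timestamps (seconds)
            # of visually matched photo pairs across two galleries.
            g = nx.Graph()
            for (a, b), matches in pair_matches.items():
                diff = median(t_a - t_b for t_a, t_b in matches)  # estimate of corr_b - corr_a
                g.add_edge(a, b, diff=diff, src=a, weight=1.0 / len(matches))
            tree = nx.minimum_spanning_tree(g, weight="weight")   # keep the most reliable edges
            root = next(iter(tree.nodes))
            corr = {root: 0.0}                                    # root gallery is the reference clock
            for u, v in nx.bfs_edges(tree, root):
                d = tree[u][v]["diff"]
                corr[v] = corr[u] + (d if tree[u][v]["src"] == u else -d)
            return corr  # seconds to add to each gallery's timestamps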

    Structuring User-Generated Content on Social Media with Multimodal Aspect-Based Sentiment Analysis

    People post their opinions and experiences on social media, yielding rich databases of end users' sentiments. This paper shows to what extent machine learning can analyze and structure these databases. An automated data analysis pipeline is deployed to provide insights into user-generated content for researchers in other domains. First, the domain expert selects an image and a term of interest. Then, the pipeline uses image retrieval to find all images showing similar contents and applies aspect-based sentiment analysis to outline users' opinions about the selected term. As part of an interdisciplinary project between architecture and computer science researchers, an empirical study of Hamburg's Elbphilharmonie was conducted on 300 thousand posts from the platform Flickr with the hashtag 'hamburg'. Image retrieval methods generated a subset of slightly more than 1.5 thousand images displaying the Elbphilharmonie. We found that these posts mainly convey a neutral or positive sentiment towards it. With this pipeline, we suggest a new big data analysis method that offers new insights into end users' opinions, e.g., for architecture domain experts. Comment: 9 pages, 5 figures; short paper version to be published at the 9th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT 2022).
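
    A compact sketch of the two-stage pipeline described above, with embed_image and aspect_sentiment as hypothetical stand-ins for the image retrieval model and the aspect-based sentiment classifier, and the similarity threshold chosen arbitrarily:

        # Sketch: keep posts whose image matches the query, then aggregate term sentiment.
        import numpy as np

        def analyse(posts, query_image, term, embed_image, aspect_sentiment, sim_threshold=0.8):
            q = embed_image(query_image)
            q = q / np.linalg.norm(q)
            counts = {"negative": 0, "neutral": 0, "positive": 0}
            for post in posts:                               # post: {"image": ..., "text": ...}
                v = embed_image(post["image"])
                if float(q @ (v / np.linalg.norm(v))) < sim_threshold:
                    continue                                 # image does not show the queried content
                counts[aspect_sentiment(post["text"], term)] += 1
            return counts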

    Scalable Image Retrieval by Sparse Product Quantization

    Fast Approximate Nearest Neighbor (ANN) search for high-dimensional feature indexing and retrieval is the crux of large-scale image retrieval. A recent promising technique is Product Quantization, which attempts to index high-dimensional image features by decomposing the feature space into a Cartesian product of low-dimensional subspaces and quantizing each of them separately. Despite the promising results reported, this quantization approach follows the typical hard assignment of traditional quantization methods, which may result in large quantization errors and thus inferior search performance. Unlike the existing approaches, in this paper we propose a novel approach called Sparse Product Quantization (SPQ) to encode high-dimensional feature vectors into sparse representations. We optimize the sparse representations of the feature vectors by minimizing their quantization errors, making the resulting representation essentially close to the original data in practice. Experiments show that the proposed SPQ technique not only compresses data well but also serves as an effective encoding technique. We obtain state-of-the-art results for ANN search on four public image datasets, and the promising results on content-based image retrieval further validate the efficacy of our proposed method. Comment: 12 pages.
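
    For orientation, the sketch below shows plain product quantization, the hard-assignment baseline that SPQ relaxes; SPQ would instead represent each sub-vector as a sparse combination of codewords. The codebooks are assumed to be precomputed (e.g., by per-subspace k-means).

        # Sketch: product quantization encode/decode with per-subspace codebooks.
        import numpy as np

        def pq_encode(x, codebooks):
            # codebooks: (M, K, d/M) array of per-subspace centroids.
            m, k, sub = codebooks.shape
            parts = x.reshape(m, sub)
            # Hard assignment: index of the nearest centroid in each subspace.
            return np.array([np.argmin(((codebooks[i] - parts[i]) ** 2).sum(-1)) for i in range(m)])

        def pq_decode(codes, codebooks):
            # Reconstruct the vector from the selected centroids.
            return np.concatenate([codebooks[i, c] for i, c in enumerate(codes)])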

    IRIM at TRECVID 2012: Semantic Indexing and Instance Search

    The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2012 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline for computing scores for the likelihood of a video shot to contain a target concept. These scores are then used to produce a ranked list of images or shots that are the most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.2378, which ranked us 4th out of 16 participants. For the instance search task, our approach uses two steps. First, individual methods of the participants are used to compute the similarity between an example image of an instance and the keyframes of a video clip. Then, a two-step fusion method is used to combine these individual results and obtain a score for the likelihood of an instance appearing in a video clip. These scores are used to obtain a ranked list of the clips most likely to contain the queried instance. The best IRIM run has a MAP of 0.1192, which ranked us 29th out of 79 fully automatic runs.
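
    A simple illustration of the late-fusion step mentioned in the pipeline is sketched below; the min-max normalisation and uniform weights are illustrative assumptions, not the exact IRIM fusion scheme:

        # Sketch: weighted late fusion of per-shot scores from several descriptors.
        import numpy as np

        def fuse_scores(score_lists, weights=None):
            weights = weights or [1.0] * len(score_lists)
            fused = np.zeros(len(score_lists[0]), dtype=float)
            for s, w in zip(score_lists, weights):
                s = np.asarray(s, dtype=float)
                s = (s - s.min()) / (s.max() - s.min() + 1e-12)   # map each score list to [0, 1]
                fused += w * s
            return fused / sum(weights)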

    Robust Melanoma Screening: In Defense of Enhanced Mid-Level Descriptors (Triagem robusta de melanoma: em defesa dos descritores aprimorados de nível médio)

    Advisors: Eduardo Alves do Valle Junior, Sandra Eliza Fontes de Avila. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Melanoma is the type of skin cancer that most often leads to death, even though it is the most curable if detected early. Since the full-time presence of a dermatologist is not economically feasible for many small cities, and especially in underserved communities, computer-aided diagnosis for melanoma screening has been a topic of active research. Much of the existing art is based on the Bag-of-Visual-Words (BoVW) model, combining color and texture descriptors. However, the BoVW model has kept improving, and there are now several extensions that achieve better classification rates in general image classification tasks. These enhanced models had not yet been explored for melanoma screening, thus motivating our work. Here we present a new approach for melanoma screening based upon the state-of-the-art BossaNova descriptors, showing very promising results for screening and reaching an AUC of up to 93.7%. This work also proposes a new spatial pooling strategy specially designed for melanoma screening. Another contribution of this research is the unprecedented use of BossaNova in melanoma classification, which opens the opportunity to explore these enhanced mid-level descriptors in other medical contexts.
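
    For context, the sketch below shows the plain BoVW histogram that BossaNova extends with soft, distance-aware pooling; it is a generic illustration assuming a precomputed codebook, not the thesis implementation.

        # Sketch: Bag-of-Visual-Words histogram over a precomputed codebook.
        import numpy as np

        def bovw_histogram(local_descs, codebook):
            # Hard-assign each local descriptor to its nearest visual word.
            assign = np.argmin(((local_descs[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
            hist = np.bincount(assign, minlength=len(codebook)).astype(float)
            return hist / (hist.sum() + 1e-12)                # L1-normalised word histogram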