335 research outputs found

    Efficient On-the-fly Category Retrieval using ConvNets and GPUs

    We investigate the gains in precision and speed that can be obtained by using Convolutional Networks (ConvNets) for on-the-fly retrieval, where classifiers are learnt at run time for a textual query from downloaded images and used to rank large image or video datasets. We make three contributions: (i) we present an evaluation of state-of-the-art image representations for object category retrieval over standard benchmark datasets containing 1M+ images; (ii) we show that ConvNets can be used to obtain features that perform very well and yet are much lower dimensional than previous state-of-the-art image representations, and that their dimensionality can be reduced further without loss in performance by compression using product quantization or binarization. Consequently, features with state-of-the-art performance on large-scale datasets of millions of images can fit in the memory of even a commodity GPU card; (iii) we show that an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel with downloading the new training images, allowing for continuous refinement of the model as more images become available, and for simultaneous training and ranking. The outcome is an on-the-fly system that significantly outperforms its predecessors in terms of precision of retrieval, memory requirements, and speed, facilitating accurate on-the-fly learning and ranking in under a second on a single GPU. Comment: Published in proceedings of ACCV 201
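
    The binarization mentioned in contribution (ii) can be illustrated with a minimal sketch. It assumes the simplest possible scheme, thresholding each feature dimension at zero and comparing codes by Hamming distance; the abstract does not say which binarization the authors use, and the function names `binarize` and `hamming` are hypothetical.

```python
from typing import Sequence


def binarize(feature: Sequence[float]) -> int:
    """Pack a real-valued feature into an integer bit code.

    Assumption: one bit per dimension, set when the value is positive.
    """
    code = 0
    for value in feature:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which the two codes differ."""
    return bin(a ^ b).count("1")


query = binarize([0.5, -1.0, 2.0, 0.0])   # bits 1010
image = binarize([0.5, 1.0, -2.0, 0.1])   # bits 1101
print(hamming(query, image))              # prints 3
```

    Each dimension then costs one bit instead of a 32-bit float, which is the kind of saving that lets millions of image features fit in GPU memory.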

    PlaNet - Photo Geolocation with Convolutional Neural Networks

    Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
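
    Posing geolocation as classification requires mapping each geotagged photo to a cell label. The sketch below shows that mapping under a strong simplifying assumption: a uniform latitude/longitude grid, rather than the multi-scale cells the abstract describes. The function `cell_index` and the grid size (18 by 36) are hypothetical.

```python
def cell_index(lat: float, lon: float, rows: int = 18, cols: int = 36) -> int:
    """Map a coordinate to a class label on a uniform rows x cols grid.

    Assumption: a fixed grid stands in for the multi-scale cells of the
    paper; it only illustrates turning a location into a class label.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    # Clamp so that lat == 90 and lon == 180 fall into the last row/column.
    row = min(int((lat + 90.0) / 180.0 * rows), rows - 1)
    col = min(int((lon + 180.0) / 360.0 * cols), cols - 1)
    return row * cols + col


print(cell_index(0.0, 0.0))  # prints 342
```

    A classifier trained on such labels outputs a probability distribution over cells, which is what allows uncertainty about a photo's location to be expressed.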

    Solution Counting for CSP and SAT with Large Tree-Width

    The problem of counting the number of solutions of a Constraint Satisfaction Problem (CSP) is considered. To solve it, the Backtracking with Tree-Decomposition method, which uses an acyclic representation of the constraint graph, was adapted. An exact algorithm is proposed whose worst-case time complexity is exponential in the tree width, as well as an approximate (iterative) algorithm whose computational complexity is exponential in the maximum clique size.
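
    For orientation, here is a minimal sketch of solution counting by plain chronological backtracking. It is not the Backtracking with Tree-Decomposition method of the paper, which additionally exploits a tree decomposition of the constraint graph; the names `count_solutions` and `different` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A constraint is a scope (variable names) plus a predicate over their values.
Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]


def count_solutions(
    variables: Sequence[str],
    domains: Dict[str, Sequence[int]],
    constraints: List[Constraint],
) -> int:
    """Count all complete assignments that satisfy every constraint."""
    assignment: Dict[str, int] = {}

    def consistent() -> bool:
        # Check only constraints whose variables are all assigned so far.
        for scope, predicate in constraints:
            if all(v in assignment for v in scope):
                if not predicate(*(assignment[v] for v in scope)):
                    return False
        return True

    def backtrack(i: int) -> int:
        if i == len(variables):
            return 1  # one complete consistent assignment
        var = variables[i]
        total = 0
        for value in domains[var]:
            assignment[var] = value
            if consistent():
                total += backtrack(i + 1)
            del assignment[var]
        return total

    return backtrack(0)


def different(a: int, b: int) -> bool:
    return a != b


colors = [0, 1, 2]
triangle = count_solutions(
    ["x", "y", "z"],
    {"x": colors, "y": colors, "z": colors},
    [(("x", "y"), different), (("y", "z"), different), (("x", "z"), different)],
)
print(triangle)  # prints 6: the proper 3-colourings of a triangle
```

    Plain backtracking is exponential in the number of variables in the worst case, which is the cost that a bound exponential only in the tree width improves on.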

    A Dense-Depth Representation for VLAD descriptors in Content-Based Image Retrieval

    Recent advances in deep learning have improved performance in image retrieval tasks. The many convolutional layers of a Convolutional Neural Network (CNN) make it possible to obtain a hierarchy of features from the evaluated image. At every step, the extracted patches are smaller than at the previous levels and more representative. Following this idea, this paper introduces a new detector applied to the feature maps extracted from a pre-trained CNN. Specifically, this approach increases the number of features in order to improve the performance of aggregation algorithms such as the widely used VLAD embedding. The proposed approach is tested on several public datasets: Holidays, Oxford5k, Paris6k and UKB.
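
    The VLAD embedding that the abstract builds on can be sketched as follows. This is the standard aggregation step only, assuming a codebook of centroids is already available; it does not implement the paper's detector, and the function name `vlad` is hypothetical.

```python
import math
from typing import List, Sequence


def vlad(
    descriptors: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[float]:
    """Aggregate local descriptors into one L2-normalised VLAD vector."""
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean).
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual between descriptor and centroid.
        for t in range(d):
            residuals[nearest][t] += x[t] - centroids[nearest][t]
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


codebook = [[0.0, 0.0], [10.0, 10.0]]
local_features = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
embedding = vlad(local_features, codebook)
print(len(embedding))  # prints 4: two centroids times two dimensions
```

    The output length depends only on the codebook size and descriptor dimension, so extracting more local features changes the accumulated residuals without changing the size of the final vector.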

    Diagnosis of focal liver lesions from ultrasound using deep learning

    PURPOSE: The purpose of this study was to create an algorithm that simultaneously detects and characterizes (benign vs. malignant) focal liver lesions (FLL) using deep learning. MATERIALS AND METHODS: We trained our algorithm on a dataset proposed during a data challenge organized at the 2018 Journées Francophones de Radiologie. The dataset was composed of 367 two-dimensional ultrasound images from 367 individual livers, captured at various institutions. The algorithm was guided using an attention mechanism with annotations made by a radiologist. The algorithm was then tested on a new dataset from 177 patients. RESULTS: The models reached mean ROC-AUC scores of 0.935 for FLL detection and 0.916 for FLL characterization over three shuffled three-fold cross-validations performed with the training data. On the new dataset of 177 patients, our models reached a weighted mean ROC-AUC score of 0.891 for seven different tasks. CONCLUSION: This study uses a supervised-attention mechanism focused on FLL detection and characterization from liver ultrasound images. This method could prove to be highly relevant for medical imaging once validated on a larger independent cohort.
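
    The ROC-AUC scores reported above can be read as the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. The sketch below computes it that way on made-up toy scores; it is not the study's evaluation code, does not reproduce its weighted averaging over tasks, and the function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the fraction of correctly ordered positive/negative pairs.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (illustrative values only): 3 of 4 pairs are ordered correctly.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # prints 0.75
```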

    Re-ranking for Writer Identification and Writer Retrieval

    Automatic writer identification is a common problem in document analysis. State-of-the-art methods typically focus on the feature extraction step with traditional or deep-learning-based techniques. In retrieval problems, re-ranking is a commonly used technique to improve the results. Re-ranking refines an initial ranking result by using the knowledge contained in the ranked result, e.g., by exploiting nearest neighbor relations. To the best of our knowledge, re-ranking has not been used for writer identification/retrieval. A possible reason might be that publicly available benchmark datasets contain only a few samples per writer, which makes re-ranking less promising. We show that a re-ranking step based on k-reciprocal nearest neighbor relationships is advantageous for writer identification, even if only a few samples per writer are available. We use these reciprocal relationships in two ways: encode them into new vectors, as originally proposed, or integrate them in terms of query expansion. We show that both techniques outperform the baseline results in terms of mAP on three writer identification datasets.
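
    The k-reciprocal nearest neighbor relation can be sketched as follows: j is a k-reciprocal neighbor of i when each appears among the other's k nearest neighbors. The example assumes a precomputed distance matrix and covers only this relation, not the paper's vector encoding or query expansion; the function names are hypothetical.

```python
from typing import List, Sequence


def knn(dist: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k samples closest to sample i (excluding i itself)."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])[:k]


def k_reciprocal_neighbors(
    dist: Sequence[Sequence[float]], i: int, k: int
) -> List[int]:
    """Neighbors j of i such that i is also among the k nearest of j."""
    return [j for j in knn(dist, i, k) if i in knn(dist, j, k)]


# Toy data: four samples on a line; the last one is an outlier.
positions = [0.0, 1.0, 3.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]

print(k_reciprocal_neighbors(dist, 0, 2))  # prints [1, 2]
print(k_reciprocal_neighbors(dist, 3, 2))  # prints []
```

    The outlier has nearest neighbors but is nobody's nearest neighbor in return, so the reciprocal check filters out those one-sided matches.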

    Compact Deep Aggregation for Set Retrieval

    The objective of this work is to learn a compact embedding of a set of descriptors that is suitable for efficient retrieval and ranking, whilst maintaining discriminability of the individual descriptors. We focus on a specific example of this general problem: that of retrieving images containing multiple faces from a large-scale dataset of images. Here the set consists of the face descriptors in each image, and given a query for multiple identities, the goal is then to retrieve, in order, images which contain all the identities, all but one, etc. To this end, we make the following contributions: first, we propose a CNN architecture, SetNet, to achieve the objective: it learns face descriptors and their aggregation over a set to produce a compact fixed-length descriptor designed for set retrieval, and the score of an image is a count of the number of identities that match the query; second, we show that this compact descriptor has minimal loss of discriminability up to two faces per image, and degrades slowly after that, far exceeding a number of baselines; third, we explore the speed vs. retrieval quality trade-off for set retrieval using this compact descriptor; and, finally, we collect and annotate a large dataset of images containing varying numbers of celebrities, which we use for evaluation and which is publicly released. Comment: 20 page

    The ReproGenomics Viewer: an integrative cross-species toolbox for the reproductive science community.

    We report the development of the ReproGenomics Viewer (RGV), a multi- and cross-species working environment for the visualization, mining and comparison of published omics data sets for the reproductive science community. The system currently embeds 15 published data sets related to gametogenesis from nine model organisms. Data sets have been curated and conveniently organized into broad categories including biological topics, technologies, species and publications. RGV's modular design for both organisms and genomic tools enables users to upload and compare their data with that from the data sets embedded in the system in a cross-species manner. The RGV is freely available at http://rgv.genouest.org