Efficient On-the-fly Category Retrieval using ConvNets and GPUs
We investigate the gains in precision and speed that can be obtained by
using Convolutional Networks (ConvNets) for on-the-fly retrieval, where
classifiers are learnt at run time for a textual query from downloaded images
and used to rank large image or video datasets.
We make three contributions: (i) we present an evaluation of state-of-the-art
image representations for object category retrieval over standard benchmark
datasets containing 1M+ images; (ii) we show that ConvNets can be used to
obtain features which are incredibly performant, and yet much lower dimensional
than previous state-of-the-art image representations, and that their
dimensionality can be reduced further without loss in performance by
compression using product quantization or binarization. Consequently, features
with the state-of-the-art performance on large-scale datasets of millions of
images can fit in the memory of even a commodity GPU card; (iii) we show that
an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel
with downloading the new training images, allowing for a continuous refinement
of the model as more images become available, and simultaneous training and
ranking. The outcome is an on-the-fly system that significantly outperforms its
predecessors in terms of: precision of retrieval, memory requirements, and
speed, facilitating accurate on-the-fly learning and ranking in under a second
on a single GPU.
Comment: Published in proceedings of ACCV 201
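The product-quantization compression mentioned in contribution (ii) can be sketched as follows; the dimensionality, number of subspaces, and codebook size below are illustrative choices, not the paper's actual settings:

```python
import numpy as np

def pq_encode(x, codebooks):
    """Assign each subvector of x to its nearest centroid in that subspace.

    x: (d,) vector; codebooks: (m, k, d//m) per-subspace centroids.
    Returns m uint8 codes, i.e. the compressed representation.
    """
    m, k, ds = codebooks.shape
    subs = x.reshape(m, ds)
    return np.array([np.argmin(np.linalg.norm(cb - s, axis=1))
                     for cb, s in zip(codebooks, subs)], dtype=np.uint8)

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector from its PQ codes."""
    return np.concatenate([codebooks[i, c] for i, c in enumerate(codes)])

rng = np.random.default_rng(0)
d, m, k = 128, 8, 256                 # 128-D feature -> 8 bytes (one code per subspace)
codebooks = rng.normal(size=(m, k, d // m))
x = rng.normal(size=d)
codes = pq_encode(x, codebooks)
x_hat = pq_decode(codes, codebooks)   # lossy reconstruction
```

With 8 subspaces of 256 centroids each, a 128-D float vector shrinks to 8 bytes, which is how millions of compressed features can fit in the memory of a single GPU.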
PlaNet - Photo Geolocation with Convolutional Neural Networks
Is it possible to build a system to determine the location where a photo was
taken using just its pixels? In general, the problem seems exceptionally
difficult: it is trivial to construct situations where no location can be
inferred. Yet images often contain informative cues such as landmarks, weather
patterns, vegetation, road markings, and architectural details, which in
combination may allow one to determine an approximate location and occasionally
an exact location. Websites such as GeoGuessr and View from your Window suggest
that humans are relatively good at integrating these cues to geolocate images,
especially en masse. In computer vision, the photo geolocation problem is
usually approached using image retrieval methods. In contrast, we pose the
problem as one of classification by subdividing the surface of the earth into
thousands of multi-scale geographic cells, and train a deep network using
millions of geotagged images. While previous approaches only recognize
landmarks or perform approximate matching using global image descriptors, our
model is able to use and integrate multiple visible cues. We show that the
resulting model, called PlaNet, outperforms previous approaches and even
attains superhuman levels of accuracy in some cases. Moreover, we extend our
model to photo albums by combining it with a long short-term memory (LSTM)
architecture. By learning to exploit temporal coherence to geolocate uncertain
photos, we demonstrate that this model achieves a 50% performance improvement
over the single-image model.
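The cell-based classification idea can be illustrated with a toy partition; PlaNet actually uses adaptive multi-scale S2 cells whose size depends on photo density, so the fixed-degree grid and the `latlon_to_cell` helper below are simplifications for illustration only:

```python
def latlon_to_cell(lat, lon, deg=10.0):
    """Map a coordinate to a coarse fixed-degree grid cell id.

    This uniform grid is only a stand-in for the idea of turning
    geolocation into classification over cell ids.
    """
    cols = int(360 // deg)
    row = int((min(lat, 89.999) + 90) // deg)   # clamp the pole into the last row
    col = int((min(lon, 179.999) + 180) // deg)
    return row * cols + col
```

Training then amounts to minimizing cross-entropy over these cell ids, and prediction returns (for instance) the centre of the most probable cell.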
Solution Counting for CSP and SAT with Large Tree-Width
The problem of counting the number of solutions of a Constraint Satisfaction Problem (CSP) is considered. To solve it, the Backtracking with Tree-Decomposition method was adapted. An exact algorithm is proposed whose worst-case complexity is exponential in the tree-width, together with an approximate algorithm whose complexity is exponential in the maximum clique size.
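The counting problem itself can be illustrated with a naive backtracking counter; the paper's contribution is to exploit a tree decomposition so that the complexity becomes exponential only in the tree-width, which this brute-force sketch deliberately does not implement:

```python
def count_solutions(domains, constraints):
    """Count assignments satisfying all binary constraints by plain backtracking.

    domains: list of iterables of values, one per variable.
    constraints: dict mapping (i, j) with i < j to a predicate ok(v_i, v_j).
    Worst case is exponential in the number of variables; the paper's
    tree-decomposition method reduces this to exponential in the tree-width.
    """
    n = len(domains)

    def rec(i, assign):
        if i == n:
            return 1
        total = 0
        for v in domains[i]:
            # check every constraint whose later endpoint is the current variable
            if all(ok(assign[j], v)
                   for (j, k), ok in constraints.items() if k == i):
                total += rec(i + 1, assign + [v])
        return total

    return rec(0, [])

# 3-colourings of a triangle: every pair of vertices must differ -> 3! = 6
neq = lambda a, b: a != b
triangle = {(0, 1): neq, (0, 2): neq, (1, 2): neq}
count = count_solutions([range(3)] * 3, triangle)
```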
A Dense-Depth Representation for VLAD descriptors in Content-Based Image Retrieval
Recent advances in deep learning have improved performance on image
retrieval tasks. Through the many convolutional layers of a Convolutional
Neural Network (CNN), it is possible to obtain a hierarchy of features from
the evaluated image. At every level, the extracted patches are smaller than
those of the previous levels and more representative. Following this idea,
this paper introduces a new detector applied to the feature maps extracted
from a pre-trained CNN. Specifically, the approach increases the number of
features in order to improve the performance of aggregation algorithms such
as the widely used VLAD embedding. The proposed approach is tested on
several public datasets: Holidays, Oxford5k, Paris6k and UKB.
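The VLAD embedding referenced above aggregates local descriptors by summing their residuals to the nearest visual-word centroids; a minimal sketch (without the paper's proposed detector) might look like:

```python
import numpy as np

def vlad(descriptors, centroids):
    """VLAD: sum, per visual word, the residuals of the descriptors assigned
    to it, then L2-normalise the concatenated result."""
    k, d = centroids.shape
    # hard-assign each descriptor to its nearest centroid
    assign = np.argmin(((descriptors[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    v = np.zeros((k, d))
    for x, a in zip(descriptors, assign):
        v[a] += x - centroids[a]          # accumulate residuals
    v = v.reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

centroids = np.array([[0.0, 0.0], [10.0, 10.0]])   # a 2-word toy vocabulary
descriptors = np.array([[1.0, 0.0], [9.0, 10.0]])  # one descriptor per word
emb = vlad(descriptors, centroids)                 # 4-D embedding (k * d)
```

Increasing the number of extracted features, as the paper proposes, gives this aggregation more residuals to sum, which is where the claimed performance gain comes from.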
Diagnosis of focal liver lesions from ultrasound using deep learning
PURPOSE: The purpose of this study was to create an algorithm that simultaneously detects and characterizes (benign vs. malignant) focal liver lesion (FLL) using deep learning.
MATERIALS AND METHODS: We trained our algorithm on a dataset proposed during a data challenge organized at the 2018 Journées Francophones de Radiologie. The dataset was composed of 367 two-dimensional ultrasound images from 367 individual livers, captured at various institutions. The algorithm was guided using an attention mechanism with annotations made by a radiologist. The algorithm was then tested on a new data set from 177 patients.
RESULTS: The models reached mean ROC-AUC scores of 0.935 for FLL detection and 0.916 for FLL characterization over three shuffled three-fold cross-validations performed with the training data. On the new dataset of 177 patients, our models reached a weighted mean ROC-AUC score of 0.891 over seven different tasks.
CONCLUSION: This study used a supervised attention mechanism for FLL detection and characterization from liver ultrasound images. The method could prove highly relevant for medical imaging once validated on a larger independent cohort.
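The ROC-AUC metric reported above can be computed via its rank formulation: the probability that a randomly chosen positive scores above a randomly chosen negative. A minimal sketch:

```python
def roc_auc(labels, scores):
    """ROC-AUC via its rank formulation: the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A score of 0.5 corresponds to random guessing and 1.0 to a perfect separation of benign and malignant lesions.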
Re-ranking for Writer Identification and Writer Retrieval
Automatic writer identification is a common problem in document analysis.
State-of-the-art methods typically focus on the feature extraction step with
traditional or deep-learning-based techniques. In retrieval problems,
re-ranking is a commonly used technique to improve the results. Re-ranking
refines an initial ranking result by using the knowledge contained in the
ranked result, e.g., by exploiting nearest neighbor relations. To the best of
our knowledge, re-ranking has not been used for writer
identification/retrieval. A possible reason might be that publicly available
benchmark datasets contain only few samples per writer which makes a re-ranking
less promising. We show that a re-ranking step based on k-reciprocal nearest
neighbor relationships is advantageous for writer identification, even if only
a few samples per writer are available. We use these reciprocal relationships
in two ways: encode them into new vectors, as originally proposed, or integrate
them in terms of query-expansion. We show that both techniques outperform the
baseline results in terms of mAP on three writer identification datasets.
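The k-reciprocal relationship at the core of this re-ranking step can be sketched as follows; the toy distance matrix and value of `k` are illustrative, and a full pipeline would additionally encode these sets into new vectors or use them for query expansion, as the abstract describes:

```python
import numpy as np

def k_reciprocal(dist, k):
    """For each item i, keep neighbour j only if i is in turn among j's
    k nearest neighbours -- the k-reciprocal relation the re-ranking exploits."""
    knn = np.argsort(dist, axis=1)[:, :k]     # each row includes the item itself
    return [{int(j) for j in knn[i] if i in knn[j]}
            for i in range(dist.shape[0])]

pts = np.array([0.0, 1.0, 10.0])              # two close samples and an outlier
dist = np.abs(pts[:, None] - pts[None, :])
recip = k_reciprocal(dist, k=2)
```

Reciprocity filters out one-sided matches: the outlier lists a close sample among its neighbours, but not vice versa, so the pair is dropped.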
Compact Deep Aggregation for Set Retrieval
The objective of this work is to learn a compact embedding of a set of
descriptors that is suitable for efficient retrieval and ranking, whilst
maintaining discriminability of the individual descriptors. We focus on a
specific example of this general problem -- that of retrieving images
containing multiple faces from a large scale dataset of images. Here the set
consists of the face descriptors in each image, and given a query for multiple
identities, the goal is then to retrieve, in order, images which contain all
the identities, all but one, etc.
To this end, we make the following contributions: first, we propose a CNN
architecture -- SetNet -- to achieve the objective: it learns face
descriptors and their aggregation over a set to produce a compact fixed length
descriptor designed for set retrieval, and the score of an image is a count of
the number of identities that match the query; second, we show that this
compact descriptor has minimal loss of discriminability up to two faces per
image, and degrades slowly after that -- far exceeding a number of baselines;
third, we explore the speed vs. retrieval quality trade-off for set retrieval
using this compact descriptor; and, finally, we collect and annotate a large
dataset of images containing varying numbers of celebrities, which we use for
evaluation and which is publicly released.
Comment: 20 pages
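The set-retrieval scoring described above (ranking images by how many queried identities they contain) can be sketched with cosine matching against per-face descriptors; the matching threshold below is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def set_score(query_descs, face_descs, thresh=0.7):
    """Count how many queried identities find a matching face in the image
    (cosine similarity above `thresh`).  Ranking images by this count puts
    those containing all queried identities first, then all but one, etc.
    The 0.7 threshold is an illustrative assumption."""
    q = query_descs / np.linalg.norm(query_descs, axis=1, keepdims=True)
    f = face_descs / np.linalg.norm(face_descs, axis=1, keepdims=True)
    sims = q @ f.T                            # (n_queries, n_faces)
    return int((sims.max(axis=1) >= thresh).sum())

queries = np.eye(3)[:2]                       # two query identity descriptors
image_a = np.eye(3)                           # image containing both (plus one more)
image_b = np.eye(3)[2:]                       # image containing neither
```

In the paper this count is produced from SetNet's single compact per-image descriptor rather than from the individual face descriptors used here.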
The ReproGenomics Viewer: an integrative cross-species toolbox for the reproductive science community.
We report the development of the ReproGenomics Viewer (RGV), a multi- and cross-species working environment for the visualization, mining and comparison of published omics data sets for the reproductive science community. The system currently embeds 15 published data sets related to gametogenesis from nine model organisms. Data sets have been curated and conveniently organized into broad categories including biological topics, technologies, species and publications. RGV's modular design for both organisms and genomic tools enables users to upload and compare their data with that from the data sets embedded in the system in a cross-species manner. The RGV is freely available at http://rgv.genouest.org