Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting
For person re-identification, existing deep networks often focus on
representation learning. However, without transfer learning, the learned model
is fixed as trained and cannot adapt to various unseen scenarios.
In this paper, beyond representation learning, we consider how to formulate
person image matching directly in deep feature maps. We treat image matching as
finding local correspondences in feature maps, and construct query-adaptive
convolution kernels on the fly to achieve local matching. In this way, the
matching process and results are interpretable, and this explicit matching is
more generalizable than representation features to unseen scenarios, such as
unknown misalignments, pose or viewpoint changes. To facilitate end-to-end
training of this architecture, we further build a class memory module to cache
feature maps of the most recent samples of each class, so as to compute image
matching losses for metric learning. Through direct cross-dataset evaluation,
the proposed Query-Adaptive Convolution (QAConv) method gains large
improvements over popular learning methods (about 10%+ mAP), and achieves
comparable results to many transfer learning methods. In addition, a model-free
temporal co-occurrence based score weighting method called TLift is proposed,
which further improves performance, achieving state-of-the-art results in
cross-dataset person re-identification. Code is available at
https://github.com/ShengcaiLiao/QAConv.
Comment: This is the ECCV 2020 version, including the appendix.
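The core idea, constructing convolution kernels from the query feature map on the fly and matching them against the gallery feature map, can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the authors' implementation; the feature-map shape, the cosine similarity, and the max-then-mean pooling over locations are assumptions made for brevity.

```python
import numpy as np

def qaconv_score(query_fm, gallery_fm):
    """Toy query-adaptive matching score between two feature maps.

    query_fm, gallery_fm: arrays of shape (C, H, W) holding deep features.
    Each query location is treated as a 1x1 convolution kernel applied to
    the gallery feature map; the best local response per query location is
    accumulated into a single similarity score.
    """
    C, H, W = query_fm.shape
    q = query_fm.reshape(C, -1)                      # (C, H*W) query kernels
    g = gallery_fm.reshape(C, -1)                    # (C, H*W) gallery locations
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    g = g / (np.linalg.norm(g, axis=0, keepdims=True) + 1e-8)
    corr = q.T @ g                                   # (H*W, H*W) local correspondences
    return corr.max(axis=1).mean()                   # best match per query location

# Example with random features standing in for CNN activations.
rng = np.random.default_rng(0)
score = qaconv_score(rng.normal(size=(64, 24, 8)), rng.normal(size=(64, 24, 8)))
print(score)
```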
RPNet: an End-to-End Network for Relative Camera Pose Estimation
This paper addresses the task of relative camera pose estimation from raw
image pixels, by means of deep neural networks. The proposed RPNet network
takes pairs of images as input and directly infers the relative poses, without
the need for camera intrinsics or extrinsics. While state-of-the-art systems based
on SIFT + RANSAC are able to recover the translation vector only up to scale,
RPNet is trained to produce the full translation vector in an end-to-end way.
Experimental results on the Cambridge Landmark dataset show very promising
results regarding the recovery of the full translation vector. They also show
that RPNet produces more accurate and more stable results than traditional
approaches, especially for hard images (repetitive textures, textureless
images, etc). To the best of our knowledge, RPNet is the first attempt to
recover full translation vectors in relative pose estimation.
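As a rough illustration of this kind of end-to-end relative pose regression, a siamese network with a shared encoder and a small regression head can be sketched as follows. This is a hypothetical toy model, not the RPNet architecture: the ResNet-18 backbone, the feature concatenation, and the 7-dimensional translation-plus-quaternion output are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class RelPoseNet(nn.Module):
    """Toy siamese network regressing relative pose from an image pair."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()              # keep the 512-d pooled features
        self.backbone = backbone
        self.head = nn.Linear(2 * 512, 7)        # 3-d translation + 4-d quaternion

    def forward(self, img_a, img_b):
        feats = torch.cat([self.backbone(img_a), self.backbone(img_b)], dim=1)
        pose = self.head(feats)
        t, q = pose[:, :3], pose[:, 3:]
        q = q / (q.norm(dim=1, keepdim=True) + 1e-8)  # normalize rotation quaternion
        return t, q

# Example forward pass with random images.
model = RelPoseNet()
t, q = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(t.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```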
A Compact Representation of Histopathology Images using Digital Stain Separation & Frequency-Based Encoded Local Projections
In recent years, histopathology images have been increasingly used as a
diagnostic tool in the medical field. The process of accurately diagnosing a
biopsy sample requires significant expertise in the field, and as such can be
time-consuming and is prone to uncertainty and error. With the advent of
digital pathology, using image recognition systems to highlight problem areas
or locate similar images can aid pathologists in making quick and accurate
diagnoses. In this paper, we specifically consider the encoded local
projections (ELP) algorithm, which has previously shown some success as a tool
for classification and recognition of histopathology images. We build on the
success of the ELP algorithm as a means for image classification and
recognition by proposing a modified algorithm which captures the local
frequency information of the image. The proposed algorithm estimates local
frequencies by quantifying the changes in multiple projections in local windows
of greyscale images. By doing so we remove the need to store the full
projections, thus significantly reducing the histogram size, and decreasing
computation time for image retrieval and classification tasks. Furthermore, we
investigate the effectiveness of applying our method to histopathology images
which have been digitally separated into their hematoxylin and eosin stain
components. The proposed algorithm is tested on the publicly available invasive
ductal carcinoma (IDC) data set. The histograms are used to train an SVM to
classify the data. The experiments showed that the proposed method outperforms
the original ELP algorithm in image retrieval tasks. On classification tasks,
the results are found to be comparable to state-of-the-art deep learning
methods and better than many handcrafted features from the literature.
Comment: Accepted for publication in the International Conference on Image
Analysis and Recognition (ICIAR 2019).
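The frequency-estimation step described above, quantifying changes in projections within local windows of a greyscale image, can be sketched roughly as follows. The window size, the two projection directions, and the use of zero crossings as a frequency proxy are assumptions made for illustration; the actual descriptor in the paper differs in detail.

```python
import numpy as np

def local_frequency_histogram(img, win=16, n_bins=8):
    """Toy frequency-style descriptor for a greyscale image.

    For each non-overlapping window, the patch is projected along the
    horizontal and vertical directions, and the number of zero crossings
    in the differenced projections serves as a crude local-frequency
    estimate. The estimates are pooled into a single global histogram.
    """
    h, w = img.shape
    freqs = []
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            patch = img[y:y + win, x:x + win].astype(float)
            for proj in (patch.sum(axis=0), patch.sum(axis=1)):
                d = np.diff(proj - proj.mean())
                crossings = np.count_nonzero(np.diff(np.sign(d)) != 0)
                freqs.append(crossings)
    hist, _ = np.histogram(freqs, bins=n_bins, range=(0, win))
    return hist / max(hist.sum(), 1)

# Example on a synthetic image.
rng = np.random.default_rng(0)
print(local_frequency_histogram(rng.integers(0, 255, size=(128, 128))))
```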
Inverse modelling of Köhler theory – Part 1: A response surface analysis of CCN spectra with respect to surface-active organic species
In this study, a novel framework for inverse modelling of cloud condensation nuclei (CCN) spectra is developed using Köhler theory. The framework is established by using model-generated synthetic measurements as calibration data for a parametric sensitivity analysis. Assessment of the relative importance of aerosol physicochemical parameters, while accounting for bulk–surface partitioning of surface-active organic species, is carried out over a range of atmospherically relevant supersaturations. By introducing an objective function that provides a scalar metric for diagnosing the deviation of modelled CCN concentrations from synthetic observations, objective function response surfaces are presented as a function of model input parameters. Crucially, for the chosen calibration data, aerosol–CCN spectrum closure is confirmed as a well-posed inverse modelling exercise for a subset of the parameters explored herein. The response surface analysis indicates that the appointment of appropriate calibration data is particularly important. To perform an inverse aerosol–CCN closure analysis and constrain parametric uncertainties, it is shown that a high-resolution CCN spectrum definition of the calibration data is required where single-valued definitions may be expected to fail. Using Köhler theory to model CCN concentrations requires knowledge of many physicochemical parameters, some of which are difficult to measure in situ on the scale of interest and introduce a considerable amount of parametric uncertainty to model predictions. For all partitioning schemes and environments modelled, model output showed significant sensitivity to perturbations in aerosol log-normal parameters describing the accumulation mode, surface tension, organic : inorganic mass ratio, insoluble fraction, and solution ideality. Many response surfaces pertaining to these parameters contain well-defined minima and are therefore good candidates for calibration using a Markov chain Monte Carlo (MCMC) approach to constraining parametric uncertainties.
A complete treatment of bulk–surface partitioning is shown to predict CCN spectra similar to those calculated using classical Köhler theory with the surface tension of a pure water drop, as found in previous studies. In addition, model sensitivity to perturbations in the partitioning parameters was found to be negligible. As a result, this study supports previously held recommendations that complex surfactant effects might be neglected, and the continued use of classical Köhler theory in global climate models (GCMs) is recommended to avoid an additional computational burden. The framework developed is suitable for application to many additional composition-dependent processes that might impact CCN activation potential. However, the focus of this study is to demonstrate the efficacy of the applied sensitivity analysis in identifying important parameters in those processes; the framework will be extended to facilitate a global sensitivity analysis and inverse aerosol–CCN closure analysis.
This work was supported by the UK Natural Environment Research Council grants NE/I020148/1 (Aerosol–Cloud Interactions – A Directed Programme to Reduce Uncertainty in Forcing) and NE/J024252/1 (Global Aerosol Synthesis and Science Project). P. Stier would like to acknowledge funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013), ERC project ACCLAIM (grant agreement no. FP7-280025).
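A minimal sketch of the kind of objective function described above, a scalar misfit between modelled and synthetic CCN concentrations over a supersaturation spectrum, is given below. The relative normalization and the placeholder forward model are assumptions for illustration; they stand in for the full Köhler-theory calculation.

```python
import numpy as np

def ccn_objective(params, supersaturations, observed_ccn, forward_model):
    """Scalar misfit between modelled and observed CCN spectra.

    params:            candidate physicochemical parameter vector
    supersaturations:  array of supersaturation values defining the spectrum
    observed_ccn:      synthetic 'measured' CCN concentrations at those values
    forward_model:     callable (params, supersaturations) -> modelled CCN
    """
    modelled = forward_model(params, supersaturations)
    residual = (modelled - observed_ccn) / np.maximum(observed_ccn, 1e-12)
    return float(np.sum(residual ** 2))

# Example with a placeholder forward model (not Köhler theory).
toy_model = lambda p, s: p[0] * s ** p[1]
s = np.linspace(0.1, 1.0, 10)
obs = toy_model([800.0, 0.5], s)
print(ccn_objective([750.0, 0.55], s, obs, toy_model))
```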
On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities
Current object recognition methods fail on object sets that include diffuse,
reflective, and transparent materials, even though such objects are very common
in domestic scenarios. We show that a combination of cues from multiple sensor
modalities, including specular reflectance and missing depth information,
allows us to capture a larger subset of household objects by extending a
state-of-the-art object recognition method. This leads to a significant increase
in robustness of recognition over a larger set of commonly used objects.
Comment: 12 pages.
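One simple way to combine evidence from several sensor modalities is a weighted late fusion of per-class scores, sketched below. The modalities, weights, and normalization are purely illustrative assumptions, not the fusion scheme used in the paper.

```python
import numpy as np

def fuse_modalities(scores_by_modality, weights):
    """Late fusion of per-class scores from several sensor modalities.

    scores_by_modality: dict mapping modality name -> array of per-class scores
    weights:            dict mapping modality name -> fusion weight
    Returns the index of the winning class under the weighted combination.
    """
    fused = None
    for name, scores in scores_by_modality.items():
        s = np.asarray(scores, dtype=float)
        s = s / max(s.sum(), 1e-12)               # normalize to a distribution
        fused = weights[name] * s if fused is None else fused + weights[name] * s
    return int(np.argmax(fused))

# Example: RGB appearance, a specular-reflectance cue, and a missing-depth cue.
scores = {"rgb": [0.2, 0.5, 0.3], "specular": [0.1, 0.2, 0.7], "depth_gap": [0.3, 0.3, 0.4]}
weights = {"rgb": 0.5, "specular": 0.3, "depth_gap": 0.2}
print(fuse_modalities(scores, weights))  # -> 2 with these weights
```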
Feature-Guided Black-Box Safety Testing of Deep Neural Networks
Despite the improved accuracy of deep neural networks, the discovery of
adversarial examples has raised serious safety concerns. Most existing
approaches for crafting adversarial examples necessitate some knowledge
(architecture, parameters, etc.) of the network at hand. In this paper, we
focus on image classifiers and propose a feature-guided black-box approach to
test the safety of deep neural networks that requires no such knowledge. Our
algorithm employs object detection techniques such as SIFT (Scale Invariant
Feature Transform) to extract features from an image. These features are
converted into a mutable saliency distribution, where high probability is
assigned to pixels that affect the composition of the image with respect to the
human visual system. We formulate the crafting of adversarial examples as a
two-player turn-based stochastic game, where the first player's objective is to
minimise the distance to an adversarial example by manipulating the features,
and the second player can be cooperative, adversarial, or random. We show that,
theoretically, the two-player game can converge to the optimal strategy, and
that the optimal strategy represents a globally minimal adversarial image. For
Lipschitz networks, we also identify conditions that provide safety guarantees
that no adversarial examples exist. Using Monte Carlo tree search we gradually
explore the game state space to search for adversarial examples. Our
experiments show that, despite the black-box setting, manipulations guided by a
perception-based saliency distribution are competitive with state-of-the-art
methods that rely on white-box saliency matrices or sophisticated optimization
procedures. Finally, we show how our method can be used to evaluate robustness
of neural networks in safety-critical applications such as traffic sign
recognition in self-driving cars.
Comment: 35 pages, 5 tables, 23 figures.
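The first step, converting SIFT features into a saliency distribution over pixels, can be sketched with OpenCV as below. Weighting saliency by keypoint response and spreading it uniformly over each keypoint's support region are assumptions made for illustration, not the paper's exact construction.

```python
import cv2
import numpy as np

def sift_saliency_distribution(gray_image):
    """Build a pixel saliency distribution from SIFT keypoints.

    Pixels near strong keypoints receive high probability mass, so that
    adversarial pixel manipulations concentrate on feature-rich regions.
    """
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray_image, None)
    h, w = gray_image.shape
    saliency = np.full((h, w), 1e-6)               # small uniform floor everywhere
    for kp in keypoints:
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        r = max(int(round(kp.size / 2)), 1)
        y0, y1 = max(y - r, 0), min(y + r + 1, h)
        x0, x1 = max(x - r, 0), min(x + r + 1, w)
        saliency[y0:y1, x0:x1] += kp.response      # stronger features get more mass
    return saliency / saliency.sum()               # normalize to a distribution

# Example usage (assumes 'image.png' exists and is readable as greyscale).
# img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
# dist = sift_saliency_distribution(img)
```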
Product recognition in store shelves as a sub-graph isomorphism problem
The arrangement of products in store shelves is carefully planned to maximize
sales and keep customers happy. However, verifying compliance of real shelves
to the ideal layout is a costly task routinely performed by the store
personnel. In this paper, we propose a computer vision pipeline to recognize
products on shelves and verify compliance to the planned layout. We deploy
local invariant features together with a novel formulation of the product
recognition problem as a sub-graph isomorphism between the items appearing in
the given image and the ideal layout. This allows for auto-localizing the given
image within the aisle or store and dramatically improving recognition.
Comment: Slightly extended version of the paper accepted at ICIAP 2017. More
information at the project page:
http://vision.disi.unibo.it/index.php?option=com_content&view=article&id=111&catid=7
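The recognition-as-sub-graph-isomorphism formulation can be sketched with networkx. The graph encoding below, nodes labelled by product identity and edges linking spatially adjacent items, is a simplified assumption about the problem structure rather than the authors' exact formulation.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Planned shelf layout: nodes are product slots, edges link adjacent slots.
layout = nx.Graph()
layout.add_nodes_from([(0, {"product": "cereal"}), (1, {"product": "coffee"}),
                       (2, {"product": "tea"}), (3, {"product": "sugar"})])
layout.add_edges_from([(0, 1), (1, 2), (2, 3)])

# Items detected in the shelf image (one slot may be missing or misdetected).
observed = nx.Graph()
observed.add_nodes_from([(10, {"product": "cereal"}), (11, {"product": "coffee"}),
                         (12, {"product": "tea"})])
observed.add_edges_from([(10, 11), (11, 12)])

# Check whether the observed items embed into the planned layout with matching labels.
node_match = isomorphism.categorical_node_match("product", None)
matcher = isomorphism.GraphMatcher(layout, observed, node_match=node_match)
print(matcher.subgraph_is_isomorphic())            # True: observed items fit the plan
# matcher.mapping then localizes each detected item to its planned slot.
```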
The Conditional Lucas & Kanade Algorithm
The Lucas & Kanade (LK) algorithm is the method of choice for efficient dense
image and object alignment. The approach is efficient as it attempts to model
the connection between appearance and geometric displacement through a linear
relationship that assumes independence across pixel coordinates. A drawback of
the approach, however, is its generative nature. Specifically, its performance
is tightly coupled with how well the linear model can synthesize appearance
from geometric displacement, even though the alignment task itself is
associated with the inverse problem. In this paper, we present a new approach,
referred to as the Conditional LK algorithm, which: (i) directly learns linear
models that predict geometric displacement as a function of appearance, and
(ii) employs a novel strategy for ensuring that the generative pixel
independence assumption can still be taken advantage of. We demonstrate that
our approach exhibits superior performance to classical generative forms of the
LK algorithm. Furthermore, we demonstrate its comparable performance to
state-of-the-art methods such as the Supervised Descent Method with
substantially fewer training examples, as well as the unique ability to "swap"
geometric warp functions without having to retrain from scratch. Finally, from
a theoretical perspective, our approach hints at possible redundancies that
exist in current state-of-the-art methods for alignment that could be leveraged
in vision systems of the future.
Comment: 17 pages, 11 figures.
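The "learn the inverse map" idea can be sketched as follows: synthetic warp perturbations are generated around the ground truth, and a linear regressor from appearance to parameter updates is solved by least squares. The sampling scheme, perturbation scale, and single-image training loop are assumptions; notably, the Conditional LK algorithm also preserves the generative pixel-independence structure, which this sketch does not.

```python
import numpy as np

def learn_conditional_regressor(sample_appearance, true_params,
                                n_perturb=200, sigma=2.0, seed=0):
    """Learn a linear map from appearance to warp-parameter updates.

    sample_appearance: callable(params) -> 1-D appearance vector extracted
                       under the warp defined by 'params'
    true_params:       ground-truth warp parameters for one training image
    Returns (R, b) such that delta_params ~ R @ appearance + b.
    """
    rng = np.random.default_rng(seed)
    A, D = [], []
    for _ in range(n_perturb):
        delta = rng.normal(scale=sigma, size=true_params.shape)
        A.append(sample_appearance(true_params + delta))  # appearance at perturbed warp
        D.append(-delta)                                  # update that undoes the perturbation
    A = np.column_stack([np.asarray(A), np.ones(n_perturb)])   # append bias column
    D = np.asarray(D)
    W, *_ = np.linalg.lstsq(A, D, rcond=None)             # least-squares regression
    return W[:-1].T, W[-1]                                 # R and bias b

# Example with a toy 'appearance' that is just the warp parameters plus noise.
rng = np.random.default_rng(1)
R, b = learn_conditional_regressor(
    lambda p: p + rng.normal(scale=0.1, size=p.shape), np.zeros(4))
print(R.shape, b.shape)  # (4, 4) (4,)
```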
A CNN cascade for landmark guided semantic part segmentation
This paper proposes a CNN cascade for semantic part segmentation guided by pose-specific information encoded in terms of a set of landmarks (or keypoints). There is a large amount of prior work on each of these tasks separately; yet, to the best of our knowledge, this is the first time in the literature that the interplay between pose estimation and semantic part segmentation is investigated. To address this limitation of prior work, in this paper we propose a CNN cascade of tasks that first performs landmark localisation and then uses this information as input for guiding semantic part segmentation. We applied our architecture to the problem of facial part segmentation and report a large performance improvement over the standard unguided network on the most challenging face datasets. Testing code and models will be published online at http://cs.nott.ac.uk/~psxasj/
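The cascade wiring, landmark heatmaps predicted first and then fed alongside the image into the segmentation stage, can be sketched with a toy PyTorch model. The layer choices and channel counts below are assumptions for illustration and do not reflect the networks actually used in the paper.

```python
import torch
import torch.nn as nn

class LandmarkGuidedSegCascade(nn.Module):
    """Toy two-stage cascade: landmark heatmaps first, then part segmentation."""
    def __init__(self, n_landmarks=68, n_parts=7):
        super().__init__()
        # Stage 1: predict one heatmap per landmark from the RGB image.
        self.landmark_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_landmarks, 1))
        # Stage 2: segment parts from the image concatenated with the heatmaps.
        self.seg_net = nn.Sequential(
            nn.Conv2d(3 + n_landmarks, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_parts, 1))

    def forward(self, img):
        heatmaps = self.landmark_net(img)                  # landmark localisation
        guided = torch.cat([img, heatmaps], dim=1)         # landmark-guided input
        return heatmaps, self.seg_net(guided)              # per-pixel part logits

# Example forward pass.
model = LandmarkGuidedSegCascade()
hm, seg = model(torch.randn(1, 3, 128, 128))
print(hm.shape, seg.shape)  # torch.Size([1, 68, 128, 128]) torch.Size([1, 7, 128, 128])
```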
Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
Image-based camera relocalization is an important problem in computer vision
and robotics. Recent works utilize convolutional neural networks (CNNs) to
regress, for pixels in a query image, their corresponding 3D world coordinates in
the scene. The final pose is then solved via a RANSAC-based optimization scheme
using the predicted coordinates. Usually, the CNN is trained with ground truth
scene coordinates, but it has also been shown that the network can discover 3D
scene geometry automatically by minimizing single-view reprojection loss.
However, due to the deficiencies of the reprojection loss, the network needs to
be carefully initialized. In this paper, we present a new angle-based
reprojection loss, which resolves the issues of the original reprojection loss.
With this new loss function, the network can be trained without careful
initialization, and the system achieves more accurate results. The new loss
also enables us to utilize available multi-view constraints, which further
improve performance.
Comment: ECCV 2018 Workshop (Geometry Meets Deep Learning).
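The abstract does not give the loss formula, so the sketch below shows one plausible angle-based formulation: penalizing the angle between each pixel's viewing ray and the direction towards its predicted scene coordinate, which stays bounded even for points predicted behind the camera. The cosine form, the pose convention, and the tensor shapes are assumptions, not the paper's exact definition.

```python
import torch

def angle_based_reprojection_loss(pred_scene_coords, pixel_rays, cam_pose_R, cam_pose_t):
    """One plausible angle-based reprojection loss (illustrative only).

    pred_scene_coords: (N, 3) predicted 3D world coordinates per pixel
    pixel_rays:        (N, 3) unit viewing rays through each pixel, in camera frame
    cam_pose_R:        (3, 3) world-to-camera rotation
    cam_pose_t:        (3,)   world-to-camera translation
    """
    cam_coords = pred_scene_coords @ cam_pose_R.T + cam_pose_t   # world -> camera frame
    dirs = cam_coords / (cam_coords.norm(dim=1, keepdim=True) + 1e-8)
    cos_angle = (dirs * pixel_rays).sum(dim=1).clamp(-1.0, 1.0)
    return (1.0 - cos_angle).mean()                              # 0 when rays align

# Example with random tensors standing in for network output and calibration.
loss = angle_based_reprojection_loss(
    torch.randn(1000, 3), torch.nn.functional.normalize(torch.randn(1000, 3), dim=1),
    torch.eye(3), torch.zeros(3))
print(loss.item())
```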
- …