7,620 research outputs found
A location-aware embedding technique for accurate landmark recognition
The current state of the research in landmark recognition highlights the good
accuracy which can be achieved by embedding techniques, such as Fisher vector
and VLAD. None of these techniques exploits spatial information, i.e., they
consider all the features and the corresponding descriptors without embedding
their location in the image. This paper presents a new variant of the
well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique
which accounts, to a certain degree, for the location of features. The driving
motivation comes from the observation that, usually, the most interesting part
of an image (e.g., the landmark to be recognized) is almost at the center of
the image, while the features at the borders are irrelevant and do
not depend on the landmark. The proposed variant, called locVLAD (location-aware
VLAD), computes the mean of two global descriptors: the VLAD computed on
the entire original image, and the one computed on a cropped image which
removes a certain percentage of the image borders. This simple variant shows an
accuracy greater than the existing state-of-the-art approach. Experiments are
conducted on two public datasets (ZuBuD and Holidays) which are used both for
training and testing. Moreover, a more balanced version of ZuBuD is proposed.
Comment: 6 pages, 5 figures, ICDSC 201
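The averaging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `border` parameter, the use of normalized (x, y) feature coordinates, and the choice to emulate cropping by filtering features by location are all assumptions made for illustration. The abstract does not say whether the averaged vector is normalized again, so the sketch leaves it as is.

```python
import math

def vlad(descriptors, codebook):
    """Plain VLAD: sum residuals to the nearest centroid, then L2-normalize."""
    dim = len(codebook[0])
    acc = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        # Index of the nearest centroid (squared Euclidean distance).
        k = min(range(len(codebook)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(d, codebook[i])))
        for j in range(dim):
            acc[k][j] += d[j] - codebook[k][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def loc_vlad(features, codebook, border=0.1):
    """Mean of the VLAD of all features and the VLAD of the central features.

    features: list of (x, y, descriptor), with x and y normalized to [0, 1]
    (an assumed input layout). border: fraction removed on each side
    (hypothetical parameter; the abstract only says "a certain percentage").
    """
    full = vlad([d for _, _, d in features], codebook)
    inner = [d for x, y, d in features
             if border <= x <= 1 - border and border <= y <= 1 - border]
    cropped = vlad(inner, codebook)
    # Element-wise mean of the two global descriptors.
    return [(a + b) / 2 for a, b in zip(full, cropped)]
```

Features near the centre contribute to both descriptors, so after averaging they weigh more than features near the borders, which contribute only to the full-image descriptor.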
An accurate retrieval through R-MAC+ descriptors for landmark recognition
The landmark recognition problem is far from being solved, but with the use
of features extracted from intermediate layers of Convolutional Neural Networks
(CNNs), excellent results have been obtained. In this work, we propose some
improvements on the creation of R-MAC descriptors in order to make the
newly-proposed R-MAC+ descriptors more representative than the previous ones.
However, the main contribution of this paper is a novel retrieval technique
that exploits the fine representativeness of the MAC descriptors of the
database images. Using these descriptors, called "db regions", during the
retrieval stage, the performance is greatly improved. The proposed method is
tested on different public datasets: Oxford5k, Paris6k and Holidays. It
outperforms the state-of-the-art results on Holidays and reaches excellent
results on Oxford5k and Paris6k, surpassed only by approaches based on
fine-tuning strategies.
Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition
Landmark recognition has achieved excellent results on
small-scale datasets. When dealing with large-scale retrieval, issues that were
irrelevant with small amounts of data quickly become fundamental for an
efficient retrieval phase. In particular, computational time needs to be kept
as low as possible, whilst the retrieval accuracy has to be preserved as much
as possible. In this paper, we propose a novel multi-index hashing method called
Bag of Indexes (BoI) for Approximate Nearest Neighbors (ANN) search. It
drastically reduces the query time and achieves higher accuracy than
state-of-the-art methods for large-scale landmark recognition.
It has been demonstrated that this family of algorithms can be applied to
different embedding techniques, such as VLAD and R-MAC, obtaining excellent results
in very short times on different public datasets: Holidays+Flickr1M, Oxford105k
and Paris106k.
Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links
Detection and recognition of scene texts of arbitrary shapes remain a grand
challenge due to the rich variation in text line orientations,
lengths, curvatures, etc. This paper presents a mask-guided multi-task network
that detects and rectifies scene texts of arbitrary shapes reliably. Three
types of keypoints are detected which specify the centre line and so the shape
of text instances accurately. In addition, four types of keypoint links are
detected of which the horizontal links associate the detected keypoints of each
text instance and the vertical links predict a pair of landmark points (for
each keypoint) along the upper and lower text boundary, respectively. Scene
texts can be located and rectified by linking up the associated landmark points
(giving localization polygon boxes) and transforming the polygon boxes via thin
plate spline, respectively. Extensive experiments over several public datasets
show that the use of text keypoints is tolerant to the variation in text
orientations, lengths, and curvatures, and it achieves superior scene text
detection and rectification performance as compared with state-of-the-art
methods.
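The localization step mentioned in this abstract, linking the upper and lower landmark points into a polygon box, can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name and the input layout (two equally long lists of (x, y) points, ordered along the text line) are hypothetical, and the thin plate spline rectification is not covered.

```python
def polygon_from_landmarks(upper, lower):
    """Join paired boundary points into one closed polygon outline.

    upper, lower: lists of (x, y) points, one pair per centre-line keypoint,
    both ordered along the text line (an assumed input layout).
    """
    if len(upper) != len(lower):
        raise ValueError("each keypoint needs one upper and one lower point")
    # Walk the upper boundary forwards, then the lower boundary backwards.
    return list(upper) + list(reversed(lower))
```

Reversing the lower boundary keeps the vertices in a consistent order around the text region, so the outline does not cross itself for a text line that runs in one direction.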