956 research outputs found
A location-aware embedding technique for accurate landmark recognition
The current state of the research in landmark recognition highlights the good
accuracy which can be achieved by embedding techniques, such as Fisher vector
and VLAD. All these techniques do not exploit spatial information, i.e.
consider all the features and the corresponding descriptors without embedding
their location in the image. This paper presents a new variant of the
well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique
which accounts, at a certain degree, for the location of features. The driving
motivation comes from the observation that, usually, the most interesting part
of an image (e.g., the landmark to be recognized) is almost at the center of
the image, while the features at the borders are irrelevant features which do
no depend on the landmark. The proposed variant, called locVLAD (location-aware
VLAD), computes the mean of the two global descriptors: the VLAD executed on
the entire original image, and the one computed on a cropped image which
removes a certain percentage of the image borders. This simple variant shows an
accuracy greater than the existing state-of-the-art approach. Experiments are
conducted on two public datasets (ZuBuD and Holidays) which are used both for
training and testing. Morever a more balanced version of ZuBuD is proposed.Comment: 6 pages, 5 figures, ICDSC 201
Unsupervised Object Discovery and Tracking in Video Collections
This paper addresses the problem of automatically localizing dominant objects
as spatio-temporal tubes in a noisy collection of videos with minimal or even
no supervision. We formulate the problem as a combination of two complementary
processes: discovery and tracking. The first one establishes correspondences
between prominent regions across videos, and the second one associates
successive similar object regions within the same video. Interestingly, our
algorithm also discovers the implicit topology of frames associated with
instances of the same object class across different videos, a role normally
left to supervisory information in the form of class labels in conventional
image and video understanding methods. Indeed, as demonstrated by our
experiments, our method can handle video collections featuring multiple object
classes, and substantially outperforms the state of the art in colocalization,
even though it tackles a broader problem with much less supervision
- …