End-to-end training of deep kernel map networks for image classification
Deep kernel map networks have shown excellent performance in various
classification problems, including image annotation. Their general recipe
consists of stacking several layers of singular value decompositions (SVDs)
-- which map data from input spaces into high-dimensional spaces -- while
preserving the similarity of the underlying kernels. However, the potential of
these deep map networks has not been fully explored: the original setting
focuses mainly on the approximation quality of their kernels and ignores their
discrimination power. In this paper, we introduce a novel "end-to-end" design
for deep kernel map learning that balances the approximation quality of
kernels with their discrimination power. Our method proceeds in two steps:
first, layerwise SVD is applied to build initial deep kernel map
approximations; then, "end-to-end" supervised learning is employed to further
enhance their discrimination power while maintaining their efficiency.
Extensive experiments, conducted on the challenging ImageCLEF annotation
benchmark, show the high efficiency of this two-step process and its
outperformance with respect to different related methods.
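The first step of the recipe can be illustrated with a toy, single-layer sketch: an explicit feature map is obtained from the eigendecomposition (the SVD of a symmetric positive semi-definite matrix) of a kernel matrix, so that the inner products of the mapped data approximate the kernel. The RBF kernel choice and all function names below are illustrative assumptions, not the authors' exact construction; the supervised end-to-end stage would subsequently refine such a map.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise RBF (Gaussian) kernel between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def svd_kernel_map(X, rank, gamma=0.5):
    """Build an explicit map Phi so that Phi @ Phi.T approximates K(X, X).

    Keeps the top `rank` eigenpairs of the kernel matrix; at full rank the
    reconstruction is exact (up to floating-point error).
    """
    K = rbf_kernel(X, X, gamma)
    w, V = np.linalg.eigh(K)                  # ascending eigenvalues
    w, V = w[::-1][:rank], V[:, ::-1][:, :rank]
    return V * np.sqrt(np.clip(w, 0.0, None))  # n x rank explicit feature map

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Phi = svd_kernel_map(X, rank=10)
K = rbf_kernel(X, X)
err = np.abs(Phi @ Phi.T - K).max()           # low-rank approximation error
```

In an end-to-end setting, `Phi` (or the projection that produces it) would serve as the initialization of a layer whose parameters are then updated by supervised training, trading a little kernel-approximation quality for discrimination power.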
The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval
In this paper, we describe in detail VISIONE, a video search system that
allows users to search for videos using textual keywords, the occurrence of
objects and their spatial relationships, the occurrence of colors and their
spatial relationships, and image similarity. These modalities can be combined
to express complex queries and satisfy user needs. The peculiarity of our
approach is that we encode all the information extracted from the keyframes,
such as visual deep features, tags, and color and object locations, using a
convenient textual encoding indexed in a single text retrieval engine. This
offers great flexibility when results corresponding to the various parts of
the query (visual, text, and locations) have to be merged. In addition, we
report an extensive analysis of the system's retrieval performance, using the
query logs generated during the Video Browser Showdown (VBS) 2019 competition.
This analysis allowed us to fine-tune the system by choosing the optimal
parameters and strategies among the ones we tested.

Comment: 22 pages, 12 figures