DeepStyle: Multimodal Search Engine for Fashion and Interior Design
In this paper, we propose a multimodal search engine that combines visual and
textual cues to retrieve items from a multimedia database aesthetically similar
to the query. The goal of our engine is to enable intuitive retrieval of
fashion merchandise such as clothes or furniture. Existing search engines treat
textual input only as an additional source of information about the query image
and do not correspond to the real-life scenario where the user looks for 'the
same shirt but of denim'. Our novel method, dubbed DeepStyle, mitigates those
shortcomings by using a joint neural network architecture to model contextual
dependencies between features of different modalities. We prove the robustness
of this approach on two different challenging datasets of fashion items and
furniture where our DeepStyle engine outperforms baseline methods by 18-21% on
the tested datasets. Our search engine is commercially deployed and available
through a Web-based application.

Comment: Copyright held by IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works.
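The joint image-text retrieval the abstract describes can be sketched roughly as follows. This is a minimal illustration, not DeepStyle's actual architecture: the random projection stands in for the learned joint network, and the feature dimensions, catalogue, and function names are all hypothetical.

```python
import numpy as np

def fuse(img_feat, txt_feat, rng=np.random.default_rng(0)):
    # Joint fusion: concatenate image and text features and project them
    # into a shared "style" space. The random matrix W is a stand-in for
    # the learned projection a joint neural network would provide.
    joint = np.concatenate([img_feat, txt_feat])
    W = rng.standard_normal((64, joint.size))
    return W @ joint

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

rng = np.random.default_rng(1)

# Query: an image plus a textual modifier ("the same shirt but of denim"),
# embedded jointly rather than treating the text as a mere side channel.
query_vec = fuse(rng.standard_normal(128), rng.standard_normal(32))

# Catalogue items are embedded into the same space offline; retrieval is
# then a nearest-neighbour ranking by similarity to the query embedding.
catalogue = {f"item_{i}": fuse(rng.standard_normal(128), rng.standard_normal(32))
             for i in range(5)}
ranking = sorted(catalogue,
                 key=lambda k: cosine(query_vec, catalogue[k]),
                 reverse=True)
print(ranking[0])  # best aesthetic match under this toy embedding
```

The key design point mirrored here is that both modalities enter a single embedding before similarity is computed, so the text can reshape which visual neighbours rank highest instead of only re-filtering an image-only result list.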
Visual Localization for Autonomous Driving: Mapping the Accurate Location in the City Maze
Accurate localization is a foundational capability, required for autonomous
vehicles to accomplish other tasks such as navigation or path planning. It is
common practice for vehicles to use GPS to acquire location information.
However, the application of GPS can result in severe challenges when vehicles
run within the inner city where different kinds of structures may shadow the
GPS signal and lead to inaccurate location results. To address the localization
challenges of urban settings, we propose a novel feature voting technique for
visual localization. Different from the conventional front-view-based method,
our approach employs views from three directions (front, left, and right) and
thus significantly improves the robustness of location prediction. In our work,
we craft the proposed feature voting method into three state-of-the-art visual
localization networks and modify their architectures properly so that they can
be applied for vehicular operation. Extensive field test results indicate that
our approach can predict location robustly even in challenging inner-city
settings. Our research sheds light on using visual localization to help
autonomous vehicles find accurate location information in a city maze, within a
desirable time constraint.
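The three-view voting idea above might be sketched like this. The aggregation rule (confidence-weighted agreement within a distance threshold) is an assumption for illustration, not the paper's exact scheme; the per-view predictions would come from the modified localization networks.

```python
import numpy as np

def vote(predictions, radius=5.0):
    # predictions: one (x, y, confidence) tuple per camera view
    # (front, left, right). Each prediction "votes" for every candidate
    # within `radius` metres of it; the candidate with the most total
    # supporting confidence wins, which lets two agreeing views reject
    # a single wildly wrong one.
    pts = np.array([p[:2] for p in predictions], dtype=float)
    conf = np.array([p[2] for p in predictions], dtype=float)
    support = [(conf[np.linalg.norm(pts - p, axis=1) <= radius].sum(), i)
               for i, p in enumerate(pts)]
    best = max(support)[1]
    # Refine the winning candidate with a confidence-weighted mean of
    # the predictions that supported it.
    mask = np.linalg.norm(pts - pts[best], axis=1) <= radius
    return (pts[mask] * conf[mask, None]).sum(axis=0) / conf[mask].sum()

# Front and left views agree; the right view is off (e.g. its facade was
# occluded), so its vote is outvoted and excluded from the final estimate.
front, left, right = (102.0, 50.0, 0.9), (101.0, 51.0, 0.8), (160.0, 90.0, 0.3)
x, y = vote([front, left, right])
print(x, y)
```

Using three directions rather than the front view alone gives the voter redundant, partially independent evidence, which is where the claimed robustness in cluttered inner-city scenes comes from.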