4,662 research outputs found

    FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory

    Full text link
    Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images, for a given turn. Unlike vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5%, on Multi-turn FashionIQ -- the only existing multi-turn fashion dataset currently, in addition to having a relative improvement of 12.6% on Multi-turn Shoes -- an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities of our model -- memory retention across turns, and agnosticity to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models. Project page: https://sites.google.com/eng.ucsd.edu/fashionntmComment: Paper accepted at ICCV-202

    Interactive context-aware user-driven metadata correction in digital libraries

    Get PDF
    Personal name variants are a common problem in digital libraries, reducing the precision of searches and complicating browsing-based interaction. The book-centric approach of name authority control has not scaled to match the growth and diversity of digital repositories. In this paper, we present a novel system for user-driven integration of name variants when interacting with web-based information-in particular digital library-systems. We approach these issues via a client-side JavaScript browser extension that can reorganize web content and also integrate remote data sources. Designed to be agnostic towards the web sites it is applied to, we illustrate the developed proof-of-concept system through worked examples using three different digital libraries. We discuss the extensibility of the approach in the context of other user-driven information systems and the growth of the Semantic Web

    Real-time indoor assistive localization with mobile omnidirectional vision and cloud GPU acceleration

    Full text link
    In this paper we propose a real-time assistive localization approach to help blind and visually impaired people in navigating an indoor environment. The system consists of a mobile vision front end with a portable panoramic lens mounted on a smart phone, and a remote image feature-based database of the scene on a GPU-enabled server. Compact and elective omnidirectional image features are extracted and represented in the smart phone front end, and then transmitted to the server in the cloud. These features of a short video clip are used to search the database of the indoor environment via image-based indexing to find the location of the current view within the database, which is associated with floor plans of the environment. A median-filter-based multi-frame aggregation strategy is used for single path modeling, and a 2D multi-frame aggregation strategy based on the candidates’ distribution densities is used for multi-path environmental modeling to provide a final location estimation. To deal with the high computational cost in searching a large database for a realistic navigation application, data parallelism and task parallelism properties are identified in the database indexing process, and computation is accelerated by using multi-core CPUs and GPUs. User-friendly HCI particularly for the visually impaired is designed and implemented on an iPhone, which also supports system configurations and scene modeling for new environments. Experiments on a database of an eight-floor building are carried out to demonstrate the capacity of the proposed system, with real-time response (14 fps) and robust localization results

    Semantic image retrieval using relevance feedback and transaction logs

    Get PDF
    Due to the recent improvements in digital photography and storage capacity, storing large amounts of images has been made possible, and efficient means to retrieve images matching a user’s query are needed. Content-based Image Retrieval (CBIR) systems automatically extract image contents based on image features, i.e. color, texture, and shape. Relevance feedback methods are applied to CBIR to integrate users’ perceptions and reduce the gap between high-level image semantics and low-level image features. The precision of a CBIR system in retrieving semantically rich (complex) images is improved in this dissertation work by making advancements in three areas of a CBIR system: input, process, and output. The input of the system includes a mechanism that provides the user with required tools to build and modify her query through feedbacks. Users behavioral in CBIR environments are studied, and a new feedback methodology is presented to efficiently capture users’ image perceptions. The process element includes image learning and retrieval algorithms. A Long-term image retrieval algorithm (LTL), which learns image semantics from prior search results available in the system’s transaction history, is developed using Factor Analysis. Another algorithm, a short-term learner (STL) that captures user’s image perceptions based on image features and user’s feedbacks in the on-going transaction, is developed based on Linear Discriminant Analysis. Then, a mechanism is introduced to integrate these two algorithms to one retrieval procedure. Finally, a retrieval strategy that includes learning and searching phases is defined for arranging images in the output of the system. The developed relevance feedback methodology proved to reduce the effect of human subjectivity in providing feedbacks for complex images. Retrieval algorithms were applied to images with different degrees of complexity. LTL is efficient in extracting the semantics of complex images that have a history in the system. STL is suitable for query and images that can be effectively represented by their image features. Therefore, the performance of the system in retrieving images with visual and conceptual complexities was improved when both algorithms were applied simultaneously. Finally, the strategy of retrieval phases demonstrated promising results when the query complexity increases
    • 

    corecore