
    Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval

    Free-hand sketch-based image retrieval (SBIR) is a specific cross-view retrieval task, in which queries are abstract and ambiguous sketches while the retrieval database is formed of natural images. Work in this area mainly focuses on extracting representative and shared features for sketches and natural images. However, these methods can neither cope well with the geometric distortion between sketches and images nor remain feasible for large-scale SBIR, owing to the heavy continuous-valued distance computation. In this paper, we speed up SBIR by introducing a novel binary coding method, named Deep Sketch Hashing (DSH), in which a semi-heterogeneous deep architecture is proposed and incorporated into an end-to-end binary coding framework. Specifically, three convolutional neural networks are utilized to encode free-hand sketches, natural images and, especially, the auxiliary sketch-tokens which are adopted as bridges to mitigate the sketch-image geometric distortion. The learned DSH codes can effectively capture the cross-view similarities as well as the intrinsic semantic correlations between different categories. To the best of our knowledge, DSH is the first hashing work specifically designed for category-level SBIR with an end-to-end deep architecture. The proposed DSH is comprehensively evaluated on two large-scale datasets, TU-Berlin Extension and Sketchy, and the experiments consistently show DSH's superior SBIR accuracy over several state-of-the-art methods, along with significantly reduced retrieval time and memory footprint. Comment: this paper will appear as a spotlight paper in CVPR 2017.
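
    To make the speed argument concrete, the sketch below illustrates the retrieval step that binary codes enable: packed bit codes compared by XOR-and-popcount Hamming distance instead of continuous-valued distances. This is a minimal sketch of binary-hashing retrieval in general, not of the DSH model itself; the 64-bit code length and the random "database" are assumptions made for the example.

        # Minimal sketch: Hamming-distance retrieval over packed binary codes.
        # Illustrative only; the codes here are random, not a DSH network's output.
        import numpy as np

        def pack_codes(bits):
            """Pack an (n, code_len) array of 0/1 bits into bytes."""
            return np.packbits(bits.astype(np.uint8), axis=1)

        def hamming_rank(query_packed, db_packed):
            """Rank database items by Hamming distance to the query:
            XOR the packed codes, then popcount per item."""
            xor = np.bitwise_xor(db_packed, query_packed)   # (n, n_bytes)
            dists = np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per row
            return np.argsort(dists), dists

        # Hypothetical use: 64-bit codes for 100,000 database images, one query.
        rng = np.random.default_rng(0)
        db = pack_codes(rng.integers(0, 2, size=(100_000, 64)))
        query = pack_codes(rng.integers(0, 2, size=(1, 64)))
        order, dists = hamming_rank(query, db)
        print(order[:5], dists[order[:5]])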

    Mobile Interface for Content-Based Image Management

    People make more and more use of digital image acquisition devices to capture snapshots of their everyday life. The growing number of personal pictures raises the problem of their classification. Some of the authors proposed an automatic technique for personal photo album management dealing with multiple aspects (i.e., people, time and background) in a homogeneous way. In this paper we discuss a solution that allows mobile users to access such a technique remotely from their mobile phones, almost anywhere, in a pervasive fashion. This allows users to classify the pictures they store on their devices. The whole solution is presented, with particular regard to the user interface implemented on the mobile phone, along with some experimental results.

    Penerapan Deskriptor Warna Dominan untuk Temu Kembali Citra Busana pada Peranti Bergerak (Application of the Dominant Color Descriptor for Clothing Image Retrieval on Mobile Devices)

    Nowadays, clothes with various designs and color combinations are available for purchase through online shops, which are mostly equipped with keyword-based item retrieval: objects in the online database are retrieved based on keywords entered by potential buyers. Keyword-based search may leave potential customers struggling to describe the clothes they want to buy. This paper presents a new search approach that uses an image, instead of text, as the query to an online shop; this method is known as content-based image retrieval (CBIR). In particular, we focus on color as the feature for our Muslimah clothes image retrieval. The dominant color descriptor (DCD) extracts the wardrobe's colors; image matching is then accomplished by calculating the Euclidean distance between the query and each image in the database, and the last step is to evaluate the performance of the DCD by calculating precision and recall. To determine the performance of the DCD in extracting color features, it is compared with another color descriptor, the dominant color correlogram descriptor (DCCD). The precision and recall of the DCD ranged from 0.7 to 0.9, while those of the DCCD ranged from 0.7 to 0.8. These results show that the DCD produces superior performance compared to the DCCD in retrieving clothing images, whether plain or patterned.
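
    The abstract names the pipeline steps but not their internals, so the sketch below fills them in under stated assumptions: dominant colors approximated by k-means clustering in RGB space, matching by Euclidean distance between descriptors, and precision/recall computed over the retrieved set. Every function name and parameter here is illustrative, not taken from the paper.

        # Minimal sketch of a DCD-style retrieval pipeline (assumptions noted above).
        import numpy as np
        from sklearn.cluster import KMeans

        def dominant_colors(pixels, k=3):
            """Approximate k dominant colors of an (n, 3) RGB pixel array,
            sorted by cluster size so descriptors are comparable across images."""
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
            counts = np.bincount(km.labels_, minlength=k)
            return km.cluster_centers_[np.argsort(-counts)].ravel()

        def retrieve(query_desc, db_descs, top_k=10):
            """Rank database descriptors by Euclidean distance to the query."""
            dists = np.linalg.norm(db_descs - query_desc, axis=1)
            return np.argsort(dists)[:top_k]

        def precision_recall(retrieved, relevant, n_relevant):
            """Precision and recall of a retrieved id list against a relevant id set."""
            hits = len(set(retrieved) & relevant)
            return hits / len(retrieved), hits / n_relevant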

    Browse-to-search

    This demonstration presents a novel interactive online shopping application based on visual search technologies. When users want to buy something on a shopping site, they usually need to look for related information on other web sites, so they must switch between the page being browsed and the websites that provide search results. The proposed application enables users to naturally search for products of interest while they browse a web page, so that even a casual purchase intent is easily satisfied. The interactive shopping experience is characterized by: 1) in session - it allows users to specify the purchase intent within the browsing session, instead of leaving the current page and navigating to other websites; 2) in context - the browsed web page provides implicit context information which helps infer user purchase preferences; 3) in focus - users easily specify their search interest using gestures on touch devices and do not need to formulate queries in a search box; 4) natural - gesture input and visual-based search provide users with a natural shopping experience. The system is evaluated against a data set consisting of several million commercial product images.
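
    The demonstration abstract gives no implementation details, so the following is a minimal sketch of the general pattern it implies: crop the region selected by the user's gesture, embed it as a feature vector, and rank a precomputed product-image index by similarity. The color-histogram feature is a stand-in assumption, not the system's actual visual features.

        # Minimal sketch: gestured region -> feature vector -> nearest products.
        import numpy as np

        def embed(region):
            """Illustrative feature: L2-normalized 8x8x8 RGB histogram of a crop."""
            hist, _ = np.histogramdd(region.reshape(-1, 3), bins=(8, 8, 8),
                                     range=((0, 256),) * 3)
            v = hist.ravel()
            return v / (np.linalg.norm(v) + 1e-12)

        def search(region, index_vectors, top_k=5):
            """Cosine similarity of the query embedding against a product index
            whose rows are unit-norm feature vectors."""
            sims = index_vectors @ embed(region)
            return np.argsort(-sims)[:top_k]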

    Listen, Look, and Gotcha: Instant Video Search with Mobile Phones by Layered Audio-Video Indexing *

    Mobile video is quickly becoming a mass consumer phenomenon. More and more people are using their smartphones to search and browse video content while on the move. In this paper, we develop an innovative instant mobile video search system through which users can discover videos by simply pointing their phones at a screen to capture a few seconds of what they are watching. The system indexes large-scale video data in the cloud using a new layered audio-video indexing approach, while extracting light-weight joint audio-video signatures in real time and performing progressive search on mobile devices. Unlike most existing mobile video search applications, which simply send the original video query to the cloud, the proposed system is one of the first attempts at instant and progressive video search leveraging the light-weight computing capacity of mobile devices. The system is characterized by four unique properties: 1) a joint audio-video signature to deal with the large aural and visual variances associated with the query video captured by the mobile phone, 2) layered audio-video indexing to holistically exploit the complementary nature of audio and video signals, 3) light-weight fingerprinting to comply with mobile processing capacity, and 4) a progressive query process to significantly reduce computational costs and improve the user experience: the search can stop anytime once a confident result is achieved, so the user does not need to wait for a fixed time lag. We collected 1,400 query videos captured by 25 mobile users from a dataset of 600 hours of video. The experiments show that our system outperforms state-of-the-art methods, achieving 90.79% precision when the query video is less than 10 seconds and 70.07% even when the query video is less than 5 seconds.
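
    Property 4 can be illustrated with a small loop: per-segment candidate scores accumulate, and the search returns early once one candidate holds a confident share of the total score mass. The scoring interface and the threshold are assumptions for illustration; they are not the paper's layered audio-video index.

        # Minimal sketch of a progressive query that can stop early.
        def progressive_search(query_segments, score_fn, confidence=0.9):
            """Accumulate per-candidate scores segment by segment; return as soon
            as the best candidate holds `confidence` of the total score mass.
            `score_fn(segment)` is a hypothetical matcher yielding (candidate, score)."""
            totals = {}
            for seg in query_segments:                 # e.g., one segment per second
                for cand, s in score_fn(seg):
                    totals[cand] = totals.get(cand, 0.0) + s
                mass = sum(totals.values())
                if mass > 0:
                    best, best_score = max(totals.items(), key=lambda kv: kv[1])
                    if best_score / mass >= confidence:
                        return best                    # confident: stop before the end
            return max(totals, key=totals.get) if totals else None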

    Image Labeling and Classification by Semantic Tag Analysis

    Image classification and retrieval play a significant role in dealing with large multimedia data on the Internet. Social networks, image sharing websites and mobile applications require categorizing multimedia items for more efficient search and storage. Therefore, image classification and retrieval methods have gained great importance for researchers and companies. Image classification can be performed in a supervised or semi-supervised manner: in order to categorize an unknown image, a statistical model created using pre-labeled samples is fed with the numerical representation of the visual features of images. A supervised approach requires a set of labeled data to create a statistical model and subsequently classify an unlabeled test set. However, labeling images manually requires a great deal of time and effort, so a major research activity has gravitated towards finding efficient methods to reduce the time and effort for image labeling. Most images on social websites have associated tags that somewhat describe their content. These tags may provide significant content descriptors if a semantic bridge can be established between image content and tags. In this thesis, we focus on cases where accurate class labels are scarce or even absent while associated tags are present. The goal is to analyze and utilize the available tags to categorize database images to form a training dataset, over which a dedicated classifier is trained and then used for image classification. Our framework contains a semantic text analysis tool based on WordNet to measure the semantic relatedness between the associated image tags and predefined class labels, and a novel method for labeling the corresponding images. The classifier is trained using only low-level visual image features. The experimental results using 7 classes from the MIRFlickr dataset demonstrate that semantically analyzing the tags attached to images significantly improves classification accuracy by providing additional training data.
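
    A minimal sketch of the tag-to-class matching step, assuming NLTK's WordNet interface and path similarity as the relatedness measure (the thesis's exact measure is not given in this abstract); the threshold and helper names are hypothetical.

        # Minimal sketch: label an image from its tags via WordNet relatedness.
        from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

        def relatedness(word_a, word_b):
            """Max path similarity over all noun-synset pairs of the two words."""
            best = 0.0
            for sa in wn.synsets(word_a, pos=wn.NOUN):
                for sb in wn.synsets(word_b, pos=wn.NOUN):
                    sim = sa.path_similarity(sb)
                    if sim is not None and sim > best:
                        best = sim
            return best

        def label_from_tags(tags, class_labels, threshold=0.3):
            """Assign the class most related to the image's tags, if confident."""
            scores = {c: max((relatedness(t, c) for t in tags), default=0.0)
                      for c in class_labels}
            best = max(scores, key=scores.get)
            return best if scores[best] >= threshold else None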