3,149 research outputs found

    Context-aware hybrid classification system for fine-grained retail product recognition

    We present a context-aware hybrid classification system for the problem of fine-grained product class recognition in computer vision. Recently, retail product recognition has become an interesting computer vision research topic. We focus on the classification of products on shelves in a store. This is a very challenging classification problem because many product classes are visually similar in terms of shape, color, texture, and metric size. On shelves, the same or similar products are more likely to appear adjacent to each other and to be displayed in certain arrangements rather than at random. The arrangement of the products on the shelves has a spatial continuity in both brand and metric size. By using this context information, the co-occurrence of the products and the adjacency relations between the products can be statistically modeled. The proposed hybrid approach improves the accuracy of context-free image classifiers such as Support Vector Machines (SVMs) by combining them with a probabilistic graphical model such as Hidden Markov Models (HMMs) or Conditional Random Fields (CRFs). The fundamental goal of this paper is to use contextual relationships on retail shelves to improve classification accuracy.
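
The idea of combining a context-free classifier with an HMM can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the function name `viterbi_relabel`, the toy numbers, and the use of classifier posteriors as stand-ins for HMM emission scores (reasonable only if class priors are roughly uniform) are all assumptions made for illustration.

```python
import math

def viterbi_relabel(posteriors, transition, prior):
    """Most likely label sequence for one shelf row (hypothetical helper).

    posteriors[t][k]: context-free classifier score for class k at position t
    transition[j][k]: probability that class k sits next to class j
    prior[k]:         probability that the first product is class k
    """
    n_classes = len(prior)
    log = lambda p: math.log(max(p, 1e-12))  # guard against log(0)

    # Initialise with the first product on the shelf.
    score = [log(prior[k]) + log(posteriors[0][k]) for k in range(n_classes)]
    backpointers = []

    # Forward pass: best predecessor for every class at every position.
    for obs in posteriors[1:]:
        new_score, pointers = [], []
        for k in range(n_classes):
            best_j = max(range(n_classes),
                         key=lambda j: score[j] + log(transition[j][k]))
            pointers.append(best_j)
            new_score.append(score[best_j] + log(transition[best_j][k]) + log(obs[k]))
        score = new_score
        backpointers.append(pointers)

    # Backward pass: follow the stored pointers from the best final class.
    label = max(range(n_classes), key=lambda k: score[k])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1]

# Toy example (assumed numbers): the middle product is ambiguous on its own,
# but neighbours of the same class pull it back to class 0.
posteriors = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
transition = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]
labels = viterbi_relabel(posteriors, transition, prior)
```

In this toy case a per-image argmax would label the shelf as classes 0, 1, 0, whereas the adjacency model relabels the middle product as class 0.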

    Statistical methods for fine-grained retail product recognition

    In recent years, computer vision has become a major instrument in automating retail processes with emerging smart applications such as shopper assistance, visual product search (e.g., Google Lens), no-checkout stores (e.g., Amazon Go), real-time inventory tracking, out-of-stock detection, and shelf execution. At the core of these applications lies the problem of product recognition, which poses a variety of new challenges in contrast to generic object recognition. Product recognition is a special instance of fine-grained classification. Considering the sheer diversity of packaged goods in a typical hypermarket, we are confronted with up to tens of thousands of classes, which, particularly if under the same product brand, tend to have only minute visual differences in shape, packaging texture, metric size, etc., making them very difficult to discriminate from one another. Another challenge is the limited number of available datasets, which either have only a few training examples per class that are taken under ideal studio conditions, hence requiring cross-dataset generalization, or are captured from the shelf in an actual retail environment and thus suffer from issues like blur, low resolution, occlusions, unexpected backgrounds, etc. Thus, an effective product classification system requires substantially more information in addition to the knowledge obtained from product images alone. In this thesis, we propose statistical methods for fine-grained retail product recognition. In our first framework, we propose a novel context-aware hybrid classification system for the fine-grained retail product recognition problem. In the second framework, state-of-the-art convolutional neural networks are explored and adapted to fine-grained recognition of products. The third framework, which is the most significant contribution of this thesis, presents a new approach for fine-grained classification of retail products that learns and exploits statistical context information about likely product arrangements on shelves, incorporates visual hierarchies across brands, and returns recognition results as "confidence sets" that are guaranteed to contain the true class at a given confidence level.

    Retail product recognition with a graphical shelf model (Çizgisel raf modeli ile perakende ürün tanıma)

    Recently, retail product recognition has become an interesting computer vision research topic. The classification of products on shelves is a very challenging classification problem because many product classes are visually similar in terms of shape, color, texture, and metric size. On shelves, the same or similar products are more likely to appear adjacent to each other and to be displayed in certain arrangements rather than at random. The arrangement of the products on the shelves has a spatial continuity in both brand and metric size. By using this context information, the co-occurrence of the products and the adjacency relations between the products can be statistically modeled. In this work, we present a context-aware hybrid classification system for the problem of fine-grained product class recognition. The proposed hybrid approach improves the accuracy of context-free image classifiers by combining them with a probabilistic graphical model based on Hidden Markov Models. The fundamental goal of this paper is to use contextual relationships on retail shelves to improve the accuracy of the product classifier.

    Context-aware confidence sets for fine-grained product recognition

    We present a new approach for fine-grained classification of retail products, which learns and exploits statistical context information about likely product arrangements on shelves, incorporates visual hierarchies across brands, and returns recognition results as “confidence sets” that are guaranteed to contain the true class at a given confidence level. Our system consists of three important components: 1) a nested hierarchy of product classes is automatically constructed based on visual similarities, 2) a confidence set predictor is trained based on class posteriors by using coarse-to-fine binary classifiers to discriminate each nested cluster of the hierarchy from the remainder of classes and a Bayesian network (BN) model that encodes the joint distribution of classifier scores with the fine-level class variable, and 3) a hidden Markov model (HMM) is trained with nested hidden states from the class hierarchy to model spatial transitions across the nodes of the product class hierarchy and resolve errors in the context-free confidence set results. Novel aspects of the proposed method include 1) combining confidence sets and context information via an HMM, 2) applying this concept to fine-grained recognition of products arranged on retail shelves, and 3) presenting experimental results on four large datasets collected from actual retail stores. We compare our approach with existing confidence set approaches and state-of-the-art convolutional neural network classifiers including SENet-154, DenseNet-161, B-CNN, and Inception-Resnet-v2. Our approach performs comparably to or better than state-of-the-art deep classifiers and exhibits high accuracy for relatively small confidence set sizes.
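
To make the notion of a confidence set concrete, here is a minimal sketch that returns the smallest group of classes whose posterior mass reaches a requested level. It is a generic thresholding stand-in, not the hierarchy, Bayesian network, or HMM pipeline described in the abstract; the function name `confidence_set` and the example posterior are assumptions, and any coverage claim would hold only if the posteriors are well calibrated.

```python
def confidence_set(posterior, level):
    """Smallest set of class indices whose posterior mass reaches `level`.

    posterior: list of class probabilities for one product image
    level:     requested confidence level, e.g. 0.9
    """
    # Rank classes from most to least probable.
    ranked = sorted(range(len(posterior)), key=lambda k: posterior[k], reverse=True)
    chosen, mass = [], 0.0
    for k in ranked:
        chosen.append(k)
        mass += posterior[k]
        if mass >= level:
            break  # enough probability mass has been collected
    return chosen

# Assumed example: four visually similar products.
example = confidence_set([0.5, 0.3, 0.15, 0.05], 0.9)
```

A confident prediction yields a set of size one, while an ambiguous image yields a larger set, which is the trade-off between set size and accuracy that the abstract refers to.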

    Hierarchical Attention Network for Action Segmentation

    The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they lack the capacity to effectively map the temporal relationships between frames, as they only capture a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, thus improving the overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at frame level and segment level and to perform fine-grained action segmentation. This generates a simple, lightweight, yet extremely effective architecture for segmenting continuous video streams and has multiple application domains. We evaluate our system on multiple challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets, and achieve state-of-the-art performance. The evaluated datasets encompass numerous video capture settings, including static overhead camera views and dynamic, egocentric head-mounted camera views, demonstrating the direct applicability of the proposed framework in a variety of settings. Comment: Published in Pattern Recognition Letters

    DeepStore: an interaction-aware Wide&Deep model for store site recommendation with attentional spatial embeddings

    Store site recommendation is one of the essential business services in smart cities for brick-and-mortar enterprises. In recent years, the proliferation of multisource data in cities has fostered unprecedented opportunities for data-driven store site recommendation, which aims at leveraging large-scale user-generated data to analyze and mine users' preferences for identifying the optimal location for a new store. However, most works in store site recommendation focus on a single data source, which lacks some significant data (e.g., consumption data and user profile data). In this paper, we aim to study store site recommendation in a fine-grained manner. Specifically, we predict the consumption level of different users at the store based on multisource data, which can not only help store placement but also benefit the analysis of customer behavior in the store at different time periods. To solve this problem, we design a novel model based on the deep neural network, named DeepStore, which learns low- and high-order feature interactions explicitly and implicitly from dense and sparse features simultaneously. In particular, DeepStore incorporates three modules: 1) the cross network; 2) the deep network; and 3) the linear component. In addition, to learn the latent feature representation from multisource data, we propose two embedding methods for different types of data: 1) the field embedding and 2) attention-based spatial embedding. Extensive experiments are conducted on a real-world dataset including store data, user data, and point-of-interest data; the results demonstrate that DeepStore outperforms the state-of-the-art models.

    Fine-Grained Image Analysis with Deep Learning: A Survey

    Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning-powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas -- fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community. Comment: Accepted by IEEE TPAMI