749 research outputs found
Hierarchical Attention Network for Visually-aware Food Recommendation
Food recommender systems play an important role in assisting users in
identifying the desired food to eat. Deciding what food to eat is a complex and
multi-faceted process, influenced by many factors such as the ingredients, the
appearance of the recipe, the user's personal food preferences, and various
contexts like what has been eaten in past meals. In this work, we formulate the
food recommendation problem as predicting user preference for recipes based on
three key factors that determine a user's choice of food, namely, 1) the user's
(and other users') history; 2) the ingredients of a recipe; and 3) the
descriptive image of a recipe. To address this challenging problem, we develop
a dedicated neural-network-based solution, Hierarchical Attention based Food
Recommendation (HAFR), which is capable of: 1) capturing
the collaborative filtering effect like what similar users tend to eat; 2)
inferring a user's preference at the ingredient level; and 3) learning user
preference from the recipe's visual images. To evaluate our proposed method, we
construct a large-scale dataset consisting of millions of ratings from
AllRecipes.com. Extensive experiments show that our method outperforms several
competing recommender solutions like Factorization Machine and Visual Bayesian
Personalized Ranking with an average improvement of 12%, offering promising
results in predicting user preference for food. Code and dataset will be
released upon acceptance.
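The two attention levels described in the abstract can be illustrated with a minimal sketch: attention over a recipe's ingredient embeddings, followed by attention over the three information sources (collaborative signal, ingredient summary, visual feature). All function names, embedding shapes, and the scoring rule below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hafr_score(user_vec, ingredient_vecs, image_vec):
    """Score one (user, recipe) pair with two attention levels.

    Level 1: attend over the recipe's ingredient embeddings, weighting
    each ingredient by its affinity to the user.
    Level 2: attend over the three sources from the abstract (collaborative
    user vector, ingredient summary, visual feature) and score the result
    against the user vector.
    """
    # Ingredient-level attention: weight each ingredient by user affinity.
    attn = softmax(ingredient_vecs @ user_vec)
    ingredient_summary = attn @ ingredient_vecs
    # Component-level attention over the three factors.
    components = np.stack([user_vec, ingredient_summary, image_vec])
    comp_attn = softmax(components @ user_vec)
    recipe_vec = comp_attn @ components
    return float(user_vec @ recipe_vec)
```

Under this sketch, a recipe whose ingredient embeddings align with the user embedding receives a higher score than one whose ingredients are orthogonal to it, which is the qualitative behavior the ingredient-level attention is meant to capture.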
Secure and Robust Image Watermarking Scheme Using Homomorphic Transform, SVD and Arnold Transform in RDWT Domain
The main objective of a watermarking technique is to attain imperceptibility, robustness, and security against the various malicious attacks applied by illicit users. Fulfilling all of these requirements in a single scheme is a significant challenge. In this paper, a new image watermarking method is therefore proposed which combines the properties of the homomorphic transform, the Redundant Discrete Wavelet Transform (RDWT), the Arnold Transform (AT), and Singular Value Decomposition (SVD) to attain these required properties. RDWT is performed on the host image to obtain the LL subband. This LL subband is then decomposed into illumination and reflectance components by the homomorphic transform. To strengthen the security of the proposed scheme, AT is used to scramble the watermark. The scrambled watermark is embedded into the Singular Values (SVs) of the reflectance component, which are obtained by applying SVD to it. Since the reflectance component contains the important features of the image, embedding the watermark in this part provides excellent imperceptibility. The proposed scheme is comprehensively examined for robustness against different attacks such as scaling and shearing. A comparative study with other prevailing algorithms clearly reveals the superiority of the proposed scheme in terms of robustness and imperceptibility.
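The Arnold Transform used for watermark scrambling is the classical cat map (x, y) → (x + y mod N, x + 2y mod N) on a square image, which is invertible and so lets the extractor recover the watermark. A minimal sketch of this scrambling step (the function names and iteration counts are illustrative, not taken from the paper):

```python
import numpy as np

def arnold_scramble(img, iterations=1):
    """Scramble a square N x N image with the Arnold (cat) map:
    (x, y) -> (x + y mod N, x + 2y mod N)."""
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "Arnold transform needs a square image"
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out

def arnold_unscramble(img, iterations=1):
    """Invert the cat map: (x, y) -> (2x - y mod N, y - x mod N),
    using the inverse of the matrix [[1, 1], [1, 2]] (determinant 1)."""
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nxt[(2 * x - y) % n, (y - x) % n] = out[x, y]
        out = nxt
    return out
```

Running the same number of inverse iterations restores the original watermark exactly, which is what makes the transform usable as a reversible security layer before SVD embedding.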
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval that learns hash functions in the absence of paired training samples
through a cycle consistency loss. Our proposed approach employs an adversarial
training scheme to learn a pair of hash functions enabling translation between
modalities while assuming the underlying semantic relationship. To endow the
hash codes of the input-output pair with semantics, a cycle consistency loss is
further imposed on top of the adversarial training to strengthen the
correlations between inputs and their corresponding outputs. Our approach
learns hash functions generatively, such that the learned hash codes maximally
correlate each input-output correspondence while also being able to regenerate
the inputs so as to minimize the information loss. The learning-to-hash
embedding is thus performed by jointly optimizing the parameters of the hash
functions across modalities as well as the associated generative models.
Extensive experiments on a variety of large-scale cross-modal datasets
demonstrate that our proposed method achieves better retrieval results than the
state of the art.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other authors
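The cycle consistency idea can be sketched concisely: translate one modality to the other and back, then penalize the reconstruction error, in addition to the adversarial objective. The stand-in linear translators below are hypothetical placeholders for the paper's learned generators, used only to make the loss concrete.

```python
import numpy as np

def l1(a, b):
    # Mean absolute reconstruction error.
    return float(np.abs(a - b).mean())

# Hypothetical cross-modal translators: in the paper these would be learned
# generators mapping (say) image features to text features and back; here
# they are simple linear maps just to illustrate the loss.
def g_img2txt(x, W):
    return x @ W

def g_txt2img(y, W_back):
    return y @ W_back

def cycle_consistency_loss(x, y, W, W_back):
    """Translate each modality to the other and back, then penalize the
    reconstruction error in both directions."""
    loss_x = l1(g_txt2img(g_img2txt(x, W), W_back), x)
    loss_y = l1(g_img2txt(g_txt2img(y, W_back), W), y)
    return loss_x + loss_y
```

When the two translators are exact inverses of each other the loss vanishes, which is the fixed point the cycle term pushes the adversarially trained generators toward.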
Human-Machine Cooperation in Large-Scale Multimedia Retrieval: A Survey
Large-Scale Multimedia Retrieval (LSMR) is the task of quickly analyzing a large amount of multimedia data, such as images or videos, and accurately finding the items relevant to a certain semantic meaning. Although LSMR has been investigated for more than two decades in the fields of multimedia processing and computer vision, a more interdisciplinary approach is necessary to develop an LSMR system that is truly meaningful for humans. To this end, this paper aims to draw attention to the LSMR problem from diverse research fields. After explaining the basic terminology of LSMR, we first survey several representative methods in chronological order. This survey reveals that, by prioritizing generality and scalability for large-scale data, recent methods interpret semantic meanings with a completely different mechanism from humans, even though such human-like mechanisms were used in classical heuristic-based methods. Based on this, we discuss human-machine cooperation, which incorporates knowledge about human interpretation into LSMR without sacrificing generality and scalability. In particular, we present three approaches to human-machine cooperation (cognitive, ontological, and adaptive), which are grounded in cognitive science, ontology engineering, and metacognition, respectively. We hope that this paper will create a bridge enabling researchers in different fields to communicate about the LSMR problem and lead to a ground-breaking next generation of LSMR systems.
Skeleton-Based Gesture Recognition With Learnable Paths and Signature Features
For skeleton-based gesture recognition, graph convolutional networks (GCNs) have achieved remarkable performance, since the human skeleton is a natural graph. However, the biological structure might not be the crucial one for motion analysis. Moreover, spatial differential information such as joint distance and the angle between bones may be overlooked during graph convolution. In this paper, we focus on obtaining meaningful joint groups and extracting their discriminative features using path signature (PS) theory. First, to characterize the constraints and dependencies of the various joints, we propose three types of paths, i.e., spatial, temporal, and learnable paths. In particular, a learnable path generation mechanism can group together joints that are not directly connected or that are far apart, according to their kinematic characteristics. Second, to obtain informative and compact features, a deep integration of PS with few parameters is introduced. The entire computational process is packed into two modules, i.e., the spatial-temporal path signature module (ST-PSM) and the learnable path signature module (L-PSM), for convenience of use. They are plug-and-play modules available for any neural network, such as CNNs and GCNs, to enhance feature extraction. Extensive experiments have been conducted on three mainstream datasets (ChaLearn 2013, ChaLearn 2016, and AUTSL). We achieved state-of-the-art results with a simpler framework and a much smaller model size. By inserting our two modules into several GCN-based networks, we observe clear improvements, demonstrating the great effectiveness of our proposed method.
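The path signature underlying these modules is a sequence of iterated integrals of a path; for a piecewise-linear path (e.g., a joint trajectory sampled over time) the depth-2 signature can be computed segment by segment with Chen's identity. The sketch below is a minimal illustration of that computation, not the paper's ST-PSM/L-PSM implementation.

```python
import numpy as np

def path_signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (T x d array).

    Level 1: S^i  = total increment of coordinate i.
    Level 2: S^ij = iterated integral of dX^i dX^j, accumulated one
    linear segment at a time with Chen's identity:
        S^ij_new = S^ij + S^i * delta_j + delta_i * delta_j / 2,
    where delta is the increment of the current segment (a single
    linear segment has level-2 signature outer(delta, delta) / 2).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 += delta
    return s1, s2
```

Two standard properties serve as sanity checks: the level-1 signature equals the path's total displacement, and the symmetric part of level 2 satisfies the shuffle identity S^ij + S^ji = S^i S^j, so only the antisymmetric part (the Lévy area) carries information beyond level 1.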