749 research outputs found
Hierarchical Attention Network for Visually-aware Food Recommendation
Food recommender systems play an important role in assisting users in
identifying the desired food to eat. Deciding what food to eat is a complex and
multi-faceted process, influenced by many factors such as the ingredients, the
appearance of the recipe, the user's personal food preferences, and various
contexts like what has been eaten in past meals. In this work, we formulate the
food recommendation problem as predicting user preference for recipes based on
three key factors that determine a user's choice of food, namely, 1) the user's
(and other users') history; 2) the ingredients of a recipe; and 3) the
descriptive image of a recipe. To address this challenging problem, we develop
a dedicated neural-network-based solution, Hierarchical Attention based Food
Recommendation (HAFR), which is capable of: 1) capturing
the collaborative filtering effect like what similar users tend to eat; 2)
inferring a user's preference at the ingredient level; and 3) learning user
preference from the recipe's visual images. To evaluate our proposed method, we
construct a large-scale dataset consisting of millions of ratings from
AllRecipes.com. Extensive experiments show that our method outperforms several
competing recommender solutions like Factorization Machine and Visual Bayesian
Personalized Ranking with an average improvement of 12%, offering promising
results in predicting user preference for food. Code and dataset will be
released upon acceptance.
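The two attention levels described in the abstract can be illustrated with a minimal sketch: attention over a recipe's ingredient embeddings, followed by attention over the three information sources (collaborative signal, ingredient summary, visual feature). All function names, embedding shapes, and the scoring rule below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hafr_score(user_vec, ingredient_vecs, image_vec):
    """Score one (user, recipe) pair with two attention levels.

    Level 1: attend over the recipe's ingredient embeddings, weighting
    each ingredient by its affinity to the user.
    Level 2: attend over the three sources from the abstract (collaborative
    user vector, ingredient summary, visual feature) and score the result
    against the user vector.
    """
    # Ingredient-level attention: weight each ingredient by user affinity.
    attn = softmax(ingredient_vecs @ user_vec)
    ingredient_summary = attn @ ingredient_vecs
    # Component-level attention over the three factors.
    components = np.stack([user_vec, ingredient_summary, image_vec])
    comp_attn = softmax(components @ user_vec)
    recipe_vec = comp_attn @ components
    return float(user_vec @ recipe_vec)
```

Under this sketch, a recipe whose ingredient embeddings align with the user embedding receives a higher score than one whose ingredients are orthogonal to it, which is the qualitative behavior the ingredient-level attention is meant to capture.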
Secure and Robust Image Watermarking Scheme Using Homomorphic Transform, SVD and Arnold Transform in RDWT Domain
The main objective of a watermarking technique is to attain imperceptibility, robustness, and security against the various malicious attacks applied by illicit users. Fulfilling all of these requirements in a single scheme is a significant challenge. In this paper, a new image watermarking method is therefore proposed which combines the properties of the homomorphic transform, the Redundant Discrete Wavelet Transform (RDWT), the Arnold Transform (AT), and Singular Value Decomposition (SVD) to attain these required properties. RDWT is performed on the host image to obtain the LL subband. This LL subband is then decomposed into illumination and reflectance components by the homomorphic transform. To strengthen the security of the proposed scheme, AT is used to scramble the watermark. The scrambled watermark is embedded into the Singular Values (SVs) of the reflectance component, which are obtained by applying SVD to it. Since the reflectance component contains the important features of the image, embedding the watermark in this part provides excellent imperceptibility. The proposed scheme is comprehensively examined for robustness against different attacks such as scaling and shearing. A comparative study with other prevailing algorithms clearly reveals the superiority of the proposed scheme in terms of robustness and imperceptibility.
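The Arnold Transform used for watermark scrambling is the classical cat map (x, y) → (x + y mod N, x + 2y mod N) on a square image, which is invertible and so lets the extractor recover the watermark. A minimal sketch of this scrambling step (the function names and iteration counts are illustrative, not taken from the paper):

```python
import numpy as np

def arnold_scramble(img, iterations=1):
    """Scramble a square N x N image with the Arnold (cat) map:
    (x, y) -> (x + y mod N, x + 2y mod N)."""
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "Arnold transform needs a square image"
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out

def arnold_unscramble(img, iterations=1):
    """Invert the cat map: (x, y) -> (2x - y mod N, y - x mod N),
    using the inverse of the matrix [[1, 1], [1, 2]] (determinant 1)."""
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nxt[(2 * x - y) % n, (y - x) % n] = out[x, y]
        out = nxt
    return out
```

Running the same number of inverse iterations restores the original watermark exactly, which is what makes the transform usable as a reversible security layer before SVD embedding.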
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval that learns hash functions in the absence of paired training samples
through a cycle consistency loss. Our proposed approach employs an adversarial
training scheme to learn a pair of hash functions enabling translation between
modalities while assuming the underlying semantic relationship. To endow the
hash codes of the input-output pair with semantics, a cycle consistency loss is
further imposed on top of the adversarial training to strengthen the
correlations between inputs and their corresponding outputs. Our approach
learns hash functions generatively, such that the learned hash codes maximally
correlate each input-output correspondence while also being able to regenerate
the inputs so as to minimize the information loss. The learning-to-hash
embedding is thus performed by jointly optimizing the parameters of the hash
functions across modalities as well as the associated generative models.
Extensive experiments on a variety of large-scale cross-modal datasets
demonstrate that our proposed method achieves better retrieval results than the
state of the art.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other authors
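The cycle consistency idea can be sketched concisely: translate one modality to the other and back, then penalize the reconstruction error, in addition to the adversarial objective. The stand-in linear translators below are hypothetical placeholders for the paper's learned generators, used only to make the loss concrete.

```python
import numpy as np

def l1(a, b):
    # Mean absolute reconstruction error.
    return float(np.abs(a - b).mean())

# Hypothetical cross-modal translators: in the paper these would be learned
# generators mapping (say) image features to text features and back; here
# they are simple linear maps just to illustrate the loss.
def g_img2txt(x, W):
    return x @ W

def g_txt2img(y, W_back):
    return y @ W_back

def cycle_consistency_loss(x, y, W, W_back):
    """Translate each modality to the other and back, then penalize the
    reconstruction error in both directions."""
    loss_x = l1(g_txt2img(g_img2txt(x, W), W_back), x)
    loss_y = l1(g_img2txt(g_txt2img(y, W_back), W), y)
    return loss_x + loss_y
```

When the two translators are exact inverses of each other the loss vanishes, which is the fixed point the cycle term pushes the adversarially trained generators toward.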
Human-Machine Cooperation in Large-Scale Multimedia Retrieval: A Survey
Large-Scale Multimedia Retrieval (LSMR) is the task of quickly analyzing a large amount of multimedia data, such as images or videos, and accurately finding the items relevant to a certain semantic meaning. Although LSMR has been investigated for more than two decades in the fields of multimedia processing and computer vision, a more interdisciplinary approach is necessary to develop an LSMR system that is truly meaningful for humans. To this end, this paper aims to draw attention to the LSMR problem from diverse research fields. After explaining the basic terminology of LSMR, we first survey several representative methods in chronological order. This survey reveals that, by prioritizing generality and scalability for large-scale data, recent methods interpret semantic meanings with a completely different mechanism from humans, even though such human-like mechanisms were used in classical heuristic-based methods. Based on this, we discuss human-machine cooperation, which incorporates knowledge about human interpretation into LSMR without sacrificing generality and scalability. In particular, we present three approaches to human-machine cooperation (cognitive, ontological, and adaptive), which are grounded in cognitive science, ontology engineering, and metacognition, respectively. We hope that this paper will create a bridge enabling researchers in different fields to communicate about the LSMR problem and lead to a ground-breaking next generation of LSMR systems.
Skeleton-Based Gesture Recognition With Learnable Paths and Signature Features
For skeleton-based gesture recognition, graph convolutional networks (GCNs) have achieved remarkable performance, since the human skeleton is a natural graph. However, the biological structure might not be the crucial one for motion analysis. Moreover, spatial differential information such as joint distance and the angle between bones may be overlooked during graph convolution. In this paper, we focus on obtaining meaningful joint groups and extracting their discriminative features using path signature (PS) theory. First, to characterize the constraints and dependencies of the various joints, we propose three types of paths, i.e., spatial, temporal, and learnable paths. In particular, a learnable path generation mechanism can group together joints that are not directly connected or that are far apart, according to their kinematic characteristics. Second, to obtain informative and compact features, a deep integration of PS with few parameters is introduced. The entire computational process is packed into two modules, i.e., the spatial-temporal path signature module (ST-PSM) and the learnable path signature module (L-PSM), for convenience of use. They are plug-and-play modules available for any neural network, such as CNNs and GCNs, to enhance feature extraction. Extensive experiments have been conducted on three mainstream datasets (ChaLearn 2013, ChaLearn 2016, and AUTSL). We achieved state-of-the-art results with a simpler framework and a much smaller model size. By inserting our two modules into several GCN-based networks, we observe clear improvements, demonstrating the great effectiveness of our proposed method.
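The path signature underlying these modules is a sequence of iterated integrals of a path; for a piecewise-linear path (e.g., a joint trajectory sampled over time) the depth-2 signature can be computed segment by segment with Chen's identity. The sketch below is a minimal illustration of that computation, not the paper's ST-PSM/L-PSM implementation.

```python
import numpy as np

def path_signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (T x d array).

    Level 1: S^i  = total increment of coordinate i.
    Level 2: S^ij = iterated integral of dX^i dX^j, accumulated one
    linear segment at a time with Chen's identity:
        S^ij_new = S^ij + S^i * delta_j + delta_i * delta_j / 2,
    where delta is the increment of the current segment (a single
    linear segment has level-2 signature outer(delta, delta) / 2).
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 += delta
    return s1, s2
```

Two standard properties serve as sanity checks: the level-1 signature equals the path's total displacement, and the symmetric part of level 2 satisfies the shuffle identity S^ij + S^ji = S^i S^j, so only the antisymmetric part (the Lévy area) carries information beyond level 1.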