22,949 research outputs found
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize on what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and difference, and recognize their merits and limitations. For a
head-to-head comparison between the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.Comment: to appear in ACM Computing Survey
Hierarchical Attention Network for Visually-aware Food Recommendation
Food recommender systems play an important role in assisting users to
identify the desired food to eat. Deciding what food to eat is a complex and
multi-faceted process, which is influenced by many factors such as the
ingredients, appearance of the recipe, the user's personal preference on food,
and various contexts like what had been eaten in the past meals. In this work,
we formulate the food recommendation problem as predicting user preference on
recipes based on three key factors that determine a user's choice on food,
namely, 1) the user's (and other users') history; 2) the ingredients of a
recipe; and 3) the descriptive image of a recipe. To address this challenging
problem, we develop a dedicated neural network based solution Hierarchical
Attention based Food Recommendation (HAFR) which is capable of: 1) capturing
the collaborative filtering effect like what similar users tend to eat; 2)
inferring a user's preference at the ingredient level; and 3) learning user
preference from the recipe's visual images. To evaluate our proposed method, we
construct a large-scale dataset consisting of millions of ratings from
AllRecipes.com. Extensive experiments show that our method outperforms several
competing recommender solutions like Factorization Machine and Visual Bayesian
Personalized Ranking with an average improvement of 12%, offering promising
results in predicting user preference for food. Codes and dataset will be
released upon acceptance
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, of advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
Object-based attention mechanism for color calibration of UAV remote sensing images in precision agriculture.
Color calibration is a critical step for unmanned aerial vehicle (UAV) remote sensing, especially in precision agriculture, which relies mainly on correlating color changes to specific quality attributes, e.g. plant health, disease, and pest stresses. In UAV remote sensing, the exemplar-based color transfer is popularly used for color calibration, where the automatic search for the semantic correspondences is the key to ensuring the color transfer accuracy. However, the existing attention mechanisms encounter difficulties in building the precise semantic correspondences between the reference image and the target one, in which the normalized cross correlation is often computed for feature reassembling. As a result, the color transfer accuracy is inevitably decreased by the disturbance from the semantically unrelated pixels, leading to semantic mismatch due to the absence of semantic correspondences. In this article, we proposed an unsupervised object-based attention mechanism (OBAM) to suppress the disturbance of the semantically unrelated pixels, along with a further introduced weight-adjusted Adaptive Instance Normalization (AdaIN) (WAA) method to tackle the challenges caused by the absence of semantic correspondences. By embedding the proposed modules into a photorealistic style transfer method with progressive stylization, the color transfer accuracy can be improved while better preserving the structural details. We evaluated our approach on the UAV data of different crop types including rice, beans, and cotton. Extensive experiments demonstrate that our proposed method outperforms several state-of-the-art methods. As our approach requires no annotated labels, it can be easily embedded into the off-the-shelf color transfer approaches. Relevant codes and configurations will be available at https://github.com/huanghsheng/object-based-attention-mechanis
- …