21,341 research outputs found
Visual Search at eBay
In this paper, we propose a novel end-to-end approach for scalable visual
search infrastructure. We discuss the challenges we faced for a massive
volatile inventory like at eBay and present our solution to overcome those. We
harness the availability of large image collection of eBay listings and
state-of-the-art deep learning techniques to perform visual search at scale.
Supervised approach for optimized search limited to top predicted categories
and also for compact binary signature are key to scale up without compromising
accuracy and precision. Both use a common deep neural network requiring only a
single forward inference. The system architecture is presented with in-depth
discussions of its basic components and optimizations for a trade-off between
search relevance and latency. This solution is currently deployed in a
distributed cloud infrastructure and fuels visual search in eBay ShopBot and
Close5. We show benchmark on ImageNet dataset on which our approach is faster
and more accurate than several unsupervised baselines. We share our learnings
with the hope that visual search becomes a first class citizen for all large
scale search engines rather than an afterthought.Comment: To appear in 23rd SIGKDD Conference on Knowledge Discovery and Data
Mining (KDD), 2017. A demonstration video can be found at
https://youtu.be/iYtjs32vh4
Region-Based Image Retrieval Revisited
Region-based image retrieval (RBIR) technique is revisited. In early attempts
at RBIR in the late 90s, researchers found many ways to specify region-based
queries and spatial relationships; however, the way to characterize the
regions, such as by using color histograms, were very poor at that time. Here,
we revisit RBIR by incorporating semantic specification of objects and
intuitive specification of spatial relationships. Our contributions are the
following. First, to support multiple aspects of semantic object specification
(category, instance, and attribute), we propose a multitask CNN feature that
allows us to use deep learning technique and to jointly handle multi-aspect
object specification. Second, to help users specify spatial relationships among
objects in an intuitive way, we propose recommendation techniques of spatial
relationships. In particular, by mining the search results, a system can
recommend feasible spatial relationships among the objects. The system also can
recommend likely spatial relationships by assigned object category names based
on language prior. Moreover, object-level inverted indexing supports very fast
shortlist generation, and re-ranking based on spatial constraints provides
users with instant RBIR experiences.Comment: To appear in ACM Multimedia 2017 (Oral
Ranking News-Quality Multimedia
News editors need to find the photos that best illustrate a news piece and
fulfill news-media quality standards, while being pressed to also find the most
recent photos of live events. Recently, it became common to use social-media
content in the context of news media for its unique value in terms of immediacy
and quality. Consequently, the amount of images to be considered and filtered
through is now too much to be handled by a person. To aid the news editor in
this process, we propose a framework designed to deliver high-quality,
news-press type photos to the user. The framework, composed of two parts, is
based on a ranking algorithm tuned to rank professional media highly and a
visual SPAM detection module designed to filter-out low-quality media. The core
ranking algorithm is leveraged by aesthetic, social and deep-learning semantic
features. Evaluation showed that the proposed framework is effective at finding
high-quality photos (true-positive rate) achieving a retrieval MAP of 64.5% and
a classification precision of 70%.Comment: To appear in ICMR'1
Semantically Consistent Regularization for Zero-Shot Recognition
The role of semantics in zero-shot learning is considered. The effectiveness
of previous approaches is analyzed according to the form of supervision
provided. While some learn semantics independently, others only supervise the
semantic subspace explained by training classes. Thus, the former is able to
constrain the whole space but lacks the ability to model semantic correlations.
The latter addresses this issue but leaves part of the semantic space
unsupervised. This complementarity is exploited in a new convolutional neural
network (CNN) framework, which proposes the use of semantics as constraints for
recognition.Although a CNN trained for classification has no transfer ability,
this can be encouraged by learning an hidden semantic layer together with a
semantic code for classification. Two forms of semantic constraints are then
introduced. The first is a loss-based regularizer that introduces a
generalization constraint on each semantic predictor. The second is a codeword
regularizer that favors semantic-to-class mappings consistent with prior
semantic knowledge while allowing these to be learned from data. Significant
improvements over the state-of-the-art are achieved on several datasets.Comment: Accepted to CVPR 201
Fireground location understanding by semantic linking of visual objects and building information models
This paper presents an outline for improved localization and situational awareness in fire emergency situations based on semantic technology and computer vision techniques. The novelty of our methodology lies in the semantic linking of video object recognition results from visual and thermal cameras with Building Information Models (BIM). The current limitations and possibilities of certain building information streams in the context of fire safety or fire incident management are addressed in this paper. Furthermore, our data management tools match higher-level semantic metadata descriptors of BIM and deep-learning based visual object recognition and classification networks. Based on these matches, estimations can be generated of camera, objects and event positions in the BIM model, transforming it from a static source of information into a rich, dynamic data provider. Previous work has already investigated the possibilities to link BIM and low-cost point sensors for fireground understanding, but these approaches did not take into account the benefits of video analysis and recent developments in semantics and feature learning research. Finally, the strengths of the proposed approach compared to the state-of-the-art is its (semi -)automatic workflow, generic and modular setup and multi-modal strategy, which allows to automatically create situational awareness, to improve localization and to facilitate the overall fire understanding
Replication issues in syntax-based aspect extraction for opinion mining
Reproducing experiments is an important instrument to validate previous work
and build upon existing approaches. It has been tackled numerous times in
different areas of science. In this paper, we introduce an empirical
replicability study of three well-known algorithms for syntactic centric
aspect-based opinion mining. We show that reproducing results continues to be a
difficult endeavor, mainly due to the lack of details regarding preprocessing
and parameter setting, as well as due to the absence of available
implementations that clarify these details. We consider these are important
threats to validity of the research on the field, specifically when compared to
other problems in NLP where public datasets and code availability are critical
validity components. We conclude by encouraging code-based research, which we
think has a key role in helping researchers to understand the meaning of the
state-of-the-art better and to generate continuous advances.Comment: Accepted in the EACL 2017 SR
- …